diff --git a/docs/_posts/DevinTDHa/2024-09-23-phi3.5_mini_4k_instruct_q4_gguf_en.md b/docs/_posts/DevinTDHa/2024-09-23-phi3.5_mini_4k_instruct_q4_gguf_en.md new file mode 100644 index 00000000000000..66850c18993895 --- /dev/null +++ b/docs/_posts/DevinTDHa/2024-09-23-phi3.5_mini_4k_instruct_q4_gguf_en.md @@ -0,0 +1,120 @@ +--- +layout: model +title: Phi-3.5-mini Q4_K_M GGUF +author: John Snow Labs +name: phi3.5_mini_4k_instruct_q4_gguf +date: 2024-09-23 +tags: [gguf, phi, open_source, en, tensorflow] +task: Text Generation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: AutoGGUFModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Phi-3.5-mini is a lightweight, state-of-the-art open model built upon datasets used for Phi-3 - synthetic data and filtered publicly available websites - with a focus on very high-quality, reasoning dense data. The model belongs to the Phi-3 model family and supports 128K token context length. + +Original model from https://huggingface.co/bartowski/Phi-3.5-mini-instruct-GGUF. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phi3.5_mini_4k_instruct_q4_gguf_en_5.5.0_3.0_1727109802829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phi3.5_mini_4k_instruct_q4_gguf_en_5.5.0_3.0_1727109802829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +document = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +autoGGUFModel = AutoGGUFModel.pretrained() \ + .setInputCols(["document"]) \ + .setOutputCol("completions") \ + .setBatchSize(4) \ + .setNPredict(20) \ + .setNGpuLayers(99) \ + .setTemperature(0.4) \ + .setTopK(40) \ + .setTopP(0.9) \ + .setPenalizeNl(True) + +pipeline = Pipeline().setStages([document, autoGGUFModel]) +data = spark.createDataFrame([["Hello, I am a"]]).toDF("text") +result = pipeline.fit(data).transform(data) +result.select("completions").show(truncate = False) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val document = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val autoGGUFModel = AutoGGUFModel + .pretrained() + .setInputCols("document") + .setOutputCol("completions") + .setBatchSize(4) + .setNPredict(20) + .setNGpuLayers(99) + .setTemperature(0.4f) + .setTopK(40) + .setTopP(0.9f) + .setPenalizeNl(true) + +val pipeline = new Pipeline().setStages(Array(document, autoGGUFModel)) + +val data = Seq("Hello, I am a").toDF("text") +val result = pipeline.fit(data).transform(data) +result.select("completions").show(truncate = false) +``` +
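Phi-3.5-mini is an instruction-tuned model, so free-form completion prompts like the one above work, but chat-style prompts usually give more useful output. Below is a minimal sketch that reuses the `pipeline` defined above and wraps the request in the Phi-3.5 chat template; the template string and the example request are assumptions taken from the upstream model card, not part of this pipeline's configuration.

```python
# Phi-3.5 chat template (assumed from the upstream model card); the GGUF file may also
# apply its own template depending on the AutoGGUFModel settings.
prompt = (
    "<|system|>\nYou are a helpful assistant.<|end|>\n"
    "<|user|>\nSummarize what Spark NLP does in one sentence.<|end|>\n"
    "<|assistant|>\n"
)

chat_data = spark.createDataFrame([[prompt]]).toDF("text")
chat_result = pipeline.fit(chat_data).transform(chat_data)

# Each row of "completions" is an annotation; its `result` field holds the generated text.
chat_result.selectExpr("explode(completions.result) as completion").show(truncate=False)
```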
+ +## Results + +```bash ++-----------------------------------------------------------------------------------------------------------------------------------+ +|completions | ++-----------------------------------------------------------------------------------------------------------------------------------+ +|[{document, 0, 78, new user. I am currently working on a project and I need to create a list of , {prompt -> Hello, I am a}, []}]| ++-----------------------------------------------------------------------------------------------------------------------------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phi3.5_mini_4k_instruct_q4_gguf| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[completions]| +|Language:|en| +|Size:|2.4 GB| \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-04-burmese_awesome_wnut_model_sgonzalezsilot_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-04-burmese_awesome_wnut_model_sgonzalezsilot_pipeline_en.md new file mode 100644 index 00000000000000..4fac412ca9f8de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-04-burmese_awesome_wnut_model_sgonzalezsilot_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_sgonzalezsilot_pipeline pipeline DistilBertForTokenClassification from sgonzalezsilot +author: John Snow Labs +name: burmese_awesome_wnut_model_sgonzalezsilot_pipeline +date: 2024-09-04 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_sgonzalezsilot_pipeline` is a English model originally trained by sgonzalezsilot. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_sgonzalezsilot_pipeline_en_5.5.0_3.0_1725493021938.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_sgonzalezsilot_pipeline_en_5.5.0_3.0_1725493021938.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_wnut_model_sgonzalezsilot_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_wnut_model_sgonzalezsilot_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
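The snippets above assume a running Spark session and a DataFrame `df` with a `text` column. A self-contained sketch follows; the example sentence and the `ner` output column name are illustrative assumptions, so check `annotations.columns` on your own run.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start (or reuse) a Spark NLP session and build a toy DataFrame with a "text" column.
spark = sparknlp.start()
df = spark.createDataFrame([["John works at Google in London."]]).toDF("text")

pipeline = PretrainedPipeline("burmese_awesome_wnut_model_sgonzalezsilot_pipeline", lang="en")
annotations = pipeline.transform(df)

# Token-level predictions end up in an annotation column (assumed to be "ner" here).
annotations.selectExpr("explode(ner.result) as label").show(truncate=False)
```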
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_sgonzalezsilot_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/sgonzalezsilot/my_awesome_wnut_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-04-sent_bert_base_spanish_wwm_uncased_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-04-sent_bert_base_spanish_wwm_uncased_pipeline_es.md new file mode 100644 index 00000000000000..31f9b1e49628ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-04-sent_bert_base_spanish_wwm_uncased_pipeline_es.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Castilian, Spanish sent_bert_base_spanish_wwm_uncased_pipeline pipeline BertSentenceEmbeddings from dccuchile +author: John Snow Labs +name: sent_bert_base_spanish_wwm_uncased_pipeline +date: 2024-09-04 +tags: [es, open_source, pipeline, onnx] +task: Embeddings +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_spanish_wwm_uncased_pipeline` is a Castilian, Spanish model originally trained by dccuchile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_spanish_wwm_uncased_pipeline_es_5.5.0_3.0_1725415961164.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_spanish_wwm_uncased_pipeline_es_5.5.0_3.0_1725415961164.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_spanish_wwm_uncased_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_spanish_wwm_uncased_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_spanish_wwm_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|410.2 MB| + +## References + +https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-04-sent_tiny_biobert_en.md b/docs/_posts/ahmedlone127/2024-09-04-sent_tiny_biobert_en.md new file mode 100644 index 00000000000000..541d3ddeb29bf0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-04-sent_tiny_biobert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_tiny_biobert BertSentenceEmbeddings from nlpie +author: John Snow Labs +name: sent_tiny_biobert +date: 2024-09-04 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_tiny_biobert` is a English model originally trained by nlpie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_tiny_biobert_en_5.5.0_3.0_1725454293294.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_tiny_biobert_en_5.5.0_3.0_1725454293294.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_tiny_biobert","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_tiny_biobert","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
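Each sentence annotation produced above carries its vector in the `embeddings` field. A minimal sketch for pulling the sentences and their vectors out of `pipelineDF`, assuming the pipeline defined above:

```python
# Explode the sentence-level annotations and read the text plus vector of each one.
vectors = pipelineDF \
    .selectExpr("explode(embeddings) as sentence_embedding") \
    .selectExpr("sentence_embedding.result as sentence",
                "sentence_embedding.embeddings as vector")

vectors.show(truncate=80)

# The vector length matches the model's hidden size.
print(len(vectors.first()["vector"]))
```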
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_tiny_biobert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|51.9 MB| + +## References + +https://huggingface.co/nlpie/tiny-biobert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-04-xlm_roberta_base_tweet_sentiment_portuguese_trimmed_portuguese_15000_en.md b/docs/_posts/ahmedlone127/2024-09-04-xlm_roberta_base_tweet_sentiment_portuguese_trimmed_portuguese_15000_en.md new file mode 100644 index 00000000000000..e58752f63054d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-04-xlm_roberta_base_tweet_sentiment_portuguese_trimmed_portuguese_15000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_tweet_sentiment_portuguese_trimmed_portuguese_15000 XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_tweet_sentiment_portuguese_trimmed_portuguese_15000 +date: 2024-09-04 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_tweet_sentiment_portuguese_trimmed_portuguese_15000` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_portuguese_trimmed_portuguese_15000_en_5.5.0_3.0_1725410232175.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_portuguese_trimmed_portuguese_15000_en_5.5.0_3.0_1725410232175.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_tweet_sentiment_portuguese_trimmed_portuguese_15000","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_tweet_sentiment_portuguese_trimmed_portuguese_15000", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
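The predicted sentiment label for each input row can be read back from the `class` output column once the pipeline has run; a minimal sketch, assuming the `pipelineDF` produced above:

```python
# Each row of "class" holds one annotation per document; its `result` field is the label.
pipelineDF.select("text", "class.result").show(truncate=False)
```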
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_tweet_sentiment_portuguese_trimmed_portuguese_15000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|358.2 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-tweet-sentiment-pt-trimmed-pt-15000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-05-bert_finetuned_ner_july_en.md b/docs/_posts/ahmedlone127/2024-09-05-bert_finetuned_ner_july_en.md new file mode 100644 index 00000000000000..7b2d708b5657e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-05-bert_finetuned_ner_july_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_ner_july DistilBertForTokenClassification from Amhyr +author: John Snow Labs +name: bert_finetuned_ner_july +date: 2024-09-05 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_july` is a English model originally trained by Amhyr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_july_en_5.5.0_3.0_1725506530169.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_july_en_5.5.0_3.0_1725506530169.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForTokenClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = DistilBertForTokenClassification.pretrained("bert_finetuned_ner_july","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = DistilBertForTokenClassification.pretrained("bert_finetuned_ner_july", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
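The `ner` column contains one IOB tag per token. If you need whole entity chunks instead, a `NerConverter` stage can be appended to the same pipeline; a minimal sketch, reusing the stages defined above (the `entities` column name is just an illustrative choice):

```python
from sparknlp.annotator import NerConverter

# Merge token-level IOB tags into entity chunks.
converter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("entities")

ner_pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, converter])
ner_result = ner_pipeline.fit(data).transform(data)
ner_result.select("entities.result").show(truncate=False)
```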
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_july| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|505.4 MB| + +## References + +https://huggingface.co/Amhyr/bert-finetuned-ner_july \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-05-darijabert_ar.md b/docs/_posts/ahmedlone127/2024-09-05-darijabert_ar.md new file mode 100644 index 00000000000000..4e51f13e835362 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-05-darijabert_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic darijabert BertEmbeddings from SI2M-Lab +author: John Snow Labs +name: darijabert +date: 2024-09-05 +tags: [ar, open_source, onnx, embeddings, bert] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`darijabert` is a Arabic model originally trained by SI2M-Lab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/darijabert_ar_5.5.0_3.0_1725520088704.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/darijabert_ar_5.5.0_3.0_1725520088704.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("darijabert","ar") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("darijabert","ar") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
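The example sentence above is English, but DarijaBERT was trained on Moroccan Arabic (Darija), so Arabic input is more representative. A minimal sketch reusing the fitted `pipelineModel` from above; the sample sentence is purely illustrative:

```python
# An illustrative Darija greeting; any Arabic text in a "text" column works here.
arabic_data = spark.createDataFrame([["السلام عليكم، كيف داير؟"]]).toDF("text")
arabic_result = pipelineModel.transform(arabic_data)

# One embedding per word piece: show the token text and the vector dimensionality.
arabic_result.selectExpr("explode(embeddings) as emb") \
    .selectExpr("emb.result as token", "size(emb.embeddings) as dim") \
    .show(truncate=False)
```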
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|darijabert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|ar| +|Size:|551.5 MB| + +## References + +https://huggingface.co/SI2M-Lab/DarijaBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-05-distilbert_base_uncased_finetuned_yelp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-05-distilbert_base_uncased_finetuned_yelp_pipeline_en.md new file mode 100644 index 00000000000000..34137fdde2b597 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-05-distilbert_base_uncased_finetuned_yelp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_yelp_pipeline pipeline DistilBertForSequenceClassification from vinhanguyen +author: John Snow Labs +name: distilbert_base_uncased_finetuned_yelp_pipeline +date: 2024-09-05 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_yelp_pipeline` is a English model originally trained by vinhanguyen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_yelp_pipeline_en_5.5.0_3.0_1725579974379.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_yelp_pipeline_en_5.5.0_3.0_1725579974379.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_yelp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_yelp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_yelp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/vinhanguyen/distilbert-base-uncased-finetuned-yelp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-05-finetuning_emotion_model_purushothama_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-05-finetuning_emotion_model_purushothama_pipeline_en.md new file mode 100644 index 00000000000000..8ebe7feaee9f3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-05-finetuning_emotion_model_purushothama_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_emotion_model_purushothama_pipeline pipeline DistilBertForSequenceClassification from Purushothama +author: John Snow Labs +name: finetuning_emotion_model_purushothama_pipeline +date: 2024-09-05 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_emotion_model_purushothama_pipeline` is a English model originally trained by Purushothama. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_emotion_model_purushothama_pipeline_en_5.5.0_3.0_1725507702704.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_emotion_model_purushothama_pipeline_en_5.5.0_3.0_1725507702704.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_emotion_model_purushothama_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_emotion_model_purushothama_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_emotion_model_purushothama_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Purushothama/finetuning-emotion-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-05-opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_european_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-05-opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_european_pipeline_en.md new file mode 100644 index 00000000000000..898a35c4f3834e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-05-opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_european_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_european_pipeline pipeline MarianTransformer from himanshubeniwal +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_european_pipeline +date: 2024-09-05 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_european_pipeline` is a English model originally trained by himanshubeniwal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_european_pipeline_en_5.5.0_3.0_1725546188832.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_european_pipeline_en_5.5.0_3.0_1725546188832.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_european_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_european_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
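This pipeline was fine-tuned for Romanian to English translation, so `df` should contain Romanian text. A minimal sketch reusing the `pipeline` object above and assuming a running Spark session; the `translation` output column name is an assumption, so check `annotations.columns` if it differs:

```python
# An illustrative Romanian sentence in a "text" column.
df = spark.createDataFrame([["Aceasta este o propoziție de test."]]).toDF("text")
annotations = pipeline.transform(df)

# The MarianTransformer stage writes the translated text into an annotation column
# (assumed to be named "translation" here).
annotations.select("translation.result").show(truncate=False)
```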
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_romanian_tonga_tonga_islands_english_european_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.6 MB| + +## References + +https://huggingface.co/himanshubeniwal/opus-mt-en-ro-finetuned-ro-to-en-European + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-05-qa_synthetic_data_only_finetuned_v1_0_en.md b/docs/_posts/ahmedlone127/2024-09-05-qa_synthetic_data_only_finetuned_v1_0_en.md new file mode 100644 index 00000000000000..9e3d2f0b42603e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-05-qa_synthetic_data_only_finetuned_v1_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English qa_synthetic_data_only_finetuned_v1_0 XlmRoBertaForQuestionAnswering from am-infoweb +author: John Snow Labs +name: qa_synthetic_data_only_finetuned_v1_0 +date: 2024-09-05 +tags: [en, open_source, onnx, question_answering, xlm_roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_synthetic_data_only_finetuned_v1_0` is a English model originally trained by am-infoweb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_synthetic_data_only_finetuned_v1_0_en_5.5.0_3.0_1725557042495.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_synthetic_data_only_finetuned_v1_0_en_5.5.0_3.0_1725557042495.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import MultiDocumentAssembler
from sparknlp.annotator import XlmRoBertaForQuestionAnswering
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = XlmRoBertaForQuestionAnswering.pretrained("qa_synthetic_data_only_finetuned_v1_0","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = XlmRoBertaForQuestionAnswering.pretrained("qa_synthetic_data_only_finetuned_v1_0", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
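After the pipeline runs, the predicted span is stored in the `answer` column; a minimal sketch, assuming the `pipelineDF` produced above:

```python
# The `result` field of each answer annotation holds the extracted span text.
pipelineDF.select("question", "context", "answer.result").show(truncate=False)
```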
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_synthetic_data_only_finetuned_v1_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|803.6 MB| + +## References + +https://huggingface.co/am-infoweb/QA_SYNTHETIC_DATA_ONLY_Finetuned_v1.0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-06-claim_extraction_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-06-claim_extraction_classifier_en.md new file mode 100644 index 00000000000000..e58eec7e15b188 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-06-claim_extraction_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English claim_extraction_classifier DeBertaForSequenceClassification from KnutJaegersberg +author: John Snow Labs +name: claim_extraction_classifier +date: 2024-09-06 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`claim_extraction_classifier` is a English model originally trained by KnutJaegersberg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/claim_extraction_classifier_en_5.5.0_3.0_1725611976387.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/claim_extraction_classifier_en_5.5.0_3.0_1725611976387.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DeBertaForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DeBertaForSequenceClassification.pretrained("claim_extraction_classifier","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DeBertaForSequenceClassification.pretrained("claim_extraction_classifier", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|claim_extraction_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/KnutJaegersberg/claim_extraction_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-06-complaints_classifier_jpsteinhafel_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-06-complaints_classifier_jpsteinhafel_pipeline_en.md new file mode 100644 index 00000000000000..4f4462b01c62c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-06-complaints_classifier_jpsteinhafel_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English complaints_classifier_jpsteinhafel_pipeline pipeline DistilBertForSequenceClassification from jpsteinhafel +author: John Snow Labs +name: complaints_classifier_jpsteinhafel_pipeline +date: 2024-09-06 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`complaints_classifier_jpsteinhafel_pipeline` is a English model originally trained by jpsteinhafel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/complaints_classifier_jpsteinhafel_pipeline_en_5.5.0_3.0_1725608160763.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/complaints_classifier_jpsteinhafel_pipeline_en_5.5.0_3.0_1725608160763.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("complaints_classifier_jpsteinhafel_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("complaints_classifier_jpsteinhafel_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|complaints_classifier_jpsteinhafel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jpsteinhafel/complaints_classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-06-distibert_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-06-distibert_ner_pipeline_en.md new file mode 100644 index 00000000000000..d8b7b96dd88c2d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-06-distibert_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distibert_ner_pipeline pipeline DistilBertForTokenClassification from satyamrajawat1994 +author: John Snow Labs +name: distibert_ner_pipeline +date: 2024-09-06 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distibert_ner_pipeline` is a English model originally trained by satyamrajawat1994. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distibert_ner_pipeline_en_5.5.0_3.0_1725599516402.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distibert_ner_pipeline_en_5.5.0_3.0_1725599516402.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distibert_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distibert_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distibert_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/satyamrajawat1994/distibert-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-06-distilbert_base_uncased_finetuned_squad_d5716d28_mbateman_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-06-distilbert_base_uncased_finetuned_squad_d5716d28_mbateman_pipeline_en.md new file mode 100644 index 00000000000000..2aa2bf376b662c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-06-distilbert_base_uncased_finetuned_squad_d5716d28_mbateman_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_d5716d28_mbateman_pipeline pipeline DistilBertForQuestionAnswering from mbateman +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_d5716d28_mbateman_pipeline +date: 2024-09-06 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_d5716d28_mbateman_pipeline` is a English model originally trained by mbateman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_mbateman_pipeline_en_5.5.0_3.0_1725652836285.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_mbateman_pipeline_en_5.5.0_3.0_1725652836285.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_mbateman_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_mbateman_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
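Because this pipeline starts with a MultiDocumentAssembler, `df` needs a question column and a context column rather than a single `text` column. A minimal sketch reusing the `pipeline` object above and assuming a running Spark session; the `question`/`context` input names and the `answer` output name follow the usual Spark NLP convention and are assumptions here:

```python
# One (question, context) pair per row.
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")

annotations = pipeline.transform(df)
annotations.select("answer.result").show(truncate=False)
```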
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_d5716d28_mbateman_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/mbateman/distilbert-base-uncased-finetuned-squad-d5716d28 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-06-distilbert_tokenizer_256k_mlm_750k_en.md b/docs/_posts/ahmedlone127/2024-09-06-distilbert_tokenizer_256k_mlm_750k_en.md new file mode 100644 index 00000000000000..0abfe057e23f65 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-06-distilbert_tokenizer_256k_mlm_750k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_tokenizer_256k_mlm_750k DistilBertEmbeddings from vocab-transformers +author: John Snow Labs +name: distilbert_tokenizer_256k_mlm_750k +date: 2024-09-06 +tags: [en, open_source, onnx, embeddings, distilbert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_tokenizer_256k_mlm_750k` is a English model originally trained by vocab-transformers. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_tokenizer_256k_mlm_750k_en_5.5.0_3.0_1725639754885.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_tokenizer_256k_mlm_750k_en_5.5.0_3.0_1725639754885.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = DistilBertEmbeddings.pretrained("distilbert_tokenizer_256k_mlm_750k","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = DistilBertEmbeddings.pretrained("distilbert_tokenizer_256k_mlm_750k","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_tokenizer_256k_mlm_750k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[distilbert]| +|Language:|en| +|Size:|891.8 MB| + +## References + +https://huggingface.co/vocab-transformers/distilbert-tokenizer_256k-MLM_750k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-06-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_german_finetuned_english_tonga_tonga_islands_german_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-06-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_german_finetuned_english_tonga_tonga_islands_german_pipeline_en.md new file mode 100644 index 00000000000000..05ab6a6f3648d4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-06-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_german_finetuned_english_tonga_tonga_islands_german_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_german_finetuned_english_tonga_tonga_islands_german_pipeline pipeline MarianTransformer from MicMer17 +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_german_finetuned_english_tonga_tonga_islands_german_pipeline +date: 2024-09-06 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_german_finetuned_english_tonga_tonga_islands_german_pipeline` is a English model originally trained by MicMer17. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_german_finetuned_english_tonga_tonga_islands_german_pipeline_en_5.5.0_3.0_1725635684279.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_german_finetuned_english_tonga_tonga_islands_german_pipeline_en_5.5.0_3.0_1725635684279.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_german_finetuned_english_tonga_tonga_islands_german_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_german_finetuned_english_tonga_tonga_islands_german_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_german_finetuned_english_tonga_tonga_islands_german_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.1 MB| + +## References + +https://huggingface.co/MicMer17/opus-mt-en-ro-finetuned-en-to-de-finetuned-en-to-de + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-06-qa_model3_en.md b/docs/_posts/ahmedlone127/2024-09-06-qa_model3_en.md new file mode 100644 index 00000000000000..dc5e07679e6033 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-06-qa_model3_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English qa_model3 DistilBertForQuestionAnswering from sumittagadiya +author: John Snow Labs +name: qa_model3 +date: 2024-09-06 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_model3` is a English model originally trained by sumittagadiya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_model3_en_5.5.0_3.0_1725654541549.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_model3_en_5.5.0_3.0_1725654541549.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import MultiDocumentAssembler
from sparknlp.annotator import DistilBertForQuestionAnswering
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("qa_model3","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("qa_model3", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_model3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/sumittagadiya/qa_model3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-06-roberta_classifier_large_finetuned_clinc_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-06-roberta_classifier_large_finetuned_clinc_1_pipeline_en.md new file mode 100644 index 00000000000000..e8837a9f7d3821 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-06-roberta_classifier_large_finetuned_clinc_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_classifier_large_finetuned_clinc_1_pipeline pipeline RoBertaForSequenceClassification from lewtun +author: John Snow Labs +name: roberta_classifier_large_finetuned_clinc_1_pipeline +date: 2024-09-06 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_classifier_large_finetuned_clinc_1_pipeline` is a English model originally trained by lewtun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_classifier_large_finetuned_clinc_1_pipeline_en_5.5.0_3.0_1725613321492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_classifier_large_finetuned_clinc_1_pipeline_en_5.5.0_3.0_1725613321492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_classifier_large_finetuned_clinc_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_classifier_large_finetuned_clinc_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_classifier_large_finetuned_clinc_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/lewtun/roberta-large-finetuned-clinc-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-06-tab_anonymizer_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-06-tab_anonymizer_v2_pipeline_en.md new file mode 100644 index 00000000000000..5c1412dcd7bdf1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-06-tab_anonymizer_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tab_anonymizer_v2_pipeline pipeline DistilBertForTokenClassification from madaanpulkit +author: John Snow Labs +name: tab_anonymizer_v2_pipeline +date: 2024-09-06 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tab_anonymizer_v2_pipeline` is a English model originally trained by madaanpulkit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tab_anonymizer_v2_pipeline_en_5.5.0_3.0_1725599112827.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tab_anonymizer_v2_pipeline_en_5.5.0_3.0_1725599112827.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tab_anonymizer_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tab_anonymizer_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tab_anonymizer_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.4 MB| + +## References + +https://huggingface.co/madaanpulkit/tab-anonymizer-v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-06-whisper_small_english_accented_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-06-whisper_small_english_accented_pipeline_en.md new file mode 100644 index 00000000000000..5b21f7b5371152 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-06-whisper_small_english_accented_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_english_accented_pipeline pipeline WhisperForCTC from Abdo96 +author: John Snow Labs +name: whisper_small_english_accented_pipeline +date: 2024-09-06 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_english_accented_pipeline` is a English model originally trained by Abdo96. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_english_accented_pipeline_en_5.5.0_3.0_1725643691362.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_english_accented_pipeline_en_5.5.0_3.0_1725643691362.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_english_accented_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_english_accented_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
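Unlike the text pipelines in this batch, this pipeline consumes raw audio, so `df` must hold audio samples rather than strings. A minimal sketch, assuming librosa is available, a local `audio.wav` file exists, and the pipeline's AudioAssembler stage reads a column named `audio_content` (the file name and column name are assumptions):

```python
import librosa
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Load and resample the recording to 16 kHz mono float samples (hypothetical local file)
samples, _ = librosa.load("audio.wav", sr=16000)

# "audio_content" is an assumed column name for the pipeline's audio input
df = spark.createDataFrame([[samples.tolist()]]).toDF("audio_content")

pipeline = PretrainedPipeline("whisper_small_english_accented_pipeline", lang="en")
pipeline.transform(df).show(truncate=False)
```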
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_english_accented_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Abdo96/whisper-small-en-accented + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-06-xlm_roberta_base_finetuned_emotion_37_labels_en.md b/docs/_posts/ahmedlone127/2024-09-06-xlm_roberta_base_finetuned_emotion_37_labels_en.md new file mode 100644 index 00000000000000..0fc405696689b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-06-xlm_roberta_base_finetuned_emotion_37_labels_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_emotion_37_labels XlmRoBertaForSequenceClassification from upsalite +author: John Snow Labs +name: xlm_roberta_base_finetuned_emotion_37_labels +date: 2024-09-06 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_emotion_37_labels` is a English model originally trained by upsalite. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_emotion_37_labels_en_5.5.0_3.0_1725617257200.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_emotion_37_labels_en_5.5.0_3.0_1725617257200.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_emotion_37_labels","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_emotion_37_labels", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_emotion_37_labels| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|877.5 MB| + +## References + +https://huggingface.co/upsalite/xlm-roberta-base-finetuned-emotion-37-labels \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-bert_wnut_token_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-07-bert_wnut_token_classifier_en.md new file mode 100644 index 00000000000000..e7c15210e33ccb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-bert_wnut_token_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_wnut_token_classifier DistilBertForTokenClassification from ZappY-AI +author: John Snow Labs +name: bert_wnut_token_classifier +date: 2024-09-07 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_wnut_token_classifier` is a English model originally trained by ZappY-AI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_wnut_token_classifier_en_5.5.0_3.0_1725729872514.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_wnut_token_classifier_en_5.5.0_3.0_1725729872514.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = DistilBertForTokenClassification.pretrained("bert_wnut_token_classifier","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = DistilBertForTokenClassification.pretrained("bert_wnut_token_classifier", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
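After the pipeline above runs, the `token` and `ner` columns are parallel annotation arrays, so the i-th tag belongs to the i-th token. A short sketch for reading the predictions side by side:

```python
# Tokens and their predicted tags are parallel arrays in the output DataFrame
pipelineDF.selectExpr("token.result as tokens", "ner.result as ner_tags").show(truncate=False)
```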
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_wnut_token_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/ZappY-AI/bert-wnut-token-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-bsc_bio_ehr_spanish_combined_train_drugtemist_dev_85_ner_en.md b/docs/_posts/ahmedlone127/2024-09-07-bsc_bio_ehr_spanish_combined_train_drugtemist_dev_85_ner_en.md new file mode 100644 index 00000000000000..af787ab0c74b20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-bsc_bio_ehr_spanish_combined_train_drugtemist_dev_85_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bsc_bio_ehr_spanish_combined_train_drugtemist_dev_85_ner RoBertaForTokenClassification from Rodrigo1771 +author: John Snow Labs +name: bsc_bio_ehr_spanish_combined_train_drugtemist_dev_85_ner +date: 2024-09-07 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_combined_train_drugtemist_dev_85_ner` is a English model originally trained by Rodrigo1771. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_combined_train_drugtemist_dev_85_ner_en_5.5.0_3.0_1725720999381.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_combined_train_drugtemist_dev_85_ner_en_5.5.0_3.0_1725720999381.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_combined_train_drugtemist_dev_85_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_combined_train_drugtemist_dev_85_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_combined_train_drugtemist_dev_85_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|440.6 MB| + +## References + +https://huggingface.co/Rodrigo1771/bsc-bio-ehr-es-combined-train-drugtemist-dev-85-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-bsc_bio_ehr_spanish_combined_train_drugtemist_dev_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-07-bsc_bio_ehr_spanish_combined_train_drugtemist_dev_ner_pipeline_en.md new file mode 100644 index 00000000000000..b7010dea9f20b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-bsc_bio_ehr_spanish_combined_train_drugtemist_dev_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bsc_bio_ehr_spanish_combined_train_drugtemist_dev_ner_pipeline pipeline RoBertaForTokenClassification from Rodrigo1771 +author: John Snow Labs +name: bsc_bio_ehr_spanish_combined_train_drugtemist_dev_ner_pipeline +date: 2024-09-07 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_combined_train_drugtemist_dev_ner_pipeline` is a English model originally trained by Rodrigo1771. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_combined_train_drugtemist_dev_ner_pipeline_en_5.5.0_3.0_1725668614321.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_combined_train_drugtemist_dev_ner_pipeline_en_5.5.0_3.0_1725668614321.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_combined_train_drugtemist_dev_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_combined_train_drugtemist_dev_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_combined_train_drugtemist_dev_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|440.5 MB| + +## References + +https://huggingface.co/Rodrigo1771/bsc-bio-ehr-es-combined-train-drugtemist-dev-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_model_zaizhou57_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_model_zaizhou57_pipeline_en.md new file mode 100644 index 00000000000000..83ee0fc79733ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_model_zaizhou57_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_zaizhou57_pipeline pipeline DistilBertForSequenceClassification from zaizhou57 +author: John Snow Labs +name: burmese_awesome_model_zaizhou57_pipeline +date: 2024-09-07 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_zaizhou57_pipeline` is a English model originally trained by zaizhou57. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_zaizhou57_pipeline_en_5.5.0_3.0_1725675047340.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_zaizhou57_pipeline_en_5.5.0_3.0_1725675047340.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_zaizhou57_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_zaizhou57_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_zaizhou57_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/zaizhou57/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_qa_model_gekyume_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_qa_model_gekyume_pipeline_en.md new file mode 100644 index 00000000000000..3481b3fd176c9a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_qa_model_gekyume_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_gekyume_pipeline pipeline DistilBertForQuestionAnswering from Gekyume +author: John Snow Labs +name: burmese_awesome_qa_model_gekyume_pipeline +date: 2024-09-07 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_gekyume_pipeline` is a English model originally trained by Gekyume. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_gekyume_pipeline_en_5.5.0_3.0_1725746264861.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_gekyume_pipeline_en_5.5.0_3.0_1725746264861.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_gekyume_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_gekyume_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_gekyume_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Gekyume/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_qa_model_reza2002_en.md b/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_qa_model_reza2002_en.md new file mode 100644 index 00000000000000..bb8c15f90b59a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_qa_model_reza2002_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_reza2002 DistilBertForQuestionAnswering from Reza2002 +author: John Snow Labs +name: burmese_awesome_qa_model_reza2002 +date: 2024-09-07 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_reza2002` is a English model originally trained by Reza2002. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_reza2002_en_5.5.0_3.0_1725736281303.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_reza2002_en_5.5.0_3.0_1725736281303.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
     .setInputCols(["question", "context"]) \
     .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_reza2002","en") \
     .setInputCols(["document_question","document_context"]) \
     .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
     .setInputCols(Array("question", "context"))
     .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_reza2002", "en")
     .setInputCols(Array("document_question","document_context"))
     .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
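To read the extracted answers from the pipeline above, the `answer` annotation column can be flattened to its `result` field:

```python
# Show only the predicted answer strings
pipelineDF.select("answer.result").show(truncate=False)

# Keep the original question next to the prediction
pipelineDF.selectExpr("document_question.result as question", "answer.result as answer").show(truncate=False)
```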
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_reza2002| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Reza2002/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_wnut_jgtg_en.md b/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_wnut_jgtg_en.md new file mode 100644 index 00000000000000..1ac345c30db98e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_wnut_jgtg_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_wnut_jgtg DistilBertForTokenClassification from gonzalezrostani +author: John Snow Labs +name: burmese_awesome_wnut_jgtg +date: 2024-09-07 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_jgtg` is a English model originally trained by gonzalezrostani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_jgtg_en_5.5.0_3.0_1725730237475.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_jgtg_en_5.5.0_3.0_1725730237475.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = DistilBertForTokenClassification.pretrained("burmese_awesome_wnut_jgtg","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("burmese_awesome_wnut_jgtg", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_jgtg| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/gonzalezrostani/my_awesome_wnut_JGTg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_wnut_model_navnitan_en.md b/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_wnut_model_navnitan_en.md new file mode 100644 index 00000000000000..e39234091ce5c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_wnut_model_navnitan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_navnitan DistilBertForTokenClassification from navnitan +author: John Snow Labs +name: burmese_awesome_wnut_model_navnitan +date: 2024-09-07 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_navnitan` is a English model originally trained by navnitan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_navnitan_en_5.5.0_3.0_1725729691315.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_navnitan_en_5.5.0_3.0_1725729691315.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = DistilBertForTokenClassification.pretrained("burmese_awesome_wnut_model_navnitan","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("burmese_awesome_wnut_model_navnitan", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_navnitan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/navnitan/my_awesome_wnut_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_wnut_model_pavement_en.md b/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_wnut_model_pavement_en.md new file mode 100644 index 00000000000000..7ae54f250576c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_wnut_model_pavement_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_pavement DistilBertForTokenClassification from pavement +author: John Snow Labs +name: burmese_awesome_wnut_model_pavement +date: 2024-09-07 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_pavement` is a English model originally trained by pavement. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_pavement_en_5.5.0_3.0_1725730756627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_pavement_en_5.5.0_3.0_1725730756627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = DistilBertForTokenClassification.pretrained("burmese_awesome_wnut_model_pavement","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("burmese_awesome_wnut_model_pavement", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_pavement| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/pavement/my_awesome_wnut_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-burmese_ner_model_veronica1608_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-07-burmese_ner_model_veronica1608_pipeline_en.md new file mode 100644 index 00000000000000..c2087e1f612531 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-burmese_ner_model_veronica1608_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_ner_model_veronica1608_pipeline pipeline DistilBertForTokenClassification from veronica1608 +author: John Snow Labs +name: burmese_ner_model_veronica1608_pipeline +date: 2024-09-07 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_ner_model_veronica1608_pipeline` is a English model originally trained by veronica1608. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_ner_model_veronica1608_pipeline_en_5.5.0_3.0_1725730786824.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_ner_model_veronica1608_pipeline_en_5.5.0_3.0_1725730786824.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_ner_model_veronica1608_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_ner_model_veronica1608_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_ner_model_veronica1608_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/veronica1608/my_ner_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-distilbert_base_uncased_finetuned_ner_perriewang_en.md b/docs/_posts/ahmedlone127/2024-09-07-distilbert_base_uncased_finetuned_ner_perriewang_en.md new file mode 100644 index 00000000000000..b280cf78928a11 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-distilbert_base_uncased_finetuned_ner_perriewang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_ner_perriewang DistilBertForTokenClassification from Perriewang +author: John Snow Labs +name: distilbert_base_uncased_finetuned_ner_perriewang +date: 2024-09-07 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_ner_perriewang` is a English model originally trained by Perriewang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_ner_perriewang_en_5.5.0_3.0_1725730700245.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_ner_perriewang_en_5.5.0_3.0_1725730700245.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_base_uncased_finetuned_ner_perriewang","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_base_uncased_finetuned_ner_perriewang", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_ner_perriewang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Perriewang/distilbert-base-uncased-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-distilbert_base_uncased_finetuned_squad_soikit_en.md b/docs/_posts/ahmedlone127/2024-09-07-distilbert_base_uncased_finetuned_squad_soikit_en.md new file mode 100644 index 00000000000000..80ec0b755d5d82 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-distilbert_base_uncased_finetuned_squad_soikit_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_soikit DistilBertForQuestionAnswering from soikit +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_soikit +date: 2024-09-07 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_soikit` is a English model originally trained by soikit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_soikit_en_5.5.0_3.0_1725695167307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_soikit_en_5.5.0_3.0_1725695167307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_soikit","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_soikit", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_soikit| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/soikit/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-distilbert_imdb_deborahm_en.md b/docs/_posts/ahmedlone127/2024-09-07-distilbert_imdb_deborahm_en.md new file mode 100644 index 00000000000000..6bb64eecb2ba85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-distilbert_imdb_deborahm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_imdb_deborahm DistilBertForSequenceClassification from deborahm +author: John Snow Labs +name: distilbert_imdb_deborahm +date: 2024-09-07 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_deborahm` is a English model originally trained by deborahm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_deborahm_en_5.5.0_3.0_1725674740208.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_deborahm_en_5.5.0_3.0_1725674740208.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_deborahm","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_deborahm", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_deborahm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/deborahm/distilbert-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-dummy_model_arnmig_en.md b/docs/_posts/ahmedlone127/2024-09-07-dummy_model_arnmig_en.md new file mode 100644 index 00000000000000..e9b1d2daec3d99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-dummy_model_arnmig_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dummy_model_arnmig CamemBertEmbeddings from arnmig +author: John Snow Labs +name: dummy_model_arnmig +date: 2024-09-07 +tags: [en, open_source, onnx, embeddings, camembert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: CamemBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CamemBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dummy_model_arnmig` is a English model originally trained by arnmig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dummy_model_arnmig_en_5.5.0_3.0_1725728363900.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dummy_model_arnmig_en_5.5.0_3.0_1725728363900.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = CamemBertEmbeddings.pretrained("dummy_model_arnmig","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = CamemBertEmbeddings.pretrained("dummy_model_arnmig","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
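The `embeddings` column produced above holds one annotation per token, each carrying the token text in its `result` field and the vector in its `embeddings` field; a short sketch for flattening them:

```python
from pyspark.sql.functions import explode

# One row per token: the token string plus its CamemBERT vector
pipelineDF.select(explode("embeddings").alias("emb")) \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(truncate=False)
```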
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dummy_model_arnmig| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[camembert]| +|Language:|en| +|Size:|264.0 MB| + +## References + +https://huggingface.co/arnmig/dummy-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-finetuned_english_tonga_tonga_islands_vietnamese_pien_27_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-07-finetuned_english_tonga_tonga_islands_vietnamese_pien_27_pipeline_en.md new file mode 100644 index 00000000000000..853e17899b736d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-finetuned_english_tonga_tonga_islands_vietnamese_pien_27_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_english_tonga_tonga_islands_vietnamese_pien_27_pipeline pipeline MarianTransformer from pien-27 +author: John Snow Labs +name: finetuned_english_tonga_tonga_islands_vietnamese_pien_27_pipeline +date: 2024-09-07 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_english_tonga_tonga_islands_vietnamese_pien_27_pipeline` is a English model originally trained by pien-27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_english_tonga_tonga_islands_vietnamese_pien_27_pipeline_en_5.5.0_3.0_1725747946611.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_english_tonga_tonga_islands_vietnamese_pien_27_pipeline_en_5.5.0_3.0_1725747946611.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_english_tonga_tonga_islands_vietnamese_pien_27_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_english_tonga_tonga_islands_vietnamese_pien_27_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_english_tonga_tonga_islands_vietnamese_pien_27_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|475.9 MB| + +## References + +https://huggingface.co/pien-27/finetuned-en-to-vi + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-lld_valbadia_ita_loresmt_l4_it.md b/docs/_posts/ahmedlone127/2024-09-07-lld_valbadia_ita_loresmt_l4_it.md new file mode 100644 index 00000000000000..f4748484f0fd17 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-lld_valbadia_ita_loresmt_l4_it.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Italian lld_valbadia_ita_loresmt_l4 MarianTransformer from sfrontull +author: John Snow Labs +name: lld_valbadia_ita_loresmt_l4 +date: 2024-09-07 +tags: [it, open_source, onnx, translation, marian] +task: Translation +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lld_valbadia_ita_loresmt_l4` is a Italian model originally trained by sfrontull. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lld_valbadia_ita_loresmt_l4_it_5.5.0_3.0_1725740911977.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lld_valbadia_ita_loresmt_l4_it_5.5.0_3.0_1725740911977.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("lld_valbadia_ita_loresmt_l4","it") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("lld_valbadia_ita_loresmt_l4","it")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
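Once the pipeline above has been applied, the translated sentences sit in the `translation` column; note that for meaningful output the input text should be in the model's source language (Ladin Val Badia, going by the model name) rather than the English placeholder used in the example:

```python
# Flatten the "translation" annotations produced by the MarianTransformer stage
pipelineDF.select("translation.result").show(truncate=False)
```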
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lld_valbadia_ita_loresmt_l4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|it| +|Size:|410.4 MB| + +## References + +https://huggingface.co/sfrontull/lld_valbadia-ita-loresmt-L4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-opendispatcher_v3_gpt35turbo_and_gpt4_en.md b/docs/_posts/ahmedlone127/2024-09-07-opendispatcher_v3_gpt35turbo_and_gpt4_en.md new file mode 100644 index 00000000000000..ad9858d37bb570 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-opendispatcher_v3_gpt35turbo_and_gpt4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opendispatcher_v3_gpt35turbo_and_gpt4 DistilBertForSequenceClassification from gaodrew +author: John Snow Labs +name: opendispatcher_v3_gpt35turbo_and_gpt4 +date: 2024-09-07 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opendispatcher_v3_gpt35turbo_and_gpt4` is a English model originally trained by gaodrew. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opendispatcher_v3_gpt35turbo_and_gpt4_en_5.5.0_3.0_1725674771769.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opendispatcher_v3_gpt35turbo_and_gpt4_en_5.5.0_3.0_1725674771769.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("opendispatcher_v3_gpt35turbo_and_gpt4","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("opendispatcher_v3_gpt35turbo_and_gpt4", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opendispatcher_v3_gpt35turbo_and_gpt4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gaodrew/OpenDispatcher_v3_gpt35turbo_and_gpt4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-qa_model_study_1_en.md b/docs/_posts/ahmedlone127/2024-09-07-qa_model_study_1_en.md new file mode 100644 index 00000000000000..8e7d2533f7b1c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-qa_model_study_1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English qa_model_study_1 DistilBertForQuestionAnswering from konstaya +author: John Snow Labs +name: qa_model_study_1 +date: 2024-09-07 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_model_study_1` is a English model originally trained by konstaya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_model_study_1_en_5.5.0_3.0_1725735718656.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_model_study_1_en_5.5.0_3.0_1725735718656.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("qa_model_study_1","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("qa_model_study_1", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
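+The predicted answer span is returned in the `answer` column. A minimal sketch (column names follow the pipeline above):
+
+```python
+# One answer annotation is produced per question/context pair.
+pipelineDF.selectExpr("explode(answer.result) as answer").show(truncate=False)
+```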
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_model_study_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/konstaya/qa_model_study_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-r_fb_sms_lm_en.md b/docs/_posts/ahmedlone127/2024-09-07-r_fb_sms_lm_en.md new file mode 100644 index 00000000000000..38397793a00f06 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-r_fb_sms_lm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English r_fb_sms_lm RoBertaEmbeddings from adnankhawaja +author: John Snow Labs +name: r_fb_sms_lm +date: 2024-09-07 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`r_fb_sms_lm` is a English model originally trained by adnankhawaja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/r_fb_sms_lm_en_5.5.0_3.0_1725678505363.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/r_fb_sms_lm_en_5.5.0_3.0_1725678505363.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("r_fb_sms_lm","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("r_fb_sms_lm","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
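+Each token annotation in the `embeddings` column carries its vector in the `embeddings` field. A minimal sketch for unpacking them (column names follow the pipeline above):
+
+```python
+# One row per token: the token text and its embedding vector.
+pipelineDF.selectExpr("explode(embeddings) as emb") \
+    .selectExpr("emb.result as token", "emb.embeddings as vector") \
+    .show(truncate=False)
+```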
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|r_fb_sms_lm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/adnankhawaja/R_FB_SMS_LM \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-tesakantaibert_en.md b/docs/_posts/ahmedlone127/2024-09-07-tesakantaibert_en.md new file mode 100644 index 00000000000000..4564463db6a764 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-tesakantaibert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tesakantaibert RoBertaEmbeddings from DipanAI +author: John Snow Labs +name: tesakantaibert +date: 2024-09-07 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tesakantaibert` is a English model originally trained by DipanAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tesakantaibert_en_5.5.0_3.0_1725673544619.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tesakantaibert_en_5.5.0_3.0_1725673544619.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("tesakantaibert","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("tesakantaibert","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tesakantaibert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|310.5 MB| + +## References + +https://huggingface.co/DipanAI/TesAKantaiBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-translatear_english_en.md b/docs/_posts/ahmedlone127/2024-09-07-translatear_english_en.md new file mode 100644 index 00000000000000..eb2f64e6104063 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-translatear_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English translatear_english MarianTransformer from shahad-alh +author: John Snow Labs +name: translatear_english +date: 2024-09-07 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`translatear_english` is a English model originally trained by shahad-alh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/translatear_english_en_5.5.0_3.0_1725746516159.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/translatear_english_en_5.5.0_3.0_1725746516159.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("translatear_english","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("translatear_english","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
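+For quick, single-string experiments the fitted pipeline can be wrapped in a `LightPipeline`, which avoids building a DataFrame for every call. A minimal sketch (reusing `pipelineModel` from above):
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+# annotate() returns a dict mapping output columns (e.g. "translation") to lists of strings.
+print(light.annotate("I love spark-nlp"))
+```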
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|translatear_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|527.9 MB| + +## References + +https://huggingface.co/shahad-alh/translateAR_EN \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-xlm_roberta_base_finetuned_panx_all_photonmz_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-07-xlm_roberta_base_finetuned_panx_all_photonmz_pipeline_en.md new file mode 100644 index 00000000000000..1313739e8de208 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-xlm_roberta_base_finetuned_panx_all_photonmz_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_photonmz_pipeline pipeline XlmRoBertaForTokenClassification from photonmz +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_photonmz_pipeline +date: 2024-09-07 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_photonmz_pipeline` is a English model originally trained by photonmz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_photonmz_pipeline_en_5.5.0_3.0_1725743564681.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_photonmz_pipeline_en_5.5.0_3.0_1725743564681.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_photonmz_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_photonmz_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
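+`PretrainedPipeline` can also annotate raw strings directly, which is handy for spot checks. A minimal sketch:
+
+```python
+# Returns a dictionary keyed by the pipeline's output columns (e.g. token, ner).
+print(pipeline.annotate("John Snow Labs is based in Delaware"))
+```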
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_photonmz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.9 MB| + +## References + +https://huggingface.co/photonmz/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-xlm_roberta_base_finetuned_panx_french_jfmatos_isq_en.md b/docs/_posts/ahmedlone127/2024-09-07-xlm_roberta_base_finetuned_panx_french_jfmatos_isq_en.md new file mode 100644 index 00000000000000..656b26db801b15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-xlm_roberta_base_finetuned_panx_french_jfmatos_isq_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_jfmatos_isq XlmRoBertaForTokenClassification from jfmatos-isq +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_jfmatos_isq +date: 2024-09-07 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_jfmatos_isq` is a English model originally trained by jfmatos-isq. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jfmatos_isq_en_5.5.0_3.0_1725704493586.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jfmatos_isq_en_5.5.0_3.0_1725704493586.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_jfmatos_isq","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_jfmatos_isq", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
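+Token and tag arrays in the output are aligned by position, so they can be read side by side. A minimal sketch (column names follow the pipeline above):
+
+```python
+# token.result and ner.result hold the tokens and their predicted tags in the same order.
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```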
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_jfmatos_isq| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/jfmatos-isq/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-xlm_roberta_base_finetuned_panx_german_gonalb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-07-xlm_roberta_base_finetuned_panx_german_gonalb_pipeline_en.md new file mode 100644 index 00000000000000..87afd5f4a17caf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-xlm_roberta_base_finetuned_panx_german_gonalb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_gonalb_pipeline pipeline XlmRoBertaForTokenClassification from Gonalb +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_gonalb_pipeline +date: 2024-09-07 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_gonalb_pipeline` is a English model originally trained by Gonalb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_gonalb_pipeline_en_5.5.0_3.0_1725705315767.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_gonalb_pipeline_en_5.5.0_3.0_1725705315767.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_gonalb_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_gonalb_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_gonalb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Gonalb/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-xlm_roberta_base_finetuned_panx_german_monkdalma_en.md b/docs/_posts/ahmedlone127/2024-09-07-xlm_roberta_base_finetuned_panx_german_monkdalma_en.md new file mode 100644 index 00000000000000..6510d6122128fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-xlm_roberta_base_finetuned_panx_german_monkdalma_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_monkdalma XlmRoBertaForTokenClassification from MonkDalma +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_monkdalma +date: 2024-09-07 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_monkdalma` is a English model originally trained by MonkDalma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_monkdalma_en_5.5.0_3.0_1725704223811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_monkdalma_en_5.5.0_3.0_1725704223811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_monkdalma","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_monkdalma", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_monkdalma| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/MonkDalma/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-abhi11_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-abhi11_model_pipeline_en.md new file mode 100644 index 00000000000000..d40e45bd2ac5a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-abhi11_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English abhi11_model_pipeline pipeline CamemBertEmbeddings from ABHIiiii1 +author: John Snow Labs +name: abhi11_model_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CamemBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`abhi11_model_pipeline` is a English model originally trained by ABHIiiii1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/abhi11_model_pipeline_en_5.5.0_3.0_1725786584186.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/abhi11_model_pipeline_en_5.5.0_3.0_1725786584186.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("abhi11_model_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("abhi11_model_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|abhi11_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|264.0 MB| + +## References + +https://huggingface.co/ABHIiiii1/abhi11-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- CamemBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-all_mpnet_base_v2_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-08-all_mpnet_base_v2_finetuned_en.md new file mode 100644 index 00000000000000..5ef7eb93b51980 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-all_mpnet_base_v2_finetuned_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English all_mpnet_base_v2_finetuned MPNetEmbeddings from DashReza7 +author: John Snow Labs +name: all_mpnet_base_v2_finetuned +date: 2024-09-08 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_mpnet_base_v2_finetuned` is a English model originally trained by DashReza7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_finetuned_en_5.5.0_3.0_1725816704396.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_finetuned_en_5.5.0_3.0_1725816704396.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("all_mpnet_base_v2_finetuned","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("all_mpnet_base_v2_finetuned","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
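+To feed these sentence embeddings into downstream Spark ML stages, an `EmbeddingsFinisher` can be appended to convert the annotation structs into plain vectors. A minimal sketch (stage names follow the pipeline above):
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+# Append the finisher as a final stage to obtain Spark ML vectors.
+pipeline = Pipeline().setStages([documentAssembler, embeddings, finisher])
+```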
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_mpnet_base_v2_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/DashReza7/all-mpnet-base-v2_FINETUNED \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-all_mpnet_base_v2_margin_5_epoch_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-all_mpnet_base_v2_margin_5_epoch_1_pipeline_en.md new file mode 100644 index 00000000000000..1284949f980884 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-all_mpnet_base_v2_margin_5_epoch_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English all_mpnet_base_v2_margin_5_epoch_1_pipeline pipeline MPNetEmbeddings from luiz-and-robert-thesis +author: John Snow Labs +name: all_mpnet_base_v2_margin_5_epoch_1_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_mpnet_base_v2_margin_5_epoch_1_pipeline` is a English model originally trained by luiz-and-robert-thesis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_margin_5_epoch_1_pipeline_en_5.5.0_3.0_1725815956427.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_margin_5_epoch_1_pipeline_en_5.5.0_3.0_1725815956427.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("all_mpnet_base_v2_margin_5_epoch_1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("all_mpnet_base_v2_margin_5_epoch_1_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_mpnet_base_v2_margin_5_epoch_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/luiz-and-robert-thesis/all-mpnet-base-v2-margin-5-epoch-1 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-analisis_sentimientos_beto_tass_c_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-analisis_sentimientos_beto_tass_c_pipeline_en.md new file mode 100644 index 00000000000000..bfd16cf164ff30 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-analisis_sentimientos_beto_tass_c_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English analisis_sentimientos_beto_tass_c_pipeline pipeline BertForSequenceClassification from raulgdp +author: John Snow Labs +name: analisis_sentimientos_beto_tass_c_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`analisis_sentimientos_beto_tass_c_pipeline` is a English model originally trained by raulgdp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/analisis_sentimientos_beto_tass_c_pipeline_en_5.5.0_3.0_1725767889756.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/analisis_sentimientos_beto_tass_c_pipeline_en_5.5.0_3.0_1725767889756.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("analisis_sentimientos_beto_tass_c_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("analisis_sentimientos_beto_tass_c_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|analisis_sentimientos_beto_tass_c_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.7 MB| + +## References + +https://huggingface.co/raulgdp/Analisis-sentimientos-BETO-TASS-C + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-arabert_mini_algerian_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-08-arabert_mini_algerian_pipeline_ar.md new file mode 100644 index 00000000000000..3d8cea5411c327 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-arabert_mini_algerian_pipeline_ar.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Arabic arabert_mini_algerian_pipeline pipeline BertForSequenceClassification from Abdou +author: John Snow Labs +name: arabert_mini_algerian_pipeline +date: 2024-09-08 +tags: [ar, open_source, pipeline, onnx] +task: Text Classification +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`arabert_mini_algerian_pipeline` is a Arabic model originally trained by Abdou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arabert_mini_algerian_pipeline_ar_5.5.0_3.0_1725839247964.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arabert_mini_algerian_pipeline_ar_5.5.0_3.0_1725839247964.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("arabert_mini_algerian_pipeline", lang = "ar")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("arabert_mini_algerian_pipeline", lang = "ar")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arabert_mini_algerian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|43.6 MB| + +## References + +https://huggingface.co/Abdou/arabert-mini-algerian + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-bcms_bertic_parlasent_bcs_ter_hr.md b/docs/_posts/ahmedlone127/2024-09-08-bcms_bertic_parlasent_bcs_ter_hr.md new file mode 100644 index 00000000000000..5908919e28d9b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-bcms_bertic_parlasent_bcs_ter_hr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Croatian bcms_bertic_parlasent_bcs_ter BertForSequenceClassification from classla +author: John Snow Labs +name: bcms_bertic_parlasent_bcs_ter +date: 2024-09-08 +tags: [hr, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: hr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bcms_bertic_parlasent_bcs_ter` is a Croatian model originally trained by classla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bcms_bertic_parlasent_bcs_ter_hr_5.5.0_3.0_1725826054539.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bcms_bertic_parlasent_bcs_ter_hr_5.5.0_3.0_1725826054539.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bcms_bertic_parlasent_bcs_ter","hr") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bcms_bertic_parlasent_bcs_ter", "hr")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
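+Per-label confidence scores are typically carried in each prediction's metadata. A minimal sketch (column names follow the pipeline above):
+
+```python
+from pyspark.sql import functions as F
+
+# result holds the predicted label; metadata usually maps each label to its score.
+pipelineDF.select(F.explode("class").alias("prediction")) \
+    .select("prediction.result", "prediction.metadata") \
+    .show(truncate=False)
+```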
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bcms_bertic_parlasent_bcs_ter| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|hr| +|Size:|414.9 MB| + +## References + +https://huggingface.co/classla/bcms-bertic-parlasent-bcs-ter \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-bert_large_uncased_mrpc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-bert_large_uncased_mrpc_pipeline_en.md new file mode 100644 index 00000000000000..c0c21df4cf178b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-bert_large_uncased_mrpc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_large_uncased_mrpc_pipeline pipeline BertForSequenceClassification from yoshitomo-matsubara +author: John Snow Labs +name: bert_large_uncased_mrpc_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_mrpc_pipeline` is a English model originally trained by yoshitomo-matsubara. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_mrpc_pipeline_en_5.5.0_3.0_1725801980637.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_mrpc_pipeline_en_5.5.0_3.0_1725801980637.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("bert_large_uncased_mrpc_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("bert_large_uncased_mrpc_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_mrpc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mrpc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_qa_model_alinadevkota_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_qa_model_alinadevkota_pipeline_en.md new file mode 100644 index 00000000000000..92e6945dec5413 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_qa_model_alinadevkota_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_alinadevkota_pipeline pipeline DistilBertForQuestionAnswering from alinadevkota +author: John Snow Labs +name: burmese_awesome_qa_model_alinadevkota_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_alinadevkota_pipeline` is a English model originally trained by alinadevkota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_alinadevkota_pipeline_en_5.5.0_3.0_1725798313953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_alinadevkota_pipeline_en_5.5.0_3.0_1725798313953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a DataFrame with the question and context text columns expected by the
+# pipeline's MultiDocumentAssembler stage.
+pipeline = PretrainedPipeline("burmese_awesome_qa_model_alinadevkota_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a DataFrame with the question and context text columns expected by the
+// pipeline's MultiDocumentAssembler stage.
+val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_alinadevkota_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_alinadevkota_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/alinadevkota/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_wnut_jaoo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_wnut_jaoo_pipeline_en.md new file mode 100644 index 00000000000000..36e45090684cc3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_wnut_jaoo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_wnut_jaoo_pipeline pipeline DistilBertForTokenClassification from gonzalezrostani +author: John Snow Labs +name: burmese_awesome_wnut_jaoo_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_jaoo_pipeline` is a English model originally trained by gonzalezrostani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_jaoo_pipeline_en_5.5.0_3.0_1725837720041.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_jaoo_pipeline_en_5.5.0_3.0_1725837720041.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("burmese_awesome_wnut_jaoo_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("burmese_awesome_wnut_jaoo_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_jaoo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/gonzalezrostani/my_awesome_wnut_JAOo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-cls_model_en.md b/docs/_posts/ahmedlone127/2024-09-08-cls_model_en.md new file mode 100644 index 00000000000000..131648c331574a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-cls_model_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English cls_model MPNetEmbeddings from maneprajakta +author: John Snow Labs +name: cls_model +date: 2024-09-08 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cls_model` is a English model originally trained by maneprajakta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cls_model_en_5.5.0_3.0_1725816841868.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cls_model_en_5.5.0_3.0_1725816841868.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("cls_model","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("cls_model","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cls_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/maneprajakta/cls_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-customersentiment_en.md b/docs/_posts/ahmedlone127/2024-09-08-customersentiment_en.md new file mode 100644 index 00000000000000..3a659df843adba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-customersentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English customersentiment DistilBertForSequenceClassification from kearney +author: John Snow Labs +name: customersentiment +date: 2024-09-08 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`customersentiment` is a English model originally trained by kearney. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/customersentiment_en_5.5.0_3.0_1725777151848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/customersentiment_en_5.5.0_3.0_1725777151848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("customersentiment","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("customersentiment", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
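+A fitted pipeline can be persisted once and reloaded for serving, so the pretrained weights are not fetched again on every run. A minimal sketch (the path is illustrative):
+
+```python
+from pyspark.ml import PipelineModel
+
+# Save the fitted pipeline and load it back for inference.
+pipelineModel.write().overwrite().save("/tmp/customersentiment_pipeline")
+restored = PipelineModel.load("/tmp/customersentiment_pipeline")
+restored.transform(data).select("class.result").show(truncate=False)
+```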
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|customersentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/kearney/customersentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-deberta_v3_xsmall_survey_nepal_bhasa_fact_main_passage_rater_en.md b/docs/_posts/ahmedlone127/2024-09-08-deberta_v3_xsmall_survey_nepal_bhasa_fact_main_passage_rater_en.md new file mode 100644 index 00000000000000..c8be7c69cd8c6b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-deberta_v3_xsmall_survey_nepal_bhasa_fact_main_passage_rater_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deberta_v3_xsmall_survey_nepal_bhasa_fact_main_passage_rater DeBertaForSequenceClassification from domenicrosati +author: John Snow Labs +name: deberta_v3_xsmall_survey_nepal_bhasa_fact_main_passage_rater +date: 2024-09-08 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_xsmall_survey_nepal_bhasa_fact_main_passage_rater` is a English model originally trained by domenicrosati. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_xsmall_survey_nepal_bhasa_fact_main_passage_rater_en_5.5.0_3.0_1725804348690.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_xsmall_survey_nepal_bhasa_fact_main_passage_rater_en_5.5.0_3.0_1725804348690.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_xsmall_survey_nepal_bhasa_fact_main_passage_rater","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_xsmall_survey_nepal_bhasa_fact_main_passage_rater", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
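+The label set this checkpoint predicts can be inspected on the loaded annotator. A minimal sketch (reusing `sequenceClassifier` from above; `getClasses` is assumed to be available as on other Spark NLP sequence classifiers):
+
+```python
+# Prints the class labels the model was fine-tuned with.
+print(sequenceClassifier.getClasses())
+```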
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_xsmall_survey_nepal_bhasa_fact_main_passage_rater| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|224.9 MB| + +## References + +https://huggingface.co/domenicrosati/deberta-v3-xsmall-survey-new_fact_main_passage-rater \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_cola_lmajer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_cola_lmajer_pipeline_en.md new file mode 100644 index 00000000000000..328356fc0fa3da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_cola_lmajer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_lmajer_pipeline pipeline DistilBertForSequenceClassification from lmajer +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_lmajer_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_lmajer_pipeline` is a English model originally trained by lmajer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_lmajer_pipeline_en_5.5.0_3.0_1725774761491.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_lmajer_pipeline_en_5.5.0_3.0_1725774761491.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_lmajer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_lmajer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
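+
+The snippet above assumes an input DataFrame `df` already exists. A minimal sketch of preparing one and reading the predictions back — the `text` input column name and the example sentence are assumptions, not taken from this card, so adjust them to match the DocumentAssembler inside the downloaded pipeline:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_lmajer_pipeline", lang = "en")
+
+# Build a one-row DataFrame; the "text" column name is an assumption.
+df = spark.createDataFrame([["This sentence is grammatically acceptable."]]).toDF("text")
+pipeline.transform(df).show(truncate=False)
+
+# For quick single-string checks, annotate() avoids building a DataFrame at all.
+print(pipeline.annotate("This sentence is grammatically acceptable."))
+```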
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_lmajer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lmajer/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_ner_heclope_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_ner_heclope_pipeline_en.md new file mode 100644 index 00000000000000..e192321da859cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_ner_heclope_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_ner_heclope_pipeline pipeline DistilBertForTokenClassification from heclope +author: John Snow Labs +name: distilbert_base_uncased_finetuned_ner_heclope_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_ner_heclope_pipeline` is a English model originally trained by heclope. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_ner_heclope_pipeline_en_5.5.0_3.0_1725788759115.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_ner_heclope_pipeline_en_5.5.0_3.0_1725788759115.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_ner_heclope_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_ner_heclope_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
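+
+For ad-hoc checks without a DataFrame, the downloaded pipeline can also be driven through `fullAnnotate`, which keeps begin/end offsets and metadata for each token-level prediction. A minimal sketch — the exact output keys depend on the stages packaged in this pipeline, so treat them as assumptions:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_ner_heclope_pipeline", lang = "en")
+result = pipeline.fullAnnotate("John Snow Labs is based in Delaware.")
+
+# Inspect which annotation columns the pipeline produced before relying on any of them.
+print(result[0].keys())
+```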
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_ner_heclope_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/heclope/distilbert-base-uncased-finetuned-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_squad_hfyutojp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_squad_hfyutojp_pipeline_en.md new file mode 100644 index 00000000000000..7668666e49cb0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_squad_hfyutojp_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_hfyutojp_pipeline pipeline DistilBertForQuestionAnswering from hfyutojp +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_hfyutojp_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_hfyutojp_pipeline` is a English model originally trained by hfyutojp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_hfyutojp_pipeline_en_5.5.0_3.0_1725798140735.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_hfyutojp_pipeline_en_5.5.0_3.0_1725798140735.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_hfyutojp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_hfyutojp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_hfyutojp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/hfyutojp/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_squad_patrikrac_en.md b/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_squad_patrikrac_en.md new file mode 100644 index 00000000000000..aec6f26431f79b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_squad_patrikrac_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_patrikrac DistilBertForQuestionAnswering from patrikrac +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_patrikrac +date: 2024-09-08 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_patrikrac` is a English model originally trained by patrikrac. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_patrikrac_en_5.5.0_3.0_1725798077063.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_patrikrac_en_5.5.0_3.0_1725798077063.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_patrikrac","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols("question", "context")
+    .setOutputCols("document_question", "document_context")
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_patrikrac", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
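+
+After the transform, each extracted answer span is available in the `answer` column defined above, for example:
+
+```python
+# One answer string per question/context pair.
+pipelineDF.select("answer.result").show(truncate=False)
+```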
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_patrikrac| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/patrikrac/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_squad_taeseon_en.md b/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_squad_taeseon_en.md new file mode 100644 index 00000000000000..f3e6d2a2940cd6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_squad_taeseon_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_taeseon DistilBertForQuestionAnswering from taeseon +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_taeseon +date: 2024-09-08 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_taeseon` is a English model originally trained by taeseon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_taeseon_en_5.5.0_3.0_1725818616750.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_taeseon_en_5.5.0_3.0_1725818616750.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_taeseon","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols("question", "context")
+    .setOutputCols("document_question", "document_context")
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_taeseon", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_taeseon| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/taeseon/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-dummy_model_botmync_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-dummy_model_botmync_pipeline_en.md new file mode 100644 index 00000000000000..b84c26c8c3c45d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-dummy_model_botmync_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dummy_model_botmync_pipeline pipeline CamemBertEmbeddings from botMync +author: John Snow Labs +name: dummy_model_botmync_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CamemBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dummy_model_botmync_pipeline` is a English model originally trained by botMync. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dummy_model_botmync_pipeline_en_5.5.0_3.0_1725835822437.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dummy_model_botmync_pipeline_en_5.5.0_3.0_1725835822437.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dummy_model_botmync_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dummy_model_botmync_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dummy_model_botmync_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|264.0 MB| + +## References + +https://huggingface.co/botMync/dummy-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- CamemBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-dummy_model_cwtmyd_en.md b/docs/_posts/ahmedlone127/2024-09-08-dummy_model_cwtmyd_en.md new file mode 100644 index 00000000000000..91a324c2871c65 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-dummy_model_cwtmyd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dummy_model_cwtmyd CamemBertEmbeddings from cwtmyd +author: John Snow Labs +name: dummy_model_cwtmyd +date: 2024-09-08 +tags: [en, open_source, onnx, embeddings, camembert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: CamemBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CamemBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dummy_model_cwtmyd` is a English model originally trained by cwtmyd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dummy_model_cwtmyd_en_5.5.0_3.0_1725786907121.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dummy_model_cwtmyd_en_5.5.0_3.0_1725786907121.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = CamemBertEmbeddings.pretrained("dummy_model_cwtmyd","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = CamemBertEmbeddings.pretrained("dummy_model_cwtmyd","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
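+
+The `embeddings` column produced above holds one annotation per token, each carrying its vector. A minimal sketch for pulling the raw vectors out with plain Spark SQL (an `EmbeddingsFinisher` stage would be the more idiomatic alternative):
+
+```python
+# One row per token embedding; embeddings.embeddings is the array of float vectors.
+pipelineDF.selectExpr("explode(embeddings.embeddings) as token_embedding").show(5, truncate=False)
+```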
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dummy_model_cwtmyd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[camembert]| +|Language:|en| +|Size:|264.0 MB| + +## References + +https://huggingface.co/cwtmyd/dummy-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-dummy_model_jaese_en.md b/docs/_posts/ahmedlone127/2024-09-08-dummy_model_jaese_en.md new file mode 100644 index 00000000000000..6ba4b8eda2f3e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-dummy_model_jaese_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dummy_model_jaese CamemBertEmbeddings from jaese +author: John Snow Labs +name: dummy_model_jaese +date: 2024-09-08 +tags: [en, open_source, onnx, embeddings, camembert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: CamemBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CamemBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dummy_model_jaese` is a English model originally trained by jaese. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dummy_model_jaese_en_5.5.0_3.0_1725836471508.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dummy_model_jaese_en_5.5.0_3.0_1725836471508.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = CamemBertEmbeddings.pretrained("dummy_model_jaese","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = CamemBertEmbeddings.pretrained("dummy_model_jaese","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dummy_model_jaese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[camembert]| +|Language:|en| +|Size:|264.0 MB| + +## References + +https://huggingface.co/jaese/dummy-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-dummy_model_katster_en.md b/docs/_posts/ahmedlone127/2024-09-08-dummy_model_katster_en.md new file mode 100644 index 00000000000000..a72aa038df929b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-dummy_model_katster_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dummy_model_katster CamemBertEmbeddings from Katster +author: John Snow Labs +name: dummy_model_katster +date: 2024-09-08 +tags: [en, open_source, onnx, embeddings, camembert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: CamemBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CamemBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dummy_model_katster` is a English model originally trained by Katster. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dummy_model_katster_en_5.5.0_3.0_1725787089096.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dummy_model_katster_en_5.5.0_3.0_1725787089096.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = CamemBertEmbeddings.pretrained("dummy_model_katster","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = CamemBertEmbeddings.pretrained("dummy_model_katster","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dummy_model_katster| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[camembert]| +|Language:|en| +|Size:|264.0 MB| + +## References + +https://huggingface.co/Katster/dummy-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-english_vietnamese_maltese_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-english_vietnamese_maltese_model_pipeline_en.md new file mode 100644 index 00000000000000..8862e29a444b11 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-english_vietnamese_maltese_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English english_vietnamese_maltese_model_pipeline pipeline MarianTransformer from haotieu +author: John Snow Labs +name: english_vietnamese_maltese_model_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`english_vietnamese_maltese_model_pipeline` is a English model originally trained by haotieu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/english_vietnamese_maltese_model_pipeline_en_5.5.0_3.0_1725795287535.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/english_vietnamese_maltese_model_pipeline_en_5.5.0_3.0_1725795287535.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("english_vietnamese_maltese_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("english_vietnamese_maltese_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|english_vietnamese_maltese_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|475.1 MB| + +## References + +https://huggingface.co/haotieu/en-vi-mt-model + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-final_model1_en.md b/docs/_posts/ahmedlone127/2024-09-08-final_model1_en.md new file mode 100644 index 00000000000000..8b4975752fd0a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-final_model1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English final_model1 DistilBertForSequenceClassification from sachit56 +author: John Snow Labs +name: final_model1 +date: 2024-09-08 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_model1` is a English model originally trained by sachit56. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_model1_en_5.5.0_3.0_1725774746342.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_model1_en_5.5.0_3.0_1725774746342.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("final_model1","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("final_model1", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
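+
+Before wiring the classifier into a larger pipeline, it can help to check which labels the fine-tuned head predicts. A hedged sketch — `getClasses()` is available on recent Spark NLP classifier annotators, and the label names themselves come from the underlying Hugging Face model rather than from this card:
+
+```python
+# List the labels this sequence-classification head was exported with.
+print(sequenceClassifier.getClasses())
+```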
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_model1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sachit56/final_model1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-helsinki_english_spanish_fine_tune_opus100_en.md b/docs/_posts/ahmedlone127/2024-09-08-helsinki_english_spanish_fine_tune_opus100_en.md new file mode 100644 index 00000000000000..608482d1800886 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-helsinki_english_spanish_fine_tune_opus100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English helsinki_english_spanish_fine_tune_opus100 MarianTransformer from beanslmao +author: John Snow Labs +name: helsinki_english_spanish_fine_tune_opus100 +date: 2024-09-08 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`helsinki_english_spanish_fine_tune_opus100` is a English model originally trained by beanslmao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/helsinki_english_spanish_fine_tune_opus100_en_5.5.0_3.0_1725839842494.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/helsinki_english_spanish_fine_tune_opus100_en_5.5.0_3.0_1725839842494.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("helsinki_english_spanish_fine_tune_opus100","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("helsinki_english_spanish_fine_tune_opus100","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
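+
+With the columns wired as above, the translated text lands in the `translation` column and can be read back directly:
+
+```python
+# One translated string per detected sentence.
+pipelineDF.select("translation.result").show(truncate=False)
+```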
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|helsinki_english_spanish_fine_tune_opus100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|539.9 MB| + +## References + +https://huggingface.co/beanslmao/helsinki-en-es-fine-tune-opus100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-horai_medium_10k_v7_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-horai_medium_10k_v7_pipeline_en.md new file mode 100644 index 00000000000000..9226cda4e9b033 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-horai_medium_10k_v7_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English horai_medium_10k_v7_pipeline pipeline RoBertaForSequenceClassification from stealthwriter +author: John Snow Labs +name: horai_medium_10k_v7_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`horai_medium_10k_v7_pipeline` is a English model originally trained by stealthwriter. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/horai_medium_10k_v7_pipeline_en_5.5.0_3.0_1725830485657.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/horai_medium_10k_v7_pipeline_en_5.5.0_3.0_1725830485657.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("horai_medium_10k_v7_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("horai_medium_10k_v7_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|horai_medium_10k_v7_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|439.0 MB| + +## References + +https://huggingface.co/stealthwriter/HorAI-medium-10k-V7 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-lab1_finetuning_twobjohn_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-lab1_finetuning_twobjohn_pipeline_en.md new file mode 100644 index 00000000000000..1d8c4018ea60cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-lab1_finetuning_twobjohn_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lab1_finetuning_twobjohn_pipeline pipeline MarianTransformer from TwoBJohn +author: John Snow Labs +name: lab1_finetuning_twobjohn_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab1_finetuning_twobjohn_pipeline` is a English model originally trained by TwoBJohn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab1_finetuning_twobjohn_pipeline_en_5.5.0_3.0_1725824337759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab1_finetuning_twobjohn_pipeline_en_5.5.0_3.0_1725824337759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lab1_finetuning_twobjohn_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lab1_finetuning_twobjohn_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab1_finetuning_twobjohn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.8 MB| + +## References + +https://huggingface.co/TwoBJohn/lab1_finetuning + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-marian_finetuned_combined_dataset_substituted_1_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-marian_finetuned_combined_dataset_substituted_1_1_pipeline_en.md new file mode 100644 index 00000000000000..aff2502d875e98 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-marian_finetuned_combined_dataset_substituted_1_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_combined_dataset_substituted_1_1_pipeline pipeline MarianTransformer from kalcho100 +author: John Snow Labs +name: marian_finetuned_combined_dataset_substituted_1_1_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_combined_dataset_substituted_1_1_pipeline` is a English model originally trained by kalcho100. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_combined_dataset_substituted_1_1_pipeline_en_5.5.0_3.0_1725832421666.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_combined_dataset_substituted_1_1_pipeline_en_5.5.0_3.0_1725832421666.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_combined_dataset_substituted_1_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_combined_dataset_substituted_1_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_combined_dataset_substituted_1_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|548.3 MB| + +## References + +https://huggingface.co/kalcho100/Marian-finetuned_combined_dataset_substituted_1_1 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-marian_finetuned_kde4_english_tonga_tonga_islands_french_bill1888_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-marian_finetuned_kde4_english_tonga_tonga_islands_french_bill1888_pipeline_en.md new file mode 100644 index 00000000000000..d5f0a56c0fcf41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-marian_finetuned_kde4_english_tonga_tonga_islands_french_bill1888_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_bill1888_pipeline pipeline MarianTransformer from bill1888 +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_bill1888_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_bill1888_pipeline` is a English model originally trained by bill1888. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_bill1888_pipeline_en_5.5.0_3.0_1725766387220.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_bill1888_pipeline_en_5.5.0_3.0_1725766387220.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_bill1888_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_bill1888_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_bill1888_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|510.3 MB| + +## References + +https://huggingface.co/bill1888/marian-finetuned-kde4-en-to-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-mdeberta_v3_base_cantemist_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-08-mdeberta_v3_base_cantemist_pipeline_es.md new file mode 100644 index 00000000000000..362ebaeb9cd70a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-mdeberta_v3_base_cantemist_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish mdeberta_v3_base_cantemist_pipeline pipeline DeBertaForSequenceClassification from IIC +author: John Snow Labs +name: mdeberta_v3_base_cantemist_pipeline +date: 2024-09-08 +tags: [es, open_source, pipeline, onnx] +task: Text Classification +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mdeberta_v3_base_cantemist_pipeline` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mdeberta_v3_base_cantemist_pipeline_es_5.5.0_3.0_1725811779468.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mdeberta_v3_base_cantemist_pipeline_es_5.5.0_3.0_1725811779468.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mdeberta_v3_base_cantemist_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mdeberta_v3_base_cantemist_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
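+
+Since the name points to the Spanish CANTEMIST corpus and the pipeline is registered under `lang = "es"`, the quickest sanity check is a Spanish sentence. A minimal sketch — the `text` input column name and the example sentence are assumptions, not taken from this card:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("mdeberta_v3_base_cantemist_pipeline", lang = "es")
+
+# The "text" column name is an assumption; match it to the pipeline's DocumentAssembler.
+df = spark.createDataFrame([["El paciente presenta un carcinoma de pulmón."]]).toDF("text")
+pipeline.transform(df).show(truncate=False)
+```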
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mdeberta_v3_base_cantemist_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|799.8 MB| + +## References + +https://huggingface.co/IIC/mdeberta-v3-base-cantemist + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-mpnet_base_all_nli_triplet_korruz_en.md b/docs/_posts/ahmedlone127/2024-09-08-mpnet_base_all_nli_triplet_korruz_en.md new file mode 100644 index 00000000000000..b6ae5c8f138dd4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-mpnet_base_all_nli_triplet_korruz_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English mpnet_base_all_nli_triplet_korruz MPNetEmbeddings from korruz +author: John Snow Labs +name: mpnet_base_all_nli_triplet_korruz +date: 2024-09-08 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mpnet_base_all_nli_triplet_korruz` is a English model originally trained by korruz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mpnet_base_all_nli_triplet_korruz_en_5.5.0_3.0_1725816094251.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mpnet_base_all_nli_triplet_korruz_en_5.5.0_3.0_1725816094251.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("mpnet_base_all_nli_triplet_korruz","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("mpnet_base_all_nli_triplet_korruz","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
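+
+Here MPNet emits one sentence-level embedding per input document in the `embeddings` column; a short sketch for retrieving the vectors:
+
+```python
+# One embedding per document; each vector has the model's hidden size.
+pipelineDF.selectExpr("explode(embeddings.embeddings) as sentence_embedding").show(truncate=80)
+```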
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mpnet_base_all_nli_triplet_korruz| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|390.1 MB| + +## References + +https://huggingface.co/korruz/mpnet-base-all-nli-triplet \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-multilingual_xlm_roberta_for_ner_c4n11_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-08-multilingual_xlm_roberta_for_ner_c4n11_pipeline_xx.md new file mode 100644 index 00000000000000..dab5043c138f75 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-multilingual_xlm_roberta_for_ner_c4n11_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual multilingual_xlm_roberta_for_ner_c4n11_pipeline pipeline XlmRoBertaForTokenClassification from c4n11 +author: John Snow Labs +name: multilingual_xlm_roberta_for_ner_c4n11_pipeline +date: 2024-09-08 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multilingual_xlm_roberta_for_ner_c4n11_pipeline` is a Multilingual model originally trained by c4n11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multilingual_xlm_roberta_for_ner_c4n11_pipeline_xx_5.5.0_3.0_1725773335687.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multilingual_xlm_roberta_for_ner_c4n11_pipeline_xx_5.5.0_3.0_1725773335687.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("multilingual_xlm_roberta_for_ner_c4n11_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("multilingual_xlm_roberta_for_ner_c4n11_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
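+
+Because this pipeline is published under the multilingual `xx` tag, the same call can be used across languages. A minimal, hedged sketch using `annotate()` — the output keys depend on the stages packaged in the pipeline, and the example sentences are assumptions:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("multilingual_xlm_roberta_for_ner_c4n11_pipeline", lang = "xx")
+
+# annotate() accepts a single string or a list of strings.
+print(pipeline.annotate(["John lives in New York.", "Angela Merkel besuchte Paris."]))
+```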
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multilingual_xlm_roberta_for_ner_c4n11_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|839.7 MB| + +## References + +https://huggingface.co/c4n11/multilingual-xlm-roberta-for-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-opus_maltese_english_french_wmt14_english_french_1million_20epochs_v2_nan.md b/docs/_posts/ahmedlone127/2024-09-08-opus_maltese_english_french_wmt14_english_french_1million_20epochs_v2_nan.md new file mode 100644 index 00000000000000..1019bf516b8880 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-opus_maltese_english_french_wmt14_english_french_1million_20epochs_v2_nan.md @@ -0,0 +1,94 @@ +--- +layout: model +title: None opus_maltese_english_french_wmt14_english_french_1million_20epochs_v2 MarianTransformer from sriram-sanjeev9s +author: John Snow Labs +name: opus_maltese_english_french_wmt14_english_french_1million_20epochs_v2 +date: 2024-09-08 +tags: [nan, open_source, onnx, translation, marian] +task: Translation +language: nan +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_french_wmt14_english_french_1million_20epochs_v2` is a None model originally trained by sriram-sanjeev9s. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_french_wmt14_english_french_1million_20epochs_v2_nan_5.5.0_3.0_1725765323104.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_french_wmt14_english_french_1million_20epochs_v2_nan_5.5.0_3.0_1725765323104.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_french_wmt14_english_french_1million_20epochs_v2","nan") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_french_wmt14_english_french_1million_20epochs_v2","nan")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_french_wmt14_english_french_1million_20epochs_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|nan| +|Size:|502.0 MB| + +## References + +https://huggingface.co/sriram-sanjeev9s/opus-mt-en-fr_wmt14_En_Fr_1million_20epochs_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_pbb_v2_en.md b/docs/_posts/ahmedlone127/2024-09-08-opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_pbb_v2_en.md new file mode 100644 index 00000000000000..ea3eac44a045b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_pbb_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_pbb_v2 MarianTransformer from mekjr1 +author: John Snow Labs +name: opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_pbb_v2 +date: 2024-09-08 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_pbb_v2` is a English model originally trained by mekjr1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_pbb_v2_en_5.5.0_3.0_1725824135212.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_pbb_v2_en_5.5.0_3.0_1725824135212.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_pbb_v2","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_pbb_v2","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_pbb_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|539.9 MB| + +## References + +https://huggingface.co/mekjr1/opus-mt-en-es-finetuned-es-to-pbb-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-q2d_333_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-q2d_333_pipeline_en.md new file mode 100644 index 00000000000000..a44e5cafd7ce50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-q2d_333_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English q2d_333_pipeline pipeline MPNetEmbeddings from ingeol +author: John Snow Labs +name: q2d_333_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`q2d_333_pipeline` is a English model originally trained by ingeol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/q2d_333_pipeline_en_5.5.0_3.0_1725815286090.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/q2d_333_pipeline_en_5.5.0_3.0_1725815286090.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("q2d_333_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("q2d_333_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
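+
+The snippet above assumes an active Spark NLP session and an input DataFrame `df` with a `text` column. A minimal, self-contained sketch of that setup (the column name and session defaults are assumptions, not part of the original example):
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+# Assumption: no Spark session is running yet; start one with Spark NLP defaults.
+spark = sparknlp.start()
+
+# The pipeline's DocumentAssembler reads from a "text" column.
+df = spark.createDataFrame([["I love spark-nlp"]], ["text"])
+
+pipeline = PretrainedPipeline("q2d_333_pipeline", lang="en")
+annotations = pipeline.transform(df)
+annotations.printSchema()  # inspect the annotation columns added by the pipeline
+```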
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|q2d_333_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/ingeol/q2d_333 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-q2d_ep3_35_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-q2d_ep3_35_pipeline_en.md new file mode 100644 index 00000000000000..7549bac7dc8820 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-q2d_ep3_35_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English q2d_ep3_35_pipeline pipeline MPNetEmbeddings from ingeol +author: John Snow Labs +name: q2d_ep3_35_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`q2d_ep3_35_pipeline` is a English model originally trained by ingeol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/q2d_ep3_35_pipeline_en_5.5.0_3.0_1725769074834.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/q2d_ep3_35_pipeline_en_5.5.0_3.0_1725769074834.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("q2d_ep3_35_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("q2d_ep3_35_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|q2d_ep3_35_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/ingeol/q2d_ep3_35 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-qa_model_vasanth_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-qa_model_vasanth_pipeline_en.md new file mode 100644 index 00000000000000..e6d0cec2b75a23 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-qa_model_vasanth_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English qa_model_vasanth_pipeline pipeline DistilBertForQuestionAnswering from Vasanth +author: John Snow Labs +name: qa_model_vasanth_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_model_vasanth_pipeline` is a English model originally trained by Vasanth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_model_vasanth_pipeline_en_5.5.0_3.0_1725823658879.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_model_vasanth_pipeline_en_5.5.0_3.0_1725823658879.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("qa_model_vasanth_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("qa_model_vasanth_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_model_vasanth_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/Vasanth/qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-qqp_microsoft_deberta_v3_base_seed_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-qqp_microsoft_deberta_v3_base_seed_3_pipeline_en.md new file mode 100644 index 00000000000000..c61dcdfdcb243a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-qqp_microsoft_deberta_v3_base_seed_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English qqp_microsoft_deberta_v3_base_seed_3_pipeline pipeline DeBertaForSequenceClassification from utahnlp +author: John Snow Labs +name: qqp_microsoft_deberta_v3_base_seed_3_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qqp_microsoft_deberta_v3_base_seed_3_pipeline` is a English model originally trained by utahnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qqp_microsoft_deberta_v3_base_seed_3_pipeline_en_5.5.0_3.0_1725803323949.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qqp_microsoft_deberta_v3_base_seed_3_pipeline_en_5.5.0_3.0_1725803323949.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("qqp_microsoft_deberta_v3_base_seed_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("qqp_microsoft_deberta_v3_base_seed_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qqp_microsoft_deberta_v3_base_seed_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|648.9 MB| + +## References + +https://huggingface.co/utahnlp/qqp_microsoft_deberta-v3-base_seed-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-response_toxicity_classifier_base_pipeline_ru.md b/docs/_posts/ahmedlone127/2024-09-08-response_toxicity_classifier_base_pipeline_ru.md new file mode 100644 index 00000000000000..06e41122b31d65 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-response_toxicity_classifier_base_pipeline_ru.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Russian response_toxicity_classifier_base_pipeline pipeline BertForSequenceClassification from t-bank-ai +author: John Snow Labs +name: response_toxicity_classifier_base_pipeline +date: 2024-09-08 +tags: [ru, open_source, pipeline, onnx] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`response_toxicity_classifier_base_pipeline` is a Russian model originally trained by t-bank-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/response_toxicity_classifier_base_pipeline_ru_5.5.0_3.0_1725839048333.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/response_toxicity_classifier_base_pipeline_ru_5.5.0_3.0_1725839048333.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("response_toxicity_classifier_base_pipeline", lang = "ru") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("response_toxicity_classifier_base_pipeline", lang = "ru") +val annotations = pipeline.transform(df) + +``` +
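+
+For quick single-document checks the same pretrained pipeline can also be used without building a DataFrame first. This is a hedged sketch using `PretrainedPipeline.annotate`; the Russian example sentence is only an illustration:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("response_toxicity_classifier_base_pipeline", lang="ru")
+
+# annotate() runs the pipeline on a single string and returns a dict of annotation lists.
+result = pipeline.annotate("Это просто пример ответа для проверки.")
+print(result.keys())  # inspect which output columns (e.g. the class label) are available
+```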
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|response_toxicity_classifier_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ru| +|Size:|611.0 MB| + +## References + +https://huggingface.co/t-bank-ai/response-toxicity-classifier-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-results_forwarder1121_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-results_forwarder1121_pipeline_en.md new file mode 100644 index 00000000000000..985a69c548abb9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-results_forwarder1121_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English results_forwarder1121_pipeline pipeline DistilBertForSequenceClassification from forwarder1121 +author: John Snow Labs +name: results_forwarder1121_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`results_forwarder1121_pipeline` is a English model originally trained by forwarder1121. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/results_forwarder1121_pipeline_en_5.5.0_3.0_1725808945032.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/results_forwarder1121_pipeline_en_5.5.0_3.0_1725808945032.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("results_forwarder1121_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("results_forwarder1121_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|results_forwarder1121_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/forwarder1121/results + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-retrieval_mpnet_dot_finetuned_llama3_openbiollm_synthetic_dataset_en.md b/docs/_posts/ahmedlone127/2024-09-08-retrieval_mpnet_dot_finetuned_llama3_openbiollm_synthetic_dataset_en.md new file mode 100644 index 00000000000000..b00b644297f3f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-retrieval_mpnet_dot_finetuned_llama3_openbiollm_synthetic_dataset_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English retrieval_mpnet_dot_finetuned_llama3_openbiollm_synthetic_dataset MPNetEmbeddings from antonkirk +author: John Snow Labs +name: retrieval_mpnet_dot_finetuned_llama3_openbiollm_synthetic_dataset +date: 2024-09-08 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`retrieval_mpnet_dot_finetuned_llama3_openbiollm_synthetic_dataset` is a English model originally trained by antonkirk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/retrieval_mpnet_dot_finetuned_llama3_openbiollm_synthetic_dataset_en_5.5.0_3.0_1725815372003.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/retrieval_mpnet_dot_finetuned_llama3_openbiollm_synthetic_dataset_en_5.5.0_3.0_1725815372003.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("retrieval_mpnet_dot_finetuned_llama3_openbiollm_synthetic_dataset","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("retrieval_mpnet_dot_finetuned_llama3_openbiollm_synthetic_dataset","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
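+
+The `embeddings` output column holds Spark NLP annotations rather than plain vectors. A minimal sketch for extracting the raw sentence vectors, assuming `pipelineDF` from the Python example above:
+
+```python
+# Minimal sketch, assuming `pipelineDF` from the Python snippet above.
+# Each annotation in the "embeddings" column carries the sentence vector in its `embeddings` field.
+pipelineDF.selectExpr("explode(embeddings.embeddings) as sentence_embedding") \
+    .show(1, truncate=80)
+```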
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|retrieval_mpnet_dot_finetuned_llama3_openbiollm_synthetic_dataset| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/antonkirk/retrieval-mpnet-dot-finetuned-llama3-openbiollm-synthetic-dataset \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-rise_ner_distilbert_base_cased_system_b_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-rise_ner_distilbert_base_cased_system_b_v2_pipeline_en.md new file mode 100644 index 00000000000000..1076d9f2a9d2c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-rise_ner_distilbert_base_cased_system_b_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English rise_ner_distilbert_base_cased_system_b_v2_pipeline pipeline DistilBertForTokenClassification from petersamoaa +author: John Snow Labs +name: rise_ner_distilbert_base_cased_system_b_v2_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rise_ner_distilbert_base_cased_system_b_v2_pipeline` is a English model originally trained by petersamoaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rise_ner_distilbert_base_cased_system_b_v2_pipeline_en_5.5.0_3.0_1725837489776.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rise_ner_distilbert_base_cased_system_b_v2_pipeline_en_5.5.0_3.0_1725837489776.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rise_ner_distilbert_base_cased_system_b_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rise_ner_distilbert_base_cased_system_b_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rise_ner_distilbert_base_cased_system_b_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|243.9 MB| + +## References + +https://huggingface.co/petersamoaa/rise-ner-distilbert-base-cased-system-b-v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-roberta_base_climate_evidence_related_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-roberta_base_climate_evidence_related_pipeline_en.md new file mode 100644 index 00000000000000..a69b9dd071f845 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-roberta_base_climate_evidence_related_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_climate_evidence_related_pipeline pipeline RoBertaForSequenceClassification from mwong +author: John Snow Labs +name: roberta_base_climate_evidence_related_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_climate_evidence_related_pipeline` is a English model originally trained by mwong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_climate_evidence_related_pipeline_en_5.5.0_3.0_1725778566900.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_climate_evidence_related_pipeline_en_5.5.0_3.0_1725778566900.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_climate_evidence_related_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_climate_evidence_related_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_climate_evidence_related_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|300.4 MB| + +## References + +https://huggingface.co/mwong/roberta-base-climate-evidence-related + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-roberta_soft_hd_en.md b/docs/_posts/ahmedlone127/2024-09-08-roberta_soft_hd_en.md new file mode 100644 index 00000000000000..7e11898e864a4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-roberta_soft_hd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_soft_hd RoBertaForSequenceClassification from Multiperspective +author: John Snow Labs +name: roberta_soft_hd +date: 2024-09-08 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_soft_hd` is a English model originally trained by Multiperspective. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_soft_hd_en_5.5.0_3.0_1725821251691.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_soft_hd_en_5.5.0_3.0_1725821251691.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_soft_hd","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_soft_hd", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
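+
+The predicted labels end up in the `class` column as annotations. A minimal sketch for reading them back next to the input text, assuming `pipelineDF` from the Python example above:
+
+```python
+# Minimal sketch, assuming `pipelineDF` from the Python snippet above.
+# "class.result" holds the predicted label(s) for each input row.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```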
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_soft_hd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Multiperspective/roberta-soft-hd \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-spark_name_armenian_tonga_tonga_islands_english_en.md b/docs/_posts/ahmedlone127/2024-09-08-spark_name_armenian_tonga_tonga_islands_english_en.md new file mode 100644 index 00000000000000..579b43a1ff4a31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-spark_name_armenian_tonga_tonga_islands_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English spark_name_armenian_tonga_tonga_islands_english MarianTransformer from ihebaker10 +author: John Snow Labs +name: spark_name_armenian_tonga_tonga_islands_english +date: 2024-09-08 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spark_name_armenian_tonga_tonga_islands_english` is a English model originally trained by ihebaker10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spark_name_armenian_tonga_tonga_islands_english_en_5.5.0_3.0_1725824974673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spark_name_armenian_tonga_tonga_islands_english_en_5.5.0_3.0_1725824974673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("spark_name_armenian_tonga_tonga_islands_english","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("spark_name_armenian_tonga_tonga_islands_english","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spark_name_armenian_tonga_tonga_islands_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|523.5 MB| + +## References + +https://huggingface.co/ihebaker10/spark-name-hy-to-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-spotify_podcast_advertising_classification_en.md b/docs/_posts/ahmedlone127/2024-09-08-spotify_podcast_advertising_classification_en.md new file mode 100644 index 00000000000000..2aaa9e72ac4c35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-spotify_podcast_advertising_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English spotify_podcast_advertising_classification BertForSequenceClassification from morenolq +author: John Snow Labs +name: spotify_podcast_advertising_classification +date: 2024-09-08 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spotify_podcast_advertising_classification` is a English model originally trained by morenolq. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spotify_podcast_advertising_classification_en_5.5.0_3.0_1725761516783.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spotify_podcast_advertising_classification_en_5.5.0_3.0_1725761516783.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("spotify_podcast_advertising_classification","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("spotify_podcast_advertising_classification", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spotify_podcast_advertising_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/morenolq/spotify-podcast-advertising-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-test_model_12944qwerty_en.md b/docs/_posts/ahmedlone127/2024-09-08-test_model_12944qwerty_en.md new file mode 100644 index 00000000000000..c3087a05fd3503 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-test_model_12944qwerty_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English test_model_12944qwerty DistilBertForQuestionAnswering from 12944qwerty +author: John Snow Labs +name: test_model_12944qwerty +date: 2024-09-08 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_model_12944qwerty` is a English model originally trained by 12944qwerty. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_model_12944qwerty_en_5.5.0_3.0_1725798023359.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_model_12944qwerty_en_5.5.0_3.0_1725798023359.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("test_model_12944qwerty","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("test_model_12944qwerty", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
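+
+The extracted answers are returned as annotations in the `answer` column. A minimal sketch for displaying question, context, and answer together, assuming `pipelineDF` from the Python example above:
+
+```python
+# Minimal sketch, assuming `pipelineDF` from the Python snippet above.
+# "answer.result" contains the span extracted from the context for each question.
+pipelineDF.select("question", "context", "answer.result").show(truncate=False)
+```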
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_model_12944qwerty| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/12944qwerty/test_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-test_pipeline_en.md new file mode 100644 index 00000000000000..3b1dd46dab18ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-test_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English test_pipeline pipeline MPNetEmbeddings from sheaDurgin +author: John Snow Labs +name: test_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_pipeline` is a English model originally trained by sheaDurgin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_pipeline_en_5.5.0_3.0_1725817148932.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_pipeline_en_5.5.0_3.0_1725817148932.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sheaDurgin/test + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-token_classification_test_sahithiankireddy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-token_classification_test_sahithiankireddy_pipeline_en.md new file mode 100644 index 00000000000000..caf79bc8dbc68a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-token_classification_test_sahithiankireddy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English token_classification_test_sahithiankireddy_pipeline pipeline DistilBertForTokenClassification from sahithiankireddy +author: John Snow Labs +name: token_classification_test_sahithiankireddy_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`token_classification_test_sahithiankireddy_pipeline` is a English model originally trained by sahithiankireddy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/token_classification_test_sahithiankireddy_pipeline_en_5.5.0_3.0_1725788965117.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/token_classification_test_sahithiankireddy_pipeline_en_5.5.0_3.0_1725788965117.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("token_classification_test_sahithiankireddy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("token_classification_test_sahithiankireddy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|token_classification_test_sahithiankireddy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.4 MB| + +## References + +https://huggingface.co/sahithiankireddy/token_classification_test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-xtremedistil_l6_h256_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-xtremedistil_l6_h256_uncased_pipeline_en.md new file mode 100644 index 00000000000000..7974915042c028 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-xtremedistil_l6_h256_uncased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xtremedistil_l6_h256_uncased_pipeline pipeline BertForSequenceClassification from microsoft +author: John Snow Labs +name: xtremedistil_l6_h256_uncased_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xtremedistil_l6_h256_uncased_pipeline` is a English model originally trained by microsoft. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xtremedistil_l6_h256_uncased_pipeline_en_5.5.0_3.0_1725801955061.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xtremedistil_l6_h256_uncased_pipeline_en_5.5.0_3.0_1725801955061.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xtremedistil_l6_h256_uncased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xtremedistil_l6_h256_uncased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xtremedistil_l6_h256_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|47.3 MB| + +## References + +https://huggingface.co/microsoft/xtremedistil-l6-h256-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-005_microsoft_deberta_v3_base_finetuned_yahoo_140k_en.md b/docs/_posts/ahmedlone127/2024-09-09-005_microsoft_deberta_v3_base_finetuned_yahoo_140k_en.md new file mode 100644 index 00000000000000..97e0b0ef250ac0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-005_microsoft_deberta_v3_base_finetuned_yahoo_140k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 005_microsoft_deberta_v3_base_finetuned_yahoo_140k DeBertaForSequenceClassification from diogopaes10 +author: John Snow Labs +name: 005_microsoft_deberta_v3_base_finetuned_yahoo_140k +date: 2024-09-09 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`005_microsoft_deberta_v3_base_finetuned_yahoo_140k` is a English model originally trained by diogopaes10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/005_microsoft_deberta_v3_base_finetuned_yahoo_140k_en_5.5.0_3.0_1725879622346.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/005_microsoft_deberta_v3_base_finetuned_yahoo_140k_en_5.5.0_3.0_1725879622346.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DeBertaForSequenceClassification.pretrained("005_microsoft_deberta_v3_base_finetuned_yahoo_140k","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DeBertaForSequenceClassification.pretrained("005_microsoft_deberta_v3_base_finetuned_yahoo_140k", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
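+
+Besides the predicted label, each `class` annotation carries per-label scores in its metadata. A minimal sketch for inspecting both, assuming `pipelineDF` from the Python example above (the metadata keys depend on the model's label set):
+
+```python
+# Minimal sketch, assuming `pipelineDF` from the Python snippet above.
+# "class.result" is the predicted topic label; "class.metadata" holds the candidate-label scores.
+pipelineDF.selectExpr("class.result as label", "class.metadata as scores").show(truncate=False)
+```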
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|005_microsoft_deberta_v3_base_finetuned_yahoo_140k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|659.6 MB| + +## References + +https://huggingface.co/diogopaes10/005-microsoft-deberta-v3-base-finetuned-yahoo-140k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-albert_base_v2weighted_hoax_classifier_final_defs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-albert_base_v2weighted_hoax_classifier_final_defs_pipeline_en.md new file mode 100644 index 00000000000000..0344cd967b1ef4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-albert_base_v2weighted_hoax_classifier_final_defs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English albert_base_v2weighted_hoax_classifier_final_defs_pipeline pipeline AlbertForSequenceClassification from research-dump +author: John Snow Labs +name: albert_base_v2weighted_hoax_classifier_final_defs_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_base_v2weighted_hoax_classifier_final_defs_pipeline` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_base_v2weighted_hoax_classifier_final_defs_pipeline_en_5.5.0_3.0_1725889371458.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_base_v2weighted_hoax_classifier_final_defs_pipeline_en_5.5.0_3.0_1725889371458.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("albert_base_v2weighted_hoax_classifier_final_defs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("albert_base_v2weighted_hoax_classifier_final_defs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_base_v2weighted_hoax_classifier_final_defs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/research-dump/albert-base-v2weighted_hoax_classifier_final_defs + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-albert_finetuned_stationary_update_en.md b/docs/_posts/ahmedlone127/2024-09-09-albert_finetuned_stationary_update_en.md new file mode 100644 index 00000000000000..aecdaeb74bd382 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-albert_finetuned_stationary_update_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English albert_finetuned_stationary_update AlbertForSequenceClassification from MKS3099 +author: John Snow Labs +name: albert_finetuned_stationary_update +date: 2024-09-09 +tags: [en, open_source, onnx, sequence_classification, albert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_finetuned_stationary_update` is a English model originally trained by MKS3099. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_finetuned_stationary_update_en_5.5.0_3.0_1725853999178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_finetuned_stationary_update_en_5.5.0_3.0_1725853999178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = AlbertForSequenceClassification.pretrained("albert_finetuned_stationary_update","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = AlbertForSequenceClassification.pretrained("albert_finetuned_stationary_update", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_finetuned_stationary_update| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/MKS3099/Albert-finetuned-stationary-update \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-albert_persian_farsi_base_v2_clf_digimag_pipeline_fa.md b/docs/_posts/ahmedlone127/2024-09-09-albert_persian_farsi_base_v2_clf_digimag_pipeline_fa.md new file mode 100644 index 00000000000000..d22a01b5810ffb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-albert_persian_farsi_base_v2_clf_digimag_pipeline_fa.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Persian albert_persian_farsi_base_v2_clf_digimag_pipeline pipeline AlbertForSequenceClassification from m3hrdadfi +author: John Snow Labs +name: albert_persian_farsi_base_v2_clf_digimag_pipeline +date: 2024-09-09 +tags: [fa, open_source, pipeline, onnx] +task: Text Classification +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_persian_farsi_base_v2_clf_digimag_pipeline` is a Persian model originally trained by m3hrdadfi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_persian_farsi_base_v2_clf_digimag_pipeline_fa_5.5.0_3.0_1725889141824.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_persian_farsi_base_v2_clf_digimag_pipeline_fa_5.5.0_3.0_1725889141824.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("albert_persian_farsi_base_v2_clf_digimag_pipeline", lang = "fa") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("albert_persian_farsi_base_v2_clf_digimag_pipeline", lang = "fa") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_persian_farsi_base_v2_clf_digimag_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fa| +|Size:|68.6 MB| + +## References + +https://huggingface.co/m3hrdadfi/albert-fa-base-v2-clf-digimag + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-all_roberta_large_v1_small_talk_1_16_5_en.md b/docs/_posts/ahmedlone127/2024-09-09-all_roberta_large_v1_small_talk_1_16_5_en.md new file mode 100644 index 00000000000000..c5ed1790fdb1a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-all_roberta_large_v1_small_talk_1_16_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_small_talk_1_16_5 RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_small_talk_1_16_5 +date: 2024-09-09 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_small_talk_1_16_5` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_small_talk_1_16_5_en_5.5.0_3.0_1725902249835.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_small_talk_1_16_5_en_5.5.0_3.0_1725902249835.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_small_talk_1_16_5","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_small_talk_1_16_5", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_small_talk_1_16_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-small_talk-1-16-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-arabic2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-arabic2_pipeline_en.md new file mode 100644 index 00000000000000..de1d2667c60568 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-arabic2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English arabic2_pipeline pipeline MarianTransformer from PontifexMaximus +author: John Snow Labs +name: arabic2_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`arabic2_pipeline` is a English model originally trained by PontifexMaximus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arabic2_pipeline_en_5.5.0_3.0_1725891708866.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arabic2_pipeline_en_5.5.0_3.0_1725891708866.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("arabic2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("arabic2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
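
The snippet above assumes that an input DataFrame `df` already exists. A minimal end-to-end sketch is shown below; the input column name `text` and the example sentence are assumptions, and the annotation columns produced by this particular pipeline may differ.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Hypothetical input; the pipeline's DocumentAssembler is expected to read the "text" column.
df = spark.createDataFrame([["It is a beautiful day."]]).toDF("text")

pipeline = PretrainedPipeline("arabic2_pipeline", lang="en")
annotations = pipeline.transform(df)

# Show every annotation column; the translation comes from the MarianTransformer stage.
annotations.show(truncate=False)
```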
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arabic2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|528.5 MB| + +## References + +https://huggingface.co/PontifexMaximus/Arabic2 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-bertimbau_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-bertimbau_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..56cab9fe45b61a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-bertimbau_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bertimbau_finetuned_pipeline pipeline BertForSequenceClassification from Horusprg +author: John Snow Labs +name: bertimbau_finetuned_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertimbau_finetuned_pipeline` is a English model originally trained by Horusprg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertimbau_finetuned_pipeline_en_5.5.0_3.0_1725900028806.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertimbau_finetuned_pipeline_en_5.5.0_3.0_1725900028806.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bertimbau_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bertimbau_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
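
For quick, DataFrame-free experiments the same pipeline can be run on plain strings with `annotate`, which returns a Python dictionary of annotator outputs. This is only a sketch: the example sentence is illustrative, and it assumes the classification stage writes to a `class` key.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("bertimbau_finetuned_pipeline", lang="en")

# annotate() runs the full pipeline on a single string.
result = pipeline.annotate("Adorei este filme, recomendo muito!")
print(result.get("class"))  # predicted label(s), if the classifier output column is named "class"
```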
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertimbau_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/Horusprg/bertimbau-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-best_model_yelp_polarity_64_42_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-best_model_yelp_polarity_64_42_pipeline_en.md new file mode 100644 index 00000000000000..a8c27f8aeca95a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-best_model_yelp_polarity_64_42_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English best_model_yelp_polarity_64_42_pipeline pipeline AlbertForSequenceClassification from simonycl +author: John Snow Labs +name: best_model_yelp_polarity_64_42_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`best_model_yelp_polarity_64_42_pipeline` is a English model originally trained by simonycl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/best_model_yelp_polarity_64_42_pipeline_en_5.5.0_3.0_1725924141406.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/best_model_yelp_polarity_64_42_pipeline_en_5.5.0_3.0_1725924141406.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("best_model_yelp_polarity_64_42_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("best_model_yelp_polarity_64_42_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|best_model_yelp_polarity_64_42_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/simonycl/best_model-yelp_polarity-64-42 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-burmese_awesome_qa_model_acezxn_en.md b/docs/_posts/ahmedlone127/2024-09-09-burmese_awesome_qa_model_acezxn_en.md new file mode 100644 index 00000000000000..65f34a73b57608 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-burmese_awesome_qa_model_acezxn_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_acezxn DistilBertForQuestionAnswering from acezxn +author: John Snow Labs +name: burmese_awesome_qa_model_acezxn +date: 2024-09-09 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_acezxn` is a English model originally trained by acezxn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_acezxn_en_5.5.0_3.0_1725876818400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_acezxn_en_5.5.0_3.0_1725876818400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_acezxn","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_acezxn", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
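
With `pipelineDF` computed as above, the extracted answer span sits in the `answer` column. A small sketch of reading it back, assuming the pipeline defined in the Python example and the standard Spark NLP annotation schema:

```python
from pyspark.sql import functions as F

answers = pipelineDF.select(
    F.col("question"),
    F.col("answer.result").getItem(0).alias("predicted_answer")
)
answers.show(truncate=False)
```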
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_acezxn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/acezxn/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-burmese_awesome_setfit_model_kanixwang_en.md b/docs/_posts/ahmedlone127/2024-09-09-burmese_awesome_setfit_model_kanixwang_en.md new file mode 100644 index 00000000000000..ddcd30b29ec2de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-burmese_awesome_setfit_model_kanixwang_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_setfit_model_kanixwang MPNetEmbeddings from kanixwang +author: John Snow Labs +name: burmese_awesome_setfit_model_kanixwang +date: 2024-09-09 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_setfit_model_kanixwang` is a English model originally trained by kanixwang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_setfit_model_kanixwang_en_5.5.0_3.0_1725897230236.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_setfit_model_kanixwang_en_5.5.0_3.0_1725897230236.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("burmese_awesome_setfit_model_kanixwang","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("burmese_awesome_setfit_model_kanixwang","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
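
The `embeddings` column produced above stores one annotation per document, with the sentence vector in its `embeddings` field. One way to extract the raw vectors is the `EmbeddingsFinisher` helper, sketched here under the assumption that `pipelineDF` comes from the Python pipeline above:

```python
from sparknlp.base import EmbeddingsFinisher

finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["sentence_embedding"]) \
    .setOutputAsVector(True)

# Each row now carries the MPNet sentence vector(s) as Spark ML vectors.
finisher.transform(pipelineDF).select("sentence_embedding").show(truncate=80)
```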
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_setfit_model_kanixwang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/kanixwang/my-awesome-setfit-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-childes_parentberto_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-childes_parentberto_pipeline_en.md new file mode 100644 index 00000000000000..2b5829f19b6f17 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-childes_parentberto_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English childes_parentberto_pipeline pipeline RoBertaEmbeddings from pulp +author: John Snow Labs +name: childes_parentberto_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`childes_parentberto_pipeline` is a English model originally trained by pulp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/childes_parentberto_pipeline_en_5.5.0_3.0_1725925172695.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/childes_parentberto_pipeline_en_5.5.0_3.0_1725925172695.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("childes_parentberto_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("childes_parentberto_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|childes_parentberto_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|256.6 MB| + +## References + +https://huggingface.co/pulp/CHILDES-ParentBERTo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-dataequity_opus_maltese_tagalog_spanish_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-dataequity_opus_maltese_tagalog_spanish_pipeline_en.md new file mode 100644 index 00000000000000..f8f22525ac6c85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-dataequity_opus_maltese_tagalog_spanish_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dataequity_opus_maltese_tagalog_spanish_pipeline pipeline MarianTransformer from dataequity +author: John Snow Labs +name: dataequity_opus_maltese_tagalog_spanish_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dataequity_opus_maltese_tagalog_spanish_pipeline` is a English model originally trained by dataequity. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dataequity_opus_maltese_tagalog_spanish_pipeline_en_5.5.0_3.0_1725913086669.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dataequity_opus_maltese_tagalog_spanish_pipeline_en_5.5.0_3.0_1725913086669.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dataequity_opus_maltese_tagalog_spanish_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dataequity_opus_maltese_tagalog_spanish_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dataequity_opus_maltese_tagalog_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|511.2 MB| + +## References + +https://huggingface.co/dataequity/dataequity-opus-mt-tl-es + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_cased_concept_extraction_wikipedia_v1_0_concept_extraction_indoiranian_languages_v1_0_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_cased_concept_extraction_wikipedia_v1_0_concept_extraction_indoiranian_languages_v1_0_en.md new file mode 100644 index 00000000000000..c450e340570308 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_cased_concept_extraction_wikipedia_v1_0_concept_extraction_indoiranian_languages_v1_0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_cased_concept_extraction_wikipedia_v1_0_concept_extraction_indoiranian_languages_v1_0 DistilBertForTokenClassification from HungChau +author: John Snow Labs +name: distilbert_base_cased_concept_extraction_wikipedia_v1_0_concept_extraction_indoiranian_languages_v1_0 +date: 2024-09-09 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_concept_extraction_wikipedia_v1_0_concept_extraction_indoiranian_languages_v1_0` is a English model originally trained by HungChau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_concept_extraction_wikipedia_v1_0_concept_extraction_indoiranian_languages_v1_0_en_5.5.0_3.0_1725889772953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_concept_extraction_wikipedia_v1_0_concept_extraction_indoiranian_languages_v1_0_en_5.5.0_3.0_1725889772953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_base_cased_concept_extraction_wikipedia_v1_0_concept_extraction_indoiranian_languages_v1_0","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_base_cased_concept_extraction_wikipedia_v1_0_concept_extraction_indoiranian_languages_v1_0", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
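
Once `pipelineDF` is available, the token strings and their predicted tags can be displayed side by side. A minimal sketch, assuming the standard annotation schema (token text in `token.result`, predicted label in `ner.result`):

```python
from pyspark.sql import functions as F

tagged = pipelineDF.select(
    F.col("token.result").alias("tokens"),
    F.col("ner.result").alias("labels")
)
tagged.show(truncate=False)
```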
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_concept_extraction_wikipedia_v1_0_concept_extraction_indoiranian_languages_v1_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/HungChau/distilbert-base-cased-concept-extraction-wikipedia-v1.0-concept-extraction-iir-v1.0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_distilled_squad_bilstm_finetuned_squad_pure2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_distilled_squad_bilstm_finetuned_squad_pure2_pipeline_en.md new file mode 100644 index 00000000000000..751586d1519802 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_distilled_squad_bilstm_finetuned_squad_pure2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_squad_bilstm_finetuned_squad_pure2_pipeline pipeline DistilBertForQuestionAnswering from allistair99 +author: John Snow Labs +name: distilbert_base_uncased_distilled_squad_bilstm_finetuned_squad_pure2_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_squad_bilstm_finetuned_squad_pure2_pipeline` is a English model originally trained by allistair99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_squad_bilstm_finetuned_squad_pure2_pipeline_en_5.5.0_3.0_1725892236084.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_squad_bilstm_finetuned_squad_pure2_pipeline_en_5.5.0_3.0_1725892236084.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_distilled_squad_bilstm_finetuned_squad_pure2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_distilled_squad_bilstm_finetuned_squad_pure2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
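
Because this pipeline bundles a MultiDocumentAssembler with the question-answering model, the input DataFrame `df` is expected to provide the question and its context as separate columns. The sketch below makes that assumption explicit; the column names `question`/`context` and the output column `answer` are assumptions and may need adjusting for this pipeline.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Assumed input layout: one column for the question, one for the context passage.
df = spark.createDataFrame(
    [("What framework do I use?", "I use spark-nlp.")],
    ["question", "context"]
)

pipeline = PretrainedPipeline("distilbert_base_uncased_distilled_squad_bilstm_finetuned_squad_pure2_pipeline", lang="en")
pipeline.transform(df).select("answer.result").show(truncate=False)
```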
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_squad_bilstm_finetuned_squad_pure2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/allistair99/distilbert-base-uncased-distilled-squad-BiLSTM-finetuned-squad-pure2 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_ag_news_v3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_ag_news_v3_pipeline_en.md new file mode 100644 index 00000000000000..9ac00090c0b77c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_ag_news_v3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_ag_news_v3_pipeline pipeline DistilBertEmbeddings from miggwp +author: John Snow Labs +name: distilbert_base_uncased_finetuned_ag_news_v3_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_ag_news_v3_pipeline` is a English model originally trained by miggwp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_ag_news_v3_pipeline_en_5.5.0_3.0_1725921525015.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_ag_news_v3_pipeline_en_5.5.0_3.0_1725921525015.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_ag_news_v3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_ag_news_v3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_ag_news_v3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/miggwp/distilbert-base-uncased-finetuned-ag-news-v3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_copypaste_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_copypaste_en.md new file mode 100644 index 00000000000000..8acda05918149d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_copypaste_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_imdb_copypaste DistilBertEmbeddings from CopyPaste +author: John Snow Labs +name: distilbert_base_uncased_finetuned_imdb_copypaste +date: 2024-09-09 +tags: [en, open_source, onnx, embeddings, distilbert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_imdb_copypaste` is a English model originally trained by CopyPaste. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_copypaste_en_5.5.0_3.0_1725905426089.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_copypaste_en_5.5.0_3.0_1725905426089.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = DistilBertEmbeddings.pretrained("distilbert_base_uncased_finetuned_imdb_copypaste","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = DistilBertEmbeddings.pretrained("distilbert_base_uncased_finetuned_imdb_copypaste","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
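
For ad-hoc checks on a handful of sentences, the fitted pipeline can be wrapped in a `LightPipeline`, which skips DataFrame creation entirely. A minimal sketch, assuming `pipelineModel` from the Python example above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# fullAnnotate returns Annotation objects; the token vectors live in their .embeddings field.
annotated = light.fullAnnotate("I love spark-nlp")[0]
for token, emb in zip(annotated["token"], annotated["embeddings"]):
    print(token.result, len(emb.embeddings))
```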
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_imdb_copypaste| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[distilbert]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/CopyPaste/distilbert-base-uncased-finetuned-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_ruidanwang_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_ruidanwang_en.md new file mode 100644 index 00000000000000..d724c3cb88526c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_ruidanwang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_imdb_ruidanwang DistilBertEmbeddings from ruidanwang +author: John Snow Labs +name: distilbert_base_uncased_finetuned_imdb_ruidanwang +date: 2024-09-09 +tags: [en, open_source, onnx, embeddings, distilbert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_imdb_ruidanwang` is a English model originally trained by ruidanwang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_ruidanwang_en_5.5.0_3.0_1725868264059.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_ruidanwang_en_5.5.0_3.0_1725868264059.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = DistilBertEmbeddings.pretrained("distilbert_base_uncased_finetuned_imdb_ruidanwang","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = DistilBertEmbeddings.pretrained("distilbert_base_uncased_finetuned_imdb_ruidanwang","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_imdb_ruidanwang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[distilbert]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/ruidanwang/distilbert-base-uncased-finetuned-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_victorbarra_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_victorbarra_pipeline_en.md new file mode 100644 index 00000000000000..396f929da2fc11 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_victorbarra_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_imdb_victorbarra_pipeline pipeline DistilBertEmbeddings from victorbarra +author: John Snow Labs +name: distilbert_base_uncased_finetuned_imdb_victorbarra_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_imdb_victorbarra_pipeline` is a English model originally trained by victorbarra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_victorbarra_pipeline_en_5.5.0_3.0_1725905588381.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_victorbarra_pipeline_en_5.5.0_3.0_1725905588381.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_victorbarra_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_victorbarra_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_imdb_victorbarra_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/victorbarra/distilbert-base-uncased-finetuned-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_squad_d5716d28_lakecrimsonn_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_squad_d5716d28_lakecrimsonn_pipeline_en.md new file mode 100644 index 00000000000000..5af0095e350349 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_squad_d5716d28_lakecrimsonn_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_d5716d28_lakecrimsonn_pipeline pipeline DistilBertForQuestionAnswering from lakecrimsonn +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_d5716d28_lakecrimsonn_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_d5716d28_lakecrimsonn_pipeline` is a English model originally trained by lakecrimsonn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_lakecrimsonn_pipeline_en_5.5.0_3.0_1725892209346.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_lakecrimsonn_pipeline_en_5.5.0_3.0_1725892209346.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_lakecrimsonn_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_lakecrimsonn_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_d5716d28_lakecrimsonn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/lakecrimsonn/distilbert-base-uncased-finetuned-squad-d5716d28 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilroberta_conll2003_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilroberta_conll2003_en.md new file mode 100644 index 00000000000000..bed86c81c6c6ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilroberta_conll2003_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_conll2003 RoBertaForTokenClassification from jinhybr +author: John Snow Labs +name: distilroberta_conll2003 +date: 2024-09-09 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_conll2003` is a English model originally trained by jinhybr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_conll2003_en_5.5.0_3.0_1725888018229.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_conll2003_en_5.5.0_3.0_1725888018229.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("distilroberta_conll2003","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("distilroberta_conll2003", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
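
Since this model predicts CoNLL-2003 style IOB tags, the raw `ner` output is usually condensed into entity chunks. One way to do that is to append a `NerConverter` stage to the pipeline defined above; this is a sketch reusing the variables from the Python example:

```python
from sparknlp.annotator import NerConverter
from pyspark.ml import Pipeline

converter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

ner_pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, converter])
ner_df = ner_pipeline.fit(data).transform(data)

# One row per input text, with the detected entity chunks as plain strings.
ner_df.select("ner_chunk.result").show(truncate=False)
```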
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_conll2003| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jinhybr/distilroberta-ConLL2003 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-dummy_model_verfallen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-dummy_model_verfallen_pipeline_en.md new file mode 100644 index 00000000000000..99bba134d1a75d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-dummy_model_verfallen_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dummy_model_verfallen_pipeline pipeline CamemBertEmbeddings from verfallen +author: John Snow Labs +name: dummy_model_verfallen_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CamemBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dummy_model_verfallen_pipeline` is a English model originally trained by verfallen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dummy_model_verfallen_pipeline_en_5.5.0_3.0_1725851295777.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dummy_model_verfallen_pipeline_en_5.5.0_3.0_1725851295777.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dummy_model_verfallen_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dummy_model_verfallen_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dummy_model_verfallen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|264.0 MB| + +## References + +https://huggingface.co/verfallen/dummy-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- CamemBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-en2zh40_en.md b/docs/_posts/ahmedlone127/2024-09-09-en2zh40_en.md new file mode 100644 index 00000000000000..7e4c4b42b05fdd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-en2zh40_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English en2zh40 MarianTransformer from Carlosino +author: John Snow Labs +name: en2zh40 +date: 2024-09-09 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`en2zh40` is a English model originally trained by Carlosino. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/en2zh40_en_5.5.0_3.0_1725913237265.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/en2zh40_en_5.5.0_3.0_1725913237265.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("en2zh40","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("en2zh40","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
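
After the pipeline has run, every detected sentence yields one translation annotation. A short sketch of collecting them, assuming `pipelineDF` from the Python example above:

```python
from pyspark.sql import functions as F

translations = pipelineDF.select(
    F.col("text"),
    F.col("translation.result").alias("translation")
)
translations.show(truncate=False)
```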
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|en2zh40| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|541.0 MB| + +## References + +https://huggingface.co/Carlosino/en2zh40 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-finetuning_amaskedlanguage_en.md b/docs/_posts/ahmedlone127/2024-09-09-finetuning_amaskedlanguage_en.md new file mode 100644 index 00000000000000..2ae369fe554f5c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-finetuning_amaskedlanguage_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_amaskedlanguage DistilBertEmbeddings from DeveloperAya +author: John Snow Labs +name: finetuning_amaskedlanguage +date: 2024-09-09 +tags: [en, open_source, onnx, embeddings, distilbert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_amaskedlanguage` is a English model originally trained by DeveloperAya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_amaskedlanguage_en_5.5.0_3.0_1725921617036.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_amaskedlanguage_en_5.5.0_3.0_1725921617036.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = DistilBertEmbeddings.pretrained("finetuning_amaskedlanguage","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = DistilBertEmbeddings.pretrained("finetuning_amaskedlanguage","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_amaskedlanguage| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[distilbert]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/DeveloperAya/FineTuning_aMaskedLanguage \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-helsinki_danish_swedish_v10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-helsinki_danish_swedish_v10_pipeline_en.md new file mode 100644 index 00000000000000..c224cac2053913 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-helsinki_danish_swedish_v10_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English helsinki_danish_swedish_v10_pipeline pipeline MarianTransformer from Danieljacobsen +author: John Snow Labs +name: helsinki_danish_swedish_v10_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`helsinki_danish_swedish_v10_pipeline` is a English model originally trained by Danieljacobsen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/helsinki_danish_swedish_v10_pipeline_en_5.5.0_3.0_1725891225259.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/helsinki_danish_swedish_v10_pipeline_en_5.5.0_3.0_1725891225259.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("helsinki_danish_swedish_v10_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("helsinki_danish_swedish_v10_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|helsinki_danish_swedish_v10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|497.4 MB| + +## References + +https://huggingface.co/Danieljacobsen/Helsinki-DA-SV-v10 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-hf_nlp_course_distilbert_base_uncased_finetuned_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-09-hf_nlp_course_distilbert_base_uncased_finetuned_imdb_en.md new file mode 100644 index 00000000000000..d55eb453628eeb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-hf_nlp_course_distilbert_base_uncased_finetuned_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hf_nlp_course_distilbert_base_uncased_finetuned_imdb DistilBertEmbeddings from yuwei2342 +author: John Snow Labs +name: hf_nlp_course_distilbert_base_uncased_finetuned_imdb +date: 2024-09-09 +tags: [en, open_source, onnx, embeddings, distilbert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hf_nlp_course_distilbert_base_uncased_finetuned_imdb` is a English model originally trained by yuwei2342. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hf_nlp_course_distilbert_base_uncased_finetuned_imdb_en_5.5.0_3.0_1725905773861.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hf_nlp_course_distilbert_base_uncased_finetuned_imdb_en_5.5.0_3.0_1725905773861.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = DistilBertEmbeddings.pretrained("hf_nlp_course_distilbert_base_uncased_finetuned_imdb","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = DistilBertEmbeddings.pretrained("hf_nlp_course_distilbert_base_uncased_finetuned_imdb","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hf_nlp_course_distilbert_base_uncased_finetuned_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[distilbert]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/yuwei2342/hf-nlp-course-distilbert-base-uncased-finetuned-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-maltese_hitz_catalan_basque_pipeline_ca.md b/docs/_posts/ahmedlone127/2024-09-09-maltese_hitz_catalan_basque_pipeline_ca.md new file mode 100644 index 00000000000000..71d6ca30ad095a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-maltese_hitz_catalan_basque_pipeline_ca.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Catalan, Valencian maltese_hitz_catalan_basque_pipeline pipeline MarianTransformer from HiTZ +author: John Snow Labs +name: maltese_hitz_catalan_basque_pipeline +date: 2024-09-09 +tags: [ca, open_source, pipeline, onnx] +task: Translation +language: ca +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`maltese_hitz_catalan_basque_pipeline` is a Catalan, Valencian model originally trained by HiTZ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/maltese_hitz_catalan_basque_pipeline_ca_5.5.0_3.0_1725891980977.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/maltese_hitz_catalan_basque_pipeline_ca_5.5.0_3.0_1725891980977.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("maltese_hitz_catalan_basque_pipeline", lang = "ca") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("maltese_hitz_catalan_basque_pipeline", lang = "ca") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|maltese_hitz_catalan_basque_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ca| +|Size:|225.8 MB| + +## References + +https://huggingface.co/HiTZ/mt-hitz-ca-eu + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-mpnet_base_natural_questions_mnrl_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-mpnet_base_natural_questions_mnrl_pipeline_en.md new file mode 100644 index 00000000000000..dd213a16f74551 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-mpnet_base_natural_questions_mnrl_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English mpnet_base_natural_questions_mnrl_pipeline pipeline MPNetEmbeddings from tomaarsen +author: John Snow Labs +name: mpnet_base_natural_questions_mnrl_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mpnet_base_natural_questions_mnrl_pipeline` is a English model originally trained by tomaarsen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mpnet_base_natural_questions_mnrl_pipeline_en_5.5.0_3.0_1725897150565.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mpnet_base_natural_questions_mnrl_pipeline_en_5.5.0_3.0_1725897150565.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mpnet_base_natural_questions_mnrl_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mpnet_base_natural_questions_mnrl_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
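
For quick experiments without building a DataFrame, the pipeline's `annotate()` convenience method can be used directly on strings, assuming a Spark NLP session is already running (for example via `sparknlp.start()`). The question below is illustrative only:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("mpnet_base_natural_questions_mnrl_pipeline", lang="en")

# annotate() runs the pipeline on plain strings and returns a dictionary
# keyed by the pipeline's output annotation columns
result = pipeline.annotate("Who wrote the novel Moby-Dick?")
print(result.keys())
```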
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mpnet_base_natural_questions_mnrl_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.6 MB| + +## References + +https://huggingface.co/tomaarsen/mpnet-base-natural-questions-mnrl + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-n_roberta_agnews_padding100model_en.md b/docs/_posts/ahmedlone127/2024-09-09-n_roberta_agnews_padding100model_en.md new file mode 100644 index 00000000000000..d0554ee6dc1254 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-n_roberta_agnews_padding100model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_roberta_agnews_padding100model RoBertaForSequenceClassification from Realgon +author: John Snow Labs +name: n_roberta_agnews_padding100model +date: 2024-09-09 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_roberta_agnews_padding100model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_roberta_agnews_padding100model_en_5.5.0_3.0_1725920856357.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_roberta_agnews_padding100model_en_5.5.0_3.0_1725920856357.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("n_roberta_agnews_padding100model","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("n_roberta_agnews_padding100model", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
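
Once the pipeline above has run, the predicted label can be read from the `class` annotation column. A small sketch (the label names themselves depend on the fine-tuned model's configuration):

```python
import pyspark.sql.functions as F

# pipelineDF is the DataFrame produced above by pipelineModel.transform(data)
pipelineDF.select(
    F.col("text"),
    F.col("class.result").alias("predicted_label")
).show(truncate=False)
```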
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_roberta_agnews_padding100model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|464.5 MB| + +## References + +https://huggingface.co/Realgon/N_roberta_agnews_padding100model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_base_fine_freq_wce_unsampled_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_base_fine_freq_wce_unsampled_en.md new file mode 100644 index 00000000000000..05f7497f3ea337 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_base_fine_freq_wce_unsampled_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_base_fine_freq_wce_unsampled MarianTransformer from ethansimrm +author: John Snow Labs +name: opus_base_fine_freq_wce_unsampled +date: 2024-09-09 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_base_fine_freq_wce_unsampled` is a English model originally trained by ethansimrm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_base_fine_freq_wce_unsampled_en_5.5.0_3.0_1725891714649.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_base_fine_freq_wce_unsampled_en_5.5.0_3.0_1725891714649.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("opus_base_fine_freq_wce_unsampled","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("opus_base_fine_freq_wce_unsampled","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
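
For quick single-sentence checks without building a DataFrame, the fitted model can be wrapped in Spark NLP's `LightPipeline`. A brief sketch (the English test sentence is illustrative, and `pipelineModel` refers to the fitted pipeline from the example above):

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
annotations = light.annotate("This is a short test sentence.")

# "translation" is the output column configured on the MarianTransformer stage
print(annotations["translation"])
```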
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_base_fine_freq_wce_unsampled| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.4 MB| + +## References + +https://huggingface.co/ethansimrm/opus_base_fine_freq_wce_unsampled \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_italian_finetuned_4600_english_tonga_tonga_islands_italian_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_italian_finetuned_4600_english_tonga_tonga_islands_italian_en.md new file mode 100644 index 00000000000000..6cfe7de2e94892 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_italian_finetuned_4600_english_tonga_tonga_islands_italian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_italian_finetuned_4600_english_tonga_tonga_islands_italian MarianTransformer from VFiona +author: John Snow Labs +name: opus_maltese_english_italian_finetuned_4600_english_tonga_tonga_islands_italian +date: 2024-09-09 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_italian_finetuned_4600_english_tonga_tonga_islands_italian` is a English model originally trained by VFiona. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_italian_finetuned_4600_english_tonga_tonga_islands_italian_en_5.5.0_3.0_1725865141230.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_italian_finetuned_4600_english_tonga_tonga_islands_italian_en_5.5.0_3.0_1725865141230.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("opus_maltese_english_italian_finetuned_4600_english_tonga_tonga_islands_italian","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("opus_maltese_english_italian_finetuned_4600_english_tonga_tonga_islands_italian","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_italian_finetuned_4600_english_tonga_tonga_islands_italian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|622.9 MB| + +## References + +https://huggingface.co/VFiona/opus-mt-en-it-finetuned_4600-en-to-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_andreypurwanto_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_andreypurwanto_pipeline_en.md new file mode 100644 index 00000000000000..5218d20bff6bd0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_andreypurwanto_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_andreypurwanto_pipeline pipeline MarianTransformer from andreypurwanto +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_andreypurwanto_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_andreypurwanto_pipeline` is a English model originally trained by andreypurwanto. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_andreypurwanto_pipeline_en_5.5.0_3.0_1725913083574.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_andreypurwanto_pipeline_en_5.5.0_3.0_1725913083574.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_andreypurwanto_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_andreypurwanto_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_andreypurwanto_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.2 MB| + +## References + +https://huggingface.co/andreypurwanto/opus-mt-en-ro-finetuned-en-to-ro + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_chlorhexidine_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_chlorhexidine_pipeline_en.md new file mode 100644 index 00000000000000..9f87e31f74f252 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_chlorhexidine_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_chlorhexidine_pipeline pipeline MarianTransformer from Chlorhexidine +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_chlorhexidine_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_chlorhexidine_pipeline` is a English model originally trained by Chlorhexidine. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_chlorhexidine_pipeline_en_5.5.0_3.0_1725865119271.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_chlorhexidine_pipeline_en_5.5.0_3.0_1725865119271.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_chlorhexidine_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_chlorhexidine_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_chlorhexidine_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.2 MB| + +## References + +https://huggingface.co/Chlorhexidine/opus-mt-en-ro-finetuned-en-to-ro + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_russian_finetuned_english_tonga_tonga_islands_russian_qwerty22tau_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_russian_finetuned_english_tonga_tonga_islands_russian_qwerty22tau_pipeline_en.md new file mode 100644 index 00000000000000..ffbae5660d7a91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_russian_finetuned_english_tonga_tonga_islands_russian_qwerty22tau_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_russian_finetuned_english_tonga_tonga_islands_russian_qwerty22tau_pipeline pipeline MarianTransformer from qwerty22tau +author: John Snow Labs +name: opus_maltese_english_russian_finetuned_english_tonga_tonga_islands_russian_qwerty22tau_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_russian_finetuned_english_tonga_tonga_islands_russian_qwerty22tau_pipeline` is a English model originally trained by qwerty22tau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_russian_finetuned_english_tonga_tonga_islands_russian_qwerty22tau_pipeline_en_5.5.0_3.0_1725891613581.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_russian_finetuned_english_tonga_tonga_islands_russian_qwerty22tau_pipeline_en_5.5.0_3.0_1725891613581.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_russian_finetuned_english_tonga_tonga_islands_russian_qwerty22tau_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_russian_finetuned_english_tonga_tonga_islands_russian_qwerty22tau_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_russian_finetuned_english_tonga_tonga_islands_russian_qwerty22tau_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|526.0 MB| + +## References + +https://huggingface.co/qwerty22tau/opus-mt-en-ru-finetuned-en-to-ru + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_ft_v3_3_epochs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_ft_v3_3_epochs_pipeline_en.md new file mode 100644 index 00000000000000..e88cd70d528f0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_ft_v3_3_epochs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_ft_v3_3_epochs_pipeline pipeline MarianTransformer from abdiharyadi +author: John Snow Labs +name: opus_maltese_ft_v3_3_epochs_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_ft_v3_3_epochs_pipeline` is a English model originally trained by abdiharyadi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_ft_v3_3_epochs_pipeline_en_5.5.0_3.0_1725840182886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_ft_v3_3_epochs_pipeline_en_5.5.0_3.0_1725840182886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_ft_v3_3_epochs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_ft_v3_3_epochs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_ft_v3_3_epochs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|482.7 MB| + +## References + +https://huggingface.co/abdiharyadi/opus-mt-ft-v3-3-epochs + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_korean_english_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_korean_english_en.md new file mode 100644 index 00000000000000..dec8e8f27be5d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_korean_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_korean_english MarianTransformer from seohyun-choi +author: John Snow Labs +name: opus_maltese_korean_english +date: 2024-09-09 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_korean_english` is a English model originally trained by seohyun-choi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_korean_english_en_5.5.0_3.0_1725865364983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_korean_english_en_5.5.0_3.0_1725865364983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("opus_maltese_korean_english","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("opus_maltese_korean_english","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_korean_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|540.6 MB| + +## References + +https://huggingface.co/seohyun-choi/opus-mt-ko-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_walloon_english_finetuned_npomo_english_15_epochs_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_walloon_english_finetuned_npomo_english_15_epochs_en.md new file mode 100644 index 00000000000000..a3cd8ab2f9b1df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_walloon_english_finetuned_npomo_english_15_epochs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_walloon_english_finetuned_npomo_english_15_epochs MarianTransformer from UnassumingOwl +author: John Snow Labs +name: opus_maltese_walloon_english_finetuned_npomo_english_15_epochs +date: 2024-09-09 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_walloon_english_finetuned_npomo_english_15_epochs` is a English model originally trained by UnassumingOwl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_walloon_english_finetuned_npomo_english_15_epochs_en_5.5.0_3.0_1725864279280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_walloon_english_finetuned_npomo_english_15_epochs_en_5.5.0_3.0_1725864279280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("opus_maltese_walloon_english_finetuned_npomo_english_15_epochs","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("opus_maltese_walloon_english_finetuned_npomo_english_15_epochs","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_walloon_english_finetuned_npomo_english_15_epochs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|506.2 MB| + +## References + +https://huggingface.co/UnassumingOwl/opus-mt-wa-en-finetuned-npomo-en-15-epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-question_answering_hemg_en.md b/docs/_posts/ahmedlone127/2024-09-09-question_answering_hemg_en.md new file mode 100644 index 00000000000000..597e3df9ce915d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-question_answering_hemg_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English question_answering_hemg DistilBertForQuestionAnswering from Hemg +author: John Snow Labs +name: question_answering_hemg +date: 2024-09-09 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`question_answering_hemg` is a English model originally trained by Hemg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/question_answering_hemg_en_5.5.0_3.0_1725868737704.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/question_answering_hemg_en_5.5.0_3.0_1725868737704.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("question_answering_hemg","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("question_answering_hemg", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
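
After the pipeline above runs, the predicted span is available in the `answer` annotation column. A short sketch of reading it back out, using the `question` and `answer` column names configured in the example:

```python
# pipelineDF is produced above by pipelineModel.transform(data)
pipelineDF.selectExpr(
    "question",
    "answer.result as predicted_answer"
).show(truncate=False)
```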
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|question_answering_hemg| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Hemg/Question-answering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-refpydst_5p_icdst_split_v3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-refpydst_5p_icdst_split_v3_pipeline_en.md new file mode 100644 index 00000000000000..3075c3170105f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-refpydst_5p_icdst_split_v3_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English refpydst_5p_icdst_split_v3_pipeline pipeline MPNetEmbeddings from Brendan +author: John Snow Labs +name: refpydst_5p_icdst_split_v3_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`refpydst_5p_icdst_split_v3_pipeline` is a English model originally trained by Brendan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/refpydst_5p_icdst_split_v3_pipeline_en_5.5.0_3.0_1725896436202.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/refpydst_5p_icdst_split_v3_pipeline_en_5.5.0_3.0_1725896436202.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("refpydst_5p_icdst_split_v3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("refpydst_5p_icdst_split_v3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|refpydst_5p_icdst_split_v3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/Brendan/refpydst-5p-icdst-split-v3 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-roberta_base_corener_en.md b/docs/_posts/ahmedlone127/2024-09-09-roberta_base_corener_en.md new file mode 100644 index 00000000000000..1409a870eadedf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-roberta_base_corener_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_corener RoBertaEmbeddings from aiola +author: John Snow Labs +name: roberta_base_corener +date: 2024-09-09 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_corener` is a English model originally trained by aiola. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_corener_en_5.5.0_3.0_1725910000235.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_corener_en_5.5.0_3.0_1725910000235.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_corener","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_corener","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_corener| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/aiola/roberta-base-corener \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-roberta_legal_german_cased_german_legal_squad_part_augmented_1000_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-09-roberta_legal_german_cased_german_legal_squad_part_augmented_1000_pipeline_de.md new file mode 100644 index 00000000000000..dfceb63e9b1409 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-roberta_legal_german_cased_german_legal_squad_part_augmented_1000_pipeline_de.md @@ -0,0 +1,69 @@ +--- +layout: model +title: German roberta_legal_german_cased_german_legal_squad_part_augmented_1000_pipeline pipeline RoBertaForQuestionAnswering from farid1088 +author: John Snow Labs +name: roberta_legal_german_cased_german_legal_squad_part_augmented_1000_pipeline +date: 2024-09-09 +tags: [de, open_source, pipeline, onnx] +task: Question Answering +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_legal_german_cased_german_legal_squad_part_augmented_1000_pipeline` is a German model originally trained by farid1088. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_legal_german_cased_german_legal_squad_part_augmented_1000_pipeline_de_5.5.0_3.0_1725867331478.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_legal_german_cased_german_legal_squad_part_augmented_1000_pipeline_de_5.5.0_3.0_1725867331478.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_legal_german_cased_german_legal_squad_part_augmented_1000_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_legal_german_cased_german_legal_squad_part_augmented_1000_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_legal_german_cased_german_legal_squad_part_augmented_1000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|465.8 MB| + +## References + +https://huggingface.co/farid1088/RoBERTa-legal-de-cased_German_legal_SQuAD_part_augmented_1000 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-roberta_qa_model_10k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-roberta_qa_model_10k_pipeline_en.md new file mode 100644 index 00000000000000..b764279155041f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-roberta_qa_model_10k_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_qa_model_10k_pipeline pipeline RoBertaForQuestionAnswering from anablasi +author: John Snow Labs +name: roberta_qa_model_10k_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_qa_model_10k_pipeline` is a English model originally trained by anablasi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_qa_model_10k_pipeline_en_5.5.0_3.0_1725867032966.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_qa_model_10k_pipeline_en_5.5.0_3.0_1725867032966.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_qa_model_10k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_qa_model_10k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_qa_model_10k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/anablasi/model_10k_qa + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-sentiment_analysis_benlitzen43_en.md b/docs/_posts/ahmedlone127/2024-09-09-sentiment_analysis_benlitzen43_en.md new file mode 100644 index 00000000000000..5c6dc8cde9d4de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-sentiment_analysis_benlitzen43_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_benlitzen43 DistilBertForSequenceClassification from Benlitzen43 +author: John Snow Labs +name: sentiment_analysis_benlitzen43 +date: 2024-09-09 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_benlitzen43` is a English model originally trained by Benlitzen43. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_benlitzen43_en_5.5.0_3.0_1725873353980.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_benlitzen43_en_5.5.0_3.0_1725873353980.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_benlitzen43","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_benlitzen43", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
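
The same fitted model can score a small batch of texts in one pass. A sketch under the assumption that `spark` and `pipelineModel` from the example above are available (the sentences are illustrative only):

```python
import pyspark.sql.functions as F

texts = ["I love spark-nlp", "This release is disappointing"]
batch = spark.createDataFrame([[t] for t in texts]).toDF("text")

predictions = pipelineModel.transform(batch)
predictions.select(
    F.col("text"),
    F.col("class.result").alias("sentiment")
).show(truncate=False)
```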
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_benlitzen43| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Benlitzen43/Sentiment-Analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-sentiment_analysis_twitter_roberta_fine_tune_hashtag_removed_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-sentiment_analysis_twitter_roberta_fine_tune_hashtag_removed_pipeline_en.md new file mode 100644 index 00000000000000..19d4c7f79983a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-sentiment_analysis_twitter_roberta_fine_tune_hashtag_removed_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_twitter_roberta_fine_tune_hashtag_removed_pipeline pipeline RoBertaForSequenceClassification from technocrat3128 +author: John Snow Labs +name: sentiment_analysis_twitter_roberta_fine_tune_hashtag_removed_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_twitter_roberta_fine_tune_hashtag_removed_pipeline` is a English model originally trained by technocrat3128. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_twitter_roberta_fine_tune_hashtag_removed_pipeline_en_5.5.0_3.0_1725903862853.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_twitter_roberta_fine_tune_hashtag_removed_pipeline_en_5.5.0_3.0_1725903862853.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_analysis_twitter_roberta_fine_tune_hashtag_removed_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_analysis_twitter_roberta_fine_tune_hashtag_removed_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_twitter_roberta_fine_tune_hashtag_removed_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/technocrat3128/sentiment_analysis_Twitter_roberta_fine_tune_hashtag_removed + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-tatoeba_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-tatoeba_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..b7c3ceabaabe02 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-tatoeba_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tatoeba_finetuned_pipeline pipeline MarianTransformer from muibk +author: John Snow Labs +name: tatoeba_finetuned_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tatoeba_finetuned_pipeline` is a English model originally trained by muibk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tatoeba_finetuned_pipeline_en_5.5.0_3.0_1725865058808.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tatoeba_finetuned_pipeline_en_5.5.0_3.0_1725865058808.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tatoeba_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tatoeba_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tatoeba_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|538.1 MB| + +## References + +https://huggingface.co/muibk/tatoeba_finetuned + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-tiny_bert_0102_6500_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-tiny_bert_0102_6500_pipeline_en.md new file mode 100644 index 00000000000000..aab897e219d12f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-tiny_bert_0102_6500_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tiny_bert_0102_6500_pipeline pipeline AlbertForSequenceClassification from gg-ai +author: John Snow Labs +name: tiny_bert_0102_6500_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_bert_0102_6500_pipeline` is a English model originally trained by gg-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_bert_0102_6500_pipeline_en_5.5.0_3.0_1725923814558.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_bert_0102_6500_pipeline_en_5.5.0_3.0_1725923814558.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tiny_bert_0102_6500_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tiny_bert_0102_6500_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_bert_0102_6500_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|20.5 MB| + +## References + +https://huggingface.co/gg-ai/tiny-bert-0102-6500 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-transformer_classification_test_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-transformer_classification_test_v2_pipeline_en.md new file mode 100644 index 00000000000000..089dea4083c784 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-transformer_classification_test_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English transformer_classification_test_v2_pipeline pipeline RoBertaForSequenceClassification from rd-1 +author: John Snow Labs +name: transformer_classification_test_v2_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`transformer_classification_test_v2_pipeline` is a English model originally trained by rd-1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/transformer_classification_test_v2_pipeline_en_5.5.0_3.0_1725911730253.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/transformer_classification_test_v2_pipeline_en_5.5.0_3.0_1725911730253.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("transformer_classification_test_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("transformer_classification_test_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|transformer_classification_test_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.4 MB| + +## References + +https://huggingface.co/rd-1/transformer_classification_test_v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-translate_model_v3_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-translate_model_v3_2_pipeline_en.md new file mode 100644 index 00000000000000..11aa91d02f262b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-translate_model_v3_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English translate_model_v3_2_pipeline pipeline MarianTransformer from gshields +author: John Snow Labs +name: translate_model_v3_2_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`translate_model_v3_2_pipeline` is a English model originally trained by gshields. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/translate_model_v3_2_pipeline_en_5.5.0_3.0_1725891557120.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/translate_model_v3_2_pipeline_en_5.5.0_3.0_1725891557120.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("translate_model_v3_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("translate_model_v3_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|translate_model_v3_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|523.5 MB| + +## References + +https://huggingface.co/gshields/translate_model_v3.2 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_all_hcy5561_en.md b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_all_hcy5561_en.md new file mode 100644 index 00000000000000..33f42a03651013 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_all_hcy5561_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_hcy5561 XlmRoBertaForTokenClassification from hcy5561 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_hcy5561 +date: 2024-09-09 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_hcy5561` is a English model originally trained by hcy5561. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_hcy5561_en_5.5.0_3.0_1725923428686.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_hcy5561_en_5.5.0_3.0_1725923428686.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_hcy5561","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_hcy5561", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
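
The `ner` column produced above holds Spark NLP annotations rather than plain strings. A minimal sketch for reading off the token-level predictions, assuming the column names used in the example:

```python
# Pair each token with its predicted tag by selecting the annotations' result fields.
pipelineDF.selectExpr("token.result as tokens", "ner.result as ner_tags") \
    .show(truncate = False)
```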
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_hcy5561| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/hcy5561/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_german_solvaysphere_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_german_solvaysphere_pipeline_en.md new file mode 100644 index 00000000000000..07a16155b900c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_german_solvaysphere_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_solvaysphere_pipeline pipeline XlmRoBertaForTokenClassification from solvaysphere +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_solvaysphere_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_solvaysphere_pipeline` is a English model originally trained by solvaysphere. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_solvaysphere_pipeline_en_5.5.0_3.0_1725922273945.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_solvaysphere_pipeline_en_5.5.0_3.0_1725922273945.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_solvaysphere_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_solvaysphere_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_solvaysphere_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/solvaysphere/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_italian_wooseok0303_en.md b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_italian_wooseok0303_en.md new file mode 100644 index 00000000000000..0e151477881c30 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_italian_wooseok0303_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_wooseok0303 XlmRoBertaForTokenClassification from wooseok0303 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_wooseok0303 +date: 2024-09-09 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_wooseok0303` is a English model originally trained by wooseok0303. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_wooseok0303_en_5.5.0_3.0_1725922910854.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_wooseok0303_en_5.5.0_3.0_1725922910854.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_wooseok0303","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_wooseok0303", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_wooseok0303| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|816.7 MB| + +## References + +https://huggingface.co/wooseok0303/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-xlmr_sinhalese_english_train_shuffled_1986_test2000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-xlmr_sinhalese_english_train_shuffled_1986_test2000_pipeline_en.md new file mode 100644 index 00000000000000..2dc31f231c283c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-xlmr_sinhalese_english_train_shuffled_1986_test2000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_sinhalese_english_train_shuffled_1986_test2000_pipeline pipeline XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_sinhalese_english_train_shuffled_1986_test2000_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_sinhalese_english_train_shuffled_1986_test2000_pipeline` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_sinhalese_english_train_shuffled_1986_test2000_pipeline_en_5.5.0_3.0_1725906750068.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_sinhalese_english_train_shuffled_1986_test2000_pipeline_en_5.5.0_3.0_1725906750068.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmr_sinhalese_english_train_shuffled_1986_test2000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmr_sinhalese_english_train_shuffled_1986_test2000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_sinhalese_english_train_shuffled_1986_test2000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/patpizio/xlmr-si-en-train_shuffled-1986-test2000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-16_shot_twitter_2classes_nepal_bhasa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-16_shot_twitter_2classes_nepal_bhasa_pipeline_en.md new file mode 100644 index 00000000000000..36770a8a9d3da0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-16_shot_twitter_2classes_nepal_bhasa_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English 16_shot_twitter_2classes_nepal_bhasa_pipeline pipeline MPNetEmbeddings from Nhat1904 +author: John Snow Labs +name: 16_shot_twitter_2classes_nepal_bhasa_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`16_shot_twitter_2classes_nepal_bhasa_pipeline` is a English model originally trained by Nhat1904. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/16_shot_twitter_2classes_nepal_bhasa_pipeline_en_5.5.0_3.0_1725935918225.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/16_shot_twitter_2classes_nepal_bhasa_pipeline_en_5.5.0_3.0_1725935918225.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("16_shot_twitter_2classes_nepal_bhasa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("16_shot_twitter_2classes_nepal_bhasa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|16_shot_twitter_2classes_nepal_bhasa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/Nhat1904/16-shot-twitter-2classes-new + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-albert_model__29_4_en.md b/docs/_posts/ahmedlone127/2024-09-10-albert_model__29_4_en.md new file mode 100644 index 00000000000000..2984b92630281f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-albert_model__29_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English albert_model__29_4 DistilBertForSequenceClassification from KalaiselvanD +author: John Snow Labs +name: albert_model__29_4 +date: 2024-09-10 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_model__29_4` is a English model originally trained by KalaiselvanD. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_model__29_4_en_5.5.0_3.0_1726009637645.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_model__29_4_en_5.5.0_3.0_1726009637645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("albert_model__29_4","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("albert_model__29_4", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
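
The predicted label for each row ends up in the `class` annotation column; its `result` field can be selected directly. A minimal sketch, assuming the column names defined above:

```python
# Show each input text next to its predicted class label.
pipelineDF.select("text", "class.result").show(truncate = False)
```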
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_model__29_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KalaiselvanD/albert_model__29_4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-all_mpnet_lr1e_8_margin_1_bosnian_32_en.md b/docs/_posts/ahmedlone127/2024-09-10-all_mpnet_lr1e_8_margin_1_bosnian_32_en.md new file mode 100644 index 00000000000000..6d323e90bd1004 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-all_mpnet_lr1e_8_margin_1_bosnian_32_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English all_mpnet_lr1e_8_margin_1_bosnian_32 MPNetEmbeddings from luiz-and-robert-thesis +author: John Snow Labs +name: all_mpnet_lr1e_8_margin_1_bosnian_32 +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_mpnet_lr1e_8_margin_1_bosnian_32` is a English model originally trained by luiz-and-robert-thesis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_mpnet_lr1e_8_margin_1_bosnian_32_en_5.5.0_3.0_1725978209257.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_mpnet_lr1e_8_margin_1_bosnian_32_en_5.5.0_3.0_1725978209257.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("all_mpnet_lr1e_8_margin_1_bosnian_32","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("all_mpnet_lr1e_8_margin_1_bosnian_32","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
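
The `embeddings` column above holds one annotation per input document, with the sentence vector stored in the annotation's `embeddings` field. A minimal sketch for extracting the raw vectors, assuming the column names from the example:

```python
# Explode the annotations and keep only the embedding vectors.
pipelineDF.selectExpr("explode(embeddings) as annotation") \
    .selectExpr("annotation.embeddings as vector") \
    .show()
```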
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_mpnet_lr1e_8_margin_1_bosnian_32| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/luiz-and-robert-thesis/all-mpnet-lr1e-8-margin-1-bs-32 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-bert_classifier_llmhum_en.md b/docs/_posts/ahmedlone127/2024-09-10-bert_classifier_llmhum_en.md new file mode 100644 index 00000000000000..33e893c29dc4b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-bert_classifier_llmhum_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_classifier_llmhum MPNetEmbeddings from vmmalvarez +author: John Snow Labs +name: bert_classifier_llmhum +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_classifier_llmhum` is a English model originally trained by vmmalvarez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_classifier_llmhum_en_5.5.0_3.0_1725936039927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_classifier_llmhum_en_5.5.0_3.0_1725936039927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("bert_classifier_llmhum","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("bert_classifier_llmhum","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_classifier_llmhum| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/vmmalvarez/bert_classifier_llmhum \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-bert_gemma_strongoversight_vllm_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-bert_gemma_strongoversight_vllm_0_pipeline_en.md new file mode 100644 index 00000000000000..30180a31d08c55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-bert_gemma_strongoversight_vllm_0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_gemma_strongoversight_vllm_0_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_gemma_strongoversight_vllm_0_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_gemma_strongoversight_vllm_0_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_gemma_strongoversight_vllm_0_pipeline_en_5.5.0_3.0_1726009691198.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_gemma_strongoversight_vllm_0_pipeline_en_5.5.0_3.0_1726009691198.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_gemma_strongoversight_vllm_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_gemma_strongoversight_vllm_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_gemma_strongoversight_vllm_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_gemma-strongOversight-vllm_0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-burmese_awesome_qa_model_rouhan_en.md b/docs/_posts/ahmedlone127/2024-09-10-burmese_awesome_qa_model_rouhan_en.md new file mode 100644 index 00000000000000..1004ef2e7905eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-burmese_awesome_qa_model_rouhan_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_rouhan DistilBertForQuestionAnswering from Rouhan +author: John Snow Labs +name: burmese_awesome_qa_model_rouhan +date: 2024-09-10 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_rouhan` is a English model originally trained by Rouhan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_rouhan_en_5.5.0_3.0_1725979724005.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_rouhan_en_5.5.0_3.0_1725979724005.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_rouhan","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_rouhan", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
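
The extracted span is returned in the `answer` annotation column, with the answer text in its `result` field. A minimal sketch, assuming the column names defined above:

```python
# Display each question and context next to the extracted answer text.
pipelineDF.select("question", "context", "answer.result").show(truncate = False)
```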
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_rouhan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Rouhan/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-burmese_nepal_bhasa_model_en.md b/docs/_posts/ahmedlone127/2024-09-10-burmese_nepal_bhasa_model_en.md new file mode 100644 index 00000000000000..e3f20ef3e91028 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-burmese_nepal_bhasa_model_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English burmese_nepal_bhasa_model DistilBertForSequenceClassification from CohleM +author: John Snow Labs +name: burmese_nepal_bhasa_model +date: 2024-09-10 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_nepal_bhasa_model` is a English model originally trained by CohleM. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_nepal_bhasa_model_en_5.5.0_3.0_1725936036718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_nepal_bhasa_model_en_5.5.0_3.0_1725936036718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer()\
    .setInputCols("document")\
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_nepal_bhasa_model","en")\
    .setInputCols(["document","token"])\
    .setOutputCol("class")

pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier])

data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")

result = pipeline.fit(data).transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_nepal_bhasa_model","en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, sequenceClassifier))

val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_nepal_bhasa_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|406.8 MB| + +## References + +References + +https://huggingface.co/CohleM/my_new_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-cot_ep3_35_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-cot_ep3_35_pipeline_en.md new file mode 100644 index 00000000000000..27f5b5b86a3bbb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-cot_ep3_35_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English cot_ep3_35_pipeline pipeline MPNetEmbeddings from ingeol +author: John Snow Labs +name: cot_ep3_35_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cot_ep3_35_pipeline` is a English model originally trained by ingeol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cot_ep3_35_pipeline_en_5.5.0_3.0_1725963866603.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cot_ep3_35_pipeline_en_5.5.0_3.0_1725963866603.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cot_ep3_35_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cot_ep3_35_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cot_ep3_35_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/ingeol/cot_ep3_35 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-cuad_distil_document_name_cased_08_31_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-cuad_distil_document_name_cased_08_31_v1_pipeline_en.md new file mode 100644 index 00000000000000..10edc7c0f8de82 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-cuad_distil_document_name_cased_08_31_v1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English cuad_distil_document_name_cased_08_31_v1_pipeline pipeline DistilBertForQuestionAnswering from saraks +author: John Snow Labs +name: cuad_distil_document_name_cased_08_31_v1_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cuad_distil_document_name_cased_08_31_v1_pipeline` is a English model originally trained by saraks. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cuad_distil_document_name_cased_08_31_v1_pipeline_en_5.5.0_3.0_1725960078109.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cuad_distil_document_name_cased_08_31_v1_pipeline_en_5.5.0_3.0_1725960078109.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cuad_distil_document_name_cased_08_31_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cuad_distil_document_name_cased_08_31_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cuad_distil_document_name_cased_08_31_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/saraks/cuad-distil-document_name-cased-08-31-v1 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-dataset_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-dataset_pipeline_en.md new file mode 100644 index 00000000000000..e2fb50d6c3d04f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-dataset_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English dataset_pipeline pipeline DistilBertForQuestionAnswering from ajaydvrj +author: John Snow Labs +name: dataset_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dataset_pipeline` is a English model originally trained by ajaydvrj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dataset_pipeline_en_5.5.0_3.0_1725932252340.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dataset_pipeline_en_5.5.0_3.0_1725932252340.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dataset_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dataset_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dataset_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/ajaydvrj/dataset + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-deproberta_v5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-deproberta_v5_pipeline_en.md new file mode 100644 index 00000000000000..f98dc4d51dae56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-deproberta_v5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English deproberta_v5_pipeline pipeline RoBertaForSequenceClassification from ericNguyen0132 +author: John Snow Labs +name: deproberta_v5_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deproberta_v5_pipeline` is a English model originally trained by ericNguyen0132. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deproberta_v5_pipeline_en_5.5.0_3.0_1725971874035.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deproberta_v5_pipeline_en_5.5.0_3.0_1725971874035.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deproberta_v5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deproberta_v5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deproberta_v5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ericNguyen0132/DepRoBERTa-v5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_detect_ai_generated_text_luciayn_en.md b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_detect_ai_generated_text_luciayn_en.md new file mode 100644 index 00000000000000..34b7df8f153888 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_detect_ai_generated_text_luciayn_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_detect_ai_generated_text_luciayn DistilBertForSequenceClassification from luciayn +author: John Snow Labs +name: distilbert_base_uncased_detect_ai_generated_text_luciayn +date: 2024-09-10 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_detect_ai_generated_text_luciayn` is a English model originally trained by luciayn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_detect_ai_generated_text_luciayn_en_5.5.0_3.0_1726008980329.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_detect_ai_generated_text_luciayn_en_5.5.0_3.0_1726008980329.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_detect_ai_generated_text_luciayn","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_detect_ai_generated_text_luciayn", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_detect_ai_generated_text_luciayn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/luciayn/distilbert-base-uncased-detect_ai_generated_text \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_imdb_mightyvuai_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_imdb_mightyvuai_pipeline_en.md new file mode 100644 index 00000000000000..49f06754ee27b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_imdb_mightyvuai_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_imdb_mightyvuai_pipeline pipeline DistilBertEmbeddings from MightyVuAI +author: John Snow Labs +name: distilbert_base_uncased_finetuned_imdb_mightyvuai_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_imdb_mightyvuai_pipeline` is a English model originally trained by MightyVuAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_mightyvuai_pipeline_en_5.5.0_3.0_1725935213931.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_mightyvuai_pipeline_en_5.5.0_3.0_1725935213931.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_mightyvuai_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_mightyvuai_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_imdb_mightyvuai_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/MightyVuAI/distilbert-base-uncased-finetuned-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_imdb_whole_word_masking_alagrine_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_imdb_whole_word_masking_alagrine_pipeline_en.md new file mode 100644 index 00000000000000..83cb9f54ec61fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_imdb_whole_word_masking_alagrine_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_imdb_whole_word_masking_alagrine_pipeline pipeline DistilBertEmbeddings from AlaGrine +author: John Snow Labs +name: distilbert_base_uncased_finetuned_imdb_whole_word_masking_alagrine_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_imdb_whole_word_masking_alagrine_pipeline` is a English model originally trained by AlaGrine. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_whole_word_masking_alagrine_pipeline_en_5.5.0_3.0_1725995850664.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_whole_word_masking_alagrine_pipeline_en_5.5.0_3.0_1725995850664.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_whole_word_masking_alagrine_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_whole_word_masking_alagrine_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_imdb_whole_word_masking_alagrine_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/AlaGrine/distilbert-base-uncased-finetuned-imdb-whole-word-masking + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_squad_snape_v_en.md b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_squad_snape_v_en.md new file mode 100644 index 00000000000000..a5f2474c008fcc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_squad_snape_v_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_snape_v DistilBertForQuestionAnswering from Snape-v +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_snape_v +date: 2024-09-10 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_snape_v` is a English model originally trained by Snape-v. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_snape_v_en_5.5.0_3.0_1725980023976.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_snape_v_en_5.5.0_3.0_1725980023976.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_snape_v","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_snape_v", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_snape_v| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Snape-v/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-distilbert_imdb_400_200_en.md b/docs/_posts/ahmedlone127/2024-09-10-distilbert_imdb_400_200_en.md new file mode 100644 index 00000000000000..74687c59092d67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-distilbert_imdb_400_200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_imdb_400_200 DistilBertForSequenceClassification from cordondata +author: John Snow Labs +name: distilbert_imdb_400_200 +date: 2024-09-10 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_400_200` is a English model originally trained by cordondata. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_400_200_en_5.5.0_3.0_1726009304024.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_400_200_en_5.5.0_3.0_1726009304024.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_400_200","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_400_200", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_400_200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cordondata/distilbert_imdb_400_200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_pipeline_en.md new file mode 100644 index 00000000000000..6d9416ec740a98 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_pipeline_en_5.5.0_3.0_1725984057950.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_pipeline_en_5.5.0_3.0_1725984057950.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
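+
+The snippet above assumes a DataFrame `df` with a `text` column already exists. A minimal end-to-end sketch under that assumption; the example sentence is arbitrary, and the `class` output column is the usual name for the included DistilBertForSequenceClassification stage (`annotations.printSchema()` shows the exact columns this pipeline produces):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# any DataFrame with a "text" column works the same way
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_pipeline", lang="en")
+annotations = pipeline.transform(df)
+annotations.select("class.result").show(truncate=False)
+```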
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|250.9 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_data_aug_mrpc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-dummy_model_safik_en.md b/docs/_posts/ahmedlone127/2024-09-10-dummy_model_safik_en.md new file mode 100644 index 00000000000000..8d378ddb27d608 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-dummy_model_safik_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dummy_model_safik CamemBertEmbeddings from safik +author: John Snow Labs +name: dummy_model_safik +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, camembert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: CamemBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CamemBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dummy_model_safik` is a English model originally trained by safik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dummy_model_safik_en_5.5.0_3.0_1725938382013.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dummy_model_safik_en_5.5.0_3.0_1725938382013.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = CamemBertEmbeddings.pretrained("dummy_model_safik","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = CamemBertEmbeddings.pretrained("dummy_model_safik","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dummy_model_safik| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[camembert]| +|Language:|en| +|Size:|264.0 MB| + +## References + +https://huggingface.co/safik/dummy-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-dummy_model_tomjam_en.md b/docs/_posts/ahmedlone127/2024-09-10-dummy_model_tomjam_en.md new file mode 100644 index 00000000000000..65e8028e6d9195 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-dummy_model_tomjam_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dummy_model_tomjam CamemBertEmbeddings from tomjam +author: John Snow Labs +name: dummy_model_tomjam +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, camembert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: CamemBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CamemBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dummy_model_tomjam` is a English model originally trained by tomjam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dummy_model_tomjam_en_5.5.0_3.0_1725938313028.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dummy_model_tomjam_en_5.5.0_3.0_1725938313028.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = CamemBertEmbeddings.pretrained("dummy_model_tomjam","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = CamemBertEmbeddings.pretrained("dummy_model_tomjam","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dummy_model_tomjam| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[camembert]| +|Language:|en| +|Size:|264.0 MB| + +## References + +https://huggingface.co/tomjam/dummy-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-facets_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-facets_5_pipeline_en.md new file mode 100644 index 00000000000000..75f99b7deaeb9f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-facets_5_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English facets_5_pipeline pipeline MPNetEmbeddings from ingeol +author: John Snow Labs +name: facets_5_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`facets_5_pipeline` is a English model originally trained by ingeol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/facets_5_pipeline_en_5.5.0_3.0_1725969975075.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/facets_5_pipeline_en_5.5.0_3.0_1725969975075.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("facets_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("facets_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|facets_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/ingeol/facets_5 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-facets_gpt_expanswer_1234_en.md b/docs/_posts/ahmedlone127/2024-09-10-facets_gpt_expanswer_1234_en.md new file mode 100644 index 00000000000000..d558a044c426ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-facets_gpt_expanswer_1234_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English facets_gpt_expanswer_1234 MPNetEmbeddings from ingeol +author: John Snow Labs +name: facets_gpt_expanswer_1234 +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`facets_gpt_expanswer_1234` is a English model originally trained by ingeol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/facets_gpt_expanswer_1234_en_5.5.0_3.0_1725996947387.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/facets_gpt_expanswer_1234_en_5.5.0_3.0_1725996947387.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("facets_gpt_expanswer_1234","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("facets_gpt_expanswer_1234","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
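+
+The `embeddings` column produced above holds one sentence-level vector per input document. A minimal sketch, reusing `pipelineDF` from the Python example, to pull the raw vectors out for downstream use such as similarity search:
+
+```python
+# one array of floats per input text; size() confirms the embedding dimensionality
+vectors = pipelineDF.selectExpr("explode(embeddings.embeddings) as vector")
+vectors.selectExpr("size(vector) as dimension").show()
+```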
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|facets_gpt_expanswer_1234| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/ingeol/facets_gpt_expanswer_1234 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_en.md b/docs/_posts/ahmedlone127/2024-09-10-finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_en.md new file mode 100644 index 00000000000000..4b7026785a272a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds RoBertaEmbeddings from manucos +author: John Snow Labs +name: finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds` is a English model originally trained by manucos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_en_5.5.0_3.0_1725937249296.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_en_5.5.0_3.0_1725937249296.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|469.7 MB| + +## References + +https://huggingface.co/manucos/finetuned__roberta-clinical-wl-es__augmented-ultrasounds \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-gsm_finetunned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-gsm_finetunned_pipeline_en.md new file mode 100644 index 00000000000000..68f75ea481defa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-gsm_finetunned_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English gsm_finetunned_pipeline pipeline MPNetEmbeddings from anomys +author: John Snow Labs +name: gsm_finetunned_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gsm_finetunned_pipeline` is a English model originally trained by anomys. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gsm_finetunned_pipeline_en_5.5.0_3.0_1725978385500.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gsm_finetunned_pipeline_en_5.5.0_3.0_1725978385500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("gsm_finetunned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("gsm_finetunned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gsm_finetunned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.7 MB| + +## References + +https://huggingface.co/anomys/gsm-finetunned + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-llama_amazbooks_mpnet_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-llama_amazbooks_mpnet_pipeline_en.md new file mode 100644 index 00000000000000..fa4c7eb14c434d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-llama_amazbooks_mpnet_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English llama_amazbooks_mpnet_pipeline pipeline MPNetEmbeddings from beeformer +author: John Snow Labs +name: llama_amazbooks_mpnet_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llama_amazbooks_mpnet_pipeline` is a English model originally trained by beeformer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llama_amazbooks_mpnet_pipeline_en_5.5.0_3.0_1725963715402.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llama_amazbooks_mpnet_pipeline_en_5.5.0_3.0_1725963715402.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("llama_amazbooks_mpnet_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("llama_amazbooks_mpnet_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llama_amazbooks_mpnet_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/beeformer/Llama-amazbooks-mpnet + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-marian_dyula_french_translator_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-marian_dyula_french_translator_pipeline_en.md new file mode 100644 index 00000000000000..257b2efe7ba56e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-marian_dyula_french_translator_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_dyula_french_translator_pipeline pipeline MarianTransformer from Kimmy7 +author: John Snow Labs +name: marian_dyula_french_translator_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_dyula_french_translator_pipeline` is a English model originally trained by Kimmy7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_dyula_french_translator_pipeline_en_5.5.0_3.0_1726002415162.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_dyula_french_translator_pipeline_en_5.5.0_3.0_1726002415162.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_dyula_french_translator_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_dyula_french_translator_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
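+
+Because this pipeline bundles a SentenceDetectorDLModel and a MarianTransformer, each row of `df` is split into sentences and translated sentence by sentence. A minimal sketch of running it; the placeholder input and the `translation` output column name are assumptions, so check `annotations.printSchema()` for the column the transformer stage actually produces:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["<Dyula sentence to translate>"]]).toDF("text")
+pipeline = PretrainedPipeline("marian_dyula_french_translator_pipeline", lang="en")
+annotations = pipeline.transform(df)
+annotations.printSchema()  # confirm the translation column name before selecting it
+annotations.select("translation.result").show(truncate=False)
+```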
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_dyula_french_translator_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|533.2 MB| + +## References + +https://huggingface.co/Kimmy7/marian_dyula_french_translator + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-muril_base_cased_hate_speech_ben_hin_bn.md b/docs/_posts/ahmedlone127/2024-09-10-muril_base_cased_hate_speech_ben_hin_bn.md new file mode 100644 index 00000000000000..7a5999a5d760ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-muril_base_cased_hate_speech_ben_hin_bn.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Bengali muril_base_cased_hate_speech_ben_hin BertForSequenceClassification from abirmondalind +author: John Snow Labs +name: muril_base_cased_hate_speech_ben_hin +date: 2024-09-10 +tags: [bn, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`muril_base_cased_hate_speech_ben_hin` is a Bengali model originally trained by abirmondalind. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/muril_base_cased_hate_speech_ben_hin_bn_5.5.0_3.0_1725999628248.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/muril_base_cased_hate_speech_ben_hin_bn_5.5.0_3.0_1725999628248.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# the classifier reads the columns produced above, so the names must match exactly
+sequenceClassifier = BertForSequenceClassification.pretrained("muril_base_cased_hate_speech_ben_hin","bn") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("muril_base_cased_hate_speech_ben_hin", "bn")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|muril_base_cased_hate_speech_ben_hin| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|bn| +|Size:|892.7 MB| + +## References + +https://huggingface.co/abirmondalind/muril-base-cased-hate-speech-ben-hin \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-ner_column_bert_base_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-ner_column_bert_base_ner_pipeline_en.md new file mode 100644 index 00000000000000..23661b9a800d1a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-ner_column_bert_base_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ner_column_bert_base_ner_pipeline pipeline BertForTokenClassification from almaghrabima +author: John Snow Labs +name: ner_column_bert_base_ner_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_column_bert_base_ner_pipeline` is a English model originally trained by almaghrabima. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_column_bert_base_ner_pipeline_en_5.5.0_3.0_1725934736379.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_column_bert_base_ner_pipeline_en_5.5.0_3.0_1725934736379.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ner_column_bert_base_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ner_column_bert_base_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_column_bert_base_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.8 MB| + +## References + +https://huggingface.co/almaghrabima/ner_column_bert-base-NER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-ner_hindi_bert_en.md b/docs/_posts/ahmedlone127/2024-09-10-ner_hindi_bert_en.md new file mode 100644 index 00000000000000..62c1400b4111ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-ner_hindi_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_hindi_bert BertForTokenClassification from lakshaywadhwa1993 +author: John Snow Labs +name: ner_hindi_bert +date: 2024-09-10 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_hindi_bert` is a English model originally trained by lakshaywadhwa1993. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_hindi_bert_en_5.5.0_3.0_1725955679986.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_hindi_bert_en_5.5.0_3.0_1725955679986.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# the token classifier reads the columns produced above, so the names must match exactly
+tokenClassifier = BertForTokenClassification.pretrained("ner_hindi_bert","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("ner_hindi_bert", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
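+
+To see which tag the model assigned to each token, the two annotation columns produced above can be read side by side. A minimal sketch reusing `pipelineDF` from the Python example:
+
+```python
+# token.result holds the tokens and ner.result the predicted IOB tags, in the same order
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```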
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_hindi_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/lakshaywadhwa1993/ner_hindi_bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-ner_hindi_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-ner_hindi_bert_pipeline_en.md new file mode 100644 index 00000000000000..7ae4b5ccdd6622 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-ner_hindi_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ner_hindi_bert_pipeline pipeline BertForTokenClassification from lakshaywadhwa1993 +author: John Snow Labs +name: ner_hindi_bert_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_hindi_bert_pipeline` is a English model originally trained by lakshaywadhwa1993. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_hindi_bert_pipeline_en_5.5.0_3.0_1725955710391.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_hindi_bert_pipeline_en_5.5.0_3.0_1725955710391.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ner_hindi_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ner_hindi_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_hindi_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/lakshaywadhwa1993/ner_hindi_bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-nordic_roberta_wiki_pipeline_sv.md b/docs/_posts/ahmedlone127/2024-09-10-nordic_roberta_wiki_pipeline_sv.md new file mode 100644 index 00000000000000..4c4dc61cff5e1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-nordic_roberta_wiki_pipeline_sv.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Swedish nordic_roberta_wiki_pipeline pipeline RoBertaEmbeddings from flax-community +author: John Snow Labs +name: nordic_roberta_wiki_pipeline +date: 2024-09-10 +tags: [sv, open_source, pipeline, onnx] +task: Embeddings +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nordic_roberta_wiki_pipeline` is a Swedish model originally trained by flax-community. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nordic_roberta_wiki_pipeline_sv_5.5.0_3.0_1726005604514.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nordic_roberta_wiki_pipeline_sv_5.5.0_3.0_1726005604514.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nordic_roberta_wiki_pipeline", lang = "sv") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nordic_roberta_wiki_pipeline", lang = "sv") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nordic_roberta_wiki_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sv| +|Size:|465.6 MB| + +## References + +https://huggingface.co/flax-community/nordic-roberta-wiki + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-output_en.md b/docs/_posts/ahmedlone127/2024-09-10-output_en.md new file mode 100644 index 00000000000000..59c6774f200871 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-output_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English output DistilBertEmbeddings from soyisauce +author: John Snow Labs +name: output +date: 2024-09-10 +tags: [distilbert, en, open_source, fill_mask, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`output` is a English model originally trained by soyisauce. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/output_en_5.5.0_3.0_1725980309782.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/output_en_5.5.0_3.0_1725980309782.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+document_assembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+# DistilBertEmbeddings produces one vector per token, so a Tokenizer stage is required
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+embeddings = DistilBertEmbeddings.pretrained("output","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([document_assembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val embeddings = DistilBertEmbeddings.pretrained("output", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
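+
+Each token in the `embeddings` column carries its own vector. A minimal sketch, reusing `pipelineDF` from the Python example, to confirm the tokens and the embedding dimensionality:
+
+```python
+# one row per token: the token text and the length of its embedding vector
+pipelineDF.selectExpr("explode(embeddings) as emb") \
+    .selectExpr("emb.result as token", "size(emb.embeddings) as dimension") \
+    .show(truncate=False)
+```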
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|output| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +References + +References + +https://huggingface.co/soyisauce/output \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-output_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-output_pipeline_en.md new file mode 100644 index 00000000000000..106cf7fbf825e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-output_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English output_pipeline pipeline DistilBertForQuestionAnswering from AparnaGayathri +author: John Snow Labs +name: output_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`output_pipeline` is a English model originally trained by AparnaGayathri. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/output_pipeline_en_5.5.0_3.0_1725980321343.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/output_pipeline_en_5.5.0_3.0_1725980321343.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("output_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("output_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
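+
+Unlike the other pipelines on this page, this one starts with a MultiDocumentAssembler, so it expects a question column and a context column rather than a single `text` column. A minimal sketch under that assumption; the raw column names below are guesses, so inspect the pipeline's first stage to see which columns its assembler actually expects:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("output_pipeline", lang="en")
+print(pipeline.model.stages)  # inspect the assembler to confirm its input columns
+
+# hypothetical question/context pair and column names
+df = spark.createDataFrame(
+    [["What is Spark NLP?", "Spark NLP is an open-source NLP library built on Apache Spark."]]
+).toDF("question", "context")
+annotations = pipeline.transform(df)
+annotations.select("answer.result").show(truncate=False)
+```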
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|output_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/AparnaGayathri/output + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-passage_ranker_v1_xs_english_en.md b/docs/_posts/ahmedlone127/2024-09-10-passage_ranker_v1_xs_english_en.md new file mode 100644 index 00000000000000..71260a5355e218 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-passage_ranker_v1_xs_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English passage_ranker_v1_xs_english BertForSequenceClassification from sinequa +author: John Snow Labs +name: passage_ranker_v1_xs_english +date: 2024-09-10 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`passage_ranker_v1_xs_english` is a English model originally trained by sinequa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/passage_ranker_v1_xs_english_en_5.5.0_3.0_1725999696534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/passage_ranker_v1_xs_english_en_5.5.0_3.0_1725999696534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# the classifier reads the columns produced above, so the names must match exactly
+sequenceClassifier = BertForSequenceClassification.pretrained("passage_ranker_v1_xs_english","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("passage_ranker_v1_xs_english", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|passage_ranker_v1_xs_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|42.1 MB| + +## References + +https://huggingface.co/sinequa/passage-ranker-v1-XS-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-q_only_ep3_42_en.md b/docs/_posts/ahmedlone127/2024-09-10-q_only_ep3_42_en.md new file mode 100644 index 00000000000000..423f91bcd295ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-q_only_ep3_42_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English q_only_ep3_42 MPNetEmbeddings from ingeol +author: John Snow Labs +name: q_only_ep3_42 +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`q_only_ep3_42` is a English model originally trained by ingeol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/q_only_ep3_42_en_5.5.0_3.0_1725969718590.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/q_only_ep3_42_en_5.5.0_3.0_1725969718590.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("q_only_ep3_42","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("q_only_ep3_42","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|q_only_ep3_42| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/ingeol/q_only_ep3_42 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_pipeline_en.md new file mode 100644 index 00000000000000..63fedb8bdc0056 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_pipeline pipeline MPNetEmbeddings from shrinivasbjoshi +author: John Snow Labs +name: r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_pipeline` is a English model originally trained by shrinivasbjoshi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_pipeline_en_5.5.0_3.0_1725963430814.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_pipeline_en_5.5.0_3.0_1725963430814.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/shrinivasbjoshi/r2-w266-setfit-mbti-multiclass-hypsearch-mpnet-nov30 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-refpydst_10p_referredstates_split_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-refpydst_10p_referredstates_split_v1_pipeline_en.md new file mode 100644 index 00000000000000..61e4424c7e2c6a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-refpydst_10p_referredstates_split_v1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English refpydst_10p_referredstates_split_v1_pipeline pipeline MPNetEmbeddings from Brendan +author: John Snow Labs +name: refpydst_10p_referredstates_split_v1_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`refpydst_10p_referredstates_split_v1_pipeline` is a English model originally trained by Brendan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/refpydst_10p_referredstates_split_v1_pipeline_en_5.5.0_3.0_1725935927689.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/refpydst_10p_referredstates_split_v1_pipeline_en_5.5.0_3.0_1725935927689.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("refpydst_10p_referredstates_split_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("refpydst_10p_referredstates_split_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|refpydst_10p_referredstates_split_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/Brendan/refpydst-10p-referredstates-split-v1 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-retrieval_mpnet_dot_finetuned_llama3_synthetic_dataset_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-retrieval_mpnet_dot_finetuned_llama3_synthetic_dataset_pipeline_en.md new file mode 100644 index 00000000000000..728aa497609c66 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-retrieval_mpnet_dot_finetuned_llama3_synthetic_dataset_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English retrieval_mpnet_dot_finetuned_llama3_synthetic_dataset_pipeline pipeline MPNetEmbeddings from antonkirk +author: John Snow Labs +name: retrieval_mpnet_dot_finetuned_llama3_synthetic_dataset_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`retrieval_mpnet_dot_finetuned_llama3_synthetic_dataset_pipeline` is a English model originally trained by antonkirk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/retrieval_mpnet_dot_finetuned_llama3_synthetic_dataset_pipeline_en_5.5.0_3.0_1725936451753.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/retrieval_mpnet_dot_finetuned_llama3_synthetic_dataset_pipeline_en_5.5.0_3.0_1725936451753.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("retrieval_mpnet_dot_finetuned_llama3_synthetic_dataset_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("retrieval_mpnet_dot_finetuned_llama3_synthetic_dataset_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|retrieval_mpnet_dot_finetuned_llama3_synthetic_dataset_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.0 MB| + +## References + +https://huggingface.co/antonkirk/retrieval-mpnet-dot-finetuned-llama3-synthetic-dataset + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-roberta_base_roberta_model_enyonam_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-roberta_base_roberta_model_enyonam_pipeline_en.md new file mode 100644 index 00000000000000..65779bf4677f51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-roberta_base_roberta_model_enyonam_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_roberta_model_enyonam_pipeline pipeline RoBertaForSequenceClassification from Enyonam +author: John Snow Labs +name: roberta_base_roberta_model_enyonam_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_roberta_model_enyonam_pipeline` is a English model originally trained by Enyonam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_roberta_model_enyonam_pipeline_en_5.5.0_3.0_1725962633171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_roberta_model_enyonam_pipeline_en_5.5.0_3.0_1725962633171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_roberta_model_enyonam_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_roberta_model_enyonam_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_roberta_model_enyonam_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|424.6 MB| + +## References + +https://huggingface.co/Enyonam/roberta-base-Roberta-Model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-roberta_tagalog_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-roberta_tagalog_base_pipeline_en.md new file mode 100644 index 00000000000000..b0dacda2c05c24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-roberta_tagalog_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_tagalog_base_pipeline pipeline RoBertaEmbeddings from GKLMIP +author: John Snow Labs +name: roberta_tagalog_base_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tagalog_base_pipeline` is a English model originally trained by GKLMIP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_pipeline_en_5.5.0_3.0_1725937753524.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_pipeline_en_5.5.0_3.0_1725937753524.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_tagalog_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_tagalog_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tagalog_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|467.7 MB| + +## References + +https://huggingface.co/GKLMIP/roberta-tagalog-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-rulebert_v0_3_k3_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-10-rulebert_v0_3_k3_pipeline_it.md new file mode 100644 index 00000000000000..f25fcdfa1d3cc8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-rulebert_v0_3_k3_pipeline_it.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Italian rulebert_v0_3_k3_pipeline pipeline XlmRoBertaForSequenceClassification from ribesstefano +author: John Snow Labs +name: rulebert_v0_3_k3_pipeline +date: 2024-09-10 +tags: [it, open_source, pipeline, onnx] +task: Text Classification +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rulebert_v0_3_k3_pipeline` is a Italian model originally trained by ribesstefano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rulebert_v0_3_k3_pipeline_it_5.5.0_3.0_1726003953022.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rulebert_v0_3_k3_pipeline_it_5.5.0_3.0_1726003953022.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rulebert_v0_3_k3_pipeline", lang = "it") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rulebert_v0_3_k3_pipeline", lang = "it") +val annotations = pipeline.transform(df) + +``` +
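+
+Beyond `transform`, `fullAnnotate` returns the complete annotation objects, so any metadata attached by the classifier (for example scores) stays accessible. A small sketch with an Italian input sentence, assuming a running Spark NLP session; the metadata fields depend on the underlying annotator.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+sparknlp.start()
+pipeline = PretrainedPipeline("rulebert_v0_3_k3_pipeline", lang="it")
+
+# fullAnnotate keeps the Annotation objects together with their metadata
+results = pipeline.fullAnnotate("Questo articolo disciplina gli obblighi delle parti.")
+for key, annotations in results[0].items():
+    print(key, [(a.result, a.metadata) for a in annotations])
+```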
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rulebert_v0_3_k3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|870.5 MB| + +## References + +https://huggingface.co/ribesstefano/RuleBert-v0.3-k3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-setfit_model_feb12_misinformation_on_convoy_en.md b/docs/_posts/ahmedlone127/2024-09-10-setfit_model_feb12_misinformation_on_convoy_en.md new file mode 100644 index 00000000000000..6f7906c8566e67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-setfit_model_feb12_misinformation_on_convoy_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English setfit_model_feb12_misinformation_on_convoy MPNetEmbeddings from mitra-mir +author: John Snow Labs +name: setfit_model_feb12_misinformation_on_convoy +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`setfit_model_feb12_misinformation_on_convoy` is a English model originally trained by mitra-mir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/setfit_model_feb12_misinformation_on_convoy_en_5.5.0_3.0_1725936463678.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/setfit_model_feb12_misinformation_on_convoy_en_5.5.0_3.0_1725936463678.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("setfit_model_feb12_misinformation_on_convoy","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("setfit_model_feb12_misinformation_on_convoy","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
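+
+The `embeddings` column produced above holds annotation structs; one way to look at the raw sentence vectors is to explode their nested `embeddings` field, reusing `pipelineDF` from the example:
+
+```python
+# Each annotation's `embeddings` field is the sentence vector
+pipelineDF.selectExpr("explode(embeddings.embeddings) as sentence_embedding") \
+    .show(truncate=80)
+```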
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|setfit_model_feb12_misinformation_on_convoy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/mitra-mir/setfit-model-Feb12-Misinformation-on-Convoy \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-setfit_model_ireland_binary_label2_epochs2_en.md b/docs/_posts/ahmedlone127/2024-09-10-setfit_model_ireland_binary_label2_epochs2_en.md new file mode 100644 index 00000000000000..52e3c7f3749a76 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-setfit_model_ireland_binary_label2_epochs2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English setfit_model_ireland_binary_label2_epochs2 MPNetEmbeddings from mitra-mir +author: John Snow Labs +name: setfit_model_ireland_binary_label2_epochs2 +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`setfit_model_ireland_binary_label2_epochs2` is a English model originally trained by mitra-mir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/setfit_model_ireland_binary_label2_epochs2_en_5.5.0_3.0_1725963692083.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/setfit_model_ireland_binary_label2_epochs2_en_5.5.0_3.0_1725963692083.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("setfit_model_ireland_binary_label2_epochs2","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("setfit_model_ireland_binary_label2_epochs2","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
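+
+If plain Spark vectors are more convenient than annotation structs, an `EmbeddingsFinisher` stage can be appended to the pipeline sketched above. This reuses `documentAssembler`, `embeddings` and `data` from the example and is only a sketch, not part of the distributed model:
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+from pyspark.ml import Pipeline
+
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+pipeline_with_finisher = Pipeline().setStages([documentAssembler, embeddings, finisher])
+result = pipeline_with_finisher.fit(data).transform(data)
+result.select("finished_embeddings").show(truncate=80)
+```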
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|setfit_model_ireland_binary_label2_epochs2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/mitra-mir/setfit_model_Ireland_binary_label2_epochs2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-setfit_mpnet_base_v2_finetuned_senteval_cree_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-setfit_mpnet_base_v2_finetuned_senteval_cree_pipeline_en.md new file mode 100644 index 00000000000000..b92374a7f80435 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-setfit_mpnet_base_v2_finetuned_senteval_cree_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English setfit_mpnet_base_v2_finetuned_senteval_cree_pipeline pipeline MPNetEmbeddings from mrm8488 +author: John Snow Labs +name: setfit_mpnet_base_v2_finetuned_senteval_cree_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`setfit_mpnet_base_v2_finetuned_senteval_cree_pipeline` is a English model originally trained by mrm8488. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/setfit_mpnet_base_v2_finetuned_senteval_cree_pipeline_en_5.5.0_3.0_1725964274890.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/setfit_mpnet_base_v2_finetuned_senteval_cree_pipeline_en_5.5.0_3.0_1725964274890.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("setfit_mpnet_base_v2_finetuned_senteval_cree_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("setfit_mpnet_base_v2_finetuned_senteval_cree_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|setfit_mpnet_base_v2_finetuned_senteval_cree_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/mrm8488/setfit-mpnet-base-v2-finetuned-sentEval-CR + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-southern_sotho_all_mpnet_finetuned_english_2000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-southern_sotho_all_mpnet_finetuned_english_2000_pipeline_en.md new file mode 100644 index 00000000000000..bcc4a8ad3cf46c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-southern_sotho_all_mpnet_finetuned_english_2000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English southern_sotho_all_mpnet_finetuned_english_2000_pipeline pipeline MPNetEmbeddings from danfeg +author: John Snow Labs +name: southern_sotho_all_mpnet_finetuned_english_2000_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`southern_sotho_all_mpnet_finetuned_english_2000_pipeline` is a English model originally trained by danfeg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/southern_sotho_all_mpnet_finetuned_english_2000_pipeline_en_5.5.0_3.0_1725936173271.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/southern_sotho_all_mpnet_finetuned_english_2000_pipeline_en_5.5.0_3.0_1725936173271.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("southern_sotho_all_mpnet_finetuned_english_2000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("southern_sotho_all_mpnet_finetuned_english_2000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|southern_sotho_all_mpnet_finetuned_english_2000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/danfeg/ST-ALL-MPNET_Finetuned-EN-2000 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-squad_clip_text_3_en.md b/docs/_posts/ahmedlone127/2024-09-10-squad_clip_text_3_en.md new file mode 100644 index 00000000000000..474523fbe88891 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-squad_clip_text_3_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English squad_clip_text_3 RoBertaForQuestionAnswering from AnonymousSub +author: John Snow Labs +name: squad_clip_text_3 +date: 2024-09-10 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`squad_clip_text_3` is a English model originally trained by AnonymousSub. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/squad_clip_text_3_en_5.5.0_3.0_1725959135554.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/squad_clip_text_3_en_5.5.0_3.0_1725959135554.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("squad_clip_text_3","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+
+# The raw columns must match the assembler's input columns
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols("question", "context")
+  .setOutputCols("document_question", "document_context")
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("squad_clip_text_3", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+
+// The raw columns must match the assembler's input columns
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
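+
+The predicted answer spans end up in the `answer` column defined above and can be read back directly from `pipelineDF`, for example:
+
+```python
+# Show only the extracted answer strings
+pipelineDF.selectExpr("explode(answer.result) as answer").show(truncate=False)
+```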
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|squad_clip_text_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/AnonymousSub/SQuAD_CLIP_text_3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-test_false_positive_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-test_false_positive_2_pipeline_en.md new file mode 100644 index 00000000000000..5760cc16ae3647 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-test_false_positive_2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English test_false_positive_2_pipeline pipeline MPNetEmbeddings from witty-works +author: John Snow Labs +name: test_false_positive_2_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_false_positive_2_pipeline` is a English model originally trained by witty-works. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_false_positive_2_pipeline_en_5.5.0_3.0_1725936743903.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_false_positive_2_pipeline_en_5.5.0_3.0_1725936743903.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_false_positive_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_false_positive_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_false_positive_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/witty-works/test_false_positive_2 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-the_doctor_asked_if_the_patient_had_any_more_questions_bert_last512_en.md b/docs/_posts/ahmedlone127/2024-09-10-the_doctor_asked_if_the_patient_had_any_more_questions_bert_last512_en.md new file mode 100644 index 00000000000000..46b1fbce53d53b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-the_doctor_asked_if_the_patient_had_any_more_questions_bert_last512_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English the_doctor_asked_if_the_patient_had_any_more_questions_bert_last512 BertForSequenceClassification from etadevosyan +author: John Snow Labs +name: the_doctor_asked_if_the_patient_had_any_more_questions_bert_last512 +date: 2024-09-10 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`the_doctor_asked_if_the_patient_had_any_more_questions_bert_last512` is a English model originally trained by etadevosyan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/the_doctor_asked_if_the_patient_had_any_more_questions_bert_last512_en_5.5.0_3.0_1725957659208.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/the_doctor_asked_if_the_patient_had_any_more_questions_bert_last512_en_5.5.0_3.0_1725957659208.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = BertForSequenceClassification.pretrained("the_doctor_asked_if_the_patient_had_any_more_questions_bert_last512","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("the_doctor_asked_if_the_patient_had_any_more_questions_bert_last512", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
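+
+Once `pipelineDF` has been computed as above, the predicted label for each input row is available in the `class` column, for example:
+
+```python
+# One predicted label per input document
+pipelineDF.select("text", "class.result").show(truncate=False)
+```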
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|the_doctor_asked_if_the_patient_had_any_more_questions_bert_last512| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|666.5 MB| + +## References + +https://huggingface.co/etadevosyan/the_doctor_asked_if_the_patient_had_any_more_questions_bert_Last512 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-tonga_tonga_islands_classifier_v0_en.md b/docs/_posts/ahmedlone127/2024-09-10-tonga_tonga_islands_classifier_v0_en.md new file mode 100644 index 00000000000000..e79904c93240b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-tonga_tonga_islands_classifier_v0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English tonga_tonga_islands_classifier_v0 MPNetEmbeddings from futuredatascience +author: John Snow Labs +name: tonga_tonga_islands_classifier_v0 +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tonga_tonga_islands_classifier_v0` is a English model originally trained by futuredatascience. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tonga_tonga_islands_classifier_v0_en_5.5.0_3.0_1725978699161.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tonga_tonga_islands_classifier_v0_en_5.5.0_3.0_1725978699161.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("tonga_tonga_islands_classifier_v0","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("tonga_tonga_islands_classifier_v0","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tonga_tonga_islands_classifier_v0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/futuredatascience/to-classifier-v0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-tweeteval_fewshot_en.md b/docs/_posts/ahmedlone127/2024-09-10-tweeteval_fewshot_en.md new file mode 100644 index 00000000000000..f24deed0bfebe8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-tweeteval_fewshot_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English tweeteval_fewshot MPNetEmbeddings from pig4431 +author: John Snow Labs +name: tweeteval_fewshot +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tweeteval_fewshot` is a English model originally trained by pig4431. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tweeteval_fewshot_en_5.5.0_3.0_1725936610583.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tweeteval_fewshot_en_5.5.0_3.0_1725936610583.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("tweeteval_fewshot","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("tweeteval_fewshot","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
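+
+Sentence embeddings such as the ones produced above are typically compared with cosine similarity. A small illustrative sketch that reuses `pipelineModel` and collects two vectors to the driver (fine for toy examples, not for large datasets):
+
+```python
+import numpy as np
+
+pairs = spark.createDataFrame([["I love spark-nlp"], ["Spark NLP is great"]]).toDF("text")
+vectors = pipelineModel.transform(pairs) \
+    .selectExpr("explode(embeddings.embeddings) as vec") \
+    .collect()
+
+a = np.array(vectors[0]["vec"])
+b = np.array(vectors[1]["vec"])
+cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
+print(f"cosine similarity: {cosine:.3f}")
+```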
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tweeteval_fewshot| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/pig4431/TweetEval_fewshot \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-vetbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-vetbert_pipeline_en.md new file mode 100644 index 00000000000000..5b7ff8afa7ce77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-vetbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English vetbert_pipeline pipeline BertEmbeddings from havocy28 +author: John Snow Labs +name: vetbert_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`vetbert_pipeline` is a English model originally trained by havocy28. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/vetbert_pipeline_en_5.5.0_3.0_1725989221341.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/vetbert_pipeline_en_5.5.0_3.0_1725989221341.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("vetbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("vetbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|vetbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|402.8 MB| + +## References + +https://huggingface.co/havocy28/VetBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-whisper_small_ukrainian_art1xgg_pipeline_uk.md b/docs/_posts/ahmedlone127/2024-09-10-whisper_small_ukrainian_art1xgg_pipeline_uk.md new file mode 100644 index 00000000000000..64b9389e9c9945 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-whisper_small_ukrainian_art1xgg_pipeline_uk.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Ukrainian whisper_small_ukrainian_art1xgg_pipeline pipeline WhisperForCTC from art1xgg +author: John Snow Labs +name: whisper_small_ukrainian_art1xgg_pipeline +date: 2024-09-10 +tags: [uk, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: uk +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_ukrainian_art1xgg_pipeline` is a Ukrainian model originally trained by art1xgg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_ukrainian_art1xgg_pipeline_uk_5.5.0_3.0_1725950825173.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_ukrainian_art1xgg_pipeline_uk_5.5.0_3.0_1725950825173.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_ukrainian_art1xgg_pipeline", lang = "uk") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_ukrainian_art1xgg_pipeline", lang = "uk") +val annotations = pipeline.transform(df) + +``` +
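+
+The `df` used above must contain raw audio samples rather than text. One way to build such a DataFrame from a local file is sketched below; it assumes the pipeline reads a column named `audio_content` (as in the AudioAssembler examples in these docs) and uses `librosa` to resample to 16 kHz mono. The file path is a placeholder.
+
+```python
+import librosa
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+
+# Load and resample a local recording to 16 kHz mono float samples
+samples, _ = librosa.load("/path/to/recording.wav", sr=16000, mono=True)
+df = spark.createDataFrame([[samples.tolist()]]).toDF("audio_content")
+
+pipeline = PretrainedPipeline("whisper_small_ukrainian_art1xgg_pipeline", lang="uk")
+result = pipeline.transform(df)
+result.printSchema()
+result.show(truncate=False)
+```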
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_ukrainian_art1xgg_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|uk| +|Size:|1.1 GB| + +## References + +https://huggingface.co/art1xgg/whisper-small-uk + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-whisper_small_yue_chinese_full_en.md b/docs/_posts/ahmedlone127/2024-09-10-whisper_small_yue_chinese_full_en.md new file mode 100644 index 00000000000000..2272dc194239f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-whisper_small_yue_chinese_full_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_yue_chinese_full WhisperForCTC from safecantonese +author: John Snow Labs +name: whisper_small_yue_chinese_full +date: 2024-09-10 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_yue_chinese_full` is a English model originally trained by safecantonese. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_yue_chinese_full_en_5.5.0_3.0_1725949325788.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_yue_chinese_full_en_5.5.0_3.0_1725949325788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_yue_chinese_full","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+# `raw_floats` is a placeholder for the audio samples (16 kHz mono floats)
+raw_floats = [0.0]
+data = spark.createDataFrame([[raw_floats]]).toDF("audio_content")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_yue_chinese_full", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+// `rawFloats` is a placeholder for the audio samples (16 kHz mono floats)
+val rawFloats = Seq(0.0f)
+val data = Seq(rawFloats).toDF("audio_content")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
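+
+After the transform above, the transcriptions can be read from the `text` output column, for example:
+
+```python
+pipelineDF.selectExpr("explode(text.result) as transcription").show(truncate=False)
+```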
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_yue_chinese_full| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/safecantonese/whisper-small-yue-full \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-whisper_small_yue_chinese_full_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-whisper_small_yue_chinese_full_pipeline_en.md new file mode 100644 index 00000000000000..8e6206e6136623 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-whisper_small_yue_chinese_full_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_yue_chinese_full_pipeline pipeline WhisperForCTC from safecantonese +author: John Snow Labs +name: whisper_small_yue_chinese_full_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_yue_chinese_full_pipeline` is a English model originally trained by safecantonese. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_yue_chinese_full_pipeline_en_5.5.0_3.0_1725949408274.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_yue_chinese_full_pipeline_en_5.5.0_3.0_1725949408274.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_yue_chinese_full_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_yue_chinese_full_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_yue_chinese_full_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/safecantonese/whisper-small-yue-full + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-whisper_tiny_papi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-whisper_tiny_papi_pipeline_en.md new file mode 100644 index 00000000000000..f1a48ddc6a43fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-whisper_tiny_papi_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_papi_pipeline pipeline WhisperForCTC from sonnygeorge +author: John Snow Labs +name: whisper_tiny_papi_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_papi_pipeline` is a English model originally trained by sonnygeorge. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_papi_pipeline_en_5.5.0_3.0_1725950497564.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_papi_pipeline_en_5.5.0_3.0_1725950497564.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_papi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_papi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_papi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/sonnygeorge/whisper-tiny-papi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-withinapps_ndd_phoenix_test_tags_cwadj_en.md b/docs/_posts/ahmedlone127/2024-09-10-withinapps_ndd_phoenix_test_tags_cwadj_en.md new file mode 100644 index 00000000000000..b225723b0ec1a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-withinapps_ndd_phoenix_test_tags_cwadj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English withinapps_ndd_phoenix_test_tags_cwadj DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_phoenix_test_tags_cwadj +date: 2024-09-10 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_phoenix_test_tags_cwadj` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_phoenix_test_tags_cwadj_en_5.5.0_3.0_1726009029510.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_phoenix_test_tags_cwadj_en_5.5.0_3.0_1726009029510.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_phoenix_test_tags_cwadj","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_phoenix_test_tags_cwadj", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
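+
+Besides the predicted label in `class.result`, classifier annotations usually carry per-label scores in their metadata; both can be inspected from the transformed DataFrame above:
+
+```python
+# result holds the winning label, metadata typically holds the class scores
+pipelineDF.select("class.result", "class.metadata").show(truncate=False)
+```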
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_phoenix_test_tags_cwadj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-phoenix_test-tags-CWAdj \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-xlm_emo_t_milanlproc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-xlm_emo_t_milanlproc_pipeline_en.md new file mode 100644 index 00000000000000..86b71a849e79e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-xlm_emo_t_milanlproc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_emo_t_milanlproc_pipeline pipeline XlmRoBertaForSequenceClassification from MilaNLProc +author: John Snow Labs +name: xlm_emo_t_milanlproc_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_emo_t_milanlproc_pipeline` is a English model originally trained by MilaNLProc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_emo_t_milanlproc_pipeline_en_5.5.0_3.0_1725967907793.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_emo_t_milanlproc_pipeline_en_5.5.0_3.0_1725967907793.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_emo_t_milanlproc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_emo_t_milanlproc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_emo_t_milanlproc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/MilaNLProc/xlm-emo-t + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_all_fraisier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_all_fraisier_pipeline_en.md new file mode 100644 index 00000000000000..8fcb1b3de5eb78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_all_fraisier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_fraisier_pipeline pipeline XlmRoBertaForTokenClassification from Fraisier +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_fraisier_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_fraisier_pipeline` is a English model originally trained by Fraisier. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_fraisier_pipeline_en_5.5.0_3.0_1725973885591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_fraisier_pipeline_en_5.5.0_3.0_1725973885591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_fraisier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_fraisier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
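+
+For a token-classification pipeline like this one, a quick way to see the predicted tags is `annotate`, which returns the outputs of every included stage. The exact keys depend on the stages listed under Included Models; `token` and `ner` are common conventions and are treated as assumptions here.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+sparknlp.start()
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_fraisier_pipeline", lang="en")
+
+result = pipeline.annotate("John Snow Labs is based in Delaware.")
+print(result.keys())
+
+# If token and tag outputs are present, pair them up
+if "token" in result and "ner" in result:
+    print(list(zip(result["token"], result["ner"])))
+```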
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_fraisier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/Fraisier/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_all_oscarnav_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_all_oscarnav_pipeline_en.md new file mode 100644 index 00000000000000..f57074adbcfba8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_all_oscarnav_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_oscarnav_pipeline pipeline XlmRoBertaForTokenClassification from OscarNav +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_oscarnav_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_oscarnav_pipeline` is a English model originally trained by OscarNav. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_oscarnav_pipeline_en_5.5.0_3.0_1725974778208.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_oscarnav_pipeline_en_5.5.0_3.0_1725974778208.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_oscarnav_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_oscarnav_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_oscarnav_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/OscarNav/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_english_g22tk021_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_english_g22tk021_pipeline_en.md new file mode 100644 index 00000000000000..b4032f1ac4e9de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_english_g22tk021_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_g22tk021_pipeline pipeline XlmRoBertaForTokenClassification from g22tk021 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_g22tk021_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_g22tk021_pipeline` is a English model originally trained by g22tk021. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_g22tk021_pipeline_en_5.5.0_3.0_1725973335366.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_g22tk021_pipeline_en_5.5.0_3.0_1725973335366.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_g22tk021_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_g22tk021_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_g22tk021_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/g22tk021/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_german_stdntlfe_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_german_stdntlfe_pipeline_en.md new file mode 100644 index 00000000000000..bdab02bd484dd9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_german_stdntlfe_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_stdntlfe_pipeline pipeline XlmRoBertaForTokenClassification from stdntlfe +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_stdntlfe_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_stdntlfe_pipeline` is a English model originally trained by stdntlfe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_stdntlfe_pipeline_en_5.5.0_3.0_1725985385454.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_stdntlfe_pipeline_en_5.5.0_3.0_1725985385454.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_stdntlfe_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_stdntlfe_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_stdntlfe_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/stdntlfe/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_german_winterlight28_en.md b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_german_winterlight28_en.md new file mode 100644 index 00000000000000..a94614a7d3d391 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_german_winterlight28_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_winterlight28 XlmRoBertaForTokenClassification from winterlight28 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_winterlight28 +date: 2024-09-10 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_winterlight28` is a English model originally trained by winterlight28. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_winterlight28_en_5.5.0_3.0_1725985624545.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_winterlight28_en_5.5.0_3.0_1725985624545.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_winterlight28","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_winterlight28", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
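+
+The snippet above stops at raw token-level IOB tags in the `ner` column. When grouped entity chunks are needed, a `NerConverter` stage can be appended to the same pipeline; the following is a minimal sketch continuing from the Python example (the `ner_chunk` column name is illustrative):
+
+```python
+from sparknlp.annotator import NerConverter
+
+# Sketch: convert IOB tags from the "ner" column into entity chunks.
+ner_converter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("ner_chunk")
+
+chunk_pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, ner_converter])
+chunk_result = chunk_pipeline.fit(data).transform(data)
+chunk_result.select("ner_chunk.result").show(truncate=False)
+```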
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_winterlight28| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/winterlight28/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_italian_juhyun76_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_italian_juhyun76_pipeline_en.md new file mode 100644 index 00000000000000..3545769df2442f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_italian_juhyun76_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_juhyun76_pipeline pipeline XlmRoBertaForTokenClassification from juhyun76 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_juhyun76_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_juhyun76_pipeline` is a English model originally trained by juhyun76. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_juhyun76_pipeline_en_5.5.0_3.0_1725985206534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_juhyun76_pipeline_en_5.5.0_3.0_1725985206534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_juhyun76_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_juhyun76_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_juhyun76_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|818.4 MB| + +## References + +https://huggingface.co/juhyun76/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_seed42_original_kinyarwanda_amh_eng_train_en.md b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_seed42_original_kinyarwanda_amh_eng_train_en.md new file mode 100644 index 00000000000000..27bf83d80c547a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_seed42_original_kinyarwanda_amh_eng_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_seed42_original_kinyarwanda_amh_eng_train XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_seed42_original_kinyarwanda_amh_eng_train +date: 2024-09-10 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_seed42_original_kinyarwanda_amh_eng_train` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_seed42_original_kinyarwanda_amh_eng_train_en_5.5.0_3.0_1726003610201.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_seed42_original_kinyarwanda_amh_eng_train_en_5.5.0_3.0_1726003610201.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_seed42_original_kinyarwanda_amh_eng_train","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_seed42_original_kinyarwanda_amh_eng_train", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
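+
+For this text classification model, each row's prediction is written to the `class` annotation column; a minimal sketch for reading it back from the Python example above:
+
+```python
+# Sketch: show the input text next to the predicted label(s).
+pipelineDF.select("text", "class.result").show(truncate=False)
+```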
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_seed42_original_kinyarwanda_amh_eng_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|798.2 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_seed42_original_kin-amh-eng_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-xlmroberta_ner_employment_contract_ner_da.md b/docs/_posts/ahmedlone127/2024-09-10-xlmroberta_ner_employment_contract_ner_da.md new file mode 100644 index 00000000000000..b7daeae08362ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-xlmroberta_ner_employment_contract_ner_da.md @@ -0,0 +1,112 @@ +--- +layout: model +title: Danish XLMRobertaForTokenClassification Cased model (from saattrupdan) +author: John Snow Labs +name: xlmroberta_ner_employment_contract_ner +date: 2024-09-10 +tags: [da, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: da +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `employment-contract-ner-da` is a Danish model originally trained by `saattrupdan`. + +## Predicted Entities + +`SALARY`, `STARTDATE`, `WORKHOURS`, `WORKPLACE` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_employment_contract_ner_da_5.5.0_3.0_1725974005655.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_employment_contract_ner_da_5.5.0_3.0_1725974005655.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_employment_contract_ner","da") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+ner_converter = NerConverter()\
+    .setInputCols(["document", "token", "ner"])\
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_employment_contract_ner","da")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val ner_converter = new NerConverter()
+    .setInputCols(Array("document", "token", "ner"))
+    .setOutputCol("ner_chunk")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("da.ner.xlmr_roberta").predict("""PUT YOUR STRING HERE""")
+```
+</div>
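+
+If the chunk text and its entity type are needed as plain columns, the `ner_chunk` annotations from the Python example can be flattened; a sketch, assuming the standard `entity` metadata key written by `NerConverter`:
+
+```python
+from pyspark.sql import functions as F
+
+# Sketch: one row per detected chunk, with its text and entity label
+# (SALARY, STARTDATE, WORKHOURS or WORKPLACE).
+result.select(F.explode("ner_chunk").alias("chunk")) \
+    .select(F.col("chunk.result").alias("chunk_text"),
+            F.col("chunk.metadata")["entity"].alias("entity")) \
+    .show(truncate=False)
+```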
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_employment_contract_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|da| +|Size:|798.1 MB| + +## References + +References + +- https://huggingface.co/saattrupdan/employment-contract-ner-da \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-xlmroberta_ner_gpt2_large_detector_german_v1_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-10-xlmroberta_ner_gpt2_large_detector_german_v1_pipeline_de.md new file mode 100644 index 00000000000000..9220d9c4ddeb80 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-xlmroberta_ner_gpt2_large_detector_german_v1_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_gpt2_large_detector_german_v1_pipeline pipeline XlmRoBertaForTokenClassification from bettertextapp +author: John Snow Labs +name: xlmroberta_ner_gpt2_large_detector_german_v1_pipeline +date: 2024-09-10 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_gpt2_large_detector_german_v1_pipeline` is a German model originally trained by bettertextapp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_gpt2_large_detector_german_v1_pipeline_de_5.5.0_3.0_1726012170831.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_gpt2_large_detector_german_v1_pipeline_de_5.5.0_3.0_1726012170831.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_gpt2_large_detector_german_v1_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_gpt2_large_detector_german_v1_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_gpt2_large_detector_german_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|810.6 MB| + +## References + +https://huggingface.co/bettertextapp/gpt2-large-detector-de-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-011_microsoft_deberta_v3_base_finetuned_yahoo_8000_2000_en.md b/docs/_posts/ahmedlone127/2024-09-11-011_microsoft_deberta_v3_base_finetuned_yahoo_8000_2000_en.md new file mode 100644 index 00000000000000..901911cf6d262d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-011_microsoft_deberta_v3_base_finetuned_yahoo_8000_2000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 011_microsoft_deberta_v3_base_finetuned_yahoo_8000_2000 DeBertaForSequenceClassification from diogopaes10 +author: John Snow Labs +name: 011_microsoft_deberta_v3_base_finetuned_yahoo_8000_2000 +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`011_microsoft_deberta_v3_base_finetuned_yahoo_8000_2000` is a English model originally trained by diogopaes10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/011_microsoft_deberta_v3_base_finetuned_yahoo_8000_2000_en_5.5.0_3.0_1726030288369.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/011_microsoft_deberta_v3_base_finetuned_yahoo_8000_2000_en_5.5.0_3.0_1726030288369.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DeBertaForSequenceClassification.pretrained("011_microsoft_deberta_v3_base_finetuned_yahoo_8000_2000","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DeBertaForSequenceClassification.pretrained("011_microsoft_deberta_v3_base_finetuned_yahoo_8000_2000", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|011_microsoft_deberta_v3_base_finetuned_yahoo_8000_2000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|618.4 MB| + +## References + +https://huggingface.co/diogopaes10/011-microsoft-deberta-v3-base-finetuned-yahoo-8000_2000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-0_00003_0_99_en.md b/docs/_posts/ahmedlone127/2024-09-11-0_00003_0_99_en.md new file mode 100644 index 00000000000000..67ee83382dab99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-0_00003_0_99_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 0_00003_0_99 RoBertaForSequenceClassification from rose-e-wang +author: John Snow Labs +name: 0_00003_0_99 +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`0_00003_0_99` is a English model originally trained by rose-e-wang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/0_00003_0_99_en_5.5.0_3.0_1726063938909.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/0_00003_0_99_en_5.5.0_3.0_1726063938909.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("0_00003_0_99","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("0_00003_0_99", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|0_00003_0_99| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/rose-e-wang/0.00003_0.99 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-albert_base_v2weighted_hoax_classifier_definition_en.md b/docs/_posts/ahmedlone127/2024-09-11-albert_base_v2weighted_hoax_classifier_definition_en.md new file mode 100644 index 00000000000000..5a6b1a9b21f84e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-albert_base_v2weighted_hoax_classifier_definition_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English albert_base_v2weighted_hoax_classifier_definition AlbertForSequenceClassification from research-dump +author: John Snow Labs +name: albert_base_v2weighted_hoax_classifier_definition +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, albert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_base_v2weighted_hoax_classifier_definition` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_base_v2weighted_hoax_classifier_definition_en_5.5.0_3.0_1726027198098.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_base_v2weighted_hoax_classifier_definition_en_5.5.0_3.0_1726027198098.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = AlbertForSequenceClassification.pretrained("albert_base_v2weighted_hoax_classifier_definition","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = AlbertForSequenceClassification.pretrained("albert_base_v2weighted_hoax_classifier_definition", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_base_v2weighted_hoax_classifier_definition| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/research-dump/albert-base-v2weighted_hoax_classifier_definition \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-all_roberta_large_v1_banking_1_16_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-all_roberta_large_v1_banking_1_16_5_pipeline_en.md new file mode 100644 index 00000000000000..3a8be89d0266b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-all_roberta_large_v1_banking_1_16_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English all_roberta_large_v1_banking_1_16_5_pipeline pipeline RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_banking_1_16_5_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_banking_1_16_5_pipeline` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_banking_1_16_5_pipeline_en_5.5.0_3.0_1726060856197.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_banking_1_16_5_pipeline_en_5.5.0_3.0_1726060856197.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("all_roberta_large_v1_banking_1_16_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("all_roberta_large_v1_banking_1_16_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_banking_1_16_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-banking-1-16-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-amazon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-amazon_pipeline_en.md new file mode 100644 index 00000000000000..a5118a6e9b1b81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-amazon_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English amazon_pipeline pipeline DistilBertForSequenceClassification from bl03 +author: John Snow Labs +name: amazon_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amazon_pipeline` is a English model originally trained by bl03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amazon_pipeline_en_5.5.0_3.0_1726017805766.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amazon_pipeline_en_5.5.0_3.0_1726017805766.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("amazon_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("amazon_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amazon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bl03/amazon + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-babyberta_aochildes_2_5m_aochildes_french_with_masking_seed3_finetuned_squad_en.md b/docs/_posts/ahmedlone127/2024-09-11-babyberta_aochildes_2_5m_aochildes_french_with_masking_seed3_finetuned_squad_en.md new file mode 100644 index 00000000000000..f0a291ac2558a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-babyberta_aochildes_2_5m_aochildes_french_with_masking_seed3_finetuned_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English babyberta_aochildes_2_5m_aochildes_french_with_masking_seed3_finetuned_squad RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_aochildes_2_5m_aochildes_french_with_masking_seed3_finetuned_squad +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_aochildes_2_5m_aochildes_french_with_masking_seed3_finetuned_squad` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_2_5m_aochildes_french_with_masking_seed3_finetuned_squad_en_5.5.0_3.0_1726058540470.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_2_5m_aochildes_french_with_masking_seed3_finetuned_squad_en_5.5.0_3.0_1726058540470.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_aochildes_2_5m_aochildes_french_with_masking_seed3_finetuned_squad","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_aochildes_2_5m_aochildes_french_with_masking_seed3_finetuned_squad", "en")
+    .setInputCols(Array("document_question", "document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
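+
+The predicted answer span is returned in the `answer` annotation column; a minimal sketch for reading it back from the Python example above:
+
+```python
+# Sketch: show each question with the extracted answer text.
+pipelineDF.select("document_question.result", "answer.result").show(truncate=False)
+```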
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_aochildes_2_5m_aochildes_french_with_masking_seed3_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|32.0 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-aochildes_2.5M_aochildes-french-with-Masking-seed3-finetuned-SQuAD \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_finetuned_squad_en.md b/docs/_posts/ahmedlone127/2024-09-11-babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_finetuned_squad_en.md new file mode 100644 index 00000000000000..f10356f416d6f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_finetuned_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_finetuned_squad RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_finetuned_squad +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_finetuned_squad` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_finetuned_squad_en_5.5.0_3.0_1726062322048.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_finetuned_squad_en_5.5.0_3.0_1726062322048.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_finetuned_squad","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_finetuned_squad", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|32.0 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-wikipedia1_2.5M_wikipedia_french-with-Masking-finetuned-SQuAD \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_seed3_finetuned_squad_en.md b/docs/_posts/ahmedlone127/2024-09-11-babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_seed3_finetuned_squad_en.md new file mode 100644 index 00000000000000..37392c2be7ecad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_seed3_finetuned_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_seed3_finetuned_squad RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_seed3_finetuned_squad +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_seed3_finetuned_squad` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_seed3_finetuned_squad_en_5.5.0_3.0_1726039179363.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_seed3_finetuned_squad_en_5.5.0_3.0_1726039179363.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_seed3_finetuned_squad","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_seed3_finetuned_squad", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_seed3_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|32.0 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-wikipedia1_2.5M_wikipedia_french-with-Masking-seed3-finetuned-SQuAD \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-bert_base_uncased_dbpedia_14_en.md b/docs/_posts/ahmedlone127/2024-09-11-bert_base_uncased_dbpedia_14_en.md new file mode 100644 index 00000000000000..17d606e07c3a89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-bert_base_uncased_dbpedia_14_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_dbpedia_14 BertForSequenceClassification from fabriceyhc +author: John Snow Labs +name: bert_base_uncased_dbpedia_14 +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_dbpedia_14` is a English model originally trained by fabriceyhc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_dbpedia_14_en_5.5.0_3.0_1726015525366.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_dbpedia_14_en_5.5.0_3.0_1726015525366.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_dbpedia_14","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_dbpedia_14", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_dbpedia_14| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/fabriceyhc/bert-base-uncased-dbpedia_14 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-bert_base_uncased_finetuned_srl_arg_en.md b/docs/_posts/ahmedlone127/2024-09-11-bert_base_uncased_finetuned_srl_arg_en.md new file mode 100644 index 00000000000000..0a736dffa2fb8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-bert_base_uncased_finetuned_srl_arg_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_srl_arg BertForTokenClassification from dannashao +author: John Snow Labs +name: bert_base_uncased_finetuned_srl_arg +date: 2024-09-11 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_srl_arg` is a English model originally trained by dannashao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_srl_arg_en_5.5.0_3.0_1726026192173.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_srl_arg_en_5.5.0_3.0_1726026192173.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncased_finetuned_srl_arg","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncased_finetuned_srl_arg", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_srl_arg| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.4 MB| + +## References + +https://huggingface.co/dannashao/bert-base-uncased-finetuned-srl_arg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-bert_uncased_slot_filling_en.md b/docs/_posts/ahmedlone127/2024-09-11-bert_uncased_slot_filling_en.md new file mode 100644 index 00000000000000..38046db6d81b6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-bert_uncased_slot_filling_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_uncased_slot_filling BertForTokenClassification from andgonzalez +author: John Snow Labs +name: bert_uncased_slot_filling +date: 2024-09-11 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_uncased_slot_filling` is a English model originally trained by andgonzalez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_uncased_slot_filling_en_5.5.0_3.0_1726026081713.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_uncased_slot_filling_en_5.5.0_3.0_1726026081713.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_uncased_slot_filling","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_uncased_slot_filling", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_uncased_slot_filling| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.4 MB| + +## References + +https://huggingface.co/andgonzalez/bert-uncased-slot-filling \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-best_model_yelp_polarity_16_87_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-best_model_yelp_polarity_16_87_pipeline_en.md new file mode 100644 index 00000000000000..22605a784c6b25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-best_model_yelp_polarity_16_87_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English best_model_yelp_polarity_16_87_pipeline pipeline AlbertForSequenceClassification from simonycl +author: John Snow Labs +name: best_model_yelp_polarity_16_87_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`best_model_yelp_polarity_16_87_pipeline` is a English model originally trained by simonycl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/best_model_yelp_polarity_16_87_pipeline_en_5.5.0_3.0_1726013196169.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/best_model_yelp_polarity_16_87_pipeline_en_5.5.0_3.0_1726013196169.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("best_model_yelp_polarity_16_87_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("best_model_yelp_polarity_16_87_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|best_model_yelp_polarity_16_87_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/simonycl/best_model-yelp_polarity-16-87 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-best_model_yelp_polarity_32_100_en.md b/docs/_posts/ahmedlone127/2024-09-11-best_model_yelp_polarity_32_100_en.md new file mode 100644 index 00000000000000..8a60fc7191a3e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-best_model_yelp_polarity_32_100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English best_model_yelp_polarity_32_100 AlbertForSequenceClassification from simonycl +author: John Snow Labs +name: best_model_yelp_polarity_32_100 +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, albert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`best_model_yelp_polarity_32_100` is a English model originally trained by simonycl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/best_model_yelp_polarity_32_100_en_5.5.0_3.0_1726013594619.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/best_model_yelp_polarity_32_100_en_5.5.0_3.0_1726013594619.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = AlbertForSequenceClassification.pretrained("best_model_yelp_polarity_32_100","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = AlbertForSequenceClassification.pretrained("best_model_yelp_polarity_32_100", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|best_model_yelp_polarity_32_100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/simonycl/best_model-yelp_polarity-32-100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_english_vietnamese_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_english_vietnamese_model_pipeline_en.md new file mode 100644 index 00000000000000..7830eb99441048 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_english_vietnamese_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_english_vietnamese_model_pipeline pipeline MarianTransformer from Kudod +author: John Snow Labs +name: burmese_awesome_english_vietnamese_model_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_english_vietnamese_model_pipeline` is a English model originally trained by Kudod. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_english_vietnamese_model_pipeline_en_5.5.0_3.0_1726073261986.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_english_vietnamese_model_pipeline_en_5.5.0_3.0_1726073261986.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with a "text" column holding the input documents
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("burmese_awesome_english_vietnamese_model_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df: a Spark DataFrame with a "text" column holding the input documents
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("burmese_awesome_english_vietnamese_model_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
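+
+For quick experiments on single strings, the same pretrained pipeline can also be queried with `annotate` instead of `transform` (a minimal sketch; the example sentence is illustrative and the keys of the returned dictionary follow the pipeline's output column names):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("burmese_awesome_english_vietnamese_model_pipeline", lang = "en")
+# annotate() runs the pipeline on one string and returns a dict of output columns
+print(pipeline.annotate("I love Spark NLP"))
+```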
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_english_vietnamese_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|595.0 MB| + +## References + +https://huggingface.co/Kudod/my_awesome_en_vi_model + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_model_patchingfailed_en.md b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_model_patchingfailed_en.md new file mode 100644 index 00000000000000..005d3494942dae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_model_patchingfailed_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_patchingfailed DistilBertForSequenceClassification from patchingfailed +author: John Snow Labs +name: burmese_awesome_model_patchingfailed +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_patchingfailed` is a English model originally trained by patchingfailed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_patchingfailed_en_5.5.0_3.0_1726052092612.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_patchingfailed_en_5.5.0_3.0_1726052092612.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_patchingfailed","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_patchingfailed", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
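+
+For low-latency scoring of individual strings, the fitted pipeline can be wrapped in a `LightPipeline` (a minimal sketch, assuming `pipelineModel` from the example above; the sample sentence is illustrative):
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+# Returns a dict keyed by output column, including the "class" predictions
+print(light.annotate("I love spark-nlp"))
+```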
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_patchingfailed| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/patchingfailed/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_model_patchingfailed_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_model_patchingfailed_pipeline_en.md new file mode 100644 index 00000000000000..e0dc989f1c2418 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_model_patchingfailed_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_patchingfailed_pipeline pipeline DistilBertForSequenceClassification from patchingfailed +author: John Snow Labs +name: burmese_awesome_model_patchingfailed_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_patchingfailed_pipeline` is a English model originally trained by patchingfailed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_patchingfailed_pipeline_en_5.5.0_3.0_1726052104730.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_patchingfailed_pipeline_en_5.5.0_3.0_1726052104730.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with a "text" column holding the input documents
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("burmese_awesome_model_patchingfailed_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df: a Spark DataFrame with a "text" column holding the input documents
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("burmese_awesome_model_patchingfailed_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_patchingfailed_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/patchingfailed/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_30len_en.md b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_30len_en.md new file mode 100644 index 00000000000000..c6b8bd495406a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_30len_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_30len RoBertaForQuestionAnswering from yashwan2003 +author: John Snow Labs +name: burmese_awesome_qa_model_30len +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_30len` is a English model originally trained by yashwan2003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_30len_en_5.5.0_3.0_1726036583112.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_30len_en_5.5.0_3.0_1726036583112.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("burmese_awesome_qa_model_30len","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("burmese_awesome_qa_model_30len", "en")
+    .setInputCols(Array("document_question", "document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
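+
+The extracted answer span can then be inspected from the `answer` column (a minimal sketch, assuming the pipeline above has been applied to `data`):
+
+```python
+# "answer.result" holds the predicted answer span for each question/context pair
+pipelineDF.select("answer.result").show(truncate=False)
+```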
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_30len| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|463.6 MB| + +## References + +https://huggingface.co/yashwan2003/my_awesome_qa_model_30len \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_30len_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_30len_pipeline_en.md new file mode 100644 index 00000000000000..703974ff8b10c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_30len_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_30len_pipeline pipeline RoBertaForQuestionAnswering from yashwan2003 +author: John Snow Labs +name: burmese_awesome_qa_model_30len_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_30len_pipeline` is a English model originally trained by yashwan2003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_30len_pipeline_en_5.5.0_3.0_1726036604022.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_30len_pipeline_en_5.5.0_3.0_1726036604022.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame containing the question and context columns expected by the pipeline
+pipeline = PretrainedPipeline("burmese_awesome_qa_model_30len_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame containing the question and context columns expected by the pipeline
+val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_30len_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_30len_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.6 MB| + +## References + +https://huggingface.co/yashwan2003/my_awesome_qa_model_30len + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_kjh97_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_kjh97_pipeline_en.md new file mode 100644 index 00000000000000..f18a0a6735e10b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_kjh97_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_kjh97_pipeline pipeline RoBertaForQuestionAnswering from KJH97 +author: John Snow Labs +name: burmese_awesome_qa_model_kjh97_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_kjh97_pipeline` is a English model originally trained by KJH97. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_kjh97_pipeline_en_5.5.0_3.0_1726055879861.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_kjh97_pipeline_en_5.5.0_3.0_1726055879861.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame containing the question and context columns expected by the pipeline
+pipeline = PretrainedPipeline("burmese_awesome_qa_model_kjh97_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame containing the question and context columns expected by the pipeline
+val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_kjh97_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_kjh97_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.6 MB| + +## References + +https://huggingface.co/KJH97/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_setfit_model_pablongo_en.md b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_setfit_model_pablongo_en.md new file mode 100644 index 00000000000000..138378b6c1397b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_setfit_model_pablongo_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_setfit_model_pablongo MPNetEmbeddings from Pablongo +author: John Snow Labs +name: burmese_awesome_setfit_model_pablongo +date: 2024-09-11 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_setfit_model_pablongo` is a English model originally trained by Pablongo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_setfit_model_pablongo_en_5.5.0_3.0_1726089184345.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_setfit_model_pablongo_en_5.5.0_3.0_1726089184345.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("burmese_awesome_setfit_model_pablongo","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("burmese_awesome_setfit_model_pablongo","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
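+
+Each document is mapped to a single sentence-level vector, which can be pulled out of the `embeddings` column (a minimal sketch, assuming the pipeline above has been run):
+
+```python
+from pyspark.sql import functions as F
+
+# Each annotation carries its float vector in the "embeddings" field
+pipelineDF.select(F.explode("embeddings.embeddings").alias("sentence_embedding")).show(1)
+```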
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_setfit_model_pablongo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/Pablongo/my-awesome-setfit-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-craft_bionlp_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-11-craft_bionlp_roberta_base_en.md new file mode 100644 index 00000000000000..fd2937c6660a79 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-craft_bionlp_roberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English craft_bionlp_roberta_base RoBertaEmbeddings from abhi1nandy2 +author: John Snow Labs +name: craft_bionlp_roberta_base +date: 2024-09-11 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`craft_bionlp_roberta_base` is a English model originally trained by abhi1nandy2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/craft_bionlp_roberta_base_en_5.5.0_3.0_1726065724659.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/craft_bionlp_roberta_base_en_5.5.0_3.0_1726065724659.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("craft_bionlp_roberta_base","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("craft_bionlp_roberta_base","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
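+
+Since this is a token-level embedding model, each token in the input receives its own vector; tokens and vectors can be viewed side by side (a minimal sketch, assuming the pipeline above has been run):
+
+```python
+# One annotation per token: "result" is the token text, "embeddings" its vector
+pipelineDF.select("embeddings.result", "embeddings.embeddings").show(truncate=False)
+```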
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|craft_bionlp_roberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/abhi1nandy2/Craft-bionlp-roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-custommodel_finance_sentiment_analytics_en.md b/docs/_posts/ahmedlone127/2024-09-11-custommodel_finance_sentiment_analytics_en.md new file mode 100644 index 00000000000000..aa8491bbb7f2b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-custommodel_finance_sentiment_analytics_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English custommodel_finance_sentiment_analytics RoBertaForSequenceClassification from WillWEI0103 +author: John Snow Labs +name: custommodel_finance_sentiment_analytics +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`custommodel_finance_sentiment_analytics` is a English model originally trained by WillWEI0103. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/custommodel_finance_sentiment_analytics_en_5.5.0_3.0_1726022221653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/custommodel_finance_sentiment_analytics_en_5.5.0_3.0_1726022221653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("custommodel_finance_sentiment_analytics","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("custommodel_finance_sentiment_analytics", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
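+
+The predicted sentiment label for each input text is available in the `class` column (a minimal sketch, assuming the pipeline above has been run):
+
+```python
+pipelineDF.select("text", "class.result").show(truncate=False)
+```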
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|custommodel_finance_sentiment_analytics| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.7 MB| + +## References + +https://huggingface.co/WillWEI0103/CustomModel_finance_sentiment_analytics \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-deberta_v3_base_finetuned_cola_en.md b/docs/_posts/ahmedlone127/2024-09-11-deberta_v3_base_finetuned_cola_en.md new file mode 100644 index 00000000000000..35348a45dd645a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-deberta_v3_base_finetuned_cola_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deberta_v3_base_finetuned_cola DeBertaForSequenceClassification from manyet1k +author: John Snow Labs +name: deberta_v3_base_finetuned_cola +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_base_finetuned_cola` is a English model originally trained by manyet1k. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_base_finetuned_cola_en_5.5.0_3.0_1726099028615.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_base_finetuned_cola_en_5.5.0_3.0_1726099028615.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_base_finetuned_cola","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_base_finetuned_cola", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
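+
+For single-sentence acceptability checks, the fitted model can also be used through a `LightPipeline` (a minimal sketch, assuming `pipelineModel` from the example above; the sample sentence is illustrative):
+
+```python
+from sparknlp.base import LightPipeline
+
+# Returns a dict keyed by output column, including the "class" prediction
+print(LightPipeline(pipelineModel).annotate("This sentence reads fine."))
+```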
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_base_finetuned_cola| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|586.7 MB| + +## References + +https://huggingface.co/manyet1k/deberta-v3-base-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-deberta_v3_large_survey_related_passage_consistency_rater_gpt4_en.md b/docs/_posts/ahmedlone127/2024-09-11-deberta_v3_large_survey_related_passage_consistency_rater_gpt4_en.md new file mode 100644 index 00000000000000..0f3b9c0d4dd671 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-deberta_v3_large_survey_related_passage_consistency_rater_gpt4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deberta_v3_large_survey_related_passage_consistency_rater_gpt4 DeBertaForSequenceClassification from domenicrosati +author: John Snow Labs +name: deberta_v3_large_survey_related_passage_consistency_rater_gpt4 +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_large_survey_related_passage_consistency_rater_gpt4` is a English model originally trained by domenicrosati. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_large_survey_related_passage_consistency_rater_gpt4_en_5.5.0_3.0_1726029645322.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_large_survey_related_passage_consistency_rater_gpt4_en_5.5.0_3.0_1726029645322.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_large_survey_related_passage_consistency_rater_gpt4","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_large_survey_related_passage_consistency_rater_gpt4", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
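+
+As with the other sequence classifiers, the predicted rating is returned in the `class` column of the transformed DataFrame (a minimal sketch, assuming the pipeline above has been run):
+
+```python
+pipelineDF.select("text", "class.result").show(truncate=False)
+```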
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_large_survey_related_passage_consistency_rater_gpt4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/domenicrosati/deberta-v3-large-survey-related_passage_consistency-rater-gpt4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-did_the_doctor_thank_the_patient_for_his_treatment_oriya_wanted_tonga_tonga_islands_belarusian_healthy_bert_last128_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-did_the_doctor_thank_the_patient_for_his_treatment_oriya_wanted_tonga_tonga_islands_belarusian_healthy_bert_last128_pipeline_en.md new file mode 100644 index 00000000000000..830b7b160ee942 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-did_the_doctor_thank_the_patient_for_his_treatment_oriya_wanted_tonga_tonga_islands_belarusian_healthy_bert_last128_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English did_the_doctor_thank_the_patient_for_his_treatment_oriya_wanted_tonga_tonga_islands_belarusian_healthy_bert_last128_pipeline pipeline BertForSequenceClassification from etadevosyan +author: John Snow Labs +name: did_the_doctor_thank_the_patient_for_his_treatment_oriya_wanted_tonga_tonga_islands_belarusian_healthy_bert_last128_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`did_the_doctor_thank_the_patient_for_his_treatment_oriya_wanted_tonga_tonga_islands_belarusian_healthy_bert_last128_pipeline` is a English model originally trained by etadevosyan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/did_the_doctor_thank_the_patient_for_his_treatment_oriya_wanted_tonga_tonga_islands_belarusian_healthy_bert_last128_pipeline_en_5.5.0_3.0_1726092523391.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/did_the_doctor_thank_the_patient_for_his_treatment_oriya_wanted_tonga_tonga_islands_belarusian_healthy_bert_last128_pipeline_en_5.5.0_3.0_1726092523391.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with a "text" column holding the input documents
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("did_the_doctor_thank_the_patient_for_his_treatment_oriya_wanted_tonga_tonga_islands_belarusian_healthy_bert_last128_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df: a Spark DataFrame with a "text" column holding the input documents
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("did_the_doctor_thank_the_patient_for_his_treatment_oriya_wanted_tonga_tonga_islands_belarusian_healthy_bert_last128_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|did_the_doctor_thank_the_patient_for_his_treatment_oriya_wanted_tonga_tonga_islands_belarusian_healthy_bert_last128_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|666.5 MB| + +## References + +https://huggingface.co/etadevosyan/did_the_doctor_thank_the_patient_for_his_treatment_or_wanted_to_be_healthy_bert_Last128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_2_pipeline_en.md new file mode 100644 index 00000000000000..fade05e7c2fb6a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_2_pipeline pipeline DistilBertForTokenClassification from bisoye +author: John Snow Labs +name: distilbert_base_uncased_2_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_2_pipeline` is a English model originally trained by bisoye. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_2_pipeline_en_5.5.0_3.0_1726093323539.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_2_pipeline_en_5.5.0_3.0_1726093323539.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with a "text" column holding the input documents
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df: a Spark DataFrame with a "text" column holding the input documents
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
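+
+Because this is a prebuilt pipeline, single strings can also be tagged directly with `annotate` (a minimal sketch; the example sentence is illustrative and the dictionary keys follow the pipeline's output column names):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_2_pipeline", lang = "en")
+# Prints one entry per output column, including the token-level tags
+print(pipeline.annotate("My name is John and I live in Berlin."))
+```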
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/bisoye/distilbert-base-uncased_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_clinc_saqidr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_clinc_saqidr_pipeline_en.md new file mode 100644 index 00000000000000..709f803e0bbc3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_clinc_saqidr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_saqidr_pipeline pipeline DistilBertForSequenceClassification from saqidr +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_saqidr_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_saqidr_pipeline` is a English model originally trained by saqidr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_saqidr_pipeline_en_5.5.0_3.0_1726052202991.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_saqidr_pipeline_en_5.5.0_3.0_1726052202991.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with a "text" column holding the input documents
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_saqidr_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df: a Spark DataFrame with a "text" column holding the input documents
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_saqidr_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_saqidr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/saqidr/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_emotion_overall_normalised_text_3_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_emotion_overall_normalised_text_3_0_pipeline_en.md new file mode 100644 index 00000000000000..3200728e7c7a74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_emotion_overall_normalised_text_3_0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_overall_normalised_text_3_0_pipeline pipeline DistilBertForSequenceClassification from LeBruse +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_overall_normalised_text_3_0_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_overall_normalised_text_3_0_pipeline` is a English model originally trained by LeBruse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_overall_normalised_text_3_0_pipeline_en_5.5.0_3.0_1726014530074.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_overall_normalised_text_3_0_pipeline_en_5.5.0_3.0_1726014530074.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with a "text" column holding the input documents
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_overall_normalised_text_3_0_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df: a Spark DataFrame with a "text" column holding the input documents
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_overall_normalised_text_3_0_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_overall_normalised_text_3_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LeBruse/distilbert-base-uncased-finetuned-emotion-overall-normalised-text-3.0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_emotion_ttellner_en.md b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_emotion_ttellner_en.md new file mode 100644 index 00000000000000..6a254176d540d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_emotion_ttellner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ttellner DistilBertForSequenceClassification from ttellner +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ttellner +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ttellner` is a English model originally trained by ttellner. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ttellner_en_5.5.0_3.0_1726014552276.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ttellner_en_5.5.0_3.0_1726014552276.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ttellner","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ttellner", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
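+
+The predicted emotion label can be read from the `class` column once the pipeline has been applied (a minimal sketch based on the example above):
+
+```python
+pipelineDF.select("text", "class.result").show(truncate=False)
+```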
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ttellner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ttellner/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_events_v6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_events_v6_pipeline_en.md new file mode 100644 index 00000000000000..3519a72b98d670 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_events_v6_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_events_v6_pipeline pipeline DistilBertForSequenceClassification from joedonino +author: John Snow Labs +name: distilbert_base_uncased_finetuned_events_v6_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_events_v6_pipeline` is a English model originally trained by joedonino. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_events_v6_pipeline_en_5.5.0_3.0_1726017866126.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_events_v6_pipeline_en_5.5.0_3.0_1726017866126.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with a "text" column holding the input documents
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_events_v6_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df: a Spark DataFrame with a "text" column holding the input documents
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_events_v6_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_events_v6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/joedonino/distilbert-base-uncased-finetuned-events-v6 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md new file mode 100644 index 00000000000000..27ca4911c51e9f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st7sd_ut72ut5_plprefix0stlarge_simsp100_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st7sd_ut72ut5_plprefix0stlarge_simsp100_clean200 +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st7sd_ut72ut5_plprefix0stlarge_simsp100_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st7sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en_5.5.0_3.0_1726014218360.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st7sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en_5.5.0_3.0_1726014218360.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st7sd_ut72ut5_plprefix0stlarge_simsp100_clean200","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st7sd_ut72ut5_plprefix0stlarge_simsp100_clean200", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
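+
+Predictions can be collected from the `class` column of the transformed DataFrame (a minimal sketch based on the example above):
+
+```python
+pipelineDF.select("text", "class.result").show(truncate=False)
+```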
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st7sd_ut72ut5_plprefix0stlarge_simsp100_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st7sd_ut72ut5_PLPrefix0stlarge_simsp100_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_odm_zphr_0st9sd_ut72ut1large9pfxnf_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_odm_zphr_0st9sd_ut72ut1large9pfxnf_simsp_pipeline_en.md new file mode 100644 index 00000000000000..2a190cb9ec5311 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_odm_zphr_0st9sd_ut72ut1large9pfxnf_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st9sd_ut72ut1large9pfxnf_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st9sd_ut72ut1large9pfxnf_simsp_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st9sd_ut72ut1large9pfxnf_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st9sd_ut72ut1large9pfxnf_simsp_pipeline_en_5.5.0_3.0_1726017706941.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st9sd_ut72ut1large9pfxnf_simsp_pipeline_en_5.5.0_3.0_1726017706941.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with a "text" column holding the input documents
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st9sd_ut72ut1large9pfxnf_simsp_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df: a Spark DataFrame with a "text" column holding the input documents
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st9sd_ut72ut1large9pfxnf_simsp_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st9sd_ut72ut1large9pfxnf_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st9sd_ut72ut1large9PfxNf_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-distilbert_qa_aqg_chuvash_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-distilbert_qa_aqg_chuvash_squad_pipeline_en.md new file mode 100644 index 00000000000000..78173b9586cb5c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-distilbert_qa_aqg_chuvash_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_qa_aqg_chuvash_squad_pipeline pipeline DistilBertForQuestionAnswering from sunitha +author: John Snow Labs +name: distilbert_qa_aqg_chuvash_squad_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_qa_aqg_chuvash_squad_pipeline` is a English model originally trained by sunitha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_qa_aqg_chuvash_squad_pipeline_en_5.5.0_3.0_1726087879238.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_qa_aqg_chuvash_squad_pipeline_en_5.5.0_3.0_1726087879238.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame containing the question and context columns expected by the pipeline
+pipeline = PretrainedPipeline("distilbert_qa_aqg_chuvash_squad_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame containing the question and context columns expected by the pipeline
+val pipeline = new PretrainedPipeline("distilbert_qa_aqg_chuvash_squad_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_qa_aqg_chuvash_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/sunitha/AQG_CV_Squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-distilroberta_base_etc_sym_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-distilroberta_base_etc_sym_pipeline_en.md new file mode 100644 index 00000000000000..79811dc295a03a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-distilroberta_base_etc_sym_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_etc_sym_pipeline pipeline RoBertaForSequenceClassification from agi-css +author: John Snow Labs +name: distilroberta_base_etc_sym_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_etc_sym_pipeline` is a English model originally trained by agi-css. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_etc_sym_pipeline_en_5.5.0_3.0_1726054021995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_etc_sym_pipeline_en_5.5.0_3.0_1726054021995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_etc_sym_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_etc_sym_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
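
The snippet above assumes an input DataFrame `df` already exists. As a minimal sketch of preparing one (the `text` column name and the example sentence are assumptions, mirroring the full pipeline examples elsewhere in these posts, and an active Spark NLP session is assumed):

```python
from sparknlp.pretrained import PretrainedPipeline

# hypothetical input: a single-column DataFrame the pipeline can read
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilroberta_base_etc_sym_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.show(truncate=False)
```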
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_etc_sym_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.8 MB| + +## References + +https://huggingface.co/agi-css/distilroberta-base-etc-sym + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-dutch_tonga_tonga_islands_iac_marian_en.md b/docs/_posts/ahmedlone127/2024-09-11-dutch_tonga_tonga_islands_iac_marian_en.md new file mode 100644 index 00000000000000..b8297add2bc69c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-dutch_tonga_tonga_islands_iac_marian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dutch_tonga_tonga_islands_iac_marian MarianTransformer from MihaiIonascu +author: John Snow Labs +name: dutch_tonga_tonga_islands_iac_marian +date: 2024-09-11 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dutch_tonga_tonga_islands_iac_marian` is a English model originally trained by MihaiIonascu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dutch_tonga_tonga_islands_iac_marian_en_5.5.0_3.0_1726049225070.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dutch_tonga_tonga_islands_iac_marian_en_5.5.0_3.0_1726049225070.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# split the input document into sentences before translation
sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

# translate each detected sentence
marian = MarianTransformer.pretrained("dutch_tonga_tonga_islands_iac_marian","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

// split the input document into sentences before translation
val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

// translate each detected sentence
val marian = MarianTransformer.pretrained("dutch_tonga_tonga_islands_iac_marian","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
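
After `transform`, the translated text is stored in the `translation` output column as Spark NLP annotations. A minimal sketch of reading it back, using the column names set above (the `result` field is the annotation's text payload):

```python
# each annotation's `result` field holds one translated sentence
pipelineDF.select("translation.result").show(truncate=False)
```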
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dutch_tonga_tonga_islands_iac_marian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|196.3 MB| + +## References + +https://huggingface.co/MihaiIonascu/NL_to_IaC_Marian \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-europarl_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-europarl_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..9d83e12193e56b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-europarl_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English europarl_roberta_base_pipeline pipeline RoBertaEmbeddings from abhi1nandy2 +author: John Snow Labs +name: europarl_roberta_base_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`europarl_roberta_base_pipeline` is a English model originally trained by abhi1nandy2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/europarl_roberta_base_pipeline_en_5.5.0_3.0_1726024371662.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/europarl_roberta_base_pipeline_en_5.5.0_3.0_1726024371662.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("europarl_roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("europarl_roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|europarl_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/abhi1nandy2/Europarl-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-exxon_semantic_search_en.md b/docs/_posts/ahmedlone127/2024-09-11-exxon_semantic_search_en.md new file mode 100644 index 00000000000000..2e25b48bc0fefb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-exxon_semantic_search_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English exxon_semantic_search MPNetEmbeddings from akshitguptafintek24 +author: John Snow Labs +name: exxon_semantic_search +date: 2024-09-11 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`exxon_semantic_search` is a English model originally trained by akshitguptafintek24. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/exxon_semantic_search_en_5.5.0_3.0_1726033590658.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/exxon_semantic_search_en_5.5.0_3.0_1726033590658.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("exxon_semantic_search","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("exxon_semantic_search","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
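
The `embeddings` output column holds one sentence-embedding annotation per input row. A minimal sketch of extracting the raw vectors, following Spark NLP's annotation schema (field names are the standard ones, not specific to this model):

```python
# `embeddings.embeddings` exposes the float vector stored in each annotation
pipelineDF.selectExpr("explode(embeddings.embeddings) as sentence_embedding").show(truncate=False)
```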
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|exxon_semantic_search| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|406.7 MB| + +## References + +https://huggingface.co/akshitguptafintek24/exxon-semantic-search \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-frabert_distilbert_base_uncased_train_en.md b/docs/_posts/ahmedlone127/2024-09-11-frabert_distilbert_base_uncased_train_en.md new file mode 100644 index 00000000000000..ae5881eb69aa02 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-frabert_distilbert_base_uncased_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English frabert_distilbert_base_uncased_train DistilBertForSequenceClassification from Francesco0101 +author: John Snow Labs +name: frabert_distilbert_base_uncased_train +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`frabert_distilbert_base_uncased_train` is a English model originally trained by Francesco0101. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frabert_distilbert_base_uncased_train_en_5.5.0_3.0_1726052581508.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frabert_distilbert_base_uncased_train_en_5.5.0_3.0_1726052581508.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("frabert_distilbert_base_uncased_train","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("frabert_distilbert_base_uncased_train", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
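
Once the pipeline has run, the predicted label for each row sits in the `class` output column. A small sketch of inspecting it, using the column names from the example above:

```python
# `class.result` contains the predicted label(s) as plain strings
pipelineDF.select("text", "class.result").show(truncate=False)
```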
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frabert_distilbert_base_uncased_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Francesco0101/FRABERT-distilbert-base-uncased-TRAIN \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-hate_hate_random0_seed2_twitter_roberta_base_dec2020_en.md b/docs/_posts/ahmedlone127/2024-09-11-hate_hate_random0_seed2_twitter_roberta_base_dec2020_en.md new file mode 100644 index 00000000000000..417c1caee23697 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-hate_hate_random0_seed2_twitter_roberta_base_dec2020_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_random0_seed2_twitter_roberta_base_dec2020 RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_random0_seed2_twitter_roberta_base_dec2020 +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_random0_seed2_twitter_roberta_base_dec2020` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_random0_seed2_twitter_roberta_base_dec2020_en_5.5.0_3.0_1726095793954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_random0_seed2_twitter_roberta_base_dec2020_en_5.5.0_3.0_1726095793954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_random0_seed2_twitter_roberta_base_dec2020","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_random0_seed2_twitter_roberta_base_dec2020", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_random0_seed2_twitter_roberta_base_dec2020| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_random0_seed2-twitter-roberta-base-dec2020 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-imdbreviews_classification_distilbert_v02_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-imdbreviews_classification_distilbert_v02_pipeline_en.md new file mode 100644 index 00000000000000..27175b76dad47a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-imdbreviews_classification_distilbert_v02_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English imdbreviews_classification_distilbert_v02_pipeline pipeline AlbertForSequenceClassification from maherh +author: John Snow Labs +name: imdbreviews_classification_distilbert_v02_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdbreviews_classification_distilbert_v02_pipeline` is a English model originally trained by maherh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_distilbert_v02_pipeline_en_5.5.0_3.0_1726013077355.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_distilbert_v02_pipeline_en_5.5.0_3.0_1726013077355.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("imdbreviews_classification_distilbert_v02_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("imdbreviews_classification_distilbert_v02_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdbreviews_classification_distilbert_v02_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/maherh/imdbreviews_classification_distilbert_v02 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-job_title_classify_isco_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-job_title_classify_isco_pipeline_en.md new file mode 100644 index 00000000000000..1e6d509a8085db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-job_title_classify_isco_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English job_title_classify_isco_pipeline pipeline BertForSequenceClassification from razzaghi +author: John Snow Labs +name: job_title_classify_isco_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`job_title_classify_isco_pipeline` is a English model originally trained by razzaghi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/job_title_classify_isco_pipeline_en_5.5.0_3.0_1726015497133.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/job_title_classify_isco_pipeline_en_5.5.0_3.0_1726015497133.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("job_title_classify_isco_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("job_title_classify_isco_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|job_title_classify_isco_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|667.3 MB| + +## References + +https://huggingface.co/razzaghi/job_title_classify_isco + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-mdeberta_base_v3_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-mdeberta_base_v3_1_pipeline_en.md new file mode 100644 index 00000000000000..29c4fcb9363cb7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-mdeberta_base_v3_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mdeberta_base_v3_1_pipeline pipeline DeBertaForSequenceClassification from alyazharr +author: John Snow Labs +name: mdeberta_base_v3_1_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mdeberta_base_v3_1_pipeline` is a English model originally trained by alyazharr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mdeberta_base_v3_1_pipeline_en_5.5.0_3.0_1726029851256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mdeberta_base_v3_1_pipeline_en_5.5.0_3.0_1726029851256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mdeberta_base_v3_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mdeberta_base_v3_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mdeberta_base_v3_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|832.6 MB| + +## References + +https://huggingface.co/alyazharr/mdeberta_base_v3_1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-modelolongformerbecas_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-modelolongformerbecas_pipeline_en.md new file mode 100644 index 00000000000000..56d86f499341dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-modelolongformerbecas_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English modelolongformerbecas_pipeline pipeline RoBertaForQuestionAnswering from jonasaid +author: John Snow Labs +name: modelolongformerbecas_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`modelolongformerbecas_pipeline` is a English model originally trained by jonasaid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/modelolongformerbecas_pipeline_en_5.5.0_3.0_1726058091234.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/modelolongformerbecas_pipeline_en_5.5.0_3.0_1726058091234.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("modelolongformerbecas_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("modelolongformerbecas_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|modelolongformerbecas_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|473.3 MB| + +## References + +https://huggingface.co/jonasaid/modeloLongformerBecas + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-mrr_latest_27_7_en.md b/docs/_posts/ahmedlone127/2024-09-11-mrr_latest_27_7_en.md new file mode 100644 index 00000000000000..c0a51dbbd379cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-mrr_latest_27_7_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English mrr_latest_27_7 RoBertaForQuestionAnswering from prajwalJumde +author: John Snow Labs +name: mrr_latest_27_7 +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mrr_latest_27_7` is a English model originally trained by prajwalJumde. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mrr_latest_27_7_en_5.5.0_3.0_1726061948579.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mrr_latest_27_7_en_5.5.0_3.0_1726061948579.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("mrr_latest_27_7","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("mrr_latest_27_7", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
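
The predicted answer span ends up in the `answer` output column. A small sketch of reading it back alongside the question, using the column names from the example above:

```python
# `answer.result` holds the predicted answer span for each question/context pair
pipelineDF.select("question", "answer.result").show(truncate=False)
```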
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mrr_latest_27_7| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|463.9 MB| + +## References + +https://huggingface.co/prajwalJumde/MRR-Latest-27-7 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-mxbai_personality_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-mxbai_personality_pipeline_en.md new file mode 100644 index 00000000000000..641c9d5855c279 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-mxbai_personality_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English mxbai_personality_pipeline pipeline MPNetEmbeddings from dwulff +author: John Snow Labs +name: mxbai_personality_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mxbai_personality_pipeline` is a English model originally trained by dwulff. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mxbai_personality_pipeline_en_5.5.0_3.0_1726054810759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mxbai_personality_pipeline_en_5.5.0_3.0_1726054810759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mxbai_personality_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mxbai_personality_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mxbai_personality_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/dwulff/mxbai-personality + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-n_roberta_sst5_padding0model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-n_roberta_sst5_padding0model_pipeline_en.md new file mode 100644 index 00000000000000..ebac64c8cc1ce5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-n_roberta_sst5_padding0model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_roberta_sst5_padding0model_pipeline pipeline RoBertaForSequenceClassification from Realgon +author: John Snow Labs +name: n_roberta_sst5_padding0model_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_roberta_sst5_padding0model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_roberta_sst5_padding0model_pipeline_en_5.5.0_3.0_1726053242992.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_roberta_sst5_padding0model_pipeline_en_5.5.0_3.0_1726053242992.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("n_roberta_sst5_padding0model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("n_roberta_sst5_padding0model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_roberta_sst5_padding0model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|437.8 MB| + +## References + +https://huggingface.co/Realgon/N_roberta_sst5_padding0model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-nerd_nerd_random3_seed0_twitter_roberta_base_2022_154m_en.md b/docs/_posts/ahmedlone127/2024-09-11-nerd_nerd_random3_seed0_twitter_roberta_base_2022_154m_en.md new file mode 100644 index 00000000000000..a68358859083e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-nerd_nerd_random3_seed0_twitter_roberta_base_2022_154m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nerd_nerd_random3_seed0_twitter_roberta_base_2022_154m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: nerd_nerd_random3_seed0_twitter_roberta_base_2022_154m +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerd_nerd_random3_seed0_twitter_roberta_base_2022_154m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerd_nerd_random3_seed0_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1726071125925.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerd_nerd_random3_seed0_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1726071125925.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("nerd_nerd_random3_seed0_twitter_roberta_base_2022_154m","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("nerd_nerd_random3_seed0_twitter_roberta_base_2022_154m", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerd_nerd_random3_seed0_twitter_roberta_base_2022_154m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/tweettemposhift/nerd-nerd_random3_seed0-twitter-roberta-base-2022-154m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-nerd_nerd_temporal_twitter_roberta_base_dec2020_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-nerd_nerd_temporal_twitter_roberta_base_dec2020_pipeline_en.md new file mode 100644 index 00000000000000..2597ee896e49fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-nerd_nerd_temporal_twitter_roberta_base_dec2020_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nerd_nerd_temporal_twitter_roberta_base_dec2020_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: nerd_nerd_temporal_twitter_roberta_base_dec2020_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerd_nerd_temporal_twitter_roberta_base_dec2020_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerd_nerd_temporal_twitter_roberta_base_dec2020_pipeline_en_5.5.0_3.0_1726081909035.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerd_nerd_temporal_twitter_roberta_base_dec2020_pipeline_en_5.5.0_3.0_1726081909035.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nerd_nerd_temporal_twitter_roberta_base_dec2020_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nerd_nerd_temporal_twitter_roberta_base_dec2020_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerd_nerd_temporal_twitter_roberta_base_dec2020_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/nerd-nerd_temporal-twitter-roberta-base-dec2020 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-nlp_hf_workshop_distilbert_en.md b/docs/_posts/ahmedlone127/2024-09-11-nlp_hf_workshop_distilbert_en.md new file mode 100644 index 00000000000000..0c68fc665f04b7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-nlp_hf_workshop_distilbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nlp_hf_workshop_distilbert DistilBertForSequenceClassification from mspoulaei +author: John Snow Labs +name: nlp_hf_workshop_distilbert +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_hf_workshop_distilbert` is a English model originally trained by mspoulaei. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop_distilbert_en_5.5.0_3.0_1726014993777.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop_distilbert_en_5.5.0_3.0_1726014993777.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp_hf_workshop_distilbert","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp_hf_workshop_distilbert", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_hf_workshop_distilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/mspoulaei/NLP_HF_Workshop_distilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-opus_base_aon_tfidf_wce_unsampled_en.md b/docs/_posts/ahmedlone127/2024-09-11-opus_base_aon_tfidf_wce_unsampled_en.md new file mode 100644 index 00000000000000..1d72c986b44056 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-opus_base_aon_tfidf_wce_unsampled_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_base_aon_tfidf_wce_unsampled MarianTransformer from ethansimrm +author: John Snow Labs +name: opus_base_aon_tfidf_wce_unsampled +date: 2024-09-11 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_base_aon_tfidf_wce_unsampled` is a English model originally trained by ethansimrm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_base_aon_tfidf_wce_unsampled_en_5.5.0_3.0_1726038498397.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_base_aon_tfidf_wce_unsampled_en_5.5.0_3.0_1726038498397.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# split the input document into sentences before translation
sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

# translate each detected sentence
marian = MarianTransformer.pretrained("opus_base_aon_tfidf_wce_unsampled","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

// split the input document into sentences before translation
val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

// translate each detected sentence
val marian = MarianTransformer.pretrained("opus_base_aon_tfidf_wce_unsampled","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_base_aon_tfidf_wce_unsampled| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.4 MB| + +## References + +https://huggingface.co/ethansimrm/opus_base_AoN_tfidf_wce_unsampled \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-opus_big_enfr_ft_wang_2022_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-opus_big_enfr_ft_wang_2022_pipeline_en.md new file mode 100644 index 00000000000000..dee54da36f9344 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-opus_big_enfr_ft_wang_2022_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_big_enfr_ft_wang_2022_pipeline pipeline MarianTransformer from ethansimrm +author: John Snow Labs +name: opus_big_enfr_ft_wang_2022_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_big_enfr_ft_wang_2022_pipeline` is a English model originally trained by ethansimrm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_big_enfr_ft_wang_2022_pipeline_en_5.5.0_3.0_1726037615433.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_big_enfr_ft_wang_2022_pipeline_en_5.5.0_3.0_1726037615433.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_big_enfr_ft_wang_2022_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_big_enfr_ft_wang_2022_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_big_enfr_ft_wang_2022_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ethansimrm/opus_big_enfr_FT_wang_2022 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_indonesian_opus100_en.md b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_indonesian_opus100_en.md new file mode 100644 index 00000000000000..0c9361b097d41c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_indonesian_opus100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_indonesian_opus100 MarianTransformer from yonathanstwn +author: John Snow Labs +name: opus_maltese_english_indonesian_opus100 +date: 2024-09-11 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_indonesian_opus100` is a English model originally trained by yonathanstwn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_indonesian_opus100_en_5.5.0_3.0_1726038969534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_indonesian_opus100_en_5.5.0_3.0_1726038969534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# split the input document into sentences before translation
sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

# translate each detected sentence
marian = MarianTransformer.pretrained("opus_maltese_english_indonesian_opus100","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

// split the input document into sentences before translation
val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

// translate each detected sentence
val marian = MarianTransformer.pretrained("opus_maltese_english_indonesian_opus100","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_indonesian_opus100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|481.7 MB| + +## References + +https://huggingface.co/yonathanstwn/opus-mt-en-id-opus100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en100_pipeline_en.md new file mode 100644 index 00000000000000..6ec5229c512227 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en100_pipeline pipeline MarianTransformer from alphahg +author: John Snow Labs +name: opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en100_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en100_pipeline` is a English model originally trained by alphahg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en100_pipeline_en_5.5.0_3.0_1726050435456.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en100_pipeline_en_5.5.0_3.0_1726050435456.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_korean_english_finetuned_korean_tonga_tonga_islands_en100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|541.2 MB| + +## References + +https://huggingface.co/alphahg/opus-mt-ko-en-finetuned-ko-to-en100 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_romance_english_finetuned_npomo_english_10_epochs_en.md b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_romance_english_finetuned_npomo_english_10_epochs_en.md new file mode 100644 index 00000000000000..3bc4fff072bec8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_romance_english_finetuned_npomo_english_10_epochs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_romance_english_finetuned_npomo_english_10_epochs MarianTransformer from UnassumingOwl +author: John Snow Labs +name: opus_maltese_romance_english_finetuned_npomo_english_10_epochs +date: 2024-09-11 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_romance_english_finetuned_npomo_english_10_epochs` is a English model originally trained by UnassumingOwl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_romance_english_finetuned_npomo_english_10_epochs_en_5.5.0_3.0_1726047151112.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_romance_english_finetuned_npomo_english_10_epochs_en_5.5.0_3.0_1726047151112.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# split the input document into sentences before translation
sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

# translate each detected sentence
marian = MarianTransformer.pretrained("opus_maltese_romance_english_finetuned_npomo_english_10_epochs","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

// split the input document into sentences before translation
val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

// translate each detected sentence
val marian = MarianTransformer.pretrained("opus_maltese_romance_english_finetuned_npomo_english_10_epochs","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_romance_english_finetuned_npomo_english_10_epochs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|538.9 MB| + +## References + +https://huggingface.co/UnassumingOwl/opus-mt-ROMANCE-en-finetuned-npomo-en-10-epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_spanish_english_finetuned_npomo_english_10_epochs_en.md b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_spanish_english_finetuned_npomo_english_10_epochs_en.md new file mode 100644 index 00000000000000..b13336dd1af07c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_spanish_english_finetuned_npomo_english_10_epochs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_spanish_english_finetuned_npomo_english_10_epochs MarianTransformer from UnassumingOwl +author: John Snow Labs +name: opus_maltese_spanish_english_finetuned_npomo_english_10_epochs +date: 2024-09-11 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_spanish_english_finetuned_npomo_english_10_epochs` is a English model originally trained by UnassumingOwl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_spanish_english_finetuned_npomo_english_10_epochs_en_5.5.0_3.0_1726038497063.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_spanish_english_finetuned_npomo_english_10_epochs_en_5.5.0_3.0_1726038497063.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("opus_maltese_spanish_english_finetuned_npomo_english_10_epochs", "en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("opus_maltese_spanish_english_finetuned_npomo_english_10_epochs", "en")
  .setInputCols(Array("sentence"))
  .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_spanish_english_finetuned_npomo_english_10_epochs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|539.3 MB| + +## References + +https://huggingface.co/UnassumingOwl/opus-mt-es-en-finetuned-npomo-en-10-epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-orig_refpydst_5p_referredstates_split_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-orig_refpydst_5p_referredstates_split_v1_pipeline_en.md new file mode 100644 index 00000000000000..f6030a0d24feb1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-orig_refpydst_5p_referredstates_split_v1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English orig_refpydst_5p_referredstates_split_v1_pipeline pipeline MPNetEmbeddings from Brendan +author: John Snow Labs +name: orig_refpydst_5p_referredstates_split_v1_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`orig_refpydst_5p_referredstates_split_v1_pipeline` is a English model originally trained by Brendan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/orig_refpydst_5p_referredstates_split_v1_pipeline_en_5.5.0_3.0_1726034160797.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/orig_refpydst_5p_referredstates_split_v1_pipeline_en_5.5.0_3.0_1726034160797.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("orig_refpydst_5p_referredstates_split_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("orig_refpydst_5p_referredstates_split_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
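The snippet above assumes a running Spark NLP session and an existing input DataFrame `df`. A minimal, self-contained sketch of how such a DataFrame might be prepared (the example sentence and the `text` column name are illustrative assumptions, not part of the original card):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("orig_refpydst_5p_referredstates_split_v1_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.printSchema()  # inspect the embedding columns the pipeline adds
```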
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|orig_refpydst_5p_referredstates_split_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/Brendan/orig-refpydst-5p-referredstates-split-v1 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-parallel_roberta_large_en.md b/docs/_posts/ahmedlone127/2024-09-11-parallel_roberta_large_en.md new file mode 100644 index 00000000000000..58b2c6596d039d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-parallel_roberta_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English parallel_roberta_large RoBertaEmbeddings from luffycodes +author: John Snow Labs +name: parallel_roberta_large +date: 2024-09-11 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`parallel_roberta_large` is a English model originally trained by luffycodes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/parallel_roberta_large_en_5.5.0_3.0_1726032112397.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/parallel_roberta_large_en_5.5.0_3.0_1726032112397.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("parallel_roberta_large","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("parallel_roberta_large","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
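The Python example above assumes the usual Spark NLP imports are already in scope. A typical preamble for it would look like the following sketch:

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, RoBertaEmbeddings
from pyspark.ml import Pipeline

spark = sparknlp.start()  # starts (or attaches to) a Spark session with Spark NLP loaded
```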
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|parallel_roberta_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/luffycodes/parallel-roberta-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-predict_dermat_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-11-predict_dermat_pipeline_es.md new file mode 100644 index 00000000000000..68d1eb3459c253 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-predict_dermat_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish predict_dermat_pipeline pipeline RoBertaForSequenceClassification from fundacionctic +author: John Snow Labs +name: predict_dermat_pipeline +date: 2024-09-11 +tags: [es, open_source, pipeline, onnx] +task: Text Classification +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`predict_dermat_pipeline` is a Castilian, Spanish model originally trained by fundacionctic. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/predict_dermat_pipeline_es_5.5.0_3.0_1726063004517.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/predict_dermat_pipeline_es_5.5.0_3.0_1726063004517.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("predict_dermat_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("predict_dermat_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
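For quick, single-string experiments the pretrained pipeline can also be driven through `annotate`. The Spanish sentence below is only an illustrative placeholder, and the exact output keys depend on the pipeline's configured column names:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
pipeline = PretrainedPipeline("predict_dermat_pipeline", lang="es")

result = pipeline.annotate("El paciente presenta lesiones cutáneas en el brazo.")
print(result)  # the predicted label appears under the classifier's output column (commonly "class")
```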
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|predict_dermat_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|431.5 MB| + +## References + +https://huggingface.co/fundacionctic/predict-dermat + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-q2e_ep3_1122_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-q2e_ep3_1122_pipeline_en.md new file mode 100644 index 00000000000000..487356e889365c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-q2e_ep3_1122_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English q2e_ep3_1122_pipeline pipeline MPNetEmbeddings from ingeol +author: John Snow Labs +name: q2e_ep3_1122_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`q2e_ep3_1122_pipeline` is a English model originally trained by ingeol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/q2e_ep3_1122_pipeline_en_5.5.0_3.0_1726089062286.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/q2e_ep3_1122_pipeline_en_5.5.0_3.0_1726089062286.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("q2e_ep3_1122_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("q2e_ep3_1122_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|q2e_ep3_1122_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/ingeol/q2e_ep3_1122 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-roberta_base_epoch_15_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-roberta_base_epoch_15_pipeline_en.md new file mode 100644 index 00000000000000..b1e10b4c4b1a9a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-roberta_base_epoch_15_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_15_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_15_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_15_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_15_pipeline_en_5.5.0_3.0_1726094295857.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_15_pipeline_en_5.5.0_3.0_1726094295857.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_15_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_15_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
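Beyond `transform`, a pretrained pipeline can be probed on a single string with `fullAnnotate`, which returns the full annotation objects including embedding vectors. A small sketch (the input sentence is an illustrative assumption):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
pipeline = PretrainedPipeline("roberta_base_epoch_15_pipeline", lang="en")

result = pipeline.fullAnnotate("I love spark-nlp")[0]
print(result.keys())  # output columns produced by the pipeline stages
```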
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_15_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.2 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_15 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-roberta_base_finetuned_squad_squad_covid_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-roberta_base_finetuned_squad_squad_covid_pipeline_en.md new file mode 100644 index 00000000000000..cd8f1574d89879 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-roberta_base_finetuned_squad_squad_covid_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_base_finetuned_squad_squad_covid_pipeline pipeline RoBertaForQuestionAnswering from ahcene-ikram +author: John Snow Labs +name: roberta_base_finetuned_squad_squad_covid_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_squad_squad_covid_pipeline` is a English model originally trained by ahcene-ikram. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_squad_squad_covid_pipeline_en_5.5.0_3.0_1726039449433.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_squad_squad_covid_pipeline_en_5.5.0_3.0_1726039449433.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_squad_squad_covid_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_squad_squad_covid_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
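Unlike the embedding pipelines, a question-answering pipeline expects two input columns rather than a single `text` column. A minimal sketch, assuming the pipeline's `MultiDocumentAssembler` reads columns named `question` and `context` (the column names and example strings are assumptions):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")  # assumed input column names

pipeline = PretrainedPipeline("roberta_base_finetuned_squad_squad_covid_pipeline", lang="en")
pipeline.transform(df).show(truncate=False)  # the predicted span typically lands in an "answer" column
```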
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_squad_squad_covid_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.0 MB| + +## References + +https://huggingface.co/ahcene-ikram/roberta-base-finetuned-squad-squad-covid + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-roberta_base_squad_i8_f32_p50_en.md b/docs/_posts/ahmedlone127/2024-09-11-roberta_base_squad_i8_f32_p50_en.md new file mode 100644 index 00000000000000..7b4ec6b23f67df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-roberta_base_squad_i8_f32_p50_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_base_squad_i8_f32_p50 RoBertaForQuestionAnswering from pminha +author: John Snow Labs +name: roberta_base_squad_i8_f32_p50 +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_squad_i8_f32_p50` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_squad_i8_f32_p50_en_5.5.0_3.0_1726036093510.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_squad_i8_f32_p50_en_5.5.0_3.0_1726036093510.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_squad_i8_f32_p50", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_squad_i8_f32_p50", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
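With the column names configured above, the predicted answer span can be read back from the `result` field of the `answer` column:

```python
pipelineDF.selectExpr("explode(answer.result) as answer").show(truncate=False)
```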
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_squad_i8_f32_p50| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|228.5 MB| + +## References + +https://huggingface.co/pminha/roberta-base-squad-i8-f32-p50 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-roberta_finetuned_chennaiqa_10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-roberta_finetuned_chennaiqa_10_pipeline_en.md new file mode 100644 index 00000000000000..6351e8161b53b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-roberta_finetuned_chennaiqa_10_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_finetuned_chennaiqa_10_pipeline pipeline RoBertaForQuestionAnswering from aditi2212 +author: John Snow Labs +name: roberta_finetuned_chennaiqa_10_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_chennaiqa_10_pipeline` is a English model originally trained by aditi2212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_chennaiqa_10_pipeline_en_5.5.0_3.0_1726036746163.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_chennaiqa_10_pipeline_en_5.5.0_3.0_1726036746163.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_finetuned_chennaiqa_10_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_finetuned_chennaiqa_10_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_chennaiqa_10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.6 MB| + +## References + +https://huggingface.co/aditi2212/roberta-finetuned-ChennaiQA-10 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-roberta_finetuned_subjqa_movies_2_manishonly_en.md b/docs/_posts/ahmedlone127/2024-09-11-roberta_finetuned_subjqa_movies_2_manishonly_en.md new file mode 100644 index 00000000000000..32b58881dc1df2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-roberta_finetuned_subjqa_movies_2_manishonly_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_finetuned_subjqa_movies_2_manishonly RoBertaForQuestionAnswering from Manishonly +author: John Snow Labs +name: roberta_finetuned_subjqa_movies_2_manishonly +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_subjqa_movies_2_manishonly` is a English model originally trained by Manishonly. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_manishonly_en_5.5.0_3.0_1726036611400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_manishonly_en_5.5.0_3.0_1726036611400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_finetuned_subjqa_movies_2_manishonly", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_finetuned_subjqa_movies_2_manishonly", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_subjqa_movies_2_manishonly| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/Manishonly/roberta-finetuned-subjqa-movies_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-roberta_finetuned_subjqa_movies_2_quocc_en.md b/docs/_posts/ahmedlone127/2024-09-11-roberta_finetuned_subjqa_movies_2_quocc_en.md new file mode 100644 index 00000000000000..879d7aded89cc1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-roberta_finetuned_subjqa_movies_2_quocc_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_finetuned_subjqa_movies_2_quocc RoBertaForQuestionAnswering from Quocc +author: John Snow Labs +name: roberta_finetuned_subjqa_movies_2_quocc +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_subjqa_movies_2_quocc` is a English model originally trained by Quocc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_quocc_en_5.5.0_3.0_1726055867824.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_quocc_en_5.5.0_3.0_1726055867824.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_finetuned_subjqa_movies_2_quocc", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_finetuned_subjqa_movies_2_quocc", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_subjqa_movies_2_quocc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/Quocc/roberta-finetuned-subjqa-movies_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-roberta_large_few_shot_k_256_finetuned_squad_seed_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-roberta_large_few_shot_k_256_finetuned_squad_seed_0_pipeline_en.md new file mode 100644 index 00000000000000..19bb7d871ea15f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-roberta_large_few_shot_k_256_finetuned_squad_seed_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_large_few_shot_k_256_finetuned_squad_seed_0_pipeline pipeline RoBertaForQuestionAnswering from anas-awadalla +author: John Snow Labs +name: roberta_large_few_shot_k_256_finetuned_squad_seed_0_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_few_shot_k_256_finetuned_squad_seed_0_pipeline` is a English model originally trained by anas-awadalla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_few_shot_k_256_finetuned_squad_seed_0_pipeline_en_5.5.0_3.0_1726039448200.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_few_shot_k_256_finetuned_squad_seed_0_pipeline_en_5.5.0_3.0_1726039448200.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_few_shot_k_256_finetuned_squad_seed_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_few_shot_k_256_finetuned_squad_seed_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_few_shot_k_256_finetuned_squad_seed_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/anas-awadalla/roberta-large-few-shot-k-256-finetuned-squad-seed-0 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-roberta_large_finnish_flax_community_fi.md b/docs/_posts/ahmedlone127/2024-09-11-roberta_large_finnish_flax_community_fi.md new file mode 100644 index 00000000000000..c27a462c90e13e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-roberta_large_finnish_flax_community_fi.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Finnish roberta_large_finnish_flax_community RoBertaEmbeddings from flax-community +author: John Snow Labs +name: roberta_large_finnish_flax_community +date: 2024-09-11 +tags: [fi, open_source, onnx, embeddings, roberta] +task: Embeddings +language: fi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finnish_flax_community` is a Finnish model originally trained by flax-community. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finnish_flax_community_fi_5.5.0_3.0_1726066336613.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finnish_flax_community_fi_5.5.0_3.0_1726066336613.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_large_finnish_flax_community","fi") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_large_finnish_flax_community","fi") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
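After the pipeline above has been run, each token's vector is stored in the `embeddings` field of the annotations in the `embeddings` output column. One way to inspect token/vector pairs:

```python
from pyspark.sql import functions as F

pipelineDF.select(F.explode("embeddings").alias("emb")) \
    .select("emb.result", "emb.embeddings") \
    .show(truncate=80)
```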
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finnish_flax_community| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|fi| +|Size:|1.3 GB| + +## References + +https://huggingface.co/flax-community/RoBERTa-large-finnish \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-roberta_large_squad2_fine_tuned_3e_en.md b/docs/_posts/ahmedlone127/2024-09-11-roberta_large_squad2_fine_tuned_3e_en.md new file mode 100644 index 00000000000000..898645c3887467 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-roberta_large_squad2_fine_tuned_3e_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_large_squad2_fine_tuned_3e RoBertaForQuestionAnswering from marwanimroz18 +author: John Snow Labs +name: roberta_large_squad2_fine_tuned_3e +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_squad2_fine_tuned_3e` is a English model originally trained by marwanimroz18. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_squad2_fine_tuned_3e_en_5.5.0_3.0_1726039706751.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_squad2_fine_tuned_3e_en_5.5.0_3.0_1726039706751.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_large_squad2_fine_tuned_3e", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_large_squad2_fine_tuned_3e", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_squad2_fine_tuned_3e| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/marwanimroz18/roberta-large-squad2-fine-tuned-3e \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-roberta_qa_base_squad_finetuned_on_runaways_en.md b/docs/_posts/ahmedlone127/2024-09-11-roberta_qa_base_squad_finetuned_on_runaways_en.md new file mode 100644 index 00000000000000..bc8f7edda4304a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-roberta_qa_base_squad_finetuned_on_runaways_en.md @@ -0,0 +1,92 @@ +--- +layout: model +title: English RobertaForQuestionAnswering Base Cased model (from Nadav) +author: John Snow Labs +name: roberta_qa_base_squad_finetuned_on_runaways +date: 2024-09-11 +tags: [en, open_source, roberta, question_answering, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RobertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `roberta-base-squad-finetuned-on-runaways-en` is a English model originally trained by `Nadav`. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_qa_base_squad_finetuned_on_runaways_en_5.5.0_3.0_1726036226246.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_qa_base_squad_finetuned_on_runaways_en_5.5.0_3.0_1726036226246.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +Document_Assembler = MultiDocumentAssembler()\ + .setInputCols(["question", "context"])\ + .setOutputCols(["document_question", "document_context"]) + +Question_Answering = RoBertaForQuestionAnswering.pretrained("roberta_qa_base_squad_finetuned_on_runaways","en")\ + .setInputCols(["document_question", "document_context"])\ + .setOutputCol("answer")\ + .setCaseSensitive(True) + +pipeline = Pipeline(stages=[Document_Assembler, Question_Answering]) + +data = spark.createDataFrame([["What's my name?","My name is Clara and I live in Berkeley."]]).toDF("question", "context") + +result = pipeline.fit(data).transform(data) +``` +```scala +val Document_Assembler = new MultiDocumentAssembler() + .setInputCols(Array("question", "context")) + .setOutputCols(Array("document_question", "document_context")) + +val Question_Answering = RoBertaForQuestionAnswering.pretrained("roberta_qa_base_squad_finetuned_on_runaways","en") + .setInputCols(Array("document_question", "document_context")) + .setOutputCol("answer") + .setCaseSensitive(true) + +val pipeline = new Pipeline().setStages(Array(Document_Assembler, Question_Answering)) + +val data = Seq("What's my name?","My name is Clara and I live in Berkeley.").toDS.toDF("question", "context") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_qa_base_squad_finetuned_on_runaways| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|466.3 MB| + +## References + +References + +- https://huggingface.co/Nadav/roberta-base-squad-finetuned-on-runaways-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-roberta_vaers_en.md b/docs/_posts/ahmedlone127/2024-09-11-roberta_vaers_en.md new file mode 100644 index 00000000000000..dda0ccaa6c5a66 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-roberta_vaers_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_vaers RoBertaForSequenceClassification from paragon-analytics +author: John Snow Labs +name: roberta_vaers +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_vaers` is a English model originally trained by paragon-analytics. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_vaers_en_5.5.0_3.0_1726071140306.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_vaers_en_5.5.0_3.0_1726071140306.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_vaers", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_vaers", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
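With the columns configured above, the predicted labels can be read from the `result` field of the `class` column:

```python
from pyspark.sql import functions as F

pipelineDF.select(F.col("class.result").alias("predicted_labels")).show(truncate=False)
```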
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_vaers| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|431.4 MB| + +## References + +https://huggingface.co/paragon-analytics/roberta_vaers \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-salamathankstransformer_en2fil_v2_en.md b/docs/_posts/ahmedlone127/2024-09-11-salamathankstransformer_en2fil_v2_en.md new file mode 100644 index 00000000000000..b834fa9e4bf84a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-salamathankstransformer_en2fil_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English salamathankstransformer_en2fil_v2 MarianTransformer from SalamaThanks +author: John Snow Labs +name: salamathankstransformer_en2fil_v2 +date: 2024-09-11 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`salamathankstransformer_en2fil_v2` is a English model originally trained by SalamaThanks. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/salamathankstransformer_en2fil_v2_en_5.5.0_3.0_1726038838003.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/salamathankstransformer_en2fil_v2_en_5.5.0_3.0_1726038838003.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("salamathankstransformer_en2fil_v2", "en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("salamathankstransformer_en2fil_v2", "en")
  .setInputCols(Array("sentence"))
  .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|salamathankstransformer_en2fil_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|496.6 MB| + +## References + +https://huggingface.co/SalamaThanks/SalamaThanksTransformer_en2fil_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-sanskrit_saskta_tweet_bert_large_e3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-sanskrit_saskta_tweet_bert_large_e3_pipeline_en.md new file mode 100644 index 00000000000000..fee278e9b4d5ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-sanskrit_saskta_tweet_bert_large_e3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sanskrit_saskta_tweet_bert_large_e3_pipeline pipeline RoBertaForSequenceClassification from JerryYanJiang +author: John Snow Labs +name: sanskrit_saskta_tweet_bert_large_e3_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sanskrit_saskta_tweet_bert_large_e3_pipeline` is a English model originally trained by JerryYanJiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sanskrit_saskta_tweet_bert_large_e3_pipeline_en_5.5.0_3.0_1726022238783.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sanskrit_saskta_tweet_bert_large_e3_pipeline_en_5.5.0_3.0_1726022238783.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sanskrit_saskta_tweet_bert_large_e3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sanskrit_saskta_tweet_bert_large_e3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sanskrit_saskta_tweet_bert_large_e3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/JerryYanJiang/SA-tweet-bert-large-e3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-sarcasm_detection_roberta_base_cree_sayula_popoluca_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-sarcasm_detection_roberta_base_cree_sayula_popoluca_pipeline_en.md new file mode 100644 index 00000000000000..7209fee7698384 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-sarcasm_detection_roberta_base_cree_sayula_popoluca_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sarcasm_detection_roberta_base_cree_sayula_popoluca_pipeline pipeline RoBertaForSequenceClassification from jkhan447 +author: John Snow Labs +name: sarcasm_detection_roberta_base_cree_sayula_popoluca_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sarcasm_detection_roberta_base_cree_sayula_popoluca_pipeline` is a English model originally trained by jkhan447. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sarcasm_detection_roberta_base_cree_sayula_popoluca_pipeline_en_5.5.0_3.0_1726062835452.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sarcasm_detection_roberta_base_cree_sayula_popoluca_pipeline_en_5.5.0_3.0_1726062835452.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sarcasm_detection_roberta_base_cree_sayula_popoluca_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sarcasm_detection_roberta_base_cree_sayula_popoluca_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sarcasm_detection_roberta_base_cree_sayula_popoluca_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|428.7 MB| + +## References + +https://huggingface.co/jkhan447/sarcasm-detection-RoBerta-base-CR-POS + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-sent_bert_base_multilingual_cased_finetuned_amharic_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-11-sent_bert_base_multilingual_cased_finetuned_amharic_pipeline_xx.md new file mode 100644 index 00000000000000..58d3934a14344f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-sent_bert_base_multilingual_cased_finetuned_amharic_pipeline_xx.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Multilingual sent_bert_base_multilingual_cased_finetuned_amharic_pipeline pipeline BertSentenceEmbeddings from Davlan +author: John Snow Labs +name: sent_bert_base_multilingual_cased_finetuned_amharic_pipeline +date: 2024-09-11 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_multilingual_cased_finetuned_amharic_pipeline` is a Multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_amharic_pipeline_xx_5.5.0_3.0_1726057230184.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_amharic_pipeline_xx_5.5.0_3.0_1726057230184.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_multilingual_cased_finetuned_amharic_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_multilingual_cased_finetuned_amharic_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
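Since this is a multilingual (`xx`) pipeline, the input DataFrame may contain text in any language the underlying model was trained on. A minimal sketch (the example sentence and `text` column name are illustrative assumptions):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["Spark NLP is an open-source text processing library."]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_multilingual_cased_finetuned_amharic_pipeline", lang="xx")
pipeline.transform(df).printSchema()  # sentence-embedding columns appear in the output schema
```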
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_multilingual_cased_finetuned_amharic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.6 MB| + +## References + +https://huggingface.co/Davlan/bert-base-multilingual-cased-finetuned-amharic + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-sent_bert_base_standard_bahasa_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-sent_bert_base_standard_bahasa_cased_pipeline_en.md new file mode 100644 index 00000000000000..1b0c34c22c81ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-sent_bert_base_standard_bahasa_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_standard_bahasa_cased_pipeline pipeline BertSentenceEmbeddings from mesolitica +author: John Snow Labs +name: sent_bert_base_standard_bahasa_cased_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_standard_bahasa_cased_pipeline` is a English model originally trained by mesolitica. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_standard_bahasa_cased_pipeline_en_5.5.0_3.0_1726056965875.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_standard_bahasa_cased_pipeline_en_5.5.0_3.0_1726056965875.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_standard_bahasa_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_standard_bahasa_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
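
The snippet above assumes an existing DataFrame `df` with a `text` column and a running Spark NLP session. A minimal end-to-end sketch (session setup and input sentence are illustrative) could look like this:

```python
# Illustrative end-to-end run of the pretrained pipeline (assumes Spark NLP is installed)
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_standard_bahasa_cased_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```
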
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_standard_bahasa_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|413.2 MB| + +## References + +https://huggingface.co/mesolitica/bert-base-standard-bahasa-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-sent_bert_large_arabic_ar.md b/docs/_posts/ahmedlone127/2024-09-11-sent_bert_large_arabic_ar.md new file mode 100644 index 00000000000000..f239f8473a3bdd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-sent_bert_large_arabic_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic sent_bert_large_arabic BertSentenceEmbeddings from asafaya +author: John Snow Labs +name: sent_bert_large_arabic +date: 2024-09-11 +tags: [ar, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_arabic` is a Arabic model originally trained by asafaya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_arabic_ar_5.5.0_3.0_1726057243306.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_arabic_ar_5.5.0_3.0_1726057243306.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, split them into sentences, then embed each sentence
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_arabic","ar") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_arabic","ar")
    .setInputCols(Array("sentence"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
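
For a quick check of the output, the sentence-level vectors can be read back from the `embeddings` column configured above, for example:

```python
# Each row of "embeddings" pairs the detected sentence with its embedding vector (illustrative check)
pipelineDF.select("embeddings.result", "embeddings.embeddings").show(truncate = 60)
```
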
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_arabic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ar| +|Size:|1.3 GB| + +## References + +https://huggingface.co/asafaya/bert-large-arabic \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-sent_legalbertpt_fp_en.md b/docs/_posts/ahmedlone127/2024-09-11-sent_legalbertpt_fp_en.md new file mode 100644 index 00000000000000..871d8a2c3306a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-sent_legalbertpt_fp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_legalbertpt_fp BertSentenceEmbeddings from raquelsilveira +author: John Snow Labs +name: sent_legalbertpt_fp +date: 2024-09-11 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_legalbertpt_fp` is a English model originally trained by raquelsilveira. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_legalbertpt_fp_en_5.5.0_3.0_1726080809082.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_legalbertpt_fp_en_5.5.0_3.0_1726080809082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, split them into sentences, then embed each sentence
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_legalbertpt_fp","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_legalbertpt_fp","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
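
For a quick check of the output, the sentence-level vectors can be read back from the `embeddings` column configured above, for example:

```python
# Each row of "embeddings" pairs the detected sentence with its embedding vector (illustrative check)
pipelineDF.select("embeddings.result", "embeddings.embeddings").show(truncate = 60)
```
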
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_legalbertpt_fp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|405.8 MB| + +## References + +https://huggingface.co/raquelsilveira/legalbertpt_fp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-sentiment_analysis_pi_duboij_en.md b/docs/_posts/ahmedlone127/2024-09-11-sentiment_analysis_pi_duboij_en.md new file mode 100644 index 00000000000000..9c222617dcbc59 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-sentiment_analysis_pi_duboij_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_pi_duboij RoBertaForSequenceClassification from DuboiJ +author: John Snow Labs +name: sentiment_analysis_pi_duboij +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_pi_duboij` is a English model originally trained by DuboiJ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_pi_duboij_en_5.5.0_3.0_1726081959176.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_pi_duboij_en_5.5.0_3.0_1726081959176.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_analysis_pi_duboij","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_analysis_pi_duboij", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
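
For a quick check of the output, the predictions can be read back from the `class` column configured above, for example:

```python
# The predicted label for each input text is stored in the "class" output column (illustrative check)
pipelineDF.select("text", "class.result").show(truncate = False)
```
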
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_pi_duboij| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|423.8 MB| + +## References + +https://huggingface.co/DuboiJ/Sentiment_analysis_PI \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-sentiment_sentiment_small_random3_seed1_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-sentiment_sentiment_small_random3_seed1_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..6d6790c98c0881 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-sentiment_sentiment_small_random3_seed1_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_sentiment_small_random3_seed1_roberta_base_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random3_seed1_roberta_base_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random3_seed1_roberta_base_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed1_roberta_base_pipeline_en_5.5.0_3.0_1726096128184.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed1_roberta_base_pipeline_en_5.5.0_3.0_1726096128184.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_sentiment_small_random3_seed1_roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_sentiment_small_random3_seed1_roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
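
The snippet above assumes an existing DataFrame `df` with a `text` column and a running Spark NLP session. A minimal end-to-end sketch (session setup and input sentence are illustrative) could look like this:

```python
# Illustrative end-to-end run of the pretrained pipeline (assumes Spark NLP is installed)
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sentiment_sentiment_small_random3_seed1_roberta_base_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```
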
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random3_seed1_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|430.5 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random3_seed1-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-sentimentanalysis_yelpreviews_optimizedmodel_en.md b/docs/_posts/ahmedlone127/2024-09-11-sentimentanalysis_yelpreviews_optimizedmodel_en.md new file mode 100644 index 00000000000000..735649cbf2bbdf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-sentimentanalysis_yelpreviews_optimizedmodel_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentimentanalysis_yelpreviews_optimizedmodel DistilBertForSequenceClassification from ElizaClaPa +author: John Snow Labs +name: sentimentanalysis_yelpreviews_optimizedmodel +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentimentanalysis_yelpreviews_optimizedmodel` is a English model originally trained by ElizaClaPa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentimentanalysis_yelpreviews_optimizedmodel_en_5.5.0_3.0_1726052441672.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentimentanalysis_yelpreviews_optimizedmodel_en_5.5.0_3.0_1726052441672.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentimentanalysis_yelpreviews_optimizedmodel","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentimentanalysis_yelpreviews_optimizedmodel", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
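
For a quick check of the output, the predictions can be read back from the `class` column configured above, for example:

```python
# The predicted label for each input text is stored in the "class" output column (illustrative check)
pipelineDF.select("text", "class.result").show(truncate = False)
```
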
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentimentanalysis_yelpreviews_optimizedmodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/ElizaClaPa/SentimentAnalysis-YelpReviews-OptimizedModel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-sms_spam_model_v1_1_en.md b/docs/_posts/ahmedlone127/2024-09-11-sms_spam_model_v1_1_en.md new file mode 100644 index 00000000000000..a4613230be536a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-sms_spam_model_v1_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sms_spam_model_v1_1 DistilBertForSequenceClassification from xia0t1an +author: John Snow Labs +name: sms_spam_model_v1_1 +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sms_spam_model_v1_1` is a English model originally trained by xia0t1an. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sms_spam_model_v1_1_en_5.5.0_3.0_1726014372284.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sms_spam_model_v1_1_en_5.5.0_3.0_1726014372284.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("sms_spam_model_v1_1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sms_spam_model_v1_1", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
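
For a quick check of the output, the predictions can be read back from the `class` column configured above, for example:

```python
# The predicted label for each input text is stored in the "class" output column (illustrative check)
pipelineDF.select("text", "class.result").show(truncate = False)
```
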
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sms_spam_model_v1_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/xia0t1an/sms-spam-model-v1_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-stack_overflow_open_status_classifier_portuguese_warm_supervised_120_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-stack_overflow_open_status_classifier_portuguese_warm_supervised_120_pipeline_en.md new file mode 100644 index 00000000000000..1910afb2ccd09a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-stack_overflow_open_status_classifier_portuguese_warm_supervised_120_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stack_overflow_open_status_classifier_portuguese_warm_supervised_120_pipeline pipeline AlbertForSequenceClassification from reubenjohn +author: John Snow Labs +name: stack_overflow_open_status_classifier_portuguese_warm_supervised_120_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stack_overflow_open_status_classifier_portuguese_warm_supervised_120_pipeline` is a English model originally trained by reubenjohn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stack_overflow_open_status_classifier_portuguese_warm_supervised_120_pipeline_en_5.5.0_3.0_1726013387534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stack_overflow_open_status_classifier_portuguese_warm_supervised_120_pipeline_en_5.5.0_3.0_1726013387534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("stack_overflow_open_status_classifier_portuguese_warm_supervised_120_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("stack_overflow_open_status_classifier_portuguese_warm_supervised_120_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
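
The snippet above assumes an existing DataFrame `df` with a `text` column and a running Spark NLP session. A minimal end-to-end sketch (session setup and input sentence are illustrative) could look like this:

```python
# Illustrative end-to-end run of the pretrained pipeline (assumes Spark NLP is installed)
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("stack_overflow_open_status_classifier_portuguese_warm_supervised_120_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```
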
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stack_overflow_open_status_classifier_portuguese_warm_supervised_120_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.3 MB| + +## References + +https://huggingface.co/reubenjohn/stack-overflow-open-status-classifier-pt-warm-supervised-120 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-stance_twi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-stance_twi_pipeline_en.md new file mode 100644 index 00000000000000..1f83111783b4fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-stance_twi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stance_twi_pipeline pipeline RoBertaForSequenceClassification from eevvgg +author: John Snow Labs +name: stance_twi_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stance_twi_pipeline` is a English model originally trained by eevvgg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stance_twi_pipeline_en_5.5.0_3.0_1726053641283.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stance_twi_pipeline_en_5.5.0_3.0_1726053641283.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("stance_twi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("stance_twi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
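
The snippet above assumes an existing DataFrame `df` with a `text` column and a running Spark NLP session. A minimal end-to-end sketch (session setup and input sentence are illustrative) could look like this:

```python
# Illustrative end-to-end run of the pretrained pipeline (assumes Spark NLP is installed)
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("stance_twi_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```
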
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stance_twi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/eevvgg/Stance-Tw + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_sonatafyai_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_sonatafyai_pipeline_en.md new file mode 100644 index 00000000000000..2fbc999fcbebbd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_sonatafyai_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_sonatafyai_pipeline pipeline BertForSequenceClassification from Sonatafyai +author: John Snow Labs +name: symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_sonatafyai_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_sonatafyai_pipeline` is a English model originally trained by Sonatafyai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_sonatafyai_pipeline_en_5.5.0_3.0_1726015182769.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_sonatafyai_pipeline_en_5.5.0_3.0_1726015182769.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_sonatafyai_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_sonatafyai_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
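
The snippet above assumes an existing DataFrame `df` with a `text` column and a running Spark NLP session. A minimal end-to-end sketch (session setup and input sentence are illustrative) could look like this:

```python
# Illustrative end-to-end run of the pretrained pipeline (assumes Spark NLP is installed)
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_sonatafyai_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```
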
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_sonatafyai_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/Sonatafyai/Symptoms_to_Diagnosis_SonatafyAI_BERT_v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-t5_kazakh_question_generation_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-t5_kazakh_question_generation_pipeline_en.md new file mode 100644 index 00000000000000..0b559975986d57 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-t5_kazakh_question_generation_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English t5_kazakh_question_generation_pipeline pipeline T5Transformer from llmprojectkaz +author: John Snow Labs +name: t5_kazakh_question_generation_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: [Question Answering, Summarization, Translation, Text Generation] +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained T5Transformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t5_kazakh_question_generation_pipeline` is a English model originally trained by llmprojectkaz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t5_kazakh_question_generation_pipeline_en_5.5.0_3.0_1726075384815.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t5_kazakh_question_generation_pipeline_en_5.5.0_3.0_1726075384815.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("t5_kazakh_question_generation_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("t5_kazakh_question_generation_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
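
The snippet above assumes an existing DataFrame `df` with a `text` column and a running Spark NLP session. A minimal end-to-end sketch (session setup and input sentence are illustrative) could look like this:

```python
# Illustrative end-to-end run of the pretrained pipeline (assumes Spark NLP is installed)
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("t5_kazakh_question_generation_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```
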
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t5_kazakh_question_generation_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|2.9 GB| + +## References + +https://huggingface.co/llmprojectkaz/t5-kazakh-question-generation + +## Included Models + +- DocumentAssembler +- T5Transformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-telugu_bertu_sayula_popoluca_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-telugu_bertu_sayula_popoluca_pipeline_en.md new file mode 100644 index 00000000000000..7567ec8e496d84 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-telugu_bertu_sayula_popoluca_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English telugu_bertu_sayula_popoluca_pipeline pipeline BertForTokenClassification from kuppuluri +author: John Snow Labs +name: telugu_bertu_sayula_popoluca_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`telugu_bertu_sayula_popoluca_pipeline` is a English model originally trained by kuppuluri. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/telugu_bertu_sayula_popoluca_pipeline_en_5.5.0_3.0_1726026741799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/telugu_bertu_sayula_popoluca_pipeline_en_5.5.0_3.0_1726026741799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("telugu_bertu_sayula_popoluca_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("telugu_bertu_sayula_popoluca_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
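
The snippet above assumes an existing DataFrame `df` with a `text` column and a running Spark NLP session. A minimal end-to-end sketch (session setup and input sentence are illustrative) could look like this:

```python
# Illustrative end-to-end run of the pretrained pipeline (assumes Spark NLP is installed)
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("telugu_bertu_sayula_popoluca_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```
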
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|telugu_bertu_sayula_popoluca_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.6 MB| + +## References + +https://huggingface.co/kuppuluri/telugu_bertu_pos + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-teroberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-teroberta_pipeline_en.md new file mode 100644 index 00000000000000..1034ff025ba802 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-teroberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English teroberta_pipeline pipeline RoBertaEmbeddings from subbareddyiiit +author: John Snow Labs +name: teroberta_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`teroberta_pipeline` is a English model originally trained by subbareddyiiit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/teroberta_pipeline_en_5.5.0_3.0_1726031913943.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/teroberta_pipeline_en_5.5.0_3.0_1726031913943.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("teroberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("teroberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
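
The snippet above assumes an existing DataFrame `df` with a `text` column and a running Spark NLP session. A minimal end-to-end sketch (session setup and input sentence are illustrative) could look like this:

```python
# Illustrative end-to-end run of the pretrained pipeline (assumes Spark NLP is installed)
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("teroberta_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```
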
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|teroberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|901.4 MB| + +## References + +https://huggingface.co/subbareddyiiit/TeRobeRta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-tesla_earningscall_sentiment_analysis_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-tesla_earningscall_sentiment_analysis_pipeline_en.md new file mode 100644 index 00000000000000..38494d0afb8fc2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-tesla_earningscall_sentiment_analysis_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tesla_earningscall_sentiment_analysis_pipeline pipeline RoBertaForSequenceClassification from weip9012 +author: John Snow Labs +name: tesla_earningscall_sentiment_analysis_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tesla_earningscall_sentiment_analysis_pipeline` is a English model originally trained by weip9012. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tesla_earningscall_sentiment_analysis_pipeline_en_5.5.0_3.0_1726060975160.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tesla_earningscall_sentiment_analysis_pipeline_en_5.5.0_3.0_1726060975160.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tesla_earningscall_sentiment_analysis_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tesla_earningscall_sentiment_analysis_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
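
The snippet above assumes an existing DataFrame `df` with a `text` column and a running Spark NLP session. A minimal end-to-end sketch (session setup and input sentence are illustrative) could look like this:

```python
# Illustrative end-to-end run of the pretrained pipeline (assumes Spark NLP is installed)
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("tesla_earningscall_sentiment_analysis_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```
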
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tesla_earningscall_sentiment_analysis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.8 MB| + +## References + +https://huggingface.co/weip9012/tesla_earningscall_sentiment_analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-test_trainer_greatakela_en.md b/docs/_posts/ahmedlone127/2024-09-11-test_trainer_greatakela_en.md new file mode 100644 index 00000000000000..95cd6e685657cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-test_trainer_greatakela_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_trainer_greatakela DistilBertForSequenceClassification from greatakela +author: John Snow Labs +name: test_trainer_greatakela +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_trainer_greatakela` is a English model originally trained by greatakela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_trainer_greatakela_en_5.5.0_3.0_1726018039676.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_trainer_greatakela_en_5.5.0_3.0_1726018039676.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_trainer_greatakela","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_trainer_greatakela", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
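
For a quick check of the output, the predictions can be read back from the `class` column configured above, for example:

```python
# The predicted label for each input text is stored in the "class" output column (illustrative check)
pipelineDF.select("text", "class.result").show(truncate = False)
```
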
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_trainer_greatakela| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/greatakela/test-trainer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_english_royam0820_en.md b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_english_royam0820_en.md new file mode 100644 index 00000000000000..d5179a9ed0f037 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_english_royam0820_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_royam0820 XlmRoBertaForTokenClassification from royam0820 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_royam0820 +date: 2024-09-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_royam0820` is a English model originally trained by royam0820. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_royam0820_en_5.5.0_3.0_1726019964007.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_royam0820_en_5.5.0_3.0_1726019964007.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_royam0820","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_royam0820", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
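
For a quick check of the output, the per-token tags can be read back from the `ner` column configured above, for example:

```python
# Tokens and their predicted tags come from the "token" and "ner" output columns (illustrative check)
pipelineDF.select("token.result", "ner.result").show(truncate = False)
```
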
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_royam0820| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/royam0820/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_german_french_leotunganh_en.md b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_german_french_leotunganh_en.md new file mode 100644 index 00000000000000..d38524c9903c35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_german_french_leotunganh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_leotunganh XlmRoBertaForTokenClassification from LeoTungAnh +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_leotunganh +date: 2024-09-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_leotunganh` is a English model originally trained by LeoTungAnh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_leotunganh_en_5.5.0_3.0_1726019692484.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_leotunganh_en_5.5.0_3.0_1726019692484.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_leotunganh","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_leotunganh", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
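
For a quick check of the output, the per-token tags can be read back from the `ner` column configured above, for example:

```python
# Tokens and their predicted tags come from the "token" and "ner" output columns (illustrative check)
pipelineDF.select("token.result", "ner.result").show(truncate = False)
```
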
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_leotunganh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|842.4 MB| + +## References + +https://huggingface.co/LeoTungAnh/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_german_lkk688_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_german_lkk688_pipeline_en.md new file mode 100644 index 00000000000000..50282157b5edc5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_german_lkk688_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_lkk688_pipeline pipeline XlmRoBertaForTokenClassification from lkk688 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_lkk688_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_lkk688_pipeline` is a English model originally trained by lkk688. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_lkk688_pipeline_en_5.5.0_3.0_1726046664851.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_lkk688_pipeline_en_5.5.0_3.0_1726046664851.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_lkk688_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_lkk688_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
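
The snippet above assumes an existing DataFrame `df` with a `text` column and a running Spark NLP session. A minimal end-to-end sketch (session setup and input sentence are illustrative) could look like this:

```python
# Illustrative end-to-end run of the pretrained pipeline (assumes Spark NLP is installed)
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_lkk688_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```
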
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_lkk688_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/lkk688/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_german_y629_en.md b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_german_y629_en.md new file mode 100644 index 00000000000000..45f91a6d5efd69 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_german_y629_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_y629 XlmRoBertaForTokenClassification from y629 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_y629 +date: 2024-09-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_y629` is a English model originally trained by y629. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_y629_en_5.5.0_3.0_1726078834036.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_y629_en_5.5.0_3.0_1726078834036.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_y629","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_y629", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
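
For a quick check of the output, the per-token tags can be read back from the `ner` column configured above, for example:

```python
# Tokens and their predicted tags come from the "token" and "ner" output columns (illustrative check)
pipelineDF.select("token.result", "ner.result").show(truncate = False)
```
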
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_y629| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/y629/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_ontonotesv5_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_ontonotesv5_english_pipeline_en.md new file mode 100644 index 00000000000000..a98baef5ee7589 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_ontonotesv5_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_ontonotesv5_english_pipeline pipeline XlmRoBertaForTokenClassification from Amir13 +author: John Snow Labs +name: xlm_roberta_base_ontonotesv5_english_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_ontonotesv5_english_pipeline` is a English model originally trained by Amir13. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ontonotesv5_english_pipeline_en_5.5.0_3.0_1726046637502.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ontonotesv5_english_pipeline_en_5.5.0_3.0_1726046637502.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_ontonotesv5_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_ontonotesv5_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
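
The snippet above assumes an existing DataFrame `df` with a `text` column and a running Spark NLP session. A minimal end-to-end sketch (session setup and input sentence are illustrative) could look like this:

```python
# Illustrative end-to-end run of the pretrained pipeline (assumes Spark NLP is installed)
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_ontonotesv5_english_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```
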
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_ontonotesv5_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.1 MB| + +## References + +https://huggingface.co/Amir13/xlm-roberta-base-ontonotesv5-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-221026optimizedmodel_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-221026optimizedmodel_pipeline_en.md new file mode 100644 index 00000000000000..b4331dfd695f08 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-221026optimizedmodel_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 221026optimizedmodel_pipeline pipeline RoBertaForSequenceClassification from AsceticShibs +author: John Snow Labs +name: 221026optimizedmodel_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`221026optimizedmodel_pipeline` is a English model originally trained by AsceticShibs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/221026optimizedmodel_pipeline_en_5.5.0_3.0_1726118012674.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/221026optimizedmodel_pipeline_en_5.5.0_3.0_1726118012674.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# input DataFrame: one document per row, with the raw text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("221026optimizedmodel_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame: one document per row, with the raw text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("221026optimizedmodel_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|221026optimizedmodel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|467.7 MB| + +## References + +https://huggingface.co/AsceticShibs/221026OptimizedModel + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-angela_shuffle_tokens_regular_eval_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-angela_shuffle_tokens_regular_eval_pipeline_en.md new file mode 100644 index 00000000000000..ce4b064ce0410f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-angela_shuffle_tokens_regular_eval_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English angela_shuffle_tokens_regular_eval_pipeline pipeline XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_shuffle_tokens_regular_eval_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`angela_shuffle_tokens_regular_eval_pipeline` is a English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_shuffle_tokens_regular_eval_pipeline_en_5.5.0_3.0_1726164969915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_shuffle_tokens_regular_eval_pipeline_en_5.5.0_3.0_1726164969915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# input DataFrame: one document per row, with the raw text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("angela_shuffle_tokens_regular_eval_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame: one document per row, with the raw text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("angela_shuffle_tokens_regular_eval_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_shuffle_tokens_regular_eval_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_shuffle_tokens_regular_eval + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-any_news_classifier_pipeline_ru.md b/docs/_posts/ahmedlone127/2024-09-12-any_news_classifier_pipeline_ru.md new file mode 100644 index 00000000000000..0acf3d61d20e20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-any_news_classifier_pipeline_ru.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Russian any_news_classifier_pipeline pipeline BertForSequenceClassification from data-silence +author: John Snow Labs +name: any_news_classifier_pipeline +date: 2024-09-12 +tags: [ru, open_source, pipeline, onnx] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`any_news_classifier_pipeline` is a Russian model originally trained by data-silence. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/any_news_classifier_pipeline_ru_5.5.0_3.0_1726123994059.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/any_news_classifier_pipeline_ru_5.5.0_3.0_1726123994059.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# input DataFrame: one document per row, with the raw text in a column named "text"
df = spark.createDataFrame([["Центральный банк снизил ключевую ставку."]]).toDF("text")

pipeline = PretrainedPipeline("any_news_classifier_pipeline", lang = "ru")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame: one document per row, with the raw text in a column named "text"
val df = Seq("Центральный банк снизил ключевую ставку.").toDF("text")

val pipeline = new PretrainedPipeline("any_news_classifier_pipeline", lang = "ru")
val annotations = pipeline.transform(df)

```
</div>
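As a rough single-sentence sketch (the Russian headline and the layout of the returned dictionary are assumptions, since the output column names live inside the serialized pipeline), the classifier can also be called on one string:

```python
from sparknlp.pretrained import PretrainedPipeline

# Sketch: classify a single Russian headline without building a DataFrame
pipeline = PretrainedPipeline("any_news_classifier_pipeline", lang = "ru")
result = pipeline.annotate("Центральный банк снизил ключевую ставку на 0,5 процентного пункта.")
print(result)
```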
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|any_news_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ru| +|Size:|1.8 GB| + +## References + +https://huggingface.co/data-silence/any-news-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-args_me_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-args_me_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..c74dc0f80887c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-args_me_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English args_me_roberta_base_pipeline pipeline RoBertaEmbeddings from ragarwal +author: John Snow Labs +name: args_me_roberta_base_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`args_me_roberta_base_pipeline` is a English model originally trained by ragarwal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/args_me_roberta_base_pipeline_en_5.5.0_3.0_1726109214768.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/args_me_roberta_base_pipeline_en_5.5.0_3.0_1726109214768.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# input DataFrame: one document per row, with the raw text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("args_me_roberta_base_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame: one document per row, with the raw text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("args_me_roberta_base_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|args_me_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/ragarwal/args-me-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-babyberta_aochildes_2_5m_wikipedia1_2_5m_with_masking_seed6_finetuned_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-babyberta_aochildes_2_5m_wikipedia1_2_5m_with_masking_seed6_finetuned_squad_pipeline_en.md new file mode 100644 index 00000000000000..c4821bfb1e9eea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-babyberta_aochildes_2_5m_wikipedia1_2_5m_with_masking_seed6_finetuned_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English babyberta_aochildes_2_5m_wikipedia1_2_5m_with_masking_seed6_finetuned_squad_pipeline pipeline RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_aochildes_2_5m_wikipedia1_2_5m_with_masking_seed6_finetuned_squad_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_aochildes_2_5m_wikipedia1_2_5m_with_masking_seed6_finetuned_squad_pipeline` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_2_5m_wikipedia1_2_5m_with_masking_seed6_finetuned_squad_pipeline_en_5.5.0_3.0_1726176098952.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_2_5m_wikipedia1_2_5m_with_masking_seed6_finetuned_squad_pipeline_en_5.5.0_3.0_1726176098952.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame holding question/context pairs in the input columns expected by this QA pipeline
pipeline = PretrainedPipeline("babyberta_aochildes_2_5m_wikipedia1_2_5m_with_masking_seed6_finetuned_squad_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame holding question/context pairs in the input columns expected by this QA pipeline
val pipeline = new PretrainedPipeline("babyberta_aochildes_2_5m_wikipedia1_2_5m_with_masking_seed6_finetuned_squad_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_aochildes_2_5m_wikipedia1_2_5m_with_masking_seed6_finetuned_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|32.0 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-aochildes_2.5M_wikipedia1_2.5M-with-Masking-seed6-finetuned-SQuAD + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-babyberta_wikipedia1_2_5m_aochildes_2_5m_without_masking_seed6_finetuned_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-babyberta_wikipedia1_2_5m_aochildes_2_5m_without_masking_seed6_finetuned_squad_pipeline_en.md new file mode 100644 index 00000000000000..71d5930c3bd28e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-babyberta_wikipedia1_2_5m_aochildes_2_5m_without_masking_seed6_finetuned_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English babyberta_wikipedia1_2_5m_aochildes_2_5m_without_masking_seed6_finetuned_squad_pipeline pipeline RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_wikipedia1_2_5m_aochildes_2_5m_without_masking_seed6_finetuned_squad_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_wikipedia1_2_5m_aochildes_2_5m_without_masking_seed6_finetuned_squad_pipeline` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_wikipedia1_2_5m_aochildes_2_5m_without_masking_seed6_finetuned_squad_pipeline_en_5.5.0_3.0_1726107005216.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_wikipedia1_2_5m_aochildes_2_5m_without_masking_seed6_finetuned_squad_pipeline_en_5.5.0_3.0_1726107005216.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame holding question/context pairs in the input columns expected by this QA pipeline
pipeline = PretrainedPipeline("babyberta_wikipedia1_2_5m_aochildes_2_5m_without_masking_seed6_finetuned_squad_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame holding question/context pairs in the input columns expected by this QA pipeline
val pipeline = new PretrainedPipeline("babyberta_wikipedia1_2_5m_aochildes_2_5m_without_masking_seed6_finetuned_squad_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_wikipedia1_2_5m_aochildes_2_5m_without_masking_seed6_finetuned_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|32.0 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-wikipedia1_2.5M_aochildes_2.5M-without-Masking-seed6-finetuned-SQuAD + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-bert_csat_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-bert_csat_pipeline_en.md new file mode 100644 index 00000000000000..6c154b80537845 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-bert_csat_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_csat_pipeline pipeline DistilBertForSequenceClassification from MoaazZaki +author: John Snow Labs +name: bert_csat_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_csat_pipeline` is a English model originally trained by MoaazZaki. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_csat_pipeline_en_5.5.0_3.0_1726100663141.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_csat_pipeline_en_5.5.0_3.0_1726100663141.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# input DataFrame: one document per row, with the raw text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_csat_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame: one document per row, with the raw text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_csat_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_csat_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MoaazZaki/bert-csat + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_eli5_mlm_model_nicedanger4u_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_eli5_mlm_model_nicedanger4u_pipeline_en.md new file mode 100644 index 00000000000000..72aa4364e4b7ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_eli5_mlm_model_nicedanger4u_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_nicedanger4u_pipeline pipeline RoBertaEmbeddings from NiceDanger4U +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_nicedanger4u_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_nicedanger4u_pipeline` is a English model originally trained by NiceDanger4U. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_nicedanger4u_pipeline_en_5.5.0_3.0_1726112939975.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_nicedanger4u_pipeline_en_5.5.0_3.0_1726112939975.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# input DataFrame: one document per row, with the raw text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_nicedanger4u_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame: one document per row, with the raw text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_nicedanger4u_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_nicedanger4u_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/NiceDanger4U/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_qa_model_annajohn_en.md b/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_qa_model_annajohn_en.md new file mode 100644 index 00000000000000..2f529ca4fc737f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_qa_model_annajohn_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_annajohn DistilBertForQuestionAnswering from annajohn +author: John Snow Labs +name: burmese_awesome_qa_model_annajohn +date: 2024-09-12 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_annajohn` is a English model originally trained by annajohn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_annajohn_en_5.5.0_3.0_1726180301785.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_annajohn_en_5.5.0_3.0_1726180301785.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_annajohn","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_annajohn", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
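Continuing from the Python example above, the predicted answers can be read back from the `answer` output column; this follow-up snippet is a sketch added for illustration, not part of the original card.

```python
# Predicted answer strings from the "answer" output column
pipelineDF.select("answer.result").show(truncate=False)

# Each annotation also carries the character offsets of the answer span within the context
pipelineDF.selectExpr("explode(answer) as ans").select("ans.result", "ans.begin", "ans.end").show(truncate=False)
```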
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_annajohn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/annajohn/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_qa_model_gsl22_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_qa_model_gsl22_pipeline_en.md new file mode 100644 index 00000000000000..2a4cb35275aeec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_qa_model_gsl22_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_gsl22_pipeline pipeline RoBertaForQuestionAnswering from gsl22 +author: John Snow Labs +name: burmese_awesome_qa_model_gsl22_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_gsl22_pipeline` is a English model originally trained by gsl22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_gsl22_pipeline_en_5.5.0_3.0_1726175729402.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_gsl22_pipeline_en_5.5.0_3.0_1726175729402.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame holding question/context pairs in the input columns expected by this QA pipeline
pipeline = PretrainedPipeline("burmese_awesome_qa_model_gsl22_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame holding question/context pairs in the input columns expected by this QA pipeline
val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_gsl22_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_gsl22_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.6 MB| + +## References + +https://huggingface.co/gsl22/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_qa_model_question_ans_en.md b/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_qa_model_question_ans_en.md new file mode 100644 index 00000000000000..909e561c2c287a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_qa_model_question_ans_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_question_ans DistilBertForQuestionAnswering from wyxwangmed +author: John Snow Labs +name: burmese_awesome_qa_model_question_ans +date: 2024-09-12 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_question_ans` is a English model originally trained by wyxwangmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_question_ans_en_5.5.0_3.0_1726180689020.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_question_ans_en_5.5.0_3.0_1726180689020.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_question_ans","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_question_ans", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_question_ans| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/wyxwangmed/my_awesome_qa_model_question_ans \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-calender_event_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-calender_event_classification_pipeline_en.md new file mode 100644 index 00000000000000..0afb743ee4b8b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-calender_event_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English calender_event_classification_pipeline pipeline DistilBertForSequenceClassification from Indramal +author: John Snow Labs +name: calender_event_classification_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`calender_event_classification_pipeline` is a English model originally trained by Indramal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/calender_event_classification_pipeline_en_5.5.0_3.0_1726125209167.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/calender_event_classification_pipeline_en_5.5.0_3.0_1726125209167.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# input DataFrame: one document per row, with the raw text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("calender_event_classification_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame: one document per row, with the raw text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("calender_event_classification_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|calender_event_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Indramal/Calender-Event-Classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-cbdc_sentiment_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-cbdc_sentiment_bert_pipeline_en.md new file mode 100644 index 00000000000000..0203eacb6986a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-cbdc_sentiment_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cbdc_sentiment_bert_pipeline pipeline BertForSequenceClassification from JonasOuatt +author: John Snow Labs +name: cbdc_sentiment_bert_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cbdc_sentiment_bert_pipeline` is a English model originally trained by JonasOuatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cbdc_sentiment_bert_pipeline_en_5.5.0_3.0_1726104218438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cbdc_sentiment_bert_pipeline_en_5.5.0_3.0_1726104218438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# input DataFrame: one document per row, with the raw text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("cbdc_sentiment_bert_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame: one document per row, with the raw text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("cbdc_sentiment_bert_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cbdc_sentiment_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|627.8 MB| + +## References + +https://huggingface.co/JonasOuatt/CBDC-sentiment-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-checkpoint_124500_finetuned_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-checkpoint_124500_finetuned_squad_pipeline_en.md new file mode 100644 index 00000000000000..b6dba6878b487e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-checkpoint_124500_finetuned_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English checkpoint_124500_finetuned_squad_pipeline pipeline DistilBertForQuestionAnswering from botika +author: John Snow Labs +name: checkpoint_124500_finetuned_squad_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`checkpoint_124500_finetuned_squad_pipeline` is a English model originally trained by botika. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/checkpoint_124500_finetuned_squad_pipeline_en_5.5.0_3.0_1726180582484.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/checkpoint_124500_finetuned_squad_pipeline_en_5.5.0_3.0_1726180582484.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame holding question/context pairs in the input columns expected by this QA pipeline
pipeline = PretrainedPipeline("checkpoint_124500_finetuned_squad_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame holding question/context pairs in the input columns expected by this QA pipeline
val pipeline = new PretrainedPipeline("checkpoint_124500_finetuned_squad_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|checkpoint_124500_finetuned_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/botika/checkpoint-124500-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-coha1910s_en.md b/docs/_posts/ahmedlone127/2024-09-12-coha1910s_en.md new file mode 100644 index 00000000000000..f176607cc4aa52 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-coha1910s_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English coha1910s RoBertaEmbeddings from simonmun +author: John Snow Labs +name: coha1910s +date: 2024-09-12 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`coha1910s` is a English model originally trained by simonmun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/coha1910s_en_5.5.0_3.0_1726185434313.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/coha1910s_en_5.5.0_3.0_1726185434313.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("coha1910s","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("coha1910s","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
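Continuing from the Python example above, a sketch for inspecting the token-level vectors written to the `embeddings` output column (added here for illustration, not part of the original card):

```python
# One row per token: the token text and its RoBERTa embedding vector
pipelineDF.selectExpr("explode(embeddings) as emb") \
    .select("emb.result", "emb.embeddings") \
    .show(truncate=60)
```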
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|coha1910s| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.5 MB| + +## References + +https://huggingface.co/simonmun/COHA1910s \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-danish_distilbert_base_uncased_nlp_feup_en.md b/docs/_posts/ahmedlone127/2024-09-12-danish_distilbert_base_uncased_nlp_feup_en.md new file mode 100644 index 00000000000000..b109c84c014b19 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-danish_distilbert_base_uncased_nlp_feup_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English danish_distilbert_base_uncased_nlp_feup DistilBertEmbeddings from NLP-FEUP +author: John Snow Labs +name: danish_distilbert_base_uncased_nlp_feup +date: 2024-09-12 +tags: [en, open_source, onnx, embeddings, distilbert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`danish_distilbert_base_uncased_nlp_feup` is a English model originally trained by NLP-FEUP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/danish_distilbert_base_uncased_nlp_feup_en_5.5.0_3.0_1726171943684.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/danish_distilbert_base_uncased_nlp_feup_en_5.5.0_3.0_1726171943684.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = DistilBertEmbeddings.pretrained("danish_distilbert_base_uncased_nlp_feup","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = DistilBertEmbeddings.pretrained("danish_distilbert_base_uncased_nlp_feup","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|danish_distilbert_base_uncased_nlp_feup| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[distilbert]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/NLP-FEUP/DA-distilbert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-deberta_v3_small_finetuned_hate_speech18_narrativaai_en.md b/docs/_posts/ahmedlone127/2024-09-12-deberta_v3_small_finetuned_hate_speech18_narrativaai_en.md new file mode 100644 index 00000000000000..10d16251763126 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-deberta_v3_small_finetuned_hate_speech18_narrativaai_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deberta_v3_small_finetuned_hate_speech18_narrativaai DeBertaForSequenceClassification from Narrativaai +author: John Snow Labs +name: deberta_v3_small_finetuned_hate_speech18_narrativaai +date: 2024-09-12 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_small_finetuned_hate_speech18_narrativaai` is a English model originally trained by Narrativaai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_small_finetuned_hate_speech18_narrativaai_en_5.5.0_3.0_1726163442748.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_small_finetuned_hate_speech18_narrativaai_en_5.5.0_3.0_1726163442748.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_small_finetuned_hate_speech18_narrativaai","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_small_finetuned_hate_speech18_narrativaai", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
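Continuing from the Python example above, the predicted label lands in the `class` output column; the snippet below is an illustrative sketch rather than part of the original card.

```python
# Predicted label per input document, from the "class" output column
pipelineDF.select("text", "class.result").show(truncate=False)
```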
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_small_finetuned_hate_speech18_narrativaai| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|444.3 MB| + +## References + +https://huggingface.co/Narrativaai/deberta-v3-small-finetuned-hate_speech18 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-deberta_v3_xsmall_survey_rater_combined_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-deberta_v3_xsmall_survey_rater_combined_pipeline_en.md new file mode 100644 index 00000000000000..50dd88301caed4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-deberta_v3_xsmall_survey_rater_combined_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English deberta_v3_xsmall_survey_rater_combined_pipeline pipeline DeBertaForSequenceClassification from domenicrosati +author: John Snow Labs +name: deberta_v3_xsmall_survey_rater_combined_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_xsmall_survey_rater_combined_pipeline` is a English model originally trained by domenicrosati. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_xsmall_survey_rater_combined_pipeline_en_5.5.0_3.0_1726168662492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_xsmall_survey_rater_combined_pipeline_en_5.5.0_3.0_1726168662492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# input DataFrame: one document per row, with the raw text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("deberta_v3_xsmall_survey_rater_combined_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame: one document per row, with the raw text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("deberta_v3_xsmall_survey_rater_combined_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_xsmall_survey_rater_combined_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|233.8 MB| + +## References + +https://huggingface.co/domenicrosati/deberta-v3-xsmall-survey-rater-combined + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-distilbert_base_uncased_distilled_squad_finetuned_squad_kranasian_en.md b/docs/_posts/ahmedlone127/2024-09-12-distilbert_base_uncased_distilled_squad_finetuned_squad_kranasian_en.md new file mode 100644 index 00000000000000..5add91e59f209a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-distilbert_base_uncased_distilled_squad_finetuned_squad_kranasian_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_squad_finetuned_squad_kranasian DistilBertForQuestionAnswering from kranasian +author: John Snow Labs +name: distilbert_base_uncased_distilled_squad_finetuned_squad_kranasian +date: 2024-09-12 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_squad_finetuned_squad_kranasian` is a English model originally trained by kranasian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_squad_finetuned_squad_kranasian_en_5.5.0_3.0_1726180605548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_squad_finetuned_squad_kranasian_en_5.5.0_3.0_1726180605548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_distilled_squad_finetuned_squad_kranasian","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_distilled_squad_finetuned_squad_kranasian", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
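For single question/context pairs, a `LightPipeline` avoids the DataFrame round trip. This is a generic Spark NLP pattern sketched here under the assumption that `fullAnnotate` accepts a question and a context argument for question-answering stages; it is not taken from the original card.

```python
from sparknlp.base import LightPipeline

# Sketch: answer one question/context pair in memory (assumed two-argument fullAnnotate)
light = LightPipeline(pipelineModel)
annotations = light.fullAnnotate("What framework do I use?", "I use spark-nlp.")
print(annotations[0]["answer"][0].result)
```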
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_squad_finetuned_squad_kranasian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/kranasian/distilbert-base-uncased-distilled-squad-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-distilbert_base_uncased_finetuned_imdb_accelerate_marcosautuori_en.md b/docs/_posts/ahmedlone127/2024-09-12-distilbert_base_uncased_finetuned_imdb_accelerate_marcosautuori_en.md new file mode 100644 index 00000000000000..648b9fbe7c5514 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-distilbert_base_uncased_finetuned_imdb_accelerate_marcosautuori_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_imdb_accelerate_marcosautuori DistilBertEmbeddings from MarcosAutuori +author: John Snow Labs +name: distilbert_base_uncased_finetuned_imdb_accelerate_marcosautuori +date: 2024-09-12 +tags: [en, open_source, onnx, embeddings, distilbert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_imdb_accelerate_marcosautuori` is a English model originally trained by MarcosAutuori. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_accelerate_marcosautuori_en_5.5.0_3.0_1726172047815.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_accelerate_marcosautuori_en_5.5.0_3.0_1726172047815.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = DistilBertEmbeddings.pretrained("distilbert_base_uncased_finetuned_imdb_accelerate_marcosautuori","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = DistilBertEmbeddings.pretrained("distilbert_base_uncased_finetuned_imdb_accelerate_marcosautuori","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_imdb_accelerate_marcosautuori| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[distilbert]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/MarcosAutuori/distilbert-base-uncased-finetuned-imdb-accelerate \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-distilbert_qa_pytorch_full_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-distilbert_qa_pytorch_full_pipeline_en.md new file mode 100644 index 00000000000000..4f4cb66f0f4bd6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-distilbert_qa_pytorch_full_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_qa_pytorch_full_pipeline pipeline DistilBertForQuestionAnswering from tyavika +author: John Snow Labs +name: distilbert_qa_pytorch_full_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_qa_pytorch_full_pipeline` is a English model originally trained by tyavika. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_qa_pytorch_full_pipeline_en_5.5.0_3.0_1726180805130.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_qa_pytorch_full_pipeline_en_5.5.0_3.0_1726180805130.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame holding question/context pairs in the input columns expected by this QA pipeline
pipeline = PretrainedPipeline("distilbert_qa_pytorch_full_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame holding question/context pairs in the input columns expected by this QA pipeline
val pipeline = new PretrainedPipeline("distilbert_qa_pytorch_full_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_qa_pytorch_full_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/tyavika/Distilbert-QA-Pytorch-FULL + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_en.md b/docs/_posts/ahmedlone127/2024-09-12-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_en.md new file mode 100644 index 00000000000000..6a58dc300ebcb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli +date: 2024-09-12 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_en_5.5.0_3.0_1726100782595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_en_5.5.0_3.0_1726100782595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
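
After `transform`, the predicted label for each row is available in the `class` annotation column defined above, for example:

```python
# Inspect the predicted labels produced by the sequence classifier.
pipelineDF.select("text", "class.result").show(truncate = False)
```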
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|251.0 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_data_aug_qnli \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-distilbert_sanskrit_saskta_glue_experiment_logit_kd_wnli_256_en.md b/docs/_posts/ahmedlone127/2024-09-12-distilbert_sanskrit_saskta_glue_experiment_logit_kd_wnli_256_en.md new file mode 100644 index 00000000000000..73cbe97797c41f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-distilbert_sanskrit_saskta_glue_experiment_logit_kd_wnli_256_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_wnli_256 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_wnli_256 +date: 2024-09-12 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_wnli_256` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_wnli_256_en_5.5.0_3.0_1726100451863.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_wnli_256_en_5.5.0_3.0_1726100451863.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_wnli_256","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_wnli_256", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_wnli_256| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|71.6 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_wnli_256 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-distilbert_sanskrit_saskta_glue_experiment_qnli_256_en.md b/docs/_posts/ahmedlone127/2024-09-12-distilbert_sanskrit_saskta_glue_experiment_qnli_256_en.md new file mode 100644 index 00000000000000..3f704ee7a1d093 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-distilbert_sanskrit_saskta_glue_experiment_qnli_256_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_qnli_256 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_qnli_256 +date: 2024-09-12 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_qnli_256` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_qnli_256_en_5.5.0_3.0_1726100100380.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_qnli_256_en_5.5.0_3.0_1726100100380.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_qnli_256","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_qnli_256", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_qnli_256| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|71.6 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_qnli_256 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-english_zhtw_en.md b/docs/_posts/ahmedlone127/2024-09-12-english_zhtw_en.md new file mode 100644 index 00000000000000..a009b37eb55d66 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-english_zhtw_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English english_zhtw MarianTransformer from agentlans +author: John Snow Labs +name: english_zhtw +date: 2024-09-12 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`english_zhtw` is a English model originally trained by agentlans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/english_zhtw_en_5.5.0_3.0_1726167622952.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/english_zhtw_en_5.5.0_3.0_1726167622952.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("english_zhtw","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("english_zhtw","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
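
The translated text is written to the `translation` column defined above and can be read back with a simple select, for example:

```python
# "translation.result" holds one translated string per detected sentence.
pipelineDF.select("translation.result").show(truncate = False)
```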
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|english_zhtw| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|541.9 MB| + +## References + +https://huggingface.co/agentlans/en-zhtw \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-ethnicity_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-ethnicity_model_pipeline_en.md new file mode 100644 index 00000000000000..f4afadd17cb35a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-ethnicity_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ethnicity_model_pipeline pipeline BertForSequenceClassification from BananaFish45 +author: John Snow Labs +name: ethnicity_model_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ethnicity_model_pipeline` is a English model originally trained by BananaFish45. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ethnicity_model_pipeline_en_5.5.0_3.0_1726123121151.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ethnicity_model_pipeline_en_5.5.0_3.0_1726123121151.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ethnicity_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ethnicity_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
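
`df` above is assumed to be a DataFrame holding the text to classify. A minimal sketch, assuming the pipeline reads a `text` column (the sample sentence is only an illustration):

```python
# Minimal usage sketch; the "text" column name and sample sentence are assumptions.
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("ethnicity_model_pipeline", lang = "en")

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

# For quick experiments, a single string can also be annotated directly.
result = pipeline.annotate("I love spark-nlp")
```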
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ethnicity_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/BananaFish45/Ethnicity_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-financial_phrasebank_oversampling_10perc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-financial_phrasebank_oversampling_10perc_pipeline_en.md new file mode 100644 index 00000000000000..ab179696473979 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-financial_phrasebank_oversampling_10perc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English financial_phrasebank_oversampling_10perc_pipeline pipeline RoBertaForSequenceClassification from kruthof +author: John Snow Labs +name: financial_phrasebank_oversampling_10perc_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`financial_phrasebank_oversampling_10perc_pipeline` is a English model originally trained by kruthof. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/financial_phrasebank_oversampling_10perc_pipeline_en_5.5.0_3.0_1726117934012.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/financial_phrasebank_oversampling_10perc_pipeline_en_5.5.0_3.0_1726117934012.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("financial_phrasebank_oversampling_10perc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("financial_phrasebank_oversampling_10perc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|financial_phrasebank_oversampling_10perc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|428.8 MB| + +## References + +https://huggingface.co/kruthof/financial_phrasebank_oversampling_10perc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-fine_tune_spatial_validation_en.md b/docs/_posts/ahmedlone127/2024-09-12-fine_tune_spatial_validation_en.md new file mode 100644 index 00000000000000..cb043837ab924a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-fine_tune_spatial_validation_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English fine_tune_spatial_validation RoBertaForQuestionAnswering from dflcmu +author: John Snow Labs +name: fine_tune_spatial_validation +date: 2024-09-12 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tune_spatial_validation` is a English model originally trained by dflcmu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tune_spatial_validation_en_5.5.0_3.0_1726106590856.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tune_spatial_validation_en_5.5.0_3.0_1726106590856.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("fine_tune_spatial_validation","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("fine_tune_spatial_validation", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
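
The extracted answer span ends up in the `answer` column configured above, for example:

```python
# Show the predicted answer for each question/context pair.
pipelineDF.select("answer.result").show(truncate = False)
```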
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tune_spatial_validation| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|463.6 MB| + +## References + +https://huggingface.co/dflcmu/fine_tune_spatial_validation \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-fine_tuned_roberta_roamify_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-fine_tuned_roberta_roamify_pipeline_en.md new file mode 100644 index 00000000000000..754b1b0024cbc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-fine_tuned_roberta_roamify_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English fine_tuned_roberta_roamify_pipeline pipeline RoBertaForQuestionAnswering from Roamify +author: John Snow Labs +name: fine_tuned_roberta_roamify_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_roberta_roamify_pipeline` is a English model originally trained by Roamify. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_roberta_roamify_pipeline_en_5.5.0_3.0_1726176227267.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_roberta_roamify_pipeline_en_5.5.0_3.0_1726176227267.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fine_tuned_roberta_roamify_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fine_tuned_roberta_roamify_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_roberta_roamify_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|419.9 MB| + +## References + +https://huggingface.co/Roamify/fine-tuned-roberta + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-flat_model_en.md b/docs/_posts/ahmedlone127/2024-09-12-flat_model_en.md new file mode 100644 index 00000000000000..ffe06ff24add12 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-flat_model_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English flat_model DistilBertForQuestionAnswering from rugvedabodke +author: John Snow Labs +name: flat_model +date: 2024-09-12 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`flat_model` is a English model originally trained by rugvedabodke. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/flat_model_en_5.5.0_3.0_1726180574683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/flat_model_en_5.5.0_3.0_1726180574683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("flat_model","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("flat_model", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|flat_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/rugvedabodke/flat_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-helsinki_nlp_korean_english_base_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-helsinki_nlp_korean_english_base_test_pipeline_en.md new file mode 100644 index 00000000000000..da2f5cf89ac813 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-helsinki_nlp_korean_english_base_test_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English helsinki_nlp_korean_english_base_test_pipeline pipeline MarianTransformer from dalzza +author: John Snow Labs +name: helsinki_nlp_korean_english_base_test_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`helsinki_nlp_korean_english_base_test_pipeline` is a English model originally trained by dalzza. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/helsinki_nlp_korean_english_base_test_pipeline_en_5.5.0_3.0_1726168077854.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/helsinki_nlp_korean_english_base_test_pipeline_en_5.5.0_3.0_1726168077854.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("helsinki_nlp_korean_english_base_test_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("helsinki_nlp_korean_english_base_test_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|helsinki_nlp_korean_english_base_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|541.0 MB| + +## References + +https://huggingface.co/dalzza/helsinki-nlp-ko-en-base-test + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-hw001_lostck_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-hw001_lostck_pipeline_en.md new file mode 100644 index 00000000000000..f6e24eade8c87b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-hw001_lostck_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hw001_lostck_pipeline pipeline DistilBertForSequenceClassification from lostck +author: John Snow Labs +name: hw001_lostck_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hw001_lostck_pipeline` is a English model originally trained by lostck. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hw001_lostck_pipeline_en_5.5.0_3.0_1726124933026.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hw001_lostck_pipeline_en_5.5.0_3.0_1726124933026.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hw001_lostck_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hw001_lostck_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hw001_lostck_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lostck/HW001 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-icelandic_there_aragonese_allergy_bert_first512_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-icelandic_there_aragonese_allergy_bert_first512_pipeline_en.md new file mode 100644 index 00000000000000..ff1236082af8b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-icelandic_there_aragonese_allergy_bert_first512_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English icelandic_there_aragonese_allergy_bert_first512_pipeline pipeline BertForSequenceClassification from etadevosyan +author: John Snow Labs +name: icelandic_there_aragonese_allergy_bert_first512_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`icelandic_there_aragonese_allergy_bert_first512_pipeline` is a English model originally trained by etadevosyan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/icelandic_there_aragonese_allergy_bert_first512_pipeline_en_5.5.0_3.0_1726182670516.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/icelandic_there_aragonese_allergy_bert_first512_pipeline_en_5.5.0_3.0_1726182670516.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("icelandic_there_aragonese_allergy_bert_first512_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("icelandic_there_aragonese_allergy_bert_first512_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|icelandic_there_aragonese_allergy_bert_first512_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|666.5 MB| + +## References + +https://huggingface.co/etadevosyan/is_there_an_allergy_bert_First512 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-iwslt17_marian_big_ctx4_cwd0_english_french_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-iwslt17_marian_big_ctx4_cwd0_english_french_pipeline_en.md new file mode 100644 index 00000000000000..7c72f01db1b801 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-iwslt17_marian_big_ctx4_cwd0_english_french_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English iwslt17_marian_big_ctx4_cwd0_english_french_pipeline pipeline MarianTransformer from context-mt +author: John Snow Labs +name: iwslt17_marian_big_ctx4_cwd0_english_french_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`iwslt17_marian_big_ctx4_cwd0_english_french_pipeline` is a English model originally trained by context-mt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/iwslt17_marian_big_ctx4_cwd0_english_french_pipeline_en_5.5.0_3.0_1726161779380.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/iwslt17_marian_big_ctx4_cwd0_english_french_pipeline_en_5.5.0_3.0_1726161779380.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("iwslt17_marian_big_ctx4_cwd0_english_french_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("iwslt17_marian_big_ctx4_cwd0_english_french_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
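
As with the other pretrained pipelines on this page, `df` is assumed to be a DataFrame holding the source text. A hedged sketch, assuming a `text` input column; the output column produced by the included MarianTransformer is not stated on this card, so the schema is printed for inspection:

```python
# Hedged usage sketch; the "text" input column and sample sentence are assumptions.
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("iwslt17_marian_big_ctx4_cwd0_english_french_pipeline", lang = "en")
df = spark.createDataFrame([["How are you today?"]]).toDF("text")
annotations = pipeline.transform(df)
annotations.printSchema()  # inspect which output columns the pipeline produces
```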
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|iwslt17_marian_big_ctx4_cwd0_english_french_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/context-mt/iwslt17-marian-big-ctx4-cwd0-en-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-iwslt17_marian_small_ctx8_cwd0_english_french_en.md b/docs/_posts/ahmedlone127/2024-09-12-iwslt17_marian_small_ctx8_cwd0_english_french_en.md new file mode 100644 index 00000000000000..93c805496a54b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-iwslt17_marian_small_ctx8_cwd0_english_french_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English iwslt17_marian_small_ctx8_cwd0_english_french MarianTransformer from context-mt +author: John Snow Labs +name: iwslt17_marian_small_ctx8_cwd0_english_french +date: 2024-09-12 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`iwslt17_marian_small_ctx8_cwd0_english_french` is a English model originally trained by context-mt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/iwslt17_marian_small_ctx8_cwd0_english_french_en_5.5.0_3.0_1726126724941.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/iwslt17_marian_small_ctx8_cwd0_english_french_en_5.5.0_3.0_1726126724941.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("iwslt17_marian_small_ctx8_cwd0_english_french","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("iwslt17_marian_small_ctx8_cwd0_english_french","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|iwslt17_marian_small_ctx8_cwd0_english_french| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.4 MB| + +## References + +https://huggingface.co/context-mt/iwslt17-marian-small-ctx8-cwd0-en-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-lab1_finetuning_zeen0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-lab1_finetuning_zeen0_pipeline_en.md new file mode 100644 index 00000000000000..246c27a0a8232e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-lab1_finetuning_zeen0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lab1_finetuning_zeen0_pipeline pipeline MarianTransformer from Zeen0 +author: John Snow Labs +name: lab1_finetuning_zeen0_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab1_finetuning_zeen0_pipeline` is a English model originally trained by Zeen0. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab1_finetuning_zeen0_pipeline_en_5.5.0_3.0_1726160843959.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab1_finetuning_zeen0_pipeline_en_5.5.0_3.0_1726160843959.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lab1_finetuning_zeen0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lab1_finetuning_zeen0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab1_finetuning_zeen0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.6 MB| + +## References + +https://huggingface.co/Zeen0/lab1_finetuning + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-lab1_random_yimeiyang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-lab1_random_yimeiyang_pipeline_en.md new file mode 100644 index 00000000000000..40d7eb98237d85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-lab1_random_yimeiyang_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lab1_random_yimeiyang_pipeline pipeline MarianTransformer from yimeiyang +author: John Snow Labs +name: lab1_random_yimeiyang_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab1_random_yimeiyang_pipeline` is a English model originally trained by yimeiyang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab1_random_yimeiyang_pipeline_en_5.5.0_3.0_1726161445654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab1_random_yimeiyang_pipeline_en_5.5.0_3.0_1726161445654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lab1_random_yimeiyang_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lab1_random_yimeiyang_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab1_random_yimeiyang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.8 MB| + +## References + +https://huggingface.co/yimeiyang/lab1_random + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-lusa_events_en.md b/docs/_posts/ahmedlone127/2024-09-12-lusa_events_en.md new file mode 100644 index 00000000000000..359a8ab1941db1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-lusa_events_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English lusa_events BertForTokenClassification from lfcc +author: John Snow Labs +name: lusa_events +date: 2024-09-12 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lusa_events` is a English model originally trained by lfcc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lusa_events_en_5.5.0_3.0_1726174650335.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lusa_events_en_5.5.0_3.0_1726174650335.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("lusa_events","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("lusa_events", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
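
The `token` and `ner` columns produced above are aligned position by position, so the predicted tag for each token can be read back directly, for example:

```python
# Token and NER label arrays are aligned element-wise.
pipelineDF.select("token.result", "ner.result").show(truncate = False)
```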
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lusa_events| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|406.0 MB| + +## References + +https://huggingface.co/lfcc/lusa_events \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-lusa_events_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-lusa_events_pipeline_en.md new file mode 100644 index 00000000000000..032c5afaea6f1e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-lusa_events_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lusa_events_pipeline pipeline BertForTokenClassification from lfcc +author: John Snow Labs +name: lusa_events_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lusa_events_pipeline` is a English model originally trained by lfcc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lusa_events_pipeline_en_5.5.0_3.0_1726174669288.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lusa_events_pipeline_en_5.5.0_3.0_1726174669288.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lusa_events_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lusa_events_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
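
For a quick check without building a DataFrame, `PretrainedPipeline.annotate` can be called on a single string; the sentence below is only an illustration:

```python
# Minimal sketch: annotate one string and list the output columns the pipeline produces.
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("lusa_events_pipeline", lang = "en")
result = pipeline.annotate("The earthquake struck the coast on Tuesday.")
print(result.keys())
```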
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lusa_events_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.0 MB| + +## References + +https://huggingface.co/lfcc/lusa_events + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-maltese_coref_english_hebrew_modern_coref_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-maltese_coref_english_hebrew_modern_coref_pipeline_en.md new file mode 100644 index 00000000000000..18008944072214 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-maltese_coref_english_hebrew_modern_coref_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English maltese_coref_english_hebrew_modern_coref_pipeline pipeline MarianTransformer from nlphuji +author: John Snow Labs +name: maltese_coref_english_hebrew_modern_coref_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`maltese_coref_english_hebrew_modern_coref_pipeline` is a English model originally trained by nlphuji. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/maltese_coref_english_hebrew_modern_coref_pipeline_en_5.5.0_3.0_1726126611004.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/maltese_coref_english_hebrew_modern_coref_pipeline_en_5.5.0_3.0_1726126611004.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("maltese_coref_english_hebrew_modern_coref_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("maltese_coref_english_hebrew_modern_coref_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|maltese_coref_english_hebrew_modern_coref_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|545.7 MB| + +## References + +https://huggingface.co/nlphuji/mt_coref_en_he_coref + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-marefa_maltese_english_arabic_parallel_10k_splitted_euclidean_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-marefa_maltese_english_arabic_parallel_10k_splitted_euclidean_pipeline_en.md new file mode 100644 index 00000000000000..587206c651f70c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-marefa_maltese_english_arabic_parallel_10k_splitted_euclidean_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marefa_maltese_english_arabic_parallel_10k_splitted_euclidean_pipeline pipeline MarianTransformer from HamdanXI +author: John Snow Labs +name: marefa_maltese_english_arabic_parallel_10k_splitted_euclidean_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marefa_maltese_english_arabic_parallel_10k_splitted_euclidean_pipeline` is a English model originally trained by HamdanXI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marefa_maltese_english_arabic_parallel_10k_splitted_euclidean_pipeline_en_5.5.0_3.0_1726126819339.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marefa_maltese_english_arabic_parallel_10k_splitted_euclidean_pipeline_en_5.5.0_3.0_1726126819339.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marefa_maltese_english_arabic_parallel_10k_splitted_euclidean_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marefa_maltese_english_arabic_parallel_10k_splitted_euclidean_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marefa_maltese_english_arabic_parallel_10k_splitted_euclidean_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|528.0 MB| + +## References + +https://huggingface.co/HamdanXI/marefa-mt-en-ar-parallel-10k-splitted-euclidean + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tsobolev_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tsobolev_pipeline_en.md new file mode 100644 index 00000000000000..42e7b4c39e6e7f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tsobolev_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tsobolev_pipeline pipeline MarianTransformer from tsobolev +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tsobolev_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tsobolev_pipeline` is a English model originally trained by tsobolev. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tsobolev_pipeline_en_5.5.0_3.0_1726167677671.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tsobolev_pipeline_en_5.5.0_3.0_1726167677671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tsobolev_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tsobolev_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tsobolev_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.8 MB| + +## References + +https://huggingface.co/tsobolev/marian-finetuned-kde4-en-to-fr-accelerate + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_ai_newbie89_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_ai_newbie89_pipeline_en.md new file mode 100644 index 00000000000000..616121e1e341c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_ai_newbie89_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_ai_newbie89_pipeline pipeline MarianTransformer from AI-newbie89 +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_ai_newbie89_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_ai_newbie89_pipeline` is a English model originally trained by AI-newbie89. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_ai_newbie89_pipeline_en_5.5.0_3.0_1726126131856.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_ai_newbie89_pipeline_en_5.5.0_3.0_1726126131856.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_ai_newbie89_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_ai_newbie89_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_ai_newbie89_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.9 MB| + +## References + +https://huggingface.co/AI-newbie89/marian-finetuned-kde4-en-to-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-mdeberta_v3_base_amazon_massive_intent_en.md b/docs/_posts/ahmedlone127/2024-09-12-mdeberta_v3_base_amazon_massive_intent_en.md new file mode 100644 index 00000000000000..2966c659b335c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-mdeberta_v3_base_amazon_massive_intent_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mdeberta_v3_base_amazon_massive_intent DeBertaForSequenceClassification from cartesinus +author: John Snow Labs +name: mdeberta_v3_base_amazon_massive_intent +date: 2024-09-12 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mdeberta_v3_base_amazon_massive_intent` is a English model originally trained by cartesinus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mdeberta_v3_base_amazon_massive_intent_en_5.5.0_3.0_1726162941500.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mdeberta_v3_base_amazon_massive_intent_en_5.5.0_3.0_1726162941500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assemble raw text, tokenize, then classify with the pretrained DeBERTa model
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DeBertaForSequenceClassification.pretrained("mdeberta_v3_base_amazon_massive_intent","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// Assemble raw text, tokenize, then classify with the pretrained DeBERTa model
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DeBertaForSequenceClassification.pretrained("mdeberta_v3_base_amazon_massive_intent", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
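+
+After the pipeline above has been fitted and applied, the predicted intent for each row can be read from the `class` output column. The short sketch below only reuses the column names defined in the example.
+
+```python
+# Show the input text next to the predicted intent label
+pipelineDF.select("text", "class.result").show(truncate = False)
+```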
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mdeberta_v3_base_amazon_massive_intent| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|839.3 MB| + +## References + +https://huggingface.co/cartesinus/mdeberta-v3-base_amazon-massive_intent \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-medical_tiny_english_1_1v_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-medical_tiny_english_1_1v_pipeline_en.md new file mode 100644 index 00000000000000..366e5f6b2fef9f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-medical_tiny_english_1_1v_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English medical_tiny_english_1_1v_pipeline pipeline WhisperForCTC from Dev372 +author: John Snow Labs +name: medical_tiny_english_1_1v_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`medical_tiny_english_1_1v_pipeline` is a English model originally trained by Dev372. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/medical_tiny_english_1_1v_pipeline_en_5.5.0_3.0_1726137073505.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/medical_tiny_english_1_1v_pipeline_en_5.5.0_3.0_1726137073505.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("medical_tiny_english_1_1v_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("medical_tiny_english_1_1v_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
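+
+This pipeline starts with an AudioAssembler, so the `df` in the snippet above must contain audio samples rather than text. A heavily hedged sketch is shown below: the `audio_content` column name, the 16 kHz sample rate, and the use of `librosa` for decoding are assumptions, not details taken from this card.
+
+```python
+import librosa
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("medical_tiny_english_1_1v_pipeline", lang = "en")
+
+# Hypothetical recording decoded to a flat list of float samples
+samples, _ = librosa.load("sample_recording.wav", sr = 16000)
+df = spark.createDataFrame([[samples.tolist()]]).toDF("audio_content")
+
+# Inspect every output column rather than guessing the transcript column name
+pipeline.transform(df).show(truncate = False)
+```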
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|medical_tiny_english_1_1v_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|395.1 MB| + +## References + +https://huggingface.co/Dev372/Medical_tiny_en_1_1v + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-opus_big_fine_freq_wce_unsampled_en.md b/docs/_posts/ahmedlone127/2024-09-12-opus_big_fine_freq_wce_unsampled_en.md new file mode 100644 index 00000000000000..59f7ee0e81b7ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-opus_big_fine_freq_wce_unsampled_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_big_fine_freq_wce_unsampled MarianTransformer from ethansimrm +author: John Snow Labs +name: opus_big_fine_freq_wce_unsampled +date: 2024-09-12 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_big_fine_freq_wce_unsampled` is a English model originally trained by ethansimrm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_big_fine_freq_wce_unsampled_en_5.5.0_3.0_1726161501166.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_big_fine_freq_wce_unsampled_en_5.5.0_3.0_1726161501166.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+# Split the document into sentences before translating
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_big_fine_freq_wce_unsampled","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+// Split the document into sentences before translating
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_big_fine_freq_wce_unsampled","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
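+
+For quick, single-string experiments the fitted pipeline from the example above can also be wrapped in a `LightPipeline`, which avoids building a DataFrame. The sketch below assumes the `translation` output column configured earlier; the sample sentence is only illustrative.
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+# annotate() returns a dict keyed by the pipeline's output column names
+print(light.annotate("Biomedical abbreviations are hard to translate.")["translation"])
+```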
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_big_fine_freq_wce_unsampled| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ethansimrm/opus_big_fine_freq_wce_unsampled \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_english_indonesian_jakarta_best_loss_bleu_en.md b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_english_indonesian_jakarta_best_loss_bleu_en.md new file mode 100644 index 00000000000000..32d4fc5d1b7424 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_english_indonesian_jakarta_best_loss_bleu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_indonesian_jakarta_best_loss_bleu MarianTransformer from yonathanstwn +author: John Snow Labs +name: opus_maltese_english_indonesian_jakarta_best_loss_bleu +date: 2024-09-12 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_indonesian_jakarta_best_loss_bleu` is a English model originally trained by yonathanstwn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_indonesian_jakarta_best_loss_bleu_en_5.5.0_3.0_1726160815988.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_indonesian_jakarta_best_loss_bleu_en_5.5.0_3.0_1726160815988.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+# Split the document into sentences before translating
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_indonesian_jakarta_best_loss_bleu","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+// Split the document into sentences before translating
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_indonesian_jakarta_best_loss_bleu","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_indonesian_jakarta_best_loss_bleu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|482.1 MB| + +## References + +https://huggingface.co/yonathanstwn/opus-mt-en-id-jakarta-best-loss-bleu \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_multiple_languages_english_finetuned_npomo_english_15_epochs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_multiple_languages_english_finetuned_npomo_english_15_epochs_pipeline_en.md new file mode 100644 index 00000000000000..9364b57827c196 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_multiple_languages_english_finetuned_npomo_english_15_epochs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_multiple_languages_english_finetuned_npomo_english_15_epochs_pipeline pipeline MarianTransformer from UnassumingOwl +author: John Snow Labs +name: opus_maltese_multiple_languages_english_finetuned_npomo_english_15_epochs_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_multiple_languages_english_finetuned_npomo_english_15_epochs_pipeline` is a English model originally trained by UnassumingOwl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_multiple_languages_english_finetuned_npomo_english_15_epochs_pipeline_en_5.5.0_3.0_1726161826051.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_multiple_languages_english_finetuned_npomo_english_15_epochs_pipeline_en_5.5.0_3.0_1726161826051.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_multiple_languages_english_finetuned_npomo_english_15_epochs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_multiple_languages_english_finetuned_npomo_english_15_epochs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_multiple_languages_english_finetuned_npomo_english_15_epochs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|533.3 MB| + +## References + +https://huggingface.co/UnassumingOwl/opus-mt-mul-en-finetuned-npomo-en-15-epochs + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_bukabyaka_en.md b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_bukabyaka_en.md new file mode 100644 index 00000000000000..66b2330a802a9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_bukabyaka_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_bukabyaka MarianTransformer from BukaByaka +author: John Snow Labs +name: opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_bukabyaka +date: 2024-09-12 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_bukabyaka` is a English model originally trained by BukaByaka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_bukabyaka_en_5.5.0_3.0_1726167248584.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_bukabyaka_en_5.5.0_3.0_1726167248584.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+# Split the document into sentences before translating
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_bukabyaka","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+// Split the document into sentences before translating
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_bukabyaka","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_bukabyaka| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|526.4 MB| + +## References + +https://huggingface.co/BukaByaka/opus-mt-ru-en-finetuned-en-to-ru \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_bukabyaka_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_bukabyaka_pipeline_en.md new file mode 100644 index 00000000000000..499048025d61e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_bukabyaka_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_bukabyaka_pipeline pipeline MarianTransformer from BukaByaka +author: John Snow Labs +name: opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_bukabyaka_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_bukabyaka_pipeline` is a English model originally trained by BukaByaka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_bukabyaka_pipeline_en_5.5.0_3.0_1726167278408.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_bukabyaka_pipeline_en_5.5.0_3.0_1726167278408.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_bukabyaka_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_bukabyaka_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_russian_english_finetuned_english_tonga_tonga_islands_russian_bukabyaka_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|526.9 MB| + +## References + +https://huggingface.co/BukaByaka/opus-mt-ru-en-finetuned-en-to-ru + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-prototipo_4_emi_en.md b/docs/_posts/ahmedlone127/2024-09-12-prototipo_4_emi_en.md new file mode 100644 index 00000000000000..406fd85c1a6c0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-prototipo_4_emi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English prototipo_4_emi DistilBertForSequenceClassification from Armandodelca +author: John Snow Labs +name: prototipo_4_emi +date: 2024-09-12 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`prototipo_4_emi` is a English model originally trained by Armandodelca. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/prototipo_4_emi_en_5.5.0_3.0_1726125026098.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/prototipo_4_emi_en_5.5.0_3.0_1726125026098.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assemble raw text, tokenize, then classify with the pretrained DistilBERT model
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("prototipo_4_emi","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// Assemble raw text, tokenize, then classify with the pretrained DistilBERT model
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("prototipo_4_emi", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
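+
+To surface both the predicted label and its score from the example above, the annotation metadata can be unpacked as sketched below; the per-label confidence map in `metadata` follows general Spark NLP annotator behavior and is an assumption rather than something documented on this card.
+
+```python
+from pyspark.sql import functions as F
+
+pipelineDF.select(
+    F.col("text"),
+    F.expr("class.result[0]").alias("predicted_label"),
+    F.expr("class.metadata[0]").alias("scores")
+).show(truncate = False)
+```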
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|prototipo_4_emi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|252.5 MB| + +## References + +https://huggingface.co/Armandodelca/Prototipo_4_EMI \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-rbt8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-rbt8_pipeline_en.md new file mode 100644 index 00000000000000..c6e17291b5b08c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-rbt8_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English rbt8_pipeline pipeline RoBertaForQuestionAnswering from SUTS102779289 +author: John Snow Labs +name: rbt8_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rbt8_pipeline` is a English model originally trained by SUTS102779289. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rbt8_pipeline_en_5.5.0_3.0_1726106232059.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rbt8_pipeline_en_5.5.0_3.0_1726106232059.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rbt8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rbt8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rbt8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.6 MB| + +## References + +https://huggingface.co/SUTS102779289/rbt8 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-redo_norwegian_delete_5e_5_hausa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-redo_norwegian_delete_5e_5_hausa_pipeline_en.md new file mode 100644 index 00000000000000..a0be91a2aa0dc2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-redo_norwegian_delete_5e_5_hausa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English redo_norwegian_delete_5e_5_hausa_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: redo_norwegian_delete_5e_5_hausa_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`redo_norwegian_delete_5e_5_hausa_pipeline` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/redo_norwegian_delete_5e_5_hausa_pipeline_en_5.5.0_3.0_1726131820807.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/redo_norwegian_delete_5e_5_hausa_pipeline_en_5.5.0_3.0_1726131820807.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("redo_norwegian_delete_5e_5_hausa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("redo_norwegian_delete_5e_5_hausa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
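+
+A minimal end-to-end sketch for this token-classification pipeline is shown below. The `text` input column is the usual Spark NLP convention, the Hausa sample sentence is only illustrative, and the output column name is not stated on this card, so the example simply prints the full transformed row.
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("redo_norwegian_delete_5e_5_hausa_pipeline", lang = "en")
+df = spark.createDataFrame([["Aliyu ya tafi Kano jiya."]]).toDF("text")
+pipeline.transform(df).show(truncate = False)
+```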
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|redo_norwegian_delete_5e_5_hausa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/redo_no_delete_5e-5_hausa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-roberta_base_bne_finetuned_amazon_reviews_multi_anabach_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-roberta_base_bne_finetuned_amazon_reviews_multi_anabach_pipeline_en.md new file mode 100644 index 00000000000000..8ef435243a27de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-roberta_base_bne_finetuned_amazon_reviews_multi_anabach_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_amazon_reviews_multi_anabach_pipeline pipeline RoBertaForSequenceClassification from AnaBach +author: John Snow Labs +name: roberta_base_bne_finetuned_amazon_reviews_multi_anabach_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_amazon_reviews_multi_anabach_pipeline` is a English model originally trained by AnaBach. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_amazon_reviews_multi_anabach_pipeline_en_5.5.0_3.0_1726118239552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_amazon_reviews_multi_anabach_pipeline_en_5.5.0_3.0_1726118239552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_bne_finetuned_amazon_reviews_multi_anabach_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_bne_finetuned_amazon_reviews_multi_anabach_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_amazon_reviews_multi_anabach_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|446.8 MB| + +## References + +https://huggingface.co/AnaBach/roberta-base-bne-finetuned-amazon_reviews_multi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-roberta_base_epoch_12_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-roberta_base_epoch_12_pipeline_en.md new file mode 100644 index 00000000000000..568a0d76fc266b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-roberta_base_epoch_12_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_12_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_12_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_12_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_12_pipeline_en_5.5.0_3.0_1726113253790.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_12_pipeline_en_5.5.0_3.0_1726113253790.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_12_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_12_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_12_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.1 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_12 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-roberta_finetuned_subjqa_movies_2_vlso_en.md b/docs/_posts/ahmedlone127/2024-09-12-roberta_finetuned_subjqa_movies_2_vlso_en.md new file mode 100644 index 00000000000000..26db587301860a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-roberta_finetuned_subjqa_movies_2_vlso_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_finetuned_subjqa_movies_2_vlso RoBertaForQuestionAnswering from vlso +author: John Snow Labs +name: roberta_finetuned_subjqa_movies_2_vlso +date: 2024-09-12 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_subjqa_movies_2_vlso` is a English model originally trained by vlso. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_vlso_en_5.5.0_3.0_1726105966581.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_vlso_en_5.5.0_3.0_1726105966581.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Pair each question with its context before running the span classifier
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_finetuned_subjqa_movies_2_vlso","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.MultiDocumentAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// Pair each question with its context before running the span classifier
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_finetuned_subjqa_movies_2_vlso", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
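+
+Once the pipeline above has been run, the extracted answer span can be pulled out of the `answer` column as sketched below; the column names simply mirror those used in the example.
+
+```python
+pipelineDF.selectExpr(
+    "document_question.result AS question",
+    "answer.result AS answer"
+).show(truncate = False)
+```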
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_subjqa_movies_2_vlso| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/vlso/roberta-finetuned-subjqa-movies_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-roberta_indosquadv2_1691593432_16_2e_06_0_01_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-roberta_indosquadv2_1691593432_16_2e_06_0_01_5_pipeline_en.md new file mode 100644 index 00000000000000..c48125f3b1a04f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-roberta_indosquadv2_1691593432_16_2e_06_0_01_5_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_indosquadv2_1691593432_16_2e_06_0_01_5_pipeline pipeline RoBertaForQuestionAnswering from rizquuula +author: John Snow Labs +name: roberta_indosquadv2_1691593432_16_2e_06_0_01_5_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_indosquadv2_1691593432_16_2e_06_0_01_5_pipeline` is a English model originally trained by rizquuula. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_indosquadv2_1691593432_16_2e_06_0_01_5_pipeline_en_5.5.0_3.0_1726175807702.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_indosquadv2_1691593432_16_2e_06_0_01_5_pipeline_en_5.5.0_3.0_1726175807702.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_indosquadv2_1691593432_16_2e_06_0_01_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_indosquadv2_1691593432_16_2e_06_0_01_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
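+
+Because this pretrained pipeline bundles a MultiDocumentAssembler, the `df` in the snippet above needs separate question and context columns. The sketch below assumes they are named `question` and `context`, which is the usual convention but not confirmed by this card.
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("roberta_indosquadv2_1691593432_16_2e_06_0_01_5_pipeline", lang = "en")
+df = spark.createDataFrame(
+    [["What framework do I use?", "I use spark-nlp."]]
+).toDF("question", "context")
+pipeline.transform(df).show(truncate = False)
+```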
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_indosquadv2_1691593432_16_2e_06_0_01_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|447.2 MB| + +## References + +https://huggingface.co/rizquuula/RoBERTa-IndoSQuADv2_1691593432-16-2e-06-0.01-5 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-roberta_large_finetuned_abrar_en.md b/docs/_posts/ahmedlone127/2024-09-12-roberta_large_finetuned_abrar_en.md new file mode 100644 index 00000000000000..5580a6fee4f747 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-roberta_large_finetuned_abrar_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_finetuned_abrar RoBertaEmbeddings from Transabrar +author: John Snow Labs +name: roberta_large_finetuned_abrar +date: 2024-09-12 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_abrar` is a English model originally trained by Transabrar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_abrar_en_5.5.0_3.0_1726109326397.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_abrar_en_5.5.0_3.0_1726109326397.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_large_finetuned_abrar","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_large_finetuned_abrar","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
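+
+If plain vectors are needed instead of annotation structs, the embeddings produced by the example above can be converted with `EmbeddingsFinisher`, a standard Spark NLP transformer; the column names below simply reuse those from the example.
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+# One vector per token, exploded into individual rows
+finisher.transform(pipelineDF).selectExpr("explode(finished_embeddings) AS vector").show(5)
+```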
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_abrar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Transabrar/roberta-large-finetuned-abrar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-roberta_large_finetuned_squad_seymacakir_en.md b/docs/_posts/ahmedlone127/2024-09-12-roberta_large_finetuned_squad_seymacakir_en.md new file mode 100644 index 00000000000000..8e758995defe23 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-roberta_large_finetuned_squad_seymacakir_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_large_finetuned_squad_seymacakir RoBertaForQuestionAnswering from SeymaCakir +author: John Snow Labs +name: roberta_large_finetuned_squad_seymacakir +date: 2024-09-12 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_squad_seymacakir` is a English model originally trained by SeymaCakir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_squad_seymacakir_en_5.5.0_3.0_1726176366315.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_squad_seymacakir_en_5.5.0_3.0_1726176366315.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Pair each question with its context before running the span classifier
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_large_finetuned_squad_seymacakir","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.MultiDocumentAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// Pair each question with its context before running the span classifier
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_large_finetuned_squad_seymacakir", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_squad_seymacakir| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/SeymaCakir/roberta-large-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-roberta_large_three_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-roberta_large_three_classification_pipeline_en.md new file mode 100644 index 00000000000000..c593ef92eb28e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-roberta_large_three_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_three_classification_pipeline pipeline RoBertaForSequenceClassification from hagara +author: John Snow Labs +name: roberta_large_three_classification_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_three_classification_pipeline` is a English model originally trained by hagara. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_three_classification_pipeline_en_5.5.0_3.0_1726117699892.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_three_classification_pipeline_en_5.5.0_3.0_1726117699892.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_three_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_three_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_three_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/hagara/roberta-large-three-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-roberta_mrqa_old_en.md b/docs/_posts/ahmedlone127/2024-09-12-roberta_mrqa_old_en.md new file mode 100644 index 00000000000000..69a7ff22db7d13 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-roberta_mrqa_old_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_mrqa_old RoBertaForQuestionAnswering from enriquesaou +author: John Snow Labs +name: roberta_mrqa_old +date: 2024-09-12 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_mrqa_old` is a English model originally trained by enriquesaou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_mrqa_old_en_5.5.0_3.0_1726107019891.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_mrqa_old_en_5.5.0_3.0_1726107019891.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Pair each question with its context before running the span classifier
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_mrqa_old","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.MultiDocumentAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// Pair each question with its context before running the span classifier
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_mrqa_old", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_mrqa_old| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|464.9 MB| + +## References + +https://huggingface.co/enriquesaou/roberta-mrqa-old \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-roberta_spanish_v1_en.md b/docs/_posts/ahmedlone127/2024-09-12-roberta_spanish_v1_en.md new file mode 100644 index 00000000000000..2282355120e85f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-roberta_spanish_v1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_spanish_v1 RoBertaForQuestionAnswering from enriquesaou +author: John Snow Labs +name: roberta_spanish_v1 +date: 2024-09-12 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_spanish_v1` is a English model originally trained by enriquesaou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_spanish_v1_en_5.5.0_3.0_1726106625576.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_spanish_v1_en_5.5.0_3.0_1726106625576.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Pair each question with its context before running the span classifier
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_spanish_v1","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.MultiDocumentAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// Pair each question with its context before running the span classifier
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_spanish_v1", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_spanish_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|438.2 MB| + +## References + +https://huggingface.co/enriquesaou/roberta_es_v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-rulebert_v0_2_k0_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-12-rulebert_v0_2_k0_pipeline_it.md new file mode 100644 index 00000000000000..93b55f6ed0e726 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-rulebert_v0_2_k0_pipeline_it.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Italian rulebert_v0_2_k0_pipeline pipeline XlmRoBertaForSequenceClassification from ribesstefano +author: John Snow Labs +name: rulebert_v0_2_k0_pipeline +date: 2024-09-12 +tags: [it, open_source, pipeline, onnx] +task: Text Classification +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rulebert_v0_2_k0_pipeline` is a Italian model originally trained by ribesstefano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rulebert_v0_2_k0_pipeline_it_5.5.0_3.0_1726146446100.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rulebert_v0_2_k0_pipeline_it_5.5.0_3.0_1726146446100.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rulebert_v0_2_k0_pipeline", lang = "it") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rulebert_v0_2_k0_pipeline", lang = "it") +val annotations = pipeline.transform(df) + +``` +
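+
+The card's snippet assumes an existing DataFrame `df` with a `text` column. As a hedged illustration (not part of the original card), it could be built and annotated like this; the sample sentence is only a placeholder:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Any DataFrame with a "text" column works as input
+df = spark.createDataFrame([["Questo è un esempio di testo."]]).toDF("text")
+
+pipeline = PretrainedPipeline("rulebert_v0_2_k0_pipeline", lang = "it")
+annotations = pipeline.transform(df)
+annotations.show(truncate=False)
+```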
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rulebert_v0_2_k0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|870.5 MB| + +## References + +https://huggingface.co/ribesstefano/RuleBert-v0.2-k0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-sent_bert_base_german_dbmdz_cased_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-12-sent_bert_base_german_dbmdz_cased_pipeline_de.md new file mode 100644 index 00000000000000..8c4c69bc6304ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-sent_bert_base_german_dbmdz_cased_pipeline_de.md @@ -0,0 +1,71 @@ +--- +layout: model +title: German sent_bert_base_german_dbmdz_cased_pipeline pipeline BertSentenceEmbeddings from google-bert +author: John Snow Labs +name: sent_bert_base_german_dbmdz_cased_pipeline +date: 2024-09-12 +tags: [de, open_source, pipeline, onnx] +task: Embeddings +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_german_dbmdz_cased_pipeline` is a German model originally trained by google-bert. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_german_dbmdz_cased_pipeline_de_5.5.0_3.0_1726177929003.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_german_dbmdz_cased_pipeline_de_5.5.0_3.0_1726177929003.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_german_dbmdz_cased_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_german_dbmdz_cased_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_german_dbmdz_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|410.4 MB| + +## References + +https://huggingface.co/google-bert/bert-base-german-dbmdz-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-sent_bert_base_qarib60_1970k_ar.md b/docs/_posts/ahmedlone127/2024-09-12-sent_bert_base_qarib60_1970k_ar.md new file mode 100644 index 00000000000000..a415d242b9943a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-sent_bert_base_qarib60_1970k_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic sent_bert_base_qarib60_1970k BertSentenceEmbeddings from qarib +author: John Snow Labs +name: sent_bert_base_qarib60_1970k +date: 2024-09-12 +tags: [ar, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_qarib60_1970k` is a Arabic model originally trained by qarib. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_qarib60_1970k_ar_5.5.0_3.0_1726141102747.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_qarib60_1970k_ar_5.5.0_3.0_1726141102747.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_qarib60_1970k","ar") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_qarib60_1970k","ar") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
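+
+A small, optional check (not in the original card): once `pipelineDF` from the example above is available, the per-sentence vectors in the `embeddings` output column can be inspected like this.
+
+```python
+# Each exploded row holds one sentence-embedding vector
+pipelineDF.selectExpr("explode(embeddings.embeddings) as sentence_embedding").show(truncate=80)
+```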
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_qarib60_1970k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ar| +|Size:|504.9 MB| + +## References + +https://huggingface.co/qarib/bert-base-qarib60_1970k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-sent_turkish_small_bert_uncased_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-12-sent_turkish_small_bert_uncased_pipeline_tr.md new file mode 100644 index 00000000000000..1b7c1c4e7eb64c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-sent_turkish_small_bert_uncased_pipeline_tr.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Turkish sent_turkish_small_bert_uncased_pipeline pipeline BertSentenceEmbeddings from ytu-ce-cosmos +author: John Snow Labs +name: sent_turkish_small_bert_uncased_pipeline +date: 2024-09-12 +tags: [tr, open_source, pipeline, onnx] +task: Embeddings +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_turkish_small_bert_uncased_pipeline` is a Turkish model originally trained by ytu-ce-cosmos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_turkish_small_bert_uncased_pipeline_tr_5.5.0_3.0_1726119337304.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_turkish_small_bert_uncased_pipeline_tr_5.5.0_3.0_1726119337304.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_turkish_small_bert_uncased_pipeline", lang = "tr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_turkish_small_bert_uncased_pipeline", lang = "tr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_turkish_small_bert_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|110.4 MB| + +## References + +https://huggingface.co/ytu-ce-cosmos/turkish-small-bert-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-sentiment_analysis_distilbert_ghylb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-sentiment_analysis_distilbert_ghylb_pipeline_en.md new file mode 100644 index 00000000000000..975f8491e79706 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-sentiment_analysis_distilbert_ghylb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_distilbert_ghylb_pipeline pipeline DistilBertForSequenceClassification from GhylB +author: John Snow Labs +name: sentiment_analysis_distilbert_ghylb_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_distilbert_ghylb_pipeline` is a English model originally trained by GhylB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_distilbert_ghylb_pipeline_en_5.5.0_3.0_1726100339292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_distilbert_ghylb_pipeline_en_5.5.0_3.0_1726100339292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_analysis_distilbert_ghylb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_analysis_distilbert_ghylb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_distilbert_ghylb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/GhylB/Sentiment_Analysis_DistilBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-spamai_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-spamai_pipeline_en.md new file mode 100644 index 00000000000000..a3583eb7979e8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-spamai_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English spamai_pipeline pipeline BertForSequenceClassification from cybert79 +author: John Snow Labs +name: spamai_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spamai_pipeline` is a English model originally trained by cybert79. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spamai_pipeline_en_5.5.0_3.0_1726123203710.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spamai_pipeline_en_5.5.0_3.0_1726123203710.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("spamai_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("spamai_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spamai_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/cybert79/spamai + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-twitter_scratch_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-twitter_scratch_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..7076aab2bb873e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-twitter_scratch_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitter_scratch_roberta_base_pipeline pipeline RoBertaEmbeddings from cardiffnlp +author: John Snow Labs +name: twitter_scratch_roberta_base_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_scratch_roberta_base_pipeline` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_scratch_roberta_base_pipeline_en_5.5.0_3.0_1726109395895.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_scratch_roberta_base_pipeline_en_5.5.0_3.0_1726109395895.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("twitter_scratch_roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("twitter_scratch_roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_scratch_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.0 MB| + +## References + +https://huggingface.co/cardiffnlp/twitter-scratch-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-whisper_base_cv16_hungarian_v2_pipeline_hu.md b/docs/_posts/ahmedlone127/2024-09-12-whisper_base_cv16_hungarian_v2_pipeline_hu.md new file mode 100644 index 00000000000000..067ed5e2e0108a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-whisper_base_cv16_hungarian_v2_pipeline_hu.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hungarian whisper_base_cv16_hungarian_v2_pipeline pipeline WhisperForCTC from Hungarians +author: John Snow Labs +name: whisper_base_cv16_hungarian_v2_pipeline +date: 2024-09-12 +tags: [hu, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_cv16_hungarian_v2_pipeline` is a Hungarian model originally trained by Hungarians. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_cv16_hungarian_v2_pipeline_hu_5.5.0_3.0_1726151920697.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_cv16_hungarian_v2_pipeline_hu_5.5.0_3.0_1726151920697.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_cv16_hungarian_v2_pipeline", lang = "hu") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_cv16_hungarian_v2_pipeline", lang = "hu") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_cv16_hungarian_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hu| +|Size:|638.3 MB| + +## References + +https://huggingface.co/Hungarians/whisper-base-cv16-hu-v2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-whisper_small_afrikaans_za_ptah23_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-whisper_small_afrikaans_za_ptah23_pipeline_en.md new file mode 100644 index 00000000000000..f2fed0f8cbcf34 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-whisper_small_afrikaans_za_ptah23_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_afrikaans_za_ptah23_pipeline pipeline WhisperForCTC from ptah23 +author: John Snow Labs +name: whisper_small_afrikaans_za_ptah23_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_afrikaans_za_ptah23_pipeline` is a English model originally trained by ptah23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_afrikaans_za_ptah23_pipeline_en_5.5.0_3.0_1726150689066.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_afrikaans_za_ptah23_pipeline_en_5.5.0_3.0_1726150689066.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_afrikaans_za_ptah23_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_afrikaans_za_ptah23_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_afrikaans_za_ptah23_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ptah23/whisper-small-af-ZA + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-whisper_small_arabic_2_ar.md b/docs/_posts/ahmedlone127/2024-09-12-whisper_small_arabic_2_ar.md new file mode 100644 index 00000000000000..4beafdfd6ac384 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-whisper_small_arabic_2_ar.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Arabic whisper_small_arabic_2 WhisperForCTC from UAEpro +author: John Snow Labs +name: whisper_small_arabic_2 +date: 2024-09-12 +tags: [ar, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_2` is a Arabic model originally trained by UAEpro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_2_ar_5.5.0_3.0_1726137152667.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_2_ar_5.5.0_3.0_1726137152667.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_arabic_2","ar") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+# `data` is assumed to be a DataFrame with an "audio_content" column
+# holding the raw audio as an array of floats (16 kHz mono).
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_arabic_2", "ar")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+// `data` is assumed to be a DataFrame with an "audio_content" column
+// holding the raw audio as an array of floats (16 kHz mono).
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
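+
+The example above assumes a DataFrame named `data`. A hedged sketch of one way to prepare it (not part of the original card; the file path is a placeholder and `librosa` is just one option for reading audio):
+
+```python
+import librosa
+
+# Whisper expects 16 kHz mono audio as an array of floats
+raw_floats, _ = librosa.load("audio_file.wav", sr=16000)
+data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")
+```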
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/UAEpro/whisper-small-ar-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-whisper_small_taiwanese_yuweiiizz_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-whisper_small_taiwanese_yuweiiizz_pipeline_en.md new file mode 100644 index 00000000000000..71c83e5727408a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-whisper_small_taiwanese_yuweiiizz_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_taiwanese_yuweiiizz_pipeline pipeline WhisperForCTC from yuweiiizz +author: John Snow Labs +name: whisper_small_taiwanese_yuweiiizz_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_taiwanese_yuweiiizz_pipeline` is a English model originally trained by yuweiiizz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_taiwanese_yuweiiizz_pipeline_en_5.5.0_3.0_1726135931062.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_taiwanese_yuweiiizz_pipeline_en_5.5.0_3.0_1726135931062.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_taiwanese_yuweiiizz_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_taiwanese_yuweiiizz_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_taiwanese_yuweiiizz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/yuweiiizz/whisper-small-taiwanese + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-whisper_tiny_english_jonnhan_en.md b/docs/_posts/ahmedlone127/2024-09-12-whisper_tiny_english_jonnhan_en.md new file mode 100644 index 00000000000000..603d0d425700bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-whisper_tiny_english_jonnhan_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_english_jonnhan WhisperForCTC from Jonnhan +author: John Snow Labs +name: whisper_tiny_english_jonnhan +date: 2024-09-12 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_jonnhan` is a English model originally trained by Jonnhan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_jonnhan_en_5.5.0_3.0_1726134921342.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_jonnhan_en_5.5.0_3.0_1726134921342.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_tiny_english_jonnhan","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+# `data` is assumed to be a DataFrame with an "audio_content" column
+# holding the raw audio as an array of floats (16 kHz mono).
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_tiny_english_jonnhan", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+// `data` is assumed to be a DataFrame with an "audio_content" column
+// holding the raw audio as an array of floats (16 kHz mono).
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
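+
+As an optional follow-up (not part of the original card), the transcription can be read from the `text` output column produced above:
+
+```python
+# One transcribed string per input audio row
+pipelineDF.select("text.result").show(truncate=False)
+```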
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_jonnhan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/Jonnhan/whisper-tiny-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_final_vietnam_aug_replace_bert_2_en.md b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_final_vietnam_aug_replace_bert_2_en.md new file mode 100644 index 00000000000000..afef67c8c9f41e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_final_vietnam_aug_replace_bert_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_final_vietnam_aug_replace_bert_2 XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_final_vietnam_aug_replace_bert_2 +date: 2024-09-12 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_final_vietnam_aug_replace_bert_2` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_vietnam_aug_replace_bert_2_en_5.5.0_3.0_1726147119157.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_vietnam_aug_replace_bert_2_en_5.5.0_3.0_1726147119157.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_final_vietnam_aug_replace_bert_2","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_final_vietnam_aug_replace_bert_2", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
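+
+Optionally (not part of the original card), the predicted label for each input row can be read from the `class` output column configured above:
+
+```python
+pipelineDF.select("text", "class.result").show(truncate=False)
+```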
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_final_vietnam_aug_replace_bert_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|794.4 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Final_VietNam-aug_replace_BERT-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_english_sorabe_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_english_sorabe_pipeline_en.md new file mode 100644 index 00000000000000..7423240312782f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_english_sorabe_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_sorabe_pipeline pipeline XlmRoBertaForTokenClassification from SORABE +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_sorabe_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_sorabe_pipeline` is a English model originally trained by SORABE. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sorabe_pipeline_en_5.5.0_3.0_1726164758491.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sorabe_pipeline_en_5.5.0_3.0_1726164758491.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_sorabe_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_sorabe_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_sorabe_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/SORABE/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_french_jx7789_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_french_jx7789_pipeline_en.md new file mode 100644 index 00000000000000..dfdaaa0ef125d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_french_jx7789_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_jx7789_pipeline pipeline XlmRoBertaForTokenClassification from jx7789 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_jx7789_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_jx7789_pipeline` is a English model originally trained by jx7789. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jx7789_pipeline_en_5.5.0_3.0_1726131611153.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jx7789_pipeline_en_5.5.0_3.0_1726131611153.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_jx7789_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_jx7789_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_jx7789_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/jx7789/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_french_pstary_en.md b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_french_pstary_en.md new file mode 100644 index 00000000000000..cc1d993a8ec321 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_french_pstary_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_pstary XlmRoBertaForTokenClassification from Pstary +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_pstary +date: 2024-09-12 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_pstary` is a English model originally trained by Pstary. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_pstary_en_5.5.0_3.0_1726156619423.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_pstary_en_5.5.0_3.0_1726156619423.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_pstary","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_pstary", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
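+
+A short, assumed follow-up (not in the original card): tokens and their predicted tags can be compared side by side using the `token` and `ner` columns produced above.
+
+```python
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```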
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_pstary| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/Pstary/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_french_abdus_en.md b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_french_abdus_en.md new file mode 100644 index 00000000000000..a6c756b3995fb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_french_abdus_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_abdus XlmRoBertaForTokenClassification from abdus +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_abdus +date: 2024-09-12 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_abdus` is a English model originally trained by abdus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_abdus_en_5.5.0_3.0_1726156323696.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_abdus_en_5.5.0_3.0_1726156323696.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_abdus","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_abdus", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_abdus| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/abdus/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_french_amartyobanerjee_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_french_amartyobanerjee_pipeline_en.md new file mode 100644 index 00000000000000..f4b0617514bcb0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_french_amartyobanerjee_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_amartyobanerjee_pipeline pipeline XlmRoBertaForTokenClassification from amartyobanerjee +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_amartyobanerjee_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_amartyobanerjee_pipeline` is a English model originally trained by amartyobanerjee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_amartyobanerjee_pipeline_en_5.5.0_3.0_1726159758933.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_amartyobanerjee_pipeline_en_5.5.0_3.0_1726159758933.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_amartyobanerjee_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_amartyobanerjee_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_amartyobanerjee_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/amartyobanerjee/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_french_khadija267_en.md b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_french_khadija267_en.md new file mode 100644 index 00000000000000..a1d1bbafd49611 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_french_khadija267_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_khadija267 XlmRoBertaForTokenClassification from khadija267 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_khadija267 +date: 2024-09-12 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_khadija267` is a English model originally trained by khadija267. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_khadija267_en_5.5.0_3.0_1726165176560.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_khadija267_en_5.5.0_3.0_1726165176560.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_khadija267","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_khadija267", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_khadija267| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/khadija267/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_french_pstary_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_french_pstary_pipeline_en.md new file mode 100644 index 00000000000000..a1cbca2ed29aad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_french_pstary_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_pstary_pipeline pipeline XlmRoBertaForTokenClassification from Pstary +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_pstary_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_pstary_pipeline` is a English model originally trained by Pstary. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_pstary_pipeline_en_5.5.0_3.0_1726160306217.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_pstary_pipeline_en_5.5.0_3.0_1726160306217.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_pstary_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_pstary_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_pstary_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/Pstary/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_mshirae3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_mshirae3_pipeline_en.md new file mode 100644 index 00000000000000..67477d30ac4621 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_mshirae3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_mshirae3_pipeline pipeline XlmRoBertaForTokenClassification from mshirae3 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_mshirae3_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_mshirae3_pipeline` is a English model originally trained by mshirae3. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mshirae3_pipeline_en_5.5.0_3.0_1726130332967.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mshirae3_pipeline_en_5.5.0_3.0_1726130332967.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_mshirae3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_mshirae3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_mshirae3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/mshirae3/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_hindi_deepaperi_en.md b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_hindi_deepaperi_en.md new file mode 100644 index 00000000000000..5e4b8c1d2a4e20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_hindi_deepaperi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_hindi_deepaperi XlmRoBertaForTokenClassification from DeepaPeri +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_hindi_deepaperi +date: 2024-09-12 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_hindi_deepaperi` is a English model originally trained by DeepaPeri. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_deepaperi_en_5.5.0_3.0_1726116854558.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_deepaperi_en_5.5.0_3.0_1726116854558.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_hindi_deepaperi","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_hindi_deepaperi", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_hindi_deepaperi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|820.4 MB| + +## References + +https://huggingface.co/DeepaPeri/xlm-roberta-base-finetuned-panx-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-akan_1_ak.md b/docs/_posts/ahmedlone127/2024-09-13-akan_1_ak.md new file mode 100644 index 00000000000000..737401a2920445 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-akan_1_ak.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Akan akan_1 WhisperForCTC from devkyle +author: John Snow Labs +name: akan_1 +date: 2024-09-13 +tags: [ak, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ak +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`akan_1` is a Akan model originally trained by devkyle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/akan_1_ak_5.5.0_3.0_1726252437935.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/akan_1_ak_5.5.0_3.0_1726252437935.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("akan_1","ak") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
# "data" is assumed to be a DataFrame with an "audio_content" column
# holding arrays of floats (16 kHz mono audio).
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("akan_1", "ak")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
// "data" is assumed to be a DataFrame with an "audio_content" column of float arrays.
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
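
The `data` frame is left to the reader in the snippet above. A hedged sketch of building it from a local file follows; the `librosa` dependency, the file path, and the 16 kHz sampling rate are assumptions, not part of the original card.

```python
import sparknlp
import librosa

spark = sparknlp.start()

# Load a waveform as 16 kHz floats and place it in the column AudioAssembler reads.
waveform, _ = librosa.load("sample.wav", sr=16000)
data = spark.createDataFrame([[waveform.tolist()]]).toDF("audio_content")
```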
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|akan_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ak| +|Size:|389.9 MB| + +## References + +https://huggingface.co/devkyle/Akan-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-antismetisimlargedata_finetuned_mlm_nepal_bhasa_en.md b/docs/_posts/ahmedlone127/2024-09-13-antismetisimlargedata_finetuned_mlm_nepal_bhasa_en.md new file mode 100644 index 00000000000000..f488eca577376c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-antismetisimlargedata_finetuned_mlm_nepal_bhasa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English antismetisimlargedata_finetuned_mlm_nepal_bhasa BertEmbeddings from Dhanush66 +author: John Snow Labs +name: antismetisimlargedata_finetuned_mlm_nepal_bhasa +date: 2024-09-13 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`antismetisimlargedata_finetuned_mlm_nepal_bhasa` is a English model originally trained by Dhanush66. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/antismetisimlargedata_finetuned_mlm_nepal_bhasa_en_5.5.0_3.0_1726189255244.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/antismetisimlargedata_finetuned_mlm_nepal_bhasa_en_5.5.0_3.0_1726189255244.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("antismetisimlargedata_finetuned_mlm_nepal_bhasa","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("antismetisimlargedata_finetuned_mlm_nepal_bhasa","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|antismetisimlargedata_finetuned_mlm_nepal_bhasa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/Dhanush66/AntismetisimLargedata-finetuned-MLM-NEW \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-babyberta_aochildes_french_wikipedia_french_without_masking_finetuned_squad_en.md b/docs/_posts/ahmedlone127/2024-09-13-babyberta_aochildes_french_wikipedia_french_without_masking_finetuned_squad_en.md new file mode 100644 index 00000000000000..1b23a746be67d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-babyberta_aochildes_french_wikipedia_french_without_masking_finetuned_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English babyberta_aochildes_french_wikipedia_french_without_masking_finetuned_squad RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_aochildes_french_wikipedia_french_without_masking_finetuned_squad +date: 2024-09-13 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_aochildes_french_wikipedia_french_without_masking_finetuned_squad` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_french_wikipedia_french_without_masking_finetuned_squad_en_5.5.0_3.0_1726199000004.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_french_wikipedia_french_without_masking_finetuned_squad_en_5.5.0_3.0_1726199000004.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_aochildes_french_wikipedia_french_without_masking_finetuned_squad","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_aochildes_french_wikipedia_french_without_masking_finetuned_squad", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
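
To read the predicted span out of the `answer` column produced above, a small optional follow-up can be used; only the column name set in the snippet is relied on.

```python
pipelineDF.select("answer.result").show(truncate = False)
```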
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_aochildes_french_wikipedia_french_without_masking_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|32.0 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-aochildes-french_wikipedia_french-without-Masking-finetuned-SQuAD \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-babylm_roberta_base_epoch_10_en.md b/docs/_posts/ahmedlone127/2024-09-13-babylm_roberta_base_epoch_10_en.md new file mode 100644 index 00000000000000..989dceabc84bab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-babylm_roberta_base_epoch_10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English babylm_roberta_base_epoch_10 RoBertaEmbeddings from Raj-Sanjay-Shah +author: John Snow Labs +name: babylm_roberta_base_epoch_10 +date: 2024-09-13 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babylm_roberta_base_epoch_10` is a English model originally trained by Raj-Sanjay-Shah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babylm_roberta_base_epoch_10_en_5.5.0_3.0_1726197290444.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babylm_roberta_base_epoch_10_en_5.5.0_3.0_1726197290444.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("babylm_roberta_base_epoch_10","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("babylm_roberta_base_epoch_10","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babylm_roberta_base_epoch_10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.6 MB| + +## References + +https://huggingface.co/Raj-Sanjay-Shah/babyLM_roberta_base_epoch_10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-bert_ner_custom_vasanth_en.md b/docs/_posts/ahmedlone127/2024-09-13-bert_ner_custom_vasanth_en.md new file mode 100644 index 00000000000000..5a67f0e4532cb4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-bert_ner_custom_vasanth_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_ner_custom_vasanth BertForTokenClassification from Vasanth +author: John Snow Labs +name: bert_ner_custom_vasanth +date: 2024-09-13 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_ner_custom_vasanth` is a English model originally trained by Vasanth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_custom_vasanth_en_5.5.0_3.0_1726268141347.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_custom_vasanth_en_5.5.0_3.0_1726268141347.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = BertForTokenClassification.pretrained("bert_ner_custom_vasanth","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_custom_vasanth", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_custom_vasanth| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Vasanth/bert-ner-custom \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-bert_ner_custom_vasanth_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-bert_ner_custom_vasanth_pipeline_en.md new file mode 100644 index 00000000000000..f8d3d1cf34d1d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-bert_ner_custom_vasanth_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_ner_custom_vasanth_pipeline pipeline BertForTokenClassification from Vasanth +author: John Snow Labs +name: bert_ner_custom_vasanth_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_ner_custom_vasanth_pipeline` is a English model originally trained by Vasanth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_custom_vasanth_pipeline_en_5.5.0_3.0_1726268159839.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_custom_vasanth_pipeline_en_5.5.0_3.0_1726268159839.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_ner_custom_vasanth_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_ner_custom_vasanth_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
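
`df` is assumed to already exist in the snippet above. For quick experiments, `annotate` runs the same pipeline on a plain string; the example sentence below is an assumption.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

sparknlp.start()
pipeline = PretrainedPipeline("bert_ner_custom_vasanth_pipeline", lang = "en")

# annotate() returns a dict with one entry per output column of the pipeline.
result = pipeline.annotate("My name is Clara and I live in Berkeley.")
print(result)
```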
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_custom_vasanth_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Vasanth/bert-ner-custom + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-bertatweetgr_pipeline_el.md b/docs/_posts/ahmedlone127/2024-09-13-bertatweetgr_pipeline_el.md new file mode 100644 index 00000000000000..576cc580584e14 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-bertatweetgr_pipeline_el.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Modern Greek (1453-) bertatweetgr_pipeline pipeline RoBertaEmbeddings from Konstantinos +author: John Snow Labs +name: bertatweetgr_pipeline +date: 2024-09-13 +tags: [el, open_source, pipeline, onnx] +task: Embeddings +language: el +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertatweetgr_pipeline` is a Modern Greek (1453-) model originally trained by Konstantinos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertatweetgr_pipeline_el_5.5.0_3.0_1726197597591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertatweetgr_pipeline_el_5.5.0_3.0_1726197597591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bertatweetgr_pipeline", lang = "el") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bertatweetgr_pipeline", lang = "el") +val annotations = pipeline.transform(df) + +``` +
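
As in the snippet above, an input DataFrame `df` is assumed. A brief sketch for this Greek embeddings pipeline follows; the `text` column name and the sample sentence are assumptions.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
pipeline = PretrainedPipeline("bertatweetgr_pipeline", lang = "el")

df = spark.createDataFrame([["Καλημέρα σε όλους!"]]).toDF("text")
embeddings = pipeline.transform(df)
embeddings.show(truncate = False)
```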
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertatweetgr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|el| +|Size:|312.0 MB| + +## References + +https://huggingface.co/Konstantinos/BERTaTweetGR + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-bleurt_large_512_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-bleurt_large_512_pipeline_en.md new file mode 100644 index 00000000000000..4c5ffabd52e2ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-bleurt_large_512_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bleurt_large_512_pipeline pipeline BertForSequenceClassification from Elron +author: John Snow Labs +name: bleurt_large_512_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bleurt_large_512_pipeline` is a English model originally trained by Elron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bleurt_large_512_pipeline_en_5.5.0_3.0_1726201595866.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bleurt_large_512_pipeline_en_5.5.0_3.0_1726201595866.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bleurt_large_512_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bleurt_large_512_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
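
A hedged sketch of feeding the snippet above: BLEURT-style models score a candidate against a reference, and how that pair is packed into the single text column read by this pipeline is an assumption here, not something the card specifies.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
pipeline = PretrainedPipeline("bleurt_large_512_pipeline", lang = "en")

# Assumed input layout: one "text" column per row.
df = spark.createDataFrame([["The cat sat on the mat."]]).toDF("text")
scores = pipeline.transform(df)
scores.show(truncate = False)
```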
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bleurt_large_512_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Elron/bleurt-large-512 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-bulbert_wiki_bulgarian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-bulbert_wiki_bulgarian_pipeline_en.md new file mode 100644 index 00000000000000..3d3d9cca79a5c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-bulbert_wiki_bulgarian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bulbert_wiki_bulgarian_pipeline pipeline RoBertaEmbeddings from mor40 +author: John Snow Labs +name: bulbert_wiki_bulgarian_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bulbert_wiki_bulgarian_pipeline` is a English model originally trained by mor40. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bulbert_wiki_bulgarian_pipeline_en_5.5.0_3.0_1726197151634.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bulbert_wiki_bulgarian_pipeline_en_5.5.0_3.0_1726197151634.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bulbert_wiki_bulgarian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bulbert_wiki_bulgarian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
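
`df` in the snippet above is assumed to exist already; a short sketch is given below, where the `text` column name and the Bulgarian sample sentence are assumptions.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
pipeline = PretrainedPipeline("bulbert_wiki_bulgarian_pipeline", lang = "en")

df = spark.createDataFrame([["София е столицата на България."]]).toDF("text")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```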
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bulbert_wiki_bulgarian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.8 MB| + +## References + +https://huggingface.co/mor40/BulBERT-wiki-bg + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_maniack_en.md b/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_maniack_en.md new file mode 100644 index 00000000000000..3d658bfb6d41e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_maniack_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_maniack DistilBertForQuestionAnswering from maniack +author: John Snow Labs +name: burmese_awesome_qa_model_maniack +date: 2024-09-13 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_maniack` is a English model originally trained by maniack. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_maniack_en_5.5.0_3.0_1726245512994.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_maniack_en_5.5.0_3.0_1726245512994.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_maniack","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_maniack", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_maniack| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/maniack/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_mikeldiez_en.md b/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_mikeldiez_en.md new file mode 100644 index 00000000000000..f40f519921ffd8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_mikeldiez_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_mikeldiez DistilBertForQuestionAnswering from mikeldiez +author: John Snow Labs +name: burmese_awesome_qa_model_mikeldiez +date: 2024-09-13 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_mikeldiez` is a English model originally trained by mikeldiez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_mikeldiez_en_5.5.0_3.0_1726267117335.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_mikeldiez_en_5.5.0_3.0_1726267117335.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_mikeldiez","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_mikeldiez", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_mikeldiez| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|505.4 MB| + +## References + +https://huggingface.co/mikeldiez/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_mikeldiez_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_mikeldiez_pipeline_en.md new file mode 100644 index 00000000000000..b8e24c5aaf84b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_mikeldiez_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_mikeldiez_pipeline pipeline DistilBertForQuestionAnswering from mikeldiez +author: John Snow Labs +name: burmese_awesome_qa_model_mikeldiez_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_mikeldiez_pipeline` is a English model originally trained by mikeldiez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_mikeldiez_pipeline_en_5.5.0_3.0_1726267139370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_mikeldiez_pipeline_en_5.5.0_3.0_1726267139370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_mikeldiez_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_mikeldiez_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_mikeldiez_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|505.4 MB| + +## References + +https://huggingface.co/mikeldiez/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_samira1234_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_samira1234_pipeline_en.md new file mode 100644 index 00000000000000..1aee7acdce2cc0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_samira1234_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_samira1234_pipeline pipeline DistilBertForQuestionAnswering from samira1234 +author: John Snow Labs +name: burmese_awesome_qa_model_samira1234_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_samira1234_pipeline` is a English model originally trained by samira1234. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_samira1234_pipeline_en_5.5.0_3.0_1726266934955.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_samira1234_pipeline_en_5.5.0_3.0_1726266934955.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_samira1234_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_samira1234_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_samira1234_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/samira1234/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-crowspairs_trainer_roberta_large_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-crowspairs_trainer_roberta_large_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..d6c952a122b408 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-crowspairs_trainer_roberta_large_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English crowspairs_trainer_roberta_large_finetuned_pipeline pipeline RoBertaForSequenceClassification from henryscheible +author: John Snow Labs +name: crowspairs_trainer_roberta_large_finetuned_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`crowspairs_trainer_roberta_large_finetuned_pipeline` is a English model originally trained by henryscheible. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/crowspairs_trainer_roberta_large_finetuned_pipeline_en_5.5.0_3.0_1726247117619.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/crowspairs_trainer_roberta_large_finetuned_pipeline_en_5.5.0_3.0_1726247117619.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("crowspairs_trainer_roberta_large_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("crowspairs_trainer_roberta_large_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
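
The `df` in the snippet above is assumed to exist. For a quick single-sentence check, `annotate` avoids building a DataFrame; the example sentence is an assumption.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

sparknlp.start()
pipeline = PretrainedPipeline("crowspairs_trainer_roberta_large_finetuned_pipeline", lang = "en")

# Returns a dict keyed by the pipeline's output columns.
result = pipeline.annotate("The engineer fixed the bug before the release.")
print(result)
```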
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|crowspairs_trainer_roberta_large_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/henryscheible/crowspairs_trainer_roberta-large_finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-das_rest2cam_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-das_rest2cam_pipeline_en.md new file mode 100644 index 00000000000000..d998486a40b39d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-das_rest2cam_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English das_rest2cam_pipeline pipeline RoBertaEmbeddings from UIC-Liu-Lab +author: John Snow Labs +name: das_rest2cam_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`das_rest2cam_pipeline` is a English model originally trained by UIC-Liu-Lab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/das_rest2cam_pipeline_en_5.5.0_3.0_1726264988140.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/das_rest2cam_pipeline_en_5.5.0_3.0_1726264988140.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("das_rest2cam_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("das_rest2cam_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
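
The snippet above leaves `df` undefined; a minimal sketch is shown below, where the `text` column and the review-style sentence are assumptions.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
pipeline = PretrainedPipeline("das_rest2cam_pipeline", lang = "en")

df = spark.createDataFrame([["The food was great and the service was friendly."]]).toDF("text")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```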
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|das_rest2cam_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/UIC-Liu-Lab/DAS-Rest2Cam + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-deberta_amazon_reviews_v1_westernmonster_en.md b/docs/_posts/ahmedlone127/2024-09-13-deberta_amazon_reviews_v1_westernmonster_en.md new file mode 100644 index 00000000000000..ba779c6ab9627f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-deberta_amazon_reviews_v1_westernmonster_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deberta_amazon_reviews_v1_westernmonster DeBertaForSequenceClassification from westernmonster +author: John Snow Labs +name: deberta_amazon_reviews_v1_westernmonster +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_amazon_reviews_v1_westernmonster` is a English model originally trained by westernmonster. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_amazon_reviews_v1_westernmonster_en_5.5.0_3.0_1726190248635.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_amazon_reviews_v1_westernmonster_en_5.5.0_3.0_1726190248635.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_amazon_reviews_v1_westernmonster","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_amazon_reviews_v1_westernmonster", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_amazon_reviews_v1_westernmonster| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|621.2 MB| + +## References + +https://huggingface.co/westernmonster/deberta_amazon_reviews_v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-deberta_v3_base_civil_comments_wilds_10k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-deberta_v3_base_civil_comments_wilds_10k_pipeline_en.md new file mode 100644 index 00000000000000..778db89bfa6705 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-deberta_v3_base_civil_comments_wilds_10k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English deberta_v3_base_civil_comments_wilds_10k_pipeline pipeline DeBertaForSequenceClassification from shlomihod +author: John Snow Labs +name: deberta_v3_base_civil_comments_wilds_10k_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_base_civil_comments_wilds_10k_pipeline` is a English model originally trained by shlomihod. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_base_civil_comments_wilds_10k_pipeline_en_5.5.0_3.0_1726270920386.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_base_civil_comments_wilds_10k_pipeline_en_5.5.0_3.0_1726270920386.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deberta_v3_base_civil_comments_wilds_10k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deberta_v3_base_civil_comments_wilds_10k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
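
`df` is assumed in the snippet above. A short sketch for classifying a single comment follows; the `text` column name and the example text are assumptions.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
pipeline = PretrainedPipeline("deberta_v3_base_civil_comments_wilds_10k_pipeline", lang = "en")

df = spark.createDataFrame([["Thanks for the thoughtful reply, I appreciate it."]]).toDF("text")
predictions = pipeline.transform(df)
predictions.show(truncate = False)
```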
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_base_civil_comments_wilds_10k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|623.8 MB| + +## References + +https://huggingface.co/shlomihod/deberta-v3-base-civil-comments-wilds-10k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-deberta_v3_large_survey_related_passage_consistency_rater_half_gpt4_en.md b/docs/_posts/ahmedlone127/2024-09-13-deberta_v3_large_survey_related_passage_consistency_rater_half_gpt4_en.md new file mode 100644 index 00000000000000..d4599c41d8db46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-deberta_v3_large_survey_related_passage_consistency_rater_half_gpt4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deberta_v3_large_survey_related_passage_consistency_rater_half_gpt4 DeBertaForSequenceClassification from domenicrosati +author: John Snow Labs +name: deberta_v3_large_survey_related_passage_consistency_rater_half_gpt4 +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_large_survey_related_passage_consistency_rater_half_gpt4` is a English model originally trained by domenicrosati. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_large_survey_related_passage_consistency_rater_half_gpt4_en_5.5.0_3.0_1726199795612.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_large_survey_related_passage_consistency_rater_half_gpt4_en_5.5.0_3.0_1726199795612.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_large_survey_related_passage_consistency_rater_half_gpt4","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_large_survey_related_passage_consistency_rater_half_gpt4", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_large_survey_related_passage_consistency_rater_half_gpt4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/domenicrosati/deberta-v3-large-survey-related_passage_consistency-rater-half-gpt4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_dmitry2000_en.md b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_dmitry2000_en.md new file mode 100644 index 00000000000000..258e3c284e0376 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_dmitry2000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_dmitry2000 DistilBertForSequenceClassification from Dmitry2000 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_dmitry2000 +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_dmitry2000` is a English model originally trained by Dmitry2000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_dmitry2000_en_5.5.0_3.0_1726262739526.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_dmitry2000_en_5.5.0_3.0_1726262739526.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_dmitry2000","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_dmitry2000", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_dmitry2000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Dmitry2000/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_igniter909_en.md b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_igniter909_en.md new file mode 100644 index 00000000000000..8ab47689809f6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_igniter909_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_igniter909 DistilBertForSequenceClassification from Igniter909 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_igniter909 +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_igniter909` is a English model originally trained by Igniter909. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_igniter909_en_5.5.0_3.0_1726242656901.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_igniter909_en_5.5.0_3.0_1726242656901.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_igniter909","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_igniter909", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_igniter909| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Igniter909/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_squad_likhith231_en.md b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_squad_likhith231_en.md new file mode 100644 index 00000000000000..0696367d9e15c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_squad_likhith231_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_likhith231 DistilBertForQuestionAnswering from likhith231 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_likhith231 +date: 2024-09-13 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_likhith231` is a English model originally trained by likhith231. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_likhith231_en_5.5.0_3.0_1726245491731.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_likhith231_en_5.5.0_3.0_1726245491731.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_likhith231","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_likhith231", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
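The extracted answer span ends up in the `answer` annotation column. A minimal sketch for inspecting it, assuming `pipelineDF` was produced by the Python pipeline above:

```python
# Show each question and its context together with the extracted answer text.
pipelineDF.select("question", "context", "answer.result").show(truncate=False)
```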
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_likhith231| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/likhith231/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_squad_nahomk_en.md b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_squad_nahomk_en.md new file mode 100644 index 00000000000000..5daf23031af71a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_squad_nahomk_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_nahomk DistilBertForQuestionAnswering from nahomk +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_nahomk +date: 2024-09-13 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_nahomk` is a English model originally trained by nahomk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_nahomk_en_5.5.0_3.0_1726245348722.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_nahomk_en_5.5.0_3.0_1726245348722.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_nahomk","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_nahomk", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_nahomk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/nahomk/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-distilbert_cased_reviews_v1_en.md b/docs/_posts/ahmedlone127/2024-09-13-distilbert_cased_reviews_v1_en.md new file mode 100644 index 00000000000000..9ac6f0165db0f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-distilbert_cased_reviews_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_cased_reviews_v1 DistilBertForSequenceClassification from Asteriks +author: John Snow Labs +name: distilbert_cased_reviews_v1 +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_cased_reviews_v1` is a English model originally trained by Asteriks. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_cased_reviews_v1_en_5.5.0_3.0_1726242772995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_cased_reviews_v1_en_5.5.0_3.0_1726242772995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_cased_reviews_v1","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_cased_reviews_v1", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_cased_reviews_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.1 MB| + +## References + +https://huggingface.co/Asteriks/distilbert-cased-reviews-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-distilbert_uncased_finetuned_ecommerce_reviews_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-distilbert_uncased_finetuned_ecommerce_reviews_pipeline_en.md new file mode 100644 index 00000000000000..a9f3b870020560 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-distilbert_uncased_finetuned_ecommerce_reviews_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_uncased_finetuned_ecommerce_reviews_pipeline pipeline DistilBertForSequenceClassification from sayandg +author: John Snow Labs +name: distilbert_uncased_finetuned_ecommerce_reviews_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_uncased_finetuned_ecommerce_reviews_pipeline` is a English model originally trained by sayandg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_uncased_finetuned_ecommerce_reviews_pipeline_en_5.5.0_3.0_1726242813008.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_uncased_finetuned_ecommerce_reviews_pipeline_en_5.5.0_3.0_1726242813008.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_uncased_finetuned_ecommerce_reviews_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_uncased_finetuned_ecommerce_reviews_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
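For quick checks on a handful of strings, the same pretrained pipeline can also be driven through `annotate` instead of a DataFrame. A minimal sketch (the review text is only a placeholder, and the keys of the returned dictionary are simply whatever output columns the exported pipeline defines):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_uncased_finetuned_ecommerce_reviews_pipeline", lang="en")

# annotate() runs the whole pipeline on a single string and returns a dict
# keyed by the pipeline's output columns (for example document, token, class).
result = pipeline.annotate("The delivery was quick and the product works well")
print(result)
```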
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_uncased_finetuned_ecommerce_reviews_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sayandg/distilbert_uncased_finetuned_ecommerce_reviews + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-distilroberta_nli_en.md b/docs/_posts/ahmedlone127/2024-09-13-distilroberta_nli_en.md new file mode 100644 index 00000000000000..d73ade60bb9420 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-distilroberta_nli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_nli RoBertaForSequenceClassification from AdamCodd +author: John Snow Labs +name: distilroberta_nli +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_nli` is a English model originally trained by AdamCodd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_nli_en_5.5.0_3.0_1726247921468.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_nli_en_5.5.0_3.0_1726247921468.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_nli","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_nli", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_nli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/AdamCodd/distilroberta-NLI \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-finetuned_demo_2_alessioantonelli_en.md b/docs/_posts/ahmedlone127/2024-09-13-finetuned_demo_2_alessioantonelli_en.md new file mode 100644 index 00000000000000..c780bb191b2a04 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-finetuned_demo_2_alessioantonelli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_demo_2_alessioantonelli DistilBertForSequenceClassification from alessioantonelli +author: John Snow Labs +name: finetuned_demo_2_alessioantonelli +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_demo_2_alessioantonelli` is a English model originally trained by alessioantonelli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_demo_2_alessioantonelli_en_5.5.0_3.0_1726242266205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_demo_2_alessioantonelli_en_5.5.0_3.0_1726242266205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_demo_2_alessioantonelli","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_demo_2_alessioantonelli", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_demo_2_alessioantonelli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/alessioantonelli/finetuned_demo_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-finetuned_sail2017_additionalpretrained_indic_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-finetuned_sail2017_additionalpretrained_indic_bert_pipeline_en.md new file mode 100644 index 00000000000000..6c67992fe950f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-finetuned_sail2017_additionalpretrained_indic_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_sail2017_additionalpretrained_indic_bert_pipeline pipeline AlbertForSequenceClassification from aditeyabaral +author: John Snow Labs +name: finetuned_sail2017_additionalpretrained_indic_bert_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_sail2017_additionalpretrained_indic_bert_pipeline` is a English model originally trained by aditeyabaral. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_sail2017_additionalpretrained_indic_bert_pipeline_en_5.5.0_3.0_1726188305415.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_sail2017_additionalpretrained_indic_bert_pipeline_en_5.5.0_3.0_1726188305415.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("finetuned_sail2017_additionalpretrained_indic_bert_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuned_sail2017_additionalpretrained_indic_bert_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_sail2017_additionalpretrained_indic_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|127.8 MB| + +## References + +https://huggingface.co/aditeyabaral/finetuned-sail2017-additionalpretrained-indic-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-finetuning_sentiment_model_ophelia_3_1_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-finetuning_sentiment_model_ophelia_3_1_0_pipeline_en.md new file mode 100644 index 00000000000000..adf44974d17faf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-finetuning_sentiment_model_ophelia_3_1_0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_ophelia_3_1_0_pipeline pipeline DistilBertForSequenceClassification from Razafaheem +author: John Snow Labs +name: finetuning_sentiment_model_ophelia_3_1_0_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_ophelia_3_1_0_pipeline` is a English model originally trained by Razafaheem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_ophelia_3_1_0_pipeline_en_5.5.0_3.0_1726262561299.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_ophelia_3_1_0_pipeline_en_5.5.0_3.0_1726262561299.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("finetuning_sentiment_model_ophelia_3_1_0_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_ophelia_3_1_0_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_ophelia_3_1_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Razafaheem/finetuning-sentiment-model-ophelia-3.1.0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-gqa_roberta_german_legal_squad_part_augmented_17_de.md b/docs/_posts/ahmedlone127/2024-09-13-gqa_roberta_german_legal_squad_part_augmented_17_de.md new file mode 100644 index 00000000000000..58fcee133cbf36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-gqa_roberta_german_legal_squad_part_augmented_17_de.md @@ -0,0 +1,86 @@ +--- +layout: model +title: German gqa_roberta_german_legal_squad_part_augmented_17 RoBertaForQuestionAnswering from farid1088 +author: John Snow Labs +name: gqa_roberta_german_legal_squad_part_augmented_17 +date: 2024-09-13 +tags: [de, open_source, onnx, question_answering, roberta] +task: Question Answering +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gqa_roberta_german_legal_squad_part_augmented_17` is a German model originally trained by farid1088. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gqa_roberta_german_legal_squad_part_augmented_17_de_5.5.0_3.0_1726199051540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gqa_roberta_german_legal_squad_part_augmented_17_de_5.5.0_3.0_1726199051540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("gqa_roberta_german_legal_squad_part_augmented_17","de") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("gqa_roberta_german_legal_squad_part_augmented_17", "de")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gqa_roberta_german_legal_squad_part_augmented_17| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|de| +|Size:|465.8 MB| + +## References + +https://huggingface.co/farid1088/GQA_RoBERTa_German_legal_SQuAD_part_augmented_17 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-gqa_roberta_german_legal_squad_part_augmented_2000_de.md b/docs/_posts/ahmedlone127/2024-09-13-gqa_roberta_german_legal_squad_part_augmented_2000_de.md new file mode 100644 index 00000000000000..fb7ae226ae8d4e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-gqa_roberta_german_legal_squad_part_augmented_2000_de.md @@ -0,0 +1,86 @@ +--- +layout: model +title: German gqa_roberta_german_legal_squad_part_augmented_2000 RoBertaForQuestionAnswering from farid1088 +author: John Snow Labs +name: gqa_roberta_german_legal_squad_part_augmented_2000 +date: 2024-09-13 +tags: [de, open_source, onnx, question_answering, roberta] +task: Question Answering +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gqa_roberta_german_legal_squad_part_augmented_2000` is a German model originally trained by farid1088. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gqa_roberta_german_legal_squad_part_augmented_2000_de_5.5.0_3.0_1726231395031.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gqa_roberta_german_legal_squad_part_augmented_2000_de_5.5.0_3.0_1726231395031.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("gqa_roberta_german_legal_squad_part_augmented_2000","de") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("gqa_roberta_german_legal_squad_part_augmented_2000", "de")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gqa_roberta_german_legal_squad_part_augmented_2000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|de| +|Size:|465.8 MB| + +## References + +https://huggingface.co/farid1088/GQA_RoBERTa_German_legal_SQuAD_part_augmented_2000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-horai_medium_10k_v4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-horai_medium_10k_v4_pipeline_en.md new file mode 100644 index 00000000000000..97675d78927d9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-horai_medium_10k_v4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English horai_medium_10k_v4_pipeline pipeline RoBertaForSequenceClassification from stealthwriter +author: John Snow Labs +name: horai_medium_10k_v4_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`horai_medium_10k_v4_pipeline` is a English model originally trained by stealthwriter. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/horai_medium_10k_v4_pipeline_en_5.5.0_3.0_1726187577444.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/horai_medium_10k_v4_pipeline_en_5.5.0_3.0_1726187577444.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("horai_medium_10k_v4_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("horai_medium_10k_v4_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|horai_medium_10k_v4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.9 MB| + +## References + +https://huggingface.co/stealthwriter/HorAI-medium-10k-V4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-hunembert3_pipeline_hu.md b/docs/_posts/ahmedlone127/2024-09-13-hunembert3_pipeline_hu.md new file mode 100644 index 00000000000000..dc8d445f8405f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-hunembert3_pipeline_hu.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Hungarian hunembert3_pipeline pipeline BertForSequenceClassification from poltextlab +author: John Snow Labs +name: hunembert3_pipeline +date: 2024-09-13 +tags: [hu, open_source, pipeline, onnx] +task: Text Classification +language: hu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hunembert3_pipeline` is a Hungarian model originally trained by poltextlab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hunembert3_pipeline_hu_5.5.0_3.0_1726201703167.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hunembert3_pipeline_hu_5.5.0_3.0_1726201703167.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("hunembert3_pipeline", lang = "hu")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("hunembert3_pipeline", lang = "hu")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hunembert3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hu| +|Size:|414.7 MB| + +## References + +https://huggingface.co/poltextlab/HunEmBERT3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-job_compatibility_model_en.md b/docs/_posts/ahmedlone127/2024-09-13-job_compatibility_model_en.md new file mode 100644 index 00000000000000..94ec588d337f0d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-job_compatibility_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English job_compatibility_model DistilBertForSequenceClassification from DaJulster +author: John Snow Labs +name: job_compatibility_model +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`job_compatibility_model` is a English model originally trained by DaJulster. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/job_compatibility_model_en_5.5.0_3.0_1726262430661.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/job_compatibility_model_en_5.5.0_3.0_1726262430661.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("job_compatibility_model","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("job_compatibility_model", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|job_compatibility_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DaJulster/Job_compatibility_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-lab1_finetuning_reshphil_en.md b/docs/_posts/ahmedlone127/2024-09-13-lab1_finetuning_reshphil_en.md new file mode 100644 index 00000000000000..26e452f30186de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-lab1_finetuning_reshphil_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English lab1_finetuning_reshphil MarianTransformer from Reshphil +author: John Snow Labs +name: lab1_finetuning_reshphil +date: 2024-09-13 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab1_finetuning_reshphil` is a English model originally trained by Reshphil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab1_finetuning_reshphil_en_5.5.0_3.0_1726191723727.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab1_finetuning_reshphil_en_5.5.0_3.0_1726191723727.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("lab1_finetuning_reshphil","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("lab1_finetuning_reshphil", "en")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
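Each detected sentence yields one entry in the `translation` annotation column, so the output is easiest to read once it is flattened. A minimal sketch, assuming `pipelineDF` comes from the Python pipeline above:

```python
from pyspark.sql.functions import explode

# One row per translated sentence; "translation.result" holds the translated strings.
pipelineDF.select(explode("translation.result").alias("translated")).show(truncate=False)
```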
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab1_finetuning_reshphil| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.1 MB| + +## References + +https://huggingface.co/Reshphil/lab1_finetuning \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-mach_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-mach_2_pipeline_en.md new file mode 100644 index 00000000000000..0a03136f488c5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-mach_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mach_2_pipeline pipeline RoBertaForSequenceClassification from BaronSch +author: John Snow Labs +name: mach_2_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mach_2_pipeline` is a English model originally trained by BaronSch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mach_2_pipeline_en_5.5.0_3.0_1726187610389.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mach_2_pipeline_en_5.5.0_3.0_1726187610389.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("mach_2_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("mach_2_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mach_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/BaronSch/Mach_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-marian_finetuned_kde4_english_tonga_tonga_islands_french_billzhou1888_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-marian_finetuned_kde4_english_tonga_tonga_islands_french_billzhou1888_pipeline_en.md new file mode 100644 index 00000000000000..2eff3f955b3354 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-marian_finetuned_kde4_english_tonga_tonga_islands_french_billzhou1888_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_billzhou1888_pipeline pipeline MarianTransformer from billzhou1888 +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_billzhou1888_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_billzhou1888_pipeline` is a English model originally trained by billzhou1888. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_billzhou1888_pipeline_en_5.5.0_3.0_1726191296792.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_billzhou1888_pipeline_en_5.5.0_3.0_1726191296792.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_billzhou1888_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_billzhou1888_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_billzhou1888_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.8 MB| + +## References + +https://huggingface.co/billzhou1888/marian-finetuned-kde4-en-to-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-marianmt_finetuned_english_vietnamese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-marianmt_finetuned_english_vietnamese_pipeline_en.md new file mode 100644 index 00000000000000..d7c3a81494b361 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-marianmt_finetuned_english_vietnamese_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marianmt_finetuned_english_vietnamese_pipeline pipeline MarianTransformer from lmh2011 +author: John Snow Labs +name: marianmt_finetuned_english_vietnamese_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marianmt_finetuned_english_vietnamese_pipeline` is a English model originally trained by lmh2011. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marianmt_finetuned_english_vietnamese_pipeline_en_5.5.0_3.0_1726191481043.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marianmt_finetuned_english_vietnamese_pipeline_en_5.5.0_3.0_1726191481043.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("marianmt_finetuned_english_vietnamese_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("marianmt_finetuned_english_vietnamese_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marianmt_finetuned_english_vietnamese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|475.1 MB| + +## References + +https://huggingface.co/lmh2011/marianMT-finetuned-en-vi + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-mdeberta_v3_base_assin2_entailment_pt.md b/docs/_posts/ahmedlone127/2024-09-13-mdeberta_v3_base_assin2_entailment_pt.md new file mode 100644 index 00000000000000..ae4a9b28f6c36d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-mdeberta_v3_base_assin2_entailment_pt.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Portuguese mdeberta_v3_base_assin2_entailment DeBertaForSequenceClassification from ruanchaves +author: John Snow Labs +name: mdeberta_v3_base_assin2_entailment +date: 2024-09-13 +tags: [pt, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mdeberta_v3_base_assin2_entailment` is a Portuguese model originally trained by ruanchaves. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mdeberta_v3_base_assin2_entailment_pt_5.5.0_3.0_1726260457649.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mdeberta_v3_base_assin2_entailment_pt_5.5.0_3.0_1726260457649.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DeBertaForSequenceClassification.pretrained("mdeberta_v3_base_assin2_entailment","pt") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DeBertaForSequenceClassification.pretrained("mdeberta_v3_base_assin2_entailment", "pt")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
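Besides the winning label, each classification annotation carries per-class scores in its metadata map. A minimal sketch, assuming `pipelineDF` comes from the Python pipeline above (the label names themselves depend on the fine-tuned model and are not listed here):

```python
# "class.result" is the predicted label; "class.metadata" holds the score map per annotation.
pipelineDF.select("class.result", "class.metadata").show(truncate=False)
```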
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mdeberta_v3_base_assin2_entailment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|pt| +|Size:|836.8 MB| + +## References + +https://huggingface.co/ruanchaves/mdeberta-v3-base-assin2-entailment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-mlm_cl_descreption_epochs_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-mlm_cl_descreption_epochs_5_pipeline_en.md new file mode 100644 index 00000000000000..fdf906bbdf7b8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-mlm_cl_descreption_epochs_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mlm_cl_descreption_epochs_5_pipeline pipeline DistilBertEmbeddings from Milad1b +author: John Snow Labs +name: mlm_cl_descreption_epochs_5_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mlm_cl_descreption_epochs_5_pipeline` is a English model originally trained by Milad1b. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mlm_cl_descreption_epochs_5_pipeline_en_5.5.0_3.0_1726192875617.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mlm_cl_descreption_epochs_5_pipeline_en_5.5.0_3.0_1726192875617.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("mlm_cl_descreption_epochs_5_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("mlm_cl_descreption_epochs_5_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mlm_cl_descreption_epochs_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|505.3 MB| + +## References + +https://huggingface.co/Milad1b/MLM_CL_descreption_epochs-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-nlp_hf_workshop_mohammadhabp_en.md b/docs/_posts/ahmedlone127/2024-09-13-nlp_hf_workshop_mohammadhabp_en.md new file mode 100644 index 00000000000000..400c2c1898af1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-nlp_hf_workshop_mohammadhabp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nlp_hf_workshop_mohammadhabp DistilBertForSequenceClassification from mohammadhabp +author: John Snow Labs +name: nlp_hf_workshop_mohammadhabp +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_hf_workshop_mohammadhabp` is a English model originally trained by mohammadhabp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop_mohammadhabp_en_5.5.0_3.0_1726262006560.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop_mohammadhabp_en_5.5.0_3.0_1726262006560.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp_hf_workshop_mohammadhabp","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp_hf_workshop_mohammadhabp", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_hf_workshop_mohammadhabp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/mohammadhabp/NLP_HF_Workshop \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_guc_en.md b/docs/_posts/ahmedlone127/2024-09-13-opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_guc_en.md new file mode 100644 index 00000000000000..18d1fee49fd722 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_guc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_guc MarianTransformer from mekjr1 +author: John Snow Labs +name: opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_guc +date: 2024-09-13 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_guc` is a English model originally trained by mekjr1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_guc_en_5.5.0_3.0_1726269564467.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_guc_en_5.5.0_3.0_1726269564467.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_guc","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_guc","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
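+
+To read the generated translations after `transform`, the `translation` annotation column set above can be flattened, for example:
+
+```python
+# Show the translated strings produced by the MarianTransformer stage above.
+pipelineDF.select("translation.result").show(truncate=False)
+```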
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_guc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|539.9 MB| + +## References + +https://huggingface.co/mekjr1/opus-mt-en-es-finetuned-es-to-guc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-pubchem10m_smiles_bpe_396_250_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-pubchem10m_smiles_bpe_396_250_pipeline_en.md new file mode 100644 index 00000000000000..20d3452ca75243 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-pubchem10m_smiles_bpe_396_250_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English pubchem10m_smiles_bpe_396_250_pipeline pipeline RoBertaEmbeddings from seyonec +author: John Snow Labs +name: pubchem10m_smiles_bpe_396_250_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pubchem10m_smiles_bpe_396_250_pipeline` is a English model originally trained by seyonec. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pubchem10m_smiles_bpe_396_250_pipeline_en_5.5.0_3.0_1726185785260.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pubchem10m_smiles_bpe_396_250_pipeline_en_5.5.0_3.0_1726185785260.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("pubchem10m_smiles_bpe_396_250_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("pubchem10m_smiles_bpe_396_250_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
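+
+Because this model was trained on SMILES strings rather than natural language, `df` in the snippets above would normally contain molecular strings. A minimal sketch (the example molecule and the `text` column name are illustrative assumptions):
+
+```python
+# Minimal sketch: a DataFrame of SMILES strings for the embeddings pipeline above.
+import sparknlp
+
+spark = sparknlp.start()
+df = spark.createDataFrame([["CC(=O)OC1=CC=CC=C1C(=O)O"]]).toDF("text")  # aspirin, as an illustrative SMILES
+```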
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pubchem10m_smiles_bpe_396_250_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.0 MB| + +## References + +https://huggingface.co/seyonec/PubChem10M_SMILES_BPE_396_250 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-q06_kaggle_debertav2_01_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-q06_kaggle_debertav2_01_pipeline_en.md new file mode 100644 index 00000000000000..479e4b918dfd30 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-q06_kaggle_debertav2_01_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English q06_kaggle_debertav2_01_pipeline pipeline DeBertaForSequenceClassification from wallacenpj +author: John Snow Labs +name: q06_kaggle_debertav2_01_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`q06_kaggle_debertav2_01_pipeline` is a English model originally trained by wallacenpj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/q06_kaggle_debertav2_01_pipeline_en_5.5.0_3.0_1726190519689.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/q06_kaggle_debertav2_01_pipeline_en_5.5.0_3.0_1726190519689.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("q06_kaggle_debertav2_01_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("q06_kaggle_debertav2_01_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
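+
+Once the pipeline has run, the predicted label can be read from the classifier's annotation column. A short optional check, where the `class` column name is an assumption based on the DeBertaForSequenceClassification stage listed below, not something stated on this card:
+
+```python
+# Inspect predicted labels; "class" is an assumed output column name.
+annotations.select("class.result").show(truncate=False)
+```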
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|q06_kaggle_debertav2_01_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|559.0 MB| + +## References + +https://huggingface.co/wallacenpj/q06_kaggle_debertav2_01 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-qa_model_zehralx_en.md b/docs/_posts/ahmedlone127/2024-09-13-qa_model_zehralx_en.md new file mode 100644 index 00000000000000..122046df67e6af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-qa_model_zehralx_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English qa_model_zehralx DistilBertForQuestionAnswering from zehralx +author: John Snow Labs +name: qa_model_zehralx +date: 2024-09-13 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_model_zehralx` is a English model originally trained by zehralx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_model_zehralx_en_5.5.0_3.0_1726266774060.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_model_zehralx_en_5.5.0_3.0_1726266774060.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("qa_model_zehralx","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("qa_model_zehralx", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
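+
+The predicted answer spans end up in the `answer` annotation column set above; to print them:
+
+```python
+# Show the answer extracted for each question/context pair.
+pipelineDF.select("answer.result").show(truncate=False)
+```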
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_model_zehralx| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/zehralx/qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-qa_refined_questions_and_data_14k_15_08_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-qa_refined_questions_and_data_14k_15_08_pipeline_en.md new file mode 100644 index 00000000000000..1a3d5da25fb6b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-qa_refined_questions_and_data_14k_15_08_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English qa_refined_questions_and_data_14k_15_08_pipeline pipeline RoBertaForQuestionAnswering from am-infoweb +author: John Snow Labs +name: qa_refined_questions_and_data_14k_15_08_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_refined_questions_and_data_14k_15_08_pipeline` is a English model originally trained by am-infoweb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_refined_questions_and_data_14k_15_08_pipeline_en_5.5.0_3.0_1726198969380.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_refined_questions_and_data_14k_15_08_pipeline_en_5.5.0_3.0_1726198969380.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("qa_refined_questions_and_data_14k_15_08_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("qa_refined_questions_and_data_14k_15_08_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
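+
+Since this pipeline starts with a MultiDocumentAssembler (see Included Models below), `df` typically needs one column for the question and one for the context. A minimal sketch, where the exact column names are an assumption rather than something stated on this card:
+
+```python
+# Minimal sketch: question/context input for the QA pipeline above.
+# The "question" and "context" column names are assumptions.
+import sparknlp
+
+spark = sparknlp.start()
+df = spark.createDataFrame(
+    [["What framework do I use?", "I use spark-nlp."]]
+).toDF("question", "context")
+```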
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_refined_questions_and_data_14k_15_08_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.0 MB| + +## References + +https://huggingface.co/am-infoweb/QA_REFINED_QUESTIONS_AND_DATA_14K_15-08 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-qhr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-qhr_pipeline_en.md new file mode 100644 index 00000000000000..d23ce23dfbf5ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-qhr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English qhr_pipeline pipeline RoBertaForSequenceClassification from aloxatel +author: John Snow Labs +name: qhr_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qhr_pipeline` is a English model originally trained by aloxatel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qhr_pipeline_en_5.5.0_3.0_1726227382241.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qhr_pipeline_en_5.5.0_3.0_1726227382241.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("qhr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("qhr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qhr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/aloxatel/QHR + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-rmse_1_en.md b/docs/_posts/ahmedlone127/2024-09-13-rmse_1_en.md new file mode 100644 index 00000000000000..3a9c95ecec5fe4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-rmse_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English rmse_1 RoBertaForSequenceClassification from BaronSch +author: John Snow Labs +name: rmse_1 +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rmse_1` is a English model originally trained by BaronSch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rmse_1_en_5.5.0_3.0_1726248042220.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rmse_1_en_5.5.0_3.0_1726248042220.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("rmse_1","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("rmse_1", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
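+
+The predicted label, together with the per-class scores kept in the annotation metadata, can be pulled from the `class` column produced above:
+
+```python
+# Show predicted labels and the score metadata attached to each annotation.
+pipelineDF.select("class.result", "class.metadata").show(truncate=False)
+```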
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rmse_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/BaronSch/RMSE_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-roberta_base_qa_squad2_en.md b/docs/_posts/ahmedlone127/2024-09-13-roberta_base_qa_squad2_en.md new file mode 100644 index 00000000000000..72cfaa3298efcb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-roberta_base_qa_squad2_en.md @@ -0,0 +1,102 @@ +--- +layout: model +title: English RoBertaForQuestionAnswering model (from deepset) +author: John Snow Labs +name: roberta_base_qa_squad2 +date: 2024-09-13 +tags: [open_source, roberta, question_answering, en, onnx, openvino] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: openvino +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained Question Answering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `roberta-base-squad2` is a English model originally trained by `deepset`. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_qa_squad2_en_5.5.0_3.0_1726221242507.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_qa_squad2_en_5.5.0_3.0_1726221242507.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_qa_squad2","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer") \
+    .setCaseSensitive(True)
+
+pipeline = Pipeline(stages=[documentAssembler, spanClassifier])
+
+data = spark.createDataFrame([["What is my name?", "My name is Clara and I live in Berkeley."]]).toDF("question", "context")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_qa_squad2","en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+  .setCaseSensitive(true)
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+
+val data = Seq(("What is my name?", "My name is Clara and I live in Berkeley.")).toDF("question", "context")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.answer_question.squadv2.roberta.base.by_deepset").predict("""What is my name?|||My name is Clara and I live in Berkeley.""")
+```
+
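+
+As with the other question answering models in this series, the predicted span can be displayed from the `answer` column set above, for example:
+
+```python
+# Print the extracted answer span for the example question above.
+result.select("answer.result").show(truncate=False)
+```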
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_qa_squad2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|463.2 MB| +|Case sensitive:|false| +|Max sentence length:|512| + +## References + +References + +References + +https://huggingface.co/deepset/roberta-base-squad2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-roberta_finetuned_subjqa_movies_2_ayoubsassi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-roberta_finetuned_subjqa_movies_2_ayoubsassi_pipeline_en.md new file mode 100644 index 00000000000000..a5562b4cd7c785 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-roberta_finetuned_subjqa_movies_2_ayoubsassi_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_finetuned_subjqa_movies_2_ayoubsassi_pipeline pipeline RoBertaForQuestionAnswering from ayoubsassi +author: John Snow Labs +name: roberta_finetuned_subjqa_movies_2_ayoubsassi_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_subjqa_movies_2_ayoubsassi_pipeline` is a English model originally trained by ayoubsassi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_ayoubsassi_pipeline_en_5.5.0_3.0_1726207102885.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_ayoubsassi_pipeline_en_5.5.0_3.0_1726207102885.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_finetuned_subjqa_movies_2_ayoubsassi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_finetuned_subjqa_movies_2_ayoubsassi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_subjqa_movies_2_ayoubsassi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/ayoubsassi/roberta-finetuned-subjqa-movies_2 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-roberta_finetuned_subjqa_movies_2_vijayaphani5_en.md b/docs/_posts/ahmedlone127/2024-09-13-roberta_finetuned_subjqa_movies_2_vijayaphani5_en.md new file mode 100644 index 00000000000000..08e6107f8a82e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-roberta_finetuned_subjqa_movies_2_vijayaphani5_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_finetuned_subjqa_movies_2_vijayaphani5 RoBertaForQuestionAnswering from vijayaphani5 +author: John Snow Labs +name: roberta_finetuned_subjqa_movies_2_vijayaphani5 +date: 2024-09-13 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_subjqa_movies_2_vijayaphani5` is a English model originally trained by vijayaphani5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_vijayaphani5_en_5.5.0_3.0_1726207077888.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_vijayaphani5_en_5.5.0_3.0_1726207077888.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_finetuned_subjqa_movies_2_vijayaphani5","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_finetuned_subjqa_movies_2_vijayaphani5", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_subjqa_movies_2_vijayaphani5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/vijayaphani5/roberta-finetuned-subjqa-movies_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-roberta_small_turkish_clean_uncased_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-13-roberta_small_turkish_clean_uncased_pipeline_tr.md new file mode 100644 index 00000000000000..a3a4d017174dcb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-roberta_small_turkish_clean_uncased_pipeline_tr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Turkish roberta_small_turkish_clean_uncased_pipeline pipeline RoBertaEmbeddings from burakaytan +author: John Snow Labs +name: roberta_small_turkish_clean_uncased_pipeline +date: 2024-09-13 +tags: [tr, open_source, pipeline, onnx] +task: Embeddings +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_small_turkish_clean_uncased_pipeline` is a Turkish model originally trained by burakaytan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_small_turkish_clean_uncased_pipeline_tr_5.5.0_3.0_1726264761396.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_small_turkish_clean_uncased_pipeline_tr_5.5.0_3.0_1726264761396.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_small_turkish_clean_uncased_pipeline", lang = "tr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_small_turkish_clean_uncased_pipeline", lang = "tr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_small_turkish_clean_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|222.4 MB| + +## References + +https://huggingface.co/burakaytan/roberta-small-turkish-clean-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-roberta_small_turkish_clean_uncased_tr.md b/docs/_posts/ahmedlone127/2024-09-13-roberta_small_turkish_clean_uncased_tr.md new file mode 100644 index 00000000000000..20074933711ba8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-roberta_small_turkish_clean_uncased_tr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Turkish roberta_small_turkish_clean_uncased RoBertaEmbeddings from burakaytan +author: John Snow Labs +name: roberta_small_turkish_clean_uncased +date: 2024-09-13 +tags: [tr, open_source, onnx, embeddings, roberta] +task: Embeddings +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_small_turkish_clean_uncased` is a Turkish model originally trained by burakaytan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_small_turkish_clean_uncased_tr_5.5.0_3.0_1726264750214.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_small_turkish_clean_uncased_tr_5.5.0_3.0_1726264750214.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_small_turkish_clean_uncased","tr") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_small_turkish_clean_uncased","tr") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
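+
+The token-level vectors produced above sit in the `embeddings` field of each annotation; one way to flatten them:
+
+```python
+# One row per token: the token text and its dense embedding vector.
+pipelineDF.selectExpr("explode(embeddings) as tok") \
+    .selectExpr("tok.result as token", "tok.embeddings as vector") \
+    .show(truncate=False)
+```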
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_small_turkish_clean_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|tr| +|Size:|222.3 MB| + +## References + +https://huggingface.co/burakaytan/roberta-small-turkish-clean-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1333_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1333_pipeline_en.md new file mode 100644 index 00000000000000..1061b766f821a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1333_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1333_pipeline pipeline XlmRoBertaForSequenceClassification from haryoaw +author: John Snow Labs +name: scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1333_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1333_pipeline` is a English model originally trained by haryoaw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1333_pipeline_en_5.5.0_3.0_1726196358194.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1333_pipeline_en_5.5.0_3.0_1726196358194.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1333_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1333_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1333_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|884.2 MB| + +## References + +https://huggingface.co/haryoaw/scenario-NON-KD-SCR-D2_data-AmazonScience_massive_all_1_1333 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-sent_bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-sent_bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline_en.md new file mode 100644 index 00000000000000..6b07b1f301e1bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-sent_bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline pipeline BertSentenceEmbeddings from betteib +author: John Snow Labs +name: sent_bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline` is a English model originally trained by betteib. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline_en_5.5.0_3.0_1726246349573.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline_en_5.5.0_3.0_1726246349573.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|505.1 MB| + +## References + +https://huggingface.co/betteib/bert-base-arabert-finetuned-mdeberta-tn-v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-sent_bert_base_arabertv01_ar.md b/docs/_posts/ahmedlone127/2024-09-13-sent_bert_base_arabertv01_ar.md new file mode 100644 index 00000000000000..57d44cfe9b7ba7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-sent_bert_base_arabertv01_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic sent_bert_base_arabertv01 BertSentenceEmbeddings from aubmindlab +author: John Snow Labs +name: sent_bert_base_arabertv01 +date: 2024-09-13 +tags: [ar, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_arabertv01` is a Arabic model originally trained by aubmindlab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabertv01_ar_5.5.0_3.0_1726203445620.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabertv01_ar_5.5.0_3.0_1726203445620.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_arabertv01","ar") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_arabertv01","ar") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
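+
+Here each annotation in the `embeddings` column corresponds to one detected sentence, with the sentence text in its `result` field and the vector in its `embeddings` field:
+
+```python
+# One row per sentence: sentence text plus its sentence-level vector.
+pipelineDF.selectExpr("explode(embeddings) as sent") \
+    .selectExpr("sent.result as sentence", "sent.embeddings as vector") \
+    .show(truncate=False)
+```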
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_arabertv01| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ar| +|Size:|505.0 MB| + +## References + +https://huggingface.co/aubmindlab/bert-base-arabertv01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-sent_bert_srb_base_cased_oscar_en.md b/docs/_posts/ahmedlone127/2024-09-13-sent_bert_srb_base_cased_oscar_en.md new file mode 100644 index 00000000000000..c82071685ea757 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-sent_bert_srb_base_cased_oscar_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_srb_base_cased_oscar BertSentenceEmbeddings from Aleksandar +author: John Snow Labs +name: sent_bert_srb_base_cased_oscar +date: 2024-09-13 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_srb_base_cased_oscar` is a English model originally trained by Aleksandar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_srb_base_cased_oscar_en_5.5.0_3.0_1726246262082.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_srb_base_cased_oscar_en_5.5.0_3.0_1726246262082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_srb_base_cased_oscar","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_srb_base_cased_oscar","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_srb_base_cased_oscar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|403.9 MB| + +## References + +https://huggingface.co/Aleksandar/bert-srb-base-cased-oscar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-sent_bertimbaulaw_base_portuguese_cased_en.md b/docs/_posts/ahmedlone127/2024-09-13-sent_bertimbaulaw_base_portuguese_cased_en.md new file mode 100644 index 00000000000000..e5b741f217732c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-sent_bertimbaulaw_base_portuguese_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bertimbaulaw_base_portuguese_cased BertSentenceEmbeddings from alfaneo +author: John Snow Labs +name: sent_bertimbaulaw_base_portuguese_cased +date: 2024-09-13 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertimbaulaw_base_portuguese_cased` is a English model originally trained by alfaneo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertimbaulaw_base_portuguese_cased_en_5.5.0_3.0_1726223712526.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertimbaulaw_base_portuguese_cased_en_5.5.0_3.0_1726223712526.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bertimbaulaw_base_portuguese_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bertimbaulaw_base_portuguese_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertimbaulaw_base_portuguese_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|405.8 MB| + +## References + +https://huggingface.co/alfaneo/bertimbaulaw-base-portuguese-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-sent_condenser_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-sent_condenser_pipeline_en.md new file mode 100644 index 00000000000000..84d8c4d6f78017 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-sent_condenser_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_condenser_pipeline pipeline BertSentenceEmbeddings from Luyu +author: John Snow Labs +name: sent_condenser_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_condenser_pipeline` is a English model originally trained by Luyu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_condenser_pipeline_en_5.5.0_3.0_1726224361633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_condenser_pipeline_en_5.5.0_3.0_1726224361633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_condenser_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_condenser_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_condenser_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.9 MB| + +## References + +https://huggingface.co/Luyu/condenser + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-sent_qe3_ar.md b/docs/_posts/ahmedlone127/2024-09-13-sent_qe3_ar.md new file mode 100644 index 00000000000000..9f807c3b2a0441 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-sent_qe3_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic sent_qe3 BertSentenceEmbeddings from NLP-EXP +author: John Snow Labs +name: sent_qe3 +date: 2024-09-13 +tags: [ar, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_qe3` is a Arabic model originally trained by NLP-EXP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_qe3_ar_5.5.0_3.0_1726233105019.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_qe3_ar_5.5.0_3.0_1726233105019.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_qe3","ar") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_qe3","ar") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_qe3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ar| +|Size:|504.1 MB| + +## References + +https://huggingface.co/NLP-EXP/QE3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-sent_rubiobert_en.md b/docs/_posts/ahmedlone127/2024-09-13-sent_rubiobert_en.md new file mode 100644 index 00000000000000..7895e1f86f829f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-sent_rubiobert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_rubiobert BertSentenceEmbeddings from alexyalunin +author: John Snow Labs +name: sent_rubiobert +date: 2024-09-13 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_rubiobert` is a English model originally trained by alexyalunin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_rubiobert_en_5.5.0_3.0_1726246224036.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_rubiobert_en_5.5.0_3.0_1726246224036.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_rubiobert","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_rubiobert","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_rubiobert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|667.1 MB| + +## References + +https://huggingface.co/alexyalunin/RuBioBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-sent_rubiobert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-sent_rubiobert_pipeline_en.md new file mode 100644 index 00000000000000..b0ca1f71e84fb4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-sent_rubiobert_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_rubiobert_pipeline pipeline BertSentenceEmbeddings from alexyalunin +author: John Snow Labs +name: sent_rubiobert_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_rubiobert_pipeline` is a English model originally trained by alexyalunin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_rubiobert_pipeline_en_5.5.0_3.0_1726246257329.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_rubiobert_pipeline_en_5.5.0_3.0_1726246257329.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_rubiobert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_rubiobert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
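The snippet above assumes a DataFrame `df` with a `text` column already exists. A minimal Python sketch of how that input could be prepared (the example sentence is purely illustrative); for single strings, `annotate` avoids building a DataFrame at all:

```python
# Minimal sketch: prepare the `df` assumed above and run the pretrained pipeline.
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_rubiobert_pipeline", lang="en")
annotations = pipeline.transform(df)

# For a single string, annotate() returns a plain Python dict of results.
single = pipeline.annotate("I love spark-nlp")
```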
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_rubiobert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|667.7 MB| + +## References + +https://huggingface.co/alexyalunin/RuBioBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-snli_microsoft_deberta_v3_large_seed_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-snli_microsoft_deberta_v3_large_seed_3_pipeline_en.md new file mode 100644 index 00000000000000..862f48fea81e22 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-snli_microsoft_deberta_v3_large_seed_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English snli_microsoft_deberta_v3_large_seed_3_pipeline pipeline DeBertaForSequenceClassification from utahnlp +author: John Snow Labs +name: snli_microsoft_deberta_v3_large_seed_3_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`snli_microsoft_deberta_v3_large_seed_3_pipeline` is a English model originally trained by utahnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/snli_microsoft_deberta_v3_large_seed_3_pipeline_en_5.5.0_3.0_1726244645963.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/snli_microsoft_deberta_v3_large_seed_3_pipeline_en_5.5.0_3.0_1726244645963.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("snli_microsoft_deberta_v3_large_seed_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("snli_microsoft_deberta_v3_large_seed_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|snli_microsoft_deberta_v3_large_seed_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/utahnlp/snli_microsoft_deberta-v3-large_seed-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-socbert_final_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-socbert_final_pipeline_en.md new file mode 100644 index 00000000000000..1e81597938a2da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-socbert_final_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English socbert_final_pipeline pipeline RoBertaEmbeddings from sarkerlab +author: John Snow Labs +name: socbert_final_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`socbert_final_pipeline` is a English model originally trained by sarkerlab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/socbert_final_pipeline_en_5.5.0_3.0_1726197533621.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/socbert_final_pipeline_en_5.5.0_3.0_1726197533621.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("socbert_final_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("socbert_final_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|socbert_final_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|534.6 MB| + +## References + +https://huggingface.co/sarkerlab/SocBERT-final + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-sroberta_base_hr.md b/docs/_posts/ahmedlone127/2024-09-13-sroberta_base_hr.md new file mode 100644 index 00000000000000..efd3440026bf4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-sroberta_base_hr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Croatian sroberta_base RoBertaEmbeddings from Andrija +author: John Snow Labs +name: sroberta_base +date: 2024-09-13 +tags: [hr, open_source, onnx, embeddings, roberta] +task: Embeddings +language: hr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sroberta_base` is a Croatian model originally trained by Andrija. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sroberta_base_hr_5.5.0_3.0_1726264596531.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sroberta_base_hr_5.5.0_3.0_1726264596531.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("sroberta_base","hr") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("sroberta_base","hr") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
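Here the `embeddings` column carries token-level annotation structs. A minimal sketch (assuming the Python pipeline above has been run) that lists each token next to the length of its vector:

```python
# Minimal sketch: inspect token-level RoBERTa embeddings from the pipeline above.
from pyspark.sql import functions as F

pipelineDF \
    .select(F.explode("embeddings").alias("ann")) \
    .select(F.col("ann.result").alias("token"), F.size("ann.embeddings").alias("dims")) \
    .show(truncate=False)
```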
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sroberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|hr| +|Size:|300.1 MB| + +## References + +https://huggingface.co/Andrija/SRoBERTa-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-test1_jliucy_en.md b/docs/_posts/ahmedlone127/2024-09-13-test1_jliucy_en.md new file mode 100644 index 00000000000000..b0b4ebbb070a8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-test1_jliucy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test1_jliucy DistilBertForSequenceClassification from jliucy +author: John Snow Labs +name: test1_jliucy +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test1_jliucy` is a English model originally trained by jliucy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test1_jliucy_en_5.5.0_3.0_1726262130827.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test1_jliucy_en_5.5.0_3.0_1726262130827.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("test1_jliucy","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test1_jliucy", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
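Once the pipeline has run, the predicted label sits inside the `class` annotation column. A minimal sketch of reading it back (assuming the Python example above):

```python
# Minimal sketch: show each input text next to its predicted label.
pipelineDF.select("text", "class.result").show(truncate=False)
```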
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test1_jliucy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jliucy/test1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-test1_jliucy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-test1_jliucy_pipeline_en.md new file mode 100644 index 00000000000000..e6ca31ed4421ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-test1_jliucy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test1_jliucy_pipeline pipeline DistilBertForSequenceClassification from jliucy +author: John Snow Labs +name: test1_jliucy_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test1_jliucy_pipeline` is a English model originally trained by jliucy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test1_jliucy_pipeline_en_5.5.0_3.0_1726262143495.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test1_jliucy_pipeline_en_5.5.0_3.0_1726262143495.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test1_jliucy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test1_jliucy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test1_jliucy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jliucy/test1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-test_model_yzhangqs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-test_model_yzhangqs_pipeline_en.md new file mode 100644 index 00000000000000..d0b8c76cc3022a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-test_model_yzhangqs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_model_yzhangqs_pipeline pipeline DistilBertForSequenceClassification from yzhangqs +author: John Snow Labs +name: test_model_yzhangqs_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_model_yzhangqs_pipeline` is a English model originally trained by yzhangqs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_model_yzhangqs_pipeline_en_5.5.0_3.0_1726262021604.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_model_yzhangqs_pipeline_en_5.5.0_3.0_1726262021604.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_model_yzhangqs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_model_yzhangqs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_model_yzhangqs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yzhangqs/Test_Model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-trainer_chapter4_lixiwu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-trainer_chapter4_lixiwu_pipeline_en.md new file mode 100644 index 00000000000000..ada8c1f15b8bc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-trainer_chapter4_lixiwu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trainer_chapter4_lixiwu_pipeline pipeline DistilBertForSequenceClassification from lixiwu +author: John Snow Labs +name: trainer_chapter4_lixiwu_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trainer_chapter4_lixiwu_pipeline` is a English model originally trained by lixiwu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trainer_chapter4_lixiwu_pipeline_en_5.5.0_3.0_1726262553116.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trainer_chapter4_lixiwu_pipeline_en_5.5.0_3.0_1726262553116.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("trainer_chapter4_lixiwu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("trainer_chapter4_lixiwu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trainer_chapter4_lixiwu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lixiwu/trainer-chapter4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-whisper_small_arabic_heikal_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-13-whisper_small_arabic_heikal_pipeline_ar.md new file mode 100644 index 00000000000000..071f7a0dc1985e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-whisper_small_arabic_heikal_pipeline_ar.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Arabic whisper_small_arabic_heikal_pipeline pipeline WhisperForCTC from heikal +author: John Snow Labs +name: whisper_small_arabic_heikal_pipeline +date: 2024-09-13 +tags: [ar, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_heikal_pipeline` is a Arabic model originally trained by heikal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_heikal_pipeline_ar_5.5.0_3.0_1726252650787.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_heikal_pipeline_ar_5.5.0_3.0_1726252650787.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_arabic_heikal_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_arabic_heikal_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
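The call above assumes a DataFrame `df` whose `audio_content` column holds raw audio samples as float arrays. A minimal sketch of preparing that input (the file name and the 16 kHz mono assumption are illustrative, and any audio loader can stand in for librosa):

```python
# Minimal sketch: build the audio DataFrame assumed above and run the pipeline.
import librosa
from sparknlp.pretrained import PretrainedPipeline

raw_floats, _ = librosa.load("example.wav", sr=16000)  # illustrative file name
df = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")

pipeline = PretrainedPipeline("whisper_small_arabic_heikal_pipeline", lang="ar")
result = pipeline.transform(df)
result.printSchema()  # inspect the output columns the packaged pipeline produces
```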
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_heikal_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/heikal/whisper-small-ar + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-whisper_small_cuzco_quechua_pipeline_qu.md b/docs/_posts/ahmedlone127/2024-09-13-whisper_small_cuzco_quechua_pipeline_qu.md new file mode 100644 index 00000000000000..640390edc8da7f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-whisper_small_cuzco_quechua_pipeline_qu.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Quechua whisper_small_cuzco_quechua_pipeline pipeline WhisperForCTC from pollitoconpapass +author: John Snow Labs +name: whisper_small_cuzco_quechua_pipeline +date: 2024-09-13 +tags: [qu, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: qu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_cuzco_quechua_pipeline` is a Quechua model originally trained by pollitoconpapass. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_cuzco_quechua_pipeline_qu_5.5.0_3.0_1726256700132.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_cuzco_quechua_pipeline_qu_5.5.0_3.0_1726256700132.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_cuzco_quechua_pipeline", lang = "qu") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_cuzco_quechua_pipeline", lang = "qu") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_cuzco_quechua_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|qu| +|Size:|1.7 GB| + +## References + +https://huggingface.co/pollitoconpapass/whisper-small-cuzco-quechua + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-whisper_small_hindi_sanchit_gandhi_hi.md b/docs/_posts/ahmedlone127/2024-09-13-whisper_small_hindi_sanchit_gandhi_hi.md new file mode 100644 index 00000000000000..ce4cae34fd565d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-whisper_small_hindi_sanchit_gandhi_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_small_hindi_sanchit_gandhi WhisperForCTC from sanchit-gandhi +author: John Snow Labs +name: whisper_small_hindi_sanchit_gandhi +date: 2024-09-13 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_sanchit_gandhi` is a Hindi model originally trained by sanchit-gandhi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_sanchit_gandhi_hi_5.5.0_3.0_1726257360072.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_sanchit_gandhi_hi_5.5.0_3.0_1726257360072.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_hindi_sanchit_gandhi","hi") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# `data` is assumed to be a DataFrame with a float-array column named "audio_content"
# (a sketch of building it follows below the code tabs).
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_sanchit_gandhi", "hi")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// `data` is assumed to be a DataFrame with a float-array column named "audio_content".
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
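A minimal sketch of the `data` DataFrame assumed above: one row per clip, with the raw samples in a float-array column named `audio_content` (librosa is used only as an example loader, and the file path is illustrative):

```python
# Minimal sketch: construct the `data` DataFrame assumed by the example above.
import librosa

raw_floats, _ = librosa.load("hindi_sample.wav", sr=16000)  # illustrative path, 16 kHz mono
data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")
```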
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_sanchit_gandhi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/sanchit-gandhi/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-whisper_small_llm_lingo_trelis_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-whisper_small_llm_lingo_trelis_pipeline_en.md new file mode 100644 index 00000000000000..a25a1e23b2bd77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-whisper_small_llm_lingo_trelis_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_llm_lingo_trelis_pipeline pipeline WhisperForCTC from Trelis +author: John Snow Labs +name: whisper_small_llm_lingo_trelis_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_llm_lingo_trelis_pipeline` is a English model originally trained by Trelis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_llm_lingo_trelis_pipeline_en_5.5.0_3.0_1726223111426.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_llm_lingo_trelis_pipeline_en_5.5.0_3.0_1726223111426.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_llm_lingo_trelis_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_llm_lingo_trelis_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_llm_lingo_trelis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Trelis/whisper-small-llm-lingo + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-whisper_small_swahili_jayem_11_en.md b/docs/_posts/ahmedlone127/2024-09-13-whisper_small_swahili_jayem_11_en.md new file mode 100644 index 00000000000000..f8f98e6644d3c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-whisper_small_swahili_jayem_11_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_swahili_jayem_11 WhisperForCTC from Jayem-11 +author: John Snow Labs +name: whisper_small_swahili_jayem_11 +date: 2024-09-13 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_swahili_jayem_11` is a English model originally trained by Jayem-11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_swahili_jayem_11_en_5.5.0_3.0_1726219179545.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_swahili_jayem_11_en_5.5.0_3.0_1726219179545.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_swahili_jayem_11","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# `data` is assumed to be a DataFrame with a float-array column named "audio_content".
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_swahili_jayem_11", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// `data` is assumed to be a DataFrame with a float-array column named "audio_content".
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_swahili_jayem_11| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Jayem-11/whisper-small-swahili \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-withinapps_ndd_mantisbt_test_content_tags_cwadj_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-withinapps_ndd_mantisbt_test_content_tags_cwadj_pipeline_en.md new file mode 100644 index 00000000000000..ad4919b6d16894 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-withinapps_ndd_mantisbt_test_content_tags_cwadj_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English withinapps_ndd_mantisbt_test_content_tags_cwadj_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_mantisbt_test_content_tags_cwadj_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_mantisbt_test_content_tags_cwadj_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mantisbt_test_content_tags_cwadj_pipeline_en_5.5.0_3.0_1726242379802.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mantisbt_test_content_tags_cwadj_pipeline_en_5.5.0_3.0_1726242379802.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("withinapps_ndd_mantisbt_test_content_tags_cwadj_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("withinapps_ndd_mantisbt_test_content_tags_cwadj_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_mantisbt_test_content_tags_cwadj_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-mantisbt_test-content_tags-CWAdj + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-xlm_robert_base_finetuned_panx_german_french_en.md b/docs/_posts/ahmedlone127/2024-09-13-xlm_robert_base_finetuned_panx_german_french_en.md new file mode 100644 index 00000000000000..735a04fc2f2b07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-xlm_robert_base_finetuned_panx_german_french_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_robert_base_finetuned_panx_german_french XlmRoBertaForTokenClassification from hiroki-rad +author: John Snow Labs +name: xlm_robert_base_finetuned_panx_german_french +date: 2024-09-13 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_robert_base_finetuned_panx_german_french` is a English model originally trained by hiroki-rad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_robert_base_finetuned_panx_german_french_en_5.5.0_3.0_1726215677402.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_robert_base_finetuned_panx_german_french_en_5.5.0_3.0_1726215677402.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_robert_base_finetuned_panx_german_french","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_robert_base_finetuned_panx_german_french", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
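After the pipeline runs, the tokens and their predicted tags live in the `token` and `ner` annotation columns. A minimal sketch for viewing them side by side (assuming the Python example above):

```python
# Minimal sketch: show the token and NER tag arrays produced by the pipeline above.
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```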
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_robert_base_finetuned_panx_german_french| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/hiroki-rad/xlm-robert-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_english_omersubasi_en.md b/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_english_omersubasi_en.md new file mode 100644 index 00000000000000..fa72028ffbf35d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_english_omersubasi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_omersubasi XlmRoBertaForTokenClassification from omersubasi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_omersubasi +date: 2024-09-13 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_omersubasi` is a English model originally trained by omersubasi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_omersubasi_en_5.5.0_3.0_1726214873486.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_omersubasi_en_5.5.0_3.0_1726214873486.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_omersubasi","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_omersubasi", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_omersubasi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/omersubasi/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_german_smallsuper_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_german_smallsuper_pipeline_en.md new file mode 100644 index 00000000000000..abeeb18ba036e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_german_smallsuper_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_smallsuper_pipeline pipeline XlmRoBertaForTokenClassification from smallsuper +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_smallsuper_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_smallsuper_pipeline` is a English model originally trained by smallsuper. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_smallsuper_pipeline_en_5.5.0_3.0_1726238167123.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_smallsuper_pipeline_en_5.5.0_3.0_1726238167123.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_smallsuper_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_smallsuper_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_smallsuper_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/smallsuper/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_italian_handun_en.md b/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_italian_handun_en.md new file mode 100644 index 00000000000000..0d326d68fb982e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_finetuned_panx_italian_handun_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_handun XlmRoBertaForTokenClassification from Handun +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_handun +date: 2024-09-13 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_handun` is a English model originally trained by Handun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_handun_en_5.5.0_3.0_1726238424157.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_handun_en_5.5.0_3.0_1726238424157.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_handun","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_handun", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_handun| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/Handun/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_trimmed_english_10000_tweet_sentiment_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_trimmed_english_10000_tweet_sentiment_english_pipeline_en.md new file mode 100644 index 00000000000000..d9390866b95095 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-xlm_roberta_base_trimmed_english_10000_tweet_sentiment_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_english_10000_tweet_sentiment_english_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_english_10000_tweet_sentiment_english_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_english_10000_tweet_sentiment_english_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_10000_tweet_sentiment_english_pipeline_en_5.5.0_3.0_1726196342418.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_10000_tweet_sentiment_english_pipeline_en_5.5.0_3.0_1726196342418.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_trimmed_english_10000_tweet_sentiment_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_trimmed_english_10000_tweet_sentiment_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_english_10000_tweet_sentiment_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|350.3 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-en-10000-tweet-sentiment-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-albert_hatespeech_classifier6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-albert_hatespeech_classifier6_pipeline_en.md new file mode 100644 index 00000000000000..6409c4a28d6102 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-albert_hatespeech_classifier6_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English albert_hatespeech_classifier6_pipeline pipeline AlbertForSequenceClassification from samuelcolvin26 +author: John Snow Labs +name: albert_hatespeech_classifier6_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_hatespeech_classifier6_pipeline` is a English model originally trained by samuelcolvin26. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_hatespeech_classifier6_pipeline_en_5.5.0_3.0_1726336546941.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_hatespeech_classifier6_pipeline_en_5.5.0_3.0_1726336546941.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("albert_hatespeech_classifier6_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("albert_hatespeech_classifier6_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_hatespeech_classifier6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/samuelcolvin26/Albert_Hatespeech_Classifier6 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-albert_model_akash24_en.md b/docs/_posts/ahmedlone127/2024-09-14-albert_model_akash24_en.md new file mode 100644 index 00000000000000..a21437c4f6a398 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-albert_model_akash24_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English albert_model_akash24 AlbertForSequenceClassification from Akash24 +author: John Snow Labs +name: albert_model_akash24 +date: 2024-09-14 +tags: [en, open_source, onnx, sequence_classification, albert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_model_akash24` is a English model originally trained by Akash24. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_model_akash24_en_5.5.0_3.0_1726336527917.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_model_akash24_en_5.5.0_3.0_1726336527917.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = AlbertForSequenceClassification.pretrained("albert_model_akash24","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = AlbertForSequenceClassification.pretrained("albert_model_akash24", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_model_akash24| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|44.3 MB| + +## References + +https://huggingface.co/Akash24/albert_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-arabert_large_algerian_ar.md b/docs/_posts/ahmedlone127/2024-09-14-arabert_large_algerian_ar.md new file mode 100644 index 00000000000000..54e56e3e9dfcfc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-arabert_large_algerian_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic arabert_large_algerian BertForSequenceClassification from Abdou +author: John Snow Labs +name: arabert_large_algerian +date: 2024-09-14 +tags: [ar, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`arabert_large_algerian` is a Arabic model originally trained by Abdou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arabert_large_algerian_ar_5.5.0_3.0_1726348460967.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arabert_large_algerian_ar_5.5.0_3.0_1726348460967.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("arabert_large_algerian", "ar") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("arabert_large_algerian", "ar")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
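For scoring more than one text at a time, the fitted pipeline can simply be applied to a larger DataFrame. A minimal sketch, where the Arabic strings below are placeholders for your own data:

```python
# Hypothetical batch of Arabic inputs; replace with your own text.
batch = spark.createDataFrame([["النص الأول"], ["النص الثاني"]]).toDF("text")
pipelineModel.transform(batch).selectExpr("text", "`class`.result as prediction").show(truncate=False)
```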
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arabert_large_algerian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|ar| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Abdou/arabert-large-algerian \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-bert_base_uncased_finetuned_squad_v2_lauraparra28_en.md b/docs/_posts/ahmedlone127/2024-09-14-bert_base_uncased_finetuned_squad_v2_lauraparra28_en.md new file mode 100644 index 00000000000000..bb9a583f34d95a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-bert_base_uncased_finetuned_squad_v2_lauraparra28_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_squad_v2_lauraparra28 BertForQuestionAnswering from lauraparra28 +author: John Snow Labs +name: bert_base_uncased_finetuned_squad_v2_lauraparra28 +date: 2024-09-14 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_squad_v2_lauraparra28` is a English model originally trained by lauraparra28. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_squad_v2_lauraparra28_en_5.5.0_3.0_1726350029721.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_squad_v2_lauraparra28_en_5.5.0_3.0_1726350029721.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import MultiDocumentAssembler
from sparknlp.annotator import BertForQuestionAnswering
from pyspark.ml import Pipeline

# Assemble the question and context columns into document annotations
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetuned_squad_v2_lauraparra28", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetuned_squad_v2_lauraparra28", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
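The extracted answer span ends up in the `answer` column defined above. A quick way to view it, assuming the pipeline has already been fitted and applied as in the snippet:

```python
# The answer text is stored in the "result" field of the answer annotations.
pipelineDF.selectExpr("question", "answer.result as answer").show(truncate=False)
```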
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_squad_v2_lauraparra28| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/lauraparra28/bert-base-uncased-finetuned-squad_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-bert_finetuned_ner_tokenizer_en.md b/docs/_posts/ahmedlone127/2024-09-14-bert_finetuned_ner_tokenizer_en.md new file mode 100644 index 00000000000000..dfabdb96a55268 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-bert_finetuned_ner_tokenizer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_ner_tokenizer BertForTokenClassification from alban12 +author: John Snow Labs +name: bert_finetuned_ner_tokenizer +date: 2024-09-14 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_tokenizer` is a English model originally trained by alban12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_tokenizer_en_5.5.0_3.0_1726305583884.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_tokenizer_en_5.5.0_3.0_1726305583884.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertForTokenClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_tokenizer", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_tokenizer", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
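Tokens and their predicted tags come back as parallel arrays, aligned by position. A minimal sketch for viewing them side by side (column names as configured above):

```python
# "token" and "ner" are the output columns set in the pipeline above.
pipelineDF.selectExpr("token.result as tokens", "ner.result as ner_tags").show(truncate=False)
```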
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_tokenizer| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.6 MB| + +## References + +https://huggingface.co/alban12/bert-finetuned-ner-tokenizer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-bert_finetuned_ner_tokenizer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-bert_finetuned_ner_tokenizer_pipeline_en.md new file mode 100644 index 00000000000000..179c0e14af0559 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-bert_finetuned_ner_tokenizer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_ner_tokenizer_pipeline pipeline BertForTokenClassification from alban12 +author: John Snow Labs +name: bert_finetuned_ner_tokenizer_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_tokenizer_pipeline` is a English model originally trained by alban12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_tokenizer_pipeline_en_5.5.0_3.0_1726305602145.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_tokenizer_pipeline_en_5.5.0_3.0_1726305602145.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pipeline is applied to a DataFrame; a "text" input column is assumed here
pipeline = PretrainedPipeline("bert_finetuned_ner_tokenizer_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("bert_finetuned_ner_tokenizer_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
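For quick experiments on single strings, `PretrainedPipeline` also exposes an `annotate` method that skips the DataFrame round trip. The returned dictionary is keyed by the pipeline's output columns, so the exact key names (e.g. `ner`) are an assumption here:

```python
# Lightweight single-string usage; keys depend on the pipeline's configured output columns.
light_result = pipeline.annotate("John Snow Labs is based in Delaware.")
print(light_result.get("ner"))
```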
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_tokenizer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.6 MB| + +## References + +https://huggingface.co/alban12/bert-finetuned-ner-tokenizer + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-bert_turkish_turkish_movie_reviews_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-14-bert_turkish_turkish_movie_reviews_pipeline_tr.md new file mode 100644 index 00000000000000..1b51278c5a838e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-bert_turkish_turkish_movie_reviews_pipeline_tr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Turkish bert_turkish_turkish_movie_reviews_pipeline pipeline BertForSequenceClassification from anilguven +author: John Snow Labs +name: bert_turkish_turkish_movie_reviews_pipeline +date: 2024-09-14 +tags: [tr, open_source, pipeline, onnx] +task: Text Classification +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_turkish_turkish_movie_reviews_pipeline` is a Turkish model originally trained by anilguven. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_turkish_turkish_movie_reviews_pipeline_tr_5.5.0_3.0_1726347651521.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_turkish_turkish_movie_reviews_pipeline_tr_5.5.0_3.0_1726347651521.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_turkish_turkish_movie_reviews_pipeline", lang = "tr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_turkish_turkish_movie_reviews_pipeline", lang = "tr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_turkish_turkish_movie_reviews_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|414.5 MB| + +## References + +https://huggingface.co/anilguven/bert_tr_turkish_movie_reviews + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-berturk_cased_ner_alierenak_tr.md b/docs/_posts/ahmedlone127/2024-09-14-berturk_cased_ner_alierenak_tr.md new file mode 100644 index 00000000000000..1cd963e78cce75 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-berturk_cased_ner_alierenak_tr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Turkish berturk_cased_ner_alierenak BertForTokenClassification from alierenak +author: John Snow Labs +name: berturk_cased_ner_alierenak +date: 2024-09-14 +tags: [tr, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`berturk_cased_ner_alierenak` is a Turkish model originally trained by alierenak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/berturk_cased_ner_alierenak_tr_5.5.0_3.0_1726306095147.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/berturk_cased_ner_alierenak_tr_5.5.0_3.0_1726306095147.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertForTokenClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = BertForTokenClassification.pretrained("berturk_cased_ner_alierenak", "tr") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("berturk_cased_ner_alierenak", "tr")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|berturk_cased_ner_alierenak| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|tr| +|Size:|412.3 MB| + +## References + +https://huggingface.co/alierenak/berturk_cased_ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-bsc_bio_ehr_spanish_finetuned_clinais_v2_en.md b/docs/_posts/ahmedlone127/2024-09-14-bsc_bio_ehr_spanish_finetuned_clinais_v2_en.md new file mode 100644 index 00000000000000..af944b214335cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-bsc_bio_ehr_spanish_finetuned_clinais_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bsc_bio_ehr_spanish_finetuned_clinais_v2 RoBertaEmbeddings from joheras +author: John Snow Labs +name: bsc_bio_ehr_spanish_finetuned_clinais_v2 +date: 2024-09-14 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_finetuned_clinais_v2` is a English model originally trained by joheras. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_finetuned_clinais_v2_en_5.5.0_3.0_1726334330653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_finetuned_clinais_v2_en_5.5.0_3.0_1726334330653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("bsc_bio_ehr_spanish_finetuned_clinais_v2","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("bsc_bio_ehr_spanish_finetuned_clinais_v2","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
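If downstream stages need plain vectors rather than annotation structs, an `EmbeddingsFinisher` can be appended after the embeddings stage. A minimal sketch, assuming the `embeddings` output column from the snippet above:

```python
from sparknlp.base import EmbeddingsFinisher

# Convert the token-level embedding annotations into Spark ML vectors.
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

finisher.transform(pipelineDF).selectExpr("explode(finished_embeddings) as vector").show(5, truncate=80)
```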
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_finetuned_clinais_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|464.3 MB| + +## References + +https://huggingface.co/joheras/bsc-bio-ehr-es-finetuned-clinais-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-bsc_bio_ehr_spanish_livingner3_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-14-bsc_bio_ehr_spanish_livingner3_pipeline_es.md new file mode 100644 index 00000000000000..ae0b151944122a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-bsc_bio_ehr_spanish_livingner3_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_livingner3_pipeline pipeline RoBertaForSequenceClassification from IIC +author: John Snow Labs +name: bsc_bio_ehr_spanish_livingner3_pipeline +date: 2024-09-14 +tags: [es, open_source, pipeline, onnx] +task: Text Classification +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_livingner3_pipeline` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_livingner3_pipeline_es_5.5.0_3.0_1726272655991.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_livingner3_pipeline_es_5.5.0_3.0_1726272655991.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_livingner3_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_livingner3_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_livingner3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|435.4 MB| + +## References + +https://huggingface.co/IIC/bsc-bio-ehr-es-livingner3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_eli5_mlm_model_2_en.md b/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_eli5_mlm_model_2_en.md new file mode 100644 index 00000000000000..edf3f11d8af368 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_eli5_mlm_model_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_2 RoBertaEmbeddings from amirhamza11 +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_2 +date: 2024-09-14 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_2` is a English model originally trained by amirhamza11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_2_en_5.5.0_3.0_1726338676643.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_2_en_5.5.0_3.0_1726338676643.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_2","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_2","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/amirhamza11/my_awesome_eli5_mlm_model_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_qa_model_realtiff_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_qa_model_realtiff_pipeline_en.md new file mode 100644 index 00000000000000..e250d828381eb3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_qa_model_realtiff_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_realtiff_pipeline pipeline DistilBertForQuestionAnswering from realtiff +author: John Snow Labs +name: burmese_awesome_qa_model_realtiff_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_realtiff_pipeline` is a English model originally trained by realtiff. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_realtiff_pipeline_en_5.5.0_3.0_1726335814285.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_realtiff_pipeline_en_5.5.0_3.0_1726335814285.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_realtiff_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_realtiff_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
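The snippet above assumes a DataFrame `df` already exists. For a question-answering pipeline built around a `MultiDocumentAssembler`, a minimal input might look like the sketch below; the `question`/`context` input columns and the `answer` output column are assumptions based on the usual layout of these pipelines, not guaranteed by this specific model:

```python
# Hypothetical input columns; check the pipeline's stages if the names differ.
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
annotations = pipeline.transform(df)
annotations.selectExpr("answer.result as answer").show(truncate=False)
```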
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_realtiff_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/realtiff/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_qa_model_simraniitrpr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_qa_model_simraniitrpr_pipeline_en.md new file mode 100644 index 00000000000000..ff66c24b26c4da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_qa_model_simraniitrpr_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_simraniitrpr_pipeline pipeline DistilBertForQuestionAnswering from SimranIITRpr +author: John Snow Labs +name: burmese_awesome_qa_model_simraniitrpr_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_simraniitrpr_pipeline` is a English model originally trained by SimranIITRpr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_simraniitrpr_pipeline_en_5.5.0_3.0_1726335704258.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_simraniitrpr_pipeline_en_5.5.0_3.0_1726335704258.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_simraniitrpr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_simraniitrpr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_simraniitrpr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/SimranIITRpr/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_qa_model_uday1998_en.md b/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_qa_model_uday1998_en.md new file mode 100644 index 00000000000000..b96662d1f8e491 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_qa_model_uday1998_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_uday1998 DistilBertForQuestionAnswering from Uday1998 +author: John Snow Labs +name: burmese_awesome_qa_model_uday1998 +date: 2024-09-14 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_uday1998` is a English model originally trained by Uday1998. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_uday1998_en_5.5.0_3.0_1726335910492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_uday1998_en_5.5.0_3.0_1726335910492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import MultiDocumentAssembler
from sparknlp.annotator import DistilBertForQuestionAnswering
from pyspark.ml import Pipeline

# Assemble the question and context columns into document annotations
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_uday1998", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_uday1998", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
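Several question/context pairs can be scored in one pass by widening the input DataFrame. A small, hypothetical batch reusing the fitted `pipelineModel` from above:

```python
pairs = [
    ("What framework do I use?", "I use spark-nlp."),
    ("Where is the office?", "The office is in Delaware."),
]
batch = spark.createDataFrame(pairs, ["question", "context"])
pipelineModel.transform(batch).selectExpr("answer.result as answer").show(truncate=False)
```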
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_uday1998| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Uday1998/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-classify_clickbait_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-classify_clickbait_pipeline_en.md new file mode 100644 index 00000000000000..9a7f8294c2e0e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-classify_clickbait_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English classify_clickbait_pipeline pipeline AlbertForSequenceClassification from rkotari +author: John Snow Labs +name: classify_clickbait_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classify_clickbait_pipeline` is a English model originally trained by rkotari. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classify_clickbait_pipeline_en_5.5.0_3.0_1726309356718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classify_clickbait_pipeline_en_5.5.0_3.0_1726309356718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("classify_clickbait_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("classify_clickbait_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classify_clickbait_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/rkotari/classify-clickbait + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-combined_model_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-combined_model_v1_pipeline_en.md new file mode 100644 index 00000000000000..bbf89f2dcc77ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-combined_model_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English combined_model_v1_pipeline pipeline RoBertaForTokenClassification from ICT2214Team7 +author: John Snow Labs +name: combined_model_v1_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`combined_model_v1_pipeline` is a English model originally trained by ICT2214Team7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/combined_model_v1_pipeline_en_5.5.0_3.0_1726316018766.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/combined_model_v1_pipeline_en_5.5.0_3.0_1726316018766.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("combined_model_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("combined_model_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|combined_model_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/ICT2214Team7/Combined_model_v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-ct_m2_complete_en.md b/docs/_posts/ahmedlone127/2024-09-14-ct_m2_complete_en.md new file mode 100644 index 00000000000000..fd0dc0348bf862 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-ct_m2_complete_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ct_m2_complete RoBertaEmbeddings from crisistransformers +author: John Snow Labs +name: ct_m2_complete +date: 2024-09-14 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ct_m2_complete` is a English model originally trained by crisistransformers. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ct_m2_complete_en_5.5.0_3.0_1726334896888.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ct_m2_complete_en_5.5.0_3.0_1726334896888.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("ct_m2_complete","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("ct_m2_complete","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
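Each token receives one embedding vector, so the model's hidden size can be checked directly from the output. A minimal sketch using the `embeddings` column configured above:

```python
from pyspark.sql import functions as F

# Inspect the dimensionality of the first token's embedding vector.
first = pipelineDF.select(F.explode("embeddings").alias("e")).select("e.embeddings").first()
print(len(first[0]))
```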
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ct_m2_complete| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/crisistransformers/CT-M2-Complete \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-distilbert_base_uncased_finetuned_squad_ethanoutangoun_en.md b/docs/_posts/ahmedlone127/2024-09-14-distilbert_base_uncased_finetuned_squad_ethanoutangoun_en.md new file mode 100644 index 00000000000000..59f2d374b27f68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-distilbert_base_uncased_finetuned_squad_ethanoutangoun_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_ethanoutangoun DistilBertForQuestionAnswering from ethanoutangoun +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_ethanoutangoun +date: 2024-09-14 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_ethanoutangoun` is a English model originally trained by ethanoutangoun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_ethanoutangoun_en_5.5.0_3.0_1726335670373.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_ethanoutangoun_en_5.5.0_3.0_1726335670373.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_ethanoutangoun","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_ethanoutangoun", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_ethanoutangoun| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/ethanoutangoun/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-distilbert_base_uncased_finetuned_squad_messiah10_en.md b/docs/_posts/ahmedlone127/2024-09-14-distilbert_base_uncased_finetuned_squad_messiah10_en.md new file mode 100644 index 00000000000000..8ab2120e3f14a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-distilbert_base_uncased_finetuned_squad_messiah10_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_messiah10 DistilBertForQuestionAnswering from messiah10 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_messiah10 +date: 2024-09-14 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_messiah10` is a English model originally trained by messiah10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_messiah10_en_5.5.0_3.0_1726335706954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_messiah10_en_5.5.0_3.0_1726335706954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_messiah10","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_messiah10", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_messiah10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/messiah10/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies_pipeline_en.md new file mode 100644 index 00000000000000..b50d1e65dda75f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies_pipeline pipeline RoBertaEmbeddings from ietz +author: John Snow Labs +name: distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies_pipeline` is a English model originally trained by ietz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies_pipeline_en_5.5.0_3.0_1726338244439.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies_pipeline_en_5.5.0_3.0_1726338244439.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/ietz/distilroberta-base-finetuned-jira-qt-issue-titles-and-bodies + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-eli5_mlm_model_someonegg_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-eli5_mlm_model_someonegg_pipeline_en.md new file mode 100644 index 00000000000000..aecc61c7cf3520 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-eli5_mlm_model_someonegg_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English eli5_mlm_model_someonegg_pipeline pipeline RoBertaEmbeddings from someonegg +author: John Snow Labs +name: eli5_mlm_model_someonegg_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`eli5_mlm_model_someonegg_pipeline` is a English model originally trained by someonegg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/eli5_mlm_model_someonegg_pipeline_en_5.5.0_3.0_1726338316312.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/eli5_mlm_model_someonegg_pipeline_en_5.5.0_3.0_1726338316312.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("eli5_mlm_model_someonegg_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("eli5_mlm_model_someonegg_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|eli5_mlm_model_someonegg_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/someonegg/eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-english_spanish_en.md b/docs/_posts/ahmedlone127/2024-09-14-english_spanish_en.md new file mode 100644 index 00000000000000..b1d9b1cb4a6512 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-english_spanish_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English english_spanish MarianTransformer from adeebkm +author: John Snow Labs +name: english_spanish +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`english_spanish` is a English model originally trained by adeebkm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/english_spanish_en_5.5.0_3.0_1726351408269.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/english_spanish_en_5.5.0_3.0_1726351408269.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, MarianTransformer
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split documents into sentences before translation
sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("english_spanish", "en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("english_spanish", "en")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
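Each detected sentence yields one translated string in the `translation` column configured above. To print the translations on their own:

```python
# One row per translated sentence.
pipelineDF.selectExpr("explode(translation.result) as translated_text").show(truncate=False)
```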
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|english_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|539.9 MB| + +## References + +https://huggingface.co/adeebkm/en-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-finetune4en_vietnamese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-finetune4en_vietnamese_pipeline_en.md new file mode 100644 index 00000000000000..af3d35638a7b7f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-finetune4en_vietnamese_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetune4en_vietnamese_pipeline pipeline MarianTransformer from TrinhDacPhu +author: John Snow Labs +name: finetune4en_vietnamese_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetune4en_vietnamese_pipeline` is a English model originally trained by TrinhDacPhu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetune4en_vietnamese_pipeline_en_5.5.0_3.0_1726350746442.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetune4en_vietnamese_pipeline_en_5.5.0_3.0_1726350746442.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetune4en_vietnamese_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetune4en_vietnamese_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
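
The snippet above assumes `df` already exists. A minimal, illustrative sketch for preparing it (assuming a running Spark NLP session and the usual `text` input column used by Spark NLP text pipelines; the sample sentence is made up):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love Spark NLP."]]).toDF("text")  # hypothetical input

pipeline = PretrainedPipeline("finetune4en_vietnamese_pipeline", lang="en")
pipeline.transform(df).show(truncate=False)
```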
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetune4en_vietnamese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|474.8 MB| + +## References + +https://huggingface.co/TrinhDacPhu/finetune4en-vi + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-finetuned_twitter_sentiment_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-finetuned_twitter_sentiment_roberta_pipeline_en.md new file mode 100644 index 00000000000000..5ec234eff2d5c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-finetuned_twitter_sentiment_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_twitter_sentiment_roberta_pipeline pipeline XlmRoBertaForSequenceClassification from coderSounak +author: John Snow Labs +name: finetuned_twitter_sentiment_roberta_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_twitter_sentiment_roberta_pipeline` is a English model originally trained by coderSounak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_twitter_sentiment_roberta_pipeline_en_5.5.0_3.0_1726318129570.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_twitter_sentiment_roberta_pipeline_en_5.5.0_3.0_1726318129570.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_twitter_sentiment_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_twitter_sentiment_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
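
The `df` above is assumed to already exist. As a minimal sketch (assuming a started Spark NLP session and that the pipeline reads the conventional `text` column; the tweet text is a placeholder):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love this new phone!"]]).toDF("text")  # placeholder tweet

pipeline = PretrainedPipeline("finetuned_twitter_sentiment_roberta_pipeline", lang="en")
pipeline.transform(df).show(truncate=False)
```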
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_twitter_sentiment_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/coderSounak/finetuned_twitter_sentiment_roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-flowberta_en.md b/docs/_posts/ahmedlone127/2024-09-14-flowberta_en.md new file mode 100644 index 00000000000000..c612cae79d875d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-flowberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English flowberta RoBertaEmbeddings from BigSalmon +author: John Snow Labs +name: flowberta +date: 2024-09-14 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`flowberta` is a English model originally trained by BigSalmon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/flowberta_en_5.5.0_3.0_1726300135822.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/flowberta_en_5.5.0_3.0_1726300135822.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("flowberta","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("flowberta","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|flowberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/BigSalmon/Flowberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-govroberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-14-govroberta_base_en.md new file mode 100644 index 00000000000000..984857e05e0c24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-govroberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English govroberta_base RoBertaEmbeddings from ESGBERT +author: John Snow Labs +name: govroberta_base +date: 2024-09-14 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`govroberta_base` is a English model originally trained by ESGBERT. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/govroberta_base_en_5.5.0_3.0_1726334626320.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/govroberta_base_en_5.5.0_3.0_1726334626320.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("govroberta_base","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("govroberta_base","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|govroberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/ESGBERT/GovRoBERTa-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-greeklegalroberta_v4_en.md b/docs/_posts/ahmedlone127/2024-09-14-greeklegalroberta_v4_en.md new file mode 100644 index 00000000000000..d107aa5373c1cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-greeklegalroberta_v4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English greeklegalroberta_v4 RoBertaEmbeddings from AI-team-UoA +author: John Snow Labs +name: greeklegalroberta_v4 +date: 2024-09-14 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`greeklegalroberta_v4` is a English model originally trained by AI-team-UoA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/greeklegalroberta_v4_en_5.5.0_3.0_1726300053283.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/greeklegalroberta_v4_en_5.5.0_3.0_1726300053283.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("greeklegalroberta_v4","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("greeklegalroberta_v4","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|greeklegalroberta_v4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|464.6 MB| + +## References + +https://huggingface.co/AI-team-UoA/GreekLegalRoBERTa_v4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-hatebertimbau_twitter_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-14-hatebertimbau_twitter_pipeline_pt.md new file mode 100644 index 00000000000000..eebc4259665b04 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-hatebertimbau_twitter_pipeline_pt.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Portuguese hatebertimbau_twitter_pipeline pipeline BertForSequenceClassification from knowhate +author: John Snow Labs +name: hatebertimbau_twitter_pipeline +date: 2024-09-14 +tags: [pt, open_source, pipeline, onnx] +task: Text Classification +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hatebertimbau_twitter_pipeline` is a Portuguese model originally trained by knowhate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hatebertimbau_twitter_pipeline_pt_5.5.0_3.0_1726348004045.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hatebertimbau_twitter_pipeline_pt_5.5.0_3.0_1726348004045.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hatebertimbau_twitter_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hatebertimbau_twitter_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
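
The `df` above is assumed to already exist. A minimal sketch for building it (assuming a running Spark NLP session and the usual `text` input column; the Portuguese example sentence is a placeholder):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["Este filme é ótimo!"]]).toDF("text")  # placeholder text

pipeline = PretrainedPipeline("hatebertimbau_twitter_pipeline", lang="pt")
pipeline.transform(df).show(truncate=False)
```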
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hatebertimbau_twitter_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|407.8 MB| + +## References + +https://huggingface.co/knowhate/HateBERTimbau-twitter + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-intent_identifier_13_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-intent_identifier_13_pipeline_en.md new file mode 100644 index 00000000000000..6d318eac507bd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-intent_identifier_13_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English intent_identifier_13_pipeline pipeline BertForSequenceClassification from dotzero24 +author: John Snow Labs +name: intent_identifier_13_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`intent_identifier_13_pipeline` is a English model originally trained by dotzero24. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/intent_identifier_13_pipeline_en_5.5.0_3.0_1726348003236.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/intent_identifier_13_pipeline_en_5.5.0_3.0_1726348003236.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("intent_identifier_13_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("intent_identifier_13_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
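
The `df` above is assumed to already exist. A minimal sketch, assuming a running Spark NLP session and the conventional `text` column (the example utterance is hypothetical):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["Can you book a table for two tonight?"]]).toDF("text")  # hypothetical utterance

pipeline = PretrainedPipeline("intent_identifier_13_pipeline", lang="en")
pipeline.transform(df).show(truncate=False)
```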
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|intent_identifier_13_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|627.8 MB| + +## References + +https://huggingface.co/dotzero24/intent_identifier-13 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-iwslt17_marian_big_ctx2_cwd0_english_french_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-iwslt17_marian_big_ctx2_cwd0_english_french_pipeline_en.md new file mode 100644 index 00000000000000..6404e53778a825 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-iwslt17_marian_big_ctx2_cwd0_english_french_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English iwslt17_marian_big_ctx2_cwd0_english_french_pipeline pipeline MarianTransformer from context-mt +author: John Snow Labs +name: iwslt17_marian_big_ctx2_cwd0_english_french_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`iwslt17_marian_big_ctx2_cwd0_english_french_pipeline` is a English model originally trained by context-mt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/iwslt17_marian_big_ctx2_cwd0_english_french_pipeline_en_5.5.0_3.0_1726350872257.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/iwslt17_marian_big_ctx2_cwd0_english_french_pipeline_en_5.5.0_3.0_1726350872257.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("iwslt17_marian_big_ctx2_cwd0_english_french_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("iwslt17_marian_big_ctx2_cwd0_english_french_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
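
The `df` above is assumed to already exist. A minimal sketch for preparing the input (assuming a running Spark NLP session and the usual `text` column for translation pipelines; the English sentence is a placeholder):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love Spark NLP."]]).toDF("text")  # placeholder input

pipeline = PretrainedPipeline("iwslt17_marian_big_ctx2_cwd0_english_french_pipeline", lang="en")
pipeline.transform(df).show(truncate=False)
```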
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|iwslt17_marian_big_ctx2_cwd0_english_french_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/context-mt/iwslt17-marian-big-ctx2-cwd0-en-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-iwslt17_marian_small_ctx0_cwd0_english_french_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-iwslt17_marian_small_ctx0_cwd0_english_french_pipeline_en.md new file mode 100644 index 00000000000000..3be7dc8432898e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-iwslt17_marian_small_ctx0_cwd0_english_french_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English iwslt17_marian_small_ctx0_cwd0_english_french_pipeline pipeline MarianTransformer from context-mt +author: John Snow Labs +name: iwslt17_marian_small_ctx0_cwd0_english_french_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`iwslt17_marian_small_ctx0_cwd0_english_french_pipeline` is a English model originally trained by context-mt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/iwslt17_marian_small_ctx0_cwd0_english_french_pipeline_en_5.5.0_3.0_1726351479521.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/iwslt17_marian_small_ctx0_cwd0_english_french_pipeline_en_5.5.0_3.0_1726351479521.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("iwslt17_marian_small_ctx0_cwd0_english_french_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("iwslt17_marian_small_ctx0_cwd0_english_french_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
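
As with the other translation pipelines, `df` is assumed to already exist. A minimal sketch under the same assumptions (running Spark NLP session, `text` input column, placeholder sentence):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love Spark NLP."]]).toDF("text")  # placeholder input

pipeline = PretrainedPipeline("iwslt17_marian_small_ctx0_cwd0_english_french_pipeline", lang="en")
pipeline.transform(df).show(truncate=False)
```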
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|iwslt17_marian_small_ctx0_cwd0_english_french_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.9 MB| + +## References + +https://huggingface.co/context-mt/iwslt17-marian-small-ctx0-cwd0-en-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-lab1_finetuning_jingyi28_en.md b/docs/_posts/ahmedlone127/2024-09-14-lab1_finetuning_jingyi28_en.md new file mode 100644 index 00000000000000..e3bc9d754c9659 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-lab1_finetuning_jingyi28_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English lab1_finetuning_jingyi28 MarianTransformer from Jingyi28 +author: John Snow Labs +name: lab1_finetuning_jingyi28 +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab1_finetuning_jingyi28` is a English model originally trained by Jingyi28. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab1_finetuning_jingyi28_en_5.5.0_3.0_1726351726829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab1_finetuning_jingyi28_en_5.5.0_3.0_1726351726829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("lab1_finetuning_jingyi28","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("lab1_finetuning_jingyi28","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab1_finetuning_jingyi28| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.3 MB| + +## References + +https://huggingface.co/Jingyi28/lab1_finetuning \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-labour_law_sanskrit_saskta_qa_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-14-labour_law_sanskrit_saskta_qa_pipeline_ar.md new file mode 100644 index 00000000000000..19289ee207275b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-labour_law_sanskrit_saskta_qa_pipeline_ar.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Arabic labour_law_sanskrit_saskta_qa_pipeline pipeline BertForQuestionAnswering from faisalaljahlan +author: John Snow Labs +name: labour_law_sanskrit_saskta_qa_pipeline +date: 2024-09-14 +tags: [ar, open_source, pipeline, onnx] +task: Question Answering +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`labour_law_sanskrit_saskta_qa_pipeline` is a Arabic model originally trained by faisalaljahlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/labour_law_sanskrit_saskta_qa_pipeline_ar_5.5.0_3.0_1726349646092.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/labour_law_sanskrit_saskta_qa_pipeline_ar_5.5.0_3.0_1726349646092.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("labour_law_sanskrit_saskta_qa_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("labour_law_sanskrit_saskta_qa_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
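
The `df` above is assumed to hold question/context pairs. A minimal sketch, assuming the pipeline's MultiDocumentAssembler reads `question` and `context` columns (the common Spark NLP question-answering setup); the sample strings are placeholders, and real inputs would be Arabic labour-law text:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame(
    [["What is the maximum probation period?",
      "The labour law states that the probation period may not exceed ninety days."]]
).toDF("question", "context")  # placeholder question/context pair

pipeline = PretrainedPipeline("labour_law_sanskrit_saskta_qa_pipeline", lang="ar")
pipeline.transform(df).show(truncate=False)
```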
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|labour_law_sanskrit_saskta_qa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|504.6 MB| + +## References + +https://huggingface.co/faisalaljahlan/Labour-Law-SA-QA + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-limiwhisper_small_korean_dia_gs_ko.md b/docs/_posts/ahmedlone127/2024-09-14-limiwhisper_small_korean_dia_gs_ko.md new file mode 100644 index 00000000000000..d561e9e38bbfe2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-limiwhisper_small_korean_dia_gs_ko.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Korean limiwhisper_small_korean_dia_gs WhisperForCTC from p4b +author: John Snow Labs +name: limiwhisper_small_korean_dia_gs +date: 2024-09-14 +tags: [ko, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`limiwhisper_small_korean_dia_gs` is a Korean model originally trained by p4b. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/limiwhisper_small_korean_dia_gs_ko_5.5.0_3.0_1726330725287.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/limiwhisper_small_korean_dia_gs_ko_5.5.0_3.0_1726330725287.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("limiwhisper_small_korean_dia_gs","ko") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
# `data` is a DataFrame with raw audio floats in a column named "audio_content",
# e.g. data = spark.createDataFrame([[raw_floats]]).toDF("audio_content")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("limiwhisper_small_korean_dia_gs", "ko")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
// `data` is a DataFrame with raw audio floats in a column named "audio_content"
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|limiwhisper_small_korean_dia_gs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ko| +|Size:|1.1 GB| + +## References + +https://huggingface.co/p4b/limiwhisper-small-ko-dia-gs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-lisa_whisper_small_latest_en.md b/docs/_posts/ahmedlone127/2024-09-14-lisa_whisper_small_latest_en.md new file mode 100644 index 00000000000000..9c847800dcbb69 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-lisa_whisper_small_latest_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English lisa_whisper_small_latest WhisperForCTC from Shubham09 +author: John Snow Labs +name: lisa_whisper_small_latest +date: 2024-09-14 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lisa_whisper_small_latest` is a English model originally trained by Shubham09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lisa_whisper_small_latest_en_5.5.0_3.0_1726298616674.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lisa_whisper_small_latest_en_5.5.0_3.0_1726298616674.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("lisa_whisper_small_latest","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
# `data` is a DataFrame with raw audio floats in a column named "audio_content",
# e.g. data = spark.createDataFrame([[raw_floats]]).toDF("audio_content")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("lisa_whisper_small_latest", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
// `data` is a DataFrame with raw audio floats in a column named "audio_content"
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lisa_whisper_small_latest| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Shubham09/LISA_Whisper_small_latest \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-marathi_sentiment_tweets_mr.md b/docs/_posts/ahmedlone127/2024-09-14-marathi_sentiment_tweets_mr.md new file mode 100644 index 00000000000000..1952f636802096 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-marathi_sentiment_tweets_mr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Marathi marathi_sentiment_tweets BertForSequenceClassification from l3cube-pune +author: John Snow Labs +name: marathi_sentiment_tweets +date: 2024-09-14 +tags: [mr, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: mr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marathi_sentiment_tweets` is a Marathi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marathi_sentiment_tweets_mr_5.5.0_3.0_1726348309010.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marathi_sentiment_tweets_mr_5.5.0_3.0_1726348309010.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("marathi_sentiment_tweets","mr") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("marathi_sentiment_tweets", "mr")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marathi_sentiment_tweets| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|mr| +|Size:|892.8 MB| + +## References + +https://huggingface.co/l3cube-pune/marathi-sentiment-tweets \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_neerajnigam6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_neerajnigam6_pipeline_en.md new file mode 100644 index 00000000000000..41cc309903c99a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_neerajnigam6_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_neerajnigam6_pipeline pipeline MarianTransformer from neerajnigam6 +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_neerajnigam6_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_neerajnigam6_pipeline` is a English model originally trained by neerajnigam6. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_neerajnigam6_pipeline_en_5.5.0_3.0_1726351590800.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_neerajnigam6_pipeline_en_5.5.0_3.0_1726351590800.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_neerajnigam6_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_neerajnigam6_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
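
The `df` above is assumed to already exist. A minimal sketch for preparing it (assuming a running Spark NLP session and the usual `text` input column; the sentence is a placeholder):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love Spark NLP."]]).toDF("text")  # placeholder input

pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_neerajnigam6_pipeline", lang="en")
pipeline.transform(df).show(truncate=False)
```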
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_neerajnigam6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|523.4 MB| + +## References + +https://huggingface.co/neerajnigam6/marian-finetuned-kde4-en-to-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_korean_tonga_tonga_islands_english_en.md b/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_korean_tonga_tonga_islands_english_en.md new file mode 100644 index 00000000000000..6c98b8bb74e014 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_korean_tonga_tonga_islands_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_korean_tonga_tonga_islands_english MarianTransformer from JIEUN21 +author: John Snow Labs +name: marian_finetuned_kde4_korean_tonga_tonga_islands_english +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_korean_tonga_tonga_islands_english` is a English model originally trained by JIEUN21. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_korean_tonga_tonga_islands_english_en_5.5.0_3.0_1726351742629.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_korean_tonga_tonga_islands_english_en_5.5.0_3.0_1726351742629.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("marian_finetuned_kde4_korean_tonga_tonga_islands_english","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("marian_finetuned_kde4_korean_tonga_tonga_islands_english","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_korean_tonga_tonga_islands_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|540.6 MB| + +## References + +https://huggingface.co/JIEUN21/marian-finetuned-kde4-ko-to-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-marian_random_kde4_english_tonga_tonga_islands_french_chloe018_en.md b/docs/_posts/ahmedlone127/2024-09-14-marian_random_kde4_english_tonga_tonga_islands_french_chloe018_en.md new file mode 100644 index 00000000000000..9a344945986320 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-marian_random_kde4_english_tonga_tonga_islands_french_chloe018_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_random_kde4_english_tonga_tonga_islands_french_chloe018 MarianTransformer from chloe018 +author: John Snow Labs +name: marian_random_kde4_english_tonga_tonga_islands_french_chloe018 +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_random_kde4_english_tonga_tonga_islands_french_chloe018` is a English model originally trained by chloe018. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_random_kde4_english_tonga_tonga_islands_french_chloe018_en_5.5.0_3.0_1726351556730.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_random_kde4_english_tonga_tonga_islands_french_chloe018_en_5.5.0_3.0_1726351556730.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("marian_random_kde4_english_tonga_tonga_islands_french_chloe018","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("marian_random_kde4_english_tonga_tonga_islands_french_chloe018","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_random_kde4_english_tonga_tonga_islands_french_chloe018| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|509.8 MB| + +## References + +https://huggingface.co/chloe018/marian-random-kde4-en-to-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-medical_english_chinese_9_1_pt2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-medical_english_chinese_9_1_pt2_pipeline_en.md new file mode 100644 index 00000000000000..9c7615e752bdb2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-medical_english_chinese_9_1_pt2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English medical_english_chinese_9_1_pt2_pipeline pipeline MarianTransformer from DogGoesBark +author: John Snow Labs +name: medical_english_chinese_9_1_pt2_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`medical_english_chinese_9_1_pt2_pipeline` is a English model originally trained by DogGoesBark. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/medical_english_chinese_9_1_pt2_pipeline_en_5.5.0_3.0_1726351419793.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/medical_english_chinese_9_1_pt2_pipeline_en_5.5.0_3.0_1726351419793.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("medical_english_chinese_9_1_pt2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("medical_english_chinese_9_1_pt2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
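
The `df` above is assumed to already exist. A minimal sketch under the usual assumptions (running Spark NLP session, `text` input column; the clinical sentence is a made-up placeholder):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["The patient was prescribed antibiotics."]]).toDF("text")  # placeholder input

pipeline = PretrainedPipeline("medical_english_chinese_9_1_pt2_pipeline", lang="en")
pipeline.transform(df).show(truncate=False)
```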
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|medical_english_chinese_9_1_pt2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|541.9 MB| + +## References + +https://huggingface.co/DogGoesBark/medical_en_zh_9_1_pt2 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-medical_tiny_english_1_0v_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-medical_tiny_english_1_0v_pipeline_en.md new file mode 100644 index 00000000000000..11945be6e10789 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-medical_tiny_english_1_0v_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English medical_tiny_english_1_0v_pipeline pipeline WhisperForCTC from Dev372 +author: John Snow Labs +name: medical_tiny_english_1_0v_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`medical_tiny_english_1_0v_pipeline` is a English model originally trained by Dev372. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/medical_tiny_english_1_0v_pipeline_en_5.5.0_3.0_1726298923756.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/medical_tiny_english_1_0v_pipeline_en_5.5.0_3.0_1726298923756.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("medical_tiny_english_1_0v_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("medical_tiny_english_1_0v_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
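
The `df` above is assumed to contain audio. A minimal sketch, assuming the pipeline reads raw audio floats from an `audio_content` column; the `raw_floats` variable is hypothetical and would hold 16 kHz samples loaded elsewhere:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
# raw_floats: hypothetical list of float audio samples, e.g. loaded with librosa at 16 kHz
df = spark.createDataFrame([[raw_floats]], ["audio_content"])

pipeline = PretrainedPipeline("medical_tiny_english_1_0v_pipeline", lang="en")
pipeline.transform(df).show(truncate=False)
```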
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|medical_tiny_english_1_0v_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|394.0 MB| + +## References + +https://huggingface.co/Dev372/Medical_tiny_en_1_0v + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-medrurobertalarge_sayula_popoluca_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-medrurobertalarge_sayula_popoluca_pipeline_en.md new file mode 100644 index 00000000000000..98eb9cbeb206e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-medrurobertalarge_sayula_popoluca_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English medrurobertalarge_sayula_popoluca_pipeline pipeline RoBertaForTokenClassification from DimasikKurd +author: John Snow Labs +name: medrurobertalarge_sayula_popoluca_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`medrurobertalarge_sayula_popoluca_pipeline` is a English model originally trained by DimasikKurd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/medrurobertalarge_sayula_popoluca_pipeline_en_5.5.0_3.0_1726315195837.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/medrurobertalarge_sayula_popoluca_pipeline_en_5.5.0_3.0_1726315195837.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("medrurobertalarge_sayula_popoluca_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("medrurobertalarge_sayula_popoluca_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
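
The `df` above is assumed to already exist. A minimal sketch (assuming a running Spark NLP session and the conventional `text` input column for token-classification pipelines; the sentence is an arbitrary placeholder):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love Spark NLP."]]).toDF("text")  # placeholder input

pipeline = PretrainedPipeline("medrurobertalarge_sayula_popoluca_pipeline", lang="en")
pipeline.transform(df).show(truncate=False)
```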
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|medrurobertalarge_sayula_popoluca_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/DimasikKurd/MedRuRobertaLarge_pos + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-mobilebert_sst2_en.md b/docs/_posts/ahmedlone127/2024-09-14-mobilebert_sst2_en.md new file mode 100644 index 00000000000000..f28fcbc5a5bf8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-mobilebert_sst2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mobilebert_sst2 BertForSequenceClassification from Alireza1044 +author: John Snow Labs +name: mobilebert_sst2 +date: 2024-09-14 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mobilebert_sst2` is a English model originally trained by Alireza1044. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mobilebert_sst2_en_5.5.0_3.0_1726348524905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mobilebert_sst2_en_5.5.0_3.0_1726348524905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("mobilebert_sst2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("mobilebert_sst2", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mobilebert_sst2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|92.5 MB| + +## References + +https://huggingface.co/Alireza1044/mobilebert_sst2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-model_for_french_en.md b/docs/_posts/ahmedlone127/2024-09-14-model_for_french_en.md new file mode 100644 index 00000000000000..29a9901c78e794 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-model_for_french_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_for_french XlmRoBertaForTokenClassification from LGLT +author: John Snow Labs +name: model_for_french +date: 2024-09-14 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_for_french` is a English model originally trained by LGLT. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_for_french_en_5.5.0_3.0_1726345895583.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_for_french_en_5.5.0_3.0_1726345895583.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("model_for_french","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("model_for_french", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_for_french| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|781.6 MB| + +## References + +https://huggingface.co/LGLT/model_for_fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-mrc_v2_en.md b/docs/_posts/ahmedlone127/2024-09-14-mrc_v2_en.md new file mode 100644 index 00000000000000..2955e65cbbc5e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-mrc_v2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English mrc_v2 RoBertaForQuestionAnswering from Matheusmatos2916 +author: John Snow Labs +name: mrc_v2 +date: 2024-09-14 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mrc_v2` is a English model originally trained by Matheusmatos2916. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mrc_v2_en_5.5.0_3.0_1726343097166.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mrc_v2_en_5.5.0_3.0_1726343097166.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("mrc_v2","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("mrc_v2", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mrc_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|463.6 MB| + +## References + +https://huggingface.co/Matheusmatos2916/MRC_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-ner_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-14-ner_roberta_en.md new file mode 100644 index 00000000000000..e514b5c3f5cdaf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-ner_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_roberta RoBertaForTokenClassification from textminr +author: John Snow Labs +name: ner_roberta +date: 2024-09-14 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_roberta` is a English model originally trained by textminr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_roberta_en_5.5.0_3.0_1726307053675.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_roberta_en_5.5.0_3.0_1726307053675.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("ner_roberta","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("ner_roberta", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|416.0 MB| + +## References + +https://huggingface.co/textminr/ner_roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-norwegian_roberta_base_highlr_512_en.md b/docs/_posts/ahmedlone127/2024-09-14-norwegian_roberta_base_highlr_512_en.md new file mode 100644 index 00000000000000..bc61b7ec71bfd9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-norwegian_roberta_base_highlr_512_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English norwegian_roberta_base_highlr_512 RoBertaEmbeddings from pere +author: John Snow Labs +name: norwegian_roberta_base_highlr_512 +date: 2024-09-14 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`norwegian_roberta_base_highlr_512` is a English model originally trained by pere. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norwegian_roberta_base_highlr_512_en_5.5.0_3.0_1726338120835.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norwegian_roberta_base_highlr_512_en_5.5.0_3.0_1726338120835.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("norwegian_roberta_base_highlr_512","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("norwegian_roberta_base_highlr_512","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norwegian_roberta_base_highlr_512| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/pere/norwegian-roberta-base-highlr-512 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_english_indonesian_open_subtitles_en.md b/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_english_indonesian_open_subtitles_en.md new file mode 100644 index 00000000000000..62149a1b47ce62 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_english_indonesian_open_subtitles_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_indonesian_open_subtitles MarianTransformer from yonathanstwn +author: John Snow Labs +name: opus_maltese_english_indonesian_open_subtitles +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_indonesian_open_subtitles` is a English model originally trained by yonathanstwn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_indonesian_open_subtitles_en_5.5.0_3.0_1726351001337.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_indonesian_open_subtitles_en_5.5.0_3.0_1726351001337.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("opus_maltese_english_indonesian_open_subtitles","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("opus_maltese_english_indonesian_open_subtitles","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_indonesian_open_subtitles| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|481.6 MB| + +## References + +https://huggingface.co/yonathanstwn/opus-mt-en-id-open-subtitles \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_english_indonesian_open_subtitles_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_english_indonesian_open_subtitles_pipeline_en.md new file mode 100644 index 00000000000000..907f2153c128e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_english_indonesian_open_subtitles_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_indonesian_open_subtitles_pipeline pipeline MarianTransformer from yonathanstwn +author: John Snow Labs +name: opus_maltese_english_indonesian_open_subtitles_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_indonesian_open_subtitles_pipeline` is a English model originally trained by yonathanstwn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_indonesian_open_subtitles_pipeline_en_5.5.0_3.0_1726351022468.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_indonesian_open_subtitles_pipeline_en_5.5.0_3.0_1726351022468.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_indonesian_open_subtitles_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_indonesian_open_subtitles_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_indonesian_open_subtitles_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|482.1 MB| + +## References + +https://huggingface.co/yonathanstwn/opus-mt-en-id-open-subtitles + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_lekshmiprabha_en.md b/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_lekshmiprabha_en.md new file mode 100644 index 00000000000000..837945a3b76b56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_lekshmiprabha_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_lekshmiprabha MarianTransformer from Lekshmiprabha +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_lekshmiprabha +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_lekshmiprabha` is a English model originally trained by Lekshmiprabha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_lekshmiprabha_en_5.5.0_3.0_1726350546314.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_lekshmiprabha_en_5.5.0_3.0_1726350546314.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_lekshmiprabha","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_lekshmiprabha","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_lekshmiprabha| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.6 MB| + +## References + +https://huggingface.co/Lekshmiprabha/opus-mt-en-ro-finetuned-en-to-ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-personal_es.md b/docs/_posts/ahmedlone127/2024-09-14-personal_es.md new file mode 100644 index 00000000000000..d651509e448aab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-personal_es.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Castilian, Spanish personal BertForQuestionAnswering from Antonio49 +author: John Snow Labs +name: personal +date: 2024-09-14 +tags: [es, open_source, onnx, question_answering, bert] +task: Question Answering +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`personal` is a Castilian, Spanish model originally trained by Antonio49. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/personal_es_5.5.0_3.0_1726349831860.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/personal_es_5.5.0_3.0_1726349831860.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("personal","es") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("personal", "es") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|personal| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|es| +|Size:|409.7 MB| + +## References + +https://huggingface.co/Antonio49/Personal \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-pig_latin_tonga_tonga_islands_eng_en.md b/docs/_posts/ahmedlone127/2024-09-14-pig_latin_tonga_tonga_islands_eng_en.md new file mode 100644 index 00000000000000..c68210616afec4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-pig_latin_tonga_tonga_islands_eng_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English pig_latin_tonga_tonga_islands_eng MarianTransformer from soschuetze +author: John Snow Labs +name: pig_latin_tonga_tonga_islands_eng +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pig_latin_tonga_tonga_islands_eng` is a English model originally trained by soschuetze. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pig_latin_tonga_tonga_islands_eng_en_5.5.0_3.0_1726350916879.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pig_latin_tonga_tonga_islands_eng_en_5.5.0_3.0_1726350916879.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("pig_latin_tonga_tonga_islands_eng","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("pig_latin_tonga_tonga_islands_eng","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pig_latin_tonga_tonga_islands_eng| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|532.7 MB| + +## References + +https://huggingface.co/soschuetze/pig-latin-to-eng \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-polibert_sanskrit_saskta_it.md b/docs/_posts/ahmedlone127/2024-09-14-polibert_sanskrit_saskta_it.md new file mode 100644 index 00000000000000..0e33af5a3f35f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-polibert_sanskrit_saskta_it.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Italian polibert_sanskrit_saskta BertForSequenceClassification from gbarone77 +author: John Snow Labs +name: polibert_sanskrit_saskta +date: 2024-09-14 +tags: [it, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`polibert_sanskrit_saskta` is a Italian model originally trained by gbarone77. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/polibert_sanskrit_saskta_it_5.5.0_3.0_1726347712923.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/polibert_sanskrit_saskta_it_5.5.0_3.0_1726347712923.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("polibert_sanskrit_saskta","it") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("polibert_sanskrit_saskta", "it") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|polibert_sanskrit_saskta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|it| +|Size:|414.8 MB| + +## References + +https://huggingface.co/gbarone77/polibert_sa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-quran_recitation_errors_test_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-14-quran_recitation_errors_test_pipeline_ar.md new file mode 100644 index 00000000000000..bf98830b8f9814 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-quran_recitation_errors_test_pipeline_ar.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Arabic quran_recitation_errors_test_pipeline pipeline WhisperForCTC from cherifkhalifah +author: John Snow Labs +name: quran_recitation_errors_test_pipeline +date: 2024-09-14 +tags: [ar, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`quran_recitation_errors_test_pipeline` is a Arabic model originally trained by cherifkhalifah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/quran_recitation_errors_test_pipeline_ar_5.5.0_3.0_1726329749260.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/quran_recitation_errors_test_pipeline_ar_5.5.0_3.0_1726329749260.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("quran_recitation_errors_test_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("quran_recitation_errors_test_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|quran_recitation_errors_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|390.6 MB| + +## References + +https://huggingface.co/cherifkhalifah/quran-recitation-errors-test + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-radbert_roberta_4m_ucsd_va_health_en.md b/docs/_posts/ahmedlone127/2024-09-14-radbert_roberta_4m_ucsd_va_health_en.md new file mode 100644 index 00000000000000..bc2466ee238c2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-radbert_roberta_4m_ucsd_va_health_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English radbert_roberta_4m_ucsd_va_health RoBertaEmbeddings from UCSD-VA-health +author: John Snow Labs +name: radbert_roberta_4m_ucsd_va_health +date: 2024-09-14 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`radbert_roberta_4m_ucsd_va_health` is a English model originally trained by UCSD-VA-health. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/radbert_roberta_4m_ucsd_va_health_en_5.5.0_3.0_1726300665831.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/radbert_roberta_4m_ucsd_va_health_en_5.5.0_3.0_1726300665831.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("radbert_roberta_4m_ucsd_va_health","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("radbert_roberta_4m_ucsd_va_health","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|radbert_roberta_4m_ucsd_va_health| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.7 MB| + +## References + +https://huggingface.co/UCSD-VA-health/RadBERT-RoBERTa-4m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_augmented_finetuned_atis_1pct_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_augmented_finetuned_atis_1pct_v1_pipeline_en.md new file mode 100644 index 00000000000000..dd454b7e52f523 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_augmented_finetuned_atis_1pct_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_augmented_finetuned_atis_1pct_v1_pipeline pipeline RoBertaForSequenceClassification from benayas +author: John Snow Labs +name: roberta_augmented_finetuned_atis_1pct_v1_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_augmented_finetuned_atis_1pct_v1_pipeline` is a English model originally trained by benayas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_augmented_finetuned_atis_1pct_v1_pipeline_en_5.5.0_3.0_1726272010732.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_augmented_finetuned_atis_1pct_v1_pipeline_en_5.5.0_3.0_1726272010732.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_augmented_finetuned_atis_1pct_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_augmented_finetuned_atis_1pct_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_augmented_finetuned_atis_1pct_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|426.4 MB| + +## References + +https://huggingface.co/benayas/roberta-augmented-finetuned-atis_1pct_v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_base_biomedical_clinical_spanish_finetuned_clinais_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_base_biomedical_clinical_spanish_finetuned_clinais_pipeline_en.md new file mode 100644 index 00000000000000..626d94a8b1cd2c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_base_biomedical_clinical_spanish_finetuned_clinais_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_biomedical_clinical_spanish_finetuned_clinais_pipeline pipeline RoBertaEmbeddings from joheras +author: John Snow Labs +name: roberta_base_biomedical_clinical_spanish_finetuned_clinais_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_biomedical_clinical_spanish_finetuned_clinais_pipeline` is a English model originally trained by joheras. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_biomedical_clinical_spanish_finetuned_clinais_pipeline_en_5.5.0_3.0_1726334130651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_biomedical_clinical_spanish_finetuned_clinais_pipeline_en_5.5.0_3.0_1726334130651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_biomedical_clinical_spanish_finetuned_clinais_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_biomedical_clinical_spanish_finetuned_clinais_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_biomedical_clinical_spanish_finetuned_clinais_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|469.4 MB| + +## References + +https://huggingface.co/joheras/roberta-base-biomedical-clinical-es-finetuned-clinais + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_base_epoch_21_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_base_epoch_21_en.md new file mode 100644 index 00000000000000..3e00848ba315ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_base_epoch_21_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_21 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_21 +date: 2024-09-14 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_21` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_21_en_5.5.0_3.0_1726338081851.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_21_en_5.5.0_3.0_1726338081851.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_21","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_21","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_21| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_21 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_base_finetune_subjqa_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_base_finetune_subjqa_en.md new file mode 100644 index 00000000000000..38dc734da1481d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_base_finetune_subjqa_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_base_finetune_subjqa RoBertaForQuestionAnswering from DucQuynh +author: John Snow Labs +name: roberta_base_finetune_subjqa +date: 2024-09-14 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetune_subjqa` is a English model originally trained by DucQuynh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetune_subjqa_en_5.5.0_3.0_1726343090766.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetune_subjqa_en_5.5.0_3.0_1726343090766.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_finetune_subjqa","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_finetune_subjqa", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetune_subjqa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|463.9 MB| + +## References + +https://huggingface.co/DucQuynh/roberta-base-finetune-subjqa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_base_mlm_manojalexender_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_base_mlm_manojalexender_en.md new file mode 100644 index 00000000000000..93fe34a95962a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_base_mlm_manojalexender_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_mlm_manojalexender RoBertaEmbeddings from ManojAlexender +author: John Snow Labs +name: roberta_base_mlm_manojalexender +date: 2024-09-14 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_mlm_manojalexender` is a English model originally trained by ManojAlexender. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_mlm_manojalexender_en_5.5.0_3.0_1726338565370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_mlm_manojalexender_en_5.5.0_3.0_1726338565370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_mlm_manojalexender","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_mlm_manojalexender","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_mlm_manojalexender| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/ManojAlexender/roberta-base_MLM \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_base_ner_demo_bek1_pipeline_mn.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_base_ner_demo_bek1_pipeline_mn.md new file mode 100644 index 00000000000000..fcd4a36e010197 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_base_ner_demo_bek1_pipeline_mn.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Mongolian roberta_base_ner_demo_bek1_pipeline pipeline RoBertaForTokenClassification from bek1 +author: John Snow Labs +name: roberta_base_ner_demo_bek1_pipeline +date: 2024-09-14 +tags: [mn, open_source, pipeline, onnx] +task: Named Entity Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_ner_demo_bek1_pipeline` is a Mongolian model originally trained by bek1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ner_demo_bek1_pipeline_mn_5.5.0_3.0_1726314159587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ner_demo_bek1_pipeline_mn_5.5.0_3.0_1726314159587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_ner_demo_bek1_pipeline", lang = "mn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_ner_demo_bek1_pipeline", lang = "mn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ner_demo_bek1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mn| +|Size:|465.7 MB| + +## References + +https://huggingface.co/bek1/roberta-base-ner-demo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_cyner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_cyner_pipeline_en.md new file mode 100644 index 00000000000000..7783452655c8d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_cyner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_cyner_pipeline pipeline RoBertaForTokenClassification from Cyber-ThreaD +author: John Snow Labs +name: roberta_cyner_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_cyner_pipeline` is a English model originally trained by Cyber-ThreaD. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_cyner_pipeline_en_5.5.0_3.0_1726306461705.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_cyner_pipeline_en_5.5.0_3.0_1726306461705.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_cyner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_cyner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_cyner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|426.0 MB| + +## References + +https://huggingface.co/Cyber-ThreaD/RoBERTa-CyNER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_large_conll2003_titanbot_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_large_conll2003_titanbot_pipeline_en.md new file mode 100644 index 00000000000000..04d39e9f0a2124 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_large_conll2003_titanbot_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_conll2003_titanbot_pipeline pipeline RoBertaForTokenClassification from titanbot +author: John Snow Labs +name: roberta_large_conll2003_titanbot_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_conll2003_titanbot_pipeline` is a English model originally trained by titanbot. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_conll2003_titanbot_pipeline_en_5.5.0_3.0_1726314452290.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_conll2003_titanbot_pipeline_en_5.5.0_3.0_1726314452290.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_conll2003_titanbot_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_conll2003_titanbot_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_conll2003_titanbot_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/titanbot/Roberta-Large-CONLL2003 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_med_small_1m_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_med_small_1m_3_pipeline_en.md new file mode 100644 index 00000000000000..7a9f87333dc3bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_med_small_1m_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_med_small_1m_3_pipeline pipeline RoBertaEmbeddings from nyu-mll +author: John Snow Labs +name: roberta_med_small_1m_3_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_med_small_1m_3_pipeline` is a English model originally trained by nyu-mll. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_med_small_1m_3_pipeline_en_5.5.0_3.0_1726300290380.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_med_small_1m_3_pipeline_en_5.5.0_3.0_1726300290380.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_med_small_1m_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_med_small_1m_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_med_small_1m_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|108.0 MB| + +## References + +https://huggingface.co/nyu-mll/roberta-med-small-1M-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_ner_deid_roberta_i2b2_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_ner_deid_roberta_i2b2_en.md new file mode 100644 index 00000000000000..84b3d690777319 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_ner_deid_roberta_i2b2_en.md @@ -0,0 +1,122 @@ +--- +layout: model +title: English RobertaForTokenClassification Cased model (from obi) +author: John Snow Labs +name: roberta_ner_deid_roberta_i2b2 +date: 2024-09-14 +tags: [bert, ner, open_source, en, onnx, openvino] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: openvino +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `deid_roberta_i2b2` is a English model originally trained by `obi`. + +## Predicted Entities + +`DATE`, `L-AGE`, `U-PATIENT`, `L-STAFF`, `U-OTHERPHI`, `U-ID`, `EMAIL`, `U-LOC`, `L-HOSP`, `L-PATIENT`, `PATIENT`, `PHONE`, `U-PHONE`, `L-OTHERPHI`, `HOSP`, `L-PATORG`, `AGE`, `U-EMAIL`, `L-ID`, `U-HOSP`, `U-AGE`, `OTHERPHI`, `LOC`, `ID`, `U-DATE`, `L-DATE`, `U-PATORG`, `L-PHONE`, `STAFF`, `L-EMAIL`, `PATORG`, `U-STAFF`, `L-LOC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_ner_deid_roberta_i2b2_en_5.5.0_3.0_1726298204413.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_ner_deid_roberta_i2b2_en_5.5.0_3.0_1726298204413.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer() \
    .setInputCols("sentence") \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_ner_deid_roberta_i2b2","en") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])

data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")

result = pipeline.fit(data).transform(data)
```
```scala
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
    .setInputCols(Array("sentence"))
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_ner_deid_roberta_i2b2","en")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))

val data = Seq("PUT YOUR STRING HERE").toDF("text")

val result = pipeline.fit(data).transform(data)
```

{:.nlu-block}
```python
import nlu
nlu.load("en.ner.roberta.by_obi").predict("""PUT YOUR STRING HERE""")
```
</div>
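
To work with whole de-identification chunks rather than per-token tags, a `NerConverter` stage can be appended to the pipeline shown above. This is an optional sketch: it reuses the column names from the snippet above, and the `ner_chunk` output column is an assumed name, not part of the original model.

```python
# Optional post-processing: merge token-level NER tags into entity chunks.
from pyspark.ml import Pipeline
from sparknlp.annotator import NerConverter

ner_converter = NerConverter() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier, ner_converter])
result = pipeline.fit(data).transform(data)

# Each row now exposes whole PHI spans (names, dates, IDs, ...) instead of per-token tags.
result.selectExpr("explode(ner_chunk.result) as phi_chunk").show(truncate=False)
```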
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_ner_deid_roberta_i2b2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[token, document]| +|Output Labels:|[label]| +|Language:|en| +|Size:|1.3 GB| +|Case sensitive:|true| + +## References + +References + +References + +- https://huggingface.co/obi/deid_roberta_i2b2 +- https://arxiv.org/pdf/1907.11692.pdf +- https://github.com/obi-ml-public/ehr_deidentification/tree/master/steps/train +- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4978170/ +- https://www.hhs.gov/hipaa/for-professionals/privacy/laws-regulations/index.html +- https://github.com/obi-ml-public/ehr_deidentification +- https://github.com/obi-ml-public/ehr_deidentification/tree/master/steps/forward_pass +- https://github.com/obi-ml-public/ehr_deidentification/blob/master/AnnotationGuidelines.md \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_social_roles_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_social_roles_pipeline_en.md new file mode 100644 index 00000000000000..da66c06a998eef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_social_roles_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_social_roles_pipeline pipeline RoBertaForTokenClassification from lucy3 +author: John Snow Labs +name: roberta_social_roles_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_social_roles_pipeline` is a English model originally trained by lucy3. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_social_roles_pipeline_en_5.5.0_3.0_1726314510778.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_social_roles_pipeline_en_5.5.0_3.0_1726314510778.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_social_roles_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_social_roles_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_social_roles_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/lucy3/roberta_social_roles + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_time_identification_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_time_identification_en.md new file mode 100644 index 00000000000000..f5a5bd26dfbd90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_time_identification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_time_identification RoBertaForTokenClassification from DAMO-NLP-SG +author: John Snow Labs +name: roberta_time_identification +date: 2024-09-14 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_time_identification` is a English model originally trained by DAMO-NLP-SG. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_time_identification_en_5.5.0_3.0_1726306507195.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_time_identification_en_5.5.0_3.0_1726306507195.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_time_identification","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_time_identification", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
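
For quick experiments on single strings, the fitted pipeline can also be wrapped in a `LightPipeline`, which avoids building a DataFrame. A small sketch, assuming the `pipelineModel` from the snippet above; the example sentence is illustrative only.

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
annotations = light.annotate("The meeting was moved from 3 pm today to next Monday morning.")

# Token-level tags keyed by output column; time expressions should be labelled by the model.
print(annotations["ner"])
```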
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_time_identification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/DAMO-NLP-SG/roberta-time_identification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-robertabase_ppt_occitan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-robertabase_ppt_occitan_pipeline_en.md new file mode 100644 index 00000000000000..488bd14bce116a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-robertabase_ppt_occitan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robertabase_ppt_occitan_pipeline pipeline RoBertaEmbeddings from mehrshadk +author: John Snow Labs +name: robertabase_ppt_occitan_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertabase_ppt_occitan_pipeline` is a English model originally trained by mehrshadk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertabase_ppt_occitan_pipeline_en_5.5.0_3.0_1726338440401.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertabase_ppt_occitan_pipeline_en_5.5.0_3.0_1726338440401.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robertabase_ppt_occitan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robertabase_ppt_occitan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertabase_ppt_occitan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.8 MB| + +## References + +https://huggingface.co/mehrshadk/robertaBase_ppt_OC + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-rubert_multiconer_pipeline_ru.md b/docs/_posts/ahmedlone127/2024-09-14-rubert_multiconer_pipeline_ru.md new file mode 100644 index 00000000000000..0be13cc282b949 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-rubert_multiconer_pipeline_ru.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Russian rubert_multiconer_pipeline pipeline BertForTokenClassification from bond005 +author: John Snow Labs +name: rubert_multiconer_pipeline +date: 2024-09-14 +tags: [ru, open_source, pipeline, onnx] +task: Named Entity Recognition +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_multiconer_pipeline` is a Russian model originally trained by bond005. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_multiconer_pipeline_ru_5.5.0_3.0_1726305605300.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_multiconer_pipeline_ru_5.5.0_3.0_1726305605300.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rubert_multiconer_pipeline", lang = "ru") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rubert_multiconer_pipeline", lang = "ru") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_multiconer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ru| +|Size:|664.4 MB| + +## References + +https://huggingface.co/bond005/rubert-multiconer + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-scenario_non_kd_scr_d2_data_cl_cardiff_cl_onlyalpha_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-scenario_non_kd_scr_d2_data_cl_cardiff_cl_onlyalpha_pipeline_en.md new file mode 100644 index 00000000000000..6c0aa69cc0bbe4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-scenario_non_kd_scr_d2_data_cl_cardiff_cl_onlyalpha_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English scenario_non_kd_scr_d2_data_cl_cardiff_cl_onlyalpha_pipeline pipeline XlmRoBertaForSequenceClassification from haryoaw +author: John Snow Labs +name: scenario_non_kd_scr_d2_data_cl_cardiff_cl_onlyalpha_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scenario_non_kd_scr_d2_data_cl_cardiff_cl_onlyalpha_pipeline` is a English model originally trained by haryoaw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scenario_non_kd_scr_d2_data_cl_cardiff_cl_onlyalpha_pipeline_en_5.5.0_3.0_1726317367544.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scenario_non_kd_scr_d2_data_cl_cardiff_cl_onlyalpha_pipeline_en_5.5.0_3.0_1726317367544.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("scenario_non_kd_scr_d2_data_cl_cardiff_cl_onlyalpha_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("scenario_non_kd_scr_d2_data_cl_cardiff_cl_onlyalpha_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scenario_non_kd_scr_d2_data_cl_cardiff_cl_onlyalpha_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|883.9 MB| + +## References + +https://huggingface.co/haryoaw/scenario-NON-KD-SCR-D2_data-cl-cardiff_cl_onlyalpha + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-sent_bert_base_code_comments_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-sent_bert_base_code_comments_pipeline_en.md new file mode 100644 index 00000000000000..6858e1799ccb2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-sent_bert_base_code_comments_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_code_comments_pipeline pipeline BertSentenceEmbeddings from giganticode +author: John Snow Labs +name: sent_bert_base_code_comments_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_code_comments_pipeline` is a English model originally trained by giganticode. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_code_comments_pipeline_en_5.5.0_3.0_1726336913056.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_code_comments_pipeline_en_5.5.0_3.0_1726336913056.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_code_comments_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_code_comments_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_code_comments_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/giganticode/bert-base-code_comments + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-sent_bert_base_historic_multilingual_64k_td_cased_xx.md b/docs/_posts/ahmedlone127/2024-09-14-sent_bert_base_historic_multilingual_64k_td_cased_xx.md new file mode 100644 index 00000000000000..1179189466e6db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-sent_bert_base_historic_multilingual_64k_td_cased_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual sent_bert_base_historic_multilingual_64k_td_cased BertSentenceEmbeddings from dbmdz +author: John Snow Labs +name: sent_bert_base_historic_multilingual_64k_td_cased +date: 2024-09-14 +tags: [xx, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_historic_multilingual_64k_td_cased` is a Multilingual model originally trained by dbmdz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_historic_multilingual_64k_td_cased_xx_5.5.0_3.0_1726337237009.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_historic_multilingual_64k_td_cased_xx_5.5.0_3.0_1726337237009.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_historic_multilingual_64k_td_cased","xx") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_historic_multilingual_64k_td_cased","xx") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
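
Each sentence annotation produced by the pipeline carries its vector in the `embeddings` field of the output column. A short sketch for pulling those vectors out of the `pipelineDF` from the snippet above:

```python
from pyspark.sql import functions as F

vectors = pipelineDF \
    .select(F.explode("embeddings").alias("emb")) \
    .select(F.col("emb.result").alias("sentence"),
            F.col("emb.embeddings").alias("vector"))

# One row per detected sentence together with its embedding vector.
vectors.show(truncate=80)
```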
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_historic_multilingual_64k_td_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|xx| +|Size:|504.6 MB| + +## References + +https://huggingface.co/dbmdz/bert-base-historic-multilingual-64k-td-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-sent_bert_base_multilingual_cased_finetuned_luganda_xx.md b/docs/_posts/ahmedlone127/2024-09-14-sent_bert_base_multilingual_cased_finetuned_luganda_xx.md new file mode 100644 index 00000000000000..98dd009d182b80 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-sent_bert_base_multilingual_cased_finetuned_luganda_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual sent_bert_base_multilingual_cased_finetuned_luganda BertSentenceEmbeddings from Davlan +author: John Snow Labs +name: sent_bert_base_multilingual_cased_finetuned_luganda +date: 2024-09-14 +tags: [xx, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_multilingual_cased_finetuned_luganda` is a Multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_luganda_xx_5.5.0_3.0_1726310900295.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_luganda_xx_5.5.0_3.0_1726310900295.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_multilingual_cased_finetuned_luganda","xx") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_multilingual_cased_finetuned_luganda","xx") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_multilingual_cased_finetuned_luganda| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|xx| +|Size:|665.0 MB| + +## References + +https://huggingface.co/Davlan/bert-base-multilingual-cased-finetuned-luganda \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-sent_bert_small_historic_multilingual_cased_xx.md b/docs/_posts/ahmedlone127/2024-09-14-sent_bert_small_historic_multilingual_cased_xx.md new file mode 100644 index 00000000000000..e754db5467b54c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-sent_bert_small_historic_multilingual_cased_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual sent_bert_small_historic_multilingual_cased BertSentenceEmbeddings from dbmdz +author: John Snow Labs +name: sent_bert_small_historic_multilingual_cased +date: 2024-09-14 +tags: [xx, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_small_historic_multilingual_cased` is a Multilingual model originally trained by dbmdz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_small_historic_multilingual_cased_xx_5.5.0_3.0_1726303046298.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_small_historic_multilingual_cased_xx_5.5.0_3.0_1726303046298.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_small_historic_multilingual_cased","xx") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_small_historic_multilingual_cased","xx") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_small_historic_multilingual_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|xx| +|Size:|109.5 MB| + +## References + +https://huggingface.co/dbmdz/bert-small-historic-multilingual-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-sent_luxembert_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-sent_luxembert_v2_pipeline_en.md new file mode 100644 index 00000000000000..f22f4e59c6eb07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-sent_luxembert_v2_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_luxembert_v2_pipeline pipeline BertSentenceEmbeddings from iolariu +author: John Snow Labs +name: sent_luxembert_v2_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_luxembert_v2_pipeline` is a English model originally trained by iolariu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_luxembert_v2_pipeline_en_5.5.0_3.0_1726320284168.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_luxembert_v2_pipeline_en_5.5.0_3.0_1726320284168.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_luxembert_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_luxembert_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_luxembert_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.4 MB| + +## References + +https://huggingface.co/iolariu/LuxemBERT-v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-sentiment140_albert_5e_en.md b/docs/_posts/ahmedlone127/2024-09-14-sentiment140_albert_5e_en.md new file mode 100644 index 00000000000000..13d14de6c19029 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-sentiment140_albert_5e_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment140_albert_5e AlbertForSequenceClassification from pig4431 +author: John Snow Labs +name: sentiment140_albert_5e +date: 2024-09-14 +tags: [en, open_source, onnx, sequence_classification, albert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment140_albert_5e` is a English model originally trained by pig4431. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment140_albert_5e_en_5.5.0_3.0_1726308738284.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment140_albert_5e_en_5.5.0_3.0_1726308738284.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = AlbertForSequenceClassification.pretrained("sentiment140_albert_5e","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = AlbertForSequenceClassification.pretrained("sentiment140_albert_5e", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
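
The predicted label for each input row ends up in the `class` output column; the label names themselves come from the original model. A short sketch for reading the prediction back from the `pipelineDF` in the snippet above:

```python
# Each entry of "class" is an annotation whose `result` field holds the predicted label.
pipelineDF.select("text", "class.result").show(truncate=False)
```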
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment140_albert_5e| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/pig4431/Sentiment140_ALBERT_5E \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-socroberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-14-socroberta_base_en.md new file mode 100644 index 00000000000000..3bda66a8f7a096 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-socroberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English socroberta_base RoBertaEmbeddings from ESGBERT +author: John Snow Labs +name: socroberta_base +date: 2024-09-14 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`socroberta_base` is a English model originally trained by ESGBERT. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/socroberta_base_en_5.5.0_3.0_1726299566405.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/socroberta_base_en_5.5.0_3.0_1726299566405.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("socroberta_base","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("socroberta_base","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
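
If the token embeddings are meant to feed a downstream Spark ML estimator, an `EmbeddingsFinisher` can convert them into plain vectors. The sketch below extends the pipeline from the snippet above; the `finished_embeddings` column name is an assumption.

```python
from pyspark.ml import Pipeline
from sparknlp.base import EmbeddingsFinisher

finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, finisher])
result = pipeline.fit(data).transform(data)

# One Spark ML vector per token, ready for clustering, classification, etc.
result.selectExpr("explode(finished_embeddings) as token_vector").show(truncate=80)
```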
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|socroberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/ESGBERT/SocRoBERTa-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-takalane_afr_roberta_af.md b/docs/_posts/ahmedlone127/2024-09-14-takalane_afr_roberta_af.md new file mode 100644 index 00000000000000..7c3271216cce2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-takalane_afr_roberta_af.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Afrikaans takalane_afr_roberta RoBertaEmbeddings from jannesg +author: John Snow Labs +name: takalane_afr_roberta +date: 2024-09-14 +tags: [af, open_source, onnx, embeddings, roberta] +task: Embeddings +language: af +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`takalane_afr_roberta` is a Afrikaans model originally trained by jannesg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/takalane_afr_roberta_af_5.5.0_3.0_1726338343913.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/takalane_afr_roberta_af_5.5.0_3.0_1726338343913.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("takalane_afr_roberta","af") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("takalane_afr_roberta","af") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|takalane_afr_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|af| +|Size:|311.5 MB| + +## References + +https://huggingface.co/jannesg/takalane_afr_roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-tatoeba_emea_20k_english_german_en.md b/docs/_posts/ahmedlone127/2024-09-14-tatoeba_emea_20k_english_german_en.md new file mode 100644 index 00000000000000..6b581f61810163 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-tatoeba_emea_20k_english_german_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tatoeba_emea_20k_english_german MarianTransformer from muibk +author: John Snow Labs +name: tatoeba_emea_20k_english_german +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tatoeba_emea_20k_english_german` is a English model originally trained by muibk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tatoeba_emea_20k_english_german_en_5.5.0_3.0_1726351056892.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tatoeba_emea_20k_english_german_en_5.5.0_3.0_1726351056892.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("tatoeba_emea_20k_english_german","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("tatoeba_emea_20k_english_german","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
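
Once the pipeline has run, the translated text sits in the `translation` output column defined in the snippet above. A short sketch for reading it back:

```python
# The MarianTransformer emits one annotation per translated segment.
pipelineDF.selectExpr("text", "explode(translation.result) as translation_text") \
    .show(truncate=False)
```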
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tatoeba_emea_20k_english_german| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|538.0 MB| + +## References + +https://huggingface.co/muibk/tatoeba_emea_20k_en-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-test_whisper_tiny_thai_kritchayahir_en.md b/docs/_posts/ahmedlone127/2024-09-14-test_whisper_tiny_thai_kritchayahir_en.md new file mode 100644 index 00000000000000..968731fd39bc73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-test_whisper_tiny_thai_kritchayahir_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English test_whisper_tiny_thai_kritchayahir WhisperForCTC from kritchayaHir +author: John Snow Labs +name: test_whisper_tiny_thai_kritchayahir +date: 2024-09-14 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_whisper_tiny_thai_kritchayahir` is a English model originally trained by kritchayaHir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_whisper_tiny_thai_kritchayahir_en_5.5.0_3.0_1726325243051.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_whisper_tiny_thai_kritchayahir_en_5.5.0_3.0_1726325243051.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("test_whisper_tiny_thai_kritchayahir","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("test_whisper_tiny_thai_kritchayahir", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
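
The snippets above assume a DataFrame `data` with an `audio_content` column of raw float samples. One way to build it is sketched below, using `librosa` as an assumed audio loader; any reader that yields 16 kHz mono float samples works the same way, and the file path is illustrative only.

```python
import librosa  # assumed helper library, not part of Spark NLP

# Load a local file as 16 kHz mono float samples.
raw_floats, _ = librosa.load("sample_audio.wav", sr=16000)

data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")
```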
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_whisper_tiny_thai_kritchayahir| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/kritchayaHir/test-whisper-tiny-th \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-transcriber_small_en.md b/docs/_posts/ahmedlone127/2024-09-14-transcriber_small_en.md new file mode 100644 index 00000000000000..5c39946b4d68e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-transcriber_small_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English transcriber_small WhisperForCTC from mediaProcessing +author: John Snow Labs +name: transcriber_small +date: 2024-09-14 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`transcriber_small` is a English model originally trained by mediaProcessing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/transcriber_small_en_5.5.0_3.0_1726331356203.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/transcriber_small_en_5.5.0_3.0_1726331356203.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("transcriber_small","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("transcriber_small", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|transcriber_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/mediaProcessing/Transcriber-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-tse_albert_5e_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-tse_albert_5e_pipeline_en.md new file mode 100644 index 00000000000000..55fbecfd2aa06f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-tse_albert_5e_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tse_albert_5e_pipeline pipeline AlbertForSequenceClassification from pig4431 +author: John Snow Labs +name: tse_albert_5e_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tse_albert_5e_pipeline` is a English model originally trained by pig4431. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tse_albert_5e_pipeline_en_5.5.0_3.0_1726336234917.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tse_albert_5e_pipeline_en_5.5.0_3.0_1726336234917.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tse_albert_5e_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tse_albert_5e_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tse_albert_5e_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/pig4431/TSE_ALBERT_5E + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-twitter_roberta_base_mar2022_15m_incr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-twitter_roberta_base_mar2022_15m_incr_pipeline_en.md new file mode 100644 index 00000000000000..f7ccc4a790e3dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-twitter_roberta_base_mar2022_15m_incr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitter_roberta_base_mar2022_15m_incr_pipeline pipeline RoBertaEmbeddings from cardiffnlp +author: John Snow Labs +name: twitter_roberta_base_mar2022_15m_incr_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_mar2022_15m_incr_pipeline` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_mar2022_15m_incr_pipeline_en_5.5.0_3.0_1726300024881.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_mar2022_15m_incr_pipeline_en_5.5.0_3.0_1726300024881.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("twitter_roberta_base_mar2022_15m_incr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("twitter_roberta_base_mar2022_15m_incr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_mar2022_15m_incr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/cardiffnlp/twitter-roberta-base-mar2022-15M-incr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_alb_sq.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_alb_sq.md new file mode 100644 index 00000000000000..ae864db5458d29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_alb_sq.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Albanian whisper_small_alb WhisperForCTC from somu9 +author: John Snow Labs +name: whisper_small_alb +date: 2024-09-14 +tags: [sq, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: sq +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_alb` is a Albanian model originally trained by somu9. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_alb_sq_5.5.0_3.0_1726356953630.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_alb_sq_5.5.0_3.0_1726356953630.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_alb","sq") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_alb", "sq")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_alb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|sq| +|Size:|1.7 GB| + +## References + +https://huggingface.co/somu9/whisper-small-alb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_bengali_kurokabe_bn.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_bengali_kurokabe_bn.md new file mode 100644 index 00000000000000..55bf9f81d0ee5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_bengali_kurokabe_bn.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Bengali whisper_small_bengali_kurokabe WhisperForCTC from Kurokabe +author: John Snow Labs +name: whisper_small_bengali_kurokabe +date: 2024-09-14 +tags: [bn, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_bengali_kurokabe` is a Bengali model originally trained by Kurokabe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_bengali_kurokabe_bn_5.5.0_3.0_1726330823123.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_bengali_kurokabe_bn_5.5.0_3.0_1726330823123.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data" is the input DataFrame with the raw audio floats in the "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_bengali_kurokabe","bn") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// "data" is the input DataFrame with the raw audio floats in the "audio_content" column
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_bengali_kurokabe", "bn")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_bengali_kurokabe| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|bn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Kurokabe/whisper-small-bn \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_cebtoeng_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_cebtoeng_pipeline_hi.md new file mode 100644 index 00000000000000..1668d74efba78d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_cebtoeng_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_small_cebtoeng_pipeline pipeline WhisperForCTC from ahoka +author: John Snow Labs +name: whisper_small_cebtoeng_pipeline +date: 2024-09-14 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_cebtoeng_pipeline` is a Hindi model originally trained by ahoka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_cebtoeng_pipeline_hi_5.5.0_3.0_1726275973510.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_cebtoeng_pipeline_hi_5.5.0_3.0_1726275973510.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is the input DataFrame; the pipeline's AudioAssembler stage reads the raw audio floats (typically from an "audio_content" column)
+pipeline = PretrainedPipeline("whisper_small_cebtoeng_pipeline", lang = "hi")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is the input DataFrame; the pipeline's AudioAssembler stage reads the raw audio floats (typically from an "audio_content" column)
+val pipeline = new PretrainedPipeline("whisper_small_cebtoeng_pipeline", lang = "hi")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_cebtoeng_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.1 GB| + +## References + +https://huggingface.co/ahoka/whisper-small-cebToEng + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_chinese_happytsai_zh.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_chinese_happytsai_zh.md new file mode 100644 index 00000000000000..7826ae5e28181a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_chinese_happytsai_zh.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Chinese whisper_small_chinese_happytsai WhisperForCTC from HappyTsai +author: John Snow Labs +name: whisper_small_chinese_happytsai +date: 2024-09-14 +tags: [zh, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_chinese_happytsai` is a Chinese model originally trained by HappyTsai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_chinese_happytsai_zh_5.5.0_3.0_1726298018155.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_chinese_happytsai_zh_5.5.0_3.0_1726298018155.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data" is the input DataFrame with the raw audio floats in the "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_chinese_happytsai","zh") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// "data" is the input DataFrame with the raw audio floats in the "audio_content" column
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_chinese_happytsai", "zh")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_chinese_happytsai| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|zh| +|Size:|1.7 GB| + +## References + +https://huggingface.co/HappyTsai/whisper-small-zh \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_cv16_hungarian_v2_pipeline_hu.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_cv16_hungarian_v2_pipeline_hu.md new file mode 100644 index 00000000000000..ef1944df4bc7f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_cv16_hungarian_v2_pipeline_hu.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hungarian whisper_small_cv16_hungarian_v2_pipeline pipeline WhisperForCTC from Hungarians +author: John Snow Labs +name: whisper_small_cv16_hungarian_v2_pipeline +date: 2024-09-14 +tags: [hu, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_cv16_hungarian_v2_pipeline` is a Hungarian model originally trained by Hungarians. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_cv16_hungarian_v2_pipeline_hu_5.5.0_3.0_1726274275141.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_cv16_hungarian_v2_pipeline_hu_5.5.0_3.0_1726274275141.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is the input DataFrame; the pipeline's AudioAssembler stage reads the raw audio floats (typically from an "audio_content" column)
+pipeline = PretrainedPipeline("whisper_small_cv16_hungarian_v2_pipeline", lang = "hu")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is the input DataFrame; the pipeline's AudioAssembler stage reads the raw audio floats (typically from an "audio_content" column)
+val pipeline = new PretrainedPipeline("whisper_small_cv16_hungarian_v2_pipeline", lang = "hu")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_cv16_hungarian_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hu| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Hungarians/whisper-small-cv16-hu-v2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_fine_tuned_russian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_fine_tuned_russian_pipeline_en.md new file mode 100644 index 00000000000000..5ed447987dba21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_fine_tuned_russian_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_fine_tuned_russian_pipeline pipeline WhisperForCTC from artyomboyko +author: John Snow Labs +name: whisper_small_fine_tuned_russian_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_fine_tuned_russian_pipeline` is a English model originally trained by artyomboyko. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_fine_tuned_russian_pipeline_en_5.5.0_3.0_1726279343507.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_fine_tuned_russian_pipeline_en_5.5.0_3.0_1726279343507.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is the input DataFrame; the pipeline's AudioAssembler stage reads the raw audio floats (typically from an "audio_content" column)
+pipeline = PretrainedPipeline("whisper_small_fine_tuned_russian_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is the input DataFrame; the pipeline's AudioAssembler stage reads the raw audio floats (typically from an "audio_content" column)
+val pipeline = new PretrainedPipeline("whisper_small_fine_tuned_russian_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_fine_tuned_russian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/artyomboyko/whisper-small-fine_tuned-ru + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_finnish_full_pipeline_fi.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_finnish_full_pipeline_fi.md new file mode 100644 index 00000000000000..8738bfd4d6780a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_finnish_full_pipeline_fi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Finnish whisper_small_finnish_full_pipeline pipeline WhisperForCTC from sgangireddy +author: John Snow Labs +name: whisper_small_finnish_full_pipeline +date: 2024-09-14 +tags: [fi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: fi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_finnish_full_pipeline` is a Finnish model originally trained by sgangireddy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_finnish_full_pipeline_fi_5.5.0_3.0_1726277643604.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_finnish_full_pipeline_fi_5.5.0_3.0_1726277643604.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is the input DataFrame; the pipeline's AudioAssembler stage reads the raw audio floats (typically from an "audio_content" column)
+pipeline = PretrainedPipeline("whisper_small_finnish_full_pipeline", lang = "fi")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is the input DataFrame; the pipeline's AudioAssembler stage reads the raw audio floats (typically from an "audio_content" column)
+val pipeline = new PretrainedPipeline("whisper_small_finnish_full_pipeline", lang = "fi")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_finnish_full_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/sgangireddy/whisper-small-fi-full + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_hindi_eguladida_hi.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_hindi_eguladida_hi.md new file mode 100644 index 00000000000000..0a8a99a857c630 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_hindi_eguladida_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_small_hindi_eguladida WhisperForCTC from eguladida +author: John Snow Labs +name: whisper_small_hindi_eguladida +date: 2024-09-14 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_eguladida` is a Hindi model originally trained by eguladida. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_eguladida_hi_5.5.0_3.0_1726284561589.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_eguladida_hi_5.5.0_3.0_1726284561589.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data" is the input DataFrame with the raw audio floats in the "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_hindi_eguladida","hi") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// "data" is the input DataFrame with the raw audio floats in the "audio_content" column
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_eguladida", "hi")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_eguladida| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/eguladida/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_hindi_hunzla_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_hindi_hunzla_pipeline_en.md new file mode 100644 index 00000000000000..f77097f23ba3c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_hindi_hunzla_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_hindi_hunzla_pipeline pipeline WhisperForCTC from Hunzla +author: John Snow Labs +name: whisper_small_hindi_hunzla_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_hunzla_pipeline` is a English model originally trained by Hunzla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_hunzla_pipeline_en_5.5.0_3.0_1726295895559.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_hunzla_pipeline_en_5.5.0_3.0_1726295895559.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is the input DataFrame; the pipeline's AudioAssembler stage reads the raw audio floats (typically from an "audio_content" column)
+pipeline = PretrainedPipeline("whisper_small_hindi_hunzla_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is the input DataFrame; the pipeline's AudioAssembler stage reads the raw audio floats (typically from an "audio_content" column)
+val pipeline = new PretrainedPipeline("whisper_small_hindi_hunzla_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_hunzla_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|391.0 MB| + +## References + +https://huggingface.co/Hunzla/whisper-small-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_hindi_l_inuri_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_hindi_l_inuri_en.md new file mode 100644 index 00000000000000..69da8b96af20ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_hindi_l_inuri_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_hindi_l_inuri WhisperForCTC from L-Inuri +author: John Snow Labs +name: whisper_small_hindi_l_inuri +date: 2024-09-14 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_l_inuri` is a English model originally trained by L-Inuri. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_l_inuri_en_5.5.0_3.0_1726280852361.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_l_inuri_en_5.5.0_3.0_1726280852361.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data" is the input DataFrame with the raw audio floats in the "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_hindi_l_inuri","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// "data" is the input DataFrame with the raw audio floats in the "audio_content" column
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_l_inuri", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_l_inuri| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/L-Inuri/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_indonesian_cahya_id.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_indonesian_cahya_id.md new file mode 100644 index 00000000000000..f8722b2dfb9a2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_indonesian_cahya_id.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Indonesian whisper_small_indonesian_cahya WhisperForCTC from cahya +author: John Snow Labs +name: whisper_small_indonesian_cahya +date: 2024-09-14 +tags: [id, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_indonesian_cahya` is a Indonesian model originally trained by cahya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_cahya_id_5.5.0_3.0_1726321577596.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_cahya_id_5.5.0_3.0_1726321577596.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data" is the input DataFrame with the raw audio floats in the "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_indonesian_cahya","id") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// "data" is the input DataFrame with the raw audio floats in the "audio_content" column
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_indonesian_cahya", "id")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_indonesian_cahya| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|id| +|Size:|1.7 GB| + +## References + +https://huggingface.co/cahya/whisper-small-id \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_mandarin_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_mandarin_en.md new file mode 100644 index 00000000000000..cd7eed038aed2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_mandarin_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_mandarin WhisperForCTC from Wenjian12581 +author: John Snow Labs +name: whisper_small_mandarin +date: 2024-09-14 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_mandarin` is a English model originally trained by Wenjian12581. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_mandarin_en_5.5.0_3.0_1726356226979.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_mandarin_en_5.5.0_3.0_1726356226979.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data" is the input DataFrame with the raw audio floats in the "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_mandarin","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// "data" is the input DataFrame with the raw audio floats in the "audio_content" column
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_mandarin", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_mandarin| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Wenjian12581/whisper-small-mandarin \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_vivos_jrhuy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_vivos_jrhuy_pipeline_en.md new file mode 100644 index 00000000000000..aea7811bed4dac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_vivos_jrhuy_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_vivos_jrhuy_pipeline pipeline WhisperForCTC from JRHuy +author: John Snow Labs +name: whisper_small_vivos_jrhuy_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_vivos_jrhuy_pipeline` is a English model originally trained by JRHuy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_vivos_jrhuy_pipeline_en_5.5.0_3.0_1726331761204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_vivos_jrhuy_pipeline_en_5.5.0_3.0_1726331761204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is the input DataFrame; the pipeline's AudioAssembler stage reads the raw audio floats (typically from an "audio_content" column)
+pipeline = PretrainedPipeline("whisper_small_vivos_jrhuy_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is the input DataFrame; the pipeline's AudioAssembler stage reads the raw audio floats (typically from an "audio_content" column)
+val pipeline = new PretrainedPipeline("whisper_small_vivos_jrhuy_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_vivos_jrhuy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/JRHuy/whisper-small-vivos + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_bengali_ehzawad_pipeline_bn.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_bengali_ehzawad_pipeline_bn.md new file mode 100644 index 00000000000000..d45d0e40bae69d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_bengali_ehzawad_pipeline_bn.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Bengali whisper_tiny_bengali_ehzawad_pipeline pipeline WhisperForCTC from ehzawad +author: John Snow Labs +name: whisper_tiny_bengali_ehzawad_pipeline +date: 2024-09-14 +tags: [bn, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_bengali_ehzawad_pipeline` is a Bengali model originally trained by ehzawad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_bengali_ehzawad_pipeline_bn_5.5.0_3.0_1726354785453.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_bengali_ehzawad_pipeline_bn_5.5.0_3.0_1726354785453.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is the input DataFrame; the pipeline's AudioAssembler stage reads the raw audio floats (typically from an "audio_content" column)
+pipeline = PretrainedPipeline("whisper_tiny_bengali_ehzawad_pipeline", lang = "bn")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is the input DataFrame; the pipeline's AudioAssembler stage reads the raw audio floats (typically from an "audio_content" column)
+val pipeline = new PretrainedPipeline("whisper_tiny_bengali_ehzawad_pipeline", lang = "bn")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_bengali_ehzawad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bn| +|Size:|391.3 MB| + +## References + +https://huggingface.co/ehzawad/whisper-tiny-bn + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_divehi_model_man_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_divehi_model_man_en.md new file mode 100644 index 00000000000000..2fe50c69f8a188 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_divehi_model_man_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_divehi_model_man WhisperForCTC from model-man +author: John Snow Labs +name: whisper_tiny_divehi_model_man +date: 2024-09-14 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_divehi_model_man` is a English model originally trained by model-man. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_divehi_model_man_en_5.5.0_3.0_1726297329160.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_divehi_model_man_en_5.5.0_3.0_1726297329160.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data" is the input DataFrame with the raw audio floats in the "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_tiny_divehi_model_man","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// "data" is the input DataFrame with the raw audio floats in the "audio_content" column
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_tiny_divehi_model_man", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_divehi_model_man| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/model-man/whisper-tiny-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_divehi_model_man_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_divehi_model_man_pipeline_en.md new file mode 100644 index 00000000000000..0e7bcd044c076b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_divehi_model_man_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_divehi_model_man_pipeline pipeline WhisperForCTC from model-man +author: John Snow Labs +name: whisper_tiny_divehi_model_man_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_divehi_model_man_pipeline` is a English model originally trained by model-man. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_divehi_model_man_pipeline_en_5.5.0_3.0_1726297356360.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_divehi_model_man_pipeline_en_5.5.0_3.0_1726297356360.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is the input DataFrame; the pipeline's AudioAssembler stage reads the raw audio floats (typically from an "audio_content" column)
+pipeline = PretrainedPipeline("whisper_tiny_divehi_model_man_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is the input DataFrame; the pipeline's AudioAssembler stage reads the raw audio floats (typically from an "audio_content" column)
+val pipeline = new PretrainedPipeline("whisper_tiny_divehi_model_man_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_divehi_model_man_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/model-man/whisper-tiny-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_finetuned_minds14_artyomboyko_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_finetuned_minds14_artyomboyko_en.md new file mode 100644 index 00000000000000..1f64e2c70d5e6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_finetuned_minds14_artyomboyko_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_finetuned_minds14_artyomboyko WhisperForCTC from artyomboyko +author: John Snow Labs +name: whisper_tiny_finetuned_minds14_artyomboyko +date: 2024-09-14 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_finetuned_minds14_artyomboyko` is a English model originally trained by artyomboyko. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetuned_minds14_artyomboyko_en_5.5.0_3.0_1726296553580.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetuned_minds14_artyomboyko_en_5.5.0_3.0_1726296553580.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data" is the input DataFrame with the raw audio floats in the "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_tiny_finetuned_minds14_artyomboyko","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// "data" is the input DataFrame with the raw audio floats in the "audio_content" column
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_tiny_finetuned_minds14_artyomboyko", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_finetuned_minds14_artyomboyko| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/artyomboyko/whisper-tiny-finetuned-minds14 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_thai_suphisara_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_thai_suphisara_en.md new file mode 100644 index 00000000000000..a5c0cf52b9f875 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_thai_suphisara_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_thai_suphisara WhisperForCTC from suphisara +author: John Snow Labs +name: whisper_tiny_thai_suphisara +date: 2024-09-14 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_thai_suphisara` is a English model originally trained by suphisara. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_thai_suphisara_en_5.5.0_3.0_1726278152804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_thai_suphisara_en_5.5.0_3.0_1726278152804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data" is the input DataFrame with the raw audio floats in the "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_tiny_thai_suphisara","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// "data" is the input DataFrame with the raw audio floats in the "audio_content" column
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_tiny_thai_suphisara", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_thai_suphisara| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/suphisara/whisper-tiny-th \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_thai_wrtzp_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_thai_wrtzp_en.md new file mode 100644 index 00000000000000..b24e735532211f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_thai_wrtzp_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_thai_wrtzp WhisperForCTC from wrtzp +author: John Snow Labs +name: whisper_tiny_thai_wrtzp +date: 2024-09-14 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_thai_wrtzp` is a English model originally trained by wrtzp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_thai_wrtzp_en_5.5.0_3.0_1726278440777.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_thai_wrtzp_en_5.5.0_3.0_1726278440777.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data" is the input DataFrame with the raw audio floats in the "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_tiny_thai_wrtzp","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// "data" is the input DataFrame with the raw audio floats in the "audio_content" column
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_tiny_thai_wrtzp", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_thai_wrtzp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/wrtzp/whisper-tiny-th \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisperoutput_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisperoutput_en.md new file mode 100644 index 00000000000000..62892373c1ed7d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisperoutput_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisperoutput WhisperForCTC from EssiaJaadari +author: John Snow Labs +name: whisperoutput +date: 2024-09-14 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisperoutput` is a English model originally trained by EssiaJaadari. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisperoutput_en_5.5.0_3.0_1726354977635.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisperoutput_en_5.5.0_3.0_1726354977635.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data" is the input DataFrame with the raw audio floats in the "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisperoutput","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// "data" is the input DataFrame with the raw audio floats in the "audio_content" column
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisperoutput", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisperoutput| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/EssiaJaadari/WhisperOutput \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_all_wooseok0303_en.md b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_all_wooseok0303_en.md new file mode 100644 index 00000000000000..f02ae8aec78768 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_all_wooseok0303_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_wooseok0303 XlmRoBertaForTokenClassification from wooseok0303 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_wooseok0303 +date: 2024-09-14 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_wooseok0303` is a English model originally trained by wooseok0303. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_wooseok0303_en_5.5.0_3.0_1726292462612.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_wooseok0303_en_5.5.0_3.0_1726292462612.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_wooseok0303","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_wooseok0303", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
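+
+Once the pipeline above has been run, the predicted entity tags can be inspected directly from the annotation structs (a small usage sketch, using the `token` and `ner` column names configured above):
+
+```python
+# Each row pairs the tokens with the predicted entity labels
+pipelineDF.selectExpr("token.result as tokens", "ner.result as labels").show(truncate = False)
+```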
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_wooseok0303| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/wooseok0303/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_all_wooseok0303_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_all_wooseok0303_pipeline_en.md new file mode 100644 index 00000000000000..f4a366f36b89dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_all_wooseok0303_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_wooseok0303_pipeline pipeline XlmRoBertaForTokenClassification from wooseok0303 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_wooseok0303_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_wooseok0303_pipeline` is a English model originally trained by wooseok0303. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_wooseok0303_pipeline_en_5.5.0_3.0_1726292540121.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_wooseok0303_pipeline_en_5.5.0_3.0_1726292540121.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is the input DataFrame; the pipeline's DocumentAssembler stage reads the raw text (typically from a "text" column)
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_wooseok0303_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is the input DataFrame; the pipeline's DocumentAssembler stage reads the raw text (typically from a "text" column)
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_wooseok0303_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_wooseok0303_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/wooseok0303/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_english_andreaschandra_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_english_andreaschandra_pipeline_en.md new file mode 100644 index 00000000000000..36db3a92c94275 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_english_andreaschandra_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_andreaschandra_pipeline pipeline XlmRoBertaForTokenClassification from andreaschandra +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_andreaschandra_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_andreaschandra_pipeline` is a English model originally trained by andreaschandra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_andreaschandra_pipeline_en_5.5.0_3.0_1726291721833.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_andreaschandra_pipeline_en_5.5.0_3.0_1726291721833.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is the input DataFrame; the pipeline's DocumentAssembler stage reads the raw text (typically from a "text" column)
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_andreaschandra_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is the input DataFrame; the pipeline's DocumentAssembler stage reads the raw text (typically from a "text" column)
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_andreaschandra_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_andreaschandra_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/andreaschandra/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_english_occupy1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_english_occupy1_pipeline_en.md new file mode 100644 index 00000000000000..7cacaeec662661 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_english_occupy1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_occupy1_pipeline pipeline XlmRoBertaForTokenClassification from occupy1 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_occupy1_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_occupy1_pipeline` is a English model originally trained by occupy1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_occupy1_pipeline_en_5.5.0_3.0_1726345485770.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_occupy1_pipeline_en_5.5.0_3.0_1726345485770.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# the pipeline expects a DataFrame with a raw text column (assumed here to be named "text")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_occupy1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// the pipeline expects a DataFrame with a raw text column (assumed here to be named "text")
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_occupy1_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_occupy1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/occupy1/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_english_seobak_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_english_seobak_pipeline_en.md new file mode 100644 index 00000000000000..f1a3e72aecd06d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_english_seobak_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_seobak_pipeline pipeline XlmRoBertaForTokenClassification from seobak +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_seobak_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_seobak_pipeline` is a English model originally trained by seobak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_seobak_pipeline_en_5.5.0_3.0_1726292394515.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_seobak_pipeline_en_5.5.0_3.0_1726292394515.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# the pipeline expects a DataFrame with a raw text column (assumed here to be named "text")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_seobak_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// the pipeline expects a DataFrame with a raw text column (assumed here to be named "text")
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_seobak_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_seobak_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/seobak/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_german_chaewonlee_en.md b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_german_chaewonlee_en.md new file mode 100644 index 00000000000000..7d20beed6dfb16 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_german_chaewonlee_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_chaewonlee XlmRoBertaForTokenClassification from chaewonlee +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_chaewonlee +date: 2024-09-14 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_chaewonlee` is a English model originally trained by chaewonlee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_chaewonlee_en_5.5.0_3.0_1726346696478.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_chaewonlee_en_5.5.0_3.0_1726346696478.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# input columns must match the output columns of the stages above
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_chaewonlee","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+// input columns must match the output columns of the stages above
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_chaewonlee", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
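+
+After the transform above, the predicted tag sequence for each row is available in the `ner` output column. A minimal inspection sketch, reusing `pipelineDF` from the example:
+
+```python
+# one array of predicted tags per input row
+pipelineDF.select("ner.result").show(truncate=False)
+```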
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_chaewonlee| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/chaewonlee/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_german_fraisier_en.md b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_german_fraisier_en.md new file mode 100644 index 00000000000000..adce6f6d296578 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_german_fraisier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_fraisier XlmRoBertaForTokenClassification from Fraisier +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_fraisier +date: 2024-09-14 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_fraisier` is a English model originally trained by Fraisier. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_fraisier_en_5.5.0_3.0_1726346559704.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_fraisier_en_5.5.0_3.0_1726346559704.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# input columns must match the output columns of the stages above
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_fraisier","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+// input columns must match the output columns of the stages above
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_fraisier", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_fraisier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Fraisier/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_german_sunwooooong_en.md b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_german_sunwooooong_en.md new file mode 100644 index 00000000000000..830f084b677775 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_german_sunwooooong_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_sunwooooong XlmRoBertaForTokenClassification from sunwooooong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_sunwooooong +date: 2024-09-14 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_sunwooooong` is a English model originally trained by sunwooooong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sunwooooong_en_5.5.0_3.0_1726289439500.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sunwooooong_en_5.5.0_3.0_1726289439500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# input columns must match the output columns of the stages above
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_sunwooooong","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+// input columns must match the output columns of the stages above
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_sunwooooong", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_sunwooooong| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/sunwooooong/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_italian_koroku_en.md b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_italian_koroku_en.md new file mode 100644 index 00000000000000..9b32bdcaf0acce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_italian_koroku_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_koroku XlmRoBertaForTokenClassification from koroku +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_koroku +date: 2024-09-14 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_koroku` is a English model originally trained by koroku. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_koroku_en_5.5.0_3.0_1726289562632.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_koroku_en_5.5.0_3.0_1726289562632.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# input columns must match the output columns of the stages above
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_koroku","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+// input columns must match the output columns of the stages above
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_koroku", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
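+
+To read predictions token by token rather than as two parallel arrays, the `token` and `ner` columns can be paired up locally (fine for small demo data). A minimal sketch, reusing `pipelineDF` from the example above:
+
+```python
+# collect one row and pair each token with its predicted tag
+row = pipelineDF.select("token.result", "ner.result").first()
+for tok, tag in zip(row[0], row[1]):
+    print(tok, tag)
+```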
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_koroku| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.2 MB| + +## References + +https://huggingface.co/koroku/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_trimmed_french_30000_tweet_sentiment_french_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_trimmed_french_30000_tweet_sentiment_french_pipeline_en.md new file mode 100644 index 00000000000000..3f8ef65ce1fec7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_trimmed_french_30000_tweet_sentiment_french_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_french_30000_tweet_sentiment_french_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_french_30000_tweet_sentiment_french_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_french_30000_tweet_sentiment_french_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_french_30000_tweet_sentiment_french_pipeline_en_5.5.0_3.0_1726317520763.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_french_30000_tweet_sentiment_french_pipeline_en_5.5.0_3.0_1726317520763.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# the pipeline expects a DataFrame with a raw text column (assumed here to be named "text")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_trimmed_french_30000_tweet_sentiment_french_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// the pipeline expects a DataFrame with a raw text column (assumed here to be named "text")
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_trimmed_french_30000_tweet_sentiment_french_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_french_30000_tweet_sentiment_french_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|388.3 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-fr-30000-tweet-sentiment-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-2407_lmsys_v01_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-2407_lmsys_v01_pipeline_en.md new file mode 100644 index 00000000000000..73df40ea5c1e09 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-2407_lmsys_v01_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2407_lmsys_v01_pipeline pipeline DistilBertForSequenceClassification from tom-010 +author: John Snow Labs +name: 2407_lmsys_v01_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2407_lmsys_v01_pipeline` is a English model originally trained by tom-010. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2407_lmsys_v01_pipeline_en_5.5.0_3.0_1726406389368.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2407_lmsys_v01_pipeline_en_5.5.0_3.0_1726406389368.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# the pipeline expects a DataFrame with a raw text column (assumed here to be named "text")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("2407_lmsys_v01_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// the pipeline expects a DataFrame with a raw text column (assumed here to be named "text")
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("2407_lmsys_v01_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2407_lmsys_v01_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom-010/2407_lmsys_v01 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-adrv2024_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-adrv2024_pipeline_en.md new file mode 100644 index 00000000000000..ddc77253acf404 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-adrv2024_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English adrv2024_pipeline pipeline DistilBertForSequenceClassification from jschwaller +author: John Snow Labs +name: adrv2024_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`adrv2024_pipeline` is a English model originally trained by jschwaller. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/adrv2024_pipeline_en_5.5.0_3.0_1726365835743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/adrv2024_pipeline_en_5.5.0_3.0_1726365835743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# the pipeline expects a DataFrame with a raw text column (assumed here to be named "text")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("adrv2024_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// the pipeline expects a DataFrame with a raw text column (assumed here to be named "text")
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("adrv2024_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|adrv2024_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jschwaller/ADRv2024 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-albert_model__29_5_en.md b/docs/_posts/ahmedlone127/2024-09-15-albert_model__29_5_en.md new file mode 100644 index 00000000000000..6ac257346a7456 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-albert_model__29_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English albert_model__29_5 DistilBertForSequenceClassification from KalaiselvanD +author: John Snow Labs +name: albert_model__29_5 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_model__29_5` is a English model originally trained by KalaiselvanD. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_model__29_5_en_5.5.0_3.0_1726365714209.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_model__29_5_en_5.5.0_3.0_1726365714209.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# input columns must match the output columns of the stages above
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("albert_model__29_5","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+// input columns must match the output columns of the stages above
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("albert_model__29_5", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
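+
+The predicted label for each input row ends up in the `class` output column. A minimal inspection sketch, reusing `pipelineDF` from the example above:
+
+```python
+# one predicted label per document
+pipelineDF.select("text", "class.result").show(truncate=False)
+```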
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_model__29_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KalaiselvanD/albert_model__29_5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-albert_turkish_turkish_movie_reviews_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-15-albert_turkish_turkish_movie_reviews_pipeline_tr.md new file mode 100644 index 00000000000000..a704eba0ff5a91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-albert_turkish_turkish_movie_reviews_pipeline_tr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Turkish albert_turkish_turkish_movie_reviews_pipeline pipeline AlbertForSequenceClassification from anilguven +author: John Snow Labs +name: albert_turkish_turkish_movie_reviews_pipeline +date: 2024-09-15 +tags: [tr, open_source, pipeline, onnx] +task: Text Classification +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_turkish_turkish_movie_reviews_pipeline` is a Turkish model originally trained by anilguven. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_turkish_turkish_movie_reviews_pipeline_tr_5.5.0_3.0_1726396721926.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_turkish_turkish_movie_reviews_pipeline_tr_5.5.0_3.0_1726396721926.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# the pipeline expects a DataFrame with a raw text column (assumed here to be named "text");
+# for this model the input should be Turkish text
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("albert_turkish_turkish_movie_reviews_pipeline", lang = "tr")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// the pipeline expects a DataFrame with a raw text column (assumed here to be named "text");
+// for this model the input should be Turkish text
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("albert_turkish_turkish_movie_reviews_pipeline", lang = "tr")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_turkish_turkish_movie_reviews_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|45.1 MB| + +## References + +https://huggingface.co/anilguven/albert_tr_turkish_movie_reviews + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-aqc_1_en.md b/docs/_posts/ahmedlone127/2024-09-15-aqc_1_en.md new file mode 100644 index 00000000000000..3da246332a02b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-aqc_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English aqc_1 DistilBertForSequenceClassification from oyonay12 +author: John Snow Labs +name: aqc_1 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`aqc_1` is a English model originally trained by oyonay12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/aqc_1_en_5.5.0_3.0_1726365944436.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/aqc_1_en_5.5.0_3.0_1726365944436.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# input columns must match the output columns of the stages above
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("aqc_1","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+// input columns must match the output columns of the stages above
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("aqc_1", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|aqc_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|250.3 MB| + +## References + +https://huggingface.co/oyonay12/aqc_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-aqc_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-aqc_1_pipeline_en.md new file mode 100644 index 00000000000000..a61dcb16a06e23 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-aqc_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English aqc_1_pipeline pipeline DistilBertForSequenceClassification from oyonay12 +author: John Snow Labs +name: aqc_1_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`aqc_1_pipeline` is a English model originally trained by oyonay12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/aqc_1_pipeline_en_5.5.0_3.0_1726365957264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/aqc_1_pipeline_en_5.5.0_3.0_1726365957264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# the pipeline expects a DataFrame with a raw text column (assumed here to be named "text")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("aqc_1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// the pipeline expects a DataFrame with a raw text column (assumed here to be named "text")
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("aqc_1_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|aqc_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|250.3 MB| + +## References + +https://huggingface.co/oyonay12/aqc_1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-b_base_x12_en.md b/docs/_posts/ahmedlone127/2024-09-15-b_base_x12_en.md new file mode 100644 index 00000000000000..18241678627a0d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-b_base_x12_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English b_base_x12 AlbertForSequenceClassification from damgomz +author: John Snow Labs +name: b_base_x12 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, albert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`b_base_x12` is a English model originally trained by damgomz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/b_base_x12_en_5.5.0_3.0_1726372599287.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/b_base_x12_en_5.5.0_3.0_1726372599287.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# input columns must match the output columns of the stages above
+sequenceClassifier = AlbertForSequenceClassification.pretrained("b_base_x12","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+// input columns must match the output columns of the stages above
+val sequenceClassifier = AlbertForSequenceClassification.pretrained("b_base_x12", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|b_base_x12| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|336.0 MB| + +## References + +https://huggingface.co/damgomz/B_base_x12 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-babyberta_aochildes_french_aochildes_2_5m_with_masking_finetuned_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-babyberta_aochildes_french_aochildes_2_5m_with_masking_finetuned_squad_pipeline_en.md new file mode 100644 index 00000000000000..4cb179eecdf282 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-babyberta_aochildes_french_aochildes_2_5m_with_masking_finetuned_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English babyberta_aochildes_french_aochildes_2_5m_with_masking_finetuned_squad_pipeline pipeline RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_aochildes_french_aochildes_2_5m_with_masking_finetuned_squad_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_aochildes_french_aochildes_2_5m_with_masking_finetuned_squad_pipeline` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_french_aochildes_2_5m_with_masking_finetuned_squad_pipeline_en_5.5.0_3.0_1726404519698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_french_aochildes_2_5m_with_masking_finetuned_squad_pipeline_en_5.5.0_3.0_1726404519698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df must already contain the question/context columns expected by this pipeline's MultiDocumentAssembler
+pipeline = PretrainedPipeline("babyberta_aochildes_french_aochildes_2_5m_with_masking_finetuned_squad_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df must already contain the question/context columns expected by this pipeline's MultiDocumentAssembler
+val pipeline = new PretrainedPipeline("babyberta_aochildes_french_aochildes_2_5m_with_masking_finetuned_squad_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_aochildes_french_aochildes_2_5m_with_masking_finetuned_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|32.0 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-aochildes-french_aochildes_2.5M-with-Masking-finetuned-SQuAD + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-babyberta_wikipedia_french_run3_without_masking_finetuned_qamr_en.md b/docs/_posts/ahmedlone127/2024-09-15-babyberta_wikipedia_french_run3_without_masking_finetuned_qamr_en.md new file mode 100644 index 00000000000000..91b97fa9b837a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-babyberta_wikipedia_french_run3_without_masking_finetuned_qamr_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English babyberta_wikipedia_french_run3_without_masking_finetuned_qamr RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_wikipedia_french_run3_without_masking_finetuned_qamr +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_wikipedia_french_run3_without_masking_finetuned_qamr` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_wikipedia_french_run3_without_masking_finetuned_qamr_en_5.5.0_3.0_1726363633994.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_wikipedia_french_run3_without_masking_finetuned_qamr_en_5.5.0_3.0_1726363633994.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_wikipedia_french_run3_without_masking_finetuned_qamr","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+# raw column names must match the assembler's input columns above
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_wikipedia_french_run3_without_masking_finetuned_qamr", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+// raw column names must match the assembler's input columns above
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
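+
+The extracted answer span is returned in the `answer` output column. A minimal inspection sketch, reusing `pipelineDF` from the example above:
+
+```python
+# question alongside the span predicted from the context
+pipelineDF.select("document_question.result", "answer.result").show(truncate=False)
+```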
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_wikipedia_french_run3_without_masking_finetuned_qamr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|32.0 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-wikipedia_french-run3-without-Masking-finetuned-QAMR \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-bert_finetuned_squad_question_generation_100_percent_cased_en.md b/docs/_posts/ahmedlone127/2024-09-15-bert_finetuned_squad_question_generation_100_percent_cased_en.md new file mode 100644 index 00000000000000..0ce74ee233854d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-bert_finetuned_squad_question_generation_100_percent_cased_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_finetuned_squad_question_generation_100_percent_cased BertForQuestionAnswering from mohilp1998 +author: John Snow Labs +name: bert_finetuned_squad_question_generation_100_percent_cased +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_squad_question_generation_100_percent_cased` is a English model originally trained by mohilp1998. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_question_generation_100_percent_cased_en_5.5.0_3.0_1726386222590.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_question_generation_100_percent_cased_en_5.5.0_3.0_1726386222590.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_squad_question_generation_100_percent_cased","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+# raw column names must match the assembler's input columns above
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_squad_question_generation_100_percent_cased", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+// raw column names must match the assembler's input columns above
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_squad_question_generation_100_percent_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/mohilp1998/bert-finetuned-squad-question-generation-100_percent-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-bert_gemma2b_multivllm_nodropsus_12_en.md b/docs/_posts/ahmedlone127/2024-09-15-bert_gemma2b_multivllm_nodropsus_12_en.md new file mode 100644 index 00000000000000..85b08ca80fe722 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-bert_gemma2b_multivllm_nodropsus_12_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_gemma2b_multivllm_nodropsus_12 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_gemma2b_multivllm_nodropsus_12 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_gemma2b_multivllm_nodropsus_12` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_gemma2b_multivllm_nodropsus_12_en_5.5.0_3.0_1726394100231.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_gemma2b_multivllm_nodropsus_12_en_5.5.0_3.0_1726394100231.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# input columns must match the output columns of the stages above
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_gemma2b_multivllm_nodropsus_12","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+// input columns must match the output columns of the stages above
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_gemma2b_multivllm_nodropsus_12", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_gemma2b_multivllm_nodropsus_12| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_gemma2b-multivllm-NodropSus_12 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-bert_large2_en.md b/docs/_posts/ahmedlone127/2024-09-15-bert_large2_en.md new file mode 100644 index 00000000000000..1b0edb8a78aa07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-bert_large2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_large2 RoBertaForSequenceClassification from RogerKam +author: John Snow Labs +name: bert_large2 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large2` is a English model originally trained by RogerKam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large2_en_5.5.0_3.0_1726401366524.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large2_en_5.5.0_3.0_1726401366524.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# input columns must match the output columns of the stages above
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("bert_large2","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+// input columns must match the output columns of the stages above
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("bert_large2", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
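+
+The fitted pipeline scales to batches, so any number of rows can be scored with a single `transform` call. A minimal sketch with made-up example sentences, reusing `pipelineModel` from the example above:
+
+```python
+# score several texts at once
+batch = spark.createDataFrame(
+    [["The service was excellent."], ["I would not recommend this product."]]
+).toDF("text")
+pipelineModel.transform(batch).select("text", "class.result").show(truncate=False)
+```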
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|434.3 MB| + +## References + +https://huggingface.co/RogerKam/BERT-large2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-bert_large2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-bert_large2_pipeline_en.md new file mode 100644 index 00000000000000..fef83d20a46464 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-bert_large2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_large2_pipeline pipeline RoBertaForSequenceClassification from RogerKam +author: John Snow Labs +name: bert_large2_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large2_pipeline` is a English model originally trained by RogerKam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large2_pipeline_en_5.5.0_3.0_1726401399024.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large2_pipeline_en_5.5.0_3.0_1726401399024.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_large2_pipeline", lang = "en")

# Example input: a DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("bert_large2_pipeline", lang = "en")

// Example input: a DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
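
For quick checks on single strings, the same pretrained pipeline can be driven without building a DataFrame through its `annotate` helper. A small sketch, reusing the `pipeline` object from the Python snippet above:

```python
# annotate() runs the whole pipeline on one string and returns a dict
# keyed by output column (e.g. "document", "token", "class")
result = pipeline.annotate("I love spark-nlp")
print(result["class"])
```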
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|434.3 MB| + +## References + +https://huggingface.co/RogerKam/BERT-large2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-bsc_bio_ehr_spanish_nubes_es.md b/docs/_posts/ahmedlone127/2024-09-15-bsc_bio_ehr_spanish_nubes_es.md new file mode 100644 index 00000000000000..83082d68568412 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-bsc_bio_ehr_spanish_nubes_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_nubes RoBertaForTokenClassification from IIC +author: John Snow Labs +name: bsc_bio_ehr_spanish_nubes +date: 2024-09-15 +tags: [es, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_nubes` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_nubes_es_5.5.0_3.0_1726403494287.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_nubes_es_5.5.0_3.0_1726403494287.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_nubes","es") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_nubes", "es")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
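
Because the underlying checkpoint targets Spanish clinical (EHR) text, Spanish input is a more representative probe than the English placeholder sentence. A minimal sketch with a hypothetical Spanish sentence, reusing the fitted `pipelineModel` from the Python snippet above:

```python
# Hypothetical Spanish clinical-style sentence; any DataFrame with a "text" column works
es_data = spark.createDataFrame([["El paciente no presenta fiebre ni dolor abdominal."]]).toDF("text")

# "ner.result" holds one tag per token produced by the classifier
pipelineModel.transform(es_data).selectExpr("token.result as tokens", "ner.result as tags").show(truncate=False)
```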
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_nubes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|434.7 MB| + +## References + +https://huggingface.co/IIC/bsc-bio-ehr-es-nubes \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_model_mahmoud59_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_model_mahmoud59_pipeline_en.md new file mode 100644 index 00000000000000..5cdf99eb244eb0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_model_mahmoud59_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_mahmoud59_pipeline pipeline DistilBertForSequenceClassification from Mahmoud59 +author: John Snow Labs +name: burmese_awesome_model_mahmoud59_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_mahmoud59_pipeline` is a English model originally trained by Mahmoud59. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_mahmoud59_pipeline_en_5.5.0_3.0_1726394105469.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_mahmoud59_pipeline_en_5.5.0_3.0_1726394105469.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("burmese_awesome_model_mahmoud59_pipeline", lang = "en")

# Example input: a DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("burmese_awesome_model_mahmoud59_pipeline", lang = "en")

// Example input: a DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_mahmoud59_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Mahmoud59/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_faaany_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_faaany_pipeline_en.md new file mode 100644 index 00000000000000..ec1fd58d20f72f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_faaany_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_faaany_pipeline pipeline DistilBertForQuestionAnswering from faaany +author: John Snow Labs +name: burmese_awesome_qa_model_faaany_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_faaany_pipeline` is a English model originally trained by faaany. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_faaany_pipeline_en_5.5.0_3.0_1726382877012.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_faaany_pipeline_en_5.5.0_3.0_1726382877012.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("burmese_awesome_qa_model_faaany_pipeline", lang = "en")

# df: a Spark DataFrame holding the question and context input columns
# expected by the pipeline's MultiDocumentAssembler
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_faaany_pipeline", lang = "en")

// df: a Spark DataFrame holding the question and context input columns
// expected by the pipeline's MultiDocumentAssembler
val annotations = pipeline.transform(df)

```
</div>
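
Unlike the classification pipelines above, this pipeline starts with a `MultiDocumentAssembler`, so it needs a question column and a context column rather than a single `text` column. A minimal end-to-end sketch, assuming the bundled assembler reads columns named `question` and `context` (check the pipeline's stages if the transform reports missing columns):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("burmese_awesome_qa_model_faaany_pipeline", lang="en")

# Column names below are an assumption, not confirmed by the model card
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")

pipeline.transform(df).select("answer.result").show(truncate=False)
```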
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_faaany_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/faaany/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_hark99_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_hark99_pipeline_en.md new file mode 100644 index 00000000000000..ede2caf118c110 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_hark99_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_hark99_pipeline pipeline DistilBertForQuestionAnswering from hark99 +author: John Snow Labs +name: burmese_awesome_qa_model_hark99_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_hark99_pipeline` is a English model originally trained by hark99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_hark99_pipeline_en_5.5.0_3.0_1726435188505.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_hark99_pipeline_en_5.5.0_3.0_1726435188505.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("burmese_awesome_qa_model_hark99_pipeline", lang = "en")

# df: a Spark DataFrame holding the question and context input columns
# expected by the pipeline's MultiDocumentAssembler
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_hark99_pipeline", lang = "en")

// df: a Spark DataFrame holding the question and context input columns
// expected by the pipeline's MultiDocumentAssembler
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_hark99_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/hark99/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_wearecoding_en.md b/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_wearecoding_en.md new file mode 100644 index 00000000000000..95cd57aada67b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_wearecoding_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_wearecoding DistilBertForQuestionAnswering from wearecoding +author: John Snow Labs +name: burmese_awesome_qa_model_wearecoding +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_wearecoding` is a English model originally trained by wearecoding. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_wearecoding_en_5.5.0_3.0_1726382835407.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_wearecoding_en_5.5.0_3.0_1726382835407.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_wearecoding","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_wearecoding", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
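
After the pipeline runs, the extracted span lives in the `answer` annotation column alongside its character offsets. A small sketch for pulling the answer text (and, where present, extra fields such as the score from the annotation metadata) out of `pipelineDF`:

```python
# "answer.result" is the predicted answer span as plain text;
# "answer.metadata" carries additional per-annotation fields
pipelineDF.select("answer.result", "answer.metadata").show(truncate=False)
```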
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_wearecoding| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/wearecoding/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_wearecoding_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_wearecoding_pipeline_en.md new file mode 100644 index 00000000000000..6c70fade2b1b4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_wearecoding_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_wearecoding_pipeline pipeline DistilBertForQuestionAnswering from wearecoding +author: John Snow Labs +name: burmese_awesome_qa_model_wearecoding_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_wearecoding_pipeline` is a English model originally trained by wearecoding. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_wearecoding_pipeline_en_5.5.0_3.0_1726382846787.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_wearecoding_pipeline_en_5.5.0_3.0_1726382846787.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("burmese_awesome_qa_model_wearecoding_pipeline", lang = "en")

# df: a Spark DataFrame holding the question and context input columns
# expected by the pipeline's MultiDocumentAssembler
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_wearecoding_pipeline", lang = "en")

// df: a Spark DataFrame holding the question and context input columns
// expected by the pipeline's MultiDocumentAssembler
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_wearecoding_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/wearecoding/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-burmese_hw6_imdb_model_en.md b/docs/_posts/ahmedlone127/2024-09-15-burmese_hw6_imdb_model_en.md new file mode 100644 index 00000000000000..ab8aadf218d36e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-burmese_hw6_imdb_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_hw6_imdb_model DistilBertForSequenceClassification from shahsp2 +author: John Snow Labs +name: burmese_hw6_imdb_model +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_hw6_imdb_model` is a English model originally trained by shahsp2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_hw6_imdb_model_en_5.5.0_3.0_1726406214817.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_hw6_imdb_model_en_5.5.0_3.0_1726406214817.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_hw6_imdb_model","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_hw6_imdb_model", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_hw6_imdb_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/shahsp2/my_hw6_imdb_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-burmese_qa_model_anm8edboi_en.md b/docs/_posts/ahmedlone127/2024-09-15-burmese_qa_model_anm8edboi_en.md new file mode 100644 index 00000000000000..9cb765933e14f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-burmese_qa_model_anm8edboi_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_qa_model_anm8edboi DistilBertForQuestionAnswering from anm8edboi +author: John Snow Labs +name: burmese_qa_model_anm8edboi +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_qa_model_anm8edboi` is a English model originally trained by anm8edboi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_qa_model_anm8edboi_en_5.5.0_3.0_1726382680743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_qa_model_anm8edboi_en_5.5.0_3.0_1726382680743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_qa_model_anm8edboi","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_qa_model_anm8edboi", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_qa_model_anm8edboi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/anm8edboi/my_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-burmese_sentiment_analysis_model_jay369_en.md b/docs/_posts/ahmedlone127/2024-09-15-burmese_sentiment_analysis_model_jay369_en.md new file mode 100644 index 00000000000000..32c89b153c9f4e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-burmese_sentiment_analysis_model_jay369_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_sentiment_analysis_model_jay369 DistilBertForSequenceClassification from Jay369 +author: John Snow Labs +name: burmese_sentiment_analysis_model_jay369 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_sentiment_analysis_model_jay369` is a English model originally trained by Jay369. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_sentiment_analysis_model_jay369_en_5.5.0_3.0_1726385401058.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_sentiment_analysis_model_jay369_en_5.5.0_3.0_1726385401058.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_sentiment_analysis_model_jay369","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_sentiment_analysis_model_jay369", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_sentiment_analysis_model_jay369| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Jay369/my_sentiment_analysis_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-chatloom_test_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-chatloom_test_1_pipeline_en.md new file mode 100644 index 00000000000000..e795ca4813ec11 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-chatloom_test_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English chatloom_test_1_pipeline pipeline RoBertaForQuestionAnswering from SkullWreker +author: John Snow Labs +name: chatloom_test_1_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chatloom_test_1_pipeline` is a English model originally trained by SkullWreker. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chatloom_test_1_pipeline_en_5.5.0_3.0_1726369312572.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chatloom_test_1_pipeline_en_5.5.0_3.0_1726369312572.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("chatloom_test_1_pipeline", lang = "en")

# df: a Spark DataFrame holding the question and context input columns
# expected by the pipeline's MultiDocumentAssembler
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("chatloom_test_1_pipeline", lang = "en")

// df: a Spark DataFrame holding the question and context input columns
// expected by the pipeline's MultiDocumentAssembler
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chatloom_test_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/SkullWreker/ChatLoom_Test_1 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-chinese_roberta_wwm_ext_finetuned_accelerate_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-chinese_roberta_wwm_ext_finetuned_accelerate_pipeline_en.md new file mode 100644 index 00000000000000..7ef0ba3070c222 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-chinese_roberta_wwm_ext_finetuned_accelerate_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English chinese_roberta_wwm_ext_finetuned_accelerate_pipeline pipeline BertForQuestionAnswering from DaydreamerF +author: John Snow Labs +name: chinese_roberta_wwm_ext_finetuned_accelerate_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chinese_roberta_wwm_ext_finetuned_accelerate_pipeline` is a English model originally trained by DaydreamerF. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chinese_roberta_wwm_ext_finetuned_accelerate_pipeline_en_5.5.0_3.0_1726367880377.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chinese_roberta_wwm_ext_finetuned_accelerate_pipeline_en_5.5.0_3.0_1726367880377.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("chinese_roberta_wwm_ext_finetuned_accelerate_pipeline", lang = "en")

# df: a Spark DataFrame holding the question and context input columns
# expected by the pipeline's MultiDocumentAssembler
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("chinese_roberta_wwm_ext_finetuned_accelerate_pipeline", lang = "en")

// df: a Spark DataFrame holding the question and context input columns
// expected by the pipeline's MultiDocumentAssembler
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chinese_roberta_wwm_ext_finetuned_accelerate_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|381.0 MB| + +## References + +https://huggingface.co/DaydreamerF/chinese-roberta-wwm-ext-finetuned-accelerate + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-cuad_distil_governing_law_cased_08_31_v1_en.md b/docs/_posts/ahmedlone127/2024-09-15-cuad_distil_governing_law_cased_08_31_v1_en.md new file mode 100644 index 00000000000000..0437c7362ecfe7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-cuad_distil_governing_law_cased_08_31_v1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English cuad_distil_governing_law_cased_08_31_v1 DistilBertForQuestionAnswering from saraks +author: John Snow Labs +name: cuad_distil_governing_law_cased_08_31_v1 +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cuad_distil_governing_law_cased_08_31_v1` is a English model originally trained by saraks. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cuad_distil_governing_law_cased_08_31_v1_en_5.5.0_3.0_1726382711221.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cuad_distil_governing_law_cased_08_31_v1_en_5.5.0_3.0_1726382711221.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("cuad_distil_governing_law_cased_08_31_v1","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = DistilBertForQuestionAnswering.pretrained("cuad_distil_governing_law_cased_08_31_v1", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
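
The model name suggests a CUAD-style contract-understanding fine-tune focused on governing-law clauses, so a clause-flavoured question over contract text is a more telling probe than the generic example. A hypothetical sketch reusing the fitted `pipelineModel` from the Python snippet above:

```python
# Hypothetical contract snippet; the "question"/"context" column names match
# the MultiDocumentAssembler configured above
contract = spark.createDataFrame([[
    "Which state's law governs this agreement?",
    "This Agreement shall be governed by and construed in accordance with the laws of the State of Delaware."
]]).toDF("question", "context")

pipelineModel.transform(contract).select("answer.result").show(truncate=False)
```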
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cuad_distil_governing_law_cased_08_31_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/saraks/cuad-distil-governing_law-cased-08-31-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-cuatr_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-cuatr_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..d892ba26c35966 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-cuatr_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cuatr_distilbert_pipeline pipeline DistilBertForSequenceClassification from chathuru +author: John Snow Labs +name: cuatr_distilbert_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cuatr_distilbert_pipeline` is a English model originally trained by chathuru. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cuatr_distilbert_pipeline_en_5.5.0_3.0_1726394184110.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cuatr_distilbert_pipeline_en_5.5.0_3.0_1726394184110.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("cuatr_distilbert_pipeline", lang = "en")

# Example input: a DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("cuatr_distilbert_pipeline", lang = "en")

// Example input: a DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cuatr_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chathuru/CuATR-distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-culturebank_relevance_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-culturebank_relevance_classifier_pipeline_en.md new file mode 100644 index 00000000000000..8378ea9e0de631 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-culturebank_relevance_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English culturebank_relevance_classifier_pipeline pipeline DistilBertForSequenceClassification from SALT-NLP +author: John Snow Labs +name: culturebank_relevance_classifier_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`culturebank_relevance_classifier_pipeline` is a English model originally trained by SALT-NLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/culturebank_relevance_classifier_pipeline_en_5.5.0_3.0_1726393554687.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/culturebank_relevance_classifier_pipeline_en_5.5.0_3.0_1726393554687.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("culturebank_relevance_classifier_pipeline", lang = "en")

# Example input: a DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("culturebank_relevance_classifier_pipeline", lang = "en")

// Example input: a DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|culturebank_relevance_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/SALT-NLP/CultureBank-Relevance-Classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-custommodel_yelp1_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-custommodel_yelp1_1_pipeline_en.md new file mode 100644 index 00000000000000..90b525aab785bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-custommodel_yelp1_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English custommodel_yelp1_1_pipeline pipeline DistilBertForSequenceClassification from Mintiny +author: John Snow Labs +name: custommodel_yelp1_1_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`custommodel_yelp1_1_pipeline` is a English model originally trained by Mintiny. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/custommodel_yelp1_1_pipeline_en_5.5.0_3.0_1726393894496.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/custommodel_yelp1_1_pipeline_en_5.5.0_3.0_1726393894496.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("custommodel_yelp1_1_pipeline", lang = "en")

# Example input: a DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("custommodel_yelp1_1_pipeline", lang = "en")

// Example input: a DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|custommodel_yelp1_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Mintiny/CustomModel_yelp1.1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-dialogue_one_sharontudi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-dialogue_one_sharontudi_pipeline_en.md new file mode 100644 index 00000000000000..f7308bee9e580a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-dialogue_one_sharontudi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dialogue_one_sharontudi_pipeline pipeline DistilBertForSequenceClassification from SharonTudi +author: John Snow Labs +name: dialogue_one_sharontudi_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dialogue_one_sharontudi_pipeline` is a English model originally trained by SharonTudi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dialogue_one_sharontudi_pipeline_en_5.5.0_3.0_1726366578569.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dialogue_one_sharontudi_pipeline_en_5.5.0_3.0_1726366578569.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("dialogue_one_sharontudi_pipeline", lang = "en")

# Example input: a DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("dialogue_one_sharontudi_pipeline", lang = "en")

// Example input: a DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dialogue_one_sharontudi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/SharonTudi/DIALOGUE_one + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distil_news_finetune_en.md b/docs/_posts/ahmedlone127/2024-09-15-distil_news_finetune_en.md new file mode 100644 index 00000000000000..12265e04691ebe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distil_news_finetune_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distil_news_finetune DistilBertForSequenceClassification from anggari +author: John Snow Labs +name: distil_news_finetune +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distil_news_finetune` is a English model originally trained by anggari. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_news_finetune_en_5.5.0_3.0_1726366306526.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_news_finetune_en_5.5.0_3.0_1726366306526.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distil_news_finetune","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distil_news_finetune", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distil_news_finetune| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/anggari/distil_news_finetune \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distil_news_finetune_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distil_news_finetune_pipeline_en.md new file mode 100644 index 00000000000000..ab5ea042a4321d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distil_news_finetune_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distil_news_finetune_pipeline pipeline DistilBertForSequenceClassification from anggari +author: John Snow Labs +name: distil_news_finetune_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distil_news_finetune_pipeline` is a English model originally trained by anggari. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_news_finetune_pipeline_en_5.5.0_3.0_1726366318562.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_news_finetune_pipeline_en_5.5.0_3.0_1726366318562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distil_news_finetune_pipeline", lang = "en")

# Example input: a DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distil_news_finetune_pipeline", lang = "en")

// Example input: a DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distil_news_finetune_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/anggari/distil_news_finetune + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_fact_updates_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_fact_updates_pipeline_en.md new file mode 100644 index 00000000000000..3a1b98125c5031 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_fact_updates_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_fact_updates_pipeline pipeline DistilBertForSequenceClassification from rishavranaut +author: John Snow Labs +name: distilbert_base_fact_updates_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_fact_updates_pipeline` is a English model originally trained by rishavranaut. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_fact_updates_pipeline_en_5.5.0_3.0_1726394297201.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_fact_updates_pipeline_en_5.5.0_3.0_1726394297201.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_fact_updates_pipeline", lang = "en")

# Example input: a DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_fact_updates_pipeline", lang = "en")

// Example input: a DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_fact_updates_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rishavranaut/distilbert-base-fact-updates + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_spanish_cased_autext_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_spanish_cased_autext_en.md new file mode 100644 index 00000000000000..2914c612d90c16 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_spanish_cased_autext_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_spanish_cased_autext DistilBertForSequenceClassification from jorgefg03 +author: John Snow Labs +name: distilbert_base_spanish_cased_autext +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_spanish_cased_autext` is a English model originally trained by jorgefg03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_spanish_cased_autext_en_5.5.0_3.0_1726394196869.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_spanish_cased_autext_en_5.5.0_3.0_1726394196869.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_spanish_cased_autext","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_spanish_cased_autext", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_spanish_cased_autext| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|238.9 MB| + +## References + +https://huggingface.co/jorgefg03/distilbert-base-es-cased-autext \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_spanish_cased_autext_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_spanish_cased_autext_pipeline_en.md new file mode 100644 index 00000000000000..58f0eb467b69e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_spanish_cased_autext_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_spanish_cased_autext_pipeline pipeline DistilBertForSequenceClassification from jorgefg03 +author: John Snow Labs +name: distilbert_base_spanish_cased_autext_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_spanish_cased_autext_pipeline` is a English model originally trained by jorgefg03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_spanish_cased_autext_pipeline_en_5.5.0_3.0_1726394208333.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_spanish_cased_autext_pipeline_en_5.5.0_3.0_1726394208333.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_spanish_cased_autext_pipeline", lang = "en")

# Example input: a DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_spanish_cased_autext_pipeline", lang = "en")

// Example input: a DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_spanish_cased_autext_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|238.9 MB| + +## References + +https://huggingface.co/jorgefg03/distilbert-base-es-cased-autext + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_clinc_dohwan9672_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_clinc_dohwan9672_en.md new file mode 100644 index 00000000000000..fc81e69531abaf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_clinc_dohwan9672_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_dohwan9672 DistilBertForSequenceClassification from DoHwan9672 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_dohwan9672 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_dohwan9672` is a English model originally trained by DoHwan9672. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_dohwan9672_en_5.5.0_3.0_1726406100140.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_dohwan9672_en_5.5.0_3.0_1726406100140.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_dohwan9672","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_dohwan9672", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
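Once `pipelineDF` has been computed, the predicted intent sits in the `result` field of the `class` annotations, and recent Spark NLP classifiers also keep per-label scores in the annotation metadata. The following is a hedged sketch that continues from the snippet above; the metadata layout is an assumption rather than something documented on this card.

```python
from pyspark.sql import functions as F

# Predicted intent label per input row
pipelineDF.select("text", F.col("class.result").getItem(0).alias("intent")).show(truncate=False)

# Raw annotation metadata, which may include per-class confidence scores
pipelineDF.select(F.explode("class.metadata").alias("metadata")).show(truncate=False)
```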
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_dohwan9672| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/DoHwan9672/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_cola_vpkoji_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_cola_vpkoji_en.md new file mode 100644 index 00000000000000..6ea27fcaeeef1a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_cola_vpkoji_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_vpkoji DistilBertForSequenceClassification from VPKoji +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_vpkoji +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_vpkoji` is a English model originally trained by VPKoji. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_vpkoji_en_5.5.0_3.0_1726406349502.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_vpkoji_en_5.5.0_3.0_1726406349502.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_vpkoji","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_vpkoji", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_vpkoji| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/VPKoji/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_1_13_am_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_1_13_am_pipeline_en.md new file mode 100644 index 00000000000000..ff033da5108142 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_1_13_am_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_1_13_am_pipeline pipeline DistilBertForSequenceClassification from 1-13-am +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_1_13_am_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_1_13_am_pipeline` is a English model originally trained by 1-13-am. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_1_13_am_pipeline_en_5.5.0_3.0_1726393668777.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_1_13_am_pipeline_en_5.5.0_3.0_1726393668777.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_1_13_am_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_1_13_am_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
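Because the pipeline returns annotation structs rather than plain strings, it is often convenient to flatten the prediction into a simple column before handing it to downstream code. A minimal sketch, assuming the bundled DocumentAssembler reads a `text` column and the classifier writes to `class`:

```python
from pyspark.sql import functions as F
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_1_13_am_pipeline", lang = "en")

df = spark.createDataFrame(
    [["I am thrilled with the results"], ["This is really frustrating"]]
).toDF("text")

# Flatten the annotation array to one string label per row
predictions = (
    pipeline.transform(df)
    .withColumn("emotion", F.col("class.result").getItem(0))
    .select("text", "emotion")
)
predictions.show(truncate=False)
```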
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_1_13_am_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/1-13-am/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_adlv_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_adlv_en.md new file mode 100644 index 00000000000000..c287f5b3740060 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_adlv_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_adlv DistilBertForSequenceClassification from adlv +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_adlv +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_adlv` is a English model originally trained by adlv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_adlv_en_5.5.0_3.0_1726366192477.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_adlv_en_5.5.0_3.0_1726366192477.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_adlv","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_adlv", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
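Like other transformer-based classifiers in Spark NLP, this annotator exposes a few inference settings that matter at scale. The values below are illustrative only; this is a hedged sketch of configuring the pretrained model before assembling the pipeline:

```python
from sparknlp.annotator import DistilBertForSequenceClassification

sequenceClassifier = (
    DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_adlv", "en")
    .setInputCols(["document", "token"])
    .setOutputCol("class")
    .setBatchSize(16)            # rows sent to the model per inference batch
    .setCaseSensitive(False)     # matches the uncased base checkpoint
    .setMaxSentenceLength(128)   # truncate long inputs at 128 tokens
)
```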
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_adlv| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/adlv/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_adlv_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_adlv_pipeline_en.md new file mode 100644 index 00000000000000..cc78f2f7a58241 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_adlv_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_adlv_pipeline pipeline DistilBertForSequenceClassification from adlv +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_adlv_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_adlv_pipeline` is a English model originally trained by adlv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_adlv_pipeline_en_5.5.0_3.0_1726366204215.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_adlv_pipeline_en_5.5.0_3.0_1726366204215.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_adlv_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_adlv_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_adlv_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/adlv/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_ashkanero_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_ashkanero_pipeline_en.md new file mode 100644 index 00000000000000..efc3fae752199d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_ashkanero_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ashkanero_pipeline pipeline DistilBertForSequenceClassification from Ashkanero +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ashkanero_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ashkanero_pipeline` is a English model originally trained by Ashkanero. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ashkanero_pipeline_en_5.5.0_3.0_1726365846427.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ashkanero_pipeline_en_5.5.0_3.0_1726365846427.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ashkanero_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ashkanero_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ashkanero_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Ashkanero/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_edmonds0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_edmonds0_pipeline_en.md new file mode 100644 index 00000000000000..3954a709ec202c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_edmonds0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_edmonds0_pipeline pipeline DistilBertForSequenceClassification from Edmonds0 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_edmonds0_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_edmonds0_pipeline` is a English model originally trained by Edmonds0. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_edmonds0_pipeline_en_5.5.0_3.0_1726385522649.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_edmonds0_pipeline_en_5.5.0_3.0_1726385522649.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_edmonds0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_edmonds0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_edmonds0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Edmonds0/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_hcyying_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_hcyying_en.md new file mode 100644 index 00000000000000..b09fbce68f1703 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_hcyying_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_hcyying DistilBertForSequenceClassification from hcyying +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_hcyying +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_hcyying` is a English model originally trained by hcyying. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hcyying_en_5.5.0_3.0_1726394355515.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hcyying_en_5.5.0_3.0_1726394355515.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_hcyying","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_hcyying", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
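For quick offline inspection or error analysis it can help to collect a sample of predictions locally. A small sketch that continues from the `pipelineDF` produced above and assumes the driver can comfortably hold the collected rows:

```python
# Flatten the prediction and pull a small sample to the driver as pandas
pdf = (
    pipelineDF
    .selectExpr("text", "class.result[0] as predicted_emotion")
    .limit(100)
    .toPandas()
)
print(pdf.head())
```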
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_hcyying| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hcyying/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_hschia2_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_hschia2_en.md new file mode 100644 index 00000000000000..9d9641f6bb6407 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_hschia2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_hschia2 DistilBertForSequenceClassification from hschia2 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_hschia2 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_hschia2` is a English model originally trained by hschia2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hschia2_en_5.5.0_3.0_1726385369811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hschia2_en_5.5.0_3.0_1726385369811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_hschia2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_hschia2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
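The fine-tuned checkpoint carries its own label set, which can be listed straight from the loaded annotator to see what values the `class` column may contain. A brief sketch; `getClasses()` is assumed to behave as it does for other Spark NLP sequence classifiers:

```python
from sparknlp.annotator import DistilBertForSequenceClassification

sequenceClassifier = DistilBertForSequenceClassification.pretrained(
    "distilbert_base_uncased_finetuned_emotion_hschia2", "en"
)

# Prints the labels the model was fine-tuned on
print(sequenceClassifier.getClasses())
```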
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_hschia2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hschia2/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_kaspersmidt_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_kaspersmidt_en.md new file mode 100644 index 00000000000000..8edb259f992482 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_kaspersmidt_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_kaspersmidt DistilBertForSequenceClassification from Kaspersmidt +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_kaspersmidt +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_kaspersmidt` is a English model originally trained by Kaspersmidt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_kaspersmidt_en_5.5.0_3.0_1726366369901.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_kaspersmidt_en_5.5.0_3.0_1726366369901.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_kaspersmidt","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_kaspersmidt", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
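Fitting the pipeline once and persisting it avoids re-downloading the model weights on every job. A minimal sketch using standard Spark ML persistence, continuing from the `pipelineModel` and `data` of the snippet above; the path is an arbitrary example:

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline, including the downloaded DistilBERT weights
pipelineModel.write().overwrite().save("/tmp/emotion_distilbert_pipeline")

# Later, reload and reuse it without fitting again
restored = PipelineModel.load("/tmp/emotion_distilbert_pipeline")
restored.transform(data).select("class.result").show(truncate=False)
```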
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_kaspersmidt| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Kaspersmidt/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_manik114_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_manik114_en.md new file mode 100644 index 00000000000000..9ab4ef842083db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_manik114_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_manik114 DistilBertForQuestionAnswering from manik114 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_manik114 +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_manik114` is a English model originally trained by manik114. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_manik114_en_5.5.0_3.0_1726382754690.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_manik114_en_5.5.0_3.0_1726382754690.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_manik114","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_manik114", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
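The extracted span ends up in the `answer` column as an annotation: the answer text is in `result`, and any extra details the annotator attaches (possibly including a confidence score) live in the annotation metadata. A hedged sketch that continues from the `pipelineDF` produced above:

```python
from pyspark.sql import functions as F

pipelineDF.select(
    F.col("document_question.result").getItem(0).alias("question"),
    F.col("answer.result").getItem(0).alias("answer"),
    F.col("answer.metadata").getItem(0).alias("answer_metadata")  # may contain a score
).show(truncate=False)
```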
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_manik114| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/manik114/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_manik114_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_manik114_pipeline_en.md new file mode 100644 index 00000000000000..0d7e54e36310ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_manik114_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_manik114_pipeline pipeline DistilBertForQuestionAnswering from manik114 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_manik114_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_manik114_pipeline` is a English model originally trained by manik114. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_manik114_pipeline_en_5.5.0_3.0_1726382766265.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_manik114_pipeline_en_5.5.0_3.0_1726382766265.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_manik114_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_manik114_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_manik114_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/manik114/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_fituned_clinc_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_fituned_clinc_en.md new file mode 100644 index 00000000000000..d090c34d1b3549 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_fituned_clinc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_fituned_clinc DistilBertForSequenceClassification from Takeshi10Days +author: John Snow Labs +name: distilbert_base_uncased_fituned_clinc +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_fituned_clinc` is a English model originally trained by Takeshi10Days. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_fituned_clinc_en_5.5.0_3.0_1726394190213.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_fituned_clinc_en_5.5.0_3.0_1726394190213.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_fituned_clinc","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_fituned_clinc", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_fituned_clinc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/Takeshi10Days/distilbert-base-uncased-fituned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_idm_zphr_0st52sd_ut32ut1_pl0stlarge42_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_idm_zphr_0st52sd_ut32ut1_pl0stlarge42_simsp_pipeline_en.md new file mode 100644 index 00000000000000..1ad5c36bad783c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_idm_zphr_0st52sd_ut32ut1_pl0stlarge42_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_idm_zphr_0st52sd_ut32ut1_pl0stlarge42_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_idm_zphr_0st52sd_ut32ut1_pl0stlarge42_simsp_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_idm_zphr_0st52sd_ut32ut1_pl0stlarge42_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_idm_zphr_0st52sd_ut32ut1_pl0stlarge42_simsp_pipeline_en_5.5.0_3.0_1726394289124.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_idm_zphr_0st52sd_ut32ut1_pl0stlarge42_simsp_pipeline_en_5.5.0_3.0_1726394289124.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_idm_zphr_0st52sd_ut32ut1_pl0stlarge42_simsp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_idm_zphr_0st52sd_ut32ut1_pl0stlarge42_simsp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
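In practice the input usually comes from storage rather than an inline literal. The sketch below scores a plain-text file with this pretrained pipeline; the file path is a placeholder, and the `text`/`class` column names are assumptions carried over from the standalone examples on the other cards:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline(
    "distilbert_base_uncased_idm_zphr_0st52sd_ut32ut1_pl0stlarge42_simsp_pipeline", lang = "en"
)

# spark.read.text yields a single "value" column; rename it to "text"
# so the bundled DocumentAssembler can pick it up
df = spark.read.text("/data/sentences.txt").withColumnRenamed("value", "text")

pipeline.transform(df).select("text", "class.result").show(truncate=False)
```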
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_idm_zphr_0st52sd_ut32ut1_pl0stlarge42_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_idm_zphr_0st52sd_ut32ut1_PL0stlarge42_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100_en.md new file mode 100644 index 00000000000000..bcbd49d7e5696d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100_en_5.5.0_3.0_1726406018472.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100_en_5.5.0_3.0_1726406018472.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
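The Download and Copy S3 URI buttons above point at the packaged model, which can also be used in offline or air-gapped environments by unzipping the archive and loading the folder directly instead of calling `pretrained()`. A hedged sketch; the local path is a placeholder:

```python
from sparknlp.annotator import DistilBertForSequenceClassification

# Load the unzipped model folder from local or distributed storage
sequenceClassifier = (
    DistilBertForSequenceClassification.load(
        "/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100_en"
    )
    .setInputCols(["document", "token"])
    .setOutputCol("class")
)
```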
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st2sd_ut72ut1_PLPrefix0stlarge2_simsp400_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_simsp_en.md new file mode 100644 index 00000000000000..a1e4d7ace9dfda --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_simsp +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_simsp_en_5.5.0_3.0_1726406120587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_simsp_en_5.5.0_3.0_1726406120587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_simsp","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_simsp", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_utility_zphr_0st_ut52ut1_plain_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_finance_future_amounts_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_finance_future_amounts_en.md new file mode 100644 index 00000000000000..bd13cfbd23110c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_finance_future_amounts_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_finance_future_amounts DistilBertForSequenceClassification from finsynth +author: John Snow Labs +name: distilbert_finance_future_amounts +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finance_future_amounts` is a English model originally trained by finsynth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finance_future_amounts_en_5.5.0_3.0_1726365944593.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finance_future_amounts_en_5.5.0_3.0_1726365944593.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_finance_future_amounts","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_finance_future_amounts", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finance_future_amounts| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/finsynth/distilbert-finance-future-amounts \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_imdb_sentiment_aniket_jain_9_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_imdb_sentiment_aniket_jain_9_en.md new file mode 100644 index 00000000000000..b9b076527b2729 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_imdb_sentiment_aniket_jain_9_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_finetuned_imdb_sentiment_aniket_jain_9 DistilBertForSequenceClassification from aniket-jain-9 +author: John Snow Labs +name: distilbert_finetuned_imdb_sentiment_aniket_jain_9 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_imdb_sentiment_aniket_jain_9` is a English model originally trained by aniket-jain-9. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_sentiment_aniket_jain_9_en_5.5.0_3.0_1726385194171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_sentiment_aniket_jain_9_en_5.5.0_3.0_1726385194171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_finetuned_imdb_sentiment_aniket_jain_9","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_finetuned_imdb_sentiment_aniket_jain_9", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_imdb_sentiment_aniket_jain_9| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aniket-jain-9/distilbert-finetuned-imdb-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbertbaselineoneepochevaluate_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbertbaselineoneepochevaluate_pipeline_en.md new file mode 100644 index 00000000000000..f5b32a612a376c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbertbaselineoneepochevaluate_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbertbaselineoneepochevaluate_pipeline pipeline DistilBertForQuestionAnswering from KarthikAlagarsamy +author: John Snow Labs +name: distilbertbaselineoneepochevaluate_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbertbaselineoneepochevaluate_pipeline` is a English model originally trained by KarthikAlagarsamy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbertbaselineoneepochevaluate_pipeline_en_5.5.0_3.0_1726435328354.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbertbaselineoneepochevaluate_pipeline_en_5.5.0_3.0_1726435328354.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbertbaselineoneepochevaluate_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbertbaselineoneepochevaluate_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
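`df` above stands for the input DataFrame. A minimal sketch of building one and running the downloaded pipeline; the `question`/`context` column names and the `answer` output column are assumptions and may differ in the exported pipeline:

```python
from sparknlp.pretrained import PretrainedPipeline

# Hypothetical question/context columns; match them to what the exported pipeline expects
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")

pipeline = PretrainedPipeline("distilbertbaselineoneepochevaluate_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.select("answer.result").show(truncate=False)  # output column name may differ
```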
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbertbaselineoneepochevaluate_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/KarthikAlagarsamy/distilbertbaselineoneepochevaluate + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbertbaselinethreeepochevaluate_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbertbaselinethreeepochevaluate_en.md new file mode 100644 index 00000000000000..0dce79c8757870 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbertbaselinethreeepochevaluate_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbertbaselinethreeepochevaluate DistilBertForQuestionAnswering from KarthikAlagarsamy +author: John Snow Labs +name: distilbertbaselinethreeepochevaluate +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbertbaselinethreeepochevaluate` is a English model originally trained by KarthikAlagarsamy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbertbaselinethreeepochevaluate_en_5.5.0_3.0_1726435016793.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbertbaselinethreeepochevaluate_en_5.5.0_3.0_1726435016793.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbertbaselinethreeepochevaluate","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
# Column names must match the assembler's input columns
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbertbaselinethreeepochevaluate", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
// Column names must match the assembler's input columns
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbertbaselinethreeepochevaluate| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/KarthikAlagarsamy/distilbertbaselinethreeepochevaluate \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-efficient_mlm_m0_15_801010_en.md b/docs/_posts/ahmedlone127/2024-09-15-efficient_mlm_m0_15_801010_en.md new file mode 100644 index 00000000000000..a1dd00d0690749 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-efficient_mlm_m0_15_801010_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English efficient_mlm_m0_15_801010 RoBertaEmbeddings from princeton-nlp +author: John Snow Labs +name: efficient_mlm_m0_15_801010 +date: 2024-09-15 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`efficient_mlm_m0_15_801010` is a English model originally trained by princeton-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/efficient_mlm_m0_15_801010_en_5.5.0_3.0_1726413626821.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/efficient_mlm_m0_15_801010_en_5.5.0_3.0_1726413626821.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("efficient_mlm_m0_15_801010","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("efficient_mlm_m0_15_801010","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
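Each token's vector is stored in the `embeddings` field of the output annotations configured above. A minimal sketch of unpacking them into one row per token:

```python
from pyspark.sql import functions as F

# One row per token: the token text and its embedding vector
pipelineDF.select(F.explode("embeddings").alias("emb")) \
    .select(F.col("emb.result").alias("token"), F.col("emb.embeddings").alias("vector")) \
    .show(truncate=50)
```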
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|efficient_mlm_m0_15_801010| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|844.2 MB| + +## References + +https://huggingface.co/princeton-nlp/efficient_mlm_m0.15-801010 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-fae_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-fae_pipeline_en.md new file mode 100644 index 00000000000000..c2fab022fd10fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-fae_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fae_pipeline pipeline BertEmbeddings from sereneWithU +author: John Snow Labs +name: fae_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fae_pipeline` is a English model originally trained by sereneWithU. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fae_pipeline_en_5.5.0_3.0_1726444271544.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fae_pipeline_en_5.5.0_3.0_1726444271544.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fae_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fae_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fae_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/sereneWithU/FAE + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-faq_qa_model_vinayvemuri_en.md b/docs/_posts/ahmedlone127/2024-09-15-faq_qa_model_vinayvemuri_en.md new file mode 100644 index 00000000000000..ffca1efe1a8dae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-faq_qa_model_vinayvemuri_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English faq_qa_model_vinayvemuri DistilBertForQuestionAnswering from vinayvemuri +author: John Snow Labs +name: faq_qa_model_vinayvemuri +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`faq_qa_model_vinayvemuri` is a English model originally trained by vinayvemuri. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/faq_qa_model_vinayvemuri_en_5.5.0_3.0_1726435031723.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/faq_qa_model_vinayvemuri_en_5.5.0_3.0_1726435031723.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("faq_qa_model_vinayvemuri","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("faq_qa_model_vinayvemuri", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|faq_qa_model_vinayvemuri| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/vinayvemuri/faq_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-fernet_news_slovak_sk.md b/docs/_posts/ahmedlone127/2024-09-15-fernet_news_slovak_sk.md new file mode 100644 index 00000000000000..1e6617138cfddb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-fernet_news_slovak_sk.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Slovak fernet_news_slovak RoBertaEmbeddings from fav-kky +author: John Snow Labs +name: fernet_news_slovak +date: 2024-09-15 +tags: [sk, open_source, onnx, embeddings, roberta] +task: Embeddings +language: sk +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fernet_news_slovak` is a Slovak model originally trained by fav-kky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fernet_news_slovak_sk_5.5.0_3.0_1726383955785.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fernet_news_slovak_sk_5.5.0_3.0_1726383955785.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("fernet_news_slovak","sk") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("fernet_news_slovak","sk") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fernet_news_slovak| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|sk| +|Size:|464.6 MB| + +## References + +https://huggingface.co/fav-kky/FERNET-News_sk \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-fine_tuned_resume_model_en.md b/docs/_posts/ahmedlone127/2024-09-15-fine_tuned_resume_model_en.md new file mode 100644 index 00000000000000..15c3831361c68c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-fine_tuned_resume_model_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English fine_tuned_resume_model DistilBertForSequenceClassification from Invimatic +author: John Snow Labs +name: fine_tuned_resume_model +date: 2024-09-15 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_resume_model` is a English model originally trained by Invimatic. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_resume_model_en_5.5.0_3.0_1726385216113.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_resume_model_en_5.5.0_3.0_1726385216113.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer()\
    .setInputCols("document")\
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("fine_tuned_resume_model","en")\
    .setInputCols(["document","token"])\
    .setOutputCol("class")

pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier])

data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")

result = pipeline.fit(data).transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("fine_tuned_resume_model","en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

// Stage names must match the values defined above
val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, sequenceClassifier))

val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_resume_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.4 MB| + +## References + +References + +https://huggingface.co/Invimatic/fine_tuned_resume_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-finetuned_sentiment_classfication_roberta_base_model_en.md b/docs/_posts/ahmedlone127/2024-09-15-finetuned_sentiment_classfication_roberta_base_model_en.md new file mode 100644 index 00000000000000..7a5404796f932c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-finetuned_sentiment_classfication_roberta_base_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_sentiment_classfication_roberta_base_model RoBertaForSequenceClassification from Pendo +author: John Snow Labs +name: finetuned_sentiment_classfication_roberta_base_model +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_sentiment_classfication_roberta_base_model` is a English model originally trained by Pendo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_sentiment_classfication_roberta_base_model_en_5.5.0_3.0_1726402239082.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_sentiment_classfication_roberta_base_model_en_5.5.0_3.0_1726402239082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("finetuned_sentiment_classfication_roberta_base_model","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("finetuned_sentiment_classfication_roberta_base_model", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_sentiment_classfication_roberta_base_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|445.1 MB| + +## References + +https://huggingface.co/Pendo/finetuned-Sentiment-classfication-ROBERTA-Base-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-finetuning_distilbert_model_steam_game_reviews_en.md b/docs/_posts/ahmedlone127/2024-09-15-finetuning_distilbert_model_steam_game_reviews_en.md new file mode 100644 index 00000000000000..5a6960e605c684 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-finetuning_distilbert_model_steam_game_reviews_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_distilbert_model_steam_game_reviews DistilBertForSequenceClassification from zitroeth +author: John Snow Labs +name: finetuning_distilbert_model_steam_game_reviews +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_distilbert_model_steam_game_reviews` is a English model originally trained by zitroeth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_distilbert_model_steam_game_reviews_en_5.5.0_3.0_1726393795569.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_distilbert_model_steam_game_reviews_en_5.5.0_3.0_1726393795569.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_distilbert_model_steam_game_reviews","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_distilbert_model_steam_game_reviews", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_distilbert_model_steam_game_reviews| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/zitroeth/finetuning-distilbert-model-steam-game-reviews \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_cr7istian_en.md b/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_cr7istian_en.md new file mode 100644 index 00000000000000..8fb282270e4273 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_cr7istian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_cr7istian DistilBertForSequenceClassification from Cr7istian +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_cr7istian +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_cr7istian` is a English model originally trained by Cr7istian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_cr7istian_en_5.5.0_3.0_1726393983772.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_cr7istian_en_5.5.0_3.0_1726393983772.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_cr7istian","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_cr7istian", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_cr7istian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Cr7istian/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_hugrahulface_en.md b/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_hugrahulface_en.md new file mode 100644 index 00000000000000..8708bdb97f96c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_hugrahulface_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_hugrahulface DistilBertForSequenceClassification from hugrahulface +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_hugrahulface +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_hugrahulface` is a English model originally trained by hugrahulface. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_hugrahulface_en_5.5.0_3.0_1726406474985.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_hugrahulface_en_5.5.0_3.0_1726406474985.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_hugrahulface","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_hugrahulface", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_hugrahulface| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hugrahulface/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_sanju2u_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_sanju2u_pipeline_en.md new file mode 100644 index 00000000000000..f4d504a3401e63 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_sanju2u_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_sanju2u_pipeline pipeline DistilBertForSequenceClassification from sanju2u +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_sanju2u_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_sanju2u_pipeline` is a English model originally trained by sanju2u. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_sanju2u_pipeline_en_5.5.0_3.0_1726394012126.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_sanju2u_pipeline_en_5.5.0_3.0_1726394012126.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_sanju2u_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_sanju2u_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_sanju2u_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sanju2u/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_vipuljain_en.md b/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_vipuljain_en.md new file mode 100644 index 00000000000000..10fd505aee8d32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_vipuljain_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_vipuljain DistilBertForSequenceClassification from vipuljain +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_vipuljain +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_vipuljain` is a English model originally trained by vipuljain. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_vipuljain_en_5.5.0_3.0_1726366199282.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_vipuljain_en_5.5.0_3.0_1726366199282.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_vipuljain","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_vipuljain", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_vipuljain| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/vipuljain/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_david5473_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_david5473_pipeline_en.md new file mode 100644 index 00000000000000..e113a06a9be4bf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_david5473_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_david5473_pipeline pipeline DistilBertForSequenceClassification from David5473 +author: John Snow Labs +name: finetuning_sentiment_model_david5473_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_david5473_pipeline` is a English model originally trained by David5473. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_david5473_pipeline_en_5.5.0_3.0_1726384914483.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_david5473_pipeline_en_5.5.0_3.0_1726384914483.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_david5473_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_david5473_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_david5473_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/David5473/finetuning-sentiment-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-gal_ner_portuguese_2_en.md b/docs/_posts/ahmedlone127/2024-09-15-gal_ner_portuguese_2_en.md new file mode 100644 index 00000000000000..7e8ae3d1e37a38 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-gal_ner_portuguese_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English gal_ner_portuguese_2 XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: gal_ner_portuguese_2 +date: 2024-09-15 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gal_ner_portuguese_2` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_ner_portuguese_2_en_5.5.0_3.0_1726398340638.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_ner_portuguese_2_en_5.5.0_3.0_1726398340638.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_ner_portuguese_2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_ner_portuguese_2", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
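The per-token predictions are written to the `ner` column configured above. A small sketch of pairing each token with its predicted tag:

```python
from pyspark.sql import functions as F

# Zip each token with its predicted NER tag and show one (token, tag) pair per row
pairs = pipelineDF.select(
    F.explode(F.arrays_zip(F.col("token.result"), F.col("ner.result"))).alias("pair")
)
pairs.show(truncate=False)
```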
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_ner_portuguese_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|389.6 MB| + +## References + +https://huggingface.co/homersimpson/gal-ner-pt-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-hamsa_tiny_v0_8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-hamsa_tiny_v0_8_pipeline_en.md new file mode 100644 index 00000000000000..ea1ebb908d3786 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-hamsa_tiny_v0_8_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English hamsa_tiny_v0_8_pipeline pipeline WhisperForCTC from Ahmed107 +author: John Snow Labs +name: hamsa_tiny_v0_8_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hamsa_tiny_v0_8_pipeline` is a English model originally trained by Ahmed107. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hamsa_tiny_v0_8_pipeline_en_5.5.0_3.0_1726427608569.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hamsa_tiny_v0_8_pipeline_en_5.5.0_3.0_1726427608569.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hamsa_tiny_v0_8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hamsa_tiny_v0_8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
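`df` above stands for a DataFrame of raw audio samples for the pipeline's AudioAssembler. A minimal sketch of building one from a 16 kHz WAV file; the use of librosa, the `audio_content` input column, and the `text` output column are assumptions rather than part of this card:

```python
import librosa
from sparknlp.pretrained import PretrainedPipeline

# Load mono audio resampled to 16 kHz and wrap the float samples in a one-row DataFrame
samples, _ = librosa.load("sample.wav", sr=16000)
df = spark.createDataFrame([[samples.tolist()]]).toDF("audio_content")

pipeline = PretrainedPipeline("hamsa_tiny_v0_8_pipeline", lang="en")
result = pipeline.transform(df)
result.select("text.result").show(truncate=False)  # transcription; output column may differ
```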
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hamsa_tiny_v0_8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.5 MB| + +## References + +https://huggingface.co/Ahmed107/hamsa-tiny-v0.8 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-hate_hate_balance_random0_seed2_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-15-hate_hate_balance_random0_seed2_bernice_en.md new file mode 100644 index 00000000000000..5bb7c49b187ca2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-hate_hate_balance_random0_seed2_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_balance_random0_seed2_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random0_seed2_bernice +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random0_seed2_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random0_seed2_bernice_en_5.5.0_3.0_1726442147144.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random0_seed2_bernice_en_5.5.0_3.0_1726442147144.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hate_hate_balance_random0_seed2_bernice","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hate_hate_balance_random0_seed2_bernice", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random0_seed2_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|783.3 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random0_seed2-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-hatespeech_distilbert_en.md b/docs/_posts/ahmedlone127/2024-09-15-hatespeech_distilbert_en.md new file mode 100644 index 00000000000000..74d0157c3560b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-hatespeech_distilbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hatespeech_distilbert DistilBertForSequenceClassification from DL-Project +author: John Snow Labs +name: hatespeech_distilbert +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hatespeech_distilbert` is a English model originally trained by DL-Project. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hatespeech_distilbert_en_5.5.0_3.0_1726366288125.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hatespeech_distilbert_en_5.5.0_3.0_1726366288125.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("hatespeech_distilbert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("hatespeech_distilbert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hatespeech_distilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DL-Project/hatespeech_distilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-hawk_en.md b/docs/_posts/ahmedlone127/2024-09-15-hawk_en.md new file mode 100644 index 00000000000000..a85064a2c93a23 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-hawk_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English hawk RoBertaForQuestionAnswering from nickbot606 +author: John Snow Labs +name: hawk +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hawk` is a English model originally trained by nickbot606. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hawk_en_5.5.0_3.0_1726379788668.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hawk_en_5.5.0_3.0_1726379788668.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("hawk","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
# Column names must match the assembler's input columns
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("hawk", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
// Column names must match the assembler's input columns
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hawk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|463.5 MB| + +## References + +https://huggingface.co/nickbot606/hawk \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-lab9_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-lab9_model_pipeline_en.md new file mode 100644 index 00000000000000..47f7efcef5886a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-lab9_model_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English lab9_model_pipeline pipeline DistilBertForQuestionAnswering from krob +author: John Snow Labs +name: lab9_model_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab9_model_pipeline` is a English model originally trained by krob. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab9_model_pipeline_en_5.5.0_3.0_1726382426053.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab9_model_pipeline_en_5.5.0_3.0_1726382426053.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lab9_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lab9_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab9_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/krob/lab9_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-lenate_model_5_en.md b/docs/_posts/ahmedlone127/2024-09-15-lenate_model_5_en.md new file mode 100644 index 00000000000000..39c5b4eef0eb6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-lenate_model_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English lenate_model_5 DistilBertForSequenceClassification from lenate +author: John Snow Labs +name: lenate_model_5 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lenate_model_5` is a English model originally trained by lenate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lenate_model_5_en_5.5.0_3.0_1726385297982.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lenate_model_5_en_5.5.0_3.0_1726385297982.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("lenate_model_5","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("lenate_model_5", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
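
The predicted label for each row can be read from the `class` column's `result` field; a small sketch based on the pipeline above:

```python
# Show the input text next to the predicted class label(s).
pipelineDF.select("text", "class.result").show(truncate=False)
```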
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lenate_model_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lenate/lenate_model_5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-m7_mlm_en.md b/docs/_posts/ahmedlone127/2024-09-15-m7_mlm_en.md new file mode 100644 index 00000000000000..72f9d2676982f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-m7_mlm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English m7_mlm RoBertaEmbeddings from S2312dal +author: John Snow Labs +name: m7_mlm +date: 2024-09-15 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`m7_mlm` is a English model originally trained by S2312dal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/m7_mlm_en_5.5.0_3.0_1726383499912.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/m7_mlm_en_5.5.0_3.0_1726383499912.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("m7_mlm","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("m7_mlm","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
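
Each entry in the `embeddings` column is a token-level annotation whose vector sits in its `embeddings` field. A minimal sketch for inspecting the vectors, using only the column names defined above:

```python
from pyspark.sql import functions as F

# One row per token: the token text comes from `result`, its vector from `embeddings`.
pipelineDF.select(F.explode("embeddings").alias("emb")) \
    .select(F.col("emb.result").alias("token"), F.col("emb.embeddings").alias("vector")) \
    .show(truncate=80)
```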
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|m7_mlm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.7 MB| + +## References + +https://huggingface.co/S2312dal/M7_MLM \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-malayalam_anomaly_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-malayalam_anomaly_pipeline_en.md new file mode 100644 index 00000000000000..377cc6ffb6ccdb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-malayalam_anomaly_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English malayalam_anomaly_pipeline pipeline DistilBertForSequenceClassification from rn7s2 +author: John Snow Labs +name: malayalam_anomaly_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`malayalam_anomaly_pipeline` is a English model originally trained by rn7s2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/malayalam_anomaly_pipeline_en_5.5.0_3.0_1726366508291.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/malayalam_anomaly_pipeline_en_5.5.0_3.0_1726366508291.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("malayalam_anomaly_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("malayalam_anomaly_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
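
The snippet above assumes a DataFrame `df` already exists. A minimal sketch for creating one; the `text` input column name is an assumption based on the DocumentAssembler stage this pipeline starts with:

```python
# Hypothetical input: a single-column DataFrame named "text".
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
annotations.show(truncate=False)
```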
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|malayalam_anomaly_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rn7s2/ml_anomaly + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-mbtipj_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-mbtipj_pipeline_en.md new file mode 100644 index 00000000000000..75bb34b1d9133e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-mbtipj_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mbtipj_pipeline pipeline AlbertForSequenceClassification from StormyCreeper +author: John Snow Labs +name: mbtipj_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mbtipj_pipeline` is a English model originally trained by StormyCreeper. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mbtipj_pipeline_en_5.5.0_3.0_1726372193275.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mbtipj_pipeline_en_5.5.0_3.0_1726372193275.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mbtipj_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mbtipj_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mbtipj_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/StormyCreeper/mbtiPJ + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-mentalroberta_empai_final3_en.md b/docs/_posts/ahmedlone127/2024-09-15-mentalroberta_empai_final3_en.md new file mode 100644 index 00000000000000..3204a7f1b7a89e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-mentalroberta_empai_final3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mentalroberta_empai_final3 RoBertaEmbeddings from LuangMV97 +author: John Snow Labs +name: mentalroberta_empai_final3 +date: 2024-09-15 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mentalroberta_empai_final3` is a English model originally trained by LuangMV97. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mentalroberta_empai_final3_en_5.5.0_3.0_1726413205588.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mentalroberta_empai_final3_en_5.5.0_3.0_1726413205588.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("mentalroberta_empai_final3","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("mentalroberta_empai_final3","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mentalroberta_empai_final3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/LuangMV97/MentalRoBERTa_EmpAI_final3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-model_01_s14_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-model_01_s14_pipeline_en.md new file mode 100644 index 00000000000000..f54283a432cbaf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-model_01_s14_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_01_s14_pipeline pipeline RoBertaForSequenceClassification from Lucrosus +author: John Snow Labs +name: model_01_s14_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_01_s14_pipeline` is a English model originally trained by Lucrosus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_01_s14_pipeline_en_5.5.0_3.0_1726439311736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_01_s14_pipeline_en_5.5.0_3.0_1726439311736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("model_01_s14_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("model_01_s14_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_01_s14_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Lucrosus/model-01-s14 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-models_mil00_en.md b/docs/_posts/ahmedlone127/2024-09-15-models_mil00_en.md new file mode 100644 index 00000000000000..6d76846f7c5d26 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-models_mil00_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English models_mil00 DistilBertForSequenceClassification from Mil00 +author: John Snow Labs +name: models_mil00 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`models_mil00` is a English model originally trained by Mil00. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/models_mil00_en_5.5.0_3.0_1726393772071.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/models_mil00_en_5.5.0_3.0_1726393772071.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("models_mil00","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("models_mil00", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|models_mil00| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|250.2 MB| + +## References + +https://huggingface.co/Mil00/Models \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-mon_modele_whisper_json_en.md b/docs/_posts/ahmedlone127/2024-09-15-mon_modele_whisper_json_en.md new file mode 100644 index 00000000000000..074fbf6082c7ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-mon_modele_whisper_json_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English mon_modele_whisper_json WhisperForCTC from jeanbap166 +author: John Snow Labs +name: mon_modele_whisper_json +date: 2024-09-15 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mon_modele_whisper_json` is a English model originally trained by jeanbap166. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mon_modele_whisper_json_en_5.5.0_3.0_1726387692388.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mon_modele_whisper_json_en_5.5.0_3.0_1726387692388.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("mon_modele_whisper_json","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("mon_modele_whisper_json", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
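
Both snippets assume a DataFrame `data` with an `audio_content` column holding the raw waveform as an array of float samples. A minimal sketch for building it in Python, assuming `librosa` is available and `sample.wav` is a hypothetical 16 kHz mono recording:

```python
import librosa

# Load the waveform as float samples at 16 kHz (Whisper's expected sample rate).
waveform, _ = librosa.load("sample.wav", sr=16000)

# One row, one column named "audio_content", matching the AudioAssembler input above.
data = spark.createDataFrame([[waveform.tolist()]]).toDF("audio_content")
```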
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mon_modele_whisper_json| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/jeanbap166/mon-modele-whisper_json \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-multi_label_class_classification_on_github_issues_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-multi_label_class_classification_on_github_issues_pipeline_en.md new file mode 100644 index 00000000000000..61159a617d61cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-multi_label_class_classification_on_github_issues_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English multi_label_class_classification_on_github_issues_pipeline pipeline BertForSequenceClassification from Rami +author: John Snow Labs +name: multi_label_class_classification_on_github_issues_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multi_label_class_classification_on_github_issues_pipeline` is a English model originally trained by Rami. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multi_label_class_classification_on_github_issues_pipeline_en_5.5.0_3.0_1726375923874.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multi_label_class_classification_on_github_issues_pipeline_en_5.5.0_3.0_1726375923874.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("multi_label_class_classification_on_github_issues_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("multi_label_class_classification_on_github_issues_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multi_label_class_classification_on_github_issues_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.5 MB| + +## References + +https://huggingface.co/Rami/multi-label-class-classification-on-github-issues + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-n_roberta_sst5_padding20model_en.md b/docs/_posts/ahmedlone127/2024-09-15-n_roberta_sst5_padding20model_en.md new file mode 100644 index 00000000000000..781f5226127ee1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-n_roberta_sst5_padding20model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_roberta_sst5_padding20model RoBertaForSequenceClassification from Realgon +author: John Snow Labs +name: n_roberta_sst5_padding20model +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_roberta_sst5_padding20model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_roberta_sst5_padding20model_en_5.5.0_3.0_1726439884199.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_roberta_sst5_padding20model_en_5.5.0_3.0_1726439884199.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("n_roberta_sst5_padding20model","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("n_roberta_sst5_padding20model", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_roberta_sst5_padding20model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|439.1 MB| + +## References + +https://huggingface.co/Realgon/N_roberta_sst5_padding20model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-ner_411_2_id.md b/docs/_posts/ahmedlone127/2024-09-15-ner_411_2_id.md new file mode 100644 index 00000000000000..6538ad8bdd8d6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-ner_411_2_id.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Indonesian ner_411_2 XlmRoBertaForTokenClassification from blekkk +author: John Snow Labs +name: ner_411_2 +date: 2024-09-15 +tags: [id, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_411_2` is a Indonesian model originally trained by blekkk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_411_2_id_5.5.0_3.0_1726397714998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_411_2_id_5.5.0_3.0_1726397714998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("ner_411_2","id") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("ner_411_2", "id")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
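
Tokens and their predicted tags can be compared side by side by projecting the `result` fields of the `token` and `ner` columns produced above; a minimal sketch:

```python
# Two parallel arrays per row: the tokens and their predicted NER labels.
pipelineDF.selectExpr("token.result as tokens", "ner.result as ner_labels").show(truncate=False)
```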
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_411_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|id| +|Size:|772.7 MB| + +## References + +https://huggingface.co/blekkk/ner_411_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-norwegian_bokml_whisper_base_verbatim_nbailabbeta_no.md b/docs/_posts/ahmedlone127/2024-09-15-norwegian_bokml_whisper_base_verbatim_nbailabbeta_no.md new file mode 100644 index 00000000000000..e8e7791bd5d6ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-norwegian_bokml_whisper_base_verbatim_nbailabbeta_no.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Norwegian norwegian_bokml_whisper_base_verbatim_nbailabbeta WhisperForCTC from NbAiLabBeta +author: John Snow Labs +name: norwegian_bokml_whisper_base_verbatim_nbailabbeta +date: 2024-09-15 +tags: ["no", open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: "no" +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`norwegian_bokml_whisper_base_verbatim_nbailabbeta` is a Norwegian model originally trained by NbAiLabBeta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_base_verbatim_nbailabbeta_no_5.5.0_3.0_1726359129562.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_base_verbatim_nbailabbeta_no_5.5.0_3.0_1726359129562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("norwegian_bokml_whisper_base_verbatim_nbailabbeta","no") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("norwegian_bokml_whisper_base_verbatim_nbailabbeta", "no")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norwegian_bokml_whisper_base_verbatim_nbailabbeta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|no| +|Size:|633.6 MB| + +## References + +https://huggingface.co/NbAiLabBeta/nb-whisper-base-verbatim \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-openai_whisper_tiny_spanish_ecu911dm_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-15-openai_whisper_tiny_spanish_ecu911dm_pipeline_es.md new file mode 100644 index 00000000000000..7d8253dda00a49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-openai_whisper_tiny_spanish_ecu911dm_pipeline_es.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Castilian, Spanish openai_whisper_tiny_spanish_ecu911dm_pipeline pipeline WhisperForCTC from DanielMarquez +author: John Snow Labs +name: openai_whisper_tiny_spanish_ecu911dm_pipeline +date: 2024-09-15 +tags: [es, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`openai_whisper_tiny_spanish_ecu911dm_pipeline` is a Castilian, Spanish model originally trained by DanielMarquez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/openai_whisper_tiny_spanish_ecu911dm_pipeline_es_5.5.0_3.0_1726407231786.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/openai_whisper_tiny_spanish_ecu911dm_pipeline_es_5.5.0_3.0_1726407231786.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("openai_whisper_tiny_spanish_ecu911dm_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("openai_whisper_tiny_spanish_ecu911dm_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|openai_whisper_tiny_spanish_ecu911dm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|379.6 MB| + +## References + +https://huggingface.co/DanielMarquez/openai-whisper-tiny-es_ecu911DM + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-output_en.md b/docs/_posts/ahmedlone127/2024-09-15-output_en.md new file mode 100644 index 00000000000000..d3b775989f43a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-output_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English output DistilBertEmbeddings from soyisauce +author: John Snow Labs +name: output +date: 2024-09-15 +tags: [distilbert, en, open_source, fill_mask, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`output` is a English model originally trained by soyisauce. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/output_en_5.5.0_3.0_1726361757457.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/output_en_5.5.0_3.0_1726361757457.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = DistilBertEmbeddings.pretrained("output","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([document_assembler, tokenizer, embeddings])

data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = DistilBertEmbeddings
    .pretrained("output", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, embeddings))

val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|output| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +References + +References + +https://huggingface.co/soyisauce/output \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-pardonmycaption_en.md b/docs/_posts/ahmedlone127/2024-09-15-pardonmycaption_en.md new file mode 100644 index 00000000000000..0bd83665c61985 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-pardonmycaption_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English pardonmycaption DistilBertForSequenceClassification from tarekziade +author: John Snow Labs +name: pardonmycaption +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pardonmycaption` is a English model originally trained by tarekziade. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pardonmycaption_en_5.5.0_3.0_1726384787799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pardonmycaption_en_5.5.0_3.0_1726384787799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("pardonmycaption","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("pardonmycaption", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pardonmycaption| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tarekziade/pardonmycaption \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-ratingbook_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-ratingbook_pipeline_en.md new file mode 100644 index 00000000000000..8d965fc6ebd4cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-ratingbook_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ratingbook_pipeline pipeline DistilBertForSequenceClassification from DragonImortal +author: John Snow Labs +name: ratingbook_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ratingbook_pipeline` is a English model originally trained by DragonImortal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ratingbook_pipeline_en_5.5.0_3.0_1726394108104.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ratingbook_pipeline_en_5.5.0_3.0_1726394108104.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ratingbook_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ratingbook_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ratingbook_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DragonImortal/Ratingbook + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-retrieval_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-retrieval_model_pipeline_en.md new file mode 100644 index 00000000000000..85c938bf1bb1de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-retrieval_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English retrieval_model_pipeline pipeline DistilBertForSequenceClassification from sms1097 +author: John Snow Labs +name: retrieval_model_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`retrieval_model_pipeline` is a English model originally trained by sms1097. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/retrieval_model_pipeline_en_5.5.0_3.0_1726365727359.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/retrieval_model_pipeline_en_5.5.0_3.0_1726365727359.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("retrieval_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("retrieval_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
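
For a quick check without building a DataFrame first, the pretrained pipeline can also be run on a plain string; `annotate` returns a dictionary keyed by output column (a sketch, following the Python API used above):

```python
# Single-string inference; the exact keys depend on the pipeline's stages.
result = pipeline.annotate("I love spark-nlp")
print(result)
```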
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|retrieval_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sms1097/retrieval_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_base_bne_finetuned_sqac_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_base_bne_finetuned_sqac_pipeline_en.md new file mode 100644 index 00000000000000..51e94d55cc7654 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_base_bne_finetuned_sqac_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_sqac_pipeline pipeline RoBertaForQuestionAnswering from DevCar +author: John Snow Labs +name: roberta_base_bne_finetuned_sqac_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_sqac_pipeline` is a English model originally trained by DevCar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_sqac_pipeline_en_5.5.0_3.0_1726368815186.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_sqac_pipeline_en_5.5.0_3.0_1726368815186.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_bne_finetuned_sqac_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_bne_finetuned_sqac_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_sqac_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|457.4 MB| + +## References + +https://huggingface.co/DevCar/roberta-base-bne-finetuned-sqac + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_base_epoch_57_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_base_epoch_57_en.md new file mode 100644 index 00000000000000..2b3e000e9a83dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_base_epoch_57_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_57 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_57 +date: 2024-09-15 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_57` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_57_en_5.5.0_3.0_1726413324052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_57_en_5.5.0_3.0_1726413324052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_57","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_57","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_57| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_57 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_base_finetuned_academic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_base_finetuned_academic_pipeline_en.md new file mode 100644 index 00000000000000..c51556400f8437 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_base_finetuned_academic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_academic_pipeline pipeline RoBertaEmbeddings from egumasa +author: John Snow Labs +name: roberta_base_finetuned_academic_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_academic_pipeline` is a English model originally trained by egumasa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_academic_pipeline_en_5.5.0_3.0_1726413979664.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_academic_pipeline_en_5.5.0_3.0_1726413979664.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_academic_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_academic_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_academic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/egumasa/roberta-base-finetuned-academic + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_base_finetuned_squad_ngchuchi_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_base_finetuned_squad_ngchuchi_en.md new file mode 100644 index 00000000000000..3849c45a083fea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_base_finetuned_squad_ngchuchi_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_base_finetuned_squad_ngchuchi RoBertaForQuestionAnswering from ngchuchi +author: John Snow Labs +name: roberta_base_finetuned_squad_ngchuchi +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_squad_ngchuchi` is a English model originally trained by ngchuchi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_squad_ngchuchi_en_5.5.0_3.0_1726368974488.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_squad_ngchuchi_en_5.5.0_3.0_1726368974488.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_finetuned_squad_ngchuchi","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_finetuned_squad_ngchuchi", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_squad_ngchuchi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|443.2 MB| + +## References + +https://huggingface.co/ngchuchi/roberta-base-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_base_lora_591k_squad_model3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_base_lora_591k_squad_model3_pipeline_en.md new file mode 100644 index 00000000000000..71ba736db37112 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_base_lora_591k_squad_model3_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_base_lora_591k_squad_model3_pipeline pipeline RoBertaForQuestionAnswering from varun-v-rao +author: John Snow Labs +name: roberta_base_lora_591k_squad_model3_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_lora_591k_squad_model3_pipeline` is a English model originally trained by varun-v-rao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_lora_591k_squad_model3_pipeline_en_5.5.0_3.0_1726369290494.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_lora_591k_squad_model3_pipeline_en_5.5.0_3.0_1726369290494.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_lora_591k_squad_model3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_lora_591k_squad_model3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
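+
+The `df` above is assumed to be an existing Spark DataFrame. A hedged sketch of preparing one for this question-answering pipeline; the column names `question` and `context` are assumptions and must match the input columns of the pipeline's MultiDocumentAssembler stage:
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+# Hypothetical one-row input; adjust the column names to the saved pipeline's configuration.
+df = spark.createDataFrame(
+    [["What framework do I use?", "I use spark-nlp."]]
+).toDF("question", "context")
+
+pipeline = PretrainedPipeline("roberta_base_lora_591k_squad_model3_pipeline", lang = "en")
+pipeline.transform(df).show(truncate=False)
+```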
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_lora_591k_squad_model3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|317.4 MB| + +## References + +https://huggingface.co/varun-v-rao/roberta-base-lora-591K-squad-model3 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_finetuned_medquad_2_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_finetuned_medquad_2_en.md new file mode 100644 index 00000000000000..bbd629f28f8b27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_finetuned_medquad_2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_finetuned_medquad_2 RoBertaForQuestionAnswering from DataScientist1122 +author: John Snow Labs +name: roberta_finetuned_medquad_2 +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_medquad_2` is a English model originally trained by DataScientist1122. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_medquad_2_en_5.5.0_3.0_1726363983517.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_medquad_2_en_5.5.0_3.0_1726363983517.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_finetuned_medquad_2","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_finetuned_medquad_2", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_medquad_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|443.9 MB| + +## References + +https://huggingface.co/DataScientist1122/roberta-finetuned-medquad_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_large_finetuned_mrpc_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_large_finetuned_mrpc_en.md new file mode 100644 index 00000000000000..8e6930db08b459 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_large_finetuned_mrpc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_finetuned_mrpc RoBertaForSequenceClassification from VitaliiVrublevskyi +author: John Snow Labs +name: roberta_large_finetuned_mrpc +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_mrpc` is a English model originally trained by VitaliiVrublevskyi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_mrpc_en_5.5.0_3.0_1726401680278.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_mrpc_en_5.5.0_3.0_1726401680278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_finetuned_mrpc","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_finetuned_mrpc", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
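+
+MRPC is a sentence-pair (paraphrase) task, so real inputs would normally contain the two sentences being compared. A small sketch of reading the predicted label and its scores from the example above; the exact metadata keys depend on the model's label configuration:
+
+```python
+# "class.result" holds the predicted label, "class.metadata" the per-label scores.
+pipelineDF.select("class.result", "class.metadata").show(truncate=False)
+```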
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_mrpc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/VitaliiVrublevskyi/roberta-large-finetuned-mrpc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_llm_noninstruct_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_llm_noninstruct_pipeline_en.md new file mode 100644 index 00000000000000..b1787472943dde --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_llm_noninstruct_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_llm_noninstruct_pipeline pipeline RoBertaForSequenceClassification from Multiperspective +author: John Snow Labs +name: roberta_llm_noninstruct_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_llm_noninstruct_pipeline` is a English model originally trained by Multiperspective. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_llm_noninstruct_pipeline_en_5.5.0_3.0_1726439689269.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_llm_noninstruct_pipeline_en_5.5.0_3.0_1726439689269.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_llm_noninstruct_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_llm_noninstruct_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_llm_noninstruct_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Multiperspective/roberta-llm-noninstruct + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_qa_RoBERTa_for_seizureFrequency_QA_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_qa_RoBERTa_for_seizureFrequency_QA_pipeline_en.md new file mode 100644 index 00000000000000..8f8b28a1868f95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_qa_RoBERTa_for_seizureFrequency_QA_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_qa_RoBERTa_for_seizureFrequency_QA_pipeline pipeline RoBertaForQuestionAnswering from CNT-UPenn +author: John Snow Labs +name: roberta_qa_RoBERTa_for_seizureFrequency_QA_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_qa_RoBERTa_for_seizureFrequency_QA_pipeline` is a English model originally trained by CNT-UPenn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_qa_RoBERTa_for_seizureFrequency_QA_pipeline_en_5.5.0_3.0_1726364233424.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_qa_RoBERTa_for_seizureFrequency_QA_pipeline_en_5.5.0_3.0_1726364233424.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_qa_RoBERTa_for_seizureFrequency_QA_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_qa_RoBERTa_for_seizureFrequency_QA_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_qa_RoBERTa_for_seizureFrequency_QA_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/CNT-UPenn/RoBERTa_for_seizureFrequency_QA + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_spanish_v2_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_spanish_v2_en.md new file mode 100644 index 00000000000000..a73e2c811400f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_spanish_v2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_spanish_v2 RoBertaForQuestionAnswering from enriquesaou +author: John Snow Labs +name: roberta_spanish_v2 +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_spanish_v2` is a English model originally trained by enriquesaou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_spanish_v2_en_5.5.0_3.0_1726364035175.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_spanish_v2_en_5.5.0_3.0_1726364035175.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_spanish_v2","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_spanish_v2", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
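+
+For single ad-hoc queries, a `LightPipeline` avoids the DataFrame round-trip. A hedged sketch, assuming a Spark NLP version whose `LightPipeline.fullAnnotate` accepts a question/context pair:
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+# First argument is the question, second the context paragraph.
+result = light.fullAnnotate("What framework do I use?", "I use spark-nlp.")
+print(result[0]["answer"])
+```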
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_spanish_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|436.5 MB| + +## References + +https://huggingface.co/enriquesaou/roberta_es_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-schem_roberta_demographic_text_disagreement_predictor_en.md b/docs/_posts/ahmedlone127/2024-09-15-schem_roberta_demographic_text_disagreement_predictor_en.md new file mode 100644 index 00000000000000..316024af12cdcf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-schem_roberta_demographic_text_disagreement_predictor_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English schem_roberta_demographic_text_disagreement_predictor RoBertaForSequenceClassification from RuyuanWan +author: John Snow Labs +name: schem_roberta_demographic_text_disagreement_predictor +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`schem_roberta_demographic_text_disagreement_predictor` is a English model originally trained by RuyuanWan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/schem_roberta_demographic_text_disagreement_predictor_en_5.5.0_3.0_1726401504408.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/schem_roberta_demographic_text_disagreement_predictor_en_5.5.0_3.0_1726401504408.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("schem_roberta_demographic_text_disagreement_predictor","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("schem_roberta_demographic_text_disagreement_predictor", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|schem_roberta_demographic_text_disagreement_predictor| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|422.8 MB| + +## References + +https://huggingface.co/RuyuanWan/SChem_RoBERTa_Demographic-text_Disagreement_Predictor \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-schem_roberta_demographic_text_disagreement_predictor_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-schem_roberta_demographic_text_disagreement_predictor_pipeline_en.md new file mode 100644 index 00000000000000..fa9b5537d9daa1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-schem_roberta_demographic_text_disagreement_predictor_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English schem_roberta_demographic_text_disagreement_predictor_pipeline pipeline RoBertaForSequenceClassification from RuyuanWan +author: John Snow Labs +name: schem_roberta_demographic_text_disagreement_predictor_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`schem_roberta_demographic_text_disagreement_predictor_pipeline` is a English model originally trained by RuyuanWan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/schem_roberta_demographic_text_disagreement_predictor_pipeline_en_5.5.0_3.0_1726401544764.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/schem_roberta_demographic_text_disagreement_predictor_pipeline_en_5.5.0_3.0_1726401544764.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("schem_roberta_demographic_text_disagreement_predictor_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("schem_roberta_demographic_text_disagreement_predictor_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|schem_roberta_demographic_text_disagreement_predictor_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|422.8 MB| + +## References + +https://huggingface.co/RuyuanWan/SChem_RoBERTa_Demographic-text_Disagreement_Predictor + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_10lang_cased_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_10lang_cased_pipeline_xx.md new file mode 100644 index 00000000000000..df157e08c940ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_10lang_cased_pipeline_xx.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Multilingual sent_bert_base_10lang_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_10lang_cased_pipeline +date: 2024-09-15 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_10lang_cased_pipeline` is a Multilingual model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_10lang_cased_pipeline_xx_5.5.0_3.0_1726436556417.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_10lang_cased_pipeline_xx_5.5.0_3.0_1726436556417.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_10lang_cased_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_10lang_cased_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_10lang_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|514.9 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-10lang-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_arabic_camelbert_msa_half_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_arabic_camelbert_msa_half_pipeline_ar.md new file mode 100644 index 00000000000000..8db992a22ee512 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_arabic_camelbert_msa_half_pipeline_ar.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Arabic sent_bert_base_arabic_camelbert_msa_half_pipeline pipeline BertSentenceEmbeddings from CAMeL-Lab +author: John Snow Labs +name: sent_bert_base_arabic_camelbert_msa_half_pipeline +date: 2024-09-15 +tags: [ar, open_source, pipeline, onnx] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_arabic_camelbert_msa_half_pipeline` is a Arabic model originally trained by CAMeL-Lab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabic_camelbert_msa_half_pipeline_ar_5.5.0_3.0_1726377533658.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabic_camelbert_msa_half_pipeline_ar_5.5.0_3.0_1726377533658.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_arabic_camelbert_msa_half_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_arabic_camelbert_msa_half_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_arabic_camelbert_msa_half_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|406.9 MB| + +## References + +https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-msa-half + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-sent_clinicaltrialbiobert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-sent_clinicaltrialbiobert_pipeline_en.md new file mode 100644 index 00000000000000..5638018c3684d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-sent_clinicaltrialbiobert_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_clinicaltrialbiobert_pipeline pipeline BertSentenceEmbeddings from domenicrosati +author: John Snow Labs +name: sent_clinicaltrialbiobert_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_clinicaltrialbiobert_pipeline` is a English model originally trained by domenicrosati. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_clinicaltrialbiobert_pipeline_en_5.5.0_3.0_1726442987150.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_clinicaltrialbiobert_pipeline_en_5.5.0_3.0_1726442987150.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_clinicaltrialbiobert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_clinicaltrialbiobert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
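+
+The `df` above is assumed to exist already. A minimal sketch of feeding this sentence-embedding pipeline a one-row DataFrame; the input column name `text` is an assumption based on the usual DocumentAssembler default, and the embedding column name depends on how the pipeline was saved, so the schema is printed rather than guessed:
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+df = spark.createDataFrame([["Patients were randomized to the treatment arm."]]).toDF("text")
+
+pipeline = PretrainedPipeline("sent_clinicaltrialbiobert_pipeline", lang = "en")
+result = pipeline.transform(df)
+result.printSchema()  # locate the BertSentenceEmbeddings output column before selecting it
+```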
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_clinicaltrialbiobert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.3 MB| + +## References + +https://huggingface.co/domenicrosati/ClinicalTrialBioBert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-sent_gbert_base_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-15-sent_gbert_base_pipeline_de.md new file mode 100644 index 00000000000000..c15b68ed056289 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-sent_gbert_base_pipeline_de.md @@ -0,0 +1,71 @@ +--- +layout: model +title: German sent_gbert_base_pipeline pipeline BertSentenceEmbeddings from deepset +author: John Snow Labs +name: sent_gbert_base_pipeline +date: 2024-09-15 +tags: [de, open_source, pipeline, onnx] +task: Embeddings +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_gbert_base_pipeline` is a German model originally trained by deepset. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_gbert_base_pipeline_de_5.5.0_3.0_1726436783308.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_gbert_base_pipeline_de_5.5.0_3.0_1726436783308.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_gbert_base_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_gbert_base_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_gbert_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|410.3 MB| + +## References + +https://huggingface.co/deepset/gbert-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-sent_greeksocialbert_base_greek_social_media_v2_el.md b/docs/_posts/ahmedlone127/2024-09-15-sent_greeksocialbert_base_greek_social_media_v2_el.md new file mode 100644 index 00000000000000..c9bb9e89cac6dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-sent_greeksocialbert_base_greek_social_media_v2_el.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Modern Greek (1453-) sent_greeksocialbert_base_greek_social_media_v2 BertSentenceEmbeddings from pchatz +author: John Snow Labs +name: sent_greeksocialbert_base_greek_social_media_v2 +date: 2024-09-15 +tags: [el, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: el +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_greeksocialbert_base_greek_social_media_v2` is a Modern Greek (1453-) model originally trained by pchatz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_greeksocialbert_base_greek_social_media_v2_el_5.5.0_3.0_1726443236388.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_greeksocialbert_base_greek_social_media_v2_el_5.5.0_3.0_1726443236388.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_greeksocialbert_base_greek_social_media_v2","el") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_greeksocialbert_base_greek_social_media_v2","el") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
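+
+A short sketch of pulling the raw sentence vectors out of the `embeddings` column produced by the snippet above:
+
+```python
+from pyspark.sql import functions as F
+
+# Each sentence annotation stores its vector in the nested "embeddings" field.
+vectors = pipelineDF.select(F.explode("embeddings.embeddings").alias("sentence_embedding"))
+vectors.show(1, truncate=80)
+```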
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_greeksocialbert_base_greek_social_media_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|el| +|Size:|421.3 MB| + +## References + +https://huggingface.co/pchatz/greeksocialbert-base-greek-social-media-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-sent_kannada_bert_kn.md b/docs/_posts/ahmedlone127/2024-09-15-sent_kannada_bert_kn.md new file mode 100644 index 00000000000000..a1b4d2d4b136dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-sent_kannada_bert_kn.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Kannada sent_kannada_bert BertSentenceEmbeddings from l3cube-pune +author: John Snow Labs +name: sent_kannada_bert +date: 2024-09-15 +tags: [kn, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: kn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_kannada_bert` is a Kannada model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_kannada_bert_kn_5.5.0_3.0_1726443487561.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_kannada_bert_kn_5.5.0_3.0_1726443487561.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_kannada_bert","kn") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_kannada_bert","kn") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
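+
+If plain vectors are preferred over annotation structs, an `EmbeddingsFinisher` can be appended; a hedged sketch using the `embeddings` column from the example above:
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+finisher.transform(pipelineDF).select("finished_embeddings").show(1, truncate=80)
+```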
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_kannada_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|kn| +|Size:|890.5 MB| + +## References + +https://huggingface.co/l3cube-pune/kannada-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-six_class_sentimental_classifier_distilbert_base_en.md b/docs/_posts/ahmedlone127/2024-09-15-six_class_sentimental_classifier_distilbert_base_en.md new file mode 100644 index 00000000000000..9cfac3609e698c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-six_class_sentimental_classifier_distilbert_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English six_class_sentimental_classifier_distilbert_base DistilBertForSequenceClassification from halugop +author: John Snow Labs +name: six_class_sentimental_classifier_distilbert_base +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`six_class_sentimental_classifier_distilbert_base` is a English model originally trained by halugop. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/six_class_sentimental_classifier_distilbert_base_en_5.5.0_3.0_1726393675841.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/six_class_sentimental_classifier_distilbert_base_en_5.5.0_3.0_1726393675841.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("six_class_sentimental_classifier_distilbert_base","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("six_class_sentimental_classifier_distilbert_base", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
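+
+For quick interactive checks, the fitted model can be wrapped in a `LightPipeline`; a small sketch using the `pipelineModel` from the example above:
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+# Returns a dict keyed by output column; "class" holds the predicted label.
+print(light.annotate("I love spark-nlp")["class"])
+```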
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|six_class_sentimental_classifier_distilbert_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/halugop/Six_Class_Sentimental_Classifier_DistilBERT_base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-stego_classifier_checkpoint_epoch_30_2024_07_26_16_03_28_en.md b/docs/_posts/ahmedlone127/2024-09-15-stego_classifier_checkpoint_epoch_30_2024_07_26_16_03_28_en.md new file mode 100644 index 00000000000000..2a52cbe100021d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-stego_classifier_checkpoint_epoch_30_2024_07_26_16_03_28_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_30_2024_07_26_16_03_28 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_30_2024_07_26_16_03_28 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_30_2024_07_26_16_03_28` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_30_2024_07_26_16_03_28_en_5.5.0_3.0_1726406110768.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_30_2024_07_26_16_03_28_en_5.5.0_3.0_1726406110768.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_30_2024_07_26_16_03_28","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_30_2024_07_26_16_03_28", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_30_2024_07_26_16_03_28| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-30-2024-07-26_16-03-28 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-stego_classifier_checkpoint_epoch_41_start_exp_time_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-stego_classifier_checkpoint_epoch_41_start_exp_time_pipeline_en.md new file mode 100644 index 00000000000000..7d87497879bc1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-stego_classifier_checkpoint_epoch_41_start_exp_time_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_41_start_exp_time_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_41_start_exp_time_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_41_start_exp_time_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_41_start_exp_time_pipeline_en_5.5.0_3.0_1726393672043.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_41_start_exp_time_pipeline_en_5.5.0_3.0_1726393672043.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("stego_classifier_checkpoint_epoch_41_start_exp_time_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("stego_classifier_checkpoint_epoch_41_start_exp_time_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_41_start_exp_time_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-41-START_EXP_TIME + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-stego_classifier_checkpoint_epoch_51_start_exp_time_en.md b/docs/_posts/ahmedlone127/2024-09-15-stego_classifier_checkpoint_epoch_51_start_exp_time_en.md new file mode 100644 index 00000000000000..92b0b20185a5c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-stego_classifier_checkpoint_epoch_51_start_exp_time_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_51_start_exp_time DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_51_start_exp_time +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_51_start_exp_time` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_51_start_exp_time_en_5.5.0_3.0_1726405783186.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_51_start_exp_time_en_5.5.0_3.0_1726405783186.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_51_start_exp_time","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_51_start_exp_time", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_51_start_exp_time| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-51-START_EXP_TIME \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-stego_classifier_checkpoint_epoch_61_start_exp_time_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-stego_classifier_checkpoint_epoch_61_start_exp_time_pipeline_en.md new file mode 100644 index 00000000000000..d9b20a53c7f9f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-stego_classifier_checkpoint_epoch_61_start_exp_time_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_61_start_exp_time_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_61_start_exp_time_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_61_start_exp_time_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_61_start_exp_time_pipeline_en_5.5.0_3.0_1726385410407.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_61_start_exp_time_pipeline_en_5.5.0_3.0_1726385410407.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("stego_classifier_checkpoint_epoch_61_start_exp_time_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("stego_classifier_checkpoint_epoch_61_start_exp_time_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_61_start_exp_time_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-61-START_EXP_TIME + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-sun_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-sun_pipeline_en.md new file mode 100644 index 00000000000000..555c8b9bb4f82b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-sun_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sun_pipeline pipeline DistilBertForSequenceClassification from chebmarcel +author: John Snow Labs +name: sun_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sun_pipeline` is a English model originally trained by chebmarcel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sun_pipeline_en_5.5.0_3.0_1726406032178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sun_pipeline_en_5.5.0_3.0_1726406032178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sun_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sun_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
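+
+The `df` above is assumed to be an existing DataFrame. A hedged sketch of building one for this classification pipeline; the `text` input column and the `class` prediction column are assumptions based on the included DocumentAssembler and DistilBertForSequenceClassification stages:
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("sun_pipeline", lang = "en")
+pipeline.transform(df).select("class.result").show(truncate=False)
+```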
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sun_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chebmarcel/sun + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-test_rrrrrrrita_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-test_rrrrrrrita_pipeline_en.md new file mode 100644 index 00000000000000..620016743feed3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-test_rrrrrrrita_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_rrrrrrrita_pipeline pipeline DistilBertForSequenceClassification from Rrrrrrrita +author: John Snow Labs +name: test_rrrrrrrita_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_rrrrrrrita_pipeline` is a English model originally trained by Rrrrrrrita. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_rrrrrrrita_pipeline_en_5.5.0_3.0_1726366512741.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_rrrrrrrita_pipeline_en_5.5.0_3.0_1726366512741.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_rrrrrrrita_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_rrrrrrrita_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_rrrrrrrita_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Rrrrrrrita/test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-tiny_english_combined_v4_4_0_8_1e_06_silver_sweep_32_en.md b/docs/_posts/ahmedlone127/2024-09-15-tiny_english_combined_v4_4_0_8_1e_06_silver_sweep_32_en.md new file mode 100644 index 00000000000000..6f01466414b935 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-tiny_english_combined_v4_4_0_8_1e_06_silver_sweep_32_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English tiny_english_combined_v4_4_0_8_1e_06_silver_sweep_32 WhisperForCTC from saahith +author: John Snow Labs +name: tiny_english_combined_v4_4_0_8_1e_06_silver_sweep_32 +date: 2024-09-15 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_english_combined_v4_4_0_8_1e_06_silver_sweep_32` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_english_combined_v4_4_0_8_1e_06_silver_sweep_32_en_5.5.0_3.0_1726425306754.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_english_combined_v4_4_0_8_1e_06_silver_sweep_32_en_5.5.0_3.0_1726425306754.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# data is assumed to be a DataFrame with an "audio_content" column holding the audio samples
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("tiny_english_combined_v4_4_0_8_1e_06_silver_sweep_32","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// data is assumed to be a DataFrame with an "audio_content" column holding the audio samples
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("tiny_english_combined_v4_4_0_8_1e_06_silver_sweep_32", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
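
Each row of `pipelineDF` now carries the transcription as a Spark NLP annotation in the `text` column defined above; a minimal sketch of pulling out just the recognized strings:

```python
# Extract only the transcribed strings from the annotation structs
pipelineDF.select("text.result").show(truncate=False)
```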
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_english_combined_v4_4_0_8_1e_06_silver_sweep_32| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|394.8 MB| + +## References + +https://huggingface.co/saahith/tiny.en-combined_v4-4-0-8-1e-06-silver-sweep-32 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-visobert_vihds_en.md b/docs/_posts/ahmedlone127/2024-09-15-visobert_vihds_en.md new file mode 100644 index 00000000000000..baa8a39f019b1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-visobert_vihds_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English visobert_vihds XlmRoBertaForSequenceClassification from kietnt0603 +author: John Snow Labs +name: visobert_vihds +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`visobert_vihds` is a English model originally trained by kietnt0603. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/visobert_vihds_en_5.5.0_3.0_1726440474925.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/visobert_vihds_en_5.5.0_3.0_1726440474925.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("visobert_vihds","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("visobert_vihds", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
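
The predicted label ends up in the `class` annotation column set above; a quick way to inspect the input text next to its prediction:

```python
# Show the input text alongside the predicted label(s)
pipelineDF.select("text", "class.result").show(truncate=False)
```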
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|visobert_vihds| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|365.9 MB| + +## References + +https://huggingface.co/kietnt0603/visobert-vihds \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-w_f1_tiny_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-w_f1_tiny_pipeline_en.md new file mode 100644 index 00000000000000..c7163661d6fa50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-w_f1_tiny_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English w_f1_tiny_pipeline pipeline WhisperForCTC from bhattasp +author: John Snow Labs +name: w_f1_tiny_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`w_f1_tiny_pipeline` is a English model originally trained by bhattasp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/w_f1_tiny_pipeline_en_5.5.0_3.0_1726407222900.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/w_f1_tiny_pipeline_en_5.5.0_3.0_1726407222900.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with an "audio_content" column holding the audio samples
pipeline = PretrainedPipeline("w_f1_tiny_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with an "audio_content" column holding the audio samples
val pipeline = new PretrainedPipeline("w_f1_tiny_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|w_f1_tiny_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/bhattasp/w_f1_tiny + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_small_cantonese_funpang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_cantonese_funpang_pipeline_en.md new file mode 100644 index 00000000000000..2476552048075d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_cantonese_funpang_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_cantonese_funpang_pipeline pipeline WhisperForCTC from FunPang +author: John Snow Labs +name: whisper_small_cantonese_funpang_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_cantonese_funpang_pipeline` is a English model originally trained by FunPang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_cantonese_funpang_pipeline_en_5.5.0_3.0_1726411138087.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_cantonese_funpang_pipeline_en_5.5.0_3.0_1726411138087.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with an "audio_content" column holding the audio samples
pipeline = PretrainedPipeline("whisper_small_cantonese_funpang_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with an "audio_content" column holding the audio samples
val pipeline = new PretrainedPipeline("whisper_small_cantonese_funpang_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_cantonese_funpang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/FunPang/whisper-small-Cantonese + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_small_ga2en_v1_0_1_pipeline_ga.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_ga2en_v1_0_1_pipeline_ga.md new file mode 100644 index 00000000000000..3bfdb74892108d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_ga2en_v1_0_1_pipeline_ga.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Irish whisper_small_ga2en_v1_0_1_pipeline pipeline WhisperForCTC from ymoslem +author: John Snow Labs +name: whisper_small_ga2en_v1_0_1_pipeline +date: 2024-09-15 +tags: [ga, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ga +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_ga2en_v1_0_1_pipeline` is a Irish model originally trained by ymoslem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_ga2en_v1_0_1_pipeline_ga_5.5.0_3.0_1726432147893.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_ga2en_v1_0_1_pipeline_ga_5.5.0_3.0_1726432147893.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with an "audio_content" column holding the audio samples
pipeline = PretrainedPipeline("whisper_small_ga2en_v1_0_1_pipeline", lang = "ga")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with an "audio_content" column holding the audio samples
val pipeline = new PretrainedPipeline("whisper_small_ga2en_v1_0_1_pipeline", lang = "ga")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_ga2en_v1_0_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ga| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ymoslem/whisper-small-ga2en-v1.0.1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_small_northern_sami_pipeline_sv.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_northern_sami_pipeline_sv.md new file mode 100644 index 00000000000000..426953ece11613 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_northern_sami_pipeline_sv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Swedish whisper_small_northern_sami_pipeline pipeline WhisperForCTC from TeoJM +author: John Snow Labs +name: whisper_small_northern_sami_pipeline +date: 2024-09-15 +tags: [sv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_northern_sami_pipeline` is a Swedish model originally trained by TeoJM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_northern_sami_pipeline_sv_5.5.0_3.0_1726410543283.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_northern_sami_pipeline_sv_5.5.0_3.0_1726410543283.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with an "audio_content" column holding the audio samples
pipeline = PretrainedPipeline("whisper_small_northern_sami_pipeline", lang = "sv")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with an "audio_content" column holding the audio samples
val pipeline = new PretrainedPipeline("whisper_small_northern_sami_pipeline", lang = "sv")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_northern_sami_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/TeoJM/whisper-small-se + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_small_northern_sami_sv.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_northern_sami_sv.md new file mode 100644 index 00000000000000..00bbdaf5f2ac5f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_northern_sami_sv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Swedish whisper_small_northern_sami WhisperForCTC from TeoJM +author: John Snow Labs +name: whisper_small_northern_sami +date: 2024-09-15 +tags: [sv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_northern_sami` is a Swedish model originally trained by TeoJM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_northern_sami_sv_5.5.0_3.0_1726410459125.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_northern_sami_sv_5.5.0_3.0_1726410459125.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# data is assumed to be a DataFrame with an "audio_content" column holding the audio samples
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_northern_sami","sv") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// data is assumed to be a DataFrame with an "audio_content" column holding the audio samples
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_northern_sami", "sv")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_northern_sami| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|sv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/TeoJM/whisper-small-se \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_small_spanish_zuazo_es.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_spanish_zuazo_es.md new file mode 100644 index 00000000000000..a8bc2c95d36f2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_spanish_zuazo_es.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Castilian, Spanish whisper_small_spanish_zuazo WhisperForCTC from zuazo +author: John Snow Labs +name: whisper_small_spanish_zuazo +date: 2024-09-15 +tags: [es, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_spanish_zuazo` is a Castilian, Spanish model originally trained by zuazo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_spanish_zuazo_es_5.5.0_3.0_1726390289364.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_spanish_zuazo_es_5.5.0_3.0_1726390289364.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# data is assumed to be a DataFrame with an "audio_content" column holding the audio samples
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_spanish_zuazo","es") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// data is assumed to be a DataFrame with an "audio_content" column holding the audio samples
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_spanish_zuazo", "es")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_spanish_zuazo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|es| +|Size:|1.7 GB| + +## References + +https://huggingface.co/zuazo/whisper-small-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_small_vietnamese_joey234_pipeline_vi.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_vietnamese_joey234_pipeline_vi.md new file mode 100644 index 00000000000000..8a5c91f13eff49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_vietnamese_joey234_pipeline_vi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Vietnamese whisper_small_vietnamese_joey234_pipeline pipeline WhisperForCTC from joey234 +author: John Snow Labs +name: whisper_small_vietnamese_joey234_pipeline +date: 2024-09-15 +tags: [vi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: vi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_vietnamese_joey234_pipeline` is a Vietnamese model originally trained by joey234. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_vietnamese_joey234_pipeline_vi_5.5.0_3.0_1726420433791.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_vietnamese_joey234_pipeline_vi_5.5.0_3.0_1726420433791.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with an "audio_content" column holding the audio samples
pipeline = PretrainedPipeline("whisper_small_vietnamese_joey234_pipeline", lang = "vi")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with an "audio_content" column holding the audio samples
val pipeline = new PretrainedPipeline("whisper_small_vietnamese_joey234_pipeline", lang = "vi")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_vietnamese_joey234_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|vi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/joey234/whisper-small-vi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_child10k_adult6k_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_child10k_adult6k_pipeline_ko.md new file mode 100644 index 00000000000000..cd88bcab5b5b7c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_child10k_adult6k_pipeline_ko.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Korean whisper_tiny_child10k_adult6k_pipeline pipeline WhisperForCTC from haseong8012 +author: John Snow Labs +name: whisper_tiny_child10k_adult6k_pipeline +date: 2024-09-15 +tags: [ko, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_child10k_adult6k_pipeline` is a Korean model originally trained by haseong8012. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_child10k_adult6k_pipeline_ko_5.5.0_3.0_1726425725889.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_child10k_adult6k_pipeline_ko_5.5.0_3.0_1726425725889.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with an "audio_content" column holding the audio samples
pipeline = PretrainedPipeline("whisper_tiny_child10k_adult6k_pipeline", lang = "ko")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with an "audio_content" column holding the audio samples
val pipeline = new PretrainedPipeline("whisper_tiny_child10k_adult6k_pipeline", lang = "ko")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_child10k_adult6k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|390.1 MB| + +## References + +https://huggingface.co/haseong8012/whisper-tiny_child10k-adult6k + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_faroese_100h_5k_steps_v2_en.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_faroese_100h_5k_steps_v2_en.md new file mode 100644 index 00000000000000..a8b93d707422e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_faroese_100h_5k_steps_v2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_faroese_100h_5k_steps_v2 WhisperForCTC from davidilag +author: John Snow Labs +name: whisper_tiny_faroese_100h_5k_steps_v2 +date: 2024-09-15 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_faroese_100h_5k_steps_v2` is a English model originally trained by davidilag. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_faroese_100h_5k_steps_v2_en_5.5.0_3.0_1726412161207.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_faroese_100h_5k_steps_v2_en_5.5.0_3.0_1726412161207.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# data is assumed to be a DataFrame with an "audio_content" column holding the audio samples
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_faroese_100h_5k_steps_v2","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// data is assumed to be a DataFrame with an "audio_content" column holding the audio samples
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_faroese_100h_5k_steps_v2", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_faroese_100h_5k_steps_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/davidilag/whisper-tiny-fo-100h-5k-steps_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_faroese_100h_5k_steps_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_faroese_100h_5k_steps_v2_pipeline_en.md new file mode 100644 index 00000000000000..563ce6e699b879 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_faroese_100h_5k_steps_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_faroese_100h_5k_steps_v2_pipeline pipeline WhisperForCTC from davidilag +author: John Snow Labs +name: whisper_tiny_faroese_100h_5k_steps_v2_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_faroese_100h_5k_steps_v2_pipeline` is a English model originally trained by davidilag. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_faroese_100h_5k_steps_v2_pipeline_en_5.5.0_3.0_1726412181921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_faroese_100h_5k_steps_v2_pipeline_en_5.5.0_3.0_1726412181921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with an "audio_content" column holding the audio samples
pipeline = PretrainedPipeline("whisper_tiny_faroese_100h_5k_steps_v2_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with an "audio_content" column holding the audio samples
val pipeline = new PretrainedPipeline("whisper_tiny_faroese_100h_5k_steps_v2_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_faroese_100h_5k_steps_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/davidilag/whisper-tiny-fo-100h-5k-steps_v2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_hebrew_modern_2_mike249_pipeline_he.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_hebrew_modern_2_mike249_pipeline_he.md new file mode 100644 index 00000000000000..3e4036e09a11f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_hebrew_modern_2_mike249_pipeline_he.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hebrew whisper_tiny_hebrew_modern_2_mike249_pipeline pipeline WhisperForCTC from mike249 +author: John Snow Labs +name: whisper_tiny_hebrew_modern_2_mike249_pipeline +date: 2024-09-15 +tags: [he, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_hebrew_modern_2_mike249_pipeline` is a Hebrew model originally trained by mike249. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_hebrew_modern_2_mike249_pipeline_he_5.5.0_3.0_1726386999871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_hebrew_modern_2_mike249_pipeline_he_5.5.0_3.0_1726386999871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with an "audio_content" column holding the audio samples
pipeline = PretrainedPipeline("whisper_tiny_hebrew_modern_2_mike249_pipeline", lang = "he")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with an "audio_content" column holding the audio samples
val pipeline = new PretrainedPipeline("whisper_tiny_hebrew_modern_2_mike249_pipeline", lang = "he")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_hebrew_modern_2_mike249_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|he| +|Size:|390.1 MB| + +## References + +https://huggingface.co/mike249/whisper-tiny-he-2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-withinapps_ndd_claroline_test_content_tags_cwadj_en.md b/docs/_posts/ahmedlone127/2024-09-15-withinapps_ndd_claroline_test_content_tags_cwadj_en.md new file mode 100644 index 00000000000000..fce55c16341a9f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-withinapps_ndd_claroline_test_content_tags_cwadj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English withinapps_ndd_claroline_test_content_tags_cwadj DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_claroline_test_content_tags_cwadj +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_claroline_test_content_tags_cwadj` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_claroline_test_content_tags_cwadj_en_5.5.0_3.0_1726406164624.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_claroline_test_content_tags_cwadj_en_5.5.0_3.0_1726406164624.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_claroline_test_content_tags_cwadj","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_claroline_test_content_tags_cwadj", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
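
A minimal sketch of reading the prediction back out of the `class` annotation column defined above:

```python
# Input text next to the predicted label(s)
pipelineDF.selectExpr("text", "class.result as prediction").show(truncate=False)
```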
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_claroline_test_content_tags_cwadj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-claroline_test-content_tags-CWAdj \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_cobaxlmr_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_cobaxlmr_en.md new file mode 100644 index 00000000000000..1f45448bd058fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_cobaxlmr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_cobaxlmr XlmRoBertaForSequenceClassification from alyazharr +author: John Snow Labs +name: xlm_roberta_base_cobaxlmr +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_cobaxlmr` is a English model originally trained by alyazharr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_cobaxlmr_en_5.5.0_3.0_1726433906680.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_cobaxlmr_en_5.5.0_3.0_1726433906680.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_cobaxlmr","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_cobaxlmr", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_cobaxlmr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|827.0 MB| + +## References + +https://huggingface.co/alyazharr/xlm_roberta_base_cobaxlmr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_all_cicimen_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_all_cicimen_en.md new file mode 100644 index 00000000000000..ed5ae70b54a65b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_all_cicimen_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_cicimen XlmRoBertaForTokenClassification from cicimen +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_cicimen +date: 2024-09-15 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_cicimen` is a English model originally trained by cicimen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_cicimen_en_5.5.0_3.0_1726362566584.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_cicimen_en_5.5.0_3.0_1726362566584.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_cicimen","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_cicimen", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
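
For token-level predictions, the `token` and `ner` annotation columns defined above can be read side by side; a minimal sketch:

```python
# Token strings and their predicted NER tags, as parallel arrays per row
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```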
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_cicimen| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/cicimen/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_english_thkkvui_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_english_thkkvui_en.md new file mode 100644 index 00000000000000..0c6a1c7a470c03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_english_thkkvui_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_thkkvui XlmRoBertaForTokenClassification from thkkvui +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_thkkvui +date: 2024-09-15 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_thkkvui` is a English model originally trained by thkkvui. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_thkkvui_en_5.5.0_3.0_1726370740706.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_thkkvui_en_5.5.0_3.0_1726370740706.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_thkkvui","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_thkkvui", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_thkkvui| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/thkkvui/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_german_aidiary_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_german_aidiary_pipeline_en.md new file mode 100644 index 00000000000000..c1ac89165f260c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_german_aidiary_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_aidiary_pipeline pipeline XlmRoBertaForTokenClassification from aidiary +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_aidiary_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_aidiary_pipeline` is a English model originally trained by aidiary. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_aidiary_pipeline_en_5.5.0_3.0_1726362502384.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_aidiary_pipeline_en_5.5.0_3.0_1726362502384.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_aidiary_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_aidiary_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_aidiary_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/aidiary/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_german_rigsbyjt_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_german_rigsbyjt_en.md new file mode 100644 index 00000000000000..91a14607809328 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_german_rigsbyjt_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_rigsbyjt XlmRoBertaForTokenClassification from rigsbyjt +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_rigsbyjt +date: 2024-09-15 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_rigsbyjt` is a English model originally trained by rigsbyjt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_rigsbyjt_en_5.5.0_3.0_1726362911288.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_rigsbyjt_en_5.5.0_3.0_1726362911288.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_rigsbyjt","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_rigsbyjt", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_rigsbyjt| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/rigsbyjt/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_german_rigsbyjt_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_german_rigsbyjt_pipeline_en.md new file mode 100644 index 00000000000000..b0b5591f95b377 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_german_rigsbyjt_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_rigsbyjt_pipeline pipeline XlmRoBertaForTokenClassification from rigsbyjt +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_rigsbyjt_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_rigsbyjt_pipeline` is a English model originally trained by rigsbyjt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_rigsbyjt_pipeline_en_5.5.0_3.0_1726362974578.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_rigsbyjt_pipeline_en_5.5.0_3.0_1726362974578.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_rigsbyjt_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_rigsbyjt_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_rigsbyjt_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/rigsbyjt/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_german_tatsunori_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_german_tatsunori_pipeline_en.md new file mode 100644 index 00000000000000..a596a36832e2e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_german_tatsunori_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_tatsunori_pipeline pipeline XlmRoBertaForTokenClassification from tatsunori +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_tatsunori_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_tatsunori_pipeline` is a English model originally trained by tatsunori. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_tatsunori_pipeline_en_5.5.0_3.0_1726370327393.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_tatsunori_pipeline_en_5.5.0_3.0_1726370327393.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_tatsunori_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_tatsunori_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_tatsunori_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/tatsunori/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_trimmed_arabic_15000_xnli_arabic_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_trimmed_arabic_15000_xnli_arabic_en.md new file mode 100644 index 00000000000000..76ff9d2331722f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_trimmed_arabic_15000_xnli_arabic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_arabic_15000_xnli_arabic XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_arabic_15000_xnli_arabic +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_arabic_15000_xnli_arabic` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_arabic_15000_xnli_arabic_en_5.5.0_3.0_1726373030353.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_arabic_15000_xnli_arabic_en_5.5.0_3.0_1726373030353.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="h3-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_arabic_15000_xnli_arabic","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_arabic_15000_xnli_arabic", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
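
As a rough sketch, the predicted label for each row can be pulled out of the `class` annotation column created above; the label names themselves depend on the fine-tuning data and are not listed on this page.

```python
from pyspark.sql import functions as F

# "class.result" is an array holding the predicted label for each document.
pipelineDF.select("text", F.col("class.result")[0].alias("prediction")) \
    .show(truncate=False)
```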
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_arabic_15000_xnli_arabic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|364.9 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-ar-15000-xnli-ar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_trimmed_arabic_xnli_arabic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_trimmed_arabic_xnli_arabic_pipeline_en.md new file mode 100644 index 00000000000000..39a574c8f10a6a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_trimmed_arabic_xnli_arabic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_arabic_xnli_arabic_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_arabic_xnli_arabic_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_arabic_xnli_arabic_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_arabic_xnli_arabic_pipeline_en_5.5.0_3.0_1726442040510.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_arabic_xnli_arabic_pipeline_en_5.5.0_3.0_1726442040510.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="h3-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_trimmed_arabic_xnli_arabic_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_trimmed_arabic_xnli_arabic_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_arabic_xnli_arabic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.4 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-ar-xnli-ar + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_trimmed_german_60000_tweet_sentiment_german_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_trimmed_german_60000_tweet_sentiment_german_en.md new file mode 100644 index 00000000000000..88473d0ffd4364 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_trimmed_german_60000_tweet_sentiment_german_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_german_60000_tweet_sentiment_german XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_german_60000_tweet_sentiment_german +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_german_60000_tweet_sentiment_german` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_german_60000_tweet_sentiment_german_en_5.5.0_3.0_1726373060290.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_german_60000_tweet_sentiment_german_en_5.5.0_3.0_1726373060290.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="h3-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_german_60000_tweet_sentiment_german","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_german_60000_tweet_sentiment_german", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_german_60000_tweet_sentiment_german| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|444.5 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-de-60000-tweet-sentiment-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-200mil_codeberta_small_v1_en.md b/docs/_posts/ahmedlone127/2024-09-16-200mil_codeberta_small_v1_en.md new file mode 100644 index 00000000000000..4c1ea36931b144 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-200mil_codeberta_small_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 200mil_codeberta_small_v1 RoBertaForSequenceClassification from G-WOO +author: John Snow Labs +name: 200mil_codeberta_small_v1 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`200mil_codeberta_small_v1` is a English model originally trained by G-WOO. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/200mil_codeberta_small_v1_en_5.5.0_3.0_1726504699805.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/200mil_codeberta_small_v1_en_5.5.0_3.0_1726504699805.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="h3-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("200mil_codeberta_small_v1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("200mil_codeberta_small_v1", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|200mil_codeberta_small_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|314.0 MB| + +## References + +https://huggingface.co/G-WOO/200mil-CodeBERTa-small-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-2504separado1_en.md b/docs/_posts/ahmedlone127/2024-09-16-2504separado1_en.md new file mode 100644 index 00000000000000..59c9729df4ccaa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-2504separado1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2504separado1 RoBertaForSequenceClassification from adriansanz +author: John Snow Labs +name: 2504separado1 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2504separado1` is a English model originally trained by adriansanz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2504separado1_en_5.5.0_3.0_1726455828348.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2504separado1_en_5.5.0_3.0_1726455828348.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="h3-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("2504separado1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("2504separado1", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2504separado1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|448.4 MB| + +## References + +https://huggingface.co/adriansanz/2504separado1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-2504separado1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-2504separado1_pipeline_en.md new file mode 100644 index 00000000000000..56ec4604d90dc9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-2504separado1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2504separado1_pipeline pipeline RoBertaForSequenceClassification from adriansanz +author: John Snow Labs +name: 2504separado1_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2504separado1_pipeline` is a English model originally trained by adriansanz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2504separado1_pipeline_en_5.5.0_3.0_1726455856455.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2504separado1_pipeline_en_5.5.0_3.0_1726455856455.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="h3-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("2504separado1_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("2504separado1_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2504separado1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|448.4 MB| + +## References + +https://huggingface.co/adriansanz/2504separado1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-2nddeproberta_en.md b/docs/_posts/ahmedlone127/2024-09-16-2nddeproberta_en.md new file mode 100644 index 00000000000000..f5ad7da51b0aa0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-2nddeproberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2nddeproberta RoBertaForSequenceClassification from ericNguyen0132 +author: John Snow Labs +name: 2nddeproberta +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2nddeproberta` is a English model originally trained by ericNguyen0132. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2nddeproberta_en_5.5.0_3.0_1726470936268.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2nddeproberta_en_5.5.0_3.0_1726470936268.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="h3-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("2nddeproberta","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("2nddeproberta", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2nddeproberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ericNguyen0132/2ndDepRoBERTa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-2nddeproberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-2nddeproberta_pipeline_en.md new file mode 100644 index 00000000000000..3c8b44bedc154f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-2nddeproberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2nddeproberta_pipeline pipeline RoBertaForSequenceClassification from ericNguyen0132 +author: John Snow Labs +name: 2nddeproberta_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2nddeproberta_pipeline` is a English model originally trained by ericNguyen0132. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2nddeproberta_pipeline_en_5.5.0_3.0_1726470999355.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2nddeproberta_pipeline_en_5.5.0_3.0_1726470999355.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="h3-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("2nddeproberta_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("2nddeproberta_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2nddeproberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ericNguyen0132/2ndDepRoBERTa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-440fc6ae_75a5_4a7e_a238_65e06e620a59_en.md b/docs/_posts/ahmedlone127/2024-09-16-440fc6ae_75a5_4a7e_a238_65e06e620a59_en.md new file mode 100644 index 00000000000000..47d14d6b458521 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-440fc6ae_75a5_4a7e_a238_65e06e620a59_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 440fc6ae_75a5_4a7e_a238_65e06e620a59 RoBertaForSequenceClassification from IDQO +author: John Snow Labs +name: 440fc6ae_75a5_4a7e_a238_65e06e620a59 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`440fc6ae_75a5_4a7e_a238_65e06e620a59` is a English model originally trained by IDQO. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/440fc6ae_75a5_4a7e_a238_65e06e620a59_en_5.5.0_3.0_1726527948657.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/440fc6ae_75a5_4a7e_a238_65e06e620a59_en_5.5.0_3.0_1726527948657.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="h3-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("440fc6ae_75a5_4a7e_a238_65e06e620a59","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("440fc6ae_75a5_4a7e_a238_65e06e620a59", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|440fc6ae_75a5_4a7e_a238_65e06e620a59| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|437.9 MB| + +## References + +https://huggingface.co/IDQO/440fc6ae-75a5-4a7e-a238-65e06e620a59 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-afro_xlmr_base_hausa_2e_5_en.md b/docs/_posts/ahmedlone127/2024-09-16-afro_xlmr_base_hausa_2e_5_en.md new file mode 100644 index 00000000000000..b0965a9ee8e323 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-afro_xlmr_base_hausa_2e_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English afro_xlmr_base_hausa_2e_5 XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afro_xlmr_base_hausa_2e_5 +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`afro_xlmr_base_hausa_2e_5` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_hausa_2e_5_en_5.5.0_3.0_1726498048437.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_hausa_2e_5_en_5.5.0_3.0_1726498048437.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="h3-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afro_xlmr_base_hausa_2e_5","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afro_xlmr_base_hausa_2e_5", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
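
If the model emits IOB-style tags (typical for token-classification NER, though not confirmed on this page), a `NerConverter` stage can be appended to the pipeline above to group tags into entity chunks — a sketch only:

```python
from sparknlp.annotator import NerConverter

# Assumes IOB/IOB2 tags in the "ner" column produced by tokenClassifier above.
nerConverter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("entities")

extendedPipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, nerConverter])
extendedPipeline.fit(data).transform(data).selectExpr("entities.result").show(truncate=False)
```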
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afro_xlmr_base_hausa_2e_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/afro-xlmr-base-hausa-2e-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-ag_news_bert_classification_en.md b/docs/_posts/ahmedlone127/2024-09-16-ag_news_bert_classification_en.md new file mode 100644 index 00000000000000..9259bbcef6c0d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-ag_news_bert_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ag_news_bert_classification BertForSequenceClassification from mansoorhamidzadeh +author: John Snow Labs +name: ag_news_bert_classification +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ag_news_bert_classification` is a English model originally trained by mansoorhamidzadeh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ag_news_bert_classification_en_5.5.0_3.0_1726462574545.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ag_news_bert_classification_en_5.5.0_3.0_1726462574545.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="h3-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("ag_news_bert_classification","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("ag_news_bert_classification", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ag_news_bert_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/mansoorhamidzadeh/ag-news-bert-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-ag_news_classification_distillbert_en.md b/docs/_posts/ahmedlone127/2024-09-16-ag_news_classification_distillbert_en.md new file mode 100644 index 00000000000000..5f45202e5ddb5a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-ag_news_classification_distillbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ag_news_classification_distillbert DistilBertForSequenceClassification from cornelliusyudhawijaya +author: John Snow Labs +name: ag_news_classification_distillbert +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ag_news_classification_distillbert` is a English model originally trained by cornelliusyudhawijaya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ag_news_classification_distillbert_en_5.5.0_3.0_1726506389298.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ag_news_classification_distillbert_en_5.5.0_3.0_1726506389298.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="h3-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("ag_news_classification_distillbert","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("ag_news_classification_distillbert", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ag_news_classification_distillbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cornelliusyudhawijaya/AG_News_Classification_DistillBert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-ai_text_detector_en.md b/docs/_posts/ahmedlone127/2024-09-16-ai_text_detector_en.md new file mode 100644 index 00000000000000..28b1659e99e1d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-ai_text_detector_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ai_text_detector BertForSequenceClassification from yongchao +author: John Snow Labs +name: ai_text_detector +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ai_text_detector` is a English model originally trained by yongchao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ai_text_detector_en_5.5.0_3.0_1726493332482.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ai_text_detector_en_5.5.0_3.0_1726493332482.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="h3-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("ai_text_detector","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("ai_text_detector", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
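
For a detector like this it is often useful to inspect the per-label scores as well; a minimal sketch, assuming the scores are exposed in the annotation metadata, as Spark NLP classifiers normally do:

```python
# Each "class" annotation carries a result (the label) and a metadata map of scores.
pipelineDF.selectExpr("explode(`class`) as ann") \
    .selectExpr("ann.result as label", "ann.metadata as scores") \
    .show(truncate=False)
```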
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ai_text_detector| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yongchao/ai_text_detector \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-albert_finetuned_tweet_eval_en.md b/docs/_posts/ahmedlone127/2024-09-16-albert_finetuned_tweet_eval_en.md new file mode 100644 index 00000000000000..39243970fb976c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-albert_finetuned_tweet_eval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English albert_finetuned_tweet_eval AlbertForSequenceClassification from iaminhridoy +author: John Snow Labs +name: albert_finetuned_tweet_eval +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, albert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_finetuned_tweet_eval` is a English model originally trained by iaminhridoy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_finetuned_tweet_eval_en_5.5.0_3.0_1726523514014.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_finetuned_tweet_eval_en_5.5.0_3.0_1726523514014.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="h3-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = AlbertForSequenceClassification.pretrained("albert_finetuned_tweet_eval","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = AlbertForSequenceClassification.pretrained("albert_finetuned_tweet_eval", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_finetuned_tweet_eval| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/iaminhridoy/AlBert-finetuned-Tweet_Eval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-albert_finetuned_tweet_eval_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-albert_finetuned_tweet_eval_pipeline_en.md new file mode 100644 index 00000000000000..bcce8ff176c55b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-albert_finetuned_tweet_eval_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English albert_finetuned_tweet_eval_pipeline pipeline AlbertForSequenceClassification from iaminhridoy +author: John Snow Labs +name: albert_finetuned_tweet_eval_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_finetuned_tweet_eval_pipeline` is a English model originally trained by iaminhridoy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_finetuned_tweet_eval_pipeline_en_5.5.0_3.0_1726523516421.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_finetuned_tweet_eval_pipeline_en_5.5.0_3.0_1726523516421.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="h3-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("albert_finetuned_tweet_eval_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("albert_finetuned_tweet_eval_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_finetuned_tweet_eval_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/iaminhridoy/AlBert-finetuned-Tweet_Eval + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-albert_sentiment_analysis_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-albert_sentiment_analysis_pipeline_en.md new file mode 100644 index 00000000000000..d1261823b9fefe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-albert_sentiment_analysis_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English albert_sentiment_analysis_pipeline pipeline AlbertForSequenceClassification from maherh +author: John Snow Labs +name: albert_sentiment_analysis_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_sentiment_analysis_pipeline` is a English model originally trained by maherh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_sentiment_analysis_pipeline_en_5.5.0_3.0_1726523544056.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_sentiment_analysis_pipeline_en_5.5.0_3.0_1726523544056.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="h3-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("albert_sentiment_analysis_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("albert_sentiment_analysis_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_sentiment_analysis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/maherh/albert_sentiment_analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-all_roberta_large_v1_travel_7_16_5_en.md b/docs/_posts/ahmedlone127/2024-09-16-all_roberta_large_v1_travel_7_16_5_en.md new file mode 100644 index 00000000000000..eae3bc51f317ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-all_roberta_large_v1_travel_7_16_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_travel_7_16_5 RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_travel_7_16_5 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_travel_7_16_5` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_travel_7_16_5_en_5.5.0_3.0_1726456078562.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_travel_7_16_5_en_5.5.0_3.0_1726456078562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="h3-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_travel_7_16_5","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_travel_7_16_5", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_travel_7_16_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-travel-7-16-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-all_roberta_large_v1_utility_4_16_5_oos_en.md b/docs/_posts/ahmedlone127/2024-09-16-all_roberta_large_v1_utility_4_16_5_oos_en.md new file mode 100644 index 00000000000000..7a8c8ece0a97a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-all_roberta_large_v1_utility_4_16_5_oos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_utility_4_16_5_oos RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_utility_4_16_5_oos +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_utility_4_16_5_oos` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_utility_4_16_5_oos_en_5.5.0_3.0_1726518405918.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_utility_4_16_5_oos_en_5.5.0_3.0_1726518405918.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="h3-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_utility_4_16_5_oos","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_utility_4_16_5_oos", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_utility_4_16_5_oos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-utility-4-16-5-oos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-alvaro_marian_finetuned_italian_turkish_en.md b/docs/_posts/ahmedlone127/2024-09-16-alvaro_marian_finetuned_italian_turkish_en.md new file mode 100644 index 00000000000000..b5af13dd696725 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-alvaro_marian_finetuned_italian_turkish_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English alvaro_marian_finetuned_italian_turkish MarianTransformer from Rooshan +author: John Snow Labs +name: alvaro_marian_finetuned_italian_turkish +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`alvaro_marian_finetuned_italian_turkish` is a English model originally trained by Rooshan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/alvaro_marian_finetuned_italian_turkish_en_5.5.0_3.0_1726503358878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/alvaro_marian_finetuned_italian_turkish_en_5.5.0_3.0_1726503358878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="h3-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("alvaro_marian_finetuned_italian_turkish","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("alvaro_marian_finetuned_italian_turkish","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
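
For quick checks on individual strings, a `LightPipeline` can wrap the fitted model; a sketch, assuming the source language is Italian as the model name suggests (the example sentence is illustrative only):

```python
from sparknlp.base import LightPipeline

# annotate() returns a dict keyed by output column names, including "translation".
light = LightPipeline(pipelineModel)
print(light.annotate("Questo è un esempio.")["translation"])
```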
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|alvaro_marian_finetuned_italian_turkish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Rooshan/Alvaro-marian_finetuned_it_tr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-autotrain_tais_roberta_53328125642_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-autotrain_tais_roberta_53328125642_pipeline_en.md new file mode 100644 index 00000000000000..c39eb82ccab750 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-autotrain_tais_roberta_53328125642_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autotrain_tais_roberta_53328125642_pipeline pipeline RoBertaForSequenceClassification from manasviiiiiiiiiiiiiiiiiiiiiiiiii +author: John Snow Labs +name: autotrain_tais_roberta_53328125642_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_tais_roberta_53328125642_pipeline` is a English model originally trained by manasviiiiiiiiiiiiiiiiiiiiiiiiii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_tais_roberta_53328125642_pipeline_en_5.5.0_3.0_1726471499499.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_tais_roberta_53328125642_pipeline_en_5.5.0_3.0_1726471499499.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("autotrain_tais_roberta_53328125642_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("autotrain_tais_roberta_53328125642_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
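+
+For quick, ad-hoc checks you can also call the pipeline's `annotate` method on a plain string instead of building a DataFrame. A minimal sketch, assuming the `pipeline` object from the Python example above; the exact keys in the returned dictionary depend on the output columns stored in the pipeline:
+
+```python
+# annotate() returns a dict mapping each output column to its annotation results; sketch
+annotations = pipeline.annotate("I love spark-nlp")
+print(annotations)
+```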
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_tais_roberta_53328125642_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|423.2 MB| + +## References + +https://huggingface.co/manasviiiiiiiiiiiiiiiiiiiiiiiiii/autotrain-tais-roberta-53328125642 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-baby_cothought_en.md b/docs/_posts/ahmedlone127/2024-09-16-baby_cothought_en.md new file mode 100644 index 00000000000000..b2e90ea9b4a377 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-baby_cothought_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English baby_cothought RoBertaEmbeddings from yaanhaan +author: John Snow Labs +name: baby_cothought +date: 2024-09-16 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`baby_cothought` is a English model originally trained by yaanhaan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/baby_cothought_en_5.5.0_3.0_1726513878475.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/baby_cothought_en_5.5.0_3.0_1726513878475.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("baby_cothought","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("baby_cothought","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
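+
+To inspect the token vectors themselves, each annotation in the `embeddings` column carries the token text in its `result` field and the vector in its `embeddings` field. A minimal sketch, assuming the `pipelineDF` DataFrame from the Python example above:
+
+```python
+# One row per token with its embedding vector; sketch, assumes pipelineDF from above
+pipelineDF.selectExpr("explode(embeddings) as emb") \
+    .selectExpr("emb.result as token", "emb.embeddings as vector") \
+    .show(truncate=60)
+```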
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|baby_cothought| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|470.9 MB| + +## References + +https://huggingface.co/yaanhaan/Baby-CoThought \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-babyberta_french1_25m_masking_finetuned_qamr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-babyberta_french1_25m_masking_finetuned_qamr_pipeline_en.md new file mode 100644 index 00000000000000..9b14520614209e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-babyberta_french1_25m_masking_finetuned_qamr_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English babyberta_french1_25m_masking_finetuned_qamr_pipeline pipeline RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_french1_25m_masking_finetuned_qamr_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_french1_25m_masking_finetuned_qamr_pipeline` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_french1_25m_masking_finetuned_qamr_pipeline_en_5.5.0_3.0_1726460599101.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_french1_25m_masking_finetuned_qamr_pipeline_en_5.5.0_3.0_1726460599101.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("babyberta_french1_25m_masking_finetuned_qamr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("babyberta_french1_25m_masking_finetuned_qamr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_french1_25m_masking_finetuned_qamr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|31.8 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-french1.25M-Masking-finetuned-qamr + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bae_roberta_base_mrpc_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-bae_roberta_base_mrpc_5_pipeline_en.md new file mode 100644 index 00000000000000..4a756782dbb78b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bae_roberta_base_mrpc_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bae_roberta_base_mrpc_5_pipeline pipeline RoBertaForSequenceClassification from korca +author: John Snow Labs +name: bae_roberta_base_mrpc_5_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bae_roberta_base_mrpc_5_pipeline` is a English model originally trained by korca. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bae_roberta_base_mrpc_5_pipeline_en_5.5.0_3.0_1726456428148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bae_roberta_base_mrpc_5_pipeline_en_5.5.0_3.0_1726456428148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bae_roberta_base_mrpc_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bae_roberta_base_mrpc_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bae_roberta_base_mrpc_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|445.1 MB| + +## References + +https://huggingface.co/korca/bae-roberta-base-mrpc-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bankstatementmodelver7_en.md b/docs/_posts/ahmedlone127/2024-09-16-bankstatementmodelver7_en.md new file mode 100644 index 00000000000000..da84ae506f5d94 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bankstatementmodelver7_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bankstatementmodelver7 RoBertaForQuestionAnswering from Souvik123 +author: John Snow Labs +name: bankstatementmodelver7 +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bankstatementmodelver7` is a English model originally trained by Souvik123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bankstatementmodelver7_en_5.5.0_3.0_1726460637592.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bankstatementmodelver7_en_5.5.0_3.0_1726460637592.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("bankstatementmodelver7","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.MultiDocumentAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("bankstatementmodelver7", "en")
+    .setInputCols(Array("document_question", "document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
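+
+A minimal sketch for reading the extracted answer span out of the result, assuming the `pipelineDF` DataFrame and the `answer` output column from the Python example above:
+
+```python
+# Show the question next to the predicted answer text; sketch, assumes pipelineDF from above
+pipelineDF.select("question", "answer.result").show(truncate=False)
+```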
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bankstatementmodelver7| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|463.7 MB| + +## References + +https://huggingface.co/Souvik123/bankstatementmodelver7 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bert_base_tweet_topic_classification_en.md b/docs/_posts/ahmedlone127/2024-09-16-bert_base_tweet_topic_classification_en.md new file mode 100644 index 00000000000000..8c7c9b9c06bb5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bert_base_tweet_topic_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_tweet_topic_classification BertForSequenceClassification from GeeDino +author: John Snow Labs +name: bert_base_tweet_topic_classification +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_tweet_topic_classification` is a English model originally trained by GeeDino. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_tweet_topic_classification_en_5.5.0_3.0_1726499150857.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_tweet_topic_classification_en_5.5.0_3.0_1726499150857.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_tweet_topic_classification","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_tweet_topic_classification", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_tweet_topic_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|627.8 MB| + +## References + +https://huggingface.co/GeeDino/bert-base-tweet-topic-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bert_base_uncased_squadv2_en.md b/docs/_posts/ahmedlone127/2024-09-16-bert_base_uncased_squadv2_en.md new file mode 100644 index 00000000000000..604bb56f34f3f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bert_base_uncased_squadv2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_squadv2 BertForQuestionAnswering from Pennywise881 +author: John Snow Labs +name: bert_base_uncased_squadv2 +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_squadv2` is a English model originally trained by Pennywise881. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_squadv2_en_5.5.0_3.0_1726511080685.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_squadv2_en_5.5.0_3.0_1726511080685.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_squadv2","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_squadv2", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_squadv2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Pennywise881/bert-base-uncased-squadv2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bert_finetuned_ner_haluptzok_en.md b/docs/_posts/ahmedlone127/2024-09-16-bert_finetuned_ner_haluptzok_en.md new file mode 100644 index 00000000000000..712be0bf4991ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bert_finetuned_ner_haluptzok_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_ner_haluptzok BertForTokenClassification from haluptzok +author: John Snow Labs +name: bert_finetuned_ner_haluptzok +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_haluptzok` is a English model originally trained by haluptzok. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_haluptzok_en_5.5.0_3.0_1726461153441.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_haluptzok_en_5.5.0_3.0_1726461153441.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_haluptzok","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_haluptzok", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
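+
+To see which tag was assigned to each token, you can select the `result` fields of the `token` and `ner` columns side by side. A minimal sketch, assuming the `pipelineDF` DataFrame from the Python example above:
+
+```python
+# Tokens and their predicted NER labels, aligned by position; sketch, assumes pipelineDF from above
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```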
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_haluptzok| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/haluptzok/bert-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-boolq_paws_en1000_en.md b/docs/_posts/ahmedlone127/2024-09-16-boolq_paws_en1000_en.md new file mode 100644 index 00000000000000..ce172da206bf70 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-boolq_paws_en1000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English boolq_paws_en1000 RoBertaForSequenceClassification from yeyejmm +author: John Snow Labs +name: boolq_paws_en1000 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`boolq_paws_en1000` is a English model originally trained by yeyejmm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/boolq_paws_en1000_en_5.5.0_3.0_1726527808855.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/boolq_paws_en1000_en_5.5.0_3.0_1726527808855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("boolq_paws_en1000","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("boolq_paws_en1000", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|boolq_paws_en1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|460.1 MB| + +## References + +https://huggingface.co/yeyejmm/BoolQ-PAWS-en1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_aanwar_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_aanwar_pipeline_en.md new file mode 100644 index 00000000000000..58258b16ca1bec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_aanwar_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_aanwar_pipeline pipeline DistilBertForQuestionAnswering from aanwar +author: John Snow Labs +name: burmese_awesome_qa_model_aanwar_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_aanwar_pipeline` is a English model originally trained by aanwar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_aanwar_pipeline_en_5.5.0_3.0_1726515449379.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_aanwar_pipeline_en_5.5.0_3.0_1726515449379.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_aanwar_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_aanwar_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_aanwar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/aanwar/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_bibibobo777_en.md b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_bibibobo777_en.md new file mode 100644 index 00000000000000..e4305a2a2a0d46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_bibibobo777_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_bibibobo777 DistilBertForQuestionAnswering from bibibobo777 +author: John Snow Labs +name: burmese_awesome_qa_model_bibibobo777 +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_bibibobo777` is a English model originally trained by bibibobo777. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_bibibobo777_en_5.5.0_3.0_1726469593410.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_bibibobo777_en_5.5.0_3.0_1726469593410.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_bibibobo777","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_bibibobo777", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_bibibobo777| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/bibibobo777/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_bibibobo777_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_bibibobo777_pipeline_en.md new file mode 100644 index 00000000000000..ddc075f4b47cc0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_bibibobo777_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_bibibobo777_pipeline pipeline DistilBertForQuestionAnswering from bibibobo777 +author: John Snow Labs +name: burmese_awesome_qa_model_bibibobo777_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_bibibobo777_pipeline` is a English model originally trained by bibibobo777. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_bibibobo777_pipeline_en_5.5.0_3.0_1726469604881.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_bibibobo777_pipeline_en_5.5.0_3.0_1726469604881.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_bibibobo777_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_bibibobo777_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_bibibobo777_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/bibibobo777/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_reyeb_en.md b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_reyeb_en.md new file mode 100644 index 00000000000000..545df16af250db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_reyeb_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_reyeb DistilBertForQuestionAnswering from reyeb +author: John Snow Labs +name: burmese_awesome_qa_model_reyeb +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_reyeb` is a English model originally trained by reyeb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_reyeb_en_5.5.0_3.0_1726515099889.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_reyeb_en_5.5.0_3.0_1726515099889.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_reyeb","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_reyeb", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_reyeb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/reyeb/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-burmese_finetuned_financenews_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-burmese_finetuned_financenews_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..c4205171159b36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-burmese_finetuned_financenews_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_finetuned_financenews_distilbert_pipeline pipeline DistilBertForSequenceClassification from zijay +author: John Snow Labs +name: burmese_finetuned_financenews_distilbert_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_finetuned_financenews_distilbert_pipeline` is a English model originally trained by zijay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_finetuned_financenews_distilbert_pipeline_en_5.5.0_3.0_1726506866547.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_finetuned_financenews_distilbert_pipeline_en_5.5.0_3.0_1726506866547.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_finetuned_financenews_distilbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_finetuned_financenews_distilbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_finetuned_financenews_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/zijay/my-finetuned-FinanceNews-distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-classification_2_messing_around_en.md b/docs/_posts/ahmedlone127/2024-09-16-classification_2_messing_around_en.md new file mode 100644 index 00000000000000..0adb01171d5c42 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-classification_2_messing_around_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English classification_2_messing_around DistilBertForSequenceClassification from Pranavsenthilvel +author: John Snow Labs +name: classification_2_messing_around +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classification_2_messing_around` is a English model originally trained by Pranavsenthilvel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classification_2_messing_around_en_5.5.0_3.0_1726506800536.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classification_2_messing_around_en_5.5.0_3.0_1726506800536.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("classification_2_messing_around","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("classification_2_messing_around", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classification_2_messing_around| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Pranavsenthilvel/classification-2-messing-around \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-code_search_codebert_base_4_random_trimmed_with_g_and_spaces_en.md b/docs/_posts/ahmedlone127/2024-09-16-code_search_codebert_base_4_random_trimmed_with_g_and_spaces_en.md new file mode 100644 index 00000000000000..c9062672fc710f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-code_search_codebert_base_4_random_trimmed_with_g_and_spaces_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English code_search_codebert_base_4_random_trimmed_with_g_and_spaces RoBertaForTokenClassification from DianaIulia +author: John Snow Labs +name: code_search_codebert_base_4_random_trimmed_with_g_and_spaces +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`code_search_codebert_base_4_random_trimmed_with_g_and_spaces` is a English model originally trained by DianaIulia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_4_random_trimmed_with_g_and_spaces_en_5.5.0_3.0_1726508968474.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_4_random_trimmed_with_g_and_spaces_en_5.5.0_3.0_1726508968474.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("code_search_codebert_base_4_random_trimmed_with_g_and_spaces","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("code_search_codebert_base_4_random_trimmed_with_g_and_spaces", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|code_search_codebert_base_4_random_trimmed_with_g_and_spaces| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DianaIulia/code_search_codebert_base_4_random_trimmed_with_g_and_spaces \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-consumerresponseclassifier_en.md b/docs/_posts/ahmedlone127/2024-09-16-consumerresponseclassifier_en.md new file mode 100644 index 00000000000000..b7fb72798ae52c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-consumerresponseclassifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English consumerresponseclassifier RoBertaForSequenceClassification from ahaanlimaye +author: John Snow Labs +name: consumerresponseclassifier +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`consumerresponseclassifier` is a English model originally trained by ahaanlimaye. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/consumerresponseclassifier_en_5.5.0_3.0_1726504833456.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/consumerresponseclassifier_en_5.5.0_3.0_1726504833456.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("consumerresponseclassifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("consumerresponseclassifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|consumerresponseclassifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|424.9 MB| + +## References + +https://huggingface.co/ahaanlimaye/ConsumerResponseClassifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-consumerresponseclassifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-consumerresponseclassifier_pipeline_en.md new file mode 100644 index 00000000000000..aca746daee9e67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-consumerresponseclassifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English consumerresponseclassifier_pipeline pipeline RoBertaForSequenceClassification from ahaanlimaye +author: John Snow Labs +name: consumerresponseclassifier_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`consumerresponseclassifier_pipeline` is a English model originally trained by ahaanlimaye. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/consumerresponseclassifier_pipeline_en_5.5.0_3.0_1726504871544.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/consumerresponseclassifier_pipeline_en_5.5.0_3.0_1726504871544.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("consumerresponseclassifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("consumerresponseclassifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|consumerresponseclassifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|424.9 MB| + +## References + +https://huggingface.co/ahaanlimaye/ConsumerResponseClassifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-discriminative_detection_binary2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-discriminative_detection_binary2_pipeline_en.md new file mode 100644 index 00000000000000..dcb90f038be0c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-discriminative_detection_binary2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English discriminative_detection_binary2_pipeline pipeline RoBertaForSequenceClassification from fatmhd1995 +author: John Snow Labs +name: discriminative_detection_binary2_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`discriminative_detection_binary2_pipeline` is a English model originally trained by fatmhd1995. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/discriminative_detection_binary2_pipeline_en_5.5.0_3.0_1726471559206.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/discriminative_detection_binary2_pipeline_en_5.5.0_3.0_1726471559206.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("discriminative_detection_binary2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("discriminative_detection_binary2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|discriminative_detection_binary2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|460.2 MB| + +## References + +https://huggingface.co/fatmhd1995/discriminative-detection-binary2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distibert_finetuned_arxiv_multi_label_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distibert_finetuned_arxiv_multi_label_pipeline_en.md new file mode 100644 index 00000000000000..cf98189691223d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distibert_finetuned_arxiv_multi_label_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distibert_finetuned_arxiv_multi_label_pipeline pipeline DistilBertForSequenceClassification from Hatoun +author: John Snow Labs +name: distibert_finetuned_arxiv_multi_label_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distibert_finetuned_arxiv_multi_label_pipeline` is a English model originally trained by Hatoun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distibert_finetuned_arxiv_multi_label_pipeline_en_5.5.0_3.0_1726506901196.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distibert_finetuned_arxiv_multi_label_pipeline_en_5.5.0_3.0_1726506901196.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distibert_finetuned_arxiv_multi_label_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distibert_finetuned_arxiv_multi_label_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distibert_finetuned_arxiv_multi_label_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/Hatoun/DistiBERT-finetuned-arxiv-multi-label + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distil_bert_ft_qa_model_7up_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distil_bert_ft_qa_model_7up_pipeline_en.md new file mode 100644 index 00000000000000..36ac42a3e66c78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distil_bert_ft_qa_model_7up_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distil_bert_ft_qa_model_7up_pipeline pipeline DistilBertForQuestionAnswering from cadzchua +author: John Snow Labs +name: distil_bert_ft_qa_model_7up_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distil_bert_ft_qa_model_7up_pipeline` is a English model originally trained by cadzchua. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_bert_ft_qa_model_7up_pipeline_en_5.5.0_3.0_1726515468646.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_bert_ft_qa_model_7up_pipeline_en_5.5.0_3.0_1726515468646.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Question/context DataFrame; the column names are assumed to match the pipeline's MultiDocumentAssembler
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipeline = PretrainedPipeline("distil_bert_ft_qa_model_7up_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Question/context DataFrame; the column names are assumed to match the pipeline's MultiDocumentAssembler
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipeline = new PretrainedPipeline("distil_bert_ft_qa_model_7up_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distil_bert_ft_qa_model_7up_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/cadzchua/distil-bert-ft-qa-model-7up + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_cased_distilled_squad_bloomlonely_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_cased_distilled_squad_bloomlonely_pipeline_en.md new file mode 100644 index 00000000000000..219ac9c6806b45 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_cased_distilled_squad_bloomlonely_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_cased_distilled_squad_bloomlonely_pipeline pipeline DistilBertForQuestionAnswering from BloomLonely +author: John Snow Labs +name: distilbert_base_cased_distilled_squad_bloomlonely_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_distilled_squad_bloomlonely_pipeline` is a English model originally trained by BloomLonely. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_distilled_squad_bloomlonely_pipeline_en_5.5.0_3.0_1726515569100.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_distilled_squad_bloomlonely_pipeline_en_5.5.0_3.0_1726515569100.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Question/context DataFrame; the column names are assumed to match the pipeline's MultiDocumentAssembler
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipeline = PretrainedPipeline("distilbert_base_cased_distilled_squad_bloomlonely_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Question/context DataFrame; the column names are assumed to match the pipeline's MultiDocumentAssembler
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipeline = new PretrainedPipeline("distilbert_base_cased_distilled_squad_bloomlonely_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_distilled_squad_bloomlonely_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/BloomLonely/distilbert-base-cased-distilled-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_detect_ai_generated_text_lau123_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_detect_ai_generated_text_lau123_pipeline_en.md new file mode 100644 index 00000000000000..1a8f84bb8b53a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_detect_ai_generated_text_lau123_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_detect_ai_generated_text_lau123_pipeline pipeline DistilBertForSequenceClassification from Lau123 +author: John Snow Labs +name: distilbert_base_uncased_detect_ai_generated_text_lau123_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_detect_ai_generated_text_lau123_pipeline` is a English model originally trained by Lau123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_detect_ai_generated_text_lau123_pipeline_en_5.5.0_3.0_1726525696933.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_detect_ai_generated_text_lau123_pipeline_en_5.5.0_3.0_1726525696933.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Input DataFrame with a raw "text" column (column name assumed by the pipeline's DocumentAssembler)
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_detect_ai_generated_text_lau123_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Input DataFrame with a raw "text" column (column name assumed by the pipeline's DocumentAssembler)
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_detect_ai_generated_text_lau123_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```

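
To see exactly which annotators ship inside the downloaded pipeline, and in what order, you can inspect the wrapped Spark `PipelineModel` directly. A small sketch, reusing the pipeline loaded above; the stages should correspond to the "Included Models" list below.

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_detect_ai_generated_text_lau123_pipeline", lang="en")

# The underlying Spark ML PipelineModel exposes its stages for inspection
for stage in pipeline.model.stages:
    print(type(stage).__name__)
```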
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_detect_ai_generated_text_lau123_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Lau123/distilbert-base-uncased-detect_ai_generated_text + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_distilled_clinc_jeongyeom_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_distilled_clinc_jeongyeom_pipeline_en.md new file mode 100644 index 00000000000000..93d2857cb0ae42 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_distilled_clinc_jeongyeom_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_jeongyeom_pipeline pipeline DistilBertForSequenceClassification from jeongyeom +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_jeongyeom_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_jeongyeom_pipeline` is a English model originally trained by jeongyeom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_jeongyeom_pipeline_en_5.5.0_3.0_1726525589020.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_jeongyeom_pipeline_en_5.5.0_3.0_1726525589020.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Input DataFrame with a raw "text" column (column name assumed by the pipeline's DocumentAssembler)
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_distilled_clinc_jeongyeom_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Input DataFrame with a raw "text" column (column name assumed by the pipeline's DocumentAssembler)
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_distilled_clinc_jeongyeom_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_jeongyeom_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/jeongyeom/distilbert-base-uncased-distilled-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_cardosoccc_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_cardosoccc_en.md new file mode 100644 index 00000000000000..5eeaa6ac3b88c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_cardosoccc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_cardosoccc DistilBertForSequenceClassification from cardosoccc +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_cardosoccc +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_cardosoccc` is a English model originally trained by cardosoccc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cardosoccc_en_5.5.0_3.0_1726525473370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cardosoccc_en_5.5.0_3.0_1726525473370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Convert raw text into Spark NLP documents, then tokenize
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier consumes the document and token columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_cardosoccc","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_cardosoccc", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```

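
The `class` column produced above holds Spark NLP annotation structs rather than plain strings. A minimal sketch of pulling out just the predicted labels, assuming the `pipelineDF` from the example above:

```python
from pyspark.sql.functions import explode

# Each annotation carries the predicted label in its `result` field
pipelineDF.select(explode("class.result").alias("predicted_label")).show(truncate=False)
```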
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_cardosoccc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cardosoccc/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_danielrsn_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_danielrsn_en.md new file mode 100644 index 00000000000000..c52f59cbc7748d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_danielrsn_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_danielrsn DistilBertForSequenceClassification from danielrsn +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_danielrsn +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_danielrsn` is a English model originally trained by danielrsn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_danielrsn_en_5.5.0_3.0_1726506922604.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_danielrsn_en_5.5.0_3.0_1726506922604.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Convert raw text into Spark NLP documents, then tokenize
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier consumes the document and token columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_danielrsn","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_danielrsn", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```

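
If you need low-latency predictions on individual sentences rather than a full Spark job, the fitted model can be wrapped in a `LightPipeline`. A minimal sketch, reusing the `pipelineModel` fitted in the example above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# annotate() returns a dict keyed by output column; "class" holds the predicted label(s)
print(light.annotate("I love spark-nlp")["class"])
```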
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_danielrsn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/danielrsn/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_danielrsn_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_danielrsn_pipeline_en.md new file mode 100644 index 00000000000000..25e22d115ba619 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_danielrsn_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_danielrsn_pipeline pipeline DistilBertForSequenceClassification from danielrsn +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_danielrsn_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_danielrsn_pipeline` is a English model originally trained by danielrsn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_danielrsn_pipeline_en_5.5.0_3.0_1726506934043.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_danielrsn_pipeline_en_5.5.0_3.0_1726506934043.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Input DataFrame with a raw "text" column (column name assumed by the pipeline's DocumentAssembler)
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_danielrsn_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Input DataFrame with a raw "text" column (column name assumed by the pipeline's DocumentAssembler)
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_danielrsn_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_danielrsn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/danielrsn/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_wickelman_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_wickelman_en.md new file mode 100644 index 00000000000000..4c420206a02236 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_wickelman_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_wickelman DistilBertForSequenceClassification from Wickelman +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_wickelman +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_wickelman` is a English model originally trained by Wickelman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_wickelman_en_5.5.0_3.0_1726506181169.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_wickelman_en_5.5.0_3.0_1726506181169.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Convert raw text into Spark NLP documents, then tokenize
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier consumes the document and token columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_wickelman","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_wickelman", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_wickelman| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Wickelman/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_hunniee_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_hunniee_pipeline_en.md new file mode 100644 index 00000000000000..c61e775a6aaa48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_hunniee_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_hunniee_pipeline pipeline DistilBertForQuestionAnswering from hunniee +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_hunniee_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_hunniee_pipeline` is a English model originally trained by hunniee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_hunniee_pipeline_en_5.5.0_3.0_1726515329750.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_hunniee_pipeline_en_5.5.0_3.0_1726515329750.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Question/context DataFrame; the column names are assumed to match the pipeline's MultiDocumentAssembler
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_hunniee_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Question/context DataFrame; the column names are assumed to match the pipeline's MultiDocumentAssembler
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_hunniee_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_hunniee_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/hunniee/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_hyounguk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_hyounguk_pipeline_en.md new file mode 100644 index 00000000000000..b9c53a9ee6e2f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_hyounguk_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_hyounguk_pipeline pipeline DistilBertForQuestionAnswering from Hyounguk +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_hyounguk_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_hyounguk_pipeline` is a English model originally trained by Hyounguk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_hyounguk_pipeline_en_5.5.0_3.0_1726469799964.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_hyounguk_pipeline_en_5.5.0_3.0_1726469799964.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Question/context DataFrame; the column names are assumed to match the pipeline's MultiDocumentAssembler
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_hyounguk_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Question/context DataFrame; the column names are assumed to match the pipeline's MultiDocumentAssembler
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_hyounguk_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```

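
Pretrained pipelines are downloaded and cached on first use. If you prefer to pin a copy alongside your project (for offline clusters, for example), the underlying `PipelineModel` can be saved and reloaded with standard Spark ML calls; the path below is only a placeholder.

```python
from sparknlp.pretrained import PretrainedPipeline
from pyspark.ml import PipelineModel

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_hyounguk_pipeline", lang="en")

# Persist the wrapped PipelineModel locally, then reload it without hitting the model hub again
pipeline.model.write().overwrite().save("/tmp/distilbert_squad_hyounguk_pipeline")
reloaded = PipelineModel.load("/tmp/distilbert_squad_hyounguk_pipeline")
```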
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_hyounguk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Hyounguk/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_keerthana12_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_keerthana12_pipeline_en.md new file mode 100644 index 00000000000000..ce6cb509242275 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_keerthana12_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_keerthana12_pipeline pipeline DistilBertForQuestionAnswering from Keerthana12 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_keerthana12_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_keerthana12_pipeline` is a English model originally trained by Keerthana12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_keerthana12_pipeline_en_5.5.0_3.0_1726515144302.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_keerthana12_pipeline_en_5.5.0_3.0_1726515144302.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Question/context DataFrame; the column names are assumed to match the pipeline's MultiDocumentAssembler
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_keerthana12_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Question/context DataFrame; the column names are assumed to match the pipeline's MultiDocumentAssembler
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_keerthana12_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_keerthana12_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Keerthana12/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_kubba_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_kubba_en.md new file mode 100644 index 00000000000000..9b474a12773a13 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_kubba_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_kubba DistilBertForQuestionAnswering from Kubba +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_kubba +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_kubba` is a English model originally trained by Kubba. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_kubba_en_5.5.0_3.0_1726515625747.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_kubba_en_5.5.0_3.0_1726515625747.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question and context columns into Spark NLP documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_kubba","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_kubba", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```

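
As with the classification models, the `answer` column above contains annotation structs. One way to surface the predicted answer spans as plain strings, assuming the `pipelineDF` from the example above:

```python
from pyspark.sql.functions import explode

# `answer.result` holds the extracted answer text for each question/context pair
pipelineDF.select(explode("answer.result").alias("answer_text")).show(truncate=False)
```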
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_kubba| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Kubba/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_maguitai_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_maguitai_pipeline_en.md new file mode 100644 index 00000000000000..2c569bae38212b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_maguitai_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_maguitai_pipeline pipeline DistilBertForQuestionAnswering from maguitai +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_maguitai_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_maguitai_pipeline` is a English model originally trained by maguitai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_maguitai_pipeline_en_5.5.0_3.0_1726515235241.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_maguitai_pipeline_en_5.5.0_3.0_1726515235241.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Question/context DataFrame; the column names are assumed to match the pipeline's MultiDocumentAssembler
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_maguitai_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Question/context DataFrame; the column names are assumed to match the pipeline's MultiDocumentAssembler
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_maguitai_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_maguitai_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/maguitai/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_markr23_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_markr23_en.md new file mode 100644 index 00000000000000..cca39ef9e63619 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_markr23_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_markr23 DistilBertForQuestionAnswering from markr23 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_markr23 +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_markr23` is a English model originally trained by markr23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_markr23_en_5.5.0_3.0_1726469674752.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_markr23_en_5.5.0_3.0_1726469674752.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question and context columns into Spark NLP documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_markr23","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_markr23", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_markr23| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/markr23/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_sm750s_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_sm750s_pipeline_en.md new file mode 100644 index 00000000000000..658748d8625826 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_sm750s_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_sm750s_pipeline pipeline DistilBertForQuestionAnswering from sm750s +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_sm750s_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_sm750s_pipeline` is a English model originally trained by sm750s. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_sm750s_pipeline_en_5.5.0_3.0_1726469255102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_sm750s_pipeline_en_5.5.0_3.0_1726469255102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Question/context DataFrame; the column names are assumed to match the pipeline's MultiDocumentAssembler
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_sm750s_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Question/context DataFrame; the column names are assumed to match the pipeline's MultiDocumentAssembler
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_sm750s_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_sm750s_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/sm750s/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_suthanhcong_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_suthanhcong_pipeline_en.md new file mode 100644 index 00000000000000..bc83547a56c7fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_suthanhcong_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_suthanhcong_pipeline pipeline DistilBertForQuestionAnswering from suthanhcong +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_suthanhcong_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_suthanhcong_pipeline` is a English model originally trained by suthanhcong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_suthanhcong_pipeline_en_5.5.0_3.0_1726515559291.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_suthanhcong_pipeline_en_5.5.0_3.0_1726515559291.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Question/context DataFrame; the column names are assumed to match the pipeline's MultiDocumentAssembler
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_suthanhcong_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Question/context DataFrame; the column names are assumed to match the pipeline's MultiDocumentAssembler
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_suthanhcong_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_suthanhcong_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/suthanhcong/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_subj_obj_1_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_subj_obj_1_0_pipeline_en.md new file mode 100644 index 00000000000000..323dcf330f7f45 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_subj_obj_1_0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_subj_obj_1_0_pipeline pipeline DistilBertForSequenceClassification from sheshuan +author: John Snow Labs +name: distilbert_base_uncased_finetuned_subj_obj_1_0_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_subj_obj_1_0_pipeline` is a English model originally trained by sheshuan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_subj_obj_1_0_pipeline_en_5.5.0_3.0_1726525596706.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_subj_obj_1_0_pipeline_en_5.5.0_3.0_1726525596706.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Input DataFrame with a raw "text" column (column name assumed by the pipeline's DocumentAssembler)
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_subj_obj_1_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Input DataFrame with a raw "text" column (column name assumed by the pipeline's DocumentAssembler)
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_subj_obj_1_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_subj_obj_1_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sheshuan/distilbert-base-uncased-finetuned-subj_obj_1.0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squad2_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squad2_en.md new file mode 100644 index 00000000000000..947adb7c9000ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squad2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_finetuned_squad2 DistilBertForQuestionAnswering from NMCxyz +author: John Snow Labs +name: distilbert_finetuned_squad2 +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_squad2` is a English model originally trained by NMCxyz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squad2_en_5.5.0_3.0_1726515140061.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squad2_en_5.5.0_3.0_1726515140061.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question and context columns into Spark NLP documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_finetuned_squad2","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_finetuned_squad2", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```

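
The same fitted pipeline scores many question/context pairs in one pass. A short sketch extending the single-row example above (the column names follow the assembler configuration shown there):

```python
# Several question/context pairs can be scored with a single transform() call
pairs = [
    ("What framework do I use?", "I use spark-nlp."),
    ("Where does Clara live?", "My name is Clara and I live in Berkeley."),
]
batch = spark.createDataFrame(pairs).toDF("question", "context")

pipelineModel.transform(batch).select("answer.result").show(truncate=False)
```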
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_squad2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/NMCxyz/distilbert-finetuned-squad2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squad2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squad2_pipeline_en.md new file mode 100644 index 00000000000000..9609e47159371a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squad2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_finetuned_squad2_pipeline pipeline DistilBertForQuestionAnswering from NMCxyz +author: John Snow Labs +name: distilbert_finetuned_squad2_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_squad2_pipeline` is a English model originally trained by NMCxyz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squad2_pipeline_en_5.5.0_3.0_1726515153825.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squad2_pipeline_en_5.5.0_3.0_1726515153825.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Question/context DataFrame; the column names are assumed to match the pipeline's MultiDocumentAssembler
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipeline = PretrainedPipeline("distilbert_finetuned_squad2_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Question/context DataFrame; the column names are assumed to match the pipeline's MultiDocumentAssembler
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipeline = new PretrainedPipeline("distilbert_finetuned_squad2_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_squad2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/NMCxyz/distilbert-finetuned-squad2 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squadv2_nctuananh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squadv2_nctuananh_pipeline_en.md new file mode 100644 index 00000000000000..2df48035ce43f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squadv2_nctuananh_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_finetuned_squadv2_nctuananh_pipeline pipeline DistilBertForQuestionAnswering from NCTuanAnh +author: John Snow Labs +name: distilbert_finetuned_squadv2_nctuananh_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_squadv2_nctuananh_pipeline` is a English model originally trained by NCTuanAnh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squadv2_nctuananh_pipeline_en_5.5.0_3.0_1726515611350.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squadv2_nctuananh_pipeline_en_5.5.0_3.0_1726515611350.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Question/context DataFrame; the column names are assumed to match the pipeline's MultiDocumentAssembler
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipeline = PretrainedPipeline("distilbert_finetuned_squadv2_nctuananh_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Question/context DataFrame; the column names are assumed to match the pipeline's MultiDocumentAssembler
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipeline = new PretrainedPipeline("distilbert_finetuned_squadv2_nctuananh_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_squadv2_nctuananh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/NCTuanAnh/distilbert-finetuned-squadv2 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_192_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_192_pipeline_en.md new file mode 100644 index 00000000000000..e1b64f9a15bf9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_192_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_192_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_192_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_192_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_192_pipeline_en_5.5.0_3.0_1726525726664.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_192_pipeline_en_5.5.0_3.0_1726525726664.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with the text to classify, read by the pipeline's DocumentAssembler stage
pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_192_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame with the text to classify, read by the pipeline's DocumentAssembler stage
val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_192_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_192_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|52.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_qnli_192 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_sanskrit_saskta_glue_experiment_qnli_96_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_sanskrit_saskta_glue_experiment_qnli_96_en.md new file mode 100644 index 00000000000000..d8e1205195af50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_sanskrit_saskta_glue_experiment_qnli_96_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_qnli_96 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_qnli_96 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_qnli_96` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_qnli_96_en_5.5.0_3.0_1726525464435.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_qnli_96_en_5.5.0_3.0_1726525464435.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_qnli_96","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_qnli_96", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
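
To read the predicted label back out of the `class` column produced by the Python example above, one minimal sketch (assuming the `pipelineDF` DataFrame from that example) is:

```python
from pyspark.sql.functions import col

# Each row keeps the input text next to the predicted class label(s)
pipelineDF.select(col("text"), col("class.result").alias("prediction")).show(truncate=False)
```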
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_qnli_96| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|25.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_qnli_96 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_turkish_turkish_movie_reviews_tr.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_turkish_turkish_movie_reviews_tr.md new file mode 100644 index 00000000000000..bcd864bc033518 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_turkish_turkish_movie_reviews_tr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Turkish distilbert_turkish_turkish_movie_reviews DistilBertForSequenceClassification from anilguven +author: John Snow Labs +name: distilbert_turkish_turkish_movie_reviews +date: 2024-09-16 +tags: [tr, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_turkish_turkish_movie_reviews` is a Turkish model originally trained by anilguven. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_turkish_turkish_movie_reviews_tr_5.5.0_3.0_1726525467792.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_turkish_turkish_movie_reviews_tr_5.5.0_3.0_1726525467792.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_turkish_turkish_movie_reviews","tr") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_turkish_turkish_movie_reviews", "tr")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_turkish_turkish_movie_reviews| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|tr| +|Size:|254.1 MB| + +## References + +https://huggingface.co/anilguven/distilbert_tr_turkish_movie_reviews \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-doctorintentmainclassifier_en.md b/docs/_posts/ahmedlone127/2024-09-16-doctorintentmainclassifier_en.md new file mode 100644 index 00000000000000..f290811f3e2290 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-doctorintentmainclassifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English doctorintentmainclassifier RoBertaForSequenceClassification from Mikelium5 +author: John Snow Labs +name: doctorintentmainclassifier +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`doctorintentmainclassifier` is a English model originally trained by Mikelium5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/doctorintentmainclassifier_en_5.5.0_3.0_1726518140880.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/doctorintentmainclassifier_en_5.5.0_3.0_1726518140880.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("doctorintentmainclassifier","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("doctorintentmainclassifier", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|doctorintentmainclassifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|416.7 MB| + +## References + +https://huggingface.co/Mikelium5/DoctorIntentMainClassifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-efficient_mlm_m0_40_801010_en.md b/docs/_posts/ahmedlone127/2024-09-16-efficient_mlm_m0_40_801010_en.md new file mode 100644 index 00000000000000..c8c05fe74c6a36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-efficient_mlm_m0_40_801010_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English efficient_mlm_m0_40_801010 RoBertaEmbeddings from princeton-nlp +author: John Snow Labs +name: efficient_mlm_m0_40_801010 +date: 2024-09-16 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`efficient_mlm_m0_40_801010` is a English model originally trained by princeton-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/efficient_mlm_m0_40_801010_en_5.5.0_3.0_1726513885784.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/efficient_mlm_m0_40_801010_en_5.5.0_3.0_1726513885784.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("efficient_mlm_m0_40_801010","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("efficient_mlm_m0_40_801010","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
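
The `embeddings` column holds one annotation per token, each carrying its vector; a quick sanity check on the output of the example above (a sketch, assuming the `pipelineDF` DataFrame from the Python example) could look like this:

```python
from pyspark.sql.functions import col, explode

# One row per token: the token text and the dimensionality of its embedding
pipelineDF.select(explode(col("embeddings")).alias("emb")) \
    .selectExpr("emb.result AS token", "size(emb.embeddings) AS dimensions") \
    .show(truncate=False)
```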
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|efficient_mlm_m0_40_801010| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|845.3 MB| + +## References + +https://huggingface.co/princeton-nlp/efficient_mlm_m0.40-801010 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-emoji_emoji_random1_seed2_twitter_roberta_base_2021_124m_en.md b/docs/_posts/ahmedlone127/2024-09-16-emoji_emoji_random1_seed2_twitter_roberta_base_2021_124m_en.md new file mode 100644 index 00000000000000..4b1079dcdd51cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-emoji_emoji_random1_seed2_twitter_roberta_base_2021_124m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emoji_emoji_random1_seed2_twitter_roberta_base_2021_124m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random1_seed2_twitter_roberta_base_2021_124m +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random1_seed2_twitter_roberta_base_2021_124m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random1_seed2_twitter_roberta_base_2021_124m_en_5.5.0_3.0_1726470556357.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random1_seed2_twitter_roberta_base_2021_124m_en_5.5.0_3.0_1726470556357.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("emoji_emoji_random1_seed2_twitter_roberta_base_2021_124m","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("emoji_emoji_random1_seed2_twitter_roberta_base_2021_124m", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random1_seed2_twitter_roberta_base_2021_124m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.6 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random1_seed2-twitter-roberta-base-2021-124m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-emotion_classification_a2ran_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-emotion_classification_a2ran_pipeline_en.md new file mode 100644 index 00000000000000..54ac9213af8e2c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-emotion_classification_a2ran_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emotion_classification_a2ran_pipeline pipeline DistilBertForSequenceClassification from a2ran +author: John Snow Labs +name: emotion_classification_a2ran_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_classification_a2ran_pipeline` is a English model originally trained by a2ran. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_classification_a2ran_pipeline_en_5.5.0_3.0_1726525268680.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_classification_a2ran_pipeline_en_5.5.0_3.0_1726525268680.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with the text to classify, read by the pipeline's DocumentAssembler stage
pipeline = PretrainedPipeline("emotion_classification_a2ran_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame with the text to classify, read by the pipeline's DocumentAssembler stage
val pipeline = new PretrainedPipeline("emotion_classification_a2ran_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_classification_a2ran_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/a2ran/emotion_classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-enlm_roberta_conll2003_final_en.md b/docs/_posts/ahmedlone127/2024-09-16-enlm_roberta_conll2003_final_en.md new file mode 100644 index 00000000000000..347b88b605f1fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-enlm_roberta_conll2003_final_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English enlm_roberta_conll2003_final XlmRoBertaForTokenClassification from manirai91 +author: John Snow Labs +name: enlm_roberta_conll2003_final +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`enlm_roberta_conll2003_final` is a English model originally trained by manirai91. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/enlm_roberta_conll2003_final_en_5.5.0_3.0_1726495965715.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/enlm_roberta_conll2003_final_en_5.5.0_3.0_1726495965715.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("enlm_roberta_conll2003_final","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("enlm_roberta_conll2003_final", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
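
The `ner` column produced by the Python example above contains one IOB tag per token; to print the tags (assuming the `pipelineDF` DataFrame from that example):

```python
# Shows the raw IOB tags predicted for each token of the input text
pipelineDF.select("ner.result").show(truncate=False)
```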
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|enlm_roberta_conll2003_final| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|464.4 MB| + +## References + +https://huggingface.co/manirai91/enlm-roberta-conll2003-final \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-exp_number0_en.md b/docs/_posts/ahmedlone127/2024-09-16-exp_number0_en.md new file mode 100644 index 00000000000000..c25800760a09bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-exp_number0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English exp_number0 DistilBertForSequenceClassification from classicakeza5 +author: John Snow Labs +name: exp_number0 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`exp_number0` is a English model originally trained by classicakeza5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/exp_number0_en_5.5.0_3.0_1726525895864.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/exp_number0_en_5.5.0_3.0_1726525895864.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("exp_number0","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("exp_number0", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|exp_number0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/classicakeza5/exp_number0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-exp_number0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-exp_number0_pipeline_en.md new file mode 100644 index 00000000000000..57757fd7ecbab4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-exp_number0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English exp_number0_pipeline pipeline DistilBertForSequenceClassification from classicakeza5 +author: John Snow Labs +name: exp_number0_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`exp_number0_pipeline` is a English model originally trained by classicakeza5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/exp_number0_pipeline_en_5.5.0_3.0_1726525907804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/exp_number0_pipeline_en_5.5.0_3.0_1726525907804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with the text to classify, read by the pipeline's DocumentAssembler stage
pipeline = PretrainedPipeline("exp_number0_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame with the text to classify, read by the pipeline's DocumentAssembler stage
val pipeline = new PretrainedPipeline("exp_number0_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
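
For quick experiments you can also call the pipeline's `annotate` helper on a single string instead of building a DataFrame; the sketch below assumes the classifier stage writes to a `class` output column, as the standalone `exp_number0` model card indicates.

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("exp_number0_pipeline", lang = "en")

# annotate() returns a dict keyed by output column name
light_result = pipeline.annotate("I love spark-nlp")
print(light_result["class"])
```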
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|exp_number0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/classicakeza5/exp_number0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-fake_news_classifier_emmawang_en.md b/docs/_posts/ahmedlone127/2024-09-16-fake_news_classifier_emmawang_en.md new file mode 100644 index 00000000000000..89e187d47f702d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-fake_news_classifier_emmawang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fake_news_classifier_emmawang DistilBertForSequenceClassification from Emmawang +author: John Snow Labs +name: fake_news_classifier_emmawang +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fake_news_classifier_emmawang` is a English model originally trained by Emmawang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fake_news_classifier_emmawang_en_5.5.0_3.0_1726506930265.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fake_news_classifier_emmawang_en_5.5.0_3.0_1726506930265.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("fake_news_classifier_emmawang","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("fake_news_classifier_emmawang", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fake_news_classifier_emmawang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Emmawang/fake_news_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-fine_tune_whisper_small_inayat_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-fine_tune_whisper_small_inayat_pipeline_en.md new file mode 100644 index 00000000000000..e464f514e3e14c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-fine_tune_whisper_small_inayat_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English fine_tune_whisper_small_inayat_pipeline pipeline WhisperForCTC from Inayat +author: John Snow Labs +name: fine_tune_whisper_small_inayat_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tune_whisper_small_inayat_pipeline` is a English model originally trained by Inayat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tune_whisper_small_inayat_pipeline_en_5.5.0_3.0_1726477363611.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tune_whisper_small_inayat_pipeline_en_5.5.0_3.0_1726477363611.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with the recorded audio; the pipeline's AudioAssembler stage expects an array-of-floats audio column
pipeline = PretrainedPipeline("fine_tune_whisper_small_inayat_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame with the recorded audio; the pipeline's AudioAssembler stage expects an array-of-floats audio column
val pipeline = new PretrainedPipeline("fine_tune_whisper_small_inayat_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tune_whisper_small_inayat_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Inayat/Fine_tune_whisper_small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-fine_tuned_helsinki_model_en.md b/docs/_posts/ahmedlone127/2024-09-16-fine_tuned_helsinki_model_en.md new file mode 100644 index 00000000000000..f9dd496689cbe3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-fine_tuned_helsinki_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fine_tuned_helsinki_model MarianTransformer from EricPeter +author: John Snow Labs +name: fine_tuned_helsinki_model +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_helsinki_model` is a English model originally trained by EricPeter. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_helsinki_model_en_5.5.0_3.0_1726491207322.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_helsinki_model_en_5.5.0_3.0_1726491207322.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("fine_tuned_helsinki_model","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("fine_tuned_helsinki_model","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
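
The translated sentences end up in the `translation` output column; to display them (assuming the `pipelineDF` DataFrame from the Python example above):

```python
# One array of translated sentences per input row
pipelineDF.select("translation.result").show(truncate=False)
```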
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_helsinki_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|529.9 MB| + +## References + +https://huggingface.co/EricPeter/fine_tuned_helsinki_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-finetune_t5_base_without_optimization_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-finetune_t5_base_without_optimization_pipeline_en.md new file mode 100644 index 00000000000000..43890218b3181c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-finetune_t5_base_without_optimization_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English finetune_t5_base_without_optimization_pipeline pipeline T5Transformer from yasmineee +author: John Snow Labs +name: finetune_t5_base_without_optimization_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: [Question Answering, Summarization, Translation, Text Generation] +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained T5Transformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetune_t5_base_without_optimization_pipeline` is a English model originally trained by yasmineee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetune_t5_base_without_optimization_pipeline_en_5.5.0_3.0_1726521359665.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetune_t5_base_without_optimization_pipeline_en_5.5.0_3.0_1726521359665.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with the input text, read by the pipeline's DocumentAssembler stage
pipeline = PretrainedPipeline("finetune_t5_base_without_optimization_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame with the input text, read by the pipeline's DocumentAssembler stage
val pipeline = new PretrainedPipeline("finetune_t5_base_without_optimization_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetune_t5_base_without_optimization_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.4 GB| + +## References + +https://huggingface.co/yasmineee/finetune-t5-base-without-optimization + +## Included Models + +- DocumentAssembler +- T5Transformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-finetuned_adversarial_paraphrase_model_test_en.md b/docs/_posts/ahmedlone127/2024-09-16-finetuned_adversarial_paraphrase_model_test_en.md new file mode 100644 index 00000000000000..943f58bd2d5a67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-finetuned_adversarial_paraphrase_model_test_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_adversarial_paraphrase_model_test RoBertaForSequenceClassification from chitra +author: John Snow Labs +name: finetuned_adversarial_paraphrase_model_test +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_adversarial_paraphrase_model_test` is a English model originally trained by chitra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_adversarial_paraphrase_model_test_en_5.5.0_3.0_1726456353416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_adversarial_paraphrase_model_test_en_5.5.0_3.0_1726456353416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("finetuned_adversarial_paraphrase_model_test","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("finetuned_adversarial_paraphrase_model_test", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_adversarial_paraphrase_model_test| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/chitra/finetuned-adversarial-paraphrase-model-test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-finetuning_sentiment_model_3000_samples_sarathaer_en.md b/docs/_posts/ahmedlone127/2024-09-16-finetuning_sentiment_model_3000_samples_sarathaer_en.md new file mode 100644 index 00000000000000..bebb55754145c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-finetuning_sentiment_model_3000_samples_sarathaer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_sarathaer DistilBertForSequenceClassification from Sarathaer +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_sarathaer +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_sarathaer` is a English model originally trained by Sarathaer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_sarathaer_en_5.5.0_3.0_1726525624894.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_sarathaer_en_5.5.0_3.0_1726525624894.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_sarathaer","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_sarathaer", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_sarathaer| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Sarathaer/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-further_base_v4_0__checkpoint_last_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-further_base_v4_0__checkpoint_last_pipeline_en.md new file mode 100644 index 00000000000000..12c5e32ae74a3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-further_base_v4_0__checkpoint_last_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English further_base_v4_0__checkpoint_last_pipeline pipeline RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: further_base_v4_0__checkpoint_last_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`further_base_v4_0__checkpoint_last_pipeline` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/further_base_v4_0__checkpoint_last_pipeline_en_5.5.0_3.0_1726513688044.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/further_base_v4_0__checkpoint_last_pipeline_en_5.5.0_3.0_1726513688044.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with the input text, read by the pipeline's DocumentAssembler stage
pipeline = PretrainedPipeline("further_base_v4_0__checkpoint_last_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame with the input text, read by the pipeline's DocumentAssembler stage
val pipeline = new PretrainedPipeline("further_base_v4_0__checkpoint_last_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|further_base_v4_0__checkpoint_last_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|296.3 MB| + +## References + +https://huggingface.co/eduagarcia-temp/further_base_v4_0__checkpoint_last + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-gal_ner_iwcg_6_en.md b/docs/_posts/ahmedlone127/2024-09-16-gal_ner_iwcg_6_en.md new file mode 100644 index 00000000000000..370f95ca4ef42f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-gal_ner_iwcg_6_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English gal_ner_iwcg_6 XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: gal_ner_iwcg_6 +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gal_ner_iwcg_6` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_ner_iwcg_6_en_5.5.0_3.0_1726497291676.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_ner_iwcg_6_en_5.5.0_3.0_1726497291676.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_ner_iwcg_6","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_ner_iwcg_6", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_ner_iwcg_6| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|426.5 MB| + +## References + +https://huggingface.co/homersimpson/gal-ner-iwcg-6 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-gtts_updated_en.md b/docs/_posts/ahmedlone127/2024-09-16-gtts_updated_en.md new file mode 100644 index 00000000000000..d8d64d859f257d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-gtts_updated_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English gtts_updated WhisperForCTC from SamagraDataGov +author: John Snow Labs +name: gtts_updated +date: 2024-09-16 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gtts_updated` is a English model originally trained by SamagraDataGov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gtts_updated_en_5.5.0_3.0_1726479416271.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gtts_updated_en_5.5.0_3.0_1726479416271.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("gtts_updated","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# raw_floats is a placeholder for the audio samples of one recording as a list of floats
data = spark.createDataFrame([[raw_floats]]).toDF("audio_content")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("gtts_updated", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// data is a DataFrame with an "audio_content" column holding the audio samples as float arrays
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
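
The transcription lands in the `text` output column; to print it (assuming the `pipelineDF` DataFrame from the Python example above):

```python
# The recognized transcript for each input recording
pipelineDF.select("text.result").show(truncate=False)
```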
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gtts_updated| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/SamagraDataGov/gtts-updated \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-hecone_he.md b/docs/_posts/ahmedlone127/2024-09-16-hecone_he.md new file mode 100644 index 00000000000000..77444523825a53 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-hecone_he.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Hebrew hecone RoBertaForTokenClassification from HeTree +author: John Snow Labs +name: hecone +date: 2024-09-16 +tags: [he, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hecone` is a Hebrew model originally trained by HeTree. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hecone_he_5.5.0_3.0_1726452820874.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hecone_he_5.5.0_3.0_1726452820874.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("hecone","he") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("hecone", "he")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hecone| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|he| +|Size:|466.0 MB| + +## References + +https://huggingface.co/HeTree/HeConE \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-hecone_pipeline_he.md b/docs/_posts/ahmedlone127/2024-09-16-hecone_pipeline_he.md new file mode 100644 index 00000000000000..ad59ac51cd1ce4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-hecone_pipeline_he.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Hebrew hecone_pipeline pipeline RoBertaForTokenClassification from HeTree +author: John Snow Labs +name: hecone_pipeline +date: 2024-09-16 +tags: [he, open_source, pipeline, onnx] +task: Named Entity Recognition +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hecone_pipeline` is a Hebrew model originally trained by HeTree. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hecone_pipeline_he_5.5.0_3.0_1726452841616.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hecone_pipeline_he_5.5.0_3.0_1726452841616.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hecone_pipeline", lang = "he") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hecone_pipeline", lang = "he") +val annotations = pipeline.transform(df) + +``` +
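+
+A minimal usage sketch, assuming an active Spark NLP session: besides `transform` on a DataFrame with a `text` column (as above, where `df` is assumed to be such a DataFrame), the pretrained pipeline can be applied directly to a raw string with `fullAnnotate`.
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("hecone_pipeline", lang = "he")
+# fullAnnotate runs every included stage and returns the resulting annotations;
+# the input here is an arbitrary Hebrew example sentence.
+results = pipeline.fullAnnotate("שלום עולם")
+```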
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hecone_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|he| +|Size:|466.0 MB| + +## References + +https://huggingface.co/HeTree/HeConE + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-helsinki_danish_swedish_v3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-helsinki_danish_swedish_v3_pipeline_en.md new file mode 100644 index 00000000000000..4a90147fb33db0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-helsinki_danish_swedish_v3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English helsinki_danish_swedish_v3_pipeline pipeline MarianTransformer from Danieljacobsen +author: John Snow Labs +name: helsinki_danish_swedish_v3_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`helsinki_danish_swedish_v3_pipeline` is a English model originally trained by Danieljacobsen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/helsinki_danish_swedish_v3_pipeline_en_5.5.0_3.0_1726458207758.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/helsinki_danish_swedish_v3_pipeline_en_5.5.0_3.0_1726458207758.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("helsinki_danish_swedish_v3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("helsinki_danish_swedish_v3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|helsinki_danish_swedish_v3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|498.3 MB| + +## References + +https://huggingface.co/Danieljacobsen/Helsinki-DA-SV-v3 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-iit_token_en.md b/docs/_posts/ahmedlone127/2024-09-16-iit_token_en.md new file mode 100644 index 00000000000000..fa8c128c0fe9e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-iit_token_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English iit_token DistilBertForQuestionAnswering from teju-1210 +author: John Snow Labs +name: iit_token +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`iit_token` is a English model originally trained by teju-1210. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/iit_token_en_5.5.0_3.0_1726515086588.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/iit_token_en_5.5.0_3.0_1726515086588.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("iit_token","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("iit_token", "en")
+  .setInputCols(Array("document_question","document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
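+
+A minimal sketch of reading the prediction, assuming the pipeline above was run: the extracted answer span is stored in the `answer` output column.
+
+```python
+# The answer span the model extracted from the context for each question.
+pipelineDF.select("answer.result").show(truncate = False)
+```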
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|iit_token| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/teju-1210/IIT_Token \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-iwslt17_marian_small_ctx2_cwd0_english_french_en.md b/docs/_posts/ahmedlone127/2024-09-16-iwslt17_marian_small_ctx2_cwd0_english_french_en.md new file mode 100644 index 00000000000000..4f1e75c8212a39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-iwslt17_marian_small_ctx2_cwd0_english_french_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English iwslt17_marian_small_ctx2_cwd0_english_french MarianTransformer from context-mt +author: John Snow Labs +name: iwslt17_marian_small_ctx2_cwd0_english_french +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`iwslt17_marian_small_ctx2_cwd0_english_french` is a English model originally trained by context-mt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/iwslt17_marian_small_ctx2_cwd0_english_french_en_5.5.0_3.0_1726458109327.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/iwslt17_marian_small_ctx2_cwd0_english_french_en_5.5.0_3.0_1726458109327.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("iwslt17_marian_small_ctx2_cwd0_english_french","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("iwslt17_marian_small_ctx2_cwd0_english_french","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
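+
+A minimal sketch of reading the output, assuming the pipeline above was run: the translated sentences are stored in the `translation` output column.
+
+```python
+# One translated string per sentence detected in the input text.
+pipelineDF.select("translation.result").show(truncate = False)
+```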
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|iwslt17_marian_small_ctx2_cwd0_english_french| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.3 MB| + +## References + +https://huggingface.co/context-mt/iwslt17-marian-small-ctx2-cwd0-en-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-kinyaroberta_large_kinte_finetuned_kinyarwanda_sent1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-kinyaroberta_large_kinte_finetuned_kinyarwanda_sent1_pipeline_en.md new file mode 100644 index 00000000000000..f6d0b50c1ee163 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-kinyaroberta_large_kinte_finetuned_kinyarwanda_sent1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English kinyaroberta_large_kinte_finetuned_kinyarwanda_sent1_pipeline pipeline RoBertaForSequenceClassification from RogerB +author: John Snow Labs +name: kinyaroberta_large_kinte_finetuned_kinyarwanda_sent1_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kinyaroberta_large_kinte_finetuned_kinyarwanda_sent1_pipeline` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kinyaroberta_large_kinte_finetuned_kinyarwanda_sent1_pipeline_en_5.5.0_3.0_1726505390726.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kinyaroberta_large_kinte_finetuned_kinyarwanda_sent1_pipeline_en_5.5.0_3.0_1726505390726.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("kinyaroberta_large_kinte_finetuned_kinyarwanda_sent1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("kinyaroberta_large_kinte_finetuned_kinyarwanda_sent1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kinyaroberta_large_kinte_finetuned_kinyarwanda_sent1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.3 MB| + +## References + +https://huggingface.co/RogerB/kinyaRoberta-large-kinte-finetuned-kin-sent1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-lab1_finetuning_robinysh_en.md b/docs/_posts/ahmedlone127/2024-09-16-lab1_finetuning_robinysh_en.md new file mode 100644 index 00000000000000..24e647510c02bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-lab1_finetuning_robinysh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English lab1_finetuning_robinysh MarianTransformer from robinysh +author: John Snow Labs +name: lab1_finetuning_robinysh +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab1_finetuning_robinysh` is a English model originally trained by robinysh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab1_finetuning_robinysh_en_5.5.0_3.0_1726491704988.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab1_finetuning_robinysh_en_5.5.0_3.0_1726491704988.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("lab1_finetuning_robinysh","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("lab1_finetuning_robinysh","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab1_finetuning_robinysh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.2 MB| + +## References + +https://huggingface.co/robinysh/lab1_finetuning \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-lab1_random_haochenhe_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-lab1_random_haochenhe_pipeline_en.md new file mode 100644 index 00000000000000..30e4cd0856dd79 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-lab1_random_haochenhe_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lab1_random_haochenhe_pipeline pipeline MarianTransformer from haochenhe +author: John Snow Labs +name: lab1_random_haochenhe_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab1_random_haochenhe_pipeline` is a English model originally trained by haochenhe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab1_random_haochenhe_pipeline_en_5.5.0_3.0_1726465947821.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab1_random_haochenhe_pipeline_en_5.5.0_3.0_1726465947821.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lab1_random_haochenhe_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lab1_random_haochenhe_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab1_random_haochenhe_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.8 MB| + +## References + +https://huggingface.co/haochenhe/lab1_random + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-latin_english_base_aeneid_holdout_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-latin_english_base_aeneid_holdout_pipeline_en.md new file mode 100644 index 00000000000000..b6b4d964839f3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-latin_english_base_aeneid_holdout_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English latin_english_base_aeneid_holdout_pipeline pipeline MarianTransformer from grosenthal +author: John Snow Labs +name: latin_english_base_aeneid_holdout_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`latin_english_base_aeneid_holdout_pipeline` is a English model originally trained by grosenthal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/latin_english_base_aeneid_holdout_pipeline_en_5.5.0_3.0_1726457128114.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/latin_english_base_aeneid_holdout_pipeline_en_5.5.0_3.0_1726457128114.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("latin_english_base_aeneid_holdout_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("latin_english_base_aeneid_holdout_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|latin_english_base_aeneid_holdout_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|539.7 MB| + +## References + +https://huggingface.co/grosenthal/la_en_base_aeneid_holdout + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-log_sage_reward_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-log_sage_reward_model_pipeline_en.md new file mode 100644 index 00000000000000..0581b056ca9ba8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-log_sage_reward_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English log_sage_reward_model_pipeline pipeline DistilBertForSequenceClassification from IrwinD +author: John Snow Labs +name: log_sage_reward_model_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`log_sage_reward_model_pipeline` is a English model originally trained by IrwinD. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/log_sage_reward_model_pipeline_en_5.5.0_3.0_1726525308479.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/log_sage_reward_model_pipeline_en_5.5.0_3.0_1726525308479.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("log_sage_reward_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("log_sage_reward_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|log_sage_reward_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/IrwinD/log_sage_reward_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-maltese_ach_english_en.md b/docs/_posts/ahmedlone127/2024-09-16-maltese_ach_english_en.md new file mode 100644 index 00000000000000..6ce4f8c99dfaa9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-maltese_ach_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English maltese_ach_english MarianTransformer from Ogayo +author: John Snow Labs +name: maltese_ach_english +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`maltese_ach_english` is a English model originally trained by Ogayo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/maltese_ach_english_en_5.5.0_3.0_1726465263736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/maltese_ach_english_en_5.5.0_3.0_1726465263736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("maltese_ach_english","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("maltese_ach_english","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|maltese_ach_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|465.5 MB| + +## References + +https://huggingface.co/Ogayo/mt-ach-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-maltese_coref_english_spanish_gender_exp_en.md b/docs/_posts/ahmedlone127/2024-09-16-maltese_coref_english_spanish_gender_exp_en.md new file mode 100644 index 00000000000000..4a5da8c8d32aef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-maltese_coref_english_spanish_gender_exp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English maltese_coref_english_spanish_gender_exp MarianTransformer from nlphuji +author: John Snow Labs +name: maltese_coref_english_spanish_gender_exp +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`maltese_coref_english_spanish_gender_exp` is a English model originally trained by nlphuji. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/maltese_coref_english_spanish_gender_exp_en_5.5.0_3.0_1726457295217.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/maltese_coref_english_spanish_gender_exp_en_5.5.0_3.0_1726457295217.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("maltese_coref_english_spanish_gender_exp","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("maltese_coref_english_spanish_gender_exp","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|maltese_coref_english_spanish_gender_exp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|540.3 MB| + +## References + +https://huggingface.co/nlphuji/mt_coref_en_es_gender_exp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_2gpu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_2gpu_pipeline_en.md new file mode 100644 index 00000000000000..e15f34d7642554 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_2gpu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_2gpu_pipeline pipeline MarianTransformer from eleldar +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_2gpu_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_2gpu_pipeline` is a English model originally trained by eleldar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_2gpu_pipeline_en_5.5.0_3.0_1726510010191.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_2gpu_pipeline_en_5.5.0_3.0_1726510010191.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_2gpu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_2gpu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_2gpu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.7 MB| + +## References + +https://huggingface.co/eleldar/marian-finetuned-kde4-en-to-fr-accelerate-2gpu + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_laura0000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_laura0000_pipeline_en.md new file mode 100644 index 00000000000000..cd948a541177fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_laura0000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_laura0000_pipeline pipeline MarianTransformer from laura0000 +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_laura0000_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_laura0000_pipeline` is a English model originally trained by laura0000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_laura0000_pipeline_en_5.5.0_3.0_1726458214141.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_laura0000_pipeline_en_5.5.0_3.0_1726458214141.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_laura0000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_laura0000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_laura0000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.7 MB| + +## References + +https://huggingface.co/laura0000/marian-finetuned-kde4-en-to-fr-accelerate + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_pipeline_en.md new file mode 100644 index 00000000000000..d6db063017b38a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_pipeline pipeline MarianTransformer from tkoyama +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_pipeline` is a English model originally trained by tkoyama. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_pipeline_en_5.5.0_3.0_1726494320116.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_pipeline_en_5.5.0_3.0_1726494320116.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.7 MB| + +## References + +https://huggingface.co/tkoyama/marian-finetuned-kde4-en-to-fr-accelerate + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_vasaicrow_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_vasaicrow_pipeline_en.md new file mode 100644 index 00000000000000..212a64be20eedd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_vasaicrow_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_vasaicrow_pipeline pipeline MarianTransformer from vasaicrow +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_vasaicrow_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_vasaicrow_pipeline` is a English model originally trained by vasaicrow. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_vasaicrow_pipeline_en_5.5.0_3.0_1726456976896.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_vasaicrow_pipeline_en_5.5.0_3.0_1726456976896.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_vasaicrow_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_vasaicrow_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_vasaicrow_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.3 MB| + +## References + +https://huggingface.co/vasaicrow/marian-finetuned-kde4-en-to-fr-accelerate + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_erfangc_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_erfangc_en.md new file mode 100644 index 00000000000000..2d826191739732 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_erfangc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_erfangc MarianTransformer from erfangc +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_erfangc +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_erfangc` is a English model originally trained by erfangc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_erfangc_en_5.5.0_3.0_1726491686647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_erfangc_en_5.5.0_3.0_1726491686647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_erfangc","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_erfangc","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_erfangc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.1 MB| + +## References + +https://huggingface.co/erfangc/marian-finetuned-kde4-en-to-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_willherbert27_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_willherbert27_pipeline_en.md new file mode 100644 index 00000000000000..8e83530bc8cf36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_willherbert27_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_willherbert27_pipeline pipeline MarianTransformer from willherbert27 +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_willherbert27_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_willherbert27_pipeline` is a English model originally trained by willherbert27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_willherbert27_pipeline_en_5.5.0_3.0_1726509813384.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_willherbert27_pipeline_en_5.5.0_3.0_1726509813384.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_willherbert27_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_willherbert27_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_willherbert27_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.6 MB| + +## References + +https://huggingface.co/willherbert27/marian-finetuned-kde4-en-to-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate_en.md new file mode 100644 index 00000000000000..f02cc668c4986e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate MarianTransformer from ecat3rina +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate` is a English model originally trained by ecat3rina. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate_en_5.5.0_3.0_1726491011715.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate_en_5.5.0_3.0_1726491011715.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.3 MB| + +## References + +https://huggingface.co/ecat3rina/marian-finetuned-kde4-en-to-ro-accelerate \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate_pipeline_en.md new file mode 100644 index 00000000000000..36272e23bcc658 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate_pipeline pipeline MarianTransformer from ecat3rina +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate_pipeline` is a English model originally trained by ecat3rina. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate_pipeline_en_5.5.0_3.0_1726491036307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate_pipeline_en_5.5.0_3.0_1726491036307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.8 MB| + +## References + +https://huggingface.co/ecat3rina/marian-finetuned-kde4-en-to-ro-accelerate + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-mentalroberta_4label_notes_en.md b/docs/_posts/ahmedlone127/2024-09-16-mentalroberta_4label_notes_en.md new file mode 100644 index 00000000000000..da533fb8915891 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-mentalroberta_4label_notes_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mentalroberta_4label_notes RoBertaForSequenceClassification from AliaeAI +author: John Snow Labs +name: mentalroberta_4label_notes +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mentalroberta_4label_notes` is a English model originally trained by AliaeAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mentalroberta_4label_notes_en_5.5.0_3.0_1726504862902.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mentalroberta_4label_notes_en_5.5.0_3.0_1726504862902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("mentalroberta_4label_notes","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("mentalroberta_4label_notes", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
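+
+A minimal sketch of reading the prediction, assuming the pipeline above was run: the predicted label for each document is stored in the `class` output column.
+
+```python
+# The class label the model assigned to each input document.
+pipelineDF.select("class.result").show(truncate = False)
+```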
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mentalroberta_4label_notes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/AliaeAI/MentalRoBERTa_4label_notes \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-model_router_en.md b/docs/_posts/ahmedlone127/2024-09-16-model_router_en.md new file mode 100644 index 00000000000000..ab6b6deebca594 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-model_router_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_router DistilBertForSequenceClassification from marklicata +author: John Snow Labs +name: model_router +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_router` is a English model originally trained by marklicata. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_router_en_5.5.0_3.0_1726506504525.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_router_en_5.5.0_3.0_1726506504525.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("model_router","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("model_router", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_router| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/marklicata/model_router \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-nerd_nerd_random1_seed1_twitter_roberta_base_2022_154m_en.md b/docs/_posts/ahmedlone127/2024-09-16-nerd_nerd_random1_seed1_twitter_roberta_base_2022_154m_en.md new file mode 100644 index 00000000000000..b4ac7d9a86b297 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-nerd_nerd_random1_seed1_twitter_roberta_base_2022_154m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nerd_nerd_random1_seed1_twitter_roberta_base_2022_154m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: nerd_nerd_random1_seed1_twitter_roberta_base_2022_154m +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerd_nerd_random1_seed1_twitter_roberta_base_2022_154m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerd_nerd_random1_seed1_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1726527132370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerd_nerd_random1_seed1_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1726527132370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("nerd_nerd_random1_seed1_twitter_roberta_base_2022_154m","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("nerd_nerd_random1_seed1_twitter_roberta_base_2022_154m", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerd_nerd_random1_seed1_twitter_roberta_base_2022_154m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/tweettemposhift/nerd-nerd_random1_seed1-twitter-roberta-base-2022-154m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-nofibot2_en.md b/docs/_posts/ahmedlone127/2024-09-16-nofibot2_en.md new file mode 100644 index 00000000000000..35552cbc9830c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-nofibot2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English nofibot2 DistilBertForQuestionAnswering from aslakeinbu +author: John Snow Labs +name: nofibot2 +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nofibot2` is a English model originally trained by aslakeinbu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nofibot2_en_5.5.0_3.0_1726515244749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nofibot2_en_5.5.0_3.0_1726515244749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("nofibot2","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols("question", "context")
+    .setOutputCols("document_question", "document_context")
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("nofibot2", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
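+
+The selected answer span is returned in the `answer` column set above. A minimal sketch for extracting it:
+
+```python
+# "answer.result" holds the answer text pulled from the context
+pipelineDF.select("answer.result").show(truncate=False)
+```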
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nofibot2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/aslakeinbu/nofibot2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_base_ailem_random_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_base_ailem_random_pipeline_en.md new file mode 100644 index 00000000000000..077c73e9506cbc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_base_ailem_random_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_base_ailem_random_pipeline pipeline MarianTransformer from ethansimrm +author: John Snow Labs +name: opus_base_ailem_random_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_base_ailem_random_pipeline` is a English model originally trained by ethansimrm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_base_ailem_random_pipeline_en_5.5.0_3.0_1726457143837.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_base_ailem_random_pipeline_en_5.5.0_3.0_1726457143837.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("opus_base_ailem_random_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("opus_base_ailem_random_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_base_ailem_random_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.0 MB| + +## References + +https://huggingface.co/ethansimrm/opus_base_ailem_random + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_base_wce_random_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_base_wce_random_pipeline_en.md new file mode 100644 index 00000000000000..43c46b41736e4c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_base_wce_random_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_base_wce_random_pipeline pipeline MarianTransformer from ethansimrm +author: John Snow Labs +name: opus_base_wce_random_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_base_wce_random_pipeline` is a English model originally trained by ethansimrm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_base_wce_random_pipeline_en_5.5.0_3.0_1726503060650.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_base_wce_random_pipeline_en_5.5.0_3.0_1726503060650.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("opus_base_wce_random_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("opus_base_wce_random_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_base_wce_random_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.0 MB| + +## References + +https://huggingface.co/ethansimrm/opus_base_wce_random + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_french_finetuned_english_tonga_tonga_islands_french_devaibest_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_french_finetuned_english_tonga_tonga_islands_french_devaibest_en.md new file mode 100644 index 00000000000000..b5f3f2d54d2dda --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_french_finetuned_english_tonga_tonga_islands_french_devaibest_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_french_finetuned_english_tonga_tonga_islands_french_devaibest MarianTransformer from DevAibest +author: John Snow Labs +name: opus_maltese_english_french_finetuned_english_tonga_tonga_islands_french_devaibest +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_french_finetuned_english_tonga_tonga_islands_french_devaibest` is a English model originally trained by DevAibest. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_french_finetuned_english_tonga_tonga_islands_french_devaibest_en_5.5.0_3.0_1726503434711.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_french_finetuned_english_tonga_tonga_islands_french_devaibest_en_5.5.0_3.0_1726503434711.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_french_finetuned_english_tonga_tonga_islands_french_devaibest","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_french_finetuned_english_tonga_tonga_islands_french_devaibest","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
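+
+The translated text is written to the `translation` column produced by the MarianTransformer stage above. A minimal sketch for reading it back:
+
+```python
+# One translated string per detected sentence
+pipelineDF.select("translation.result").show(truncate=False)
+```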
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_french_finetuned_english_tonga_tonga_islands_french_devaibest| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.4 MB| + +## References + +https://huggingface.co/DevAibest/opus-mt-en-fr-finetuned-en-to-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_ganda_finetuned_english_tonga_tonga_islands_ganda_finetuned_english_tonga_tonga_islands_lm_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_ganda_finetuned_english_tonga_tonga_islands_ganda_finetuned_english_tonga_tonga_islands_lm_en.md new file mode 100644 index 00000000000000..f22d7d49a62756 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_ganda_finetuned_english_tonga_tonga_islands_ganda_finetuned_english_tonga_tonga_islands_lm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_ganda_finetuned_english_tonga_tonga_islands_ganda_finetuned_english_tonga_tonga_islands_lm MarianTransformer from Eyesiga +author: John Snow Labs +name: opus_maltese_english_ganda_finetuned_english_tonga_tonga_islands_ganda_finetuned_english_tonga_tonga_islands_lm +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_ganda_finetuned_english_tonga_tonga_islands_ganda_finetuned_english_tonga_tonga_islands_lm` is a English model originally trained by Eyesiga. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_ganda_finetuned_english_tonga_tonga_islands_ganda_finetuned_english_tonga_tonga_islands_lm_en_5.5.0_3.0_1726457550952.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_ganda_finetuned_english_tonga_tonga_islands_ganda_finetuned_english_tonga_tonga_islands_lm_en_5.5.0_3.0_1726457550952.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_ganda_finetuned_english_tonga_tonga_islands_ganda_finetuned_english_tonga_tonga_islands_lm","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_ganda_finetuned_english_tonga_tonga_islands_ganda_finetuned_english_tonga_tonga_islands_lm","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_ganda_finetuned_english_tonga_tonga_islands_ganda_finetuned_english_tonga_tonga_islands_lm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|514.4 MB| + +## References + +https://huggingface.co/Eyesiga/opus-mt-en-lg-finetuned-en-to-lg-finetuned-en-to-lm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan_en.md new file mode 100644 index 00000000000000..0b80b3ebd99438 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan MarianTransformer from fxshan +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan` is a English model originally trained by fxshan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan_en_5.5.0_3.0_1726491579349.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan_en_5.5.0_3.0_1726491579349.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.5 MB| + +## References + +https://huggingface.co/fxshan/opus-mt-en-ro-finetuned-en-to-ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_polaris79_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_polaris79_pipeline_en.md new file mode 100644 index 00000000000000..652562dc721606 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_polaris79_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_polaris79_pipeline pipeline MarianTransformer from polaris79 +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_polaris79_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_polaris79_pipeline` is a English model originally trained by polaris79. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_polaris79_pipeline_en_5.5.0_3.0_1726457552561.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_polaris79_pipeline_en_5.5.0_3.0_1726457552561.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_polaris79_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_polaris79_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_polaris79_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.0 MB| + +## References + +https://huggingface.co/polaris79/opus-mt-en-ro-finetuned-en-to-ro + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp_en.md new file mode 100644 index 00000000000000..c50acd4220925a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp MarianTransformer from yeshanp +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp` is a English model originally trained by yeshanp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp_en_5.5.0_3.0_1726509926502.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp_en_5.5.0_3.0_1726509926502.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.6 MB| + +## References + +https://huggingface.co/yeshanp/opus-mt-en-ro-finetuned-en-to-ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_maq_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_maq_v2_pipeline_en.md new file mode 100644 index 00000000000000..f5c354c1091b6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_maq_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_maq_v2_pipeline pipeline MarianTransformer from mekjr1 +author: John Snow Labs +name: opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_maq_v2_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_maq_v2_pipeline` is a English model originally trained by mekjr1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_maq_v2_pipeline_en_5.5.0_3.0_1726457504703.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_maq_v2_pipeline_en_5.5.0_3.0_1726457504703.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_maq_v2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_maq_v2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_maq_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|540.4 MB| + +## References + +https://huggingface.co/mekjr1/opus-mt-en-es-finetuned-es-to-maq-v2 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2_en.md new file mode 100644 index 00000000000000..4484448c208065 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2 MarianTransformer from Culmenus +author: John Snow Labs +name: opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2 +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2` is a English model originally trained by Culmenus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2_en_5.5.0_3.0_1726465759767.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2_en_5.5.0_3.0_1726465759767.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.5 MB| + +## References + +https://huggingface.co/Culmenus/opus-mt-de-is-finetuned-de-to-is_nr2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2_pipeline_en.md new file mode 100644 index 00000000000000..5b930d680a64a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2_pipeline pipeline MarianTransformer from Culmenus +author: John Snow Labs +name: opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2_pipeline` is a English model originally trained by Culmenus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2_pipeline_en_5.5.0_3.0_1726465782743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2_pipeline_en_5.5.0_3.0_1726465782743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.1 MB| + +## References + +https://huggingface.co/Culmenus/opus-mt-de-is-finetuned-de-to-is_nr2 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_semitic_languages_english_finetuned_npomo_english_15_epochs_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_semitic_languages_english_finetuned_npomo_english_15_epochs_en.md new file mode 100644 index 00000000000000..5dd3264d387735 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_semitic_languages_english_finetuned_npomo_english_15_epochs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_semitic_languages_english_finetuned_npomo_english_15_epochs MarianTransformer from UnassumingOwl +author: John Snow Labs +name: opus_maltese_semitic_languages_english_finetuned_npomo_english_15_epochs +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_semitic_languages_english_finetuned_npomo_english_15_epochs` is a English model originally trained by UnassumingOwl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_semitic_languages_english_finetuned_npomo_english_15_epochs_en_5.5.0_3.0_1726491181668.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_semitic_languages_english_finetuned_npomo_english_15_epochs_en_5.5.0_3.0_1726491181668.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_semitic_languages_english_finetuned_npomo_english_15_epochs","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_semitic_languages_english_finetuned_npomo_english_15_epochs","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_semitic_languages_english_finetuned_npomo_english_15_epochs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|518.6 MB| + +## References + +https://huggingface.co/UnassumingOwl/opus-mt-sem-en-finetuned-npomo-en-15-epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_tc_big_finetuned_english_tonga_tonga_islands_french_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_tc_big_finetuned_english_tonga_tonga_islands_french_pipeline_en.md new file mode 100644 index 00000000000000..19d64db161a400 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_tc_big_finetuned_english_tonga_tonga_islands_french_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_tc_big_finetuned_english_tonga_tonga_islands_french_pipeline pipeline MarianTransformer from DevAibest +author: John Snow Labs +name: opus_maltese_tc_big_finetuned_english_tonga_tonga_islands_french_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_tc_big_finetuned_english_tonga_tonga_islands_french_pipeline` is a English model originally trained by DevAibest. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_tc_big_finetuned_english_tonga_tonga_islands_french_pipeline_en_5.5.0_3.0_1726494139744.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_tc_big_finetuned_english_tonga_tonga_islands_french_pipeline_en_5.5.0_3.0_1726494139744.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("opus_maltese_tc_big_finetuned_english_tonga_tonga_islands_french_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("opus_maltese_tc_big_finetuned_english_tonga_tonga_islands_french_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_tc_big_finetuned_english_tonga_tonga_islands_french_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/DevAibest/opus-mt-tc-big-finetuned-en-to-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_turkic_languages_english_finetuned_npomo_english_10_epochs_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_turkic_languages_english_finetuned_npomo_english_10_epochs_en.md new file mode 100644 index 00000000000000..381e71c6c4065a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_turkic_languages_english_finetuned_npomo_english_10_epochs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_turkic_languages_english_finetuned_npomo_english_10_epochs MarianTransformer from UnassumingOwl +author: John Snow Labs +name: opus_maltese_turkic_languages_english_finetuned_npomo_english_10_epochs +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_turkic_languages_english_finetuned_npomo_english_10_epochs` is a English model originally trained by UnassumingOwl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_turkic_languages_english_finetuned_npomo_english_10_epochs_en_5.5.0_3.0_1726456917479.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_turkic_languages_english_finetuned_npomo_english_10_epochs_en_5.5.0_3.0_1726456917479.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_turkic_languages_english_finetuned_npomo_english_10_epochs","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_turkic_languages_english_finetuned_npomo_english_10_epochs","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_turkic_languages_english_finetuned_npomo_english_10_epochs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|518.9 MB| + +## References + +https://huggingface.co/UnassumingOwl/opus-mt-trk-en-finetuned-npomo-en-10-epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-othe_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-othe_2_pipeline_en.md new file mode 100644 index 00000000000000..dfb2d80090707a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-othe_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English othe_2_pipeline pipeline RoBertaForSequenceClassification from BaronSch +author: John Snow Labs +name: othe_2_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`othe_2_pipeline` is a English model originally trained by BaronSch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/othe_2_pipeline_en_5.5.0_3.0_1726518924455.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/othe_2_pipeline_en_5.5.0_3.0_1726518924455.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("othe_2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("othe_2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|othe_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/BaronSch/Othe_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-personal_whisper_medium_english_model_en.md b/docs/_posts/ahmedlone127/2024-09-16-personal_whisper_medium_english_model_en.md new file mode 100644 index 00000000000000..e7c64feaabb96c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-personal_whisper_medium_english_model_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English personal_whisper_medium_english_model WhisperForCTC from fractalego +author: John Snow Labs +name: personal_whisper_medium_english_model +date: 2024-09-16 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`personal_whisper_medium_english_model` is a English model originally trained by fractalego. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/personal_whisper_medium_english_model_en_5.5.0_3.0_1726481837719.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/personal_whisper_medium_english_model_en_5.5.0_3.0_1726481837719.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# data: a DataFrame with raw audio samples (floats) in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("personal_whisper_medium_english_model","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// data: a DataFrame with raw audio samples (floats) in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("personal_whisper_medium_english_model", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
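+
+The example above assumes `data` already exists. A minimal sketch of one way to build it, assuming the audio file has been decoded beforehand into a Python list of float samples (`raw_floats` is hypothetical):
+
+```python
+# raw_floats: hypothetical list of float samples from a decoded mono audio file
+data = spark.createDataFrame([[raw_floats]]).toDF("audio_content")
+```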
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|personal_whisper_medium_english_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/fractalego/personal-whisper-medium.en-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-platzi_distilroberta_base_mrpc_glue_rafa_rivera_en.md b/docs/_posts/ahmedlone127/2024-09-16-platzi_distilroberta_base_mrpc_glue_rafa_rivera_en.md new file mode 100644 index 00000000000000..0d556261d2ed3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-platzi_distilroberta_base_mrpc_glue_rafa_rivera_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English platzi_distilroberta_base_mrpc_glue_rafa_rivera RoBertaForSequenceClassification from platzi +author: John Snow Labs +name: platzi_distilroberta_base_mrpc_glue_rafa_rivera +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_mrpc_glue_rafa_rivera` is a English model originally trained by platzi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_rafa_rivera_en_5.5.0_3.0_1726518905897.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_rafa_rivera_en_5.5.0_3.0_1726518905897.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_rafa_rivera","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_rafa_rivera", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
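Once the pipeline above has been fitted and applied, the predictions live in the `class` annotation column. A short sketch of reading them back (column names follow the snippet above):

```python
# Show each input text next to its predicted label.
pipelineDF.select("text", "class.result").show(truncate=False)

# Collect the labels as plain Python strings for downstream use.
labels = [row["result"][0] for row in pipelineDF.select("class.result").collect()]
print(labels)
```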
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_mrpc_glue_rafa_rivera| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/platzi/platzi-distilroberta-base-mrpc-glue-rafa-rivera \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-platzi_distilroberta_base_mrpc_glue_rafa_rivera_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-platzi_distilroberta_base_mrpc_glue_rafa_rivera_pipeline_en.md new file mode 100644 index 00000000000000..79226adba40990 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-platzi_distilroberta_base_mrpc_glue_rafa_rivera_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English platzi_distilroberta_base_mrpc_glue_rafa_rivera_pipeline pipeline RoBertaForSequenceClassification from platzi +author: John Snow Labs +name: platzi_distilroberta_base_mrpc_glue_rafa_rivera_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_mrpc_glue_rafa_rivera_pipeline` is a English model originally trained by platzi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_rafa_rivera_pipeline_en_5.5.0_3.0_1726518921630.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_rafa_rivera_pipeline_en_5.5.0_3.0_1726518921630.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("platzi_distilroberta_base_mrpc_glue_rafa_rivera_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("platzi_distilroberta_base_mrpc_glue_rafa_rivera_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_mrpc_glue_rafa_rivera_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/platzi/platzi-distilroberta-base-mrpc-glue-rafa-rivera + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-qnli_distilled_bart_cross_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-qnli_distilled_bart_cross_roberta_pipeline_en.md new file mode 100644 index 00000000000000..ca5dcc40ef658e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-qnli_distilled_bart_cross_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English qnli_distilled_bart_cross_roberta_pipeline pipeline RoBertaForSequenceClassification from Sayan01 +author: John Snow Labs +name: qnli_distilled_bart_cross_roberta_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qnli_distilled_bart_cross_roberta_pipeline` is a English model originally trained by Sayan01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qnli_distilled_bart_cross_roberta_pipeline_en_5.5.0_3.0_1726455704703.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qnli_distilled_bart_cross_roberta_pipeline_en_5.5.0_3.0_1726455704703.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("qnli_distilled_bart_cross_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("qnli_distilled_bart_cross_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qnli_distilled_bart_cross_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.1 MB| + +## References + +https://huggingface.co/Sayan01/qnli-distilled-bart-cross-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-quantifying_stereotype_distilbert_en.md b/docs/_posts/ahmedlone127/2024-09-16-quantifying_stereotype_distilbert_en.md new file mode 100644 index 00000000000000..98ff3fdba0fe9a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-quantifying_stereotype_distilbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English quantifying_stereotype_distilbert DistilBertForSequenceClassification from lauyon +author: John Snow Labs +name: quantifying_stereotype_distilbert +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`quantifying_stereotype_distilbert` is a English model originally trained by lauyon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/quantifying_stereotype_distilbert_en_5.5.0_3.0_1726525725842.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/quantifying_stereotype_distilbert_en_5.5.0_3.0_1726525725842.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("quantifying_stereotype_distilbert","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("quantifying_stereotype_distilbert", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|quantifying_stereotype_distilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/lauyon/quantifying-stereotype-distilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-results_metrics_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-results_metrics_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..a374f8ceed1ca2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-results_metrics_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English results_metrics_distilbert_pipeline pipeline DistilBertForSequenceClassification from vaishnavi514 +author: John Snow Labs +name: results_metrics_distilbert_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`results_metrics_distilbert_pipeline` is a English model originally trained by vaishnavi514. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/results_metrics_distilbert_pipeline_en_5.5.0_3.0_1726525319621.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/results_metrics_distilbert_pipeline_en_5.5.0_3.0_1726525319621.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("results_metrics_distilbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("results_metrics_distilbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|results_metrics_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/vaishnavi514/results_metrics_distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_augmented_finetuned_atis_1pct_v2_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_augmented_finetuned_atis_1pct_v2_en.md new file mode 100644 index 00000000000000..4dfab79782aa3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_augmented_finetuned_atis_1pct_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_augmented_finetuned_atis_1pct_v2 RoBertaForSequenceClassification from benayas +author: John Snow Labs +name: roberta_augmented_finetuned_atis_1pct_v2 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_augmented_finetuned_atis_1pct_v2` is a English model originally trained by benayas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_augmented_finetuned_atis_1pct_v2_en_5.5.0_3.0_1726470737739.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_augmented_finetuned_atis_1pct_v2_en_5.5.0_3.0_1726470737739.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_augmented_finetuned_atis_1pct_v2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_augmented_finetuned_atis_1pct_v2", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_augmented_finetuned_atis_1pct_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|426.4 MB| + +## References + +https://huggingface.co/benayas/roberta-augmented-finetuned-atis_1pct_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_fold_1_binary_v1_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_fold_1_binary_v1_en.md new file mode 100644 index 00000000000000..76dd51a4f87af7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_fold_1_binary_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_fold_1_binary_v1 RoBertaForSequenceClassification from elopezlopez +author: John Snow Labs +name: roberta_base_fold_1_binary_v1 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_fold_1_binary_v1` is a English model originally trained by elopezlopez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_fold_1_binary_v1_en_5.5.0_3.0_1726455398104.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_fold_1_binary_v1_en_5.5.0_3.0_1726455398104.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_fold_1_binary_v1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_fold_1_binary_v1", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_fold_1_binary_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|435.1 MB| + +## References + +https://huggingface.co/elopezlopez/roberta-base_fold_1_binary_v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_hoax_classifier_fulltext_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_hoax_classifier_fulltext_v1_pipeline_en.md new file mode 100644 index 00000000000000..da7f6bbd16b4ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_hoax_classifier_fulltext_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_hoax_classifier_fulltext_v1_pipeline pipeline RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_base_hoax_classifier_fulltext_v1_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_hoax_classifier_fulltext_v1_pipeline` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_hoax_classifier_fulltext_v1_pipeline_en_5.5.0_3.0_1726470349131.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_hoax_classifier_fulltext_v1_pipeline_en_5.5.0_3.0_1726470349131.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_hoax_classifier_fulltext_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_hoax_classifier_fulltext_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_hoax_classifier_fulltext_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|435.2 MB| + +## References + +https://huggingface.co/research-dump/roberta-base_hoax_classifier_fulltext_v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_squad_model1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_squad_model1_pipeline_en.md new file mode 100644 index 00000000000000..986c2de54de8b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_squad_model1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_base_squad_model1_pipeline pipeline RoBertaForQuestionAnswering from varun-v-rao +author: John Snow Labs +name: roberta_base_squad_model1_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_squad_model1_pipeline` is a English model originally trained by varun-v-rao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_squad_model1_pipeline_en_5.5.0_3.0_1726460724456.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_squad_model1_pipeline_en_5.5.0_3.0_1726460724456.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_squad_model1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_squad_model1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_squad_model1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|461.9 MB| + +## References + +https://huggingface.co/varun-v-rao/roberta-base-squad-model1 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_strict_2023_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_strict_2023_pipeline_en.md new file mode 100644 index 00000000000000..d022eda06d359b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_strict_2023_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_strict_2023_pipeline pipeline RoBertaEmbeddings from babylm +author: John Snow Labs +name: roberta_base_strict_2023_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_strict_2023_pipeline` is a English model originally trained by babylm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_strict_2023_pipeline_en_5.5.0_3.0_1726513672725.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_strict_2023_pipeline_en_5.5.0_3.0_1726513672725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_strict_2023_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_strict_2023_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_strict_2023_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.5 MB| + +## References + +https://huggingface.co/babylm/roberta-base-strict-2023 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_tweet_topic_single_2020_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_tweet_topic_single_2020_en.md new file mode 100644 index 00000000000000..a116356ed68ce4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_tweet_topic_single_2020_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_tweet_topic_single_2020 RoBertaForSequenceClassification from cardiffnlp +author: John Snow Labs +name: roberta_base_tweet_topic_single_2020 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_tweet_topic_single_2020` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_tweet_topic_single_2020_en_5.5.0_3.0_1726518139101.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_tweet_topic_single_2020_en_5.5.0_3.0_1726518139101.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_tweet_topic_single_2020","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_tweet_topic_single_2020", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_tweet_topic_single_2020| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|436.4 MB| + +## References + +https://huggingface.co/cardiffnlp/roberta-base-tweet-topic-single-2020 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_finetuned_subjqa_movies_2_mohamed13579_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_finetuned_subjqa_movies_2_mohamed13579_en.md new file mode 100644 index 00000000000000..7a2168c8e66ce4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_finetuned_subjqa_movies_2_mohamed13579_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_finetuned_subjqa_movies_2_mohamed13579 RoBertaForQuestionAnswering from mohamed13579 +author: John Snow Labs +name: roberta_finetuned_subjqa_movies_2_mohamed13579 +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_subjqa_movies_2_mohamed13579` is a English model originally trained by mohamed13579. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_mohamed13579_en_5.5.0_3.0_1726460339490.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_mohamed13579_en_5.5.0_3.0_1726460339490.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_finetuned_subjqa_movies_2_mohamed13579","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_finetuned_subjqa_movies_2_mohamed13579", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
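The extracted answer span is returned in the `answer` column. A brief sketch of reading it back, building on the snippet above:

```python
# Show the question alongside the predicted answer text.
pipelineDF.select("document_question.result", "answer.result").show(truncate=False)

# Or take the first answer as a plain string.
first_answer = pipelineDF.select("answer.result").first()["result"][0]
print(first_answer)
```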
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_subjqa_movies_2_mohamed13579| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/mohamed13579/roberta-finetuned-subjqa-movies_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_large_deletion_multiclass_complete_final_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_deletion_multiclass_complete_final_v2_pipeline_en.md new file mode 100644 index 00000000000000..a2555704f006b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_deletion_multiclass_complete_final_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_deletion_multiclass_complete_final_v2_pipeline pipeline RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_large_deletion_multiclass_complete_final_v2_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_deletion_multiclass_complete_final_v2_pipeline` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_deletion_multiclass_complete_final_v2_pipeline_en_5.5.0_3.0_1726527980899.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_deletion_multiclass_complete_final_v2_pipeline_en_5.5.0_3.0_1726527980899.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_deletion_multiclass_complete_final_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_deletion_multiclass_complete_final_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_deletion_multiclass_complete_final_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/research-dump/roberta-large_deletion_multiclass_complete_final_v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_large_depression_classification_v2_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_depression_classification_v2_en.md new file mode 100644 index 00000000000000..0fb538fc83fc6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_depression_classification_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_depression_classification_v2 RoBertaForSequenceClassification from Trong-Nghia +author: John Snow Labs +name: roberta_large_depression_classification_v2 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_depression_classification_v2` is a English model originally trained by Trong-Nghia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_depression_classification_v2_en_5.5.0_3.0_1726455935606.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_depression_classification_v2_en_5.5.0_3.0_1726455935606.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_depression_classification_v2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_depression_classification_v2", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
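Beyond the bare label, each `class` annotation also carries a metadata map; for the *ForSequenceClassification annotators this typically includes per-label confidence scores (hedged — inspect your own output to confirm the exact keys):

```python
# Peek at the annotation metadata, which usually holds per-label confidences.
pipelineDF.select("class.result", "class.metadata").show(truncate=False)
```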
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_depression_classification_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Trong-Nghia/roberta-large-depression-classification-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_large_go_emotions_v3_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_go_emotions_v3_en.md new file mode 100644 index 00000000000000..3ef13e1e1ae4ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_large_go_emotions_v3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_go_emotions_v3 RoBertaForSequenceClassification from Prasadrao +author: John Snow Labs +name: roberta_large_go_emotions_v3 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_go_emotions_v3` is a English model originally trained by Prasadrao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_go_emotions_v3_en_5.5.0_3.0_1726505105983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_go_emotions_v3_en_5.5.0_3.0_1726505105983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_go_emotions_v3","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_go_emotions_v3", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_go_emotions_v3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/Prasadrao/roberta-large-go-emotions_v3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_model_babylm_challenge_strict_small_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_model_babylm_challenge_strict_small_en.md new file mode 100644 index 00000000000000..9175f8fc7fb685 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_model_babylm_challenge_strict_small_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_model_babylm_challenge_strict_small RoBertaEmbeddings from TheBguy87 +author: John Snow Labs +name: roberta_model_babylm_challenge_strict_small +date: 2024-09-16 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_model_babylm_challenge_strict_small` is a English model originally trained by TheBguy87. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_model_babylm_challenge_strict_small_en_5.5.0_3.0_1726513849160.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_model_babylm_challenge_strict_small_en_5.5.0_3.0_1726513849160.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_model_babylm_challenge_strict_small","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_model_babylm_challenge_strict_small","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
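The `embeddings` output column holds one annotation per token, each carrying a vector. A hedged sketch of flattening the result into (token, vector) rows for downstream use:

```python
from pyspark.sql import functions as F

# One row per token: the token text is in the annotation's `result` field,
# the vector in its `embeddings` field.
exploded = pipelineDF.select(F.explode("embeddings").alias("emb")) \
    .select(F.col("emb.result").alias("token"), F.col("emb.embeddings").alias("vector"))
exploded.show(truncate=80)
```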
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_model_babylm_challenge_strict_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.5 MB| + +## References + +https://huggingface.co/TheBguy87/roBERTa-Model-BabyLM-Challenge-Strict-Small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_model_babylm_challenge_strict_small_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_model_babylm_challenge_strict_small_pipeline_en.md new file mode 100644 index 00000000000000..0ca7283dbf51f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_model_babylm_challenge_strict_small_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_model_babylm_challenge_strict_small_pipeline pipeline RoBertaEmbeddings from TheBguy87 +author: John Snow Labs +name: roberta_model_babylm_challenge_strict_small_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_model_babylm_challenge_strict_small_pipeline` is a English model originally trained by TheBguy87. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_model_babylm_challenge_strict_small_pipeline_en_5.5.0_3.0_1726513863531.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_model_babylm_challenge_strict_small_pipeline_en_5.5.0_3.0_1726513863531.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_model_babylm_challenge_strict_small_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_model_babylm_challenge_strict_small_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_model_babylm_challenge_strict_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.5 MB| + +## References + +https://huggingface.co/TheBguy87/roBERTa-Model-BabyLM-Challenge-Strict-Small + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_qa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_qa_pipeline_en.md new file mode 100644 index 00000000000000..dfd1b13b4ffc34 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_qa_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_qa_pipeline pipeline RoBertaForQuestionAnswering from vaibhav9 +author: John Snow Labs +name: roberta_qa_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_qa_pipeline` is a English model originally trained by vaibhav9. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_qa_pipeline_en_5.5.0_3.0_1726501722615.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_qa_pipeline_en_5.5.0_3.0_1726501722615.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_qa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_qa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_qa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.7 MB| + +## References + +https://huggingface.co/vaibhav9/roberta-qa + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_tweet_eval_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_tweet_eval_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..366bf93f3c4b6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_tweet_eval_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_tweet_eval_finetuned_pipeline pipeline RoBertaForSequenceClassification from cruiser +author: John Snow Labs +name: roberta_tweet_eval_finetuned_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tweet_eval_finetuned_pipeline` is a English model originally trained by cruiser. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tweet_eval_finetuned_pipeline_en_5.5.0_3.0_1726527328768.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tweet_eval_finetuned_pipeline_en_5.5.0_3.0_1726527328768.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_tweet_eval_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_tweet_eval_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tweet_eval_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|451.2 MB| + +## References + +https://huggingface.co/cruiser/roberta_tweet_eval_finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-rubert_base_cased_finetuned_squad_en.md b/docs/_posts/ahmedlone127/2024-09-16-rubert_base_cased_finetuned_squad_en.md new file mode 100644 index 00000000000000..61d3ab6de3928d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-rubert_base_cased_finetuned_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English rubert_base_cased_finetuned_squad BertForQuestionAnswering from KirrAno93 +author: John Snow Labs +name: rubert_base_cased_finetuned_squad +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_base_cased_finetuned_squad` is a English model originally trained by KirrAno93. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_base_cased_finetuned_squad_en_5.5.0_3.0_1726490008996.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_base_cased_finetuned_squad_en_5.5.0_3.0_1726490008996.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("rubert_base_cased_finetuned_squad","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("rubert_base_cased_finetuned_squad", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_base_cased_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|664.3 MB| + +## References + +https://huggingface.co/KirrAno93/rubert-base-cased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-salamathanksfil2env3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-salamathanksfil2env3_pipeline_en.md new file mode 100644 index 00000000000000..8221166ba0d5e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-salamathanksfil2env3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English salamathanksfil2env3_pipeline pipeline MarianTransformer from jimacasaet +author: John Snow Labs +name: salamathanksfil2env3_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`salamathanksfil2env3_pipeline` is a English model originally trained by jimacasaet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/salamathanksfil2env3_pipeline_en_5.5.0_3.0_1726491005569.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/salamathanksfil2env3_pipeline_en_5.5.0_3.0_1726491005569.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("salamathanksfil2env3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("salamathanksfil2env3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|salamathanksfil2env3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|497.1 MB| + +## References + +https://huggingface.co/jimacasaet/SalamaThanksFIL2ENv3 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-sent_bert_ancient_chinese_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-16-sent_bert_ancient_chinese_pipeline_zh.md new file mode 100644 index 00000000000000..dc5bb8eac5ac33 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-sent_bert_ancient_chinese_pipeline_zh.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Chinese sent_bert_ancient_chinese_pipeline pipeline BertSentenceEmbeddings from Jihuai +author: John Snow Labs +name: sent_bert_ancient_chinese_pipeline +date: 2024-09-16 +tags: [zh, open_source, pipeline, onnx] +task: Embeddings +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_ancient_chinese_pipeline` is a Chinese model originally trained by Jihuai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_ancient_chinese_pipeline_zh_5.5.0_3.0_1726500508221.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_ancient_chinese_pipeline_zh_5.5.0_3.0_1726500508221.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_ancient_chinese_pipeline", lang = "zh") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_ancient_chinese_pipeline", lang = "zh") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_ancient_chinese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|431.0 MB| + +## References + +https://huggingface.co/Jihuai/bert-ancient-chinese + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-sent_bert_next_word_prediction_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-sent_bert_next_word_prediction_pipeline_en.md new file mode 100644 index 00000000000000..3b3797c18bcea2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-sent_bert_next_word_prediction_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_next_word_prediction_pipeline pipeline BertSentenceEmbeddings from MattNandavong +author: John Snow Labs +name: sent_bert_next_word_prediction_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_next_word_prediction_pipeline` is a English model originally trained by MattNandavong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_next_word_prediction_pipeline_en_5.5.0_3.0_1726528724203.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_next_word_prediction_pipeline_en_5.5.0_3.0_1726528724203.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_next_word_prediction_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_next_word_prediction_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_next_word_prediction_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/MattNandavong/bert-next-word-prediction + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-sent_clinical_pubmed_bert_base_512_en.md b/docs/_posts/ahmedlone127/2024-09-16-sent_clinical_pubmed_bert_base_512_en.md new file mode 100644 index 00000000000000..dde4a8ed7625aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-sent_clinical_pubmed_bert_base_512_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_clinical_pubmed_bert_base_512 BertSentenceEmbeddings from Tsubasaz +author: John Snow Labs +name: sent_clinical_pubmed_bert_base_512 +date: 2024-09-16 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_clinical_pubmed_bert_base_512` is a English model originally trained by Tsubasaz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_clinical_pubmed_bert_base_512_en_5.5.0_3.0_1726501104628.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_clinical_pubmed_bert_base_512_en_5.5.0_3.0_1726501104628.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_clinical_pubmed_bert_base_512","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_clinical_pubmed_bert_base_512","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
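+
+The sentence embeddings produced above can be inspected by exploding the `embeddings` output column; each annotation carries one float vector per detected sentence (a minimal sketch using the variable names from the snippet above; the annotation field names follow the standard Spark NLP annotation schema):
+
+```python
+# One row per detected sentence, each with its embedding vector.
+pipelineDF.selectExpr("explode(embeddings) as annotation") \
+    .selectExpr("annotation.result as sentence", "annotation.embeddings as vector") \
+    .show(truncate = False)
+```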
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_clinical_pubmed_bert_base_512| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|408.0 MB| + +## References + +https://huggingface.co/Tsubasaz/clinical-pubmed-bert-base-512 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-sent_clinical_pubmed_bert_base_512_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-sent_clinical_pubmed_bert_base_512_pipeline_en.md new file mode 100644 index 00000000000000..aaa22ac48adc56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-sent_clinical_pubmed_bert_base_512_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_clinical_pubmed_bert_base_512_pipeline pipeline BertSentenceEmbeddings from Tsubasaz +author: John Snow Labs +name: sent_clinical_pubmed_bert_base_512_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_clinical_pubmed_bert_base_512_pipeline` is a English model originally trained by Tsubasaz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_clinical_pubmed_bert_base_512_pipeline_en_5.5.0_3.0_1726501123685.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_clinical_pubmed_bert_base_512_pipeline_en_5.5.0_3.0_1726501123685.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_clinical_pubmed_bert_base_512_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_clinical_pubmed_bert_base_512_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_clinical_pubmed_bert_base_512_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.6 MB| + +## References + +https://huggingface.co/Tsubasaz/clinical-pubmed-bert-base-512 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-sent_morrbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-sent_morrbert_pipeline_en.md new file mode 100644 index 00000000000000..bada496e53a54d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-sent_morrbert_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_morrbert_pipeline pipeline BertSentenceEmbeddings from otmangi +author: John Snow Labs +name: sent_morrbert_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_morrbert_pipeline` is a English model originally trained by otmangi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_morrbert_pipeline_en_5.5.0_3.0_1726522751201.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_morrbert_pipeline_en_5.5.0_3.0_1726522751201.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_morrbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_morrbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_morrbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|470.5 MB| + +## References + +https://huggingface.co/otmangi/MorrBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-sent_roberta_base_culinary_en.md b/docs/_posts/ahmedlone127/2024-09-16-sent_roberta_base_culinary_en.md new file mode 100644 index 00000000000000..c6b05eea683057 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-sent_roberta_base_culinary_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_roberta_base_culinary BertSentenceEmbeddings from juancavallotti +author: John Snow Labs +name: sent_roberta_base_culinary +date: 2024-09-16 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_roberta_base_culinary` is a English model originally trained by juancavallotti. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_roberta_base_culinary_en_5.5.0_3.0_1726528837898.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_roberta_base_culinary_en_5.5.0_3.0_1726528837898.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_roberta_base_culinary","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_roberta_base_culinary","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_roberta_base_culinary| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|405.5 MB| + +## References + +https://huggingface.co/juancavallotti/roberta-base-culinary \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-sentiment_analysis_on_covid_tweets_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-sentiment_analysis_on_covid_tweets_pipeline_en.md new file mode 100644 index 00000000000000..df3e5ff0e1772c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-sentiment_analysis_on_covid_tweets_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_on_covid_tweets_pipeline pipeline RoBertaForSequenceClassification from AmpomahChief +author: John Snow Labs +name: sentiment_analysis_on_covid_tweets_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_on_covid_tweets_pipeline` is a English model originally trained by AmpomahChief. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_on_covid_tweets_pipeline_en_5.5.0_3.0_1726456086917.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_on_covid_tweets_pipeline_en_5.5.0_3.0_1726456086917.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_analysis_on_covid_tweets_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_analysis_on_covid_tweets_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_on_covid_tweets_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/AmpomahChief/sentiment_analysis_on_covid_tweets + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-sentiment_vicd_en.md b/docs/_posts/ahmedlone127/2024-09-16-sentiment_vicd_en.md new file mode 100644 index 00000000000000..4de0c2cb8ced88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-sentiment_vicd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_vicd RoBertaForSequenceClassification from vicd +author: John Snow Labs +name: sentiment_vicd +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_vicd` is a English model originally trained by vicd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_vicd_en_5.5.0_3.0_1726456358417.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_vicd_en_5.5.0_3.0_1726456358417.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_vicd","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_vicd", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
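+
+After the pipeline runs, the predicted label can be read from the `class` output column (a minimal sketch using the variable names from the snippet above):
+
+```python
+# Show the input text next to its predicted class label.
+pipelineDF.select("text", "class.result").show(truncate = False)
+```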
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_vicd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|467.4 MB| + +## References + +https://huggingface.co/vicd/sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-suicide_distilbert_2_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-suicide_distilbert_2_5_pipeline_en.md new file mode 100644 index 00000000000000..4c363dfe8a0456 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-suicide_distilbert_2_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English suicide_distilbert_2_5_pipeline pipeline DistilBertForSequenceClassification from cuadron11 +author: John Snow Labs +name: suicide_distilbert_2_5_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`suicide_distilbert_2_5_pipeline` is a English model originally trained by cuadron11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/suicide_distilbert_2_5_pipeline_en_5.5.0_3.0_1726506828788.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/suicide_distilbert_2_5_pipeline_en_5.5.0_3.0_1726506828788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("suicide_distilbert_2_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("suicide_distilbert_2_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|suicide_distilbert_2_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cuadron11/suicide-distilbert-2-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-t_10005_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-t_10005_pipeline_en.md new file mode 100644 index 00000000000000..96d18b66f322da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-t_10005_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English t_10005_pipeline pipeline RoBertaForSequenceClassification from Pablojmed +author: John Snow Labs +name: t_10005_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t_10005_pipeline` is a English model originally trained by Pablojmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t_10005_pipeline_en_5.5.0_3.0_1726527961022.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t_10005_pipeline_en_5.5.0_3.0_1726527961022.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("t_10005_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("t_10005_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t_10005_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.2 MB| + +## References + +https://huggingface.co/Pablojmed/t_10005 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-test_trainer_allevelly_en.md b/docs/_posts/ahmedlone127/2024-09-16-test_trainer_allevelly_en.md new file mode 100644 index 00000000000000..f26fce57aa7810 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-test_trainer_allevelly_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_trainer_allevelly RoBertaForSequenceClassification from allevelly +author: John Snow Labs +name: test_trainer_allevelly +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_trainer_allevelly` is a English model originally trained by allevelly. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_trainer_allevelly_en_5.5.0_3.0_1726504460165.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_trainer_allevelly_en_5.5.0_3.0_1726504460165.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("test_trainer_allevelly","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("test_trainer_allevelly", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_trainer_allevelly| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/allevelly/test_trainer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-tner_xlm_roberta_base_uncased_all_english_finetuned_rte_en.md b/docs/_posts/ahmedlone127/2024-09-16-tner_xlm_roberta_base_uncased_all_english_finetuned_rte_en.md new file mode 100644 index 00000000000000..942312a089f63a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-tner_xlm_roberta_base_uncased_all_english_finetuned_rte_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tner_xlm_roberta_base_uncased_all_english_finetuned_rte XlmRoBertaForSequenceClassification from anamelchor +author: John Snow Labs +name: tner_xlm_roberta_base_uncased_all_english_finetuned_rte +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tner_xlm_roberta_base_uncased_all_english_finetuned_rte` is a English model originally trained by anamelchor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_uncased_all_english_finetuned_rte_en_5.5.0_3.0_1726516599245.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_uncased_all_english_finetuned_rte_en_5.5.0_3.0_1726516599245.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("tner_xlm_roberta_base_uncased_all_english_finetuned_rte","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("tner_xlm_roberta_base_uncased_all_english_finetuned_rte", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tner_xlm_roberta_base_uncased_all_english_finetuned_rte| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|828.1 MB| + +## References + +https://huggingface.co/anamelchor/tner-xlm-roberta-base-uncased-all-english-finetuned-rte \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-transformer_qa_with_batch_en.md b/docs/_posts/ahmedlone127/2024-09-16-transformer_qa_with_batch_en.md new file mode 100644 index 00000000000000..60457450af920f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-transformer_qa_with_batch_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English transformer_qa_with_batch DistilBertForQuestionAnswering from choichoi +author: John Snow Labs +name: transformer_qa_with_batch +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`transformer_qa_with_batch` is a English model originally trained by choichoi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/transformer_qa_with_batch_en_5.5.0_3.0_1726515321848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/transformer_qa_with_batch_en_5.5.0_3.0_1726515321848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("transformer_qa_with_batch","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols("question", "context")
+  .setOutputCols("document_question", "document_context")
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("transformer_qa_with_batch", "en")
+  .setInputCols(Array("document_question","document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|transformer_qa_with_batch| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/choichoi/transformer_qa_with_batch \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-translate_model_fixed_v0_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-translate_model_fixed_v0_3_pipeline_en.md new file mode 100644 index 00000000000000..e7901075793cd6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-translate_model_fixed_v0_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English translate_model_fixed_v0_3_pipeline pipeline MarianTransformer from gshields +author: John Snow Labs +name: translate_model_fixed_v0_3_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`translate_model_fixed_v0_3_pipeline` is a English model originally trained by gshields. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/translate_model_fixed_v0_3_pipeline_en_5.5.0_3.0_1726493796255.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/translate_model_fixed_v0_3_pipeline_en_5.5.0_3.0_1726493796255.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("translate_model_fixed_v0_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("translate_model_fixed_v0_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|translate_model_fixed_v0_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|523.4 MB| + +## References + +https://huggingface.co/gshields/translate_model_fixed_v0.3 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-translator_en.md b/docs/_posts/ahmedlone127/2024-09-16-translator_en.md new file mode 100644 index 00000000000000..5bb87dd4d893cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-translator_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English translator MarianTransformer from motmans-pj +author: John Snow Labs +name: translator +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`translator` is a English model originally trained by motmans-pj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/translator_en_5.5.0_3.0_1726491520646.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/translator_en_5.5.0_3.0_1726491520646.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("translator","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("translator","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
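+
+The translated text produced by the MarianTransformer stage lands in the `translation` output column (a minimal sketch using the variable names from the snippet above):
+
+```python
+# Show the translation for each detected sentence.
+pipelineDF.select("translation.result").show(truncate = False)
+```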
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|translator| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|548.9 MB| + +## References + +https://huggingface.co/motmans-pj/translator \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-translator_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-translator_pipeline_en.md new file mode 100644 index 00000000000000..d9d505ca2cc657 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-translator_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English translator_pipeline pipeline MarianTransformer from motmans-pj +author: John Snow Labs +name: translator_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`translator_pipeline` is a English model originally trained by motmans-pj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/translator_pipeline_en_5.5.0_3.0_1726491548220.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/translator_pipeline_en_5.5.0_3.0_1726491548220.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("translator_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("translator_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|translator_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|549.5 MB| + +## References + +https://huggingface.co/motmans-pj/translator + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-twiiter_try8_fold2_en.md b/docs/_posts/ahmedlone127/2024-09-16-twiiter_try8_fold2_en.md new file mode 100644 index 00000000000000..c70e9294e32c18 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-twiiter_try8_fold2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twiiter_try8_fold2 RoBertaForSequenceClassification from yanezh +author: John Snow Labs +name: twiiter_try8_fold2 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twiiter_try8_fold2` is a English model originally trained by yanezh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twiiter_try8_fold2_en_5.5.0_3.0_1726527391813.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twiiter_try8_fold2_en_5.5.0_3.0_1726527391813.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("twiiter_try8_fold2","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("twiiter_try8_fold2", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twiiter_try8_fold2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/yanezh/twiiter_try8_fold2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-twitter_roberta_base_sentiment_latest_nizar_sayad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-twitter_roberta_base_sentiment_latest_nizar_sayad_pipeline_en.md new file mode 100644 index 00000000000000..34107ab61b4aef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-twitter_roberta_base_sentiment_latest_nizar_sayad_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitter_roberta_base_sentiment_latest_nizar_sayad_pipeline pipeline RoBertaForSequenceClassification from nizar-sayad +author: John Snow Labs +name: twitter_roberta_base_sentiment_latest_nizar_sayad_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_sentiment_latest_nizar_sayad_pipeline` is a English model originally trained by nizar-sayad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_sentiment_latest_nizar_sayad_pipeline_en_5.5.0_3.0_1726470397384.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_sentiment_latest_nizar_sayad_pipeline_en_5.5.0_3.0_1726470397384.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("twitter_roberta_base_sentiment_latest_nizar_sayad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("twitter_roberta_base_sentiment_latest_nizar_sayad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_sentiment_latest_nizar_sayad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/nizar-sayad/twitter-roberta-base-sentiment-latest + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-vietnamese_whisper_small_en.md b/docs/_posts/ahmedlone127/2024-09-16-vietnamese_whisper_small_en.md new file mode 100644 index 00000000000000..69b164b9346f1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-vietnamese_whisper_small_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English vietnamese_whisper_small WhisperForCTC from DuyTa +author: John Snow Labs +name: vietnamese_whisper_small +date: 2024-09-16 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`vietnamese_whisper_small` is a English model originally trained by DuyTa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/vietnamese_whisper_small_en_5.5.0_3.0_1726485857648.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/vietnamese_whisper_small_en_5.5.0_3.0_1726485857648.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("vietnamese_whisper_small","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("vietnamese_whisper_small", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
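+
+The snippets above assume an existing DataFrame `data` whose `audio_content` column holds the raw audio samples as an array of floats. A minimal sketch of preparing such a DataFrame follows; loading the waveform with librosa at 16 kHz is an assumption here, not part of this model card.
+
+```python
+import librosa
+
+# Load a mono waveform at 16 kHz and convert it to a plain list of floats.
+waveform, _ = librosa.load("sample.wav", sr=16000)
+raw_floats = [float(x) for x in waveform]
+
+# One row with a single "audio_content" column of floats, as expected by AudioAssembler.
+data = spark.createDataFrame([[raw_floats]]).toDF("audio_content")
+```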
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|vietnamese_whisper_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/DuyTa/vi_whisper-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_medium_english_atco2_asr_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_medium_english_atco2_asr_en.md new file mode 100644 index 00000000000000..ebff425ae4636b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_medium_english_atco2_asr_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_medium_english_atco2_asr WhisperForCTC from jlvdoorn +author: John Snow Labs +name: whisper_medium_english_atco2_asr +date: 2024-09-16 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_english_atco2_asr` is a English model originally trained by jlvdoorn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_english_atco2_asr_en_5.5.0_3.0_1726481838556.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_english_atco2_asr_en_5.5.0_3.0_1726481838556.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_medium_english_atco2_asr","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_medium_english_atco2_asr", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_english_atco2_asr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/jlvdoorn/whisper-medium.en-atco2-asr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_small_divehi_victorbarra_pipeline_dv.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_divehi_victorbarra_pipeline_dv.md new file mode 100644 index 00000000000000..5a92cf7e984980 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_divehi_victorbarra_pipeline_dv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_victorbarra_pipeline pipeline WhisperForCTC from victorbarra +author: John Snow Labs +name: whisper_small_divehi_victorbarra_pipeline +date: 2024-09-16 +tags: [dv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_victorbarra_pipeline` is a Dhivehi, Divehi, Maldivian model originally trained by victorbarra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_victorbarra_pipeline_dv_5.5.0_3.0_1726478105688.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_victorbarra_pipeline_dv_5.5.0_3.0_1726478105688.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with the pipeline's raw input column ("audio_content")
pipeline = PretrainedPipeline("whisper_small_divehi_victorbarra_pipeline", lang = "dv")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with the pipeline's raw input column ("audio_content")
val pipeline = new PretrainedPipeline("whisper_small_divehi_victorbarra_pipeline", lang = "dv")
val annotations = pipeline.transform(df)

```
</div>
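
As a quick sanity check, the stages bundled in this pretrained pipeline (listed under "Included Models" below) can be inspected after download. This is a minimal sketch, assuming the `model` attribute of `PretrainedPipeline` exposes the underlying Spark `PipelineModel`:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("whisper_small_divehi_victorbarra_pipeline", lang = "dv")

# Print the class name of each stage (expected: AudioAssembler, WhisperForCTC).
for stage in pipeline.model.stages:
    print(type(stage).__name__)
```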
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_victorbarra_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/victorbarra/whisper-small-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_small_persian_farsi_benchmarkcentral_pipeline_fa.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_persian_farsi_benchmarkcentral_pipeline_fa.md new file mode 100644 index 00000000000000..ed55270fc70aff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_persian_farsi_benchmarkcentral_pipeline_fa.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Persian whisper_small_persian_farsi_benchmarkcentral_pipeline pipeline WhisperForCTC from benchmarkcentral +author: John Snow Labs +name: whisper_small_persian_farsi_benchmarkcentral_pipeline +date: 2024-09-16 +tags: [fa, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_persian_farsi_benchmarkcentral_pipeline` is a Persian model originally trained by benchmarkcentral. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_persian_farsi_benchmarkcentral_pipeline_fa_5.5.0_3.0_1726481274965.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_persian_farsi_benchmarkcentral_pipeline_fa_5.5.0_3.0_1726481274965.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with the pipeline's raw input column ("audio_content")
pipeline = PretrainedPipeline("whisper_small_persian_farsi_benchmarkcentral_pipeline", lang = "fa")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with the pipeline's raw input column ("audio_content")
val pipeline = new PretrainedPipeline("whisper_small_persian_farsi_benchmarkcentral_pipeline", lang = "fa")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_persian_farsi_benchmarkcentral_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fa| +|Size:|1.7 GB| + +## References + +https://huggingface.co/benchmarkcentral/whisper-small-fa + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_small_seiching_zh.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_seiching_zh.md new file mode 100644 index 00000000000000..b60736b3dfe19d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_seiching_zh.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Chinese whisper_small_seiching WhisperForCTC from seiching +author: John Snow Labs +name: whisper_small_seiching +date: 2024-09-16 +tags: [zh, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_seiching` is a Chinese model originally trained by seiching. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_seiching_zh_5.5.0_3.0_1726478635840.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_seiching_zh_5.5.0_3.0_1726478635840.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# data is assumed to be a DataFrame with a raw-audio float column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_seiching","zh") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// data is assumed to be a DataFrame with a raw-audio float column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_seiching", "zh")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_seiching| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|zh| +|Size:|1.7 GB| + +## References + +https://huggingface.co/seiching/whisper-small-seiching \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_bengali_rakib_pipeline_bn.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_bengali_rakib_pipeline_bn.md new file mode 100644 index 00000000000000..d746c243f36898 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_bengali_rakib_pipeline_bn.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Bengali whisper_tiny_bengali_rakib_pipeline pipeline WhisperForCTC from Rakib +author: John Snow Labs +name: whisper_tiny_bengali_rakib_pipeline +date: 2024-09-16 +tags: [bn, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_bengali_rakib_pipeline` is a Bengali model originally trained by Rakib. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_bengali_rakib_pipeline_bn_5.5.0_3.0_1726480094832.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_bengali_rakib_pipeline_bn_5.5.0_3.0_1726480094832.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with the pipeline's raw input column ("audio_content")
pipeline = PretrainedPipeline("whisper_tiny_bengali_rakib_pipeline", lang = "bn")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with the pipeline's raw input column ("audio_content")
val pipeline = new PretrainedPipeline("whisper_tiny_bengali_rakib_pipeline", lang = "bn")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_bengali_rakib_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bn| +|Size:|391.3 MB| + +## References + +https://huggingface.co/Rakib/whisper-tiny-bn + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_divehi_rajeshwari_ss_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_divehi_rajeshwari_ss_pipeline_en.md new file mode 100644 index 00000000000000..8f5aa1996ce814 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_divehi_rajeshwari_ss_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_divehi_rajeshwari_ss_pipeline pipeline WhisperForCTC from Rajeshwari-SS +author: John Snow Labs +name: whisper_tiny_divehi_rajeshwari_ss_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_divehi_rajeshwari_ss_pipeline` is a English model originally trained by Rajeshwari-SS. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_divehi_rajeshwari_ss_pipeline_en_5.5.0_3.0_1726486594722.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_divehi_rajeshwari_ss_pipeline_en_5.5.0_3.0_1726486594722.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with the pipeline's raw input column ("audio_content")
pipeline = PretrainedPipeline("whisper_tiny_divehi_rajeshwari_ss_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with the pipeline's raw input column ("audio_content")
val pipeline = new PretrainedPipeline("whisper_tiny_divehi_rajeshwari_ss_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_divehi_rajeshwari_ss_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/Rajeshwari-SS/whisper-tiny-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_english_us_sfedar_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_english_us_sfedar_en.md new file mode 100644 index 00000000000000..3140f5998cda9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_english_us_sfedar_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_english_us_sfedar WhisperForCTC from sfedar +author: John Snow Labs +name: whisper_tiny_english_us_sfedar +date: 2024-09-16 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_us_sfedar` is a English model originally trained by sfedar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_us_sfedar_en_5.5.0_3.0_1726478952211.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_us_sfedar_en_5.5.0_3.0_1726478952211.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# data is assumed to be a DataFrame with a raw-audio float column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_english_us_sfedar","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// data is assumed to be a DataFrame with a raw-audio float column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_english_us_sfedar", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_us_sfedar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/sfedar/whisper-tiny-en-US \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_polyai_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_polyai_pipeline_en.md new file mode 100644 index 00000000000000..111417edbaea79 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_polyai_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_polyai_pipeline pipeline WhisperForCTC from giocs2017 +author: John Snow Labs +name: whisper_tiny_polyai_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_polyai_pipeline` is a English model originally trained by giocs2017. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_polyai_pipeline_en_5.5.0_3.0_1726485521296.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_polyai_pipeline_en_5.5.0_3.0_1726485521296.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with the pipeline's raw input column ("audio_content")
pipeline = PretrainedPipeline("whisper_tiny_polyai_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with the pipeline's raw input column ("audio_content")
val pipeline = new PretrainedPipeline("whisper_tiny_polyai_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_polyai_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/giocs2017/whisper-tiny-polyai + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_smarthome_thai_th.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_smarthome_thai_th.md new file mode 100644 index 00000000000000..c32e1d30709cea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_smarthome_thai_th.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Thai whisper_tiny_smarthome_thai WhisperForCTC from Porameht +author: John Snow Labs +name: whisper_tiny_smarthome_thai +date: 2024-09-16 +tags: [th, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: th +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_smarthome_thai` is a Thai model originally trained by Porameht. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_smarthome_thai_th_5.5.0_3.0_1726480492974.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_smarthome_thai_th_5.5.0_3.0_1726480492974.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# data is assumed to be a DataFrame with a raw-audio float column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_smarthome_thai","th") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// data is assumed to be a DataFrame with a raw-audio float column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_smarthome_thai", "th")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_smarthome_thai| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|th| +|Size:|389.8 MB| + +## References + +https://huggingface.co/Porameht/whisper-tiny-smarthome-thai \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_spanish_zuazo_es.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_spanish_zuazo_es.md new file mode 100644 index 00000000000000..810bb03c69ba72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_spanish_zuazo_es.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Castilian, Spanish whisper_tiny_spanish_zuazo WhisperForCTC from zuazo +author: John Snow Labs +name: whisper_tiny_spanish_zuazo +date: 2024-09-16 +tags: [es, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_spanish_zuazo` is a Castilian, Spanish model originally trained by zuazo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_spanish_zuazo_es_5.5.0_3.0_1726486380554.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_spanish_zuazo_es_5.5.0_3.0_1726486380554.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# data is assumed to be a DataFrame with a raw-audio float column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_spanish_zuazo","es") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// data is assumed to be a DataFrame with a raw-audio float column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_spanish_zuazo", "es")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_spanish_zuazo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|es| +|Size:|390.5 MB| + +## References + +https://huggingface.co/zuazo/whisper-tiny-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_v4_small_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_v4_small_en.md new file mode 100644 index 00000000000000..530cb2a04d5e0e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_v4_small_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_v4_small WhisperForCTC from karinthommen +author: John Snow Labs +name: whisper_v4_small +date: 2024-09-16 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_v4_small` is a English model originally trained by karinthommen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_v4_small_en_5.5.0_3.0_1726488189979.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_v4_small_en_5.5.0_3.0_1726488189979.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# data is assumed to be a DataFrame with a raw-audio float column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_v4_small","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// data is assumed to be a DataFrame with a raw-audio float column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_v4_small", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_v4_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/karinthommen/whisper-V4-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-withinapps_ndd_addressbook_test_content_tags_cwadj_en.md b/docs/_posts/ahmedlone127/2024-09-16-withinapps_ndd_addressbook_test_content_tags_cwadj_en.md new file mode 100644 index 00000000000000..40eb154f68cf8d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-withinapps_ndd_addressbook_test_content_tags_cwadj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English withinapps_ndd_addressbook_test_content_tags_cwadj DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_addressbook_test_content_tags_cwadj +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_addressbook_test_content_tags_cwadj` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_addressbook_test_content_tags_cwadj_en_5.5.0_3.0_1726525645004.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_addressbook_test_content_tags_cwadj_en_5.5.0_3.0_1726525645004.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_addressbook_test_content_tags_cwadj","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_addressbook_test_content_tags_cwadj", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
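
After the pipeline runs, the predicted label sits in the `result` field of the `class` annotations. A minimal sketch of reading it back, using the column names from the example above:

```python
# Each row of "class" holds an array of annotations; `result` is the predicted label.
pipelineDF.select("text", "class.result").show(truncate=False)
```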
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_addressbook_test_content_tags_cwadj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-addressbook_test-content_tags-CWAdj \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_all_isaacp_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_all_isaacp_en.md new file mode 100644 index 00000000000000..f6f395cca5a401 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_all_isaacp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_isaacp XlmRoBertaForTokenClassification from Isaacp +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_isaacp +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_isaacp` is a English model originally trained by Isaacp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_isaacp_en_5.5.0_3.0_1726497949983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_isaacp_en_5.5.0_3.0_1726497949983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_isaacp","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_isaacp", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
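
The `ner` output column is aligned with the `token` column, so token texts and predicted IOB tags can be read side by side. A minimal sketch, using the column names from the example above:

```python
# token.result and ner.result are parallel arrays of token texts and IOB tags.
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```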
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_isaacp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/Isaacp/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_all_jhagege_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_all_jhagege_en.md new file mode 100644 index 00000000000000..6707c432b17d25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_all_jhagege_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_jhagege XlmRoBertaForTokenClassification from jhagege +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_jhagege +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_jhagege` is a English model originally trained by jhagege. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_jhagege_en_5.5.0_3.0_1726495880745.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_jhagege_en_5.5.0_3.0_1726495880745.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_jhagege","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_jhagege", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_jhagege| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|859.8 MB| + +## References + +https://huggingface.co/jhagege/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_english_alkampfer_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_english_alkampfer_en.md new file mode 100644 index 00000000000000..8b218cbc2b1592 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_english_alkampfer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_alkampfer XlmRoBertaForTokenClassification from alkampfer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_alkampfer +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_alkampfer` is a English model originally trained by alkampfer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_alkampfer_en_5.5.0_3.0_1726496317127.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_alkampfer_en_5.5.0_3.0_1726496317127.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_alkampfer","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_alkampfer", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_alkampfer| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/alkampfer/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_english_param_mehta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_english_param_mehta_pipeline_en.md new file mode 100644 index 00000000000000..074980dcd18426 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_english_param_mehta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_param_mehta_pipeline pipeline XlmRoBertaForTokenClassification from param-mehta +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_param_mehta_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_param_mehta_pipeline` is a English model originally trained by param-mehta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_param_mehta_pipeline_en_5.5.0_3.0_1726495548234.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_param_mehta_pipeline_en_5.5.0_3.0_1726495548234.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with the pipeline's raw input column ("text")
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_param_mehta_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with the pipeline's raw input column ("text")
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_param_mehta_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_param_mehta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|817.2 MB| + +## References + +https://huggingface.co/param-mehta/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_brouwer_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_brouwer_en.md new file mode 100644 index 00000000000000..045b77cec1fbae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_brouwer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_brouwer XlmRoBertaForTokenClassification from brouwer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_brouwer +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_brouwer` is a English model originally trained by brouwer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_brouwer_en_5.5.0_3.0_1726495909340.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_brouwer_en_5.5.0_3.0_1726495909340.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_brouwer","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_brouwer", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_brouwer| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|574.7 MB| + +## References + +https://huggingface.co/brouwer/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_francois2511_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_francois2511_en.md new file mode 100644 index 00000000000000..e203e8eaf5b381 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_francois2511_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_francois2511 XlmRoBertaForTokenClassification from Francois2511 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_francois2511 +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_francois2511` is a English model originally trained by Francois2511. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_francois2511_en_5.5.0_3.0_1726496995627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_francois2511_en_5.5.0_3.0_1726496995627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_francois2511","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_francois2511", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_francois2511| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/Francois2511/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_tyayoi_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_tyayoi_en.md new file mode 100644 index 00000000000000..17dc9d428d7b27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_tyayoi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_tyayoi XlmRoBertaForTokenClassification from tyayoi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_tyayoi +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_tyayoi` is a English model originally trained by tyayoi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_tyayoi_en_5.5.0_3.0_1726495575577.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_tyayoi_en_5.5.0_3.0_1726495575577.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_tyayoi","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_tyayoi", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_tyayoi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/tyayoi/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_tyayoi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_tyayoi_pipeline_en.md new file mode 100644 index 00000000000000..1c197a59f9a20e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_tyayoi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_tyayoi_pipeline pipeline XlmRoBertaForTokenClassification from tyayoi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_tyayoi_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_tyayoi_pipeline` is a English model originally trained by tyayoi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_tyayoi_pipeline_en_5.5.0_3.0_1726495659438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_tyayoi_pipeline_en_5.5.0_3.0_1726495659438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with the pipeline's raw input column ("text")
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_tyayoi_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with the pipeline's raw input column ("text")
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_tyayoi_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_tyayoi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/tyayoi/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_fyl1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_fyl1_pipeline_en.md new file mode 100644 index 00000000000000..a8778ce80cddf4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_fyl1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_fyl1_pipeline pipeline XlmRoBertaForTokenClassification from fyl1 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_fyl1_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_fyl1_pipeline` is a English model originally trained by fyl1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_fyl1_pipeline_en_5.5.0_3.0_1726496262153.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_fyl1_pipeline_en_5.5.0_3.0_1726496262153.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with the pipeline's raw input column ("text")
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_fyl1_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with the pipeline's raw input column ("text")
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_fyl1_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_fyl1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/fyl1/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_hr1588_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_hr1588_en.md new file mode 100644 index 00000000000000..415de6fe91c110 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_hr1588_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hr1588 XlmRoBertaForTokenClassification from hr1588 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hr1588 +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_hr1588` is a English model originally trained by hr1588. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hr1588_en_5.5.0_3.0_1726497791783.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hr1588_en_5.5.0_3.0_1726497791783.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hr1588","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hr1588", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
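
A minimal sketch (not part of the original card) for reading the predictions back, assuming the `pipelineDF` produced by the example above; the output column name `ner` matches the one set on the classifier:

```python
# Each annotation's "result" field holds the predicted entity tag for one token.
pipelineDF.selectExpr("explode(ner.result) as predicted_tag").show(truncate=False)
```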
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hr1588| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.4 MB| + +## References + +https://huggingface.co/hr1588/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_nadle_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_nadle_pipeline_en.md new file mode 100644 index 00000000000000..9da8884e7c7c6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_nadle_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_nadle_pipeline pipeline XlmRoBertaForTokenClassification from nadle +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_nadle_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_nadle_pipeline` is a English model originally trained by nadle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_nadle_pipeline_en_5.5.0_3.0_1726495342585.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_nadle_pipeline_en_5.5.0_3.0_1726495342585.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame holding the input text (typically in a column named "text")
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_nadle_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame holding the input text (typically in a column named "text")
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_nadle_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_nadle_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.9 MB| + +## References + +https://huggingface.co/nadle/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_conll2003_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_conll2003_pipeline_en.md new file mode 100644 index 00000000000000..5f893c0221f7e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_conll2003_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_conll2003_pipeline pipeline XlmRoBertaForTokenClassification from manirai91 +author: John Snow Labs +name: xlm_roberta_conll2003_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_conll2003_pipeline` is a English model originally trained by manirai91. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_conll2003_pipeline_en_5.5.0_3.0_1726495954275.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_conll2003_pipeline_en_5.5.0_3.0_1726495954275.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame holding the input text (typically in a column named "text")
pipeline = PretrainedPipeline("xlm_roberta_conll2003_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame holding the input text (typically in a column named "text")
val pipeline = new PretrainedPipeline("xlm_roberta_conll2003_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_conll2003_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.6 MB| + +## References + +https://huggingface.co/manirai91/xlm-roberta-conll2003 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_emotion_detector_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_emotion_detector_en.md new file mode 100644 index 00000000000000..56dce52dce5bb0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_emotion_detector_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_emotion_detector XlmRoBertaForSequenceClassification from fremy7 +author: John Snow Labs +name: xlm_roberta_emotion_detector +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_emotion_detector` is a English model originally trained by fremy7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_emotion_detector_en_5.5.0_3.0_1726516406866.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_emotion_detector_en_5.5.0_3.0_1726516406866.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_emotion_detector","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_emotion_detector", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
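
A minimal sketch (not part of the original card) for reading the predicted label back, assuming the `pipelineDF` produced by the example above; the output column name `class` matches the one set on the classifier:

```python
# "result" holds the predicted label for each input row.
pipelineDF.select("class.result").show(truncate=False)
```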
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_emotion_detector| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|816.5 MB| + +## References + +https://huggingface.co/fremy7/xlm_roberta_emotion_detector \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_finetuned_emojis_1_client_toxic_krum_non_iid_fed_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_finetuned_emojis_1_client_toxic_krum_non_iid_fed_en.md new file mode 100644 index 00000000000000..2ed1505b76bc6a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_finetuned_emojis_1_client_toxic_krum_non_iid_fed_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_finetuned_emojis_1_client_toxic_krum_non_iid_fed XlmRoBertaForSequenceClassification from Karim-Gamal +author: John Snow Labs +name: xlm_roberta_finetuned_emojis_1_client_toxic_krum_non_iid_fed +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_finetuned_emojis_1_client_toxic_krum_non_iid_fed` is a English model originally trained by Karim-Gamal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_1_client_toxic_krum_non_iid_fed_en_5.5.0_3.0_1726517115203.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_1_client_toxic_krum_non_iid_fed_en_5.5.0_3.0_1726517115203.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_finetuned_emojis_1_client_toxic_krum_non_iid_fed","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_finetuned_emojis_1_client_toxic_krum_non_iid_fed", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
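
A minimal sketch (not part of the original card) for reading the predicted label back, assuming the `pipelineDF` produced by the example above; the output column name `class` matches the one set on the classifier:

```python
# "result" holds the predicted label for each input row.
pipelineDF.select("class.result").show(truncate=False)
```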
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_finetuned_emojis_1_client_toxic_krum_non_iid_fed| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Karim-Gamal/XLM-Roberta-finetuned-emojis-1-client-toxic-Krum-non-IID-Fed \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-yelp_polarity_tuned_distilbert_base_10k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-yelp_polarity_tuned_distilbert_base_10k_pipeline_en.md new file mode 100644 index 00000000000000..fa5296c05021f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-yelp_polarity_tuned_distilbert_base_10k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English yelp_polarity_tuned_distilbert_base_10k_pipeline pipeline DistilBertForSequenceClassification from kbang2021 +author: John Snow Labs +name: yelp_polarity_tuned_distilbert_base_10k_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`yelp_polarity_tuned_distilbert_base_10k_pipeline` is a English model originally trained by kbang2021. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/yelp_polarity_tuned_distilbert_base_10k_pipeline_en_5.5.0_3.0_1726525432700.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/yelp_polarity_tuned_distilbert_base_10k_pipeline_en_5.5.0_3.0_1726525432700.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame holding the input text (typically in a column named "text")
pipeline = PretrainedPipeline("yelp_polarity_tuned_distilbert_base_10k_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame holding the input text (typically in a column named "text")
val pipeline = new PretrainedPipeline("yelp_polarity_tuned_distilbert_base_10k_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|yelp_polarity_tuned_distilbert_base_10k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/kbang2021/yelp_polarity_tuned_distilbert_base_10K + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-yt_special_batch8_tiny_en.md b/docs/_posts/ahmedlone127/2024-09-16-yt_special_batch8_tiny_en.md new file mode 100644 index 00000000000000..73d3f71d3e3138 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-yt_special_batch8_tiny_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English yt_special_batch8_tiny WhisperForCTC from TheRains +author: John Snow Labs +name: yt_special_batch8_tiny +date: 2024-09-16 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`yt_special_batch8_tiny` is a English model originally trained by TheRains. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/yt_special_batch8_tiny_en_5.5.0_3.0_1726483691434.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/yt_special_batch8_tiny_en_5.5.0_3.0_1726483691434.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("yt_special_batch8_tiny","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
# data is assumed to be a Spark DataFrame with raw audio floats in a column named "audio_content"
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("yt_special_batch8_tiny", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
// data is assumed to be a Spark DataFrame with raw audio floats in a column named "audio_content"
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
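
A minimal sketch (not part of the original card) for reading the transcription back, assuming the `pipelineDF` produced by the example above; the output column name `text` matches the one set on the annotator:

```python
# "result" holds the decoded transcription for each audio row.
pipelineDF.select("text.result").show(truncate=False)
```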
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|yt_special_batch8_tiny| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.8 MB| + +## References + +https://huggingface.co/TheRains/yt-special-batch8-tiny \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-09_distilbert_qa_pytorch_full_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-09_distilbert_qa_pytorch_full_pipeline_en.md new file mode 100644 index 00000000000000..8194d8d3de642d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-09_distilbert_qa_pytorch_full_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English 09_distilbert_qa_pytorch_full_pipeline pipeline DistilBertForQuestionAnswering from tyavika +author: John Snow Labs +name: 09_distilbert_qa_pytorch_full_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`09_distilbert_qa_pytorch_full_pipeline` is a English model originally trained by tyavika. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/09_distilbert_qa_pytorch_full_pipeline_en_5.5.0_3.0_1726574960075.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/09_distilbert_qa_pytorch_full_pipeline_en_5.5.0_3.0_1726574960075.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame providing the question and context columns the pipeline expects
pipeline = PretrainedPipeline("09_distilbert_qa_pytorch_full_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame providing the question and context columns the pipeline expects
val pipeline = new PretrainedPipeline("09_distilbert_qa_pytorch_full_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|09_distilbert_qa_pytorch_full_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/tyavika/09-Distilbert-QA-Pytorch-FULL + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-2020_q1_filtered_tweets_tok_en.md b/docs/_posts/ahmedlone127/2024-09-17-2020_q1_filtered_tweets_tok_en.md new file mode 100644 index 00000000000000..0b87d7faac4b73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-2020_q1_filtered_tweets_tok_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2020_q1_filtered_tweets_tok RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q1_filtered_tweets_tok +date: 2024-09-17 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q1_filtered_tweets_tok` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q1_filtered_tweets_tok_en_5.5.0_3.0_1726595146538.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q1_filtered_tweets_tok_en_5.5.0_3.0_1726595146538.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("2020_q1_filtered_tweets_tok","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("2020_q1_filtered_tweets_tok","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
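
A minimal sketch (not part of the original card) for inspecting the token vectors, assuming the `pipelineDF` produced by the example above and that each annotation in the `embeddings` column carries the token text in `result` and its vector in `embeddings`:

```python
# Explode to one row per token and show a single token with its vector.
pipelineDF.selectExpr("explode(embeddings) as emb") \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(1, truncate=False)
```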
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q1_filtered_tweets_tok| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.8 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q1-filtered_tweets_tok \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-all_roberta_large_v1_home_5_16_5_en.md b/docs/_posts/ahmedlone127/2024-09-17-all_roberta_large_v1_home_5_16_5_en.md new file mode 100644 index 00000000000000..acb5a7cde2d868 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-all_roberta_large_v1_home_5_16_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_home_5_16_5 RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_home_5_16_5 +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_home_5_16_5` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_home_5_16_5_en_5.5.0_3.0_1726591452708.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_home_5_16_5_en_5.5.0_3.0_1726591452708.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_home_5_16_5","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_home_5_16_5", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
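
A minimal sketch (not part of the original card) for reading the predicted label back, assuming the `pipelineDF` produced by the example above; the output column name `class` matches the one set on the classifier:

```python
# "result" holds the predicted label for each input row.
pipelineDF.select("class.result").show(truncate=False)
```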
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_home_5_16_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-home-5-16-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-all_roberta_large_v1_kitchen_and_dining_8_16_5_oos_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-all_roberta_large_v1_kitchen_and_dining_8_16_5_oos_pipeline_en.md new file mode 100644 index 00000000000000..62e67f25e30488 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-all_roberta_large_v1_kitchen_and_dining_8_16_5_oos_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English all_roberta_large_v1_kitchen_and_dining_8_16_5_oos_pipeline pipeline RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_kitchen_and_dining_8_16_5_oos_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_kitchen_and_dining_8_16_5_oos_pipeline` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_kitchen_and_dining_8_16_5_oos_pipeline_en_5.5.0_3.0_1726591972009.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_kitchen_and_dining_8_16_5_oos_pipeline_en_5.5.0_3.0_1726591972009.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame holding the input text (typically in a column named "text")
pipeline = PretrainedPipeline("all_roberta_large_v1_kitchen_and_dining_8_16_5_oos_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame holding the input text (typically in a column named "text")
val pipeline = new PretrainedPipeline("all_roberta_large_v1_kitchen_and_dining_8_16_5_oos_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_kitchen_and_dining_8_16_5_oos_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-kitchen_and_dining-8-16-5-oos + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-alvaro_marian_finetuned_italian_portuguese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-alvaro_marian_finetuned_italian_portuguese_pipeline_en.md new file mode 100644 index 00000000000000..ab35d1f00f9aeb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-alvaro_marian_finetuned_italian_portuguese_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English alvaro_marian_finetuned_italian_portuguese_pipeline pipeline MarianTransformer from Rooshan +author: John Snow Labs +name: alvaro_marian_finetuned_italian_portuguese_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`alvaro_marian_finetuned_italian_portuguese_pipeline` is a English model originally trained by Rooshan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/alvaro_marian_finetuned_italian_portuguese_pipeline_en_5.5.0_3.0_1726581713923.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/alvaro_marian_finetuned_italian_portuguese_pipeline_en_5.5.0_3.0_1726581713923.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame holding the input text (typically in a column named "text")
pipeline = PretrainedPipeline("alvaro_marian_finetuned_italian_portuguese_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame holding the input text (typically in a column named "text")
val pipeline = new PretrainedPipeline("alvaro_marian_finetuned_italian_portuguese_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|alvaro_marian_finetuned_italian_portuguese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|376.8 MB| + +## References + +https://huggingface.co/Rooshan/Alvaro-marian_finetuned_it_pt + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-argument_classification_ibm_spanish_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-argument_classification_ibm_spanish_roberta_pipeline_en.md new file mode 100644 index 00000000000000..c2ff2fb37dd04b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-argument_classification_ibm_spanish_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English argument_classification_ibm_spanish_roberta_pipeline pipeline RoBertaForSequenceClassification from anhuu +author: John Snow Labs +name: argument_classification_ibm_spanish_roberta_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`argument_classification_ibm_spanish_roberta_pipeline` is a English model originally trained by anhuu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/argument_classification_ibm_spanish_roberta_pipeline_en_5.5.0_3.0_1726591007566.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/argument_classification_ibm_spanish_roberta_pipeline_en_5.5.0_3.0_1726591007566.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame holding the input text (typically in a column named "text")
pipeline = PretrainedPipeline("argument_classification_ibm_spanish_roberta_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame holding the input text (typically in a column named "text")
val pipeline = new PretrainedPipeline("argument_classification_ibm_spanish_roberta_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|argument_classification_ibm_spanish_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|444.7 MB| + +## References + +https://huggingface.co/anhuu/argument_classification_ibm_es_roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-auro_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-auro_2_pipeline_en.md new file mode 100644 index 00000000000000..61a366e10d8174 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-auro_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English auro_2_pipeline pipeline RoBertaForSequenceClassification from BaronSch +author: John Snow Labs +name: auro_2_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`auro_2_pipeline` is a English model originally trained by BaronSch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/auro_2_pipeline_en_5.5.0_3.0_1726573332893.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/auro_2_pipeline_en_5.5.0_3.0_1726573332893.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame holding the input text (typically in a column named "text")
pipeline = PretrainedPipeline("auro_2_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame holding the input text (typically in a column named "text")
val pipeline = new PretrainedPipeline("auro_2_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|auro_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/BaronSch/AURO_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bert_base_multilingual_cased_finetuned_token_language_classification_xx.md b/docs/_posts/ahmedlone127/2024-09-17-bert_base_multilingual_cased_finetuned_token_language_classification_xx.md new file mode 100644 index 00000000000000..6b42ad75a3d07b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bert_base_multilingual_cased_finetuned_token_language_classification_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_finetuned_token_language_classification BertForTokenClassification from emmabedna +author: John Snow Labs +name: bert_base_multilingual_cased_finetuned_token_language_classification +date: 2024-09-17 +tags: [xx, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_finetuned_token_language_classification` is a Multilingual model originally trained by emmabedna. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_token_language_classification_xx_5.5.0_3.0_1726602232674.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_token_language_classification_xx_5.5.0_3.0_1726602232674.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = BertForTokenClassification.pretrained("bert_base_multilingual_cased_finetuned_token_language_classification","xx") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_base_multilingual_cased_finetuned_token_language_classification", "xx")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
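
A minimal sketch (not part of the original card) for reading the predictions back, assuming the `pipelineDF` produced by the example above; the output column name `ner` matches the one set on the classifier:

```python
# Each annotation's "result" field holds the predicted tag for one token.
pipelineDF.selectExpr("explode(ner.result) as predicted_tag").show(truncate=False)
```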
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_finetuned_token_language_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/emmabedna/bert-base-multilingual-cased-finetuned-token_language_classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bert_base_squad_v1_1_portuguese_ibama_v0_220240904175650_en.md b/docs/_posts/ahmedlone127/2024-09-17-bert_base_squad_v1_1_portuguese_ibama_v0_220240904175650_en.md new file mode 100644 index 00000000000000..cb510d49276de5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bert_base_squad_v1_1_portuguese_ibama_v0_220240904175650_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_220240904175650 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_220240904175650 +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_220240904175650` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240904175650_en_5.5.0_3.0_1726567378845.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240904175650_en_5.5.0_3.0_1726567378845.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_220240904175650","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols("question", "context")
    .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_220240904175650", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
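
A minimal sketch (not part of the original card) for reading the extracted answer back, assuming the `pipelineDF` produced by the example above; the output column name `answer` matches the one set on the annotator:

```python
# "result" holds the answer text predicted for each question/context pair.
pipelineDF.select("answer.result").show(truncate=False)
```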
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_220240904175650| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.220240904175650 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_pipeline_en.md new file mode 100644 index 00000000000000..d2ec89733a66e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_pipeline_en_5.5.0_3.0_1726545199415.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_pipeline_en_5.5.0_3.0_1726545199415.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame providing the question and context columns the pipeline expects
pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame providing the question and context columns the pipeline expects
val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.220240904182329 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bert_base_swedish_cased_sv2_en.md b/docs/_posts/ahmedlone127/2024-09-17-bert_base_swedish_cased_sv2_en.md new file mode 100644 index 00000000000000..4bf140224ab7fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bert_base_swedish_cased_sv2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_swedish_cased_sv2 BertForQuestionAnswering from monakth +author: John Snow Labs +name: bert_base_swedish_cased_sv2 +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_swedish_cased_sv2` is a English model originally trained by monakth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_swedish_cased_sv2_en_5.5.0_3.0_1726567388557.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_swedish_cased_sv2_en_5.5.0_3.0_1726567388557.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_swedish_cased_sv2","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols("question", "context")
    .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_swedish_cased_sv2", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
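
A minimal sketch (not part of the original card) for reading the extracted answer back, assuming the `pipelineDF` produced by the example above; the output column name `answer` matches the one set on the annotator:

```python
# "result" holds the answer text predicted for each question/context pair.
pipelineDF.select("answer.result").show(truncate=False)
```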
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_swedish_cased_sv2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|465.2 MB| + +## References + +https://huggingface.co/monakth/bert-base-swedish-cased-sv2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_ep_10_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_1000_en.md b/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_ep_10_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_1000_en.md new file mode 100644 index 00000000000000..b7d6543d5cd6cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_ep_10_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_1000_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_10_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_1000 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_10_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_1000 +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_10_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_1000` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_10_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_1000_en_5.5.0_3.0_1726567649974.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_10_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_1000_en_5.5.0_3.0_1726567649974.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_10_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_1000","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols("question", "context")
    .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_10_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_1000", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
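
A minimal sketch (not part of the original card) for reading the extracted answer back, assuming the `pipelineDF` produced by the example above; the output column name `answer` matches the one set on the annotator:

```python
# "result" holds the answer text predicted for each question/context pair.
pipelineDF.select("answer.result").show(truncate=False)
```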
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_10_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-10.0-b-32-lr-8e-07-dp-0.5-ss-0-st-False-fh-False-hs-1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_ep_2_69_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_ep_2_69_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..4e6491310e010f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_ep_2_69_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_2_69_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_2_69_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_2_69_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_69_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1726589908256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_69_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1726589908256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame providing the question and context columns the pipeline expects
pipeline = PretrainedPipeline("bert_base_uncased_ep_2_69_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame providing the question and context columns the pipeline expects
val pipeline = new PretrainedPipeline("bert_base_uncased_ep_2_69_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_2_69_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-2.69-b-32-lr-1.2e-06-dp-0.3-ss-0-st-True-fh-False-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_false_fh_false_hs_700_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_false_fh_false_hs_700_pipeline_en.md new file mode 100644 index 00000000000000..25609a82d0bdb1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_false_fh_false_hs_700_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_false_fh_false_hs_700_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_false_fh_false_hs_700_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_false_fh_false_hs_700_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_false_fh_false_hs_700_pipeline_en_5.5.0_3.0_1726567469175.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_false_fh_false_hs_700_pipeline_en_5.5.0_3.0_1726567469175.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_false_fh_false_hs_700_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_false_fh_false_hs_700_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_false_fh_false_hs_700_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-2.69-b-32-lr-4e-06-dp-0.1-ss-0-st-False-fh-False-hs-700 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_qnli_modeltc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_qnli_modeltc_pipeline_en.md new file mode 100644 index 00000000000000..950b3dee50c23e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_qnli_modeltc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_qnli_modeltc_pipeline pipeline BertForSequenceClassification from ModelTC +author: John Snow Labs +name: bert_base_uncased_qnli_modeltc_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_qnli_modeltc_pipeline` is a English model originally trained by ModelTC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qnli_modeltc_pipeline_en_5.5.0_3.0_1726604380481.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qnli_modeltc_pipeline_en_5.5.0_3.0_1726604380481.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_qnli_modeltc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_qnli_modeltc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_qnli_modeltc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ModelTC/bert-base-uncased-qnli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bert_classifier_sead_l_6_h_256_a_8_mrpc_en.md b/docs/_posts/ahmedlone127/2024-09-17-bert_classifier_sead_l_6_h_256_a_8_mrpc_en.md new file mode 100644 index 00000000000000..43250338458527 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bert_classifier_sead_l_6_h_256_a_8_mrpc_en.md @@ -0,0 +1,111 @@ +--- +layout: model +title: English BertForSequenceClassification Cased model (from course5i) +author: John Snow Labs +name: bert_classifier_sead_l_6_h_256_a_8_mrpc +date: 2024-09-17 +tags: [en, open_source, bert, sequence_classification, classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `SEAD-L-6_H-256_A-8-mrpc` is a English model originally trained by `course5i`. + +## Predicted Entities + +`0`, `1` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_classifier_sead_l_6_h_256_a_8_mrpc_en_5.5.0_3.0_1726604598811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_classifier_sead_l_6_h_256_a_8_mrpc_en_5.5.0_3.0_1726604598811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

seq_classifier = BertForSequenceClassification.pretrained("bert_classifier_sead_l_6_h_256_a_8_mrpc","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline(stages=[documentAssembler, tokenizer, seq_classifier])

data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")

result = pipeline.fit(data).transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val seq_classifier = BertForSequenceClassification.pretrained("bert_classifier_sead_l_6_h_256_a_8_mrpc","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, seq_classifier))

val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
```

{:.nlu-block}
```python
import nlu
nlu.load("en.classify.bert.glue.6l_256d_a8a_256d").predict("""PUT YOUR STRING HERE""")
```
</div>
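
One way to inspect the predicted label for each input row after running the pipeline above (the `text` and `class` column names follow the example and the Output Labels table below):

```python
# Show each input text next to its predicted class label
result.select("text", "class.result").show(truncate = False)
```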
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_classifier_sead_l_6_h_256_a_8_mrpc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|47.3 MB| + +## References + +References + +- https://huggingface.co/course5i/SEAD-L-6_H-256_A-8-mrpc +- https://arxiv.org/abs/1910.01108 +- https://arxiv.org/abs/1909.10351 +- https://arxiv.org/abs/2002.10957 +- https://arxiv.org/abs/1810.04805 +- https://arxiv.org/abs/1804.07461 +- https://arxiv.org/abs/1905.00537 +- https://www.adasci.org/journals/lattice-35309407/?volumes=true&open=621a3b18edc4364e8a96cb63 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bert_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-bert_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..a559c9f6403b6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bert_finetuned_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_finetuned_pipeline pipeline DistilBertForQuestionAnswering from harshil30402 +author: John Snow Labs +name: bert_finetuned_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_pipeline` is a English model originally trained by harshil30402. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_pipeline_en_5.5.0_3.0_1726555256509.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_pipeline_en_5.5.0_3.0_1726555256509.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/harshil30402/bert_finetuned + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bert_gemma2b_sanity_vllm_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-bert_gemma2b_sanity_vllm_0_pipeline_en.md new file mode 100644 index 00000000000000..5ca98f28a0f3f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bert_gemma2b_sanity_vllm_0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_gemma2b_sanity_vllm_0_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_gemma2b_sanity_vllm_0_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_gemma2b_sanity_vllm_0_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_gemma2b_sanity_vllm_0_pipeline_en_5.5.0_3.0_1726584893370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_gemma2b_sanity_vllm_0_pipeline_en_5.5.0_3.0_1726584893370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_gemma2b_sanity_vllm_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_gemma2b_sanity_vllm_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_gemma2b_sanity_vllm_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_gemma2b-sanity-vllm_0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bert_milei_es.md b/docs/_posts/ahmedlone127/2024-09-17-bert_milei_es.md new file mode 100644 index 00000000000000..34c52d5d4a42ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bert_milei_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bert_milei XlmRoBertaForSequenceClassification from nmarinnn +author: John Snow Labs +name: bert_milei +date: 2024-09-17 +tags: [es, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_milei` is a Castilian, Spanish model originally trained by nmarinnn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_milei_es_5.5.0_3.0_1726535229412.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_milei_es_5.5.0_3.0_1726535229412.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("bert_milei","es") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("bert_milei", "es")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_milei| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|es| +|Size:|1.0 GB| + +## References + +https://huggingface.co/nmarinnn/bert-milei \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bert_milei_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-17-bert_milei_pipeline_es.md new file mode 100644 index 00000000000000..fb6c22e9527257 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bert_milei_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bert_milei_pipeline pipeline XlmRoBertaForSequenceClassification from nmarinnn +author: John Snow Labs +name: bert_milei_pipeline +date: 2024-09-17 +tags: [es, open_source, pipeline, onnx] +task: Text Classification +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_milei_pipeline` is a Castilian, Spanish model originally trained by nmarinnn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_milei_pipeline_es_5.5.0_3.0_1726535278930.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_milei_pipeline_es_5.5.0_3.0_1726535278930.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_milei_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_milei_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_milei_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|1.0 GB| + +## References + +https://huggingface.co/nmarinnn/bert-milei + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-brwac_v1_4__checkpoint4_en.md b/docs/_posts/ahmedlone127/2024-09-17-brwac_v1_4__checkpoint4_en.md new file mode 100644 index 00000000000000..483482e06d09ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-brwac_v1_4__checkpoint4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English brwac_v1_4__checkpoint4 RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: brwac_v1_4__checkpoint4 +date: 2024-09-17 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`brwac_v1_4__checkpoint4` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/brwac_v1_4__checkpoint4_en_5.5.0_3.0_1726603018682.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/brwac_v1_4__checkpoint4_en_5.5.0_3.0_1726603018682.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("brwac_v1_4__checkpoint4","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("brwac_v1_4__checkpoint4","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
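
If the token embeddings need to feed downstream Spark ML stages, an EmbeddingsFinisher can be appended to the pipeline above to convert the annotations into plain vectors. A minimal sketch, reusing the stage and column names from the Python example:

```python
from sparknlp.base import EmbeddingsFinisher

# Convert the "embeddings" annotations into Spark ML vectors
embeddingsFinisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols("finished_embeddings") \
    .setOutputAsVector(True)

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, embeddingsFinisher])
pipeline.fit(data).transform(data) \
    .selectExpr("explode(finished_embeddings) as embedding_vector") \
    .show(truncate = False)
```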
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|brwac_v1_4__checkpoint4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|298.0 MB| + +## References + +https://huggingface.co/eduagarcia-temp/brwac_v1_4__checkpoint4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bsc_bio_ehr_spanish_carmen_procedimiento_es.md b/docs/_posts/ahmedlone127/2024-09-17-bsc_bio_ehr_spanish_carmen_procedimiento_es.md new file mode 100644 index 00000000000000..eca205edd0cbeb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bsc_bio_ehr_spanish_carmen_procedimiento_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_carmen_procedimiento RoBertaForTokenClassification from BSC-NLP4BIA +author: John Snow Labs +name: bsc_bio_ehr_spanish_carmen_procedimiento +date: 2024-09-17 +tags: [es, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_carmen_procedimiento` is a Castilian, Spanish model originally trained by BSC-NLP4BIA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_carmen_procedimiento_es_5.5.0_3.0_1726538136538.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_carmen_procedimiento_es_5.5.0_3.0_1726538136538.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_carmen_procedimiento","es") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_carmen_procedimiento", "es")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
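
To group the token-level tags in the `ner` column into entity chunks, a NerConverter stage can be appended. This is a sketch that reuses the stage names from the Python example and assumes the model emits IOB/IOB2-style tags:

```python
from sparknlp.annotator import NerConverter

# Assemble IOB-tagged tokens into entity chunks
nerConverter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, nerConverter])
pipeline.fit(data).transform(data).select("ner_chunk.result").show(truncate = False)
```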
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_carmen_procedimiento| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|437.6 MB| + +## References + +https://huggingface.co/BSC-NLP4BIA/bsc-bio-ehr-es-carmen-procedimiento \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bsc_bio_ehr_spanish_carmen_procedimiento_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-17-bsc_bio_ehr_spanish_carmen_procedimiento_pipeline_es.md new file mode 100644 index 00000000000000..4ee119413ccc8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bsc_bio_ehr_spanish_carmen_procedimiento_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_carmen_procedimiento_pipeline pipeline RoBertaForTokenClassification from BSC-NLP4BIA +author: John Snow Labs +name: bsc_bio_ehr_spanish_carmen_procedimiento_pipeline +date: 2024-09-17 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_carmen_procedimiento_pipeline` is a Castilian, Spanish model originally trained by BSC-NLP4BIA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_carmen_procedimiento_pipeline_es_5.5.0_3.0_1726538162055.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_carmen_procedimiento_pipeline_es_5.5.0_3.0_1726538162055.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_carmen_procedimiento_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_carmen_procedimiento_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_carmen_procedimiento_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|437.7 MB| + +## References + +https://huggingface.co/BSC-NLP4BIA/bsc-bio-ehr-es-carmen-procedimiento + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bsc_bio_ehr_spanish_livingner1_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-17-bsc_bio_ehr_spanish_livingner1_pipeline_es.md new file mode 100644 index 00000000000000..4ef515fb61e67d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bsc_bio_ehr_spanish_livingner1_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_livingner1_pipeline pipeline RoBertaForTokenClassification from IIC +author: John Snow Labs +name: bsc_bio_ehr_spanish_livingner1_pipeline +date: 2024-09-17 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_livingner1_pipeline` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_livingner1_pipeline_es_5.5.0_3.0_1726538175580.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_livingner1_pipeline_es_5.5.0_3.0_1726538175580.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_livingner1_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_livingner1_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_livingner1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|437.9 MB| + +## References + +https://huggingface.co/IIC/bsc-bio-ehr-es-livingner1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bsc_bio_ehr_spanish_symptemist_word2vec_8_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-bsc_bio_ehr_spanish_symptemist_word2vec_8_ner_pipeline_en.md new file mode 100644 index 00000000000000..ba83821d85867c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bsc_bio_ehr_spanish_symptemist_word2vec_8_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bsc_bio_ehr_spanish_symptemist_word2vec_8_ner_pipeline pipeline RoBertaForTokenClassification from Rodrigo1771 +author: John Snow Labs +name: bsc_bio_ehr_spanish_symptemist_word2vec_8_ner_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_symptemist_word2vec_8_ner_pipeline` is a English model originally trained by Rodrigo1771. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_word2vec_8_ner_pipeline_en_5.5.0_3.0_1726538154429.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_word2vec_8_ner_pipeline_en_5.5.0_3.0_1726538154429.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_symptemist_word2vec_8_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_symptemist_word2vec_8_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_symptemist_word2vec_8_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|435.1 MB| + +## References + +https://huggingface.co/Rodrigo1771/bsc-bio-ehr-es-symptemist-word2vec-8-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_model_katxtong_en.md b/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_model_katxtong_en.md new file mode 100644 index 00000000000000..6965434c838498 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_model_katxtong_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_model_katxtong DistilBertForQuestionAnswering from katxtong +author: John Snow Labs +name: burmese_awesome_model_katxtong +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_katxtong` is a English model originally trained by katxtong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_katxtong_en_5.5.0_3.0_1726575010959.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_katxtong_en_5.5.0_3.0_1726575010959.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_model_katxtong","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_model_katxtong", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
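
A quick way to read off the predicted answer span after running the pipeline above (the `answer` column name matches the Output Labels table below):

```python
# Show the extracted answer for each question/context pair
pipelineDF.select("answer.result").show(truncate = False)
```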
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_katxtong| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/katxtong/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_model_massiaz_en.md b/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_model_massiaz_en.md new file mode 100644 index 00000000000000..d8fcfb27cf621c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_model_massiaz_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_massiaz DistilBertForSequenceClassification from massiaz +author: John Snow Labs +name: burmese_awesome_model_massiaz +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_massiaz` is a English model originally trained by massiaz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_massiaz_en_5.5.0_3.0_1726584285388.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_massiaz_en_5.5.0_3.0_1726584285388.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_massiaz","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_massiaz", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_massiaz| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/massiaz/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_model_yeshiovo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_model_yeshiovo_pipeline_en.md new file mode 100644 index 00000000000000..2de3bd366ae22f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_model_yeshiovo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_yeshiovo_pipeline pipeline DistilBertForSequenceClassification from yeshiovo +author: John Snow Labs +name: burmese_awesome_model_yeshiovo_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_yeshiovo_pipeline` is a English model originally trained by yeshiovo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_yeshiovo_pipeline_en_5.5.0_3.0_1726593466497.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_yeshiovo_pipeline_en_5.5.0_3.0_1726593466497.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_yeshiovo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_yeshiovo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_yeshiovo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yeshiovo/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_qa_model_meziane_en.md b/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_qa_model_meziane_en.md new file mode 100644 index 00000000000000..5d796225e71b15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_qa_model_meziane_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_meziane DistilBertForQuestionAnswering from Meziane +author: John Snow Labs +name: burmese_awesome_qa_model_meziane +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_meziane` is a English model originally trained by Meziane. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_meziane_en_5.5.0_3.0_1726600018002.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_meziane_en_5.5.0_3.0_1726600018002.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_meziane","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_meziane", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_meziane| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Meziane/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_qa_model_naitik370_en.md b/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_qa_model_naitik370_en.md new file mode 100644 index 00000000000000..9fbbccc2bce2f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_qa_model_naitik370_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_naitik370 DistilBertForQuestionAnswering from Naitik370 +author: John Snow Labs +name: burmese_awesome_qa_model_naitik370 +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_naitik370` is a English model originally trained by Naitik370. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_naitik370_en_5.5.0_3.0_1726574642188.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_naitik370_en_5.5.0_3.0_1726574642188.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_naitik370","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_naitik370", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_naitik370| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Naitik370/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-burmese_translation_helsinki2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-burmese_translation_helsinki2_pipeline_en.md new file mode 100644 index 00000000000000..850175b8fdf6de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-burmese_translation_helsinki2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_translation_helsinki2_pipeline pipeline MarianTransformer from duwuonline +author: John Snow Labs +name: burmese_translation_helsinki2_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_translation_helsinki2_pipeline` is a English model originally trained by duwuonline. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_translation_helsinki2_pipeline_en_5.5.0_3.0_1726532850843.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_translation_helsinki2_pipeline_en_5.5.0_3.0_1726532850843.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_translation_helsinki2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_translation_helsinki2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
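
For quick experiments, a pretrained pipeline can also be called on a single string with `annotate`, which returns a dictionary keyed by the pipeline's output columns. A sketch (the exact keys depend on the stages listed under Included Models below):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("burmese_translation_helsinki2_pipeline", lang = "en")

# annotate() returns {output_column_name: [results]}
result = pipeline.annotate("I love Spark NLP.")
print(result.keys())
print(result)
```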
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_translation_helsinki2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|475.3 MB| + +## References + +https://huggingface.co/duwuonline/my-translation-helsinki2 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-ca3109_movie_genre_classification_from_keywords_en.md b/docs/_posts/ahmedlone127/2024-09-17-ca3109_movie_genre_classification_from_keywords_en.md new file mode 100644 index 00000000000000..9a86d1e70540ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-ca3109_movie_genre_classification_from_keywords_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ca3109_movie_genre_classification_from_keywords DistilBertForSequenceClassification from JordanTallon +author: John Snow Labs +name: ca3109_movie_genre_classification_from_keywords +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ca3109_movie_genre_classification_from_keywords` is a English model originally trained by JordanTallon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ca3109_movie_genre_classification_from_keywords_en_5.5.0_3.0_1726594072196.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ca3109_movie_genre_classification_from_keywords_en_5.5.0_3.0_1726594072196.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("ca3109_movie_genre_classification_from_keywords","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("ca3109_movie_genre_classification_from_keywords", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ca3109_movie_genre_classification_from_keywords| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/JordanTallon/CA3109-Movie-Genre-Classification-From-Keywords \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-cat_sayula_popoluca_iw_catalan_galician_en.md b/docs/_posts/ahmedlone127/2024-09-17-cat_sayula_popoluca_iw_catalan_galician_en.md new file mode 100644 index 00000000000000..a55d5571a9b653 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-cat_sayula_popoluca_iw_catalan_galician_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cat_sayula_popoluca_iw_catalan_galician XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_sayula_popoluca_iw_catalan_galician +date: 2024-09-17 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_sayula_popoluca_iw_catalan_galician` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_sayula_popoluca_iw_catalan_galician_en_5.5.0_3.0_1726576394018.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_sayula_popoluca_iw_catalan_galician_en_5.5.0_3.0_1726576394018.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("cat_sayula_popoluca_iw_catalan_galician","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("cat_sayula_popoluca_iw_catalan_galician", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
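
To view each token next to its predicted tag after running the pipeline above, the `token` and `ner` annotation columns can be zipped (a sketch; it assumes Spark 2.4+ for `arrays_zip`):

```python
# Pair every token with its predicted tag
pipelineDF.selectExpr("explode(arrays_zip(token.result, ner.result)) as token_tag").show(truncate = False)
```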
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_sayula_popoluca_iw_catalan_galician| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|432.1 MB| + +## References + +https://huggingface.co/homersimpson/cat-pos-iw-ca-gl \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-chinese_roberta_wwm_ext_2_0_8_ddp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-chinese_roberta_wwm_ext_2_0_8_ddp_pipeline_en.md new file mode 100644 index 00000000000000..b059cde926f66c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-chinese_roberta_wwm_ext_2_0_8_ddp_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English chinese_roberta_wwm_ext_2_0_8_ddp_pipeline pipeline BertForQuestionAnswering from DaydreamerF +author: John Snow Labs +name: chinese_roberta_wwm_ext_2_0_8_ddp_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chinese_roberta_wwm_ext_2_0_8_ddp_pipeline` is a English model originally trained by DaydreamerF. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chinese_roberta_wwm_ext_2_0_8_ddp_pipeline_en_5.5.0_3.0_1726544249381.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chinese_roberta_wwm_ext_2_0_8_ddp_pipeline_en_5.5.0_3.0_1726544249381.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("chinese_roberta_wwm_ext_2_0_8_ddp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("chinese_roberta_wwm_ext_2_0_8_ddp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chinese_roberta_wwm_ext_2_0_8_ddp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|381.0 MB| + +## References + +https://huggingface.co/DaydreamerF/chinese-roberta-wwm-ext-2.0-8-ddp + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-cl_style_1_1_epoch_recipe_pretrained_roberta_base_squadv2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-cl_style_1_1_epoch_recipe_pretrained_roberta_base_squadv2_pipeline_en.md new file mode 100644 index 00000000000000..226e077b4b181c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-cl_style_1_1_epoch_recipe_pretrained_roberta_base_squadv2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English cl_style_1_1_epoch_recipe_pretrained_roberta_base_squadv2_pipeline pipeline RoBertaForQuestionAnswering from AnonymousSub +author: John Snow Labs +name: cl_style_1_1_epoch_recipe_pretrained_roberta_base_squadv2_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cl_style_1_1_epoch_recipe_pretrained_roberta_base_squadv2_pipeline` is a English model originally trained by AnonymousSub. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cl_style_1_1_epoch_recipe_pretrained_roberta_base_squadv2_pipeline_en_5.5.0_3.0_1726580653649.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cl_style_1_1_epoch_recipe_pretrained_roberta_base_squadv2_pipeline_en_5.5.0_3.0_1726580653649.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cl_style_1_1_epoch_recipe_pretrained_roberta_base_squadv2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cl_style_1_1_epoch_recipe_pretrained_roberta_base_squadv2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cl_style_1_1_epoch_recipe_pretrained_roberta_base_squadv2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/AnonymousSub/CL_style_1_1_epoch_recipe_pretrained_roberta_base_squadv2 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-clasificador_onestop_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-clasificador_onestop_english_pipeline_en.md new file mode 100644 index 00000000000000..f3c68cf97b4e66 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-clasificador_onestop_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English clasificador_onestop_english_pipeline pipeline AlbertForSequenceClassification from algomet +author: John Snow Labs +name: clasificador_onestop_english_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clasificador_onestop_english_pipeline` is a English model originally trained by algomet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clasificador_onestop_english_pipeline_en_5.5.0_3.0_1726601102539.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clasificador_onestop_english_pipeline_en_5.5.0_3.0_1726601102539.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("clasificador_onestop_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("clasificador_onestop_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
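The pretrained pipeline bundles the document assembler, tokenizer, and classifier listed further below, so only a raw text column is needed as input. A minimal usage sketch, assuming the assembler reads a column named `text`:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("clasificador_onestop_english_pipeline", lang="en")

# transform() works on a DataFrame with a "text" column (assumed input name)
df = spark.createDataFrame([["The weather was lovely and the trip went smoothly."]]).toDF("text")
annotations = pipeline.transform(df)

# annotate() is the convenience path for a single string
print(pipeline.annotate("The weather was lovely and the trip went smoothly."))
```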
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clasificador_onestop_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/algomet/clasificador-onestop-english + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-class_poems_spanish_en.md b/docs/_posts/ahmedlone127/2024-09-17-class_poems_spanish_en.md new file mode 100644 index 00000000000000..b4dd6812a2dd6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-class_poems_spanish_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English class_poems_spanish RoBertaForSequenceClassification from hackathon-pln-es +author: John Snow Labs +name: class_poems_spanish +date: 2024-09-17 +tags: [roberta, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`class_poems_spanish` is a English model originally trained by hackathon-pln-es. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/class_poems_spanish_en_5.5.0_3.0_1726574081914.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/class_poems_spanish_en_5.5.0_3.0_1726574081914.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer()\
    .setInputCols("document")\
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("class_poems_spanish","en")\
    .setInputCols(["document","token"])\
    .setOutputCol("class")

pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier])

data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")

result = pipeline.fit(data).transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("class_poems_spanish","en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, sequenceClassifier))

val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
```
</div>
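The predicted label for each input document is returned in the `class` column configured above. A short sketch for reading it out together with the per-label scores:

```python
# "result" holds the winning label, "metadata" the confidence for every label
result.select("class.result", "class.metadata").show(truncate=False)
```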
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|class_poems_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|435.1 MB| + +## References + +References + +https://huggingface.co/hackathon-pln-es/class-poems-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-classifier__generated_data_only__meansdetection_albert_en.md b/docs/_posts/ahmedlone127/2024-09-17-classifier__generated_data_only__meansdetection_albert_en.md new file mode 100644 index 00000000000000..b5c536d1d84f19 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-classifier__generated_data_only__meansdetection_albert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English classifier__generated_data_only__meansdetection_albert AlbertForSequenceClassification from yevhenkost +author: John Snow Labs +name: classifier__generated_data_only__meansdetection_albert +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, albert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classifier__generated_data_only__meansdetection_albert` is a English model originally trained by yevhenkost. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classifier__generated_data_only__meansdetection_albert_en_5.5.0_3.0_1726600525168.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classifier__generated_data_only__meansdetection_albert_en_5.5.0_3.0_1726600525168.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = AlbertForSequenceClassification.pretrained("classifier__generated_data_only__meansdetection_albert","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = AlbertForSequenceClassification.pretrained("classifier__generated_data_only__meansdetection_albert", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classifier__generated_data_only__meansdetection_albert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/yevhenkost/classifier__generated_data_only__meansdetection_albert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-classifier__generated_data_only__meansdetection_albert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-classifier__generated_data_only__meansdetection_albert_pipeline_en.md new file mode 100644 index 00000000000000..d4328c307072aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-classifier__generated_data_only__meansdetection_albert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English classifier__generated_data_only__meansdetection_albert_pipeline pipeline AlbertForSequenceClassification from yevhenkost +author: John Snow Labs +name: classifier__generated_data_only__meansdetection_albert_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classifier__generated_data_only__meansdetection_albert_pipeline` is a English model originally trained by yevhenkost. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classifier__generated_data_only__meansdetection_albert_pipeline_en_5.5.0_3.0_1726600527951.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classifier__generated_data_only__meansdetection_albert_pipeline_en_5.5.0_3.0_1726600527951.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("classifier__generated_data_only__meansdetection_albert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("classifier__generated_data_only__meansdetection_albert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classifier__generated_data_only__meansdetection_albert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/yevhenkost/classifier__generated_data_only__meansdetection_albert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-clinicalbertprqa_150_mincls_en.md b/docs/_posts/ahmedlone127/2024-09-17-clinicalbertprqa_150_mincls_en.md new file mode 100644 index 00000000000000..6377d32b06f927 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-clinicalbertprqa_150_mincls_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English clinicalbertprqa_150_mincls BertForQuestionAnswering from lanzv +author: John Snow Labs +name: clinicalbertprqa_150_mincls +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clinicalbertprqa_150_mincls` is a English model originally trained by lanzv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinicalbertprqa_150_mincls_en_5.5.0_3.0_1726567510428.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinicalbertprqa_150_mincls_en_5.5.0_3.0_1726567510428.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("clinicalbertprqa_150_mincls","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("clinicalbertprqa_150_mincls", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
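The extracted answer span is returned as an annotation in the `answer` column set above; `begin` and `end` are character offsets into the context. A small sketch for inspecting it:

```python
# Show the answer text together with its character offsets in the context
pipelineDF.select("answer.result", "answer.begin", "answer.end").show(truncate=False)
```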
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinicalbertprqa_150_mincls| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.3 MB| + +## References + +https://huggingface.co/lanzv/ClinicalBERTPRQA_150_mincls \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-clr_pretrained_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-clr_pretrained_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..08a8a7a0b2dbab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-clr_pretrained_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English clr_pretrained_roberta_base_pipeline pipeline RoBertaEmbeddings from SauravMaheshkar +author: John Snow Labs +name: clr_pretrained_roberta_base_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clr_pretrained_roberta_base_pipeline` is a English model originally trained by SauravMaheshkar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clr_pretrained_roberta_base_pipeline_en_5.5.0_3.0_1726602567779.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clr_pretrained_roberta_base_pipeline_en_5.5.0_3.0_1726602567779.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("clr_pretrained_roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("clr_pretrained_roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
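This pipeline produces token-level RoBERTa embeddings rather than labels. A minimal sketch of feeding it text and looking at the vectors; the `embeddings` output column name is an assumption for this pretrained pipeline:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("clr_pretrained_roberta_base_pipeline", lang="en")
df = spark.createDataFrame([["Spark NLP computes contextual word embeddings."]]).toDF("text")

result = pipeline.transform(df)
# One vector per token (768 dimensions for a base-size RoBERTa model)
result.selectExpr("explode(embeddings.embeddings) as vector").show(3)
```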
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clr_pretrained_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/SauravMaheshkar/clr-pretrained-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-cnj_v1_2__checkpoint_last_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-cnj_v1_2__checkpoint_last_pipeline_en.md new file mode 100644 index 00000000000000..6a5135208eb904 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-cnj_v1_2__checkpoint_last_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cnj_v1_2__checkpoint_last_pipeline pipeline RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: cnj_v1_2__checkpoint_last_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cnj_v1_2__checkpoint_last_pipeline` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cnj_v1_2__checkpoint_last_pipeline_en_5.5.0_3.0_1726603290466.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cnj_v1_2__checkpoint_last_pipeline_en_5.5.0_3.0_1726603290466.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cnj_v1_2__checkpoint_last_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cnj_v1_2__checkpoint_last_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cnj_v1_2__checkpoint_last_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|298.3 MB| + +## References + +https://huggingface.co/eduagarcia-temp/cnj_v1_2__checkpoint_last + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-codebert_base_en.md b/docs/_posts/ahmedlone127/2024-09-17-codebert_base_en.md new file mode 100644 index 00000000000000..ab34ebf40ea0d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-codebert_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English codebert_base RoBertaForTokenClassification from DianaIulia +author: John Snow Labs +name: codebert_base +date: 2024-09-17 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`codebert_base` is a English model originally trained by DianaIulia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/codebert_base_en_5.5.0_3.0_1726537512795.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/codebert_base_en_5.5.0_3.0_1726537512795.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("codebert_base","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("codebert_base", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
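Since the underlying checkpoint is a CodeBERT variant, source-code snippets are a natural input. A small sketch reusing the fitted `pipelineModel` from above (the example snippet is illustrative only):

```python
# Tag the tokens of a short code snippet with the fitted pipeline
code_df = spark.createDataFrame([["def add(a, b): return a + b"]]).toDF("text")
pipelineModel.transform(code_df).select("token.result", "ner.result").show(truncate=False)
```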
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|codebert_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DianaIulia/codebert-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-cold_fusion_itr9_seed4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-cold_fusion_itr9_seed4_pipeline_en.md new file mode 100644 index 00000000000000..bf49b86142a6ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-cold_fusion_itr9_seed4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cold_fusion_itr9_seed4_pipeline pipeline RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr9_seed4_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr9_seed4_pipeline` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr9_seed4_pipeline_en_5.5.0_3.0_1726591143275.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr9_seed4_pipeline_en_5.5.0_3.0_1726591143275.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cold_fusion_itr9_seed4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cold_fusion_itr9_seed4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr9_seed4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr9-seed4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-covid_augment_tweet_roberta_large_e4_en.md b/docs/_posts/ahmedlone127/2024-09-17-covid_augment_tweet_roberta_large_e4_en.md new file mode 100644 index 00000000000000..0c82830b6cb289 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-covid_augment_tweet_roberta_large_e4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English covid_augment_tweet_roberta_large_e4 RoBertaForSequenceClassification from JerryYanJiang +author: John Snow Labs +name: covid_augment_tweet_roberta_large_e4 +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`covid_augment_tweet_roberta_large_e4` is a English model originally trained by JerryYanJiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/covid_augment_tweet_roberta_large_e4_en_5.5.0_3.0_1726591451910.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/covid_augment_tweet_roberta_large_e4_en_5.5.0_3.0_1726591451910.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("covid_augment_tweet_roberta_large_e4","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("covid_augment_tweet_roberta_large_e4", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
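A short sketch for classifying an additional tweet-style input with the fitted pipeline from above (the example text is illustrative only):

```python
# The predicted label is returned in the "class" column configured above
tweet_df = spark.createDataFrame([["Just got my booster shot, feeling fine so far!"]]).toDF("text")
pipelineModel.transform(tweet_df).select("class.result").show(truncate=False)
```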
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|covid_augment_tweet_roberta_large_e4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/JerryYanJiang/covid-augment-tweet-roberta-large-e4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-croatian_roberta_base_pipeline_hr.md b/docs/_posts/ahmedlone127/2024-09-17-croatian_roberta_base_pipeline_hr.md new file mode 100644 index 00000000000000..26ea12522841cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-croatian_roberta_base_pipeline_hr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Croatian croatian_roberta_base_pipeline pipeline RoBertaEmbeddings from macedonizer +author: John Snow Labs +name: croatian_roberta_base_pipeline +date: 2024-09-17 +tags: [hr, open_source, pipeline, onnx] +task: Embeddings +language: hr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`croatian_roberta_base_pipeline` is a Croatian model originally trained by macedonizer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/croatian_roberta_base_pipeline_hr_5.5.0_3.0_1726595564973.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/croatian_roberta_base_pipeline_hr_5.5.0_3.0_1726595564973.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("croatian_roberta_base_pipeline", lang = "hr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("croatian_roberta_base_pipeline", lang = "hr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|croatian_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hr| +|Size:|310.8 MB| + +## References + +https://huggingface.co/macedonizer/hr-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-dataequity_opus_maltese_english_tagalog_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-dataequity_opus_maltese_english_tagalog_pipeline_en.md new file mode 100644 index 00000000000000..73cc78abf40ebb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-dataequity_opus_maltese_english_tagalog_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dataequity_opus_maltese_english_tagalog_pipeline pipeline MarianTransformer from dataequity +author: John Snow Labs +name: dataequity_opus_maltese_english_tagalog_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dataequity_opus_maltese_english_tagalog_pipeline` is a English model originally trained by dataequity. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dataequity_opus_maltese_english_tagalog_pipeline_en_5.5.0_3.0_1726533273840.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dataequity_opus_maltese_english_tagalog_pipeline_en_5.5.0_3.0_1726533273840.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dataequity_opus_maltese_english_tagalog_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dataequity_opus_maltese_english_tagalog_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
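The pipeline chains sentence detection with the Marian translation model, so a raw text column is all the input it needs. A minimal sketch; the `text` input column and the `translation` output column are assumptions for this pretrained pipeline:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("dataequity_opus_maltese_english_tagalog_pipeline", lang="en")
df = spark.createDataFrame([["How are you today?"]]).toDF("text")

result = pipeline.transform(df)
# Translated sentences come back as annotations in the assumed "translation" column
result.select("translation.result").show(truncate=False)
```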
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dataequity_opus_maltese_english_tagalog_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|497.3 MB| + +## References + +https://huggingface.co/dataequity/dataequity-opus-mt-en-tl + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-dictabert_large_heq_he.md b/docs/_posts/ahmedlone127/2024-09-17-dictabert_large_heq_he.md new file mode 100644 index 00000000000000..a3da808cc95a75 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-dictabert_large_heq_he.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Hebrew dictabert_large_heq BertForQuestionAnswering from dicta-il +author: John Snow Labs +name: dictabert_large_heq +date: 2024-09-17 +tags: [he, open_source, onnx, question_answering, bert] +task: Question Answering +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dictabert_large_heq` is a Hebrew model originally trained by dicta-il. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dictabert_large_heq_he_5.5.0_3.0_1726544197502.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dictabert_large_heq_he_5.5.0_3.0_1726544197502.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("dictabert_large_heq","he") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("dictabert_large_heq", "he")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
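Because this checkpoint targets Hebrew, a Hebrew question/context pair is a more representative test. A small sketch reusing the pipeline defined above (the example pair is illustrative only):

```python
# Hebrew question and context; the extracted span is returned in "answer.result"
data = spark.createDataFrame(
    [["מהי בירת צרפת?", "פריז היא בירת צרפת והעיר הגדולה בה."]]
).toDF("question", "context")

pipeline.fit(data).transform(data).select("answer.result").show(truncate=False)
```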
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dictabert_large_heq| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|he| +|Size:|1.6 GB| + +## References + +https://huggingface.co/dicta-il/dictabert-large-heq \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-dictabert_large_heq_pipeline_he.md b/docs/_posts/ahmedlone127/2024-09-17-dictabert_large_heq_pipeline_he.md new file mode 100644 index 00000000000000..25476171e399e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-dictabert_large_heq_pipeline_he.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hebrew dictabert_large_heq_pipeline pipeline BertForQuestionAnswering from dicta-il +author: John Snow Labs +name: dictabert_large_heq_pipeline +date: 2024-09-17 +tags: [he, open_source, pipeline, onnx] +task: Question Answering +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dictabert_large_heq_pipeline` is a Hebrew model originally trained by dicta-il. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dictabert_large_heq_pipeline_he_5.5.0_3.0_1726544293376.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dictabert_large_heq_pipeline_he_5.5.0_3.0_1726544293376.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dictabert_large_heq_pipeline", lang = "he") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dictabert_large_heq_pipeline", lang = "he") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dictabert_large_heq_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|he| +|Size:|1.6 GB| + +## References + +https://huggingface.co/dicta-il/dictabert-large-heq + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-discourse_prediction__basic_en.md b/docs/_posts/ahmedlone127/2024-09-17-discourse_prediction__basic_en.md new file mode 100644 index 00000000000000..3c8c9ee47d2e5f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-discourse_prediction__basic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English discourse_prediction__basic AlbertForSequenceClassification from alex2awesome +author: John Snow Labs +name: discourse_prediction__basic +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, albert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`discourse_prediction__basic` is a English model originally trained by alex2awesome. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/discourse_prediction__basic_en_5.5.0_3.0_1726600650802.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/discourse_prediction__basic_en_5.5.0_3.0_1726600650802.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = AlbertForSequenceClassification.pretrained("discourse_prediction__basic","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = AlbertForSequenceClassification.pretrained("discourse_prediction__basic", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
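For low-latency scoring of individual sentences it can be convenient to wrap the fitted model in a `LightPipeline`, which annotates strings directly without building a DataFrame. A small sketch reusing `pipelineModel` from above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# annotate() returns plain Python lists keyed by output column name
print(light.annotate("I love spark-nlp")["class"])
```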
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|discourse_prediction__basic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|834.0 MB| + +## References + +https://huggingface.co/alex2awesome/discourse-prediction__basic \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-discourse_prediction__basic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-discourse_prediction__basic_pipeline_en.md new file mode 100644 index 00000000000000..5e7fffcb43aba2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-discourse_prediction__basic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English discourse_prediction__basic_pipeline pipeline AlbertForSequenceClassification from alex2awesome +author: John Snow Labs +name: discourse_prediction__basic_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`discourse_prediction__basic_pipeline` is a English model originally trained by alex2awesome. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/discourse_prediction__basic_pipeline_en_5.5.0_3.0_1726600690209.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/discourse_prediction__basic_pipeline_en_5.5.0_3.0_1726600690209.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("discourse_prediction__basic_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("discourse_prediction__basic_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|discourse_prediction__basic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|834.0 MB| + +## References + +https://huggingface.co/alex2awesome/discourse-prediction__basic + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distil_whisper_small_polyai_minds14_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distil_whisper_small_polyai_minds14_pipeline_en.md new file mode 100644 index 00000000000000..16181235dc408f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distil_whisper_small_polyai_minds14_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distil_whisper_small_polyai_minds14_pipeline pipeline WhisperForCTC from Shamik +author: John Snow Labs +name: distil_whisper_small_polyai_minds14_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distil_whisper_small_polyai_minds14_pipeline` is a English model originally trained by Shamik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_whisper_small_polyai_minds14_pipeline_en_5.5.0_3.0_1726552980409.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_whisper_small_polyai_minds14_pipeline_en_5.5.0_3.0_1726552980409.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distil_whisper_small_polyai_minds14_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distil_whisper_small_polyai_minds14_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distil_whisper_small_polyai_minds14_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Shamik/distil-whisper-small-polyAI-minds14 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_amazon_multi_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_amazon_multi_pipeline_xx.md new file mode 100644 index 00000000000000..0644a8d7eaef9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_amazon_multi_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual distilbert_base_amazon_multi_pipeline pipeline DistilBertForSequenceClassification from arnabdhar +author: John Snow Labs +name: distilbert_base_amazon_multi_pipeline +date: 2024-09-17 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_amazon_multi_pipeline` is a Multilingual model originally trained by arnabdhar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_amazon_multi_pipeline_xx_5.5.0_3.0_1726594115412.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_amazon_multi_pipeline_xx_5.5.0_3.0_1726594115412.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_amazon_multi_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_amazon_multi_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_amazon_multi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|507.6 MB| + +## References + +https://huggingface.co/arnabdhar/distilbert-base-amazon-multi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_cased_distilled_squad_qnli_v0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_cased_distilled_squad_qnli_v0_pipeline_en.md new file mode 100644 index 00000000000000..5c256895fab7c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_cased_distilled_squad_qnli_v0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_cased_distilled_squad_qnli_v0_pipeline pipeline DistilBertForSequenceClassification from HeZhang1019 +author: John Snow Labs +name: distilbert_base_cased_distilled_squad_qnli_v0_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_distilled_squad_qnli_v0_pipeline` is a English model originally trained by HeZhang1019. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_distilled_squad_qnli_v0_pipeline_en_5.5.0_3.0_1726585000591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_distilled_squad_qnli_v0_pipeline_en_5.5.0_3.0_1726585000591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_cased_distilled_squad_qnli_v0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_cased_distilled_squad_qnli_v0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_distilled_squad_qnli_v0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/HeZhang1019/distilbert-base-cased-distilled-squad-qnli-v0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_cased_squad_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_cased_squad_v2_pipeline_en.md new file mode 100644 index 00000000000000..3b866f02f7f04f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_cased_squad_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_cased_squad_v2_pipeline pipeline DistilBertForQuestionAnswering from jysh1023 +author: John Snow Labs +name: distilbert_base_cased_squad_v2_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_squad_v2_pipeline` is a English model originally trained by jysh1023. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_squad_v2_pipeline_en_5.5.0_3.0_1726586761133.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_squad_v2_pipeline_en_5.5.0_3.0_1726586761133.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_cased_squad_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_cased_squad_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_squad_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/jysh1023/distilbert-base-cased-squad-v2 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_german_cased_v1_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_german_cased_v1_en.md new file mode 100644 index 00000000000000..2c95f404a20dda --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_german_cased_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_german_cased_v1 DistilBertForSequenceClassification from mserloth +author: John Snow Labs +name: distilbert_base_german_cased_v1 +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_german_cased_v1` is a English model originally trained by mserloth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_german_cased_v1_en_5.5.0_3.0_1726584885995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_german_cased_v1_en_5.5.0_3.0_1726584885995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier consumes the assembled document and token annotations
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_german_cased_v1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_german_cased_v1", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_german_cased_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|252.5 MB| + +## References + +https://huggingface.co/mserloth/distilbert-base-german-cased-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_1212_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_1212_test_pipeline_en.md new file mode 100644 index 00000000000000..60fc30dfcd7e2e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_1212_test_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_1212_test_pipeline pipeline DistilBertForSequenceClassification from aarnow +author: John Snow Labs +name: distilbert_base_uncased_1212_test_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_1212_test_pipeline` is a English model originally trained by aarnow. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_1212_test_pipeline_en_5.5.0_3.0_1726584451972.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_1212_test_pipeline_en_5.5.0_3.0_1726584451972.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_1212_test_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_1212_test_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
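
Since `df` is left undefined above, here is a minimal sketch of preparing one; the `text` input column and the `class` output column are assumptions based on the listed stages, not values confirmed by this card:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# The DocumentAssembler stage is assumed to read from a "text" column
df = spark.createDataFrame([["I love Spark NLP!"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_1212_test_pipeline", lang="en")
annotations = pipeline.transform(df)

# The sequence classifier is assumed to write its label to a "class" column
annotations.select("class.result").show(truncate=False)
```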
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_1212_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aarnow/distilbert-base-uncased-1212-test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_distilled_squad_finetuned_squad_pminha_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_distilled_squad_finetuned_squad_pminha_pipeline_en.md new file mode 100644 index 00000000000000..7710d542a584b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_distilled_squad_finetuned_squad_pminha_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_squad_finetuned_squad_pminha_pipeline pipeline DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_distilled_squad_finetuned_squad_pminha_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_squad_finetuned_squad_pminha_pipeline` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_squad_finetuned_squad_pminha_pipeline_en_5.5.0_3.0_1726586369444.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_squad_finetuned_squad_pminha_pipeline_en_5.5.0_3.0_1726586369444.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_distilled_squad_finetuned_squad_pminha_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_distilled_squad_finetuned_squad_pminha_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
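
The snippet leaves `df` undefined. One way to build it, sketched under the assumption that the pipeline reads `question`/`context` columns and emits an `answer` annotation column:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Several question/context pairs; the column names are assumed, not confirmed by this card
df = spark.createDataFrame([
    ["What is my name?", "My name is Clara and I live in Berkeley."],
    ["Where do I live?", "My name is Clara and I live in Berkeley."],
]).toDF("question", "context")

pipeline = PretrainedPipeline("distilbert_base_uncased_distilled_squad_finetuned_squad_pminha_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.selectExpr("explode(answer.result) as answer").show(truncate=False)
```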
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_squad_finetuned_squad_pminha_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|248.8 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-distilled-squad-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_clinc_adriana213_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_clinc_adriana213_pipeline_en.md new file mode 100644 index 00000000000000..4234afc6432378 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_clinc_adriana213_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_adriana213_pipeline pipeline DistilBertForSequenceClassification from Adriana213 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_adriana213_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_adriana213_pipeline` is a English model originally trained by Adriana213. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_adriana213_pipeline_en_5.5.0_3.0_1726584802670.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_adriana213_pipeline_en_5.5.0_3.0_1726584802670.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_adriana213_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_adriana213_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
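
For a quick check without building a DataFrame, `annotate` can be run on a single string. This is only a sketch; the keys of the returned dictionary mirror the pipeline's output columns, which are assumed here rather than confirmed:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_adriana213_pipeline", lang="en")

# annotate() runs the whole pipeline on one string and returns a dict keyed by output column
result = pipeline.annotate("Please transfer one hundred dollars to my savings account.")
print(result)
```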
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_adriana213_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/Adriana213/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_clinc_seddiktrk_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_clinc_seddiktrk_en.md new file mode 100644 index 00000000000000..a28f31c0a65b08 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_clinc_seddiktrk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_seddiktrk DistilBertForSequenceClassification from seddiktrk +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_seddiktrk +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_seddiktrk` is a English model originally trained by seddiktrk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_seddiktrk_en_5.5.0_3.0_1726584476859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_seddiktrk_en_5.5.0_3.0_1726584476859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier consumes the assembled document and token annotations
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_seddiktrk","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_seddiktrk", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_seddiktrk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/seddiktrk/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_clinc_seddiktrk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_clinc_seddiktrk_pipeline_en.md new file mode 100644 index 00000000000000..00c7defa756117 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_clinc_seddiktrk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_seddiktrk_pipeline pipeline DistilBertForSequenceClassification from seddiktrk +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_seddiktrk_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_seddiktrk_pipeline` is a English model originally trained by seddiktrk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_seddiktrk_pipeline_en_5.5.0_3.0_1726584489371.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_seddiktrk_pipeline_en_5.5.0_3.0_1726584489371.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_seddiktrk_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_seddiktrk_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
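
If `df` has not been created yet, a sketch like the following could be used (assuming a `text` input column and a `class` output column, neither of which is confirmed by this card):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# One example utterance for the intent classifier
df = spark.createDataFrame([["How do I reset my credit card PIN?"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_seddiktrk_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.select("text", "class.result").show(truncate=False)
```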
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_seddiktrk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/seddiktrk/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_dourc_squad_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_dourc_squad_en.md new file mode 100644 index 00000000000000..7d27eb04a560d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_dourc_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_dourc_squad DistilBertForQuestionAnswering from suthanhcong +author: John Snow Labs +name: distilbert_base_uncased_finetuned_dourc_squad +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_dourc_squad` is a English model originally trained by suthanhcong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_dourc_squad_en_5.5.0_3.0_1726574693141.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_dourc_squad_en_5.5.0_3.0_1726574693141.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Combine the raw question and context columns into annotated documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_dourc_squad","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_dourc_squad", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_dourc_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/suthanhcong/distilbert-base-uncased-finetuned-DouRC_squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_dourc_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_dourc_squad_pipeline_en.md new file mode 100644 index 00000000000000..713a4cf937be5e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_dourc_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_dourc_squad_pipeline pipeline DistilBertForQuestionAnswering from suthanhcong +author: John Snow Labs +name: distilbert_base_uncased_finetuned_dourc_squad_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_dourc_squad_pipeline` is a English model originally trained by suthanhcong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_dourc_squad_pipeline_en_5.5.0_3.0_1726574705967.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_dourc_squad_pipeline_en_5.5.0_3.0_1726574705967.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_dourc_squad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_dourc_squad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
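
As above, `df` must be provided by the caller. A hedged sketch with assumed `question`/`context` input columns and an assumed `answer` output column:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Assumed input column names; adjust to the pipeline's actual first stage
df = spark.createDataFrame(
    [["What framework do I use?", "I use Spark NLP."]]
).toDF("question", "context")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_dourc_squad_pipeline", lang="en")
pipeline.transform(df).select("answer.result").show(truncate=False)
```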
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_dourc_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/suthanhcong/distilbert-base-uncased-finetuned-DouRC_squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_emotion_dommmmmmmmm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_emotion_dommmmmmmmm_pipeline_en.md new file mode 100644 index 00000000000000..6436029c8ec979 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_emotion_dommmmmmmmm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_dommmmmmmmm_pipeline pipeline DistilBertForSequenceClassification from Dommmmmmmmm +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_dommmmmmmmm_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_dommmmmmmmm_pipeline` is a English model originally trained by Dommmmmmmmm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_dommmmmmmmm_pipeline_en_5.5.0_3.0_1726593765398.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_dommmmmmmmm_pipeline_en_5.5.0_3.0_1726593765398.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_dommmmmmmmm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_dommmmmmmmm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
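
A lighter-weight sketch using `annotate` on a single string; the exact output keys depend on the pipeline's column names, which are assumed here rather than confirmed:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_dommmmmmmmm_pipeline", lang="en")

# Returns a dict keyed by output column; the emotion label is assumed to appear under "class"
print(pipeline.annotate("I am thrilled with how this model turned out!"))
```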
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_dommmmmmmmm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Dommmmmmmmm/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_emotion_thacwn_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_emotion_thacwn_pipeline_en.md new file mode 100644 index 00000000000000..037fe189d44c21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_emotion_thacwn_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_thacwn_pipeline pipeline DistilBertForSequenceClassification from thacwn +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_thacwn_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_thacwn_pipeline` is a English model originally trained by thacwn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_thacwn_pipeline_en_5.5.0_3.0_1726584588857.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_thacwn_pipeline_en_5.5.0_3.0_1726584588857.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_thacwn_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_thacwn_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
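
Here is a minimal way to build the missing `df`, assuming a `text` input column and a `class` output column (assumptions, not confirmed by this card):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

df = spark.createDataFrame([
    ["I am so happy today!"],
    ["This is the worst day ever."],
]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_thacwn_pipeline", lang="en")
pipeline.transform(df).select("text", "class.result").show(truncate=False)
```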
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_thacwn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thacwn/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_pfe_projectt_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_pfe_projectt_pipeline_en.md new file mode 100644 index 00000000000000..9817216b47ce35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_pfe_projectt_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_pfe_projectt_pipeline pipeline DistilBertForQuestionAnswering from onsba +author: John Snow Labs +name: distilbert_base_uncased_finetuned_pfe_projectt_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_pfe_projectt_pipeline` is a English model originally trained by onsba. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_pfe_projectt_pipeline_en_5.5.0_3.0_1726555560311.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_pfe_projectt_pipeline_en_5.5.0_3.0_1726555560311.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_pfe_projectt_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_pfe_projectt_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
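
A sketch of the missing setup, under the assumption that the bundled MultiDocumentAssembler expects `question` and `context` columns:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

df = spark.createDataFrame(
    [["Where do I live?", "My name is Clara and I live in Berkeley."]]
).toDF("question", "context")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_pfe_projectt_pipeline", lang="en")
annotations = pipeline.transform(df)

# Inspect the output schema first if the exact column names are unknown
annotations.printSchema()
annotations.select("answer.result").show(truncate=False)
```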
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_pfe_projectt_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/onsba/distilbert-base-uncased-finetuned-pfe-projectt + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_atunass_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_atunass_pipeline_en.md new file mode 100644 index 00000000000000..13faa169968ad6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_atunass_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_atunass_pipeline pipeline DistilBertForQuestionAnswering from aTunass +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_atunass_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_atunass_pipeline` is a English model originally trained by aTunass. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_atunass_pipeline_en_5.5.0_3.0_1726599987992.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_atunass_pipeline_en_5.5.0_3.0_1726599987992.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_atunass_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_atunass_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
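
One possible way to supply `df`, assuming `question`/`context` inputs and an `answer` output column:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Assumed column names, not confirmed by this card
df = spark.createDataFrame(
    [["What does the model answer?", "Spark NLP runs this pipeline at scale."]]
).toDF("question", "context")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_atunass_pipeline", lang="en")
pipeline.transform(df).select("answer.result").show(truncate=False)
```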
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_atunass_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/aTunass/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_d5716d28_justin_2024_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_d5716d28_justin_2024_pipeline_en.md new file mode 100644 index 00000000000000..dd2469dd337c8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_d5716d28_justin_2024_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_d5716d28_justin_2024_pipeline pipeline DistilBertForQuestionAnswering from Justin-2024 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_d5716d28_justin_2024_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_d5716d28_justin_2024_pipeline` is a English model originally trained by Justin-2024. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_justin_2024_pipeline_en_5.5.0_3.0_1726599959259.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_justin_2024_pipeline_en_5.5.0_3.0_1726599959259.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_justin_2024_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_justin_2024_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
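
Because the exact input and output column names are not documented on this card, a cautious sketch first inspects the transformed schema (the `question`/`context` columns below are assumptions):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

df = spark.createDataFrame(
    [["What is my name?", "My name is Clara and I live in Berkeley."]]
).toDF("question", "context")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_justin_2024_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.printSchema()  # confirm the real output column before selecting from it
```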
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_d5716d28_justin_2024_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Justin-2024/distilbert-base-uncased-finetuned-squad-d5716d28 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_d5716d28_turka_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_d5716d28_turka_en.md new file mode 100644 index 00000000000000..760beed3ebd1ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_d5716d28_turka_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_d5716d28_turka DistilBertForQuestionAnswering from Turka +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_d5716d28_turka +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_d5716d28_turka` is a English model originally trained by Turka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_turka_en_5.5.0_3.0_1726574794576.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_turka_en_5.5.0_3.0_1726574794576.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Combine the raw question and context columns into annotated documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_d5716d28_turka","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_d5716d28_turka", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_d5716d28_turka| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Turka/distilbert-base-uncased-finetuned-squad-d5716d28 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_dohkim_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_dohkim_pipeline_en.md new file mode 100644 index 00000000000000..55d07424270de3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_dohkim_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_dohkim_pipeline pipeline DistilBertForQuestionAnswering from Dohkim +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_dohkim_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_dohkim_pipeline` is a English model originally trained by Dohkim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_dohkim_pipeline_en_5.5.0_3.0_1726586490716.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_dohkim_pipeline_en_5.5.0_3.0_1726586490716.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_dohkim_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_dohkim_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
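
A minimal sketch for the undefined `df`, assuming `question` and `context` input columns:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

df = spark.createDataFrame(
    [["Which library do I use?", "I use Spark NLP for NLP at scale."]]
).toDF("question", "context")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_dohkim_pipeline", lang="en")
pipeline.transform(df).select("answer.result").show(truncate=False)
```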
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_dohkim_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Dohkim/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_ezcufe_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_ezcufe_en.md new file mode 100644 index 00000000000000..c8a8bbdc3bf695 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_ezcufe_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_ezcufe BertForQuestionAnswering from AliHashish +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_ezcufe +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_ezcufe` is a English model originally trained by AliHashish. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_ezcufe_en_5.5.0_3.0_1726590519604.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_ezcufe_en_5.5.0_3.0_1726590519604.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Combine the raw question and context columns into annotated documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_ezcufe","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_ezcufe", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_ezcufe| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/AliHashish/distilbert-base-uncased-finetuned-squad-EZcufe \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_hashemghanem_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_hashemghanem_en.md new file mode 100644 index 00000000000000..16ef0a648de39d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_hashemghanem_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_hashemghanem DistilBertForQuestionAnswering from Hashemghanem +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_hashemghanem +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_hashemghanem` is a English model originally trained by Hashemghanem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_hashemghanem_en_5.5.0_3.0_1726555339478.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_hashemghanem_en_5.5.0_3.0_1726555339478.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Combine the raw question and context columns into annotated documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_hashemghanem","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_hashemghanem", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_hashemghanem| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Hashemghanem/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_hotsnow199_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_hotsnow199_en.md new file mode 100644 index 00000000000000..029d94b3593e26 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_hotsnow199_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_hotsnow199 DistilBertForQuestionAnswering from hotsnow199 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_hotsnow199 +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_hotsnow199` is a English model originally trained by hotsnow199. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_hotsnow199_en_5.5.0_3.0_1726599792276.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_hotsnow199_en_5.5.0_3.0_1726599792276.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Combine the raw question and context columns into annotated documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_hotsnow199","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_hotsnow199", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_hotsnow199| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/hotsnow199/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_karunac_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_karunac_pipeline_en.md new file mode 100644 index 00000000000000..1e601fb729020f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_karunac_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_karunac_pipeline pipeline DistilBertForQuestionAnswering from karunac +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_karunac_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_karunac_pipeline` is a English model originally trained by karunac. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_karunac_pipeline_en_5.5.0_3.0_1726574736823.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_karunac_pipeline_en_5.5.0_3.0_1726574736823.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_karunac_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_karunac_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
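
The `df` above has to be created by the caller; a hedged example with assumed column names:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

df = spark.createDataFrame([
    ["Where is the office?", "The office is in Berlin, next to the main station."],
]).toDF("question", "context")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_karunac_pipeline", lang="en")
pipeline.transform(df).selectExpr("explode(answer.result) as answer").show(truncate=False)
```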
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_karunac_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/karunac/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_test1_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_test1_en.md new file mode 100644 index 00000000000000..732badf525d57a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_test1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_test1 DistilBertForQuestionAnswering from allistair99 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_test1 +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_test1` is a English model originally trained by allistair99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_test1_en_5.5.0_3.0_1726586457773.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_test1_en_5.5.0_3.0_1726586457773.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Combine the raw question and context columns into annotated documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_test1","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_test1", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_test1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/allistair99/distilbert-base-uncased-finetuned-squad-test1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_thsohn_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_thsohn_pipeline_en.md new file mode 100644 index 00000000000000..91d39d997c0912 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_thsohn_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_thsohn_pipeline pipeline DistilBertForQuestionAnswering from thsohn +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_thsohn_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_thsohn_pipeline` is a English model originally trained by thsohn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_thsohn_pipeline_en_5.5.0_3.0_1726555464520.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_thsohn_pipeline_en_5.5.0_3.0_1726555464520.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame containing the input column(s) expected by the pipeline's first stage
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_thsohn_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame containing the input column(s) expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_thsohn_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_thsohn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/thsohn/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_tuts2024_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_tuts2024_pipeline_en.md new file mode 100644 index 00000000000000..ef3bd43e6c4061 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_tuts2024_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_tuts2024_pipeline pipeline DistilBertForQuestionAnswering from tuts2024 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_tuts2024_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_tuts2024_pipeline` is a English model originally trained by tuts2024. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_tuts2024_pipeline_en_5.5.0_3.0_1726574728952.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_tuts2024_pipeline_en_5.5.0_3.0_1726574728952.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame containing the input column(s) expected by the pipeline's first stage
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_tuts2024_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame containing the input column(s) expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_tuts2024_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_tuts2024_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/tuts2024/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squadv2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squadv2_pipeline_en.md new file mode 100644 index 00000000000000..d26a4145094600 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squadv2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squadv2_pipeline pipeline DistilBertForQuestionAnswering from jstotz64 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squadv2_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squadv2_pipeline` is a English model originally trained by jstotz64. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squadv2_pipeline_en_5.5.0_3.0_1726555694552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squadv2_pipeline_en_5.5.0_3.0_1726555694552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame containing the input column(s) expected by the pipeline's first stage
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squadv2_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame containing the input column(s) expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squadv2_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squadv2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/jstotz64/distilbert-base-uncased-finetuned-squadv2 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_transcripts_calls_avitalby_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_transcripts_calls_avitalby_en.md new file mode 100644 index 00000000000000..1671bcd269e41d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_transcripts_calls_avitalby_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_transcripts_calls_avitalby DistilBertForQuestionAnswering from AvitalBY +author: John Snow Labs +name: distilbert_base_uncased_finetuned_transcripts_calls_avitalby +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_transcripts_calls_avitalby` is a English model originally trained by AvitalBY. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_transcripts_calls_avitalby_en_5.5.0_3.0_1726599943235.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_transcripts_calls_avitalby_en_5.5.0_3.0_1726599943235.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_transcripts_calls_avitalby", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_transcripts_calls_avitalby", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_transcripts_calls_avitalby| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/AvitalBY/distilbert-base-uncased-finetuned-transcripts-calls \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_kallidavidson_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_kallidavidson_en.md new file mode 100644 index 00000000000000..ea6241f88b2cb4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_kallidavidson_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_kallidavidson DistilBertForQuestionAnswering from kallidavidson +author: John Snow Labs +name: distilbert_base_uncased_kallidavidson +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_kallidavidson` is a English model originally trained by kallidavidson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_kallidavidson_en_5.5.0_3.0_1726586574718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_kallidavidson_en_5.5.0_3.0_1726586574718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_kallidavidson", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_kallidavidson", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_kallidavidson| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/kallidavidson/distilbert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge17_simsp400_clean300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge17_simsp400_clean300_pipeline_en.md new file mode 100644 index 00000000000000..d99c4418fcd646 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge17_simsp400_clean300_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge17_simsp400_clean300_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge17_simsp400_clean300_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge17_simsp400_clean300_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge17_simsp400_clean300_pipeline_en_5.5.0_3.0_1726593761649.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge17_simsp400_clean300_pipeline_en_5.5.0_3.0_1726593761649.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame containing the input column(s) expected by the pipeline's first stage
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge17_simsp400_clean300_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame containing the input column(s) expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge17_simsp400_clean300_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge17_simsp400_clean300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st17sd_ut72ut1_PLPrefix0stlarge17_simsp400_clean300 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_qa_model_v1_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_qa_model_v1_en.md new file mode 100644 index 00000000000000..906ace0b203843 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_qa_model_v1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_qa_model_v1 DistilBertForQuestionAnswering from hcy5561 +author: John Snow Labs +name: distilbert_base_uncased_qa_model_v1 +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_qa_model_v1` is a English model originally trained by hcy5561. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_qa_model_v1_en_5.5.0_3.0_1726586562947.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_qa_model_v1_en_5.5.0_3.0_1726586562947.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_qa_model_v1", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_qa_model_v1", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_qa_model_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/hcy5561/distilbert-base-uncased-qa-model-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_qa_model_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_qa_model_v1_pipeline_en.md new file mode 100644 index 00000000000000..e93b29cc5766ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_qa_model_v1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_qa_model_v1_pipeline pipeline DistilBertForQuestionAnswering from hcy5561 +author: John Snow Labs +name: distilbert_base_uncased_qa_model_v1_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_qa_model_v1_pipeline` is a English model originally trained by hcy5561. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_qa_model_v1_pipeline_en_5.5.0_3.0_1726586575154.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_qa_model_v1_pipeline_en_5.5.0_3.0_1726586575154.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame containing the input column(s) expected by the pipeline's first stage
pipeline = PretrainedPipeline("distilbert_base_uncased_qa_model_v1_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame containing the input column(s) expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("distilbert_base_uncased_qa_model_v1_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_qa_model_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/hcy5561/distilbert-base-uncased-qa-model-v1 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p30_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p30_pipeline_en.md new file mode 100644 index 00000000000000..809f6c4b33a19b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p30_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_p30_pipeline pipeline DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_p30_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_p30_pipeline` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p30_pipeline_en_5.5.0_3.0_1726599902740.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p30_pipeline_en_5.5.0_3.0_1726599902740.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame containing the input column(s) expected by the pipeline's first stage
pipeline = PretrainedPipeline("distilbert_base_uncased_squad2_p30_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame containing the input column(s) expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("distilbert_base_uncased_squad2_p30_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_p30_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|213.1 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-p30 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p50_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p50_pipeline_en.md new file mode 100644 index 00000000000000..4a078ce421e3ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p50_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_p50_pipeline pipeline DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_p50_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_p50_pipeline` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p50_pipeline_en_5.5.0_3.0_1726574550191.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p50_pipeline_en_5.5.0_3.0_1726574550191.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame containing the input column(s) expected by the pipeline's first stage
pipeline = PretrainedPipeline("distilbert_base_uncased_squad2_p50_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame containing the input column(s) expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("distilbert_base_uncased_squad2_p50_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_p50_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|185.2 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-p50 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p60_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p60_pipeline_en.md new file mode 100644 index 00000000000000..2e9c3278db775a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p60_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_p60_pipeline pipeline DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_p60_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_p60_pipeline` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p60_pipeline_en_5.5.0_3.0_1726555237233.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p60_pipeline_en_5.5.0_3.0_1726555237233.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame containing the input column(s) expected by the pipeline's first stage
pipeline = PretrainedPipeline("distilbert_base_uncased_squad2_p60_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame containing the input column(s) expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("distilbert_base_uncased_squad2_p60_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_p60_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|170.3 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-p60 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p70_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p70_en.md new file mode 100644 index 00000000000000..8867ba8eaa8785 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p70_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_p70 DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_p70 +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_p70` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p70_en_5.5.0_3.0_1726575072029.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p70_en_5.5.0_3.0_1726575072029.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_p70", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_p70", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_p70| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|155.3 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-p70 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p80_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p80_en.md new file mode 100644 index 00000000000000..9042d06cf75d36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p80_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_p80 DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_p80 +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_p80` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p80_en_5.5.0_3.0_1726599606155.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p80_en_5.5.0_3.0_1726599606155.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_p80", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_p80", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_p80| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|139.2 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-p80 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_pruned_p25_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_pruned_p25_pipeline_en.md new file mode 100644 index 00000000000000..de1f398dd69080 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_pruned_p25_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_pruned_p25_pipeline pipeline DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_pruned_p25_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_pruned_p25_pipeline` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_pruned_p25_pipeline_en_5.5.0_3.0_1726586628150.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_pruned_p25_pipeline_en_5.5.0_3.0_1726586628150.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame containing the input column(s) expected by the pipeline's first stage
pipeline = PretrainedPipeline("distilbert_base_uncased_squad2_pruned_p25_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame containing the input column(s) expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("distilbert_base_uncased_squad2_pruned_p25_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_pruned_p25_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|219.6 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-pruned-p25 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_pruned_p35_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_pruned_p35_en.md new file mode 100644 index 00000000000000..c87f413ea70602 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_pruned_p35_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_pruned_p35 DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_pruned_p35 +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_pruned_p35` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_pruned_p35_en_5.5.0_3.0_1726555869596.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_pruned_p35_en_5.5.0_3.0_1726555869596.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_pruned_p35", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_pruned_p35", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_pruned_p35| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|206.5 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-pruned-p35 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_finetuned_squad_accelerate_augmented_full_v4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_finetuned_squad_accelerate_augmented_full_v4_pipeline_en.md new file mode 100644 index 00000000000000..b6616c76fa9158 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_finetuned_squad_accelerate_augmented_full_v4_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_finetuned_squad_accelerate_augmented_full_v4_pipeline pipeline DistilBertForQuestionAnswering from christti +author: John Snow Labs +name: distilbert_finetuned_squad_accelerate_augmented_full_v4_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_squad_accelerate_augmented_full_v4_pipeline` is a English model originally trained by christti. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squad_accelerate_augmented_full_v4_pipeline_en_5.5.0_3.0_1726555785440.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squad_accelerate_augmented_full_v4_pipeline_en_5.5.0_3.0_1726555785440.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame containing the input column(s) expected by the pipeline's first stage
pipeline = PretrainedPipeline("distilbert_finetuned_squad_accelerate_augmented_full_v4_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame containing the input column(s) expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("distilbert_finetuned_squad_accelerate_augmented_full_v4_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_squad_accelerate_augmented_full_v4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/christti/distilbert-finetuned-squad-accelerate-augmented-full-v4 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_finetuned_squad_v2_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_finetuned_squad_v2_en.md new file mode 100644 index 00000000000000..e8054e24a825cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_finetuned_squad_v2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_finetuned_squad_v2 DistilBertForQuestionAnswering from quynguyen1704 +author: John Snow Labs +name: distilbert_finetuned_squad_v2 +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_squad_v2` is a English model originally trained by quynguyen1704. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squad_v2_en_5.5.0_3.0_1726555217082.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squad_v2_en_5.5.0_3.0_1726555217082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_finetuned_squad_v2", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_finetuned_squad_v2", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_squad_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/quynguyen1704/distilbert-finetuned-squad_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_imdb_naive_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_imdb_naive_en.md new file mode 100644 index 00000000000000..3d585bd284c611 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_imdb_naive_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_imdb_naive DistilBertForSequenceClassification from AmritaBh +author: John Snow Labs +name: distilbert_imdb_naive +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_naive` is a English model originally trained by AmritaBh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_naive_en_5.5.0_3.0_1726593848604.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_naive_en_5.5.0_3.0_1726593848604.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_naive", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_naive", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
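As a quick follow-up to the Python example above, the predicted label can be pulled out of the `class` column; nothing new is assumed beyond the names already defined in that snippet.

```python
# Each row of "class" is an array of category annotations; "result" is the predicted label.
pipelineDF.select("text", "class.result").show(truncate=False)
```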
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_naive| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AmritaBh/distilbert-imdb-naive \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_imdb_naive_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_imdb_naive_pipeline_en.md new file mode 100644 index 00000000000000..c0105e51b037fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_imdb_naive_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_imdb_naive_pipeline pipeline DistilBertForSequenceClassification from AmritaBh +author: John Snow Labs +name: distilbert_imdb_naive_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_naive_pipeline` is a English model originally trained by AmritaBh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_naive_pipeline_en_5.5.0_3.0_1726593861755.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_naive_pipeline_en_5.5.0_3.0_1726593861755.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame containing the input column(s) expected by the pipeline's first stage
pipeline = PretrainedPipeline("distilbert_imdb_naive_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame containing the input column(s) expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("distilbert_imdb_naive_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
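A minimal end-to-end sketch for this pipeline (an illustration, not part of the original card): it assumes the included DocumentAssembler reads a `text` column and that the classification stage writes a `class` column, and the sample sentence is made up.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Illustrative input; the pipeline's DocumentAssembler is assumed to read "text".
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_imdb_naive_pipeline", lang="en")
result = pipeline.transform(df)
result.select("text", "class.result").show(truncate=False)
```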
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_naive_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AmritaBh/distilbert-imdb-naive + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_kasuletrevor_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_kasuletrevor_en.md new file mode 100644 index 00000000000000..69a8c78064c211 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_kasuletrevor_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_kasuletrevor DistilBertForSequenceClassification from KasuleTrevor +author: John Snow Labs +name: distilbert_kasuletrevor +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_kasuletrevor` is a English model originally trained by KasuleTrevor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_kasuletrevor_en_5.5.0_3.0_1726584380804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_kasuletrevor_en_5.5.0_3.0_1726584380804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_kasuletrevor", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_kasuletrevor", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_kasuletrevor| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KasuleTrevor/distilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_kasuletrevor_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_kasuletrevor_pipeline_en.md new file mode 100644 index 00000000000000..c7a522986ecf94 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_kasuletrevor_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_kasuletrevor_pipeline pipeline DistilBertForSequenceClassification from KasuleTrevor +author: John Snow Labs +name: distilbert_kasuletrevor_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_kasuletrevor_pipeline` is a English model originally trained by KasuleTrevor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_kasuletrevor_pipeline_en_5.5.0_3.0_1726584393148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_kasuletrevor_pipeline_en_5.5.0_3.0_1726584393148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_kasuletrevor_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: any DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_kasuletrevor_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_kasuletrevor_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KasuleTrevor/distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_uncased_assamese_hungarian_f1_score_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_uncased_assamese_hungarian_f1_score_en.md new file mode 100644 index 00000000000000..24e3d7a4e339cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_uncased_assamese_hungarian_f1_score_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_uncased_assamese_hungarian_f1_score DistilBertForSequenceClassification from raulgdp +author: John Snow Labs +name: distilbert_uncased_assamese_hungarian_f1_score +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_uncased_assamese_hungarian_f1_score` is a English model originally trained by raulgdp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_uncased_assamese_hungarian_f1_score_en_5.5.0_3.0_1726584852801.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_uncased_assamese_hungarian_f1_score_en_5.5.0_3.0_1726584852801.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier consumes the document and token columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_uncased_assamese_hungarian_f1_score", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_uncased_assamese_hungarian_f1_score", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_uncased_assamese_hungarian_f1_score| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/raulgdp/Distilbert-uncased-AS-HU-f1-score \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbertfinetunehs3e8b_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbertfinetunehs3e8b_en.md new file mode 100644 index 00000000000000..2475966275cac2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbertfinetunehs3e8b_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbertfinetunehs3e8b DistilBertForQuestionAnswering from KarthikAlagarsamy +author: John Snow Labs +name: distilbertfinetunehs3e8b +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbertfinetunehs3e8b` is a English model originally trained by KarthikAlagarsamy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbertfinetunehs3e8b_en_5.5.0_3.0_1726555332133.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbertfinetunehs3e8b_en_5.5.0_3.0_1726555332133.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assemble the question and its context, then extract the answer span
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbertfinetunehs3e8b", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbertfinetunehs3e8b", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
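The extracted answer span is stored in the `answer` annotation column. As a minimal sketch (assuming `pipelineDF` from the Python example above):

```python
# `result` holds the answer text predicted for each question/context pair
pipelineDF.select("document_question.result", "answer.result").show(truncate=False)
```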
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbertfinetunehs3e8b| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/KarthikAlagarsamy/distilbertfinetuneHS3E8B \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilroberta_base_ft_tifu_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilroberta_base_ft_tifu_en.md new file mode 100644 index 00000000000000..edb0660fa6cc64 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilroberta_base_ft_tifu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_ft_tifu RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_tifu +date: 2024-09-17 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_tifu` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_tifu_en_5.5.0_3.0_1726602931400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_tifu_en_5.5.0_3.0_1726602931400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# produces one embedding vector per token
embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_tifu", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_tifu", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
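The `embeddings` column holds one annotation per token, each carrying the token text and its vector. A minimal sketch (assuming `pipelineDF` from the Python example) for unpacking them:

```python
from pyspark.sql import functions as F

# One row per token: the token text and its embedding vector
pipelineDF.select(F.explode("embeddings").alias("emb")) \
    .select(F.col("emb.result").alias("token"), F.col("emb.embeddings").alias("vector")) \
    .show(5, truncate=80)
```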
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_tifu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.4 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-tifu \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilroberta_base_ft_tifu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilroberta_base_ft_tifu_pipeline_en.md new file mode 100644 index 00000000000000..4bbe91aa726699 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilroberta_base_ft_tifu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_ft_tifu_pipeline pipeline RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_tifu_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_tifu_pipeline` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_tifu_pipeline_en_5.5.0_3.0_1726602946452.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_tifu_pipeline_en_5.5.0_3.0_1726602946452.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilroberta_base_ft_tifu_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: any DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilroberta_base_ft_tifu_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_tifu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-tifu + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-dl_xlm_roberta_base10_en.md b/docs/_posts/ahmedlone127/2024-09-17-dl_xlm_roberta_base10_en.md new file mode 100644 index 00000000000000..6a80b87550f713 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-dl_xlm_roberta_base10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dl_xlm_roberta_base10 XlmRoBertaForSequenceClassification from mohammad-osoolian +author: John Snow Labs +name: dl_xlm_roberta_base10 +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dl_xlm_roberta_base10` is a English model originally trained by mohammad-osoolian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dl_xlm_roberta_base10_en_5.5.0_3.0_1726616463735.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dl_xlm_roberta_base10_en_5.5.0_3.0_1726616463735.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier consumes the document and token columns produced above
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("dl_xlm_roberta_base10", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("dl_xlm_roberta_base10", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
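For ad-hoc predictions on single strings, the fitted pipeline can be wrapped in a `LightPipeline`, which avoids the DataFrame round trip. A minimal sketch assuming `pipelineModel` from the Python example above:

```python
from sparknlp.base import LightPipeline

# LightPipeline runs the same stages directly on plain Python strings
light = LightPipeline(pipelineModel)
print(light.annotate("I love spark-nlp"))
```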
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dl_xlm_roberta_base10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|783.3 MB| + +## References + +https://huggingface.co/mohammad-osoolian/DL-xlm-roberta-base10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-edos_2023_baseline_xlm_roberta_base_label_category_en.md b/docs/_posts/ahmedlone127/2024-09-17-edos_2023_baseline_xlm_roberta_base_label_category_en.md new file mode 100644 index 00000000000000..44712751d8c444 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-edos_2023_baseline_xlm_roberta_base_label_category_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English edos_2023_baseline_xlm_roberta_base_label_category XlmRoBertaForSequenceClassification from lct-rug-2022 +author: John Snow Labs +name: edos_2023_baseline_xlm_roberta_base_label_category +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`edos_2023_baseline_xlm_roberta_base_label_category` is a English model originally trained by lct-rug-2022. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/edos_2023_baseline_xlm_roberta_base_label_category_en_5.5.0_3.0_1726616482492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/edos_2023_baseline_xlm_roberta_base_label_category_en_5.5.0_3.0_1726616482492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier consumes the document and token columns produced above
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("edos_2023_baseline_xlm_roberta_base_label_category", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("edos_2023_baseline_xlm_roberta_base_label_category", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|edos_2023_baseline_xlm_roberta_base_label_category| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|654.5 MB| + +## References + +https://huggingface.co/lct-rug-2022/edos-2023-baseline-xlm-roberta-base-label_category \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-extract_answer_from_text_en.md b/docs/_posts/ahmedlone127/2024-09-17-extract_answer_from_text_en.md new file mode 100644 index 00000000000000..14c190d9a2b146 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-extract_answer_from_text_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English extract_answer_from_text DistilBertForQuestionAnswering from wdavies +author: John Snow Labs +name: extract_answer_from_text +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`extract_answer_from_text` is a English model originally trained by wdavies. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/extract_answer_from_text_en_5.5.0_3.0_1726586427974.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/extract_answer_from_text_en_5.5.0_3.0_1726586427974.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assemble the question and its context, then extract the answer span
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("extract_answer_from_text", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("extract_answer_from_text", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
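For interactive testing, the fitted QA pipeline can be wrapped in a `LightPipeline` and fed the question and context as plain strings. This is a minimal sketch assuming `pipelineModel` from the Python example above; the two-argument call is the question/context convention used by Spark NLP's QA annotators.

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# question first, then the context to search for the answer span
print(light.fullAnnotate("What framework do I use?", "I use spark-nlp."))
```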
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|extract_answer_from_text| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/wdavies/extract-answer-from-text \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-extract_answer_from_text_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-extract_answer_from_text_pipeline_en.md new file mode 100644 index 00000000000000..5b481e8acda1fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-extract_answer_from_text_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English extract_answer_from_text_pipeline pipeline DistilBertForQuestionAnswering from wdavies +author: John Snow Labs +name: extract_answer_from_text_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`extract_answer_from_text_pipeline` is a English model originally trained by wdavies. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/extract_answer_from_text_pipeline_en_5.5.0_3.0_1726586440501.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/extract_answer_from_text_pipeline_en_5.5.0_3.0_1726586440501.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame holding the question/context input columns this QA pipeline expects
# (the column names below are assumptions; adjust them to your data)
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipeline = PretrainedPipeline("extract_answer_from_text_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: a DataFrame holding the question/context input columns this QA pipeline expects
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipeline = new PretrainedPipeline("extract_answer_from_text_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|extract_answer_from_text_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/wdavies/extract-answer-from-text + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-fake_news_classifier_draip_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-fake_news_classifier_draip_pipeline_en.md new file mode 100644 index 00000000000000..0db90bc69ac971 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-fake_news_classifier_draip_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fake_news_classifier_draip_pipeline pipeline DistilBertForSequenceClassification from DraiP +author: John Snow Labs +name: fake_news_classifier_draip_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fake_news_classifier_draip_pipeline` is a English model originally trained by DraiP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fake_news_classifier_draip_pipeline_en_5.5.0_3.0_1726584695895.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fake_news_classifier_draip_pipeline_en_5.5.0_3.0_1726584695895.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("fake_news_classifier_draip_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: any DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("fake_news_classifier_draip_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fake_news_classifier_draip_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DraiP/Fake_News_Classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-fdistilbert_base_uncased_finetuned_squad_en.md b/docs/_posts/ahmedlone127/2024-09-17-fdistilbert_base_uncased_finetuned_squad_en.md new file mode 100644 index 00000000000000..62c648355b8878 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-fdistilbert_base_uncased_finetuned_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English fdistilbert_base_uncased_finetuned_squad DistilBertForQuestionAnswering from Kamaljp +author: John Snow Labs +name: fdistilbert_base_uncased_finetuned_squad +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fdistilbert_base_uncased_finetuned_squad` is a English model originally trained by Kamaljp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fdistilbert_base_uncased_finetuned_squad_en_5.5.0_3.0_1726574819140.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fdistilbert_base_uncased_finetuned_squad_en_5.5.0_3.0_1726574819140.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assemble the question and its context, then extract the answer span
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("fdistilbert_base_uncased_finetuned_squad", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("fdistilbert_base_uncased_finetuned_squad", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fdistilbert_base_uncased_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Kamaljp/fdistilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-feedback_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-feedback_classification_pipeline_en.md new file mode 100644 index 00000000000000..d67bf7bbe811f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-feedback_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English feedback_classification_pipeline pipeline BertForSequenceClassification from Yousefmd +author: John Snow Labs +name: feedback_classification_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`feedback_classification_pipeline` is a English model originally trained by Yousefmd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/feedback_classification_pipeline_en_5.5.0_3.0_1726605205107.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/feedback_classification_pipeline_en_5.5.0_3.0_1726605205107.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("feedback_classification_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: any DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("feedback_classification_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|feedback_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.4 GB| + +## References + +https://huggingface.co/Yousefmd/feedback-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-final_ft__roberta_base_bne__70k_ultrasounds_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-final_ft__roberta_base_bne__70k_ultrasounds_pipeline_en.md new file mode 100644 index 00000000000000..7fb65e69884196 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-final_ft__roberta_base_bne__70k_ultrasounds_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English final_ft__roberta_base_bne__70k_ultrasounds_pipeline pipeline RoBertaEmbeddings from manucos +author: John Snow Labs +name: final_ft__roberta_base_bne__70k_ultrasounds_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_ft__roberta_base_bne__70k_ultrasounds_pipeline` is a English model originally trained by manucos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_ft__roberta_base_bne__70k_ultrasounds_pipeline_en_5.5.0_3.0_1726595844500.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_ft__roberta_base_bne__70k_ultrasounds_pipeline_en_5.5.0_3.0_1726595844500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("final_ft__roberta_base_bne__70k_ultrasounds_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: any DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("final_ft__roberta_base_bne__70k_ultrasounds_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_ft__roberta_base_bne__70k_ultrasounds_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.7 MB| + +## References + +https://huggingface.co/manucos/final-ft__roberta-base-bne__70k-ultrasounds + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-fine_tuned_albert_tweets_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-fine_tuned_albert_tweets_pipeline_en.md new file mode 100644 index 00000000000000..0f9a9302d38215 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-fine_tuned_albert_tweets_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fine_tuned_albert_tweets_pipeline pipeline AlbertForSequenceClassification from imsarfaroz +author: John Snow Labs +name: fine_tuned_albert_tweets_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_albert_tweets_pipeline` is a English model originally trained by imsarfaroz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_albert_tweets_pipeline_en_5.5.0_3.0_1726614199582.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_albert_tweets_pipeline_en_5.5.0_3.0_1726614199582.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("fine_tuned_albert_tweets_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: any DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("fine_tuned_albert_tweets_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_albert_tweets_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/imsarfaroz/fine-tuned-albert-tweets + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-fine_tuned_roberta_yt_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-fine_tuned_roberta_yt_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..6bec4a95f86b37 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-fine_tuned_roberta_yt_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fine_tuned_roberta_yt_sentiment_pipeline pipeline RoBertaForSequenceClassification from roycett +author: John Snow Labs +name: fine_tuned_roberta_yt_sentiment_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_roberta_yt_sentiment_pipeline` is a English model originally trained by roycett. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_roberta_yt_sentiment_pipeline_en_5.5.0_3.0_1726591284364.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_roberta_yt_sentiment_pipeline_en_5.5.0_3.0_1726591284364.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("fine_tuned_roberta_yt_sentiment_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: any DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("fine_tuned_roberta_yt_sentiment_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_roberta_yt_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/roycett/fine-tuned-roberta-yt-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-finetuned_bert_model_squad_datset_en.md b/docs/_posts/ahmedlone127/2024-09-17-finetuned_bert_model_squad_datset_en.md new file mode 100644 index 00000000000000..0a3ae880533ea8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-finetuned_bert_model_squad_datset_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English finetuned_bert_model_squad_datset DistilBertForQuestionAnswering from AlyGreo +author: John Snow Labs +name: finetuned_bert_model_squad_datset +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_bert_model_squad_datset` is a English model originally trained by AlyGreo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_bert_model_squad_datset_en_5.5.0_3.0_1726555331817.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_bert_model_squad_datset_en_5.5.0_3.0_1726555331817.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assemble the question and its context, then extract the answer span
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("finetuned_bert_model_squad_datset", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("finetuned_bert_model_squad_datset", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_bert_model_squad_datset| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/AlyGreo/finetuned-bert-model-squad-datset \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-finetuning_sentiment_model_3000_samples_5_en.md b/docs/_posts/ahmedlone127/2024-09-17-finetuning_sentiment_model_3000_samples_5_en.md new file mode 100644 index 00000000000000..8402b0bfb2779e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-finetuning_sentiment_model_3000_samples_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_5 DistilBertForSequenceClassification from mamledes +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_5 +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_5` is a English model originally trained by mamledes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_5_en_5.5.0_3.0_1726594055316.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_5_en_5.5.0_3.0_1726594055316.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier consumes the document and token columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_5", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_5", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
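Classifier annotations also carry per-label scores in their metadata, which helps when a confidence threshold is needed. A minimal sketch assuming `pipelineDF` from the Python example above:

```python
from pyspark.sql import functions as F

# metadata holds the raw scores alongside the predicted label
pipelineDF.select(F.explode("class").alias("pred")) \
    .select("pred.result", "pred.metadata") \
    .show(truncate=False)
```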
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mamledes/finetuning-sentiment-model-3000-samples_5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-finetuning_sentiment_model_3000_samples_ganeshglitz_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-finetuning_sentiment_model_3000_samples_ganeshglitz_pipeline_en.md new file mode 100644 index 00000000000000..5c1a6da1a9e22e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-finetuning_sentiment_model_3000_samples_ganeshglitz_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ganeshglitz_pipeline pipeline DistilBertForSequenceClassification from ganeshglitz +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ganeshglitz_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ganeshglitz_pipeline` is a English model originally trained by ganeshglitz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ganeshglitz_pipeline_en_5.5.0_3.0_1726593976090.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ganeshglitz_pipeline_en_5.5.0_3.0_1726593976090.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_ganeshglitz_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: any DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_ganeshglitz_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ganeshglitz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ganeshglitz/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-finetuning_sentiment_model_3000_samples_hunnyopenxcell_en.md b/docs/_posts/ahmedlone127/2024-09-17-finetuning_sentiment_model_3000_samples_hunnyopenxcell_en.md new file mode 100644 index 00000000000000..f31bfb194d2ce6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-finetuning_sentiment_model_3000_samples_hunnyopenxcell_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_hunnyopenxcell DistilBertForSequenceClassification from hunnyopenxcell +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_hunnyopenxcell +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_hunnyopenxcell` is a English model originally trained by hunnyopenxcell. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_hunnyopenxcell_en_5.5.0_3.0_1726584360531.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_hunnyopenxcell_en_5.5.0_3.0_1726584360531.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier consumes the document and token columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_hunnyopenxcell", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_hunnyopenxcell", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_hunnyopenxcell| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hunnyopenxcell/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-ft_mod_all_en.md b/docs/_posts/ahmedlone127/2024-09-17-ft_mod_all_en.md new file mode 100644 index 00000000000000..133b4c7bc2599b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-ft_mod_all_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English ft_mod_all DistilBertForQuestionAnswering from Saty2hoty +author: John Snow Labs +name: ft_mod_all +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ft_mod_all` is a English model originally trained by Saty2hoty. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ft_mod_all_en_5.5.0_3.0_1726586385652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ft_mod_all_en_5.5.0_3.0_1726586385652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assemble the question and its context, then extract the answer span
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("ft_mod_all", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("ft_mod_all", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ft_mod_all| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Saty2hoty/FT_mod_all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-ft_mod_all_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-ft_mod_all_pipeline_en.md new file mode 100644 index 00000000000000..1cb14b3affdfec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-ft_mod_all_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English ft_mod_all_pipeline pipeline DistilBertForQuestionAnswering from Saty2hoty +author: John Snow Labs +name: ft_mod_all_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ft_mod_all_pipeline` is a English model originally trained by Saty2hoty. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ft_mod_all_pipeline_en_5.5.0_3.0_1726586397621.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ft_mod_all_pipeline_en_5.5.0_3.0_1726586397621.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("ft_mod_all_pipeline", lang = "en")
+# df is assumed to be a Spark DataFrame with question/context inputs for this pipeline
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("ft_mod_all_pipeline", lang = "en")
+// df is assumed to be a Spark DataFrame with question/context inputs for this pipeline
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ft_mod_all_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Saty2hoty/FT_mod_all + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-hate_speech_detection_tweets_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-hate_speech_detection_tweets_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..37a654857a8be2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-hate_speech_detection_tweets_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_speech_detection_tweets_roberta_base_pipeline pipeline RoBertaForSequenceClassification from Arvnd03 +author: John Snow Labs +name: hate_speech_detection_tweets_roberta_base_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_speech_detection_tweets_roberta_base_pipeline` is a English model originally trained by Arvnd03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_speech_detection_tweets_roberta_base_pipeline_en_5.5.0_3.0_1726573606600.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_speech_detection_tweets_roberta_base_pipeline_en_5.5.0_3.0_1726573606600.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("hate_speech_detection_tweets_roberta_base_pipeline", lang = "en")
+# df is assumed to be a Spark DataFrame of input texts for this pipeline
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("hate_speech_detection_tweets_roberta_base_pipeline", lang = "en")
+// df is assumed to be a Spark DataFrame of input texts for this pipeline
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_speech_detection_tweets_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|455.6 MB| + +## References + +https://huggingface.co/Arvnd03/Hate-Speech-Detection-Tweets-RoBERTa-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-helsinki_danish_swedish_v13_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-helsinki_danish_swedish_v13_pipeline_en.md new file mode 100644 index 00000000000000..69e00a41f27a5f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-helsinki_danish_swedish_v13_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English helsinki_danish_swedish_v13_pipeline pipeline MarianTransformer from Danieljacobsen +author: John Snow Labs +name: helsinki_danish_swedish_v13_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`helsinki_danish_swedish_v13_pipeline` is a English model originally trained by Danieljacobsen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/helsinki_danish_swedish_v13_pipeline_en_5.5.0_3.0_1726532864934.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/helsinki_danish_swedish_v13_pipeline_en_5.5.0_3.0_1726532864934.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("helsinki_danish_swedish_v13_pipeline", lang = "en")
+# df is assumed to be a Spark DataFrame of input texts for this pipeline
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("helsinki_danish_swedish_v13_pipeline", lang = "en")
+// df is assumed to be a Spark DataFrame of input texts for this pipeline
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|helsinki_danish_swedish_v13_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|497.3 MB| + +## References + +https://huggingface.co/Danieljacobsen/Helsinki-DA-SV-v13 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-igbo_model_pipeline_ig.md b/docs/_posts/ahmedlone127/2024-09-17-igbo_model_pipeline_ig.md new file mode 100644 index 00000000000000..e972a3ced3e24d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-igbo_model_pipeline_ig.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Igbo igbo_model_pipeline pipeline XlmRoBertaForTokenClassification from ignatius +author: John Snow Labs +name: igbo_model_pipeline +date: 2024-09-17 +tags: [ig, open_source, pipeline, onnx] +task: Named Entity Recognition +language: ig +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`igbo_model_pipeline` is a Igbo model originally trained by ignatius. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/igbo_model_pipeline_ig_5.5.0_3.0_1726577158736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/igbo_model_pipeline_ig_5.5.0_3.0_1726577158736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("igbo_model_pipeline", lang = "ig")
+# df is assumed to be a Spark DataFrame of input texts for this pipeline
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("igbo_model_pipeline", lang = "ig")
+// df is assumed to be a Spark DataFrame of input texts for this pipeline
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|igbo_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ig| +|Size:|443.2 MB| + +## References + +https://huggingface.co/ignatius/igbo_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-influence_tactic_paper_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-influence_tactic_paper_pipeline_en.md new file mode 100644 index 00000000000000..30387790777856 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-influence_tactic_paper_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English influence_tactic_paper_pipeline pipeline BertForSequenceClassification from InfluenceTactics +author: John Snow Labs +name: influence_tactic_paper_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`influence_tactic_paper_pipeline` is a English model originally trained by InfluenceTactics. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/influence_tactic_paper_pipeline_en_5.5.0_3.0_1726604789831.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/influence_tactic_paper_pipeline_en_5.5.0_3.0_1726604789831.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("influence_tactic_paper_pipeline", lang = "en")
+# df is assumed to be a Spark DataFrame of input texts for this pipeline
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("influence_tactic_paper_pipeline", lang = "en")
+// df is assumed to be a Spark DataFrame of input texts for this pipeline
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|influence_tactic_paper_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/InfluenceTactics/Influence_Tactic_Paper + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-interlingua_multilingual_original_script_roberta_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-17-interlingua_multilingual_original_script_roberta_pipeline_xx.md new file mode 100644 index 00000000000000..c3ef1cd575056c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-interlingua_multilingual_original_script_roberta_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual interlingua_multilingual_original_script_roberta_pipeline pipeline RoBertaEmbeddings from ibm +author: John Snow Labs +name: interlingua_multilingual_original_script_roberta_pipeline +date: 2024-09-17 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`interlingua_multilingual_original_script_roberta_pipeline` is a Multilingual model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/interlingua_multilingual_original_script_roberta_pipeline_xx_5.5.0_3.0_1726595623586.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/interlingua_multilingual_original_script_roberta_pipeline_xx_5.5.0_3.0_1726595623586.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("interlingua_multilingual_original_script_roberta_pipeline", lang = "xx")
+# df is assumed to be a Spark DataFrame of input texts for this pipeline
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("interlingua_multilingual_original_script_roberta_pipeline", lang = "xx")
+// df is assumed to be a Spark DataFrame of input texts for this pipeline
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|interlingua_multilingual_original_script_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|638.6 MB| + +## References + +https://huggingface.co/ibm/ia-multilingual-original-script-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-interlingua_multilingual_transliterated_roberta_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-17-interlingua_multilingual_transliterated_roberta_pipeline_xx.md new file mode 100644 index 00000000000000..34fc4bc2aab69a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-interlingua_multilingual_transliterated_roberta_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual interlingua_multilingual_transliterated_roberta_pipeline pipeline RoBertaEmbeddings from ibm +author: John Snow Labs +name: interlingua_multilingual_transliterated_roberta_pipeline +date: 2024-09-17 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`interlingua_multilingual_transliterated_roberta_pipeline` is a Multilingual model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/interlingua_multilingual_transliterated_roberta_pipeline_xx_5.5.0_3.0_1726595983328.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/interlingua_multilingual_transliterated_roberta_pipeline_xx_5.5.0_3.0_1726595983328.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("interlingua_multilingual_transliterated_roberta_pipeline", lang = "xx")
+# df is assumed to be a Spark DataFrame of input texts for this pipeline
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("interlingua_multilingual_transliterated_roberta_pipeline", lang = "xx")
+// df is assumed to be a Spark DataFrame of input texts for this pipeline
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|interlingua_multilingual_transliterated_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|638.6 MB| + +## References + +https://huggingface.co/ibm/ia-multilingual-transliterated-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-interval_model_en.md b/docs/_posts/ahmedlone127/2024-09-17-interval_model_en.md new file mode 100644 index 00000000000000..6466eeb16c5a06 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-interval_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English interval_model DistilBertForSequenceClassification from coggpt +author: John Snow Labs +name: interval_model +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`interval_model` is a English model originally trained by coggpt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/interval_model_en_5.5.0_3.0_1726593973029.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/interval_model_en_5.5.0_3.0_1726593973029.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("interval_model","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("interval_model", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
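+
+To inspect the predictions, you can select the `result` field of the `class` output column. This is a minimal sketch based on the pipeline above; the label values returned depend on the model:
+
+```python
+# Show the predicted label for each input row
+pipelineDF.select("class.result").show(truncate=False)
+```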
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|interval_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/coggpt/interval_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-korean_better_old_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-korean_better_old_pipeline_en.md new file mode 100644 index 00000000000000..77d3c8cd1c4697 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-korean_better_old_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English korean_better_old_pipeline pipeline WhisperForCTC from phucd +author: John Snow Labs +name: korean_better_old_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`korean_better_old_pipeline` is a English model originally trained by phucd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/korean_better_old_pipeline_en_5.5.0_3.0_1726570550146.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/korean_better_old_pipeline_en_5.5.0_3.0_1726570550146.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("korean_better_old_pipeline", lang = "en")
+# df is assumed to be a Spark DataFrame of audio inputs for this pipeline
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("korean_better_old_pipeline", lang = "en")
+// df is assumed to be a Spark DataFrame of audio inputs for this pipeline
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|korean_better_old_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/phucd/ko-better-old + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-lab1_finetuning_yimeiyang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-lab1_finetuning_yimeiyang_pipeline_en.md new file mode 100644 index 00000000000000..cf88b6de7c3193 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-lab1_finetuning_yimeiyang_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lab1_finetuning_yimeiyang_pipeline pipeline MarianTransformer from yimeiyang +author: John Snow Labs +name: lab1_finetuning_yimeiyang_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab1_finetuning_yimeiyang_pipeline` is a English model originally trained by yimeiyang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab1_finetuning_yimeiyang_pipeline_en_5.5.0_3.0_1726582225303.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab1_finetuning_yimeiyang_pipeline_en_5.5.0_3.0_1726582225303.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("lab1_finetuning_yimeiyang_pipeline", lang = "en")
+# df is assumed to be a Spark DataFrame of input texts for this pipeline
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("lab1_finetuning_yimeiyang_pipeline", lang = "en")
+// df is assumed to be a Spark DataFrame of input texts for this pipeline
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab1_finetuning_yimeiyang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.7 MB| + +## References + +https://huggingface.co/yimeiyang/lab1_finetuning + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-lab2_adam_reshphil23_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-lab2_adam_reshphil23_pipeline_en.md new file mode 100644 index 00000000000000..88b5e610a0ac56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-lab2_adam_reshphil23_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lab2_adam_reshphil23_pipeline pipeline MarianTransformer from reshphil23 +author: John Snow Labs +name: lab2_adam_reshphil23_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab2_adam_reshphil23_pipeline` is a English model originally trained by reshphil23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab2_adam_reshphil23_pipeline_en_5.5.0_3.0_1726533352218.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab2_adam_reshphil23_pipeline_en_5.5.0_3.0_1726533352218.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("lab2_adam_reshphil23_pipeline", lang = "en")
+# df is assumed to be a Spark DataFrame of input texts for this pipeline
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("lab2_adam_reshphil23_pipeline", lang = "en")
+// df is assumed to be a Spark DataFrame of input texts for this pipeline
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab2_adam_reshphil23_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.9 MB| + +## References + +https://huggingface.co/reshphil23/lab2_adam + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-lab2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-lab2_pipeline_en.md new file mode 100644 index 00000000000000..837c38c8791cd3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-lab2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English lab2_pipeline pipeline WhisperForCTC from WayneLinn +author: John Snow Labs +name: lab2_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab2_pipeline` is a English model originally trained by WayneLinn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab2_pipeline_en_5.5.0_3.0_1726542174729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab2_pipeline_en_5.5.0_3.0_1726542174729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("lab2_pipeline", lang = "en")
+# df is assumed to be a Spark DataFrame of audio inputs for this pipeline
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("lab2_pipeline", lang = "en")
+// df is assumed to be a Spark DataFrame of audio inputs for this pipeline
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/WayneLinn/Lab2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-lr1e5_bs32_distilbert_qa_pytorch_full_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-lr1e5_bs32_distilbert_qa_pytorch_full_pipeline_en.md new file mode 100644 index 00000000000000..dbf91198d67c66 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-lr1e5_bs32_distilbert_qa_pytorch_full_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English lr1e5_bs32_distilbert_qa_pytorch_full_pipeline pipeline DistilBertForQuestionAnswering from tyavika +author: John Snow Labs +name: lr1e5_bs32_distilbert_qa_pytorch_full_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lr1e5_bs32_distilbert_qa_pytorch_full_pipeline` is a English model originally trained by tyavika. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lr1e5_bs32_distilbert_qa_pytorch_full_pipeline_en_5.5.0_3.0_1726586374669.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lr1e5_bs32_distilbert_qa_pytorch_full_pipeline_en_5.5.0_3.0_1726586374669.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("lr1e5_bs32_distilbert_qa_pytorch_full_pipeline", lang = "en")
+# df is assumed to be a Spark DataFrame with question/context inputs for this pipeline
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("lr1e5_bs32_distilbert_qa_pytorch_full_pipeline", lang = "en")
+// df is assumed to be a Spark DataFrame with question/context inputs for this pipeline
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lr1e5_bs32_distilbert_qa_pytorch_full_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/tyavika/LR1E5_BS32_Distilbert-QA-Pytorch-FULL + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-maltese_align_finetuned_lst_english_tonga_tonga_islands_thai_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-maltese_align_finetuned_lst_english_tonga_tonga_islands_thai_pipeline_en.md new file mode 100644 index 00000000000000..92545dc1f87433 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-maltese_align_finetuned_lst_english_tonga_tonga_islands_thai_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English maltese_align_finetuned_lst_english_tonga_tonga_islands_thai_pipeline pipeline MarianTransformer from saksornr +author: John Snow Labs +name: maltese_align_finetuned_lst_english_tonga_tonga_islands_thai_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`maltese_align_finetuned_lst_english_tonga_tonga_islands_thai_pipeline` is a English model originally trained by saksornr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/maltese_align_finetuned_lst_english_tonga_tonga_islands_thai_pipeline_en_5.5.0_3.0_1726581893298.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/maltese_align_finetuned_lst_english_tonga_tonga_islands_thai_pipeline_en_5.5.0_3.0_1726581893298.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("maltese_align_finetuned_lst_english_tonga_tonga_islands_thai_pipeline", lang = "en")
+# df is assumed to be a Spark DataFrame of input texts for this pipeline
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("maltese_align_finetuned_lst_english_tonga_tonga_islands_thai_pipeline", lang = "en")
+// df is assumed to be a Spark DataFrame of input texts for this pipeline
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|maltese_align_finetuned_lst_english_tonga_tonga_islands_thai_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|530.8 MB| + +## References + +https://huggingface.co/saksornr/mt-align-finetuned-LST-en-to-th + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-maltese_hitz_spanish_basque_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-17-maltese_hitz_spanish_basque_pipeline_es.md new file mode 100644 index 00000000000000..db92373f0437c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-maltese_hitz_spanish_basque_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish maltese_hitz_spanish_basque_pipeline pipeline MarianTransformer from HiTZ +author: John Snow Labs +name: maltese_hitz_spanish_basque_pipeline +date: 2024-09-17 +tags: [es, open_source, pipeline, onnx] +task: Translation +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`maltese_hitz_spanish_basque_pipeline` is a Castilian, Spanish model originally trained by HiTZ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/maltese_hitz_spanish_basque_pipeline_es_5.5.0_3.0_1726581748715.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/maltese_hitz_spanish_basque_pipeline_es_5.5.0_3.0_1726581748715.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("maltese_hitz_spanish_basque_pipeline", lang = "es")
+# df is assumed to be a Spark DataFrame of input texts for this pipeline
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("maltese_hitz_spanish_basque_pipeline", lang = "es")
+// df is assumed to be a Spark DataFrame of input texts for this pipeline
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|maltese_hitz_spanish_basque_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|225.9 MB| + +## References + +https://huggingface.co/HiTZ/mt-hitz-es-eu + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_gw_chinese_tonga_tonga_islands_english_accelerate_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_gw_chinese_tonga_tonga_islands_english_accelerate_pipeline_en.md new file mode 100644 index 00000000000000..025ceb0fe8146e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_gw_chinese_tonga_tonga_islands_english_accelerate_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_gw_chinese_tonga_tonga_islands_english_accelerate_pipeline pipeline MarianTransformer from Pinkky +author: John Snow Labs +name: marian_finetuned_gw_chinese_tonga_tonga_islands_english_accelerate_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_gw_chinese_tonga_tonga_islands_english_accelerate_pipeline` is a English model originally trained by Pinkky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_gw_chinese_tonga_tonga_islands_english_accelerate_pipeline_en_5.5.0_3.0_1726582050985.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_gw_chinese_tonga_tonga_islands_english_accelerate_pipeline_en_5.5.0_3.0_1726582050985.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("marian_finetuned_gw_chinese_tonga_tonga_islands_english_accelerate_pipeline", lang = "en")
+# df is assumed to be a Spark DataFrame of input texts for this pipeline
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("marian_finetuned_gw_chinese_tonga_tonga_islands_english_accelerate_pipeline", lang = "en")
+// df is assumed to be a Spark DataFrame of input texts for this pipeline
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_gw_chinese_tonga_tonga_islands_english_accelerate_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|540.8 MB| + +## References + +https://huggingface.co/Pinkky/marian-finetuned-gw-zh-to-en-accelerate + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_jakeyunwookim_en.md b/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_jakeyunwookim_en.md new file mode 100644 index 00000000000000..7bc1c63c5fd16a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_jakeyunwookim_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_jakeyunwookim MarianTransformer from JakeYunwooKim +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_jakeyunwookim +date: 2024-09-17 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_jakeyunwookim` is a English model originally trained by JakeYunwooKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_jakeyunwookim_en_5.5.0_3.0_1726532833205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_jakeyunwookim_en_5.5.0_3.0_1726532833205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_jakeyunwookim","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_jakeyunwookim","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
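+
+To inspect the translations, you can select the `result` field of the `translation` output column. This is a minimal sketch based on the pipeline above:
+
+```python
+# Show the translated sentences for each input row
+pipelineDF.select("translation.result").show(truncate=False)
+```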
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_jakeyunwookim| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.2 MB| + +## References + +https://huggingface.co/JakeYunwooKim/marian-finetuned-kde4-en-to-fr-accelerate \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_kde4_english_tonga_tonga_islands_french_kingxue_en.md b/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_kde4_english_tonga_tonga_islands_french_kingxue_en.md new file mode 100644 index 00000000000000..5913d48474e17f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_kde4_english_tonga_tonga_islands_french_kingxue_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_kingxue MarianTransformer from kingxue +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_kingxue +date: 2024-09-17 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_kingxue` is a English model originally trained by kingxue. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_kingxue_en_5.5.0_3.0_1726599033403.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_kingxue_en_5.5.0_3.0_1726599033403.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_kingxue","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_kingxue","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
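+
+To inspect the translations, you can select the `result` field of the `translation` output column. This is a minimal sketch based on the pipeline above:
+
+```python
+# Show the translated sentences for each input row
+pipelineDF.select("translation.result").show(truncate=False)
+```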
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_kingxue| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.3 MB| + +## References + +https://huggingface.co/kingxue/marian-finetuned-kde4-en-to-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_kde4_english_tonga_tonga_islands_french_mriggs_en.md b/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_kde4_english_tonga_tonga_islands_french_mriggs_en.md new file mode 100644 index 00000000000000..1496acd601600a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_kde4_english_tonga_tonga_islands_french_mriggs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_mriggs MarianTransformer from mriggs +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_mriggs +date: 2024-09-17 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_mriggs` is a English model originally trained by mriggs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_mriggs_en_5.5.0_3.0_1726582030697.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_mriggs_en_5.5.0_3.0_1726582030697.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_mriggs","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_mriggs","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
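+
+To inspect the translations, you can select the `result` field of the `translation` output column. This is a minimal sketch based on the pipeline above:
+
+```python
+# Show the translated sentences for each input row
+pipelineDF.select("translation.result").show(truncate=False)
+```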
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_mriggs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.2 MB| + +## References + +https://huggingface.co/mriggs/marian-finetuned-kde4-en-to-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-marianmt_bislama_dev_rom_tagalog_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-17-marianmt_bislama_dev_rom_tagalog_pipeline_hi.md new file mode 100644 index 00000000000000..3433c695c79ceb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-marianmt_bislama_dev_rom_tagalog_pipeline_hi.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Hindi marianmt_bislama_dev_rom_tagalog_pipeline pipeline MarianTransformer from ar5entum +author: John Snow Labs +name: marianmt_bislama_dev_rom_tagalog_pipeline +date: 2024-09-17 +tags: [hi, open_source, pipeline, onnx] +task: Translation +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marianmt_bislama_dev_rom_tagalog_pipeline` is a Hindi model originally trained by ar5entum. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marianmt_bislama_dev_rom_tagalog_pipeline_hi_5.5.0_3.0_1726598996827.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marianmt_bislama_dev_rom_tagalog_pipeline_hi_5.5.0_3.0_1726598996827.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("marianmt_bislama_dev_rom_tagalog_pipeline", lang = "hi")
+# df is assumed to be a Spark DataFrame of input texts for this pipeline
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("marianmt_bislama_dev_rom_tagalog_pipeline", lang = "hi")
+// df is assumed to be a Spark DataFrame of input texts for this pipeline
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marianmt_bislama_dev_rom_tagalog_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|519.0 MB| + +## References + +https://huggingface.co/ar5entum/marianMT_bi_dev_rom_tl + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-melbert_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-17-melbert_roberta_en.md new file mode 100644 index 00000000000000..6e56dcfbfe7993 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-melbert_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English melbert_roberta RoBertaEmbeddings from EhsanAghazadeh +author: John Snow Labs +name: melbert_roberta +date: 2024-09-17 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`melbert_roberta` is a English model originally trained by EhsanAghazadeh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/melbert_roberta_en_5.5.0_3.0_1726595435864.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/melbert_roberta_en_5.5.0_3.0_1726595435864.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("melbert_roberta","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("melbert_roberta","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
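+
+To inspect the token embeddings, you can look at the `embeddings` field of the output annotations. This is a minimal sketch based on the pipeline above; each exploded row holds one vector per token:
+
+```python
+# Put each token's embedding vector on its own row
+pipelineDF.selectExpr("explode(embeddings.embeddings) as embedding").show(truncate=False)
+```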
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|melbert_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/EhsanAghazadeh/melbert-roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-mnli_6_var33_3_en.md b/docs/_posts/ahmedlone127/2024-09-17-mnli_6_var33_3_en.md new file mode 100644 index 00000000000000..1a0be6366e3989 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-mnli_6_var33_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mnli_6_var33_3 RoBertaEmbeddings from mahdiyar +author: John Snow Labs +name: mnli_6_var33_3 +date: 2024-09-17 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mnli_6_var33_3` is a English model originally trained by mahdiyar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mnli_6_var33_3_en_5.5.0_3.0_1726595174748.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mnli_6_var33_3_en_5.5.0_3.0_1726595174748.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("mnli_6_var33_3","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("mnli_6_var33_3","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
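+
+To inspect the token embeddings, you can look at the `embeddings` field of the output annotations. This is a minimal sketch based on the pipeline above; each exploded row holds one vector per token:
+
+```python
+# Put each token's embedding vector on its own row
+pipelineDF.selectExpr("explode(embeddings.embeddings) as embedding").show(truncate=False)
+```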
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mnli_6_var33_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|460.3 MB| + +## References + +https://huggingface.co/mahdiyar/mnli-6-var33-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-modeltest_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-modeltest_pipeline_en.md new file mode 100644 index 00000000000000..ad35e42bb5477c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-modeltest_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English modeltest_pipeline pipeline DistilBertForSequenceClassification from pranay143342 +author: John Snow Labs +name: modeltest_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`modeltest_pipeline` is a English model originally trained by pranay143342. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/modeltest_pipeline_en_5.5.0_3.0_1726584975910.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/modeltest_pipeline_en_5.5.0_3.0_1726584975910.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# build a DataFrame with a "text" column for the pipeline to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("modeltest_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// build a DataFrame with a "text" column for the pipeline to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("modeltest_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
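+
+For quick experiments on a handful of strings, `PretrainedPipeline` also exposes an `annotate` helper (a minimal sketch, assuming the `pipeline` object created above):
+
+```python
+# annotate() runs the pipeline on plain strings and returns a dict keyed by output column
+result = pipeline.annotate("I love spark-nlp")
+print(result)
+```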
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|modeltest_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/pranay143342/modeltest + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-mt5_rouge_durga_2_en.md b/docs/_posts/ahmedlone127/2024-09-17-mt5_rouge_durga_2_en.md new file mode 100644 index 00000000000000..7cb049f090275e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-mt5_rouge_durga_2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English mt5_rouge_durga_2 T5Transformer from devagonal +author: John Snow Labs +name: mt5_rouge_durga_2 +date: 2024-09-17 +tags: [en, open_source, onnx, t5, question_answering, summarization, translation, text_generation] +task: [Question Answering, Summarization, Translation, Text Generation] +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: T5Transformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained T5Transformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mt5_rouge_durga_2` is a English model originally trained by devagonal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mt5_rouge_durga_2_en_5.5.0_3.0_1726585949872.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mt5_rouge_durga_2_en_5.5.0_3.0_1726585949872.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+t5 = T5Transformer.pretrained("mt5_rouge_durga_2","en") \
+    .setInputCols(["document"]) \
+    .setOutputCol("output")
+
+pipeline = Pipeline().setStages([documentAssembler, t5])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val t5 = T5Transformer.pretrained("mt5_rouge_durga_2", "en")
+    .setInputCols(Array("document"))
+    .setOutputCol("output")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, t5))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
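+
+To read the generated text back out (a minimal sketch, assuming the Python pipeline above has produced `pipelineDF`):
+
+```python
+# the generated text is stored in the `result` field of the "output" annotations
+pipelineDF.selectExpr("explode(output.result) as generated").show(truncate=False)
+```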
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mt5_rouge_durga_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[output]| +|Language:|en| +|Size:|2.2 GB| + +## References + +https://huggingface.co/devagonal/mt5-rouge-durga-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-n_distilbert_imdb_padding70model_en.md b/docs/_posts/ahmedlone127/2024-09-17-n_distilbert_imdb_padding70model_en.md new file mode 100644 index 00000000000000..4f95fac8128789 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-n_distilbert_imdb_padding70model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_distilbert_imdb_padding70model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_imdb_padding70model +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_imdb_padding70model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_imdb_padding70model_en_5.5.0_3.0_1726594109024.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_imdb_padding70model_en_5.5.0_3.0_1726594109024.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_imdb_padding70model","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_imdb_padding70model", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
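+
+To read the predicted label back out (a minimal sketch, assuming the Python pipeline above has produced `pipelineDF`):
+
+```python
+# predicted labels are stored in the `result` field of the "class" annotations
+pipelineDF.selectExpr("text", "`class`.result as prediction").show(truncate=False)
+```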
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_imdb_padding70model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_imdb_padding70model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-n_distilbert_twitterfin_padding90model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-n_distilbert_twitterfin_padding90model_pipeline_en.md new file mode 100644 index 00000000000000..6a0fb865c32b9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-n_distilbert_twitterfin_padding90model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_distilbert_twitterfin_padding90model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_twitterfin_padding90model_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_twitterfin_padding90model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding90model_pipeline_en_5.5.0_3.0_1726593964597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding90model_pipeline_en_5.5.0_3.0_1726593964597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# build a DataFrame with a "text" column for the pipeline to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("n_distilbert_twitterfin_padding90model_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// build a DataFrame with a "text" column for the pipeline to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("n_distilbert_twitterfin_padding90model_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_twitterfin_padding90model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_twitterfin_padding90model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-nlp_herbalmultilabelclassification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-nlp_herbalmultilabelclassification_pipeline_en.md new file mode 100644 index 00000000000000..51d96d4f78abbc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-nlp_herbalmultilabelclassification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nlp_herbalmultilabelclassification_pipeline pipeline DistilBertForSequenceClassification from khygopole +author: John Snow Labs +name: nlp_herbalmultilabelclassification_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_herbalmultilabelclassification_pipeline` is a English model originally trained by khygopole. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_herbalmultilabelclassification_pipeline_en_5.5.0_3.0_1726584184084.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_herbalmultilabelclassification_pipeline_en_5.5.0_3.0_1726584184084.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# build a DataFrame with a "text" column for the pipeline to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("nlp_herbalmultilabelclassification_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// build a DataFrame with a "text" column for the pipeline to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("nlp_herbalmultilabelclassification_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_herbalmultilabelclassification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/khygopole/NLP_HerbalMultilabelClassification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-northern_sami_roberta2_en.md b/docs/_posts/ahmedlone127/2024-09-17-northern_sami_roberta2_en.md new file mode 100644 index 00000000000000..430362bdf8c239 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-northern_sami_roberta2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English northern_sami_roberta2 RoBertaForSequenceClassification from James-kc-min +author: John Snow Labs +name: northern_sami_roberta2 +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`northern_sami_roberta2` is a English model originally trained by James-kc-min. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/northern_sami_roberta2_en_5.5.0_3.0_1726573336225.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/northern_sami_roberta2_en_5.5.0_3.0_1726573336225.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("northern_sami_roberta2","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("northern_sami_roberta2", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|northern_sami_roberta2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/James-kc-min/SE_Roberta2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-openai_tiny_asr_handson_en.md b/docs/_posts/ahmedlone127/2024-09-17-openai_tiny_asr_handson_en.md new file mode 100644 index 00000000000000..7a3bf3112430f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-openai_tiny_asr_handson_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English openai_tiny_asr_handson WhisperForCTC from pknayak +author: John Snow Labs +name: openai_tiny_asr_handson +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`openai_tiny_asr_handson` is a English model originally trained by pknayak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/openai_tiny_asr_handson_en_5.5.0_3.0_1726562140169.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/openai_tiny_asr_handson_en_5.5.0_3.0_1726562140169.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("openai_tiny_asr_handson","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data is assumed to be a DataFrame with an "audio_content" column of raw audio samples
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("openai_tiny_asr_handson", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data is assumed to be a DataFrame with an "audio_content" column of raw audio samples
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
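+
+The snippets above assume a `data` DataFrame already exists. One hypothetical way to build it from decoded audio samples (a sketch only; `samples` is a placeholder for your own audio, and the "audio_content" column is assumed to hold an array of floats):
+
+```python
+# purely illustrative: one second of silence at 16 kHz
+samples = [0.0] * 16000
+data = spark.createDataFrame([(samples,)], ["audio_content"])
+```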
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|openai_tiny_asr_handson| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/pknayak/openai-tiny-asr-handson \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_azerbaijani_english_finetuned_npomo_english_10_epochs_en.md b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_azerbaijani_english_finetuned_npomo_english_10_epochs_en.md new file mode 100644 index 00000000000000..7f9c650feb25a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_azerbaijani_english_finetuned_npomo_english_10_epochs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_azerbaijani_english_finetuned_npomo_english_10_epochs MarianTransformer from UnassumingOwl +author: John Snow Labs +name: opus_maltese_azerbaijani_english_finetuned_npomo_english_10_epochs +date: 2024-09-17 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_azerbaijani_english_finetuned_npomo_english_10_epochs` is a English model originally trained by UnassumingOwl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_azerbaijani_english_finetuned_npomo_english_10_epochs_en_5.5.0_3.0_1726532810812.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_azerbaijani_english_finetuned_npomo_english_10_epochs_en_5.5.0_3.0_1726532810812.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_azerbaijani_english_finetuned_npomo_english_10_epochs","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_azerbaijani_english_finetuned_npomo_english_10_epochs","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
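+
+To read the translation back out (a minimal sketch, assuming the Python pipeline above has produced `pipelineDF`):
+
+```python
+# translated sentences are stored in the `result` field of the "translation" annotations
+pipelineDF.selectExpr("explode(translation.result) as translated").show(truncate=False)
+```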
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_azerbaijani_english_finetuned_npomo_english_10_epochs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|301.2 MB| + +## References + +https://huggingface.co/UnassumingOwl/opus-mt-az-en-finetuned-npomo-en-10-epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ejembere_en.md b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ejembere_en.md new file mode 100644 index 00000000000000..2295c63821da27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ejembere_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ejembere MarianTransformer from ejembere +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ejembere +date: 2024-09-17 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ejembere` is a English model originally trained by ejembere. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ejembere_en_5.5.0_3.0_1726598941700.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ejembere_en_5.5.0_3.0_1726598941700.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ejembere","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ejembere","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_ejembere| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.6 MB| + +## References + +https://huggingface.co/ejembere/opus-mt-en-ro-finetuned-en-to-ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_eugenegoncharov_en.md b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_eugenegoncharov_en.md new file mode 100644 index 00000000000000..6a03f8abadbddf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_eugenegoncharov_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_eugenegoncharov MarianTransformer from eugenegoncharov +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_eugenegoncharov +date: 2024-09-17 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_eugenegoncharov` is a English model originally trained by eugenegoncharov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_eugenegoncharov_en_5.5.0_3.0_1726581641585.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_eugenegoncharov_en_5.5.0_3.0_1726581641585.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_eugenegoncharov","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_eugenegoncharov","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_eugenegoncharov| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.6 MB| + +## References + +https://huggingface.co/eugenegoncharov/opus-mt-en-ro-finetuned-en-to-ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_en.md b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_en.md new file mode 100644 index 00000000000000..8236c47b240219 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1 MarianTransformer from maaaaaa1 +author: John Snow Labs +name: opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1 +date: 2024-09-17 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1` is a English model originally trained by maaaaaa1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_en_5.5.0_3.0_1726533333436.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_en_5.5.0_3.0_1726533333436.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|540.0 MB| + +## References + +https://huggingface.co/maaaaaa1/opus-mt-en-es-finetuned-en-to-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_pipeline_en.md new file mode 100644 index 00000000000000..5b0fd34dfc906d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_pipeline pipeline MarianTransformer from maaaaaa1 +author: John Snow Labs +name: opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_pipeline` is a English model originally trained by maaaaaa1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_pipeline_en_5.5.0_3.0_1726533358971.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_pipeline_en_5.5.0_3.0_1726533358971.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# build a DataFrame with a "text" column for the pipeline to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// build a DataFrame with a "text" column for the pipeline to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|540.6 MB| + +## References + +https://huggingface.co/maaaaaa1/opus-mt-en-es-finetuned-en-to-es + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_turkic_languages_english_finetuned_npomo_english_15_epochs_en.md b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_turkic_languages_english_finetuned_npomo_english_15_epochs_en.md new file mode 100644 index 00000000000000..e00573e4fd003c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_turkic_languages_english_finetuned_npomo_english_15_epochs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_turkic_languages_english_finetuned_npomo_english_15_epochs MarianTransformer from UnassumingOwl +author: John Snow Labs +name: opus_maltese_turkic_languages_english_finetuned_npomo_english_15_epochs +date: 2024-09-17 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_turkic_languages_english_finetuned_npomo_english_15_epochs` is a English model originally trained by UnassumingOwl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_turkic_languages_english_finetuned_npomo_english_15_epochs_en_5.5.0_3.0_1726594713732.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_turkic_languages_english_finetuned_npomo_english_15_epochs_en_5.5.0_3.0_1726594713732.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_turkic_languages_english_finetuned_npomo_english_15_epochs","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_turkic_languages_english_finetuned_npomo_english_15_epochs","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_turkic_languages_english_finetuned_npomo_english_15_epochs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|518.9 MB| + +## References + +https://huggingface.co/UnassumingOwl/opus-mt-trk-en-finetuned-npomo-en-15-epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-opus_wmt_finetuned_enfr_wu_2022_en.md b/docs/_posts/ahmedlone127/2024-09-17-opus_wmt_finetuned_enfr_wu_2022_en.md new file mode 100644 index 00000000000000..ffeba24647b95c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-opus_wmt_finetuned_enfr_wu_2022_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_wmt_finetuned_enfr_wu_2022 MarianTransformer from ethansimrm +author: John Snow Labs +name: opus_wmt_finetuned_enfr_wu_2022 +date: 2024-09-17 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_wmt_finetuned_enfr_wu_2022` is a English model originally trained by ethansimrm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_wmt_finetuned_enfr_wu_2022_en_5.5.0_3.0_1726532831064.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_wmt_finetuned_enfr_wu_2022_en_5.5.0_3.0_1726532831064.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_wmt_finetuned_enfr_wu_2022","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_wmt_finetuned_enfr_wu_2022","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_wmt_finetuned_enfr_wu_2022| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.4 MB| + +## References + +https://huggingface.co/ethansimrm/opus_wmt_finetuned_enfr_wu_2022 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-pft_clf_finetuned_fa.md b/docs/_posts/ahmedlone127/2024-09-17-pft_clf_finetuned_fa.md new file mode 100644 index 00000000000000..9796c0874be3c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-pft_clf_finetuned_fa.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Persian pft_clf_finetuned BertForSequenceClassification from amirhossein1376 +author: John Snow Labs +name: pft_clf_finetuned +date: 2024-09-17 +tags: [fa, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pft_clf_finetuned` is a Persian model originally trained by amirhossein1376. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pft_clf_finetuned_fa_5.5.0_3.0_1726604766806.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pft_clf_finetuned_fa_5.5.0_3.0_1726604766806.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("pft_clf_finetuned","fa") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("pft_clf_finetuned", "fa")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
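+
+To inspect the label together with its scores (a minimal sketch, assuming the Python pipeline above has produced `pipelineDF`; the exact metadata keys depend on the model's label set):
+
+```python
+from pyspark.sql.functions import explode
+
+# each "class" annotation carries the label in `result` and per-label scores in `metadata`
+pipelineDF.select(explode("class").alias("c")) \
+    .selectExpr("c.result as label", "c.metadata as scores") \
+    .show(truncate=False)
+```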
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pft_clf_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|fa| +|Size:|443.8 MB| + +## References + +https://huggingface.co/amirhossein1376/pft-clf-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-pft_clf_finetuned_pipeline_fa.md b/docs/_posts/ahmedlone127/2024-09-17-pft_clf_finetuned_pipeline_fa.md new file mode 100644 index 00000000000000..b42d64771f582e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-pft_clf_finetuned_pipeline_fa.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Persian pft_clf_finetuned_pipeline pipeline BertForSequenceClassification from amirhossein1376 +author: John Snow Labs +name: pft_clf_finetuned_pipeline +date: 2024-09-17 +tags: [fa, open_source, pipeline, onnx] +task: Text Classification +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pft_clf_finetuned_pipeline` is a Persian model originally trained by amirhossein1376. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pft_clf_finetuned_pipeline_fa_5.5.0_3.0_1726604787693.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pft_clf_finetuned_pipeline_fa_5.5.0_3.0_1726604787693.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# build a DataFrame with a "text" column for the pipeline to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("pft_clf_finetuned_pipeline", lang = "fa")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// build a DataFrame with a "text" column for the pipeline to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("pft_clf_finetuned_pipeline", lang = "fa")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pft_clf_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fa| +|Size:|443.9 MB| + +## References + +https://huggingface.co/amirhossein1376/pft-clf-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-pipeline1model3_en.md b/docs/_posts/ahmedlone127/2024-09-17-pipeline1model3_en.md new file mode 100644 index 00000000000000..ce560c7c0124d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-pipeline1model3_en.md @@ -0,0 +1,66 @@ +--- +layout: model +title: English pipeline1model3 pipeline WhisperForCTC from avery0 +author: John Snow Labs +name: pipeline1model3 +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pipeline1model3` is a English model originally trained by avery0. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pipeline1model3_en_5.5.0_3.0_1726563508115.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pipeline1model3_en_5.5.0_3.0_1726563508115.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# df is assumed to be a DataFrame with the raw-audio input column expected by this speech-recognition pipeline
+pipeline = PretrainedPipeline("pipeline1model3", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// df is assumed to be a DataFrame with the raw-audio input column expected by this speech-recognition pipeline
+val pipeline = new PretrainedPipeline("pipeline1model3", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pipeline1model3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|394.0 MB| + +## References + +https://huggingface.co/avery0/pipeline1model3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-pipeline1model3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-pipeline1model3_pipeline_en.md new file mode 100644 index 00000000000000..522f3a2edeadad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-pipeline1model3_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English pipeline1model3_pipeline pipeline WhisperForCTC from avery0 +author: John Snow Labs +name: pipeline1model3_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pipeline1model3_pipeline` is a English model originally trained by avery0. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pipeline1model3_pipeline_en_5.5.0_3.0_1726563527611.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pipeline1model3_pipeline_en_5.5.0_3.0_1726563527611.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# df is assumed to be a DataFrame with the raw-audio input column expected by this speech-recognition pipeline
+pipeline = PretrainedPipeline("pipeline1model3_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+// df is assumed to be a DataFrame with the raw-audio input column expected by this speech-recognition pipeline
+val pipeline = new PretrainedPipeline("pipeline1model3_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pipeline1model3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|394.0 MB| + +## References + +https://huggingface.co/avery0/pipeline1model3 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-psst_medium_scrambled_english_en.md b/docs/_posts/ahmedlone127/2024-09-17-psst_medium_scrambled_english_en.md new file mode 100644 index 00000000000000..54442f1bcbbdf8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-psst_medium_scrambled_english_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English psst_medium_scrambled_english WhisperForCTC from NathanRoll +author: John Snow Labs +name: psst_medium_scrambled_english +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`psst_medium_scrambled_english` is a English model originally trained by NathanRoll. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/psst_medium_scrambled_english_en_5.5.0_3.0_1726570085146.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/psst_medium_scrambled_english_en_5.5.0_3.0_1726570085146.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("psst_medium_scrambled_english","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data is assumed to be a DataFrame with an "audio_content" column of raw audio samples
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("psst_medium_scrambled_english", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data is assumed to be a DataFrame with an "audio_content" column of raw audio samples
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
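+
+To read the transcription back out (a minimal sketch, assuming the Python pipeline above has produced `pipelineDF`):
+
+```python
+# the transcription is stored in the `result` field of the "text" annotations
+pipelineDF.selectExpr("explode(text.result) as transcription").show(truncate=False)
+```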
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|psst_medium_scrambled_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/NathanRoll/psst-medium-scrambled-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-psthebert_en.md b/docs/_posts/ahmedlone127/2024-09-17-psthebert_en.md new file mode 100644 index 00000000000000..a6abd1479b5334 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-psthebert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English psthebert RoBertaForSequenceClassification from eevvgg +author: John Snow Labs +name: psthebert +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`psthebert` is a English model originally trained by eevvgg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/psthebert_en_5.5.0_3.0_1726591283485.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/psthebert_en_5.5.0_3.0_1726591283485.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
  .setInputCol('text') \
  .setOutputCol('document')

tokenizer = Tokenizer() \
  .setInputCols(['document']) \
  .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("psthebert","en") \
  .setInputCols(["document","token"]) \
  .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("psthebert", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
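
As a short, hedged usage note (not part of the generated card), the predicted label can be read from the `class` output column defined in the example above:

```python
# Each row's "class" column holds one annotation per input document;
# "class.result" extracts the predicted label strings.
pipelineDF.select("class.result").show(truncate=False)
```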
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|psthebert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.8 MB| + +## References + +https://huggingface.co/eevvgg/PsTheBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-qa_finetuned_distilbert_based_uncased_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-17-qa_finetuned_distilbert_based_uncased_pipeline_ar.md new file mode 100644 index 00000000000000..0a1a642d56cdb3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-qa_finetuned_distilbert_based_uncased_pipeline_ar.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Arabic qa_finetuned_distilbert_based_uncased_pipeline pipeline DistilBertForQuestionAnswering from gp-tar4 +author: John Snow Labs +name: qa_finetuned_distilbert_based_uncased_pipeline +date: 2024-09-17 +tags: [ar, open_source, pipeline, onnx] +task: Question Answering +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_finetuned_distilbert_based_uncased_pipeline` is a Arabic model originally trained by gp-tar4. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_finetuned_distilbert_based_uncased_pipeline_ar_5.5.0_3.0_1726586374332.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_finetuned_distilbert_based_uncased_pipeline_ar_5.5.0_3.0_1726586374332.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("qa_finetuned_distilbert_based_uncased_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("qa_finetuned_distilbert_based_uncased_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
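
The `df` above is left undefined in the generated example. A hedged way to see what the downloaded pipeline expects before building `df` is to inspect its stages (the column names are set inside the pipeline, so checking them is safer than guessing):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("qa_finetuned_distilbert_based_uncased_pipeline", lang = "ar")

# List the included stages (MultiDocumentAssembler, DistilBertForQuestionAnswering)
# to see which question/context columns the pipeline reads from the input DataFrame.
for stage in pipeline.model.stages:
    print(stage)
```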
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_finetuned_distilbert_based_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|243.8 MB| + +## References + +https://huggingface.co/gp-tar4/QA_FineTuned_DistilBert-based-uncased + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-qa_model_manikanta_goli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-qa_model_manikanta_goli_pipeline_en.md new file mode 100644 index 00000000000000..bc6523e376727d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-qa_model_manikanta_goli_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English qa_model_manikanta_goli_pipeline pipeline DistilBertForQuestionAnswering from Manikanta-goli +author: John Snow Labs +name: qa_model_manikanta_goli_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_model_manikanta_goli_pipeline` is a English model originally trained by Manikanta-goli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_model_manikanta_goli_pipeline_en_5.5.0_3.0_1726555376924.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_model_manikanta_goli_pipeline_en_5.5.0_3.0_1726555376924.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("qa_model_manikanta_goli_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("qa_model_manikanta_goli_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_model_manikanta_goli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Manikanta-goli/qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-quac_qa_bert_srddev_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-quac_qa_bert_srddev_pipeline_en.md new file mode 100644 index 00000000000000..5ce24e0c31999e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-quac_qa_bert_srddev_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English quac_qa_bert_srddev_pipeline pipeline BertForQuestionAnswering from SRDdev +author: John Snow Labs +name: quac_qa_bert_srddev_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`quac_qa_bert_srddev_pipeline` is a English model originally trained by SRDdev. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/quac_qa_bert_srddev_pipeline_en_5.5.0_3.0_1726554338784.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/quac_qa_bert_srddev_pipeline_en_5.5.0_3.0_1726554338784.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("quac_qa_bert_srddev_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("quac_qa_bert_srddev_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|quac_qa_bert_srddev_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/SRDdev/QuAC-QA-BERT + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-queryner_augmented_data_bert_base_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-queryner_augmented_data_bert_base_uncased_pipeline_en.md new file mode 100644 index 00000000000000..fea06096d990b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-queryner_augmented_data_bert_base_uncased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English queryner_augmented_data_bert_base_uncased_pipeline pipeline BertForTokenClassification from bltlab +author: John Snow Labs +name: queryner_augmented_data_bert_base_uncased_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`queryner_augmented_data_bert_base_uncased_pipeline` is a English model originally trained by bltlab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/queryner_augmented_data_bert_base_uncased_pipeline_en_5.5.0_3.0_1726597590079.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/queryner_augmented_data_bert_base_uncased_pipeline_en_5.5.0_3.0_1726597590079.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("queryner_augmented_data_bert_base_uncased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("queryner_augmented_data_bert_base_uncased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
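
Continuing the Python example above: since the pipeline's first stage is a DocumentAssembler (see Included Models below), the input is assumed here, though not stated in the card, to be a DataFrame with a plain text column. A minimal sketch:

```python
# Hypothetical input; the "text" column name is an assumption based on the usual
# DocumentAssembler configuration in these generated pipelines.
df = spark.createDataFrame([["buy nike running shoes size 10"]]).toDF("text")
annotations = pipeline.transform(df)
print(annotations.columns)  # discover the output columns added by each stage
```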
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|queryner_augmented_data_bert_base_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.3 MB| + +## References + +https://huggingface.co/bltlab/queryner-augmented-data-bert-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-question_answearing_7_distillbert_en.md b/docs/_posts/ahmedlone127/2024-09-17-question_answearing_7_distillbert_en.md new file mode 100644 index 00000000000000..06c9dcd31ff866 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-question_answearing_7_distillbert_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English question_answearing_7_distillbert DistilBertForQuestionAnswering from Meziane +author: John Snow Labs +name: question_answearing_7_distillbert +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`question_answearing_7_distillbert` is a English model originally trained by Meziane. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/question_answearing_7_distillbert_en_5.5.0_3.0_1726586688577.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/question_answearing_7_distillbert_en_5.5.0_3.0_1726586688577.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
  .setInputCols(["question", "context"]) \
  .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("question_answearing_7_distillbert","en") \
  .setInputCols(["document_question","document_context"]) \
  .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("question_answearing_7_distillbert", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
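
A brief, hedged follow-up to the example above (not part of the original card): the extracted span lands in the `answer` output column, so the prediction can be shown next to the question like this:

```python
# "answer.result" is an array with the predicted answer span for each row.
pipelineDF.select("document_question.result", "answer.result").show(truncate=False)
```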
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|question_answearing_7_distillbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Meziane/question_answearing_7_distillbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-question_answearing_7_distillbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-question_answearing_7_distillbert_pipeline_en.md new file mode 100644 index 00000000000000..3d6ce441d071ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-question_answearing_7_distillbert_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English question_answearing_7_distillbert_pipeline pipeline DistilBertForQuestionAnswering from Meziane +author: John Snow Labs +name: question_answearing_7_distillbert_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`question_answearing_7_distillbert_pipeline` is a English model originally trained by Meziane. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/question_answearing_7_distillbert_pipeline_en_5.5.0_3.0_1726586700692.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/question_answearing_7_distillbert_pipeline_en_5.5.0_3.0_1726586700692.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("question_answearing_7_distillbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("question_answearing_7_distillbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|question_answearing_7_distillbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Meziane/question_answearing_7_distillbert + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-raydox11_whisper_small_pipeline_tw.md b/docs/_posts/ahmedlone127/2024-09-17-raydox11_whisper_small_pipeline_tw.md new file mode 100644 index 00000000000000..5aba7da90acbc8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-raydox11_whisper_small_pipeline_tw.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Twi raydox11_whisper_small_pipeline pipeline WhisperForCTC from Raydox10 +author: John Snow Labs +name: raydox11_whisper_small_pipeline +date: 2024-09-17 +tags: [tw, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: tw +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`raydox11_whisper_small_pipeline` is a Twi model originally trained by Raydox10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/raydox11_whisper_small_pipeline_tw_5.5.0_3.0_1726549726228.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/raydox11_whisper_small_pipeline_tw_5.5.0_3.0_1726549726228.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("raydox11_whisper_small_pipeline", lang = "tw") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("raydox11_whisper_small_pipeline", lang = "tw") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|raydox11_whisper_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tw| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Raydox10/Raydox11-whisper-small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-raydox11_whisper_small_tw.md b/docs/_posts/ahmedlone127/2024-09-17-raydox11_whisper_small_tw.md new file mode 100644 index 00000000000000..7dd28e33a38e38 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-raydox11_whisper_small_tw.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Twi raydox11_whisper_small WhisperForCTC from Raydox10 +author: John Snow Labs +name: raydox11_whisper_small +date: 2024-09-17 +tags: [tw, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: tw +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`raydox11_whisper_small` is a Twi model originally trained by Raydox10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/raydox11_whisper_small_tw_5.5.0_3.0_1726549641728.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/raydox11_whisper_small_tw_5.5.0_3.0_1726549641728.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with an "audio_content" column of audio floats.
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("raydox11_whisper_small","tw") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
// `data` is assumed to be a DataFrame with an "audio_content" column of audio floats.
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("raydox11_whisper_small", "tw")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|raydox11_whisper_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|tw| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Raydox10/Raydox11-whisper-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-results_test_en.md b/docs/_posts/ahmedlone127/2024-09-17-results_test_en.md new file mode 100644 index 00000000000000..3c3bd4f38c2f3d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-results_test_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English results_test WhisperForCTC from RamazanGuven +author: John Snow Labs +name: results_test +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`results_test` is a English model originally trained by RamazanGuven. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/results_test_en_5.5.0_3.0_1726557733706.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/results_test_en_5.5.0_3.0_1726557733706.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with an "audio_content" column of audio floats.
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("results_test","en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
// `data` is assumed to be a DataFrame with an "audio_content" column of audio floats.
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("results_test", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|results_test| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/RamazanGuven/results_test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-robbert_2023_dutch_large_ft_lcn_actua_en.md b/docs/_posts/ahmedlone127/2024-09-17-robbert_2023_dutch_large_ft_lcn_actua_en.md new file mode 100644 index 00000000000000..644ea59da4bb0d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-robbert_2023_dutch_large_ft_lcn_actua_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robbert_2023_dutch_large_ft_lcn_actua RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: robbert_2023_dutch_large_ft_lcn_actua +date: 2024-09-17 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_2023_dutch_large_ft_lcn_actua` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_2023_dutch_large_ft_lcn_actua_en_5.5.0_3.0_1726603011194.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_2023_dutch_large_ft_lcn_actua_en_5.5.0_3.0_1726603011194.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("robbert_2023_dutch_large_ft_lcn_actua","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("robbert_2023_dutch_large_ft_lcn_actua","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
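
As a hedged usage sketch (not part of the generated card), token-level vectors can be pulled out of the `embeddings` column produced above:

```python
from pyspark.sql.functions import explode

# Each exploded annotation carries the token text in `result` and its vector in `embeddings`.
pipelineDF.select(explode("embeddings").alias("emb")) \
  .selectExpr("emb.result as token", "emb.embeddings as vector") \
  .show(truncate=80)
```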
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_2023_dutch_large_ft_lcn_actua| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/btamm12/robbert-2023-dutch-large-ft-lcn-actua \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-robbert_2023_dutch_large_ft_lcn_actua_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-robbert_2023_dutch_large_ft_lcn_actua_pipeline_en.md new file mode 100644 index 00000000000000..342eec1fbe2b84 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-robbert_2023_dutch_large_ft_lcn_actua_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robbert_2023_dutch_large_ft_lcn_actua_pipeline pipeline RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: robbert_2023_dutch_large_ft_lcn_actua_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_2023_dutch_large_ft_lcn_actua_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_2023_dutch_large_ft_lcn_actua_pipeline_en_5.5.0_3.0_1726603075725.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_2023_dutch_large_ft_lcn_actua_pipeline_en_5.5.0_3.0_1726603075725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robbert_2023_dutch_large_ft_lcn_actua_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robbert_2023_dutch_large_ft_lcn_actua_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_2023_dutch_large_ft_lcn_actua_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/btamm12/robbert-2023-dutch-large-ft-lcn-actua + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-roberta_base_description2genre_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_description2genre_pipeline_en.md new file mode 100644 index 00000000000000..ae9e7649fb1768 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_description2genre_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_description2genre_pipeline pipeline RoBertaForSequenceClassification from BEE-spoke-data +author: John Snow Labs +name: roberta_base_description2genre_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_description2genre_pipeline` is a English model originally trained by BEE-spoke-data. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_description2genre_pipeline_en_5.5.0_3.0_1726590900008.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_description2genre_pipeline_en_5.5.0_3.0_1726590900008.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_description2genre_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_description2genre_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_description2genre_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|459.0 MB| + +## References + +https://huggingface.co/BEE-spoke-data/roberta-base-description2genre + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-roberta_base_epoch_80_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_epoch_80_pipeline_en.md new file mode 100644 index 00000000000000..ac48932e427a48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_epoch_80_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_80_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_80_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_80_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_80_pipeline_en_5.5.0_3.0_1726595417651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_80_pipeline_en_5.5.0_3.0_1726595417651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_80_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_80_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_80_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_80 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-roberta_base_finetuned_4_en.md b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_finetuned_4_en.md new file mode 100644 index 00000000000000..7f40864978eeb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_finetuned_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_4 RoBertaForSequenceClassification from sara-nabhani +author: John Snow Labs +name: roberta_base_finetuned_4 +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_4` is a English model originally trained by sara-nabhani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_4_en_5.5.0_3.0_1726591580101.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_4_en_5.5.0_3.0_1726591580101.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
  .setInputCol('text') \
  .setOutputCol('document')

tokenizer = Tokenizer() \
  .setInputCols(['document']) \
  .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_4","en") \
  .setInputCols(["document","token"]) \
  .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_4", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|444.3 MB| + +## References + +https://huggingface.co/sara-nabhani/roberta-base-finetuned-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-roberta_base_finetuned_squad_ncouro_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_finetuned_squad_ncouro_pipeline_en.md new file mode 100644 index 00000000000000..a10cff8c45ab21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_finetuned_squad_ncouro_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_base_finetuned_squad_ncouro_pipeline pipeline RoBertaForQuestionAnswering from ncouro +author: John Snow Labs +name: roberta_base_finetuned_squad_ncouro_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_squad_ncouro_pipeline` is a English model originally trained by ncouro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_squad_ncouro_pipeline_en_5.5.0_3.0_1726580877275.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_squad_ncouro_pipeline_en_5.5.0_3.0_1726580877275.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_squad_ncouro_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_squad_ncouro_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_squad_ncouro_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|298.5 MB| + +## References + +https://huggingface.co/ncouro/roberta-base-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-roberta_base_finetuned_wallisian_manual_1ep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_finetuned_wallisian_manual_1ep_pipeline_en.md new file mode 100644 index 00000000000000..1deed2bad5b489 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_finetuned_wallisian_manual_1ep_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_manual_1ep_pipeline pipeline RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_manual_1ep_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_manual_1ep_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_1ep_pipeline_en_5.5.0_3.0_1726602873252.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_1ep_pipeline_en_5.5.0_3.0_1726602873252.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_wallisian_manual_1ep_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_wallisian_manual_1ep_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_manual_1ep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.0 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-manual-1ep + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-roberta_base_first_5_chars_acl2023_en.md b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_first_5_chars_acl2023_en.md new file mode 100644 index 00000000000000..45a195d4ecc6a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_first_5_chars_acl2023_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_first_5_chars_acl2023 RoBertaEmbeddings from hitachi-nlp +author: John Snow Labs +name: roberta_base_first_5_chars_acl2023 +date: 2024-09-17 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_first_5_chars_acl2023` is a English model originally trained by hitachi-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_first_5_chars_acl2023_en_5.5.0_3.0_1726602758098.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_first_5_chars_acl2023_en_5.5.0_3.0_1726602758098.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_first_5_chars_acl2023","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_first_5_chars_acl2023","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_first_5_chars_acl2023| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.3 MB| + +## References + +https://huggingface.co/hitachi-nlp/roberta-base_first-5-chars_acl2023 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-roberta_base_last_3_chars_acl2023_en.md b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_last_3_chars_acl2023_en.md new file mode 100644 index 00000000000000..4947302b804eac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_last_3_chars_acl2023_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_last_3_chars_acl2023 RoBertaEmbeddings from hitachi-nlp +author: John Snow Labs +name: roberta_base_last_3_chars_acl2023 +date: 2024-09-17 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_last_3_chars_acl2023` is a English model originally trained by hitachi-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_last_3_chars_acl2023_en_5.5.0_3.0_1726603102081.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_last_3_chars_acl2023_en_5.5.0_3.0_1726603102081.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_last_3_chars_acl2023","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_last_3_chars_acl2023","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_last_3_chars_acl2023| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.3 MB| + +## References + +https://huggingface.co/hitachi-nlp/roberta-base_last-3-chars_acl2023 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-roberta_base_last_3_chars_acl2023_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_last_3_chars_acl2023_pipeline_en.md new file mode 100644 index 00000000000000..f101bde7e7ce38 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_last_3_chars_acl2023_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_last_3_chars_acl2023_pipeline pipeline RoBertaEmbeddings from hitachi-nlp +author: John Snow Labs +name: roberta_base_last_3_chars_acl2023_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_last_3_chars_acl2023_pipeline` is a English model originally trained by hitachi-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_last_3_chars_acl2023_pipeline_en_5.5.0_3.0_1726603124524.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_last_3_chars_acl2023_pipeline_en_5.5.0_3.0_1726603124524.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_last_3_chars_acl2023_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_last_3_chars_acl2023_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_last_3_chars_acl2023_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.3 MB| + +## References + +https://huggingface.co/hitachi-nlp/roberta-base_last-3-chars_acl2023 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-roberta_base_sentiment_sst5_mapped_grouped_0_en.md b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_sentiment_sst5_mapped_grouped_0_en.md new file mode 100644 index 00000000000000..85b153c935f41f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_sentiment_sst5_mapped_grouped_0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_sentiment_sst5_mapped_grouped_0 RoBertaForSequenceClassification from kohankhaki +author: John Snow Labs +name: roberta_base_sentiment_sst5_mapped_grouped_0 +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_sentiment_sst5_mapped_grouped_0` is a English model originally trained by kohankhaki. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_sentiment_sst5_mapped_grouped_0_en_5.5.0_3.0_1726573727681.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_sentiment_sst5_mapped_grouped_0_en_5.5.0_3.0_1726573727681.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
  .setInputCol('text') \
  .setOutputCol('document')

tokenizer = Tokenizer() \
  .setInputCols(['document']) \
  .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_sentiment_sst5_mapped_grouped_0","en") \
  .setInputCols(["document","token"]) \
  .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_sentiment_sst5_mapped_grouped_0", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_sentiment_sst5_mapped_grouped_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|424.8 MB| + +## References + +https://huggingface.co/kohankhaki/roberta-base-sentiment-sst5-mapped-grouped-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-roberta_base_sentiment_sst5_mapped_grouped_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_sentiment_sst5_mapped_grouped_0_pipeline_en.md new file mode 100644 index 00000000000000..07af301279fd72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_sentiment_sst5_mapped_grouped_0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_sentiment_sst5_mapped_grouped_0_pipeline pipeline RoBertaForSequenceClassification from kohankhaki +author: John Snow Labs +name: roberta_base_sentiment_sst5_mapped_grouped_0_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_sentiment_sst5_mapped_grouped_0_pipeline` is a English model originally trained by kohankhaki. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_sentiment_sst5_mapped_grouped_0_pipeline_en_5.5.0_3.0_1726573767020.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_sentiment_sst5_mapped_grouped_0_pipeline_en_5.5.0_3.0_1726573767020.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_sentiment_sst5_mapped_grouped_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_sentiment_sst5_mapped_grouped_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_sentiment_sst5_mapped_grouped_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|424.8 MB| + +## References + +https://huggingface.co/kohankhaki/roberta-base-sentiment-sst5-mapped-grouped-0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-roberta_dependency_max_4split_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-roberta_dependency_max_4split_pipeline_en.md new file mode 100644 index 00000000000000..0cd6409c8aac9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-roberta_dependency_max_4split_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_dependency_max_4split_pipeline pipeline RoBertaEmbeddings from akari000 +author: John Snow Labs +name: roberta_dependency_max_4split_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_dependency_max_4split_pipeline` is a English model originally trained by akari000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_dependency_max_4split_pipeline_en_5.5.0_3.0_1726595138426.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_dependency_max_4split_pipeline_en_5.5.0_3.0_1726595138426.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# a small DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("roberta_dependency_max_4split_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// a small DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("roberta_dependency_max_4split_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_dependency_max_4split_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/akari000/roberta-dependency-max-4split + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-roberta_large_unlabeled_gab_reddit_semeval2023_task10_57000sample_en.md b/docs/_posts/ahmedlone127/2024-09-17-roberta_large_unlabeled_gab_reddit_semeval2023_task10_57000sample_en.md new file mode 100644 index 00000000000000..54bd8da9a0c03a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-roberta_large_unlabeled_gab_reddit_semeval2023_task10_57000sample_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_unlabeled_gab_reddit_semeval2023_task10_57000sample RoBertaEmbeddings from HPL +author: John Snow Labs +name: roberta_large_unlabeled_gab_reddit_semeval2023_task10_57000sample +date: 2024-09-17 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_unlabeled_gab_reddit_semeval2023_task10_57000sample` is a English model originally trained by HPL. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_unlabeled_gab_reddit_semeval2023_task10_57000sample_en_5.5.0_3.0_1726595196652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_unlabeled_gab_reddit_semeval2023_task10_57000sample_en_5.5.0_3.0_1726595196652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_large_unlabeled_gab_reddit_semeval2023_task10_57000sample","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_large_unlabeled_gab_reddit_semeval2023_task10_57000sample","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
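
The `embeddings` column produced above holds one annotation per token, each carrying its vector in the `embeddings` field. A minimal sketch for flattening them out of the result DataFrame (column names follow the example above):

```python
from pyspark.sql import functions as F

# one row per token: the token text and its embedding vector
token_vectors = pipelineDF \
    .select(F.explode("embeddings").alias("emb")) \
    .select(F.col("emb.result").alias("token"), F.col("emb.embeddings").alias("vector"))

token_vectors.show(5, truncate=80)
```
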
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_unlabeled_gab_reddit_semeval2023_task10_57000sample| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/HPL/roberta-large-unlabeled-gab-reddit-semeval2023-task10-57000sample \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-roberta_retrained_russian_covid_en.md b/docs/_posts/ahmedlone127/2024-09-17-roberta_retrained_russian_covid_en.md new file mode 100644 index 00000000000000..9e18c435bb9f00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-roberta_retrained_russian_covid_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_retrained_russian_covid RoBertaEmbeddings from Daryaflp +author: John Snow Labs +name: roberta_retrained_russian_covid +date: 2024-09-17 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_retrained_russian_covid` is a English model originally trained by Daryaflp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_retrained_russian_covid_en_5.5.0_3.0_1726595385287.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_retrained_russian_covid_en_5.5.0_3.0_1726595385287.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_retrained_russian_covid","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_retrained_russian_covid","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_retrained_russian_covid| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.4 MB| + +## References + +https://huggingface.co/Daryaflp/roberta-retrained_ru_covid \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1_betta_jason_en.md b/docs/_posts/ahmedlone127/2024-09-17-scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1_betta_jason_en.md new file mode 100644 index 00000000000000..704b7ff0c6112f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1_betta_jason_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1_betta_jason XlmRoBertaForSequenceClassification from haryoaw +author: John Snow Labs +name: scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1_betta_jason +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1_betta_jason` is a English model originally trained by haryoaw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1_betta_jason_en_5.5.0_3.0_1726536242324.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1_betta_jason_en_5.5.0_3.0_1726536242324.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1_betta_jason","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1_betta_jason", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
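
After `transform`, the predicted label for each input row sits in the `class` annotation column. A small sketch for reading it back out, using the column names from the example above:

```python
# "class.result" is an array with the predicted label(s) for each document
pipelineDF.select("text", "class.result").show(truncate=False)
```
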
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1_betta_jason| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|884.1 MB| + +## References + +https://huggingface.co/haryoaw/scenario-NON-KD-SCR-D2_data-AmazonScience_massive_all_1_1_betta-jason \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-sent_albert_small_kor_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-sent_albert_small_kor_v1_pipeline_en.md new file mode 100644 index 00000000000000..78f84e13ed6871 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-sent_albert_small_kor_v1_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_albert_small_kor_v1_pipeline pipeline BertSentenceEmbeddings from bongsoo +author: John Snow Labs +name: sent_albert_small_kor_v1_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_albert_small_kor_v1_pipeline` is a English model originally trained by bongsoo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_albert_small_kor_v1_pipeline_en_5.5.0_3.0_1726588032285.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_albert_small_kor_v1_pipeline_en_5.5.0_3.0_1726588032285.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# a small DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_albert_small_kor_v1_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// a small DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_albert_small_kor_v1_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_albert_small_kor_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|42.2 MB| + +## References + +https://huggingface.co/bongsoo/albert-small-kor-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-sent_dictabert_tiny_he.md b/docs/_posts/ahmedlone127/2024-09-17-sent_dictabert_tiny_he.md new file mode 100644 index 00000000000000..704563d94576b7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-sent_dictabert_tiny_he.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Hebrew sent_dictabert_tiny BertSentenceEmbeddings from dicta-il +author: John Snow Labs +name: sent_dictabert_tiny +date: 2024-09-17 +tags: [he, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_dictabert_tiny` is a Hebrew model originally trained by dicta-il. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_dictabert_tiny_he_5.5.0_3.0_1726587190963.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_dictabert_tiny_he_5.5.0_3.0_1726587190963.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_dictabert_tiny","he") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_dictabert_tiny","he") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
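
Each detected sentence receives its own embedding annotation. A short sketch, following the column names above, that lists every sentence together with the dimensionality of its vector:

```python
# one embedding annotation per detected sentence
pipelineDF.selectExpr("explode(embeddings) as emb") \
    .selectExpr("emb.result as sentence", "size(emb.embeddings) as dim") \
    .show(truncate=60)
```
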
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_dictabert_tiny| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|he| +|Size:|108.4 MB| + +## References + +https://huggingface.co/dicta-il/dictabert-tiny \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-sent_dictabert_tiny_pipeline_he.md b/docs/_posts/ahmedlone127/2024-09-17-sent_dictabert_tiny_pipeline_he.md new file mode 100644 index 00000000000000..2c50e7e01398ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-sent_dictabert_tiny_pipeline_he.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Hebrew sent_dictabert_tiny_pipeline pipeline BertSentenceEmbeddings from dicta-il +author: John Snow Labs +name: sent_dictabert_tiny_pipeline +date: 2024-09-17 +tags: [he, open_source, pipeline, onnx] +task: Embeddings +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_dictabert_tiny_pipeline` is a Hebrew model originally trained by dicta-il. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_dictabert_tiny_pipeline_he_5.5.0_3.0_1726587222694.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_dictabert_tiny_pipeline_he_5.5.0_3.0_1726587222694.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# a small DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_dictabert_tiny_pipeline", lang = "he")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// a small DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_dictabert_tiny_pipeline", lang = "he")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_dictabert_tiny_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|he| +|Size:|108.9 MB| + +## References + +https://huggingface.co/dicta-il/dictabert-tiny + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-sent_distilbert_base_uncased_finetuned_the_fire_flower_en.md b/docs/_posts/ahmedlone127/2024-09-17-sent_distilbert_base_uncased_finetuned_the_fire_flower_en.md new file mode 100644 index 00000000000000..25e60d51ce783e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-sent_distilbert_base_uncased_finetuned_the_fire_flower_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_distilbert_base_uncased_finetuned_the_fire_flower BertSentenceEmbeddings from miggwp +author: John Snow Labs +name: sent_distilbert_base_uncased_finetuned_the_fire_flower +date: 2024-09-17 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_distilbert_base_uncased_finetuned_the_fire_flower` is a English model originally trained by miggwp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_distilbert_base_uncased_finetuned_the_fire_flower_en_5.5.0_3.0_1726587000564.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_distilbert_base_uncased_finetuned_the_fire_flower_en_5.5.0_3.0_1726587000564.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_distilbert_base_uncased_finetuned_the_fire_flower","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_distilbert_base_uncased_finetuned_the_fire_flower","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_distilbert_base_uncased_finetuned_the_fire_flower| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/miggwp/distilbert-base-uncased-finetuned-the-fire-flower \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-sent_mobilebert_add_pre_training_complete_en.md b/docs/_posts/ahmedlone127/2024-09-17-sent_mobilebert_add_pre_training_complete_en.md new file mode 100644 index 00000000000000..c014c7d2108787 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-sent_mobilebert_add_pre_training_complete_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_mobilebert_add_pre_training_complete BertSentenceEmbeddings from gokuls +author: John Snow Labs +name: sent_mobilebert_add_pre_training_complete +date: 2024-09-17 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_mobilebert_add_pre_training_complete` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_mobilebert_add_pre_training_complete_en_5.5.0_3.0_1726587154490.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_mobilebert_add_pre_training_complete_en_5.5.0_3.0_1726587154490.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_mobilebert_add_pre_training_complete","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_mobilebert_add_pre_training_complete","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_mobilebert_add_pre_training_complete| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|92.5 MB| + +## References + +https://huggingface.co/gokuls/mobilebert_add_pre-training-complete \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-sent_mobilebert_add_pre_training_complete_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-sent_mobilebert_add_pre_training_complete_pipeline_en.md new file mode 100644 index 00000000000000..e12db6eed978e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-sent_mobilebert_add_pre_training_complete_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_mobilebert_add_pre_training_complete_pipeline pipeline BertSentenceEmbeddings from gokuls +author: John Snow Labs +name: sent_mobilebert_add_pre_training_complete_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_mobilebert_add_pre_training_complete_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_mobilebert_add_pre_training_complete_pipeline_en_5.5.0_3.0_1726587159573.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_mobilebert_add_pre_training_complete_pipeline_en_5.5.0_3.0_1726587159573.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# a small DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_mobilebert_add_pre_training_complete_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// a small DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_mobilebert_add_pre_training_complete_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_mobilebert_add_pre_training_complete_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|93.0 MB| + +## References + +https://huggingface.co/gokuls/mobilebert_add_pre-training-complete + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-sent_parlbert_german_law_de.md b/docs/_posts/ahmedlone127/2024-09-17-sent_parlbert_german_law_de.md new file mode 100644 index 00000000000000..d3207bebb2ea98 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-sent_parlbert_german_law_de.md @@ -0,0 +1,94 @@ +--- +layout: model +title: German sent_parlbert_german_law BertSentenceEmbeddings from InfAI +author: John Snow Labs +name: sent_parlbert_german_law +date: 2024-09-17 +tags: [de, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_parlbert_german_law` is a German model originally trained by InfAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_parlbert_german_law_de_5.5.0_3.0_1726607250426.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_parlbert_german_law_de_5.5.0_3.0_1726607250426.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_parlbert_german_law","de") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_parlbert_german_law","de") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_parlbert_german_law| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|de| +|Size:|406.8 MB| + +## References + +https://huggingface.co/InfAI/parlbert-german-law \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-simcse_finetuned_100k_128batch_en.md b/docs/_posts/ahmedlone127/2024-09-17-simcse_finetuned_100k_128batch_en.md new file mode 100644 index 00000000000000..d060565931d9ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-simcse_finetuned_100k_128batch_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English simcse_finetuned_100k_128batch RoBertaForSequenceClassification from bitsanlp +author: John Snow Labs +name: simcse_finetuned_100k_128batch +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`simcse_finetuned_100k_128batch` is a English model originally trained by bitsanlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/simcse_finetuned_100k_128batch_en_5.5.0_3.0_1726573879142.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/simcse_finetuned_100k_128batch_en_5.5.0_3.0_1726573879142.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("simcse_finetuned_100k_128batch","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("simcse_finetuned_100k_128batch", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|simcse_finetuned_100k_128batch| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|464.0 MB| + +## References + +https://huggingface.co/bitsanlp/simcse_finetuned_100k_128batch \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-small3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-small3_pipeline_en.md new file mode 100644 index 00000000000000..f866edc667ec16 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-small3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English small3_pipeline pipeline BertForTokenClassification from Narsil +author: John Snow Labs +name: small3_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`small3_pipeline` is a English model originally trained by Narsil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/small3_pipeline_en_5.5.0_3.0_1726608591222.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/small3_pipeline_en_5.5.0_3.0_1726608591222.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# a small DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("small3_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// a small DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("small3_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
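
Because this pipeline wraps a token classifier, its output contains one NER tag per token. A quick way to probe it on a single sentence, assuming the standard `annotate` helper (the exact output keys depend on how the pipeline stages are named):

```python
from sparknlp.pretrained import PretrainedPipeline

ner_pipeline = PretrainedPipeline("small3_pipeline", lang = "en")

result = ner_pipeline.annotate("John Snow Labs is based in Delaware.")
# inspect the keys first, then pair tokens with their predicted tags
print(result.keys())
print(result)
```
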
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|small3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|559.5 KB| + +## References + +https://huggingface.co/Narsil/small3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-speech_impediment_audio_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-17-speech_impediment_audio_pipeline_ko.md new file mode 100644 index 00000000000000..d2c00f6845f16f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-speech_impediment_audio_pipeline_ko.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Korean speech_impediment_audio_pipeline pipeline WhisperForCTC from yoona-J +author: John Snow Labs +name: speech_impediment_audio_pipeline +date: 2024-09-17 +tags: [ko, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`speech_impediment_audio_pipeline` is a Korean model originally trained by yoona-J. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/speech_impediment_audio_pipeline_ko_5.5.0_3.0_1726558647622.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/speech_impediment_audio_pipeline_ko_5.5.0_3.0_1726558647622.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df must contain raw audio floats (e.g. a 16 kHz waveform) in the column
# consumed by the pipeline's AudioAssembler stage
pipeline = PretrainedPipeline("speech_impediment_audio_pipeline", lang = "ko")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df must contain raw audio floats (e.g. a 16 kHz waveform) in the column
// consumed by the pipeline's AudioAssembler stage
val pipeline = new PretrainedPipeline("speech_impediment_audio_pipeline", lang = "ko")
val annotations = pipeline.transform(df)
```
</div>
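
Since this is a speech recognition pipeline, the input DataFrame has to carry raw audio rather than text. The sketch below shows one way to build such a DataFrame with librosa; the file name and the `audio_content` column name are assumptions and should be adapted to your data and to the column expected by the pipeline's AudioAssembler stage:

```python
import librosa
from sparknlp.pretrained import PretrainedPipeline

# load a mono waveform resampled to 16 kHz, the rate Whisper models expect
waveform, _ = librosa.load("sample.wav", sr=16000)
audio_df = spark.createDataFrame([[waveform.tolist()]], ["audio_content"])

pipeline = PretrainedPipeline("speech_impediment_audio_pipeline", lang = "ko")
transcripts = pipeline.transform(audio_df)
```
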
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|speech_impediment_audio_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|642.4 MB| + +## References + +https://huggingface.co/yoona-J/speech_impediment_audio + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-squad2_trained_ep4_batch16_finetuned_squad2_emrqa_msquad_en.md b/docs/_posts/ahmedlone127/2024-09-17-squad2_trained_ep4_batch16_finetuned_squad2_emrqa_msquad_en.md new file mode 100644 index 00000000000000..3d8f61fa1c9657 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-squad2_trained_ep4_batch16_finetuned_squad2_emrqa_msquad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English squad2_trained_ep4_batch16_finetuned_squad2_emrqa_msquad DistilBertForQuestionAnswering from wieheistdu +author: John Snow Labs +name: squad2_trained_ep4_batch16_finetuned_squad2_emrqa_msquad +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`squad2_trained_ep4_batch16_finetuned_squad2_emrqa_msquad` is a English model originally trained by wieheistdu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/squad2_trained_ep4_batch16_finetuned_squad2_emrqa_msquad_en_5.5.0_3.0_1726574919908.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/squad2_trained_ep4_batch16_finetuned_squad2_emrqa_msquad_en_5.5.0_3.0_1726574919908.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("squad2_trained_ep4_batch16_finetuned_squad2_emrqa_msquad","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("squad2_trained_ep4_batch16_finetuned_squad2_emrqa_msquad", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
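
The extracted answer span ends up in the `answer` annotation column. A small sketch for reading it, using the column names from the example above:

```python
# "answer.result" holds the predicted answer span for each question/context pair
pipelineDF.select("answer.result").show(truncate=False)
```
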
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|squad2_trained_ep4_batch16_finetuned_squad2_emrqa_msquad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/wieheistdu/squad2-trained-ep4-batch16-finetuned-squad2-emrQA-msquad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-squad_mbert_english_german_spanish_vietnamese_chinese_model_en.md b/docs/_posts/ahmedlone127/2024-09-17-squad_mbert_english_german_spanish_vietnamese_chinese_model_en.md new file mode 100644 index 00000000000000..f2c238486e2247 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-squad_mbert_english_german_spanish_vietnamese_chinese_model_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English squad_mbert_english_german_spanish_vietnamese_chinese_model BertForQuestionAnswering from ZYW +author: John Snow Labs +name: squad_mbert_english_german_spanish_vietnamese_chinese_model +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`squad_mbert_english_german_spanish_vietnamese_chinese_model` is a English model originally trained by ZYW. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/squad_mbert_english_german_spanish_vietnamese_chinese_model_en_5.5.0_3.0_1726567981853.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/squad_mbert_english_german_spanish_vietnamese_chinese_model_en_5.5.0_3.0_1726567981853.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("squad_mbert_english_german_spanish_vietnamese_chinese_model","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("squad_mbert_english_german_spanish_vietnamese_chinese_model", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|squad_mbert_english_german_spanish_vietnamese_chinese_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/ZYW/squad-mbert-en-de-es-vi-zh-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-stego_classifier_checkpoint_epoch_0_2024_07_26_15_06_21_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-stego_classifier_checkpoint_epoch_0_2024_07_26_15_06_21_pipeline_en.md new file mode 100644 index 00000000000000..bee12c72bf719c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-stego_classifier_checkpoint_epoch_0_2024_07_26_15_06_21_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_0_2024_07_26_15_06_21_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_0_2024_07_26_15_06_21_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_0_2024_07_26_15_06_21_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_0_2024_07_26_15_06_21_pipeline_en_5.5.0_3.0_1726584556781.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_0_2024_07_26_15_06_21_pipeline_en_5.5.0_3.0_1726584556781.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# a small DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("stego_classifier_checkpoint_epoch_0_2024_07_26_15_06_21_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// a small DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("stego_classifier_checkpoint_epoch_0_2024_07_26_15_06_21_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_0_2024_07_26_15_06_21_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-0-2024-07-26_15-06-21 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-subtopics_roberta_base2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-subtopics_roberta_base2_pipeline_en.md new file mode 100644 index 00000000000000..586c6a5b9ca68a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-subtopics_roberta_base2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English subtopics_roberta_base2_pipeline pipeline RoBertaForSequenceClassification from RogerKam +author: John Snow Labs +name: subtopics_roberta_base2_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`subtopics_roberta_base2_pipeline` is a English model originally trained by RogerKam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/subtopics_roberta_base2_pipeline_en_5.5.0_3.0_1726590919238.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/subtopics_roberta_base2_pipeline_en_5.5.0_3.0_1726590919238.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# a small DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("subtopics_roberta_base2_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// a small DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("subtopics_roberta_base2_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|subtopics_roberta_base2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|429.5 MB| + +## References + +https://huggingface.co/RogerKam/subTopics-RoBERTa-base2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-swedish_fine_tuned_whisper_model_nadiaholmlund_pipeline_sv.md b/docs/_posts/ahmedlone127/2024-09-17-swedish_fine_tuned_whisper_model_nadiaholmlund_pipeline_sv.md new file mode 100644 index 00000000000000..593573629b70b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-swedish_fine_tuned_whisper_model_nadiaholmlund_pipeline_sv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Swedish swedish_fine_tuned_whisper_model_nadiaholmlund_pipeline pipeline WhisperForCTC from NadiaHolmlund +author: John Snow Labs +name: swedish_fine_tuned_whisper_model_nadiaholmlund_pipeline +date: 2024-09-17 +tags: [sv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`swedish_fine_tuned_whisper_model_nadiaholmlund_pipeline` is a Swedish model originally trained by NadiaHolmlund. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/swedish_fine_tuned_whisper_model_nadiaholmlund_pipeline_sv_5.5.0_3.0_1726563762798.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/swedish_fine_tuned_whisper_model_nadiaholmlund_pipeline_sv_5.5.0_3.0_1726563762798.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df must contain raw audio floats (e.g. a 16 kHz waveform) in the column
# consumed by the pipeline's AudioAssembler stage
pipeline = PretrainedPipeline("swedish_fine_tuned_whisper_model_nadiaholmlund_pipeline", lang = "sv")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df must contain raw audio floats (e.g. a 16 kHz waveform) in the column
// consumed by the pipeline's AudioAssembler stage
val pipeline = new PretrainedPipeline("swedish_fine_tuned_whisper_model_nadiaholmlund_pipeline", lang = "sv")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|swedish_fine_tuned_whisper_model_nadiaholmlund_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sv| +|Size:|390.9 MB| + +## References + +https://huggingface.co/NadiaHolmlund/Swedish_Fine_Tuned_Whisper_Model + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-t2t_bible_portuguese_gun_en.md b/docs/_posts/ahmedlone127/2024-09-17-t2t_bible_portuguese_gun_en.md new file mode 100644 index 00000000000000..5125bedf045d4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-t2t_bible_portuguese_gun_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English t2t_bible_portuguese_gun MarianTransformer from tiagoblima +author: John Snow Labs +name: t2t_bible_portuguese_gun +date: 2024-09-17 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t2t_bible_portuguese_gun` is a English model originally trained by tiagoblima. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t2t_bible_portuguese_gun_en_5.5.0_3.0_1726594683898.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t2t_bible_portuguese_gun_en_5.5.0_3.0_1726594683898.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("t2t_bible_portuguese_gun","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("t2t_bible_portuguese_gun","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
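
The translated sentences come back as annotations in the `translation` column. A sketch for flattening them to plain strings, following the column names used above:

```python
from pyspark.sql import functions as F

pipelineDF \
    .select(F.explode("translation.result").alias("translated_sentence")) \
    .show(truncate=False)
```
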
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t2t_bible_portuguese_gun| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|220.6 MB| + +## References + +https://huggingface.co/tiagoblima/t2t-bible-pt-gun \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-telugu_sentiment_analysis_pipeline_te.md b/docs/_posts/ahmedlone127/2024-09-17-telugu_sentiment_analysis_pipeline_te.md new file mode 100644 index 00000000000000..3d54eec4d9e1a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-telugu_sentiment_analysis_pipeline_te.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Telugu telugu_sentiment_analysis_pipeline pipeline AlbertForSequenceClassification from aashish-249 +author: John Snow Labs +name: telugu_sentiment_analysis_pipeline +date: 2024-09-17 +tags: [te, open_source, pipeline, onnx] +task: Text Classification +language: te +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`telugu_sentiment_analysis_pipeline` is a Telugu model originally trained by aashish-249. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/telugu_sentiment_analysis_pipeline_te_5.5.0_3.0_1726605971118.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/telugu_sentiment_analysis_pipeline_te_5.5.0_3.0_1726605971118.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("telugu_sentiment_analysis_pipeline", lang = "te") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("telugu_sentiment_analysis_pipeline", lang = "te") +val annotations = pipeline.transform(df) + +``` +
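The snippet above assumes `df` already exists. A minimal sketch of building it, assuming the bundled `DocumentAssembler` reads a `text` column and the classifier writes a `class` column (both column names are assumptions based on the included models, not stated on this card):

```python
# Hypothetical input; replace the placeholder with real Telugu text.
df = spark.createDataFrame([["<Telugu text to classify>"]]).toDF("text")
annotations = pipeline.transform(df)
annotations.select("class.result").show(truncate=False)
```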
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|telugu_sentiment_analysis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|te| +|Size:|125.9 MB| + +## References + +https://huggingface.co/aashish-249/Telugu-sentiment_analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-telugu_sentiment_analysis_te.md b/docs/_posts/ahmedlone127/2024-09-17-telugu_sentiment_analysis_te.md new file mode 100644 index 00000000000000..298bd19a81ceaa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-telugu_sentiment_analysis_te.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Telugu telugu_sentiment_analysis AlbertForSequenceClassification from aashish-249 +author: John Snow Labs +name: telugu_sentiment_analysis +date: 2024-09-17 +tags: [te, open_source, onnx, sequence_classification, albert] +task: Text Classification +language: te +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`telugu_sentiment_analysis` is a Telugu model originally trained by aashish-249. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/telugu_sentiment_analysis_te_5.5.0_3.0_1726605964902.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/telugu_sentiment_analysis_te_5.5.0_3.0_1726605964902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = AlbertForSequenceClassification.pretrained("telugu_sentiment_analysis","te") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = AlbertForSequenceClassification.pretrained("telugu_sentiment_analysis", "te")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|telugu_sentiment_analysis| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|te| +|Size:|125.9 MB| + +## References + +https://huggingface.co/aashish-249/Telugu-sentiment_analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-test1_joetan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-test1_joetan_pipeline_en.md new file mode 100644 index 00000000000000..46ec6f6ddeefbe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-test1_joetan_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English test1_joetan_pipeline pipeline WhisperForCTC from JoeTan +author: John Snow Labs +name: test1_joetan_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test1_joetan_pipeline` is a English model originally trained by JoeTan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test1_joetan_pipeline_en_5.5.0_3.0_1726568991342.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test1_joetan_pipeline_en_5.5.0_3.0_1726568991342.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test1_joetan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test1_joetan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
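The snippet above assumes `df` already exists. A minimal sketch of building it, assuming the bundled `AudioAssembler` reads a float-array column named `audio_content` and the transcription lands in a `text` column (both column names are assumptions, not stated on this card):

```python
# Placeholder buffer; replace with real decoded mono audio samples (floats).
rawFloats = [0.0] * 16000
df = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
annotations = pipeline.transform(df)
annotations.select("text.result").show(truncate=False)
```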
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test1_joetan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/JoeTan/test1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-test_squad_karin25_en.md b/docs/_posts/ahmedlone127/2024-09-17-test_squad_karin25_en.md new file mode 100644 index 00000000000000..cb8119849fc9c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-test_squad_karin25_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English test_squad_karin25 DistilBertForQuestionAnswering from karin25 +author: John Snow Labs +name: test_squad_karin25 +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_squad_karin25` is a English model originally trained by karin25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_squad_karin25_en_5.5.0_3.0_1726599782967.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_squad_karin25_en_5.5.0_3.0_1726599782967.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("test_squad_karin25","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("test_squad_karin25", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_squad_karin25| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/karin25/test-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-test_squad_karin25_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-test_squad_karin25_pipeline_en.md new file mode 100644 index 00000000000000..9ba4aef45530fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-test_squad_karin25_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English test_squad_karin25_pipeline pipeline DistilBertForQuestionAnswering from karin25 +author: John Snow Labs +name: test_squad_karin25_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_squad_karin25_pipeline` is a English model originally trained by karin25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_squad_karin25_pipeline_en_5.5.0_3.0_1726599795187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_squad_karin25_pipeline_en_5.5.0_3.0_1726599795187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_squad_karin25_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_squad_karin25_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
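The snippet above assumes `df` already carries the question/context pairs. A minimal sketch, assuming the bundled `MultiDocumentAssembler` reads `question` and `context` columns and the span classifier writes an `answer` column (these column names are assumptions based on the included models):

```python
# Hypothetical question/context pair for illustration.
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")
annotations = pipeline.transform(df)
annotations.select("answer.result").show(truncate=False)
```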
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_squad_karin25_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/karin25/test-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-tinymax_en.md b/docs/_posts/ahmedlone127/2024-09-17-tinymax_en.md new file mode 100644 index 00000000000000..b9a15ff9f48d22 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-tinymax_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English tinymax WhisperForCTC from tabsadem +author: John Snow Labs +name: tinymax +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tinymax` is a English model originally trained by tabsadem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tinymax_en_5.5.0_3.0_1726549971269.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tinymax_en_5.5.0_3.0_1726549971269.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("tinymax","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("tinymax", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
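The example above never defines `data`. A minimal sketch of preparing it, using the `audio_content` column expected by the `AudioAssembler` above; the silent one-second buffer is only a placeholder for real decoded mono samples:

```python
rawFloats = [0.0] * 16000  # placeholder; substitute the decoded audio samples as floats
data = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```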
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tinymax| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|392.0 MB| + +## References + +https://huggingface.co/tabsadem/tinyMAX \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-translator_italian_english_en.md b/docs/_posts/ahmedlone127/2024-09-17-translator_italian_english_en.md new file mode 100644 index 00000000000000..5e578e277b69fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-translator_italian_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English translator_italian_english MarianTransformer from zaneas +author: John Snow Labs +name: translator_italian_english +date: 2024-09-17 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`translator_italian_english` is a English model originally trained by zaneas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/translator_italian_english_en_5.5.0_3.0_1726582222863.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/translator_italian_english_en_5.5.0_3.0_1726582222863.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("translator_italian_english","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("translator_italian_english","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|translator_italian_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|625.0 MB| + +## References + +https://huggingface.co/zaneas/translator_IT_EN \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-twitter_data_xlm_roberta_base_eng_only_sentiment_finetuned_memes_en.md b/docs/_posts/ahmedlone127/2024-09-17-twitter_data_xlm_roberta_base_eng_only_sentiment_finetuned_memes_en.md new file mode 100644 index 00000000000000..075abc7460b8c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-twitter_data_xlm_roberta_base_eng_only_sentiment_finetuned_memes_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twitter_data_xlm_roberta_base_eng_only_sentiment_finetuned_memes XlmRoBertaForSequenceClassification from jayantapaul888 +author: John Snow Labs +name: twitter_data_xlm_roberta_base_eng_only_sentiment_finetuned_memes +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_data_xlm_roberta_base_eng_only_sentiment_finetuned_memes` is a English model originally trained by jayantapaul888. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_data_xlm_roberta_base_eng_only_sentiment_finetuned_memes_en_5.5.0_3.0_1726535666078.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_data_xlm_roberta_base_eng_only_sentiment_finetuned_memes_en_5.5.0_3.0_1726535666078.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("twitter_data_xlm_roberta_base_eng_only_sentiment_finetuned_memes","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("twitter_data_xlm_roberta_base_eng_only_sentiment_finetuned_memes", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_data_xlm_roberta_base_eng_only_sentiment_finetuned_memes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|807.4 MB| + +## References + +https://huggingface.co/jayantapaul888/twitter-data-xlm-roberta-base-eng-only-sentiment-finetuned-memes \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_bangla_bn.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_bangla_bn.md new file mode 100644 index 00000000000000..7e6a4026bbba93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_bangla_bn.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Bengali whisper_bangla WhisperForCTC from asif00 +author: John Snow Labs +name: whisper_bangla +date: 2024-09-17 +tags: [bn, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_bangla` is a Bengali model originally trained by asif00. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_bangla_bn_5.5.0_3.0_1726568435657.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_bangla_bn_5.5.0_3.0_1726568435657.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_bangla","bn") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_bangla", "bn")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
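The example above never defines `data`. A minimal sketch of preparing it, using the `audio_content` column expected by the `AudioAssembler` above; the silent one-second buffer is only a placeholder for real decoded mono samples:

```python
rawFloats = [0.0] * 16000  # placeholder; substitute the decoded audio samples as floats
data = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```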
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_bangla| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|bn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/asif00/whisper-bangla \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_base_catalan_ca.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_catalan_ca.md new file mode 100644 index 00000000000000..687435d6b4e7ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_catalan_ca.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Catalan, Valencian whisper_base_catalan WhisperForCTC from zuazo +author: John Snow Labs +name: whisper_base_catalan +date: 2024-09-17 +tags: [ca, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ca +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_catalan` is a Catalan, Valencian model originally trained by zuazo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_catalan_ca_5.5.0_3.0_1726542749093.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_catalan_ca_5.5.0_3.0_1726542749093.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_base_catalan","ca") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_base_catalan", "ca")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
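`data` is not defined in the example above. A minimal sketch of preparing it with the `audio_content` column used by the `AudioAssembler`; the silent buffer is a placeholder for real decoded mono samples:

```python
rawFloats = [0.0] * 16000  # placeholder; substitute the decoded audio samples as floats
data = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```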
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_catalan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ca| +|Size:|642.7 MB| + +## References + +https://huggingface.co/zuazo/whisper-base-ca \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_base_chinese_cer_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_chinese_cer_pipeline_zh.md new file mode 100644 index 00000000000000..db894bf04781d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_chinese_cer_pipeline_zh.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Chinese whisper_base_chinese_cer_pipeline pipeline WhisperForCTC from HuangJordan +author: John Snow Labs +name: whisper_base_chinese_cer_pipeline +date: 2024-09-17 +tags: [zh, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_chinese_cer_pipeline` is a Chinese model originally trained by HuangJordan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_chinese_cer_pipeline_zh_5.5.0_3.0_1726551311289.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_chinese_cer_pipeline_zh_5.5.0_3.0_1726551311289.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_chinese_cer_pipeline", lang = "zh") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_chinese_cer_pipeline", lang = "zh") +val annotations = pipeline.transform(df) + +``` +
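The snippet above assumes `df` already exists. A minimal sketch of building it, assuming the bundled `AudioAssembler` reads a float-array column named `audio_content` and the transcription lands in a `text` column (both column names are assumptions, not stated on this card):

```python
# Placeholder buffer; replace with real decoded mono audio samples (floats).
rawFloats = [0.0] * 16000
df = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
annotations = pipeline.transform(df)
annotations.select("text.result").show(truncate=False)
```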
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_chinese_cer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|642.1 MB| + +## References + +https://huggingface.co/HuangJordan/whisper-base-chinese-cer + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_base_encod_vietmed_pipeline_vi.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_encod_vietmed_pipeline_vi.md new file mode 100644 index 00000000000000..febae1fb4cebad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_encod_vietmed_pipeline_vi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Vietnamese whisper_base_encod_vietmed_pipeline pipeline WhisperForCTC from Hanhpt23 +author: John Snow Labs +name: whisper_base_encod_vietmed_pipeline +date: 2024-09-17 +tags: [vi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: vi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_encod_vietmed_pipeline` is a Vietnamese model originally trained by Hanhpt23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_encod_vietmed_pipeline_vi_5.5.0_3.0_1726547805888.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_encod_vietmed_pipeline_vi_5.5.0_3.0_1726547805888.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_encod_vietmed_pipeline", lang = "vi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_encod_vietmed_pipeline", lang = "vi") +val annotations = pipeline.transform(df) + +``` +
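The snippet above assumes `df` already exists. A minimal sketch of building it, assuming the bundled `AudioAssembler` reads a float-array column named `audio_content` and the transcription lands in a `text` column (both column names are assumptions, not stated on this card):

```python
# Placeholder buffer; replace with real decoded mono audio samples (floats).
rawFloats = [0.0] * 16000
df = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
annotations = pipeline.transform(df)
annotations.select("text.result").show(truncate=False)
```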
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_encod_vietmed_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|vi| +|Size:|641.7 MB| + +## References + +https://huggingface.co/Hanhpt23/whisper-base-Encod-vietmed + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_base_full_data_aug_v1_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_full_data_aug_v1_en.md new file mode 100644 index 00000000000000..c7cec547329b93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_full_data_aug_v1_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_full_data_aug_v1 WhisperForCTC from thanhduycao +author: John Snow Labs +name: whisper_base_full_data_aug_v1 +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_full_data_aug_v1` is a English model originally trained by thanhduycao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_full_data_aug_v1_en_5.5.0_3.0_1726538947021.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_full_data_aug_v1_en_5.5.0_3.0_1726538947021.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_base_full_data_aug_v1","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_base_full_data_aug_v1", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
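The example above never defines `data`. A minimal sketch of preparing it, using the `audio_content` column expected by the `AudioAssembler` above; the silent buffer is a placeholder for real decoded mono samples:

```python
rawFloats = [0.0] * 16000  # placeholder; substitute the decoded audio samples as floats
data = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```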
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_full_data_aug_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|642.6 MB| + +## References + +https://huggingface.co/thanhduycao/whisper-base-full-data-aug-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_base_phon_drive_dataset_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_phon_drive_dataset_large_pipeline_en.md new file mode 100644 index 00000000000000..ad8f43ab3a0cc7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_phon_drive_dataset_large_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_phon_drive_dataset_large_pipeline pipeline WhisperForCTC from dg96 +author: John Snow Labs +name: whisper_base_phon_drive_dataset_large_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_phon_drive_dataset_large_pipeline` is a English model originally trained by dg96. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_phon_drive_dataset_large_pipeline_en_5.5.0_3.0_1726568333264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_phon_drive_dataset_large_pipeline_en_5.5.0_3.0_1726568333264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_phon_drive_dataset_large_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_phon_drive_dataset_large_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
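The snippet above assumes `df` already exists. A minimal sketch of building it, assuming the bundled `AudioAssembler` reads a float-array column named `audio_content` and the transcription lands in a `text` column (both column names are assumptions, not stated on this card):

```python
# Placeholder buffer; replace with real decoded mono audio samples (floats).
rawFloats = [0.0] * 16000
df = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
annotations = pipeline.transform(df)
annotations.select("text.result").show(truncate=False)
```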
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_phon_drive_dataset_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|643.6 MB| + +## References + +https://huggingface.co/dg96/whisper-base-phon-drive-dataset-large + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_base_thai_project_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_thai_project_3_pipeline_en.md new file mode 100644 index 00000000000000..c6cc196834bfe1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_thai_project_3_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_thai_project_3_pipeline pipeline WhisperForCTC from Varit +author: John Snow Labs +name: whisper_base_thai_project_3_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_thai_project_3_pipeline` is a English model originally trained by Varit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_thai_project_3_pipeline_en_5.5.0_3.0_1726542197973.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_thai_project_3_pipeline_en_5.5.0_3.0_1726542197973.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_thai_project_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_thai_project_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
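The snippet above assumes `df` already exists. A minimal sketch of building it, assuming the bundled `AudioAssembler` reads a float-array column named `audio_content` and the transcription lands in a `text` column (both column names are assumptions, not stated on this card):

```python
# Placeholder buffer; replace with real decoded mono audio samples (floats).
rawFloats = [0.0] * 16000
df = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
annotations = pipeline.transform(df)
annotations.select("text.result").show(truncate=False)
```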
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_thai_project_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|641.9 MB| + +## References + +https://huggingface.co/Varit/whisper-base-th-project-3 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_base_vtlustos_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_vtlustos_pipeline_en.md new file mode 100644 index 00000000000000..11fffcb2e19308 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_vtlustos_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_vtlustos_pipeline pipeline WhisperForCTC from vtlustos +author: John Snow Labs +name: whisper_base_vtlustos_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_vtlustos_pipeline` is a English model originally trained by vtlustos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_vtlustos_pipeline_en_5.5.0_3.0_1726568333577.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_vtlustos_pipeline_en_5.5.0_3.0_1726568333577.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_vtlustos_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_vtlustos_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
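The snippet above assumes `df` already exists. A minimal sketch of building it, assuming the bundled `AudioAssembler` reads a float-array column named `audio_content` and the transcription lands in a `text` column (both column names are assumptions, not stated on this card):

```python
# Placeholder buffer; replace with real decoded mono audio samples (floats).
rawFloats = [0.0] * 16000
df = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
annotations = pipeline.transform(df)
annotations.select("text.result").show(truncate=False)
```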
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_vtlustos_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|643.7 MB| + +## References + +https://huggingface.co/vtlustos/whisper-base + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_italian_whispy_it.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_italian_whispy_it.md new file mode 100644 index 00000000000000..2ff1118cfdefd5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_italian_whispy_it.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Italian whisper_italian_whispy WhisperForCTC from whispy +author: John Snow Labs +name: whisper_italian_whispy +date: 2024-09-17 +tags: [it, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_italian_whispy` is a Italian model originally trained by whispy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_italian_whispy_it_5.5.0_3.0_1726543277944.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_italian_whispy_it_5.5.0_3.0_1726543277944.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_italian_whispy","it") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_italian_whispy", "it")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
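The example above never defines `data`. A minimal sketch of preparing it, using the `audio_content` column expected by the `AudioAssembler` above; the silent buffer is a placeholder for real decoded mono samples:

```python
rawFloats = [0.0] * 16000  # placeholder; substitute the decoded audio samples as floats
data = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```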
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_italian_whispy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|it| +|Size:|1.7 GB| + +## References + +https://huggingface.co/whispy/whisper_italian \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_italian_whispy_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_italian_whispy_pipeline_it.md new file mode 100644 index 00000000000000..8060d9c8bc3305 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_italian_whispy_pipeline_it.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Italian whisper_italian_whispy_pipeline pipeline WhisperForCTC from whispy +author: John Snow Labs +name: whisper_italian_whispy_pipeline +date: 2024-09-17 +tags: [it, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_italian_whispy_pipeline` is a Italian model originally trained by whispy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_italian_whispy_pipeline_it_5.5.0_3.0_1726543359182.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_italian_whispy_pipeline_it_5.5.0_3.0_1726543359182.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_italian_whispy_pipeline", lang = "it") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_italian_whispy_pipeline", lang = "it") +val annotations = pipeline.transform(df) + +``` +
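The snippet above assumes `df` already exists. A minimal sketch of building it, assuming the bundled `AudioAssembler` reads a float-array column named `audio_content` and the transcription lands in a `text` column (both column names are assumptions, not stated on this card):

```python
# Placeholder buffer; replace with real decoded mono audio samples (floats).
rawFloats = [0.0] * 16000
df = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
annotations = pipeline.transform(df)
annotations.select("text.result").show(truncate=False)
```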
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_italian_whispy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|1.7 GB| + +## References + +https://huggingface.co/whispy/whisper_italian + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_arabic_bouim_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_arabic_bouim_en.md new file mode 100644 index 00000000000000..69288748238607 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_arabic_bouim_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_arabic_bouim WhisperForCTC from bouim +author: John Snow Labs +name: whisper_small_arabic_bouim +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_bouim` is a English model originally trained by bouim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_bouim_en_5.5.0_3.0_1726572080301.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_bouim_en_5.5.0_3.0_1726572080301.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_arabic_bouim","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_arabic_bouim", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
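The example above never defines `data`. A minimal sketch of preparing it, using the `audio_content` column expected by the `AudioAssembler` above; the silent buffer is a placeholder for real decoded mono samples:

```python
rawFloats = [0.0] * 16000  # placeholder; substitute the decoded audio samples as floats
data = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```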
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_bouim| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/bouim/whisper-small-ar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_arabic_bouim_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_arabic_bouim_pipeline_en.md new file mode 100644 index 00000000000000..14719764095a28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_arabic_bouim_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_arabic_bouim_pipeline pipeline WhisperForCTC from bouim +author: John Snow Labs +name: whisper_small_arabic_bouim_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_bouim_pipeline` is a English model originally trained by bouim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_bouim_pipeline_en_5.5.0_3.0_1726572160274.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_bouim_pipeline_en_5.5.0_3.0_1726572160274.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_arabic_bouim_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_arabic_bouim_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
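The snippet above assumes `df` already exists. A minimal sketch of building it, assuming the bundled `AudioAssembler` reads a float-array column named `audio_content` and the transcription lands in a `text` column (both column names are assumptions, not stated on this card):

```python
# Placeholder buffer; replace with real decoded mono audio samples (floats).
rawFloats = [0.0] * 16000
df = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
annotations = pipeline.transform(df)
annotations.select("text.result").show(truncate=False)
```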
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_bouim_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/bouim/whisper-small-ar + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_bengali_crblp_bn.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_bengali_crblp_bn.md new file mode 100644 index 00000000000000..ee038953177330 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_bengali_crblp_bn.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Bengali whisper_small_bengali_crblp WhisperForCTC from Rakib +author: John Snow Labs +name: whisper_small_bengali_crblp +date: 2024-09-17 +tags: [bn, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_bengali_crblp` is a Bengali model originally trained by Rakib. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_bengali_crblp_bn_5.5.0_3.0_1726540813304.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_bengali_crblp_bn_5.5.0_3.0_1726540813304.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_bengali_crblp","bn") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_bengali_crblp", "bn")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
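The example above never defines `data`. A minimal sketch of preparing it, using the `audio_content` column expected by the `AudioAssembler` above; the silent buffer is a placeholder for real decoded mono samples:

```python
rawFloats = [0.0] * 16000  # placeholder; substitute the decoded audio samples as floats
data = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```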
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_bengali_crblp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|bn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Rakib/whisper-small-bn-crblp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_cantonese_chandc_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_cantonese_chandc_pipeline_zh.md new file mode 100644 index 00000000000000..9aab98dc93475d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_cantonese_chandc_pipeline_zh.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Chinese whisper_small_cantonese_chandc_pipeline pipeline WhisperForCTC from chandc +author: John Snow Labs +name: whisper_small_cantonese_chandc_pipeline +date: 2024-09-17 +tags: [zh, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_cantonese_chandc_pipeline` is a Chinese model originally trained by chandc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_cantonese_chandc_pipeline_zh_5.5.0_3.0_1726556691031.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_cantonese_chandc_pipeline_zh_5.5.0_3.0_1726556691031.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_cantonese_chandc_pipeline", lang = "zh") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_cantonese_chandc_pipeline", lang = "zh") +val annotations = pipeline.transform(df) + +``` +
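The snippet above assumes `df` already exists. A minimal sketch of building it, assuming the bundled `AudioAssembler` reads a float-array column named `audio_content` and the transcription lands in a `text` column (both column names are assumptions, not stated on this card):

```python
# Placeholder buffer; replace with real decoded mono audio samples (floats).
rawFloats = [0.0] * 16000
df = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
annotations = pipeline.transform(df)
annotations.select("text.result").show(truncate=False)
```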
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_cantonese_chandc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|1.7 GB| + +## References + +https://huggingface.co/chandc/whisper-small-Cantonese + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_galician_zuazo_pipeline_gl.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_galician_zuazo_pipeline_gl.md new file mode 100644 index 00000000000000..41a670a56b8c73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_galician_zuazo_pipeline_gl.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Galician whisper_small_galician_zuazo_pipeline pipeline WhisperForCTC from zuazo +author: John Snow Labs +name: whisper_small_galician_zuazo_pipeline +date: 2024-09-17 +tags: [gl, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: gl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_galician_zuazo_pipeline` is a Galician model originally trained by zuazo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_galician_zuazo_pipeline_gl_5.5.0_3.0_1726547900115.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_galician_zuazo_pipeline_gl_5.5.0_3.0_1726547900115.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_galician_zuazo_pipeline", lang = "gl") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_galician_zuazo_pipeline", lang = "gl") +val annotations = pipeline.transform(df) + +``` +
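The snippet above assumes `df` already exists. A minimal sketch of building it, assuming the bundled `AudioAssembler` reads a float-array column named `audio_content` and the transcription lands in a `text` column (both column names are assumptions, not stated on this card):

```python
# Placeholder buffer; replace with real decoded mono audio samples (floats).
rawFloats = [0.0] * 16000
df = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
annotations = pipeline.transform(df)
annotations.select("text.result").show(truncate=False)
```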
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_galician_zuazo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|gl| +|Size:|1.7 GB| + +## References + +https://huggingface.co/zuazo/whisper-small-gl + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hindi_agershun_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hindi_agershun_pipeline_en.md new file mode 100644 index 00000000000000..56d956e6678453 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hindi_agershun_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_hindi_agershun_pipeline pipeline WhisperForCTC from agershun +author: John Snow Labs +name: whisper_small_hindi_agershun_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_agershun_pipeline` is a English model originally trained by agershun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_agershun_pipeline_en_5.5.0_3.0_1726550573212.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_agershun_pipeline_en_5.5.0_3.0_1726550573212.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to be a DataFrame with an "audio_content" column of raw audio floats
pipeline = PretrainedPipeline("whisper_small_hindi_agershun_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to be a DataFrame with an "audio_content" column of raw audio floats
val pipeline = new PretrainedPipeline("whisper_small_hindi_agershun_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_agershun_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/agershun/whisper-small-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hindi_hsyyy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hindi_hsyyy_pipeline_en.md new file mode 100644 index 00000000000000..9a992fc1a0e946 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hindi_hsyyy_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_hindi_hsyyy_pipeline pipeline WhisperForCTC from hsyyy +author: John Snow Labs +name: whisper_small_hindi_hsyyy_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_hsyyy_pipeline` is a English model originally trained by hsyyy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_hsyyy_pipeline_en_5.5.0_3.0_1726562067767.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_hsyyy_pipeline_en_5.5.0_3.0_1726562067767.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to be a DataFrame with an "audio_content" column of raw audio floats
pipeline = PretrainedPipeline("whisper_small_hindi_hsyyy_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to be a DataFrame with an "audio_content" column of raw audio floats
val pipeline = new PretrainedPipeline("whisper_small_hindi_hsyyy_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_hsyyy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/hsyyy/whisper-small-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hindi_tjohanne_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hindi_tjohanne_en.md new file mode 100644 index 00000000000000..c73770f9dc4f37 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hindi_tjohanne_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_hindi_tjohanne WhisperForCTC from tjohanne +author: John Snow Labs +name: whisper_small_hindi_tjohanne +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_tjohanne` is a English model originally trained by tjohanne. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_tjohanne_en_5.5.0_3.0_1726550020728.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_tjohanne_en_5.5.0_3.0_1726550020728.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is expected to be a DataFrame with an "audio_content" column of raw audio floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_hindi_tjohanne","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// "data" is expected to be a DataFrame with an "audio_content" column of raw audio floats
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_tjohanne", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
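Once the pipeline has run, the transcription is stored in the `text` output column as Spark NLP annotations. A short, hedged example of extracting just the recognized strings:

```python
# Each annotation's "result" field holds the transcribed text.
pipelineDF.select("text.result").show(truncate=False)
```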
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_tjohanne| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/tjohanne/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hre2_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hre2_1_pipeline_en.md new file mode 100644 index 00000000000000..d901a3fa2538a1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_hre2_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_hre2_1_pipeline pipeline WhisperForCTC from ntviet +author: John Snow Labs +name: whisper_small_hre2_1_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hre2_1_pipeline` is a English model originally trained by ntviet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hre2_1_pipeline_en_5.5.0_3.0_1726558332418.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hre2_1_pipeline_en_5.5.0_3.0_1726558332418.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to be a DataFrame with an "audio_content" column of raw audio floats
pipeline = PretrainedPipeline("whisper_small_hre2_1_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to be a DataFrame with an "audio_content" column of raw audio floats
val pipeline = new PretrainedPipeline("whisper_small_hre2_1_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hre2_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ntviet/whisper-small-hre2.1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_katti_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_katti_en.md new file mode 100644 index 00000000000000..6c5f00d3d10aa7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_katti_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_katti WhisperForCTC from shreyasdesaisuperU +author: John Snow Labs +name: whisper_small_katti +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_katti` is a English model originally trained by shreyasdesaisuperU. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_katti_en_5.5.0_3.0_1726541232967.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_katti_en_5.5.0_3.0_1726541232967.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is expected to be a DataFrame with an "audio_content" column of raw audio floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_katti","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// "data" is expected to be a DataFrame with an "audio_content" column of raw audio floats
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_katti", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_katti| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/shreyasdesaisuperU/whisper-small-katti \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_katti_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_katti_pipeline_en.md new file mode 100644 index 00000000000000..c92d868f4263ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_katti_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_katti_pipeline pipeline WhisperForCTC from shreyasdesaisuperU +author: John Snow Labs +name: whisper_small_katti_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_katti_pipeline` is a English model originally trained by shreyasdesaisuperU. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_katti_pipeline_en_5.5.0_3.0_1726541331250.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_katti_pipeline_en_5.5.0_3.0_1726541331250.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to be a DataFrame with an "audio_content" column of raw audio floats
pipeline = PretrainedPipeline("whisper_small_katti_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to be a DataFrame with an "audio_content" column of raw audio floats
val pipeline = new PretrainedPipeline("whisper_small_katti_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_katti_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/shreyasdesaisuperU/whisper-small-katti + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_mongolian_7_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_mongolian_7_en.md new file mode 100644 index 00000000000000..a838cfbc4948f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_mongolian_7_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_mongolian_7 WhisperForCTC from bayartsogt +author: John Snow Labs +name: whisper_small_mongolian_7 +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_mongolian_7` is a English model originally trained by bayartsogt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_mongolian_7_en_5.5.0_3.0_1726571463804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_mongolian_7_en_5.5.0_3.0_1726571463804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is expected to be a DataFrame with an "audio_content" column of raw audio floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_mongolian_7","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// "data" is expected to be a DataFrame with an "audio_content" column of raw audio floats
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_mongolian_7", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_mongolian_7| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/bayartsogt/whisper-small-mn-7 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_swedish_v4_pipeline_sv.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_swedish_v4_pipeline_sv.md new file mode 100644 index 00000000000000..4196c28cb0c31b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_swedish_v4_pipeline_sv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Swedish whisper_small_swedish_v4_pipeline pipeline WhisperForCTC from AdrianHR +author: John Snow Labs +name: whisper_small_swedish_v4_pipeline +date: 2024-09-17 +tags: [sv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_swedish_v4_pipeline` is a Swedish model originally trained by AdrianHR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_swedish_v4_pipeline_sv_5.5.0_3.0_1726547719212.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_swedish_v4_pipeline_sv_5.5.0_3.0_1726547719212.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to be a DataFrame with an "audio_content" column of raw audio floats
pipeline = PretrainedPipeline("whisper_small_swedish_v4_pipeline", lang = "sv")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to be a DataFrame with an "audio_content" column of raw audio floats
val pipeline = new PretrainedPipeline("whisper_small_swedish_v4_pipeline", lang = "sv")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_swedish_v4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/AdrianHR/whisper-small-sv-v4 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_turkish_cp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_turkish_cp_pipeline_en.md new file mode 100644 index 00000000000000..f4a7c216b91a6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_turkish_cp_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_turkish_cp_pipeline pipeline WhisperForCTC from Kiwipirate +author: John Snow Labs +name: whisper_small_turkish_cp_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_turkish_cp_pipeline` is a English model originally trained by Kiwipirate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_cp_pipeline_en_5.5.0_3.0_1726565133277.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_cp_pipeline_en_5.5.0_3.0_1726565133277.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to be a DataFrame with an "audio_content" column of raw audio floats
pipeline = PretrainedPipeline("whisper_small_turkish_cp_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to be a DataFrame with an "audio_content" column of raw audio floats
val pipeline = new PretrainedPipeline("whisper_small_turkish_cp_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_turkish_cp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Kiwipirate/whisper-small-tr-cp + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_turkish_muhtasham_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_turkish_muhtasham_pipeline_tr.md new file mode 100644 index 00000000000000..02be842031c644 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_turkish_muhtasham_pipeline_tr.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Turkish whisper_small_turkish_muhtasham_pipeline pipeline WhisperForCTC from muhtasham +author: John Snow Labs +name: whisper_small_turkish_muhtasham_pipeline +date: 2024-09-17 +tags: [tr, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_turkish_muhtasham_pipeline` is a Turkish model originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_muhtasham_pipeline_tr_5.5.0_3.0_1726569287347.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_muhtasham_pipeline_tr_5.5.0_3.0_1726569287347.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to be a DataFrame with an "audio_content" column of raw audio floats
pipeline = PretrainedPipeline("whisper_small_turkish_muhtasham_pipeline", lang = "tr")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to be a DataFrame with an "audio_content" column of raw audio floats
val pipeline = new PretrainedPipeline("whisper_small_turkish_muhtasham_pipeline", lang = "tr")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_turkish_muhtasham_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|1.7 GB| + +## References + +https://huggingface.co/muhtasham/whisper-small-tr + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_vietmed_free_e3_11_pipeline_vi.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_vietmed_free_e3_11_pipeline_vi.md new file mode 100644 index 00000000000000..6f2c074894baf3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_vietmed_free_e3_11_pipeline_vi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Vietnamese whisper_small_vietmed_free_e3_11_pipeline pipeline WhisperForCTC from Hanhpt23 +author: John Snow Labs +name: whisper_small_vietmed_free_e3_11_pipeline +date: 2024-09-17 +tags: [vi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: vi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_vietmed_free_e3_11_pipeline` is a Vietnamese model originally trained by Hanhpt23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_vietmed_free_e3_11_pipeline_vi_5.5.0_3.0_1726565568518.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_vietmed_free_e3_11_pipeline_vi_5.5.0_3.0_1726565568518.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to be a DataFrame with an "audio_content" column of raw audio floats
pipeline = PretrainedPipeline("whisper_small_vietmed_free_e3_11_pipeline", lang = "vi")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to be a DataFrame with an "audio_content" column of raw audio floats
val pipeline = new PretrainedPipeline("whisper_small_vietmed_free_e3_11_pipeline", lang = "vi")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_vietmed_free_e3_11_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|vi| +|Size:|1.6 GB| + +## References + +https://huggingface.co/Hanhpt23/whisper-small-vietmed-free_E3-11 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_xhosa_za_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_xhosa_za_pipeline_en.md new file mode 100644 index 00000000000000..64413d8ab68f6b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_xhosa_za_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_xhosa_za_pipeline pipeline WhisperForCTC from julie200 +author: John Snow Labs +name: whisper_small_xhosa_za_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_xhosa_za_pipeline` is a English model originally trained by julie200. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_xhosa_za_pipeline_en_5.5.0_3.0_1726550925273.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_xhosa_za_pipeline_en_5.5.0_3.0_1726550925273.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to be a DataFrame with an "audio_content" column of raw audio floats
pipeline = PretrainedPipeline("whisper_small_xhosa_za_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to be a DataFrame with an "audio_content" column of raw audio floats
val pipeline = new PretrainedPipeline("whisper_small_xhosa_za_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_xhosa_za_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/julie200/whisper-small-xh_za + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_engmed_v2_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_engmed_v2_en.md new file mode 100644 index 00000000000000..2f80cde9442c0f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_engmed_v2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_engmed_v2 WhisperForCTC from Hanhpt23 +author: John Snow Labs +name: whisper_tiny_engmed_v2 +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_engmed_v2` is a English model originally trained by Hanhpt23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_engmed_v2_en_5.5.0_3.0_1726547743868.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_engmed_v2_en_5.5.0_3.0_1726547743868.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is expected to be a DataFrame with an "audio_content" column of raw audio floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_engmed_v2","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// "data" is expected to be a DataFrame with an "audio_content" column of raw audio floats
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_engmed_v2", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
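Fitting the pipeline above triggers the model download, so it can be convenient to persist the fitted `PipelineModel` once and reload it later instead of refitting. A minimal sketch using the standard Spark ML persistence API; the save path is an assumption for illustration:

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline (including the downloaded Whisper weights) and reload it.
pipelineModel.write().overwrite().save("/tmp/whisper_tiny_engmed_v2_pipeline")
reloaded = PipelineModel.load("/tmp/whisper_tiny_engmed_v2_pipeline")
reloadedDF = reloaded.transform(data)
```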
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_engmed_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|378.0 MB| + +## References + +https://huggingface.co/Hanhpt23/whisper-tiny-engmed-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_lbr47_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_lbr47_en.md new file mode 100644 index 00000000000000..f044e1db69d11a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_lbr47_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_lbr47 WhisperForCTC from LBR47 +author: John Snow Labs +name: whisper_tiny_lbr47 +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_lbr47` is a English model originally trained by LBR47. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_lbr47_en_5.5.0_3.0_1726548104245.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_lbr47_en_5.5.0_3.0_1726548104245.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is expected to be a DataFrame with an "audio_content" column of raw audio floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_lbr47","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// "data" is expected to be a DataFrame with an "audio_content" column of raw audio floats
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_lbr47", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_lbr47| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|242.8 MB| + +## References + +https://huggingface.co/LBR47/whisper-tiny \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_lbr47_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_lbr47_pipeline_en.md new file mode 100644 index 00000000000000..11b5e4a8f3d6c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_lbr47_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_lbr47_pipeline pipeline WhisperForCTC from LBR47 +author: John Snow Labs +name: whisper_tiny_lbr47_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_lbr47_pipeline` is a English model originally trained by LBR47. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_lbr47_pipeline_en_5.5.0_3.0_1726548174832.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_lbr47_pipeline_en_5.5.0_3.0_1726548174832.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to be a DataFrame with an "audio_content" column of raw audio floats
pipeline = PretrainedPipeline("whisper_tiny_lbr47_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to be a DataFrame with an "audio_content" column of raw audio floats
val pipeline = new PretrainedPipeline("whisper_tiny_lbr47_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_lbr47_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|242.9 MB| + +## References + +https://huggingface.co/LBR47/whisper-tiny + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_minds14_us_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_minds14_us_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..fbac3268218bf8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_minds14_us_finetuned_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_minds14_us_finetuned_pipeline pipeline WhisperForCTC from pedroferreira +author: John Snow Labs +name: whisper_tiny_minds14_us_finetuned_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_us_finetuned_pipeline` is a English model originally trained by pedroferreira. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_us_finetuned_pipeline_en_5.5.0_3.0_1726558857380.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_us_finetuned_pipeline_en_5.5.0_3.0_1726558857380.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to be a DataFrame with an "audio_content" column of raw audio floats
pipeline = PretrainedPipeline("whisper_tiny_minds14_us_finetuned_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to be a DataFrame with an "audio_content" column of raw audio floats
val pipeline = new PretrainedPipeline("whisper_tiny_minds14_us_finetuned_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_us_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/pedroferreira/whisper-tiny-minds14-US-finetuned + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_spanish_herme_es.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_spanish_herme_es.md new file mode 100644 index 00000000000000..a417d3fea1baa9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_spanish_herme_es.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Castilian, Spanish whisper_tiny_spanish_herme WhisperForCTC from herme +author: John Snow Labs +name: whisper_tiny_spanish_herme +date: 2024-09-17 +tags: [es, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_spanish_herme` is a Castilian, Spanish model originally trained by herme. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_spanish_herme_es_5.5.0_3.0_1726550881250.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_spanish_herme_es_5.5.0_3.0_1726550881250.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is expected to be a DataFrame with an "audio_content" column of raw audio floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_spanish_herme","es") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// "data" is expected to be a DataFrame with an "audio_content" column of raw audio floats
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_spanish_herme", "es")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_spanish_herme| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|es| +|Size:|390.8 MB| + +## References + +https://huggingface.co/herme/whisper-tiny-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-wikidata_researchers_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-17-wikidata_researchers_classifier_en.md new file mode 100644 index 00000000000000..724165f0f13991 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-wikidata_researchers_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English wikidata_researchers_classifier RoBertaForSequenceClassification from matthewleechen +author: John Snow Labs +name: wikidata_researchers_classifier +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wikidata_researchers_classifier` is a English model originally trained by matthewleechen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wikidata_researchers_classifier_en_5.5.0_3.0_1726573797690.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wikidata_researchers_classifier_en_5.5.0_3.0_1726573797690.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("wikidata_researchers_classifier","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("wikidata_researchers_classifier", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
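The predicted label is written to the `class` column as an annotation. A small, hedged example of inspecting the prediction next to the input text:

```python
# "class.result" holds the predicted label for each input row.
pipelineDF.select("text", "class.result").show(truncate=False)
```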
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wikidata_researchers_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/matthewleechen/wikidata_researchers_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-wikidata_researchers_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-wikidata_researchers_classifier_pipeline_en.md new file mode 100644 index 00000000000000..f318403624d482 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-wikidata_researchers_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English wikidata_researchers_classifier_pipeline pipeline RoBertaForSequenceClassification from matthewleechen +author: John Snow Labs +name: wikidata_researchers_classifier_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wikidata_researchers_classifier_pipeline` is a English model originally trained by matthewleechen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wikidata_researchers_classifier_pipeline_en_5.5.0_3.0_1726573880993.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wikidata_researchers_classifier_pipeline_en_5.5.0_3.0_1726573880993.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to be a DataFrame with a "text" column
pipeline = PretrainedPipeline("wikidata_researchers_classifier_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to be a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("wikidata_researchers_classifier_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wikidata_researchers_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/matthewleechen/wikidata_researchers_classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_final_mixed_aug_swap_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_final_mixed_aug_swap_pipeline_en.md new file mode 100644 index 00000000000000..2f284419e2d32f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_final_mixed_aug_swap_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_final_mixed_aug_swap_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_final_mixed_aug_swap_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_final_mixed_aug_swap_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_aug_swap_pipeline_en_5.5.0_3.0_1726535560878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_aug_swap_pipeline_en_5.5.0_3.0_1726535560878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to be a DataFrame with a "text" column
pipeline = PretrainedPipeline("xlm_roberta_base_final_mixed_aug_swap_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to be a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("xlm_roberta_base_final_mixed_aug_swap_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_final_mixed_aug_swap_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|794.4 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Final_Mixed-aug_swap + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_all_param_mehta_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_all_param_mehta_en.md new file mode 100644 index 00000000000000..04f1dadc06d23e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_all_param_mehta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_param_mehta XlmRoBertaForTokenClassification from param-mehta +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_param_mehta +date: 2024-09-17 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_param_mehta` is a English model originally trained by param-mehta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_param_mehta_en_5.5.0_3.0_1726577051273.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_param_mehta_en_5.5.0_3.0_1726577051273.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_param_mehta","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_param_mehta", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
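The token-level tags in the `ner` column can be grouped into whole entity spans by appending Spark NLP's `NerConverter` to the pipeline above; the sketch below reuses the objects defined in that example and is illustrative rather than part of the original card.

```python
from sparknlp.annotator import NerConverter

# Groups B-/I- token tags from the "ner" column into entity chunks.
nerConverter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, nerConverter])
pipelineDF = pipeline.fit(data).transform(data)
pipelineDF.select("ner_chunk.result").show(truncate=False)
```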
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_param_mehta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|849.3 MB| + +## References + +https://huggingface.co/param-mehta/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_all_param_mehta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_all_param_mehta_pipeline_en.md new file mode 100644 index 00000000000000..ea291a5160934d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_all_param_mehta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_param_mehta_pipeline pipeline XlmRoBertaForTokenClassification from param-mehta +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_param_mehta_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_param_mehta_pipeline` is a English model originally trained by param-mehta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_param_mehta_pipeline_en_5.5.0_3.0_1726577136893.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_param_mehta_pipeline_en_5.5.0_3.0_1726577136893.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to be a DataFrame with a "text" column
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_param_mehta_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to be a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_param_mehta_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_param_mehta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|849.3 MB| + +## References + +https://huggingface.co/param-mehta/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_all_smilingface88_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_all_smilingface88_en.md new file mode 100644 index 00000000000000..93b8e381860cbc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_all_smilingface88_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_smilingface88 XlmRoBertaForTokenClassification from smilingface88 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_smilingface88 +date: 2024-09-17 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_smilingface88` is a English model originally trained by smilingface88. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_smilingface88_en_5.5.0_3.0_1726611287054.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_smilingface88_en_5.5.0_3.0_1726611287054.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Input columns must match the output columns produced above ("document", "token")
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_smilingface88","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// Input columns must match the output columns produced above ("document", "token")
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_smilingface88", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
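Token-level IOB tags can optionally be grouped into entity chunks with `NerConverter`. The sketch below follows the column names used in the Python example above; the `ner_chunk` name is an arbitrary choice.

```python
from sparknlp.annotator import NerConverter

# Group token-level IOB tags from the example above into entity chunks
nerConverter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

chunked = nerConverter.transform(pipelineDF)
chunked.selectExpr("explode(ner_chunk) AS chunk") \
    .selectExpr("chunk.result AS text", "chunk.metadata['entity'] AS label") \
    .show(truncate=False)
```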
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_smilingface88| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/smilingface88/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_arabic_roaaoal_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_arabic_roaaoal_pipeline_en.md new file mode 100644 index 00000000000000..7c621ab9857f60 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_arabic_roaaoal_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_arabic_roaaoal_pipeline pipeline XlmRoBertaForTokenClassification from roaaoal +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_arabic_roaaoal_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_arabic_roaaoal_pipeline` is a English model originally trained by roaaoal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_roaaoal_pipeline_en_5.5.0_3.0_1726611214676.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_roaaoal_pipeline_en_5.5.0_3.0_1726611214676.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Build a DataFrame with a "text" column, then run the pretrained pipeline over it
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_arabic_roaaoal_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Build a DataFrame with a "text" column, then run the pretrained pipeline over it
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_arabic_roaaoal_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_arabic_roaaoal_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.7 MB| + +## References + +https://huggingface.co/roaaoal/xlm-roberta-base-finetuned-panx-ar + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_english_henryjiang_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_english_henryjiang_en.md new file mode 100644 index 00000000000000..77d00abaf7585b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_english_henryjiang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_henryjiang XlmRoBertaForTokenClassification from henryjiang +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_henryjiang +date: 2024-09-17 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_henryjiang` is a English model originally trained by henryjiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_henryjiang_en_5.5.0_3.0_1726611607129.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_henryjiang_en_5.5.0_3.0_1726611607129.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Input columns must match the output columns produced above ("document", "token")
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_henryjiang","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// Input columns must match the output columns produced above ("document", "token")
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_henryjiang", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_henryjiang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.3 MB| + +## References + +https://huggingface.co/henryjiang/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_english_henryjiang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_english_henryjiang_pipeline_en.md new file mode 100644 index 00000000000000..c0e8a4545d1b0d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_english_henryjiang_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_henryjiang_pipeline pipeline XlmRoBertaForTokenClassification from henryjiang +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_henryjiang_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_henryjiang_pipeline` is a English model originally trained by henryjiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_henryjiang_pipeline_en_5.5.0_3.0_1726611698424.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_henryjiang_pipeline_en_5.5.0_3.0_1726611698424.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Build a DataFrame with a "text" column, then run the pretrained pipeline over it
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_henryjiang_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Build a DataFrame with a "text" column, then run the pretrained pipeline over it
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_henryjiang_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_henryjiang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.4 MB| + +## References + +https://huggingface.co/henryjiang/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_english_jaemin12_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_english_jaemin12_pipeline_en.md new file mode 100644 index 00000000000000..3fb043da252390 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_english_jaemin12_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_jaemin12_pipeline pipeline XlmRoBertaForTokenClassification from jaemin12 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_jaemin12_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_jaemin12_pipeline` is a English model originally trained by jaemin12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_jaemin12_pipeline_en_5.5.0_3.0_1726612367840.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_jaemin12_pipeline_en_5.5.0_3.0_1726612367840.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Build a DataFrame with a "text" column, then run the pretrained pipeline over it
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_jaemin12_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Build a DataFrame with a "text" column, then run the pretrained pipeline over it
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_jaemin12_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_jaemin12_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/jaemin12/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_english_sreek_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_english_sreek_en.md new file mode 100644 index 00000000000000..719f2eec913938 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_english_sreek_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_sreek XlmRoBertaForTokenClassification from Sreek +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_sreek +date: 2024-09-17 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_sreek` is a English model originally trained by Sreek. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sreek_en_5.5.0_3.0_1726575848301.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sreek_en_5.5.0_3.0_1726575848301.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Input columns must match the output columns produced above ("document", "token")
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_sreek","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// Input columns must match the output columns produced above ("document", "token")
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_sreek", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_sreek| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/Sreek/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_french_ashrielbrian_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_french_ashrielbrian_en.md new file mode 100644 index 00000000000000..8b60415db94c84 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_french_ashrielbrian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_ashrielbrian XlmRoBertaForTokenClassification from ashrielbrian +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_ashrielbrian +date: 2024-09-17 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_ashrielbrian` is a English model originally trained by ashrielbrian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ashrielbrian_en_5.5.0_3.0_1726576261725.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ashrielbrian_en_5.5.0_3.0_1726576261725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Input columns must match the output columns produced above ("document", "token")
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_ashrielbrian","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// Input columns must match the output columns produced above ("document", "token")
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_ashrielbrian", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_ashrielbrian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/ashrielbrian/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_french_kbleejohn_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_french_kbleejohn_pipeline_en.md new file mode 100644 index 00000000000000..8e2fc6e3ec5d88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_french_kbleejohn_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_kbleejohn_pipeline pipeline XlmRoBertaForTokenClassification from kbleejohn +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_kbleejohn_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_kbleejohn_pipeline` is a English model originally trained by kbleejohn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_kbleejohn_pipeline_en_5.5.0_3.0_1726577349485.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_kbleejohn_pipeline_en_5.5.0_3.0_1726577349485.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Build a DataFrame with a "text" column, then run the pretrained pipeline over it
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_kbleejohn_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Build a DataFrame with a "text" column, then run the pretrained pipeline over it
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_kbleejohn_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_kbleejohn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/kbleejohn/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_anas1997_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_anas1997_en.md new file mode 100644 index 00000000000000..58e0ff4479b0d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_anas1997_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_anas1997 XlmRoBertaForTokenClassification from Anas1997 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_anas1997 +date: 2024-09-17 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_anas1997` is a English model originally trained by Anas1997. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_anas1997_en_5.5.0_3.0_1726611087735.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_anas1997_en_5.5.0_3.0_1726611087735.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Input columns must match the output columns produced above ("document", "token")
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_anas1997","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// Input columns must match the output columns produced above ("document", "token")
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_anas1997", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_anas1997| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/Anas1997/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_anas1997_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_anas1997_pipeline_en.md new file mode 100644 index 00000000000000..0e48de9c162dc2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_anas1997_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_anas1997_pipeline pipeline XlmRoBertaForTokenClassification from Anas1997 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_anas1997_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_anas1997_pipeline` is a English model originally trained by Anas1997. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_anas1997_pipeline_en_5.5.0_3.0_1726611175679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_anas1997_pipeline_en_5.5.0_3.0_1726611175679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Build a DataFrame with a "text" column, then run the pretrained pipeline over it
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_anas1997_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Build a DataFrame with a "text" column, then run the pretrained pipeline over it
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_anas1997_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_anas1997_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/Anas1997/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_athairus_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_athairus_pipeline_en.md new file mode 100644 index 00000000000000..0a1d3820e39988 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_athairus_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_athairus_pipeline pipeline XlmRoBertaForTokenClassification from athairus +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_athairus_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_athairus_pipeline` is a English model originally trained by athairus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_athairus_pipeline_en_5.5.0_3.0_1726611064926.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_athairus_pipeline_en_5.5.0_3.0_1726611064926.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Build a DataFrame with a "text" column, then run the pretrained pipeline over it
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_athairus_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Build a DataFrame with a "text" column, then run the pretrained pipeline over it
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_athairus_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_athairus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/athairus/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_facehugger135_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_facehugger135_pipeline_en.md new file mode 100644 index 00000000000000..fa77c370f87785 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_facehugger135_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_facehugger135_pipeline pipeline XlmRoBertaForTokenClassification from Facehugger135 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_facehugger135_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_facehugger135_pipeline` is a English model originally trained by Facehugger135. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_facehugger135_pipeline_en_5.5.0_3.0_1726611840083.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_facehugger135_pipeline_en_5.5.0_3.0_1726611840083.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Build a DataFrame with a "text" column, then run the pretrained pipeline over it
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_facehugger135_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Build a DataFrame with a "text" column, then run the pretrained pipeline over it
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_facehugger135_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_facehugger135_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|816.4 MB| + +## References + +https://huggingface.co/Facehugger135/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_french_param_mehta_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_french_param_mehta_en.md new file mode 100644 index 00000000000000..7a0b6e0669eb6a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_french_param_mehta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_param_mehta XlmRoBertaForTokenClassification from param-mehta +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_param_mehta +date: 2024-09-17 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_param_mehta` is a English model originally trained by param-mehta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_param_mehta_en_5.5.0_3.0_1726576866814.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_param_mehta_en_5.5.0_3.0_1726576866814.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Input columns must match the output columns produced above ("document", "token")
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_param_mehta","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// Input columns must match the output columns produced above ("document", "token")
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_param_mehta", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
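For quick experiments on single strings, the fitted pipeline can also be wrapped in a `LightPipeline`, which avoids building a DataFrame. This is a general Spark NLP pattern rather than something specific to this model, and the sentence below is only illustrative.

```python
from sparknlp.base import LightPipeline

# `pipelineModel` is the fitted pipeline from the Python example above
light = LightPipeline(pipelineModel)
result = light.annotate("Angela Merkel a rencontré Emmanuel Macron à Berlin.")
print(list(zip(result["token"], result["ner"])))
```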
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_param_mehta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|845.6 MB| + +## References + +https://huggingface.co/param-mehta/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_gulermuslim_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_gulermuslim_en.md new file mode 100644 index 00000000000000..e96d5c1f509ca2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_gulermuslim_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_gulermuslim XlmRoBertaForTokenClassification from gulermuslim +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_gulermuslim +date: 2024-09-17 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_gulermuslim` is a English model originally trained by gulermuslim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_gulermuslim_en_5.5.0_3.0_1726612115887.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_gulermuslim_en_5.5.0_3.0_1726612115887.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Input columns must match the output columns produced above ("document", "token")
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_gulermuslim","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// Input columns must match the output columns produced above ("document", "token")
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_gulermuslim", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_gulermuslim| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/gulermuslim/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_italian_team_nave_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_italian_team_nave_pipeline_en.md new file mode 100644 index 00000000000000..0432c3467421ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_italian_team_nave_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_team_nave_pipeline pipeline XlmRoBertaForTokenClassification from team-nave +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_team_nave_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_team_nave_pipeline` is a English model originally trained by team-nave. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_team_nave_pipeline_en_5.5.0_3.0_1726612713401.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_team_nave_pipeline_en_5.5.0_3.0_1726612713401.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Build a DataFrame with a "text" column, then run the pretrained pipeline over it
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_team_nave_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Build a DataFrame with a "text" column, then run the pretrained pipeline over it
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_team_nave_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_team_nave_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.5 MB| + +## References + +https://huggingface.co/team-nave/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_italian_u00890358_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_italian_u00890358_pipeline_en.md new file mode 100644 index 00000000000000..c96dfc1f80dc81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_italian_u00890358_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_u00890358_pipeline pipeline XlmRoBertaForTokenClassification from u00890358 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_u00890358_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_u00890358_pipeline` is a English model originally trained by u00890358. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_u00890358_pipeline_en_5.5.0_3.0_1726611641185.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_u00890358_pipeline_en_5.5.0_3.0_1726611641185.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Build a DataFrame with a "text" column, then run the pretrained pipeline over it
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_u00890358_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Build a DataFrame with a "text" column, then run the pretrained pipeline over it
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_u00890358_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_u00890358_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|816.8 MB| + +## References + +https://huggingface.co/u00890358/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_lr2e_05_seed42_kinyarwanda_amh_eng_train_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_lr2e_05_seed42_kinyarwanda_amh_eng_train_en.md new file mode 100644 index 00000000000000..db525bc34b1e91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_lr2e_05_seed42_kinyarwanda_amh_eng_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_lr2e_05_seed42_kinyarwanda_amh_eng_train XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr2e_05_seed42_kinyarwanda_amh_eng_train +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr2e_05_seed42_kinyarwanda_amh_eng_train` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr2e_05_seed42_kinyarwanda_amh_eng_train_en_5.5.0_3.0_1726616327412.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr2e_05_seed42_kinyarwanda_amh_eng_train_en_5.5.0_3.0_1726616327412.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Input columns must match the output columns produced above ("document", "token")
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr2e_05_seed42_kinyarwanda_amh_eng_train","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

// Input columns must match the output columns produced above ("document", "token")
val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr2e_05_seed42_kinyarwanda_amh_eng_train", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
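The predicted label for each row ends up in the `class` column set in the example above. A minimal sketch of how to inspect it is shown below; the label set itself depends on the fine-tuning data and is not documented in this card.

```python
# Show the predicted class next to the input text; `pipelineDF` comes from the
# Python example above.
pipelineDF.select("text", "class.result").show(truncate=False)
```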
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr2e_05_seed42_kinyarwanda_amh_eng_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|802.9 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr2e-05_seed42_kin-amh-eng_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_mongolian_ner_pipeline_mn.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_mongolian_ner_pipeline_mn.md new file mode 100644 index 00000000000000..217abc2f131b2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_mongolian_ner_pipeline_mn.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Mongolian xlm_roberta_base_mongolian_ner_pipeline pipeline XlmRoBertaForTokenClassification from nemuwn +author: John Snow Labs +name: xlm_roberta_base_mongolian_ner_pipeline +date: 2024-09-17 +tags: [mn, open_source, pipeline, onnx] +task: Named Entity Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_mongolian_ner_pipeline` is a Mongolian model originally trained by nemuwn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_mongolian_ner_pipeline_mn_5.5.0_3.0_1726576566495.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_mongolian_ner_pipeline_mn_5.5.0_3.0_1726576566495.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Build a DataFrame with a "text" column, then run the pretrained pipeline over it
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_mongolian_ner_pipeline", lang = "mn")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Build a DataFrame with a "text" column, then run the pretrained pipeline over it
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_mongolian_ner_pipeline", lang = "mn")
val annotations = pipeline.transform(df)
```
</div>
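Since this pipeline targets Mongolian, an input in Mongolian is the more natural test case. The sketch below is illustrative only; the `ner` output column name is assumed from the included `XlmRoBertaForTokenClassification` stage.

```python
from sparknlp.pretrained import PretrainedPipeline

# Illustrative Mongolian sentence ("Ulaanbaatar is the capital of Mongolia.")
df_mn = spark.createDataFrame([["Улаанбаатар бол Монгол Улсын нийслэл юм."]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_mongolian_ner_pipeline", lang = "mn")
pipeline.transform(df_mn).select("text", "ner.result").show(truncate=False)
```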
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_mongolian_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mn| +|Size:|842.2 MB| + +## References + +https://huggingface.co/nemuwn/xlm-roberta-base-mongolian-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_operator_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_operator_pipeline_en.md new file mode 100644 index 00000000000000..8ec15af19e3b71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_operator_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_operator_pipeline pipeline XlmRoBertaForSequenceClassification from DanLee6507 +author: John Snow Labs +name: xlm_roberta_base_operator_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_operator_pipeline` is a English model originally trained by DanLee6507. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_operator_pipeline_en_5.5.0_3.0_1726536899945.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_operator_pipeline_en_5.5.0_3.0_1726536899945.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Build a DataFrame with a "text" column, then run the pretrained pipeline over it
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_operator_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Build a DataFrame with a "text" column, then run the pretrained pipeline over it
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_operator_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_operator_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|870.6 MB| + +## References + +https://huggingface.co/DanLee6507/xlm-roberta-base-operator + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_sequence_classifier_language_detection_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_sequence_classifier_language_detection_en.md new file mode 100644 index 00000000000000..837616fb58c18b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_sequence_classifier_language_detection_en.md @@ -0,0 +1,104 @@ +--- +layout: model +title: XLM-RoBERTa Sequence Classification Base - Language Detection (xlm_roberta_base_sequence_classifier_language_detection) +author: John Snow Labs +name: xlm_roberta_base_sequence_classifier_language_detection +date: 2024-09-17 +tags: [sequence_classification, roberta, openvino, en, open_source] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: openvino +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +“ +XLM-RoBERTa Model with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for multi-class document classification tasks. + +xlm_roberta_base_sequence_classifier_language_detection is a fine-tuned XLM-RoBERTa model that is ready to be used for Sequence Classification tasks such as sentiment analysis or multi-class text classification and it achieves state-of-the-art performance. + +We used TFXLMRobertaForSequenceClassification to train this model and used XlmRoBertaForSequenceClassification annotator in Spark NLP 🚀 for prediction at scale! + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_sequence_classifier_language_detection_en_5.5.0_3.0_1726563005096.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_sequence_classifier_language_detection_en_5.5.0_3.0_1726563005096.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = XlmRoBertaForSequenceClassification \
    .pretrained("xlm_roberta_base_sequence_classifier_language_detection", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class") \
    .setCaseSensitive(True) \
    .setMaxSentenceLength(512)

pipeline = Pipeline(stages=[
    document_assembler,
    tokenizer,
    sequenceClassifier
])

example = spark.createDataFrame([["I really liked that movie!"]]).toDF("text")
result = pipeline.fit(example).transform(example)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_sequence_classifier_language_detection", "en")
    .setInputCols("document", "token")
    .setOutputCol("class")
    .setCaseSensitive(true)
    .setMaxSentenceLength(512)

val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, sequenceClassifier))

val example = Seq("I really liked that movie!").toDF("text")

val result = pipeline.fit(example).transform(example)
```
</div>
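A quick way to sanity-check the classifier is to feed it short sentences in a few different languages and look at the predicted labels. The label format (for example ISO 639-1 codes) depends on the model's fine-tuning data and is an assumption here; `pipeline` refers to the Pipeline built in the Python example above.

```python
# Illustrative multi-language check
samples = spark.createDataFrame(
    [["I really liked that movie!"],
     ["Ce film m'a beaucoup plu."],
     ["Dieser Film hat mir sehr gefallen."]]
).toDF("text")

pipeline.fit(samples).transform(samples).select("text", "class.result").show(truncate=False)
```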
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_sequence_classifier_language_detection| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[token, document]| +|Output Labels:|[label]| +|Language:|en| +|Size:|870.5 MB| +|Case sensitive:|true| \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_ukraine_waray_philippines_pov_v1_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_ukraine_waray_philippines_pov_v1_en.md new file mode 100644 index 00000000000000..c69976ddab762f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_ukraine_waray_philippines_pov_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_ukraine_waray_philippines_pov_v1 XlmRoBertaForSequenceClassification from YaraKyrychenko +author: John Snow Labs +name: xlm_roberta_base_ukraine_waray_philippines_pov_v1 +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_ukraine_waray_philippines_pov_v1` is a English model originally trained by YaraKyrychenko. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ukraine_waray_philippines_pov_v1_en_5.5.0_3.0_1726536005629.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ukraine_waray_philippines_pov_v1_en_5.5.0_3.0_1726536005629.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_ukraine_waray_philippines_pov_v1","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_ukraine_waray_philippines_pov_v1", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_ukraine_waray_philippines_pov_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|875.0 MB| + +## References + +https://huggingface.co/YaraKyrychenko/xlm-roberta-base-ukraine-war-pov-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_finetuned_emojis_cen_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_finetuned_emojis_cen_2_pipeline_en.md new file mode 100644 index 00000000000000..4d9c116a28dba3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_finetuned_emojis_cen_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_finetuned_emojis_cen_2_pipeline pipeline XlmRoBertaForSequenceClassification from Karim-Gamal +author: John Snow Labs +name: xlm_roberta_finetuned_emojis_cen_2_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_finetuned_emojis_cen_2_pipeline` is a English model originally trained by Karim-Gamal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_cen_2_pipeline_en_5.5.0_3.0_1726536719294.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_cen_2_pipeline_en_5.5.0_3.0_1726536719294.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_finetuned_emojis_cen_2_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_finetuned_emojis_cen_2_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
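
For quick experiments on a handful of strings, a pretrained pipeline can also be driven without building a DataFrame first. A minimal sketch, assuming the `pipeline` object from the example in this section (the exact output keys depend on the pipeline's stages), is:

```python
# annotate() runs the whole pipeline on one string and returns a dict keyed by output column
light_result = pipeline.annotate("I love spark-nlp")
print(light_result)
```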
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_finetuned_emojis_cen_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Karim-Gamal/XLM-Roberta-finetuned-emojis-cen-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_longformer_base_4096_rua_wl_3_classes_pipeline_fr.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_longformer_base_4096_rua_wl_3_classes_pipeline_fr.md new file mode 100644 index 00000000000000..640135795fba6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_longformer_base_4096_rua_wl_3_classes_pipeline_fr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: French xlm_roberta_longformer_base_4096_rua_wl_3_classes_pipeline pipeline XlmRoBertaForSequenceClassification from waboucay +author: John Snow Labs +name: xlm_roberta_longformer_base_4096_rua_wl_3_classes_pipeline +date: 2024-09-17 +tags: [fr, open_source, pipeline, onnx] +task: Text Classification +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_longformer_base_4096_rua_wl_3_classes_pipeline` is a French model originally trained by waboucay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_longformer_base_4096_rua_wl_3_classes_pipeline_fr_5.5.0_3.0_1726615270758.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_longformer_base_4096_rua_wl_3_classes_pipeline_fr_5.5.0_3.0_1726615270758.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_longformer_base_4096_rua_wl_3_classes_pipeline", lang = "fr")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_longformer_base_4096_rua_wl_3_classes_pipeline", lang = "fr")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_longformer_base_4096_rua_wl_3_classes_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fr| +|Size:|1.1 GB| + +## References + +https://huggingface.co/waboucay/xlm-roberta-longformer-base-4096-rua_wl_3_classes + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlmr_sinhalese_english_all_shuffled_2020_test1000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlmr_sinhalese_english_all_shuffled_2020_test1000_pipeline_en.md new file mode 100644 index 00000000000000..469368af390d3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlmr_sinhalese_english_all_shuffled_2020_test1000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_sinhalese_english_all_shuffled_2020_test1000_pipeline pipeline XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_sinhalese_english_all_shuffled_2020_test1000_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_sinhalese_english_all_shuffled_2020_test1000_pipeline` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_sinhalese_english_all_shuffled_2020_test1000_pipeline_en_5.5.0_3.0_1726615997876.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_sinhalese_english_all_shuffled_2020_test1000_pipeline_en_5.5.0_3.0_1726615997876.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlmr_sinhalese_english_all_shuffled_2020_test1000_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlmr_sinhalese_english_all_shuffled_2020_test1000_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_sinhalese_english_all_shuffled_2020_test1000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/patpizio/xlmr-si-en-all_shuffled-2020-test1000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-0_000005_0_999_en.md b/docs/_posts/ahmedlone127/2024-09-18-0_000005_0_999_en.md new file mode 100644 index 00000000000000..5256049e498880 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-0_000005_0_999_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 0_000005_0_999 RoBertaForSequenceClassification from rose-e-wang +author: John Snow Labs +name: 0_000005_0_999 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`0_000005_0_999` is a English model originally trained by rose-e-wang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/0_000005_0_999_en_5.5.0_3.0_1726627919804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/0_000005_0_999_en_5.5.0_3.0_1726627919804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("0_000005_0_999","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("0_000005_0_999", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|0_000005_0_999| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/rose-e-wang/0.000005_0.999 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-1104_en.md b/docs/_posts/ahmedlone127/2024-09-18-1104_en.md new file mode 100644 index 00000000000000..0267cbffd152f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-1104_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 1104 DistilBertForSequenceClassification from tingchih +author: John Snow Labs +name: 1104 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`1104` is a English model originally trained by tingchih. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/1104_en_5.5.0_3.0_1726625515443.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/1104_en_5.5.0_3.0_1726625515443.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("1104","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("1104", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|1104| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tingchih/1104 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-1104_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-1104_pipeline_en.md new file mode 100644 index 00000000000000..bdf8866f90c39e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-1104_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 1104_pipeline pipeline DistilBertForSequenceClassification from tingchih +author: John Snow Labs +name: 1104_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`1104_pipeline` is a English model originally trained by tingchih. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/1104_pipeline_en_5.5.0_3.0_1726625528138.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/1104_pipeline_en_5.5.0_3.0_1726625528138.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("1104_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("1104_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|1104_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tingchih/1104 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-2404v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-2404v2_pipeline_en.md new file mode 100644 index 00000000000000..c29fd4e8b18680 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-2404v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2404v2_pipeline pipeline RoBertaForSequenceClassification from adriansanz +author: John Snow Labs +name: 2404v2_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2404v2_pipeline` is a English model originally trained by adriansanz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2404v2_pipeline_en_5.5.0_3.0_1726650568817.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2404v2_pipeline_en_5.5.0_3.0_1726650568817.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("2404v2_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("2404v2_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2404v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|450.6 MB| + +## References + +https://huggingface.co/adriansanz/2404v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-40_langdetect_v01_en.md b/docs/_posts/ahmedlone127/2024-09-18-40_langdetect_v01_en.md new file mode 100644 index 00000000000000..0769d9a368e890 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-40_langdetect_v01_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 40_langdetect_v01 XlmRoBertaForSequenceClassification from ERCDiDip +author: John Snow Labs +name: 40_langdetect_v01 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`40_langdetect_v01` is a English model originally trained by ERCDiDip. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/40_langdetect_v01_en_5.5.0_3.0_1726671779487.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/40_langdetect_v01_en_5.5.0_3.0_1726671779487.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("40_langdetect_v01","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("40_langdetect_v01", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|40_langdetect_v01| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|927.0 MB| + +## References + +https://huggingface.co/ERCDiDip/40_langdetect_v01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-5718_5_en.md b/docs/_posts/ahmedlone127/2024-09-18-5718_5_en.md new file mode 100644 index 00000000000000..79cdd76b6c7207 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-5718_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 5718_5 DistilBertForSequenceClassification from mhpanju +author: John Snow Labs +name: 5718_5 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`5718_5` is a English model originally trained by mhpanju. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/5718_5_en_5.5.0_3.0_1726695900541.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/5718_5_en_5.5.0_3.0_1726695900541.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("5718_5","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("5718_5", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|5718_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mhpanju/5718_5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-5718_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-5718_5_pipeline_en.md new file mode 100644 index 00000000000000..2d40d8f60fb4ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-5718_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 5718_5_pipeline pipeline DistilBertForSequenceClassification from mhpanju +author: John Snow Labs +name: 5718_5_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`5718_5_pipeline` is a English model originally trained by mhpanju. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/5718_5_pipeline_en_5.5.0_3.0_1726695913671.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/5718_5_pipeline_en_5.5.0_3.0_1726695913671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("5718_5_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("5718_5_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|5718_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mhpanju/5718_5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-absa_restaurant_froberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-18-absa_restaurant_froberta_base_en.md new file mode 100644 index 00000000000000..286ac9ac358ee0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-absa_restaurant_froberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English absa_restaurant_froberta_base RoBertaEmbeddings from AliAhmad001 +author: John Snow Labs +name: absa_restaurant_froberta_base +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`absa_restaurant_froberta_base` is a English model originally trained by AliAhmad001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/absa_restaurant_froberta_base_en_5.5.0_3.0_1726617933778.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/absa_restaurant_froberta_base_en_5.5.0_3.0_1726617933778.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("absa_restaurant_froberta_base","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("absa_restaurant_froberta_base","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
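
The `embeddings` column produced in this section stores one annotation per token, each carrying its vector. A minimal sketch for inspecting those vectors (assuming the `pipelineDF` from the Python example above) is:

```python
from pyspark.sql.functions import explode

# One row per token annotation; `result` is the token text, `embeddings` its float vector
pipelineDF.select(explode("embeddings").alias("token_embedding")) \
    .select("token_embedding.result", "token_embedding.embeddings") \
    .show(truncate=80)
```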
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|absa_restaurant_froberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/AliAhmad001/absa-restaurant-froberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-ag_news_roberta_large_seed_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-ag_news_roberta_large_seed_3_pipeline_en.md new file mode 100644 index 00000000000000..9dbe5b21d7de48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-ag_news_roberta_large_seed_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ag_news_roberta_large_seed_3_pipeline pipeline RoBertaForSequenceClassification from utahnlp +author: John Snow Labs +name: ag_news_roberta_large_seed_3_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ag_news_roberta_large_seed_3_pipeline` is a English model originally trained by utahnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ag_news_roberta_large_seed_3_pipeline_en_5.5.0_3.0_1726628476743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ag_news_roberta_large_seed_3_pipeline_en_5.5.0_3.0_1726628476743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("ag_news_roberta_large_seed_3_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("ag_news_roberta_large_seed_3_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ag_news_roberta_large_seed_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/utahnlp/ag_news_roberta-large_seed-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-ahisto_ner_model_tds1_mu_nlpc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-ahisto_ner_model_tds1_mu_nlpc_pipeline_en.md new file mode 100644 index 00000000000000..89c704578ea77e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-ahisto_ner_model_tds1_mu_nlpc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ahisto_ner_model_tds1_mu_nlpc_pipeline pipeline XlmRoBertaForTokenClassification from MU-NLPC +author: John Snow Labs +name: ahisto_ner_model_tds1_mu_nlpc_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ahisto_ner_model_tds1_mu_nlpc_pipeline` is a English model originally trained by MU-NLPC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ahisto_ner_model_tds1_mu_nlpc_pipeline_en_5.5.0_3.0_1726702147500.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ahisto_ner_model_tds1_mu_nlpc_pipeline_en_5.5.0_3.0_1726702147500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("ahisto_ner_model_tds1_mu_nlpc_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("ahisto_ner_model_tds1_mu_nlpc_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ahisto_ner_model_tds1_mu_nlpc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/MU-NLPC/ahisto-ner-model-tds1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-ai_generated_text_classification_sanalsprasad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-ai_generated_text_classification_sanalsprasad_pipeline_en.md new file mode 100644 index 00000000000000..4a5394a874dd2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-ai_generated_text_classification_sanalsprasad_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ai_generated_text_classification_sanalsprasad_pipeline pipeline RoBertaForSequenceClassification from sanalsprasad +author: John Snow Labs +name: ai_generated_text_classification_sanalsprasad_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ai_generated_text_classification_sanalsprasad_pipeline` is a English model originally trained by sanalsprasad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ai_generated_text_classification_sanalsprasad_pipeline_en_5.5.0_3.0_1726622504436.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ai_generated_text_classification_sanalsprasad_pipeline_en_5.5.0_3.0_1726622504436.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("ai_generated_text_classification_sanalsprasad_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("ai_generated_text_classification_sanalsprasad_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ai_generated_text_classification_sanalsprasad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/sanalsprasad/ai-generated-text-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-albert_base_jackh1995_en.md b/docs/_posts/ahmedlone127/2024-09-18-albert_base_jackh1995_en.md new file mode 100644 index 00000000000000..6cf33dce2fee51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-albert_base_jackh1995_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English albert_base_jackh1995 BertForQuestionAnswering from jackh1995 +author: John Snow Labs +name: albert_base_jackh1995 +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_base_jackh1995` is a English model originally trained by jackh1995. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_base_jackh1995_en_5.5.0_3.0_1726658607463.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_base_jackh1995_en_5.5.0_3.0_1726658607463.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("albert_base_jackh1995","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("albert_base_jackh1995", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
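
After transforming, the predicted answer spans sit in the `answer` column. A minimal sketch for reading them back out (assuming the `pipelineDF` from the Python example above) is:

```python
# `answer.result` is the array of predicted answer strings for each question/context pair
pipelineDF.select("document_question.result", "answer.result").show(truncate=False)
```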
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_base_jackh1995| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|380.8 MB| + +## References + +https://huggingface.co/jackh1995/albert-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-albert_model_02_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-albert_model_02_pipeline_en.md new file mode 100644 index 00000000000000..18998872920fe1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-albert_model_02_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English albert_model_02_pipeline pipeline DistilBertForSequenceClassification from KalaiselvanD +author: John Snow Labs +name: albert_model_02_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_model_02_pipeline` is a English model originally trained by KalaiselvanD. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_model_02_pipeline_en_5.5.0_3.0_1726626015843.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_model_02_pipeline_en_5.5.0_3.0_1726626015843.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("albert_model_02_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("albert_model_02_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_model_02_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KalaiselvanD/albert_model_02 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-all_roberta_large_v1_travel_16_16_5_oos_en.md b/docs/_posts/ahmedlone127/2024-09-18-all_roberta_large_v1_travel_16_16_5_oos_en.md new file mode 100644 index 00000000000000..0b292c5f6b4512 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-all_roberta_large_v1_travel_16_16_5_oos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_travel_16_16_5_oos RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_travel_16_16_5_oos +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_travel_16_16_5_oos` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_travel_16_16_5_oos_en_5.5.0_3.0_1726628065698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_travel_16_16_5_oos_en_5.5.0_3.0_1726628065698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_travel_16_16_5_oos","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_travel_16_16_5_oos", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_travel_16_16_5_oos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-travel-16-16-5-oos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-all_roberta_large_v1_work_1000_16_5_oos_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-all_roberta_large_v1_work_1000_16_5_oos_pipeline_en.md new file mode 100644 index 00000000000000..d62f8b04b11d33 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-all_roberta_large_v1_work_1000_16_5_oos_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English all_roberta_large_v1_work_1000_16_5_oos_pipeline pipeline RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_work_1000_16_5_oos_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_work_1000_16_5_oos_pipeline` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_work_1000_16_5_oos_pipeline_en_5.5.0_3.0_1726666817554.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_work_1000_16_5_oos_pipeline_en_5.5.0_3.0_1726666817554.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("all_roberta_large_v1_work_1000_16_5_oos_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("all_roberta_large_v1_work_1000_16_5_oos_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_work_1000_16_5_oos_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-work-1000-16-5-oos + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-amazon_spanish_reviews_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-amazon_spanish_reviews_pipeline_en.md new file mode 100644 index 00000000000000..972c97563badf9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-amazon_spanish_reviews_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English amazon_spanish_reviews_pipeline pipeline RoBertaForSequenceClassification from santyzenith +author: John Snow Labs +name: amazon_spanish_reviews_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amazon_spanish_reviews_pipeline` is a English model originally trained by santyzenith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amazon_spanish_reviews_pipeline_en_5.5.0_3.0_1726649855335.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amazon_spanish_reviews_pipeline_en_5.5.0_3.0_1726649855335.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("amazon_spanish_reviews_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("amazon_spanish_reviews_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amazon_spanish_reviews_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|440.9 MB| + +## References + +https://huggingface.co/santyzenith/amazon_es_reviews + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-amharicnewscharacternormalizedunweighted_en.md b/docs/_posts/ahmedlone127/2024-09-18-amharicnewscharacternormalizedunweighted_en.md new file mode 100644 index 00000000000000..56132746ae30fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-amharicnewscharacternormalizedunweighted_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English amharicnewscharacternormalizedunweighted XlmRoBertaForSequenceClassification from akiseid +author: John Snow Labs +name: amharicnewscharacternormalizedunweighted +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amharicnewscharacternormalizedunweighted` is a English model originally trained by akiseid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amharicnewscharacternormalizedunweighted_en_5.5.0_3.0_1726697303669.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amharicnewscharacternormalizedunweighted_en_5.5.0_3.0_1726697303669.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("amharicnewscharacternormalizedunweighted","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("amharicnewscharacternormalizedunweighted", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amharicnewscharacternormalizedunweighted| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|840.7 MB| + +## References + +https://huggingface.co/akiseid/AmharicNewsCharacterNormalizedUnWeighted \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-arabert_arabic_ner_conllpp_ar.md b/docs/_posts/ahmedlone127/2024-09-18-arabert_arabic_ner_conllpp_ar.md new file mode 100644 index 00000000000000..77ac2b2f03a29a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-arabert_arabic_ner_conllpp_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic arabert_arabic_ner_conllpp BertForTokenClassification from MostafaAhmed98 +author: John Snow Labs +name: arabert_arabic_ner_conllpp +date: 2024-09-18 +tags: [ar, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`arabert_arabic_ner_conllpp` is a Arabic model originally trained by MostafaAhmed98. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arabert_arabic_ner_conllpp_ar_5.5.0_3.0_1726673933993.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arabert_arabic_ner_conllpp_ar_5.5.0_3.0_1726673933993.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = BertForTokenClassification.pretrained("arabert_arabic_ner_conllpp","ar") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("arabert_arabic_ner_conllpp", "ar")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
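
The predicted tags are aligned one-to-one with the tokens. A minimal sketch for viewing them side by side, assuming the column names used above:

```python
# Show each token together with its predicted NER tag
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```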
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arabert_arabic_ner_conllpp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|ar| +|Size:|505.1 MB| + +## References + +https://huggingface.co/MostafaAhmed98/AraBert-Arabic-NER-CoNLLpp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-arabert_arabic_ner_conllpp_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-18-arabert_arabic_ner_conllpp_pipeline_ar.md new file mode 100644 index 00000000000000..543ca033a36bbd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-arabert_arabic_ner_conllpp_pipeline_ar.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Arabic arabert_arabic_ner_conllpp_pipeline pipeline BertForTokenClassification from MostafaAhmed98 +author: John Snow Labs +name: arabert_arabic_ner_conllpp_pipeline +date: 2024-09-18 +tags: [ar, open_source, pipeline, onnx] +task: Named Entity Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`arabert_arabic_ner_conllpp_pipeline` is a Arabic model originally trained by MostafaAhmed98. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arabert_arabic_ner_conllpp_pipeline_ar_5.5.0_3.0_1726673958857.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arabert_arabic_ner_conllpp_pipeline_ar_5.5.0_3.0_1726673958857.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("arabert_arabic_ner_conllpp_pipeline", lang = "ar")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("arabert_arabic_ner_conllpp_pipeline", lang = "ar")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
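
The downloaded pipeline can also annotate raw strings directly, without building a DataFrame first. A minimal sketch; the keys of the returned dictionary are the pipeline's internal output column names:

```python
# Annotate a single string; returns a dict mapping output columns to lists of results
result = pipeline.annotate("I love spark-nlp")
print(result)
```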
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arabert_arabic_ner_conllpp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|505.1 MB| + +## References + +https://huggingface.co/MostafaAhmed98/AraBert-Arabic-NER-CoNLLpp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-arywiki_20230101_roberta_mlm_nobots_ar.md b/docs/_posts/ahmedlone127/2024-09-18-arywiki_20230101_roberta_mlm_nobots_ar.md new file mode 100644 index 00000000000000..0240a617a2f4c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-arywiki_20230101_roberta_mlm_nobots_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic arywiki_20230101_roberta_mlm_nobots RoBertaEmbeddings from SaiedAlshahrani +author: John Snow Labs +name: arywiki_20230101_roberta_mlm_nobots +date: 2024-09-18 +tags: [ar, open_source, onnx, embeddings, roberta] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`arywiki_20230101_roberta_mlm_nobots` is a Arabic model originally trained by SaiedAlshahrani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arywiki_20230101_roberta_mlm_nobots_ar_5.5.0_3.0_1726651619191.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arywiki_20230101_roberta_mlm_nobots_ar_5.5.0_3.0_1726651619191.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("arywiki_20230101_roberta_mlm_nobots","ar") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("arywiki_20230101_roberta_mlm_nobots","ar")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
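
Each token in the input receives one embedding vector. A minimal sketch for unpacking the annotations, assuming the `embeddings` output column used above:

```python
# Explode the token-level annotations and show each token next to its vector
pipelineDF.selectExpr("explode(embeddings) as emb") \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(truncate=80)
```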
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arywiki_20230101_roberta_mlm_nobots| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|ar| +|Size:|311.3 MB| + +## References + +https://huggingface.co/SaiedAlshahrani/arywiki_20230101_roberta_mlm_nobots \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-arywiki_20230101_roberta_mlm_nobots_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-18-arywiki_20230101_roberta_mlm_nobots_pipeline_ar.md new file mode 100644 index 00000000000000..cf72a4558b9f3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-arywiki_20230101_roberta_mlm_nobots_pipeline_ar.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Arabic arywiki_20230101_roberta_mlm_nobots_pipeline pipeline RoBertaEmbeddings from SaiedAlshahrani +author: John Snow Labs +name: arywiki_20230101_roberta_mlm_nobots_pipeline +date: 2024-09-18 +tags: [ar, open_source, pipeline, onnx] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`arywiki_20230101_roberta_mlm_nobots_pipeline` is a Arabic model originally trained by SaiedAlshahrani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arywiki_20230101_roberta_mlm_nobots_pipeline_ar_5.5.0_3.0_1726651634592.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arywiki_20230101_roberta_mlm_nobots_pipeline_ar_5.5.0_3.0_1726651634592.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("arywiki_20230101_roberta_mlm_nobots_pipeline", lang = "ar")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("arywiki_20230101_roberta_mlm_nobots_pipeline", lang = "ar")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arywiki_20230101_roberta_mlm_nobots_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|311.3 MB| + +## References + +https://huggingface.co/SaiedAlshahrani/arywiki_20230101_roberta_mlm_nobots + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-babylm_roberta_base_epoch_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-babylm_roberta_base_epoch_5_pipeline_en.md new file mode 100644 index 00000000000000..60daa0d5f78e76 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-babylm_roberta_base_epoch_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English babylm_roberta_base_epoch_5_pipeline pipeline RoBertaEmbeddings from Raj-Sanjay-Shah +author: John Snow Labs +name: babylm_roberta_base_epoch_5_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babylm_roberta_base_epoch_5_pipeline` is a English model originally trained by Raj-Sanjay-Shah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babylm_roberta_base_epoch_5_pipeline_en_5.5.0_3.0_1726626629490.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babylm_roberta_base_epoch_5_pipeline_en_5.5.0_3.0_1726626629490.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("babylm_roberta_base_epoch_5_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("babylm_roberta_base_epoch_5_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babylm_roberta_base_epoch_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.6 MB| + +## References + +https://huggingface.co/Raj-Sanjay-Shah/babyLM_roberta_base_epoch_5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-bantulm_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-18-bantulm_pipeline_xx.md new file mode 100644 index 00000000000000..3f5117b6ad27a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-bantulm_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bantulm_pipeline pipeline BertEmbeddings from nairaxo +author: John Snow Labs +name: bantulm_pipeline +date: 2024-09-18 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bantulm_pipeline` is a Multilingual model originally trained by nairaxo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bantulm_pipeline_xx_5.5.0_3.0_1726691766822.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bantulm_pipeline_xx_5.5.0_3.0_1726691766822.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bantulm_pipeline", lang = "xx")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("bantulm_pipeline", lang = "xx")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bantulm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|752.9 MB| + +## References + +https://huggingface.co/nairaxo/bantulm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-bbc_news_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-18-bbc_news_classifier_en.md new file mode 100644 index 00000000000000..bfd94224239d5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-bbc_news_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bbc_news_classifier RoBertaForSequenceClassification from chrisliu298 +author: John Snow Labs +name: bbc_news_classifier +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bbc_news_classifier` is a English model originally trained by chrisliu298. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bbc_news_classifier_en_5.5.0_3.0_1726689604579.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bbc_news_classifier_en_5.5.0_3.0_1726689604579.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("bbc_news_classifier","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("bbc_news_classifier", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bbc_news_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|451.0 MB| + +## References + +https://huggingface.co/chrisliu298/bbc_news_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-berel_2_0_sam_v3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-berel_2_0_sam_v3_pipeline_en.md new file mode 100644 index 00000000000000..5cfa352ffee860 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-berel_2_0_sam_v3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English berel_2_0_sam_v3_pipeline pipeline BertEmbeddings from johnlockejrr +author: John Snow Labs +name: berel_2_0_sam_v3_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`berel_2_0_sam_v3_pipeline` is a English model originally trained by johnlockejrr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/berel_2_0_sam_v3_pipeline_en_5.5.0_3.0_1726673192497.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/berel_2_0_sam_v3_pipeline_en_5.5.0_3.0_1726673192497.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("berel_2_0_sam_v3_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("berel_2_0_sam_v3_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|berel_2_0_sam_v3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|689.9 MB| + +## References + +https://huggingface.co/johnlockejrr/BEREL_2.0-sam-v3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-berit_2000_enriched_optimized_en.md b/docs/_posts/ahmedlone127/2024-09-18-berit_2000_enriched_optimized_en.md new file mode 100644 index 00000000000000..6db665e18e1557 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-berit_2000_enriched_optimized_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English berit_2000_enriched_optimized RoBertaEmbeddings from gngpostalsrvc +author: John Snow Labs +name: berit_2000_enriched_optimized +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`berit_2000_enriched_optimized` is a English model originally trained by gngpostalsrvc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/berit_2000_enriched_optimized_en_5.5.0_3.0_1726678777982.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/berit_2000_enriched_optimized_en_5.5.0_3.0_1726678777982.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("berit_2000_enriched_optimized","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("berit_2000_enriched_optimized","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|berit_2000_enriched_optimized| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|464.9 MB| + +## References + +https://huggingface.co/gngpostalsrvc/BERiT_2000_enriched_optimized \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-berit_2000_enriched_optimized_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-berit_2000_enriched_optimized_pipeline_en.md new file mode 100644 index 00000000000000..e51d3a23650496 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-berit_2000_enriched_optimized_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English berit_2000_enriched_optimized_pipeline pipeline RoBertaEmbeddings from gngpostalsrvc +author: John Snow Labs +name: berit_2000_enriched_optimized_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`berit_2000_enriched_optimized_pipeline` is a English model originally trained by gngpostalsrvc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/berit_2000_enriched_optimized_pipeline_en_5.5.0_3.0_1726678800715.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/berit_2000_enriched_optimized_pipeline_en_5.5.0_3.0_1726678800715.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("berit_2000_enriched_optimized_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("berit_2000_enriched_optimized_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|berit_2000_enriched_optimized_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.9 MB| + +## References + +https://huggingface.co/gngpostalsrvc/BERiT_2000_enriched_optimized + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-bert_base_arabert_finetuned_mdeberta_tswana_en.md b/docs/_posts/ahmedlone127/2024-09-18-bert_base_arabert_finetuned_mdeberta_tswana_en.md new file mode 100644 index 00000000000000..cb1d054818be0f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-bert_base_arabert_finetuned_mdeberta_tswana_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_arabert_finetuned_mdeberta_tswana BertEmbeddings from betteib +author: John Snow Labs +name: bert_base_arabert_finetuned_mdeberta_tswana +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_arabert_finetuned_mdeberta_tswana` is a English model originally trained by betteib. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_arabert_finetuned_mdeberta_tswana_en_5.5.0_3.0_1726700395207.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_arabert_finetuned_mdeberta_tswana_en_5.5.0_3.0_1726700395207.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("bert_base_arabert_finetuned_mdeberta_tswana","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("bert_base_arabert_finetuned_mdeberta_tswana","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_arabert_finetuned_mdeberta_tswana| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|504.6 MB| + +## References + +https://huggingface.co/betteib/bert-base-arabert-finetuned-mdeberta-tn \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-bert_base_cased_qqp_en.md b/docs/_posts/ahmedlone127/2024-09-18-bert_base_cased_qqp_en.md new file mode 100644 index 00000000000000..63600a4d0d65c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-bert_base_cased_qqp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_cased_qqp BertForSequenceClassification from WillHeld +author: John Snow Labs +name: bert_base_cased_qqp +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_qqp` is a English model originally trained by WillHeld. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_qqp_en_5.5.0_3.0_1726623656921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_qqp_en_5.5.0_3.0_1726623656921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_qqp","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_qqp", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_qqp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|406.0 MB| + +## References + +https://huggingface.co/WillHeld/bert-base-cased-qqp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-bert_base_cased_qqp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-bert_base_cased_qqp_pipeline_en.md new file mode 100644 index 00000000000000..6c6b47d1129b99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-bert_base_cased_qqp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_cased_qqp_pipeline pipeline BertForSequenceClassification from WillHeld +author: John Snow Labs +name: bert_base_cased_qqp_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_qqp_pipeline` is a English model originally trained by WillHeld. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_qqp_pipeline_en_5.5.0_3.0_1726623676136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_qqp_pipeline_en_5.5.0_3.0_1726623676136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_cased_qqp_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("bert_base_cased_qqp_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_qqp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.0 MB| + +## References + +https://huggingface.co/WillHeld/bert-base-cased-qqp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-bert_base_historic_dutch_cased_squad_dutch_en.md b/docs/_posts/ahmedlone127/2024-09-18-bert_base_historic_dutch_cased_squad_dutch_en.md new file mode 100644 index 00000000000000..46936cd27d24bf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-bert_base_historic_dutch_cased_squad_dutch_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_historic_dutch_cased_squad_dutch BertForQuestionAnswering from Nadav +author: John Snow Labs +name: bert_base_historic_dutch_cased_squad_dutch +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_historic_dutch_cased_squad_dutch` is a English model originally trained by Nadav. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_historic_dutch_cased_squad_dutch_en_5.5.0_3.0_1726659133410.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_historic_dutch_cased_squad_dutch_en_5.5.0_3.0_1726659133410.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_historic_dutch_cased_squad_dutch","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_historic_dutch_cased_squad_dutch", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
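
The extracted answer span lands in the `answer` output column. A minimal sketch for reading it back, assuming the column names used above:

```python
# Show the answer extracted for each question/context pair
pipelineDF.select("answer.result").show(truncate=False)
```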
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_historic_dutch_cased_squad_dutch| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|412.0 MB| + +## References + +https://huggingface.co/Nadav/bert-base-historic-dutch-cased-squad-nl \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-bert_base_squad_v1_1_portuguese_ibama_v0_220240904182946_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-bert_base_squad_v1_1_portuguese_ibama_v0_220240904182946_pipeline_en.md new file mode 100644 index 00000000000000..4ade4f3abbb925 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-bert_base_squad_v1_1_portuguese_ibama_v0_220240904182946_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_220240904182946_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_220240904182946_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_220240904182946_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240904182946_pipeline_en_5.5.0_3.0_1726667918574.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240904182946_pipeline_en_5.5.0_3.0_1726667918574.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_220240904182946_pipeline", lang = "en")
# The pipeline starts with a MultiDocumentAssembler; the input DataFrame is assumed
# to carry question/context columns (hypothetical example data below)
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_220240904182946_pipeline", lang = "en")
// Assumes the pipeline's MultiDocumentAssembler reads "question" and "context" columns
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_220240904182946_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.220240904182946 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-bert_base_uncased_finetuned_squad_summerzhang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-bert_base_uncased_finetuned_squad_summerzhang_pipeline_en.md new file mode 100644 index 00000000000000..80e560e40eb408 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-bert_base_uncased_finetuned_squad_summerzhang_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_squad_summerzhang_pipeline pipeline BertForQuestionAnswering from SummerZhang +author: John Snow Labs +name: bert_base_uncased_finetuned_squad_summerzhang_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_squad_summerzhang_pipeline` is a English model originally trained by SummerZhang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_squad_summerzhang_pipeline_en_5.5.0_3.0_1726658697869.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_squad_summerzhang_pipeline_en_5.5.0_3.0_1726658697869.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_uncased_finetuned_squad_summerzhang_pipeline", lang = "en")
# The pipeline starts with a MultiDocumentAssembler; the input DataFrame is assumed
# to carry question/context columns (hypothetical example data below)
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_squad_summerzhang_pipeline", lang = "en")
// Assumes the pipeline's MultiDocumentAssembler reads "question" and "context" columns
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_squad_summerzhang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/SummerZhang/bert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-bert_base_uncased_issues_128_takaiwai_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-bert_base_uncased_issues_128_takaiwai_pipeline_en.md new file mode 100644 index 00000000000000..32d1b9e01833f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-bert_base_uncased_issues_128_takaiwai_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_issues_128_takaiwai_pipeline pipeline BertEmbeddings from takaiwai +author: John Snow Labs +name: bert_base_uncased_issues_128_takaiwai_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_issues_128_takaiwai_pipeline` is a English model originally trained by takaiwai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_takaiwai_pipeline_en_5.5.0_3.0_1726700756874.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_takaiwai_pipeline_en_5.5.0_3.0_1726700756874.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_uncased_issues_128_takaiwai_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("bert_base_uncased_issues_128_takaiwai_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_issues_128_takaiwai_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/takaiwai/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-bert_emotion_hirenvadalia_en.md b/docs/_posts/ahmedlone127/2024-09-18-bert_emotion_hirenvadalia_en.md new file mode 100644 index 00000000000000..3f90cb26df98a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-bert_emotion_hirenvadalia_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_emotion_hirenvadalia DistilBertForSequenceClassification from hirenvadalia +author: John Snow Labs +name: bert_emotion_hirenvadalia +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_emotion_hirenvadalia` is a English model originally trained by hirenvadalia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_emotion_hirenvadalia_en_5.5.0_3.0_1726625926876.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_emotion_hirenvadalia_en_5.5.0_3.0_1726625926876.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_emotion_hirenvadalia","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_emotion_hirenvadalia", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_emotion_hirenvadalia| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/hirenvadalia/bert-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-bert_ner_ANER_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-18-bert_ner_ANER_pipeline_ar.md new file mode 100644 index 00000000000000..c914c41be7954b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-bert_ner_ANER_pipeline_ar.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Arabic bert_ner_ANER_pipeline pipeline BertForTokenClassification from boda +author: John Snow Labs +name: bert_ner_ANER_pipeline +date: 2024-09-18 +tags: [ar, open_source, pipeline, onnx] +task: Named Entity Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_ner_ANER_pipeline` is a Arabic model originally trained by boda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_ANER_pipeline_ar_5.5.0_3.0_1726699232991.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_ANER_pipeline_ar_5.5.0_3.0_1726699232991.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_ner_ANER_pipeline", lang = "ar")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("bert_ner_ANER_pipeline", lang = "ar")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_ANER_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|505.4 MB| + +## References + +https://huggingface.co/boda/ANER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-bert_sentiment_persian_farsi_en.md b/docs/_posts/ahmedlone127/2024-09-18-bert_sentiment_persian_farsi_en.md new file mode 100644 index 00000000000000..7358ebe0a1671e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-bert_sentiment_persian_farsi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_sentiment_persian_farsi RoBertaForSequenceClassification from Rasooli +author: John Snow Labs +name: bert_sentiment_persian_farsi +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_sentiment_persian_farsi` is a English model originally trained by Rasooli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sentiment_persian_farsi_en_5.5.0_3.0_1726622443956.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sentiment_persian_farsi_en_5.5.0_3.0_1726622443956.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("bert_sentiment_persian_farsi","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("bert_sentiment_persian_farsi", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sentiment_persian_farsi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|444.3 MB| + +## References + +https://huggingface.co/Rasooli/Bert-Sentiment-Fa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-bert_sql_classfication_en.md b/docs/_posts/ahmedlone127/2024-09-18-bert_sql_classfication_en.md new file mode 100644 index 00000000000000..e9ab1b00a6b274 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-bert_sql_classfication_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_sql_classfication DistilBertForSequenceClassification from hiwensen +author: John Snow Labs +name: bert_sql_classfication +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_sql_classfication` is a English model originally trained by hiwensen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sql_classfication_en_5.5.0_3.0_1726696213329.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sql_classfication_en_5.5.0_3.0_1726696213329.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_sql_classfication","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_sql_classfication", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sql_classfication| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hiwensen/bert_sql_classfication \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-bert_sql_classfication_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-bert_sql_classfication_pipeline_en.md new file mode 100644 index 00000000000000..040a58accea072 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-bert_sql_classfication_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_sql_classfication_pipeline pipeline DistilBertForSequenceClassification from hiwensen +author: John Snow Labs +name: bert_sql_classfication_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_sql_classfication_pipeline` is a English model originally trained by hiwensen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sql_classfication_pipeline_en_5.5.0_3.0_1726696226396.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sql_classfication_pipeline_en_5.5.0_3.0_1726696226396.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# a DataFrame with a "text" column holding the documents to classify
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("bert_sql_classfication_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// a DataFrame with a "text" column holding the documents to classify
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("bert_sql_classfication_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
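+
+For a quick check on a single string, without assembling a DataFrame first, `PretrainedPipeline` also exposes `annotate`. A minimal sketch; the returned dictionary is keyed by the pipeline's output columns:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("bert_sql_classfication_pipeline", lang="en")
+print(pipeline.annotate("I love spark-nlp"))
+```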
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sql_classfication_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hiwensen/bert_sql_classfication + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-bert_vllm_gemma2b_deterministic_7_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-bert_vllm_gemma2b_deterministic_7_pipeline_en.md new file mode 100644 index 00000000000000..8f878e590994ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-bert_vllm_gemma2b_deterministic_7_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_vllm_gemma2b_deterministic_7_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_vllm_gemma2b_deterministic_7_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_vllm_gemma2b_deterministic_7_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_deterministic_7_pipeline_en_5.5.0_3.0_1726695114071.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_deterministic_7_pipeline_en_5.5.0_3.0_1726695114071.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# a DataFrame with a "text" column holding the documents to classify
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("bert_vllm_gemma2b_deterministic_7_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// a DataFrame with a "text" column holding the documents to classify
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("bert_vllm_gemma2b_deterministic_7_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_vllm_gemma2b_deterministic_7_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_vllm-gemma2b-deterministic_7 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-bertin_roberta_fine_tuned_text_classification_slovene_data_augmentation_test_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-bertin_roberta_fine_tuned_text_classification_slovene_data_augmentation_test_3_pipeline_en.md new file mode 100644 index 00000000000000..b0ae8c61b96f28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-bertin_roberta_fine_tuned_text_classification_slovene_data_augmentation_test_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bertin_roberta_fine_tuned_text_classification_slovene_data_augmentation_test_3_pipeline pipeline RoBertaForSequenceClassification from Sleoruiz +author: John Snow Labs +name: bertin_roberta_fine_tuned_text_classification_slovene_data_augmentation_test_3_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertin_roberta_fine_tuned_text_classification_slovene_data_augmentation_test_3_pipeline` is a English model originally trained by Sleoruiz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertin_roberta_fine_tuned_text_classification_slovene_data_augmentation_test_3_pipeline_en_5.5.0_3.0_1726641435250.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertin_roberta_fine_tuned_text_classification_slovene_data_augmentation_test_3_pipeline_en_5.5.0_3.0_1726641435250.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# a DataFrame with a "text" column holding the documents to classify
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("bertin_roberta_fine_tuned_text_classification_slovene_data_augmentation_test_3_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// a DataFrame with a "text" column holding the documents to classify
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("bertin_roberta_fine_tuned_text_classification_slovene_data_augmentation_test_3_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertin_roberta_fine_tuned_text_classification_slovene_data_augmentation_test_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.7 MB| + +## References + +https://huggingface.co/Sleoruiz/bertin-roberta-fine-tuned-text-classification-SL-data-augmentation-test-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-biobert_finetuned_squad_insurance_parambharat_en.md b/docs/_posts/ahmedlone127/2024-09-18-biobert_finetuned_squad_insurance_parambharat_en.md new file mode 100644 index 00000000000000..39c6f1618c5845 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-biobert_finetuned_squad_insurance_parambharat_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English biobert_finetuned_squad_insurance_parambharat BertForQuestionAnswering from parambharat +author: John Snow Labs +name: biobert_finetuned_squad_insurance_parambharat +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`biobert_finetuned_squad_insurance_parambharat` is a English model originally trained by parambharat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biobert_finetuned_squad_insurance_parambharat_en_5.5.0_3.0_1726658968714.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biobert_finetuned_squad_insurance_parambharat_en_5.5.0_3.0_1726658968714.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("biobert_finetuned_squad_insurance_parambharat","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+# the input columns match the MultiDocumentAssembler's input column names
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.MultiDocumentAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("biobert_finetuned_squad_insurance_parambharat", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
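+
+The predicted answer span lands in the `result` field of the `answer` column. A minimal way to surface it from the DataFrame produced above:
+
+```python
+# pipelineDF is the DataFrame produced by the snippet above
+pipelineDF.select("answer.result").show(truncate=False)
+```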
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biobert_finetuned_squad_insurance_parambharat| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/parambharat/biobert-finetuned-squad-insurance \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-biobert_finetuned_squad_insurance_parambharat_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-biobert_finetuned_squad_insurance_parambharat_pipeline_en.md new file mode 100644 index 00000000000000..fab408354d3444 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-biobert_finetuned_squad_insurance_parambharat_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English biobert_finetuned_squad_insurance_parambharat_pipeline pipeline BertForQuestionAnswering from parambharat +author: John Snow Labs +name: biobert_finetuned_squad_insurance_parambharat_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`biobert_finetuned_squad_insurance_parambharat_pipeline` is a English model originally trained by parambharat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biobert_finetuned_squad_insurance_parambharat_pipeline_en_5.5.0_3.0_1726659034271.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biobert_finetuned_squad_insurance_parambharat_pipeline_en_5.5.0_3.0_1726659034271.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a DataFrame holding the question/context text columns expected by the
+# pipeline's MultiDocumentAssembler
+pipeline = PretrainedPipeline("biobert_finetuned_squad_insurance_parambharat_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a DataFrame holding the question/context text columns expected by the
+// pipeline's MultiDocumentAssembler
+val pipeline = new PretrainedPipeline("biobert_finetuned_squad_insurance_parambharat_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biobert_finetuned_squad_insurance_parambharat_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/parambharat/biobert-finetuned-squad-insurance + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-bpe_selfies_pubchem_shard00_50k_en.md b/docs/_posts/ahmedlone127/2024-09-18-bpe_selfies_pubchem_shard00_50k_en.md new file mode 100644 index 00000000000000..d7c8204ac37702 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-bpe_selfies_pubchem_shard00_50k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bpe_selfies_pubchem_shard00_50k RoBertaEmbeddings from seyonec +author: John Snow Labs +name: bpe_selfies_pubchem_shard00_50k +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bpe_selfies_pubchem_shard00_50k` is a English model originally trained by seyonec. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bpe_selfies_pubchem_shard00_50k_en_5.5.0_3.0_1726651331274.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bpe_selfies_pubchem_shard00_50k_en_5.5.0_3.0_1726651331274.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("bpe_selfies_pubchem_shard00_50k","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("bpe_selfies_pubchem_shard00_50k","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
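+
+Each token's vector is stored in the `embeddings` field of the `embeddings` annotation column. One way to flatten the output for inspection, as a minimal sketch based on the snippet above:
+
+```python
+from pyspark.sql import functions as F
+
+# one row per token: the token text and its embedding vector
+pipelineDF.select(F.explode("embeddings").alias("e")) \
+    .select(F.col("e.result").alias("token"), F.col("e.embeddings").alias("vector")) \
+    .show(truncate=False)
+```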
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bpe_selfies_pubchem_shard00_50k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|309.6 MB| + +## References + +https://huggingface.co/seyonec/BPE_SELFIES_PubChem_shard00_50k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-btsn1_distilbert_base_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-18-btsn1_distilbert_base_uncased_en.md new file mode 100644 index 00000000000000..1029044651a6b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-btsn1_distilbert_base_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English btsn1_distilbert_base_uncased DistilBertForSequenceClassification from ceblay +author: John Snow Labs +name: btsn1_distilbert_base_uncased +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`btsn1_distilbert_base_uncased` is a English model originally trained by ceblay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/btsn1_distilbert_base_uncased_en_5.5.0_3.0_1726677076870.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/btsn1_distilbert_base_uncased_en_5.5.0_3.0_1726677076870.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# the classifier reads the "document" column produced above
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("btsn1_distilbert_base_uncased","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("btsn1_distilbert_base_uncased", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
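+
+For low-latency inference on individual strings (for example inside a service), the fitted pipeline can be wrapped in a `LightPipeline` instead of going through a DataFrame. A minimal sketch using the `pipelineModel` fitted above:
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+# returns a dict keyed by the pipeline's output columns ("document", "token", "class")
+print(light.annotate("I love spark-nlp"))
+```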
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|btsn1_distilbert_base_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ceblay/btsn1-distilbert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_adelinachirtes_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_adelinachirtes_en.md new file mode 100644 index 00000000000000..982b7179d1ea0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_adelinachirtes_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_adelinachirtes DistilBertForSequenceClassification from adelinachirtes +author: John Snow Labs +name: burmese_awesome_model_adelinachirtes +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_adelinachirtes` is a English model originally trained by adelinachirtes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_adelinachirtes_en_5.5.0_3.0_1726681492290.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_adelinachirtes_en_5.5.0_3.0_1726681492290.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_adelinachirtes","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_adelinachirtes", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_adelinachirtes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/adelinachirtes/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_alexanderaziz_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_alexanderaziz_pipeline_en.md new file mode 100644 index 00000000000000..6db8293e567898 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_alexanderaziz_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_alexanderaziz_pipeline pipeline DistilBertForSequenceClassification from alexanderaziz +author: John Snow Labs +name: burmese_awesome_model_alexanderaziz_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_alexanderaziz_pipeline` is a English model originally trained by alexanderaziz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_alexanderaziz_pipeline_en_5.5.0_3.0_1726630182614.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_alexanderaziz_pipeline_en_5.5.0_3.0_1726630182614.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_alexanderaziz_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_alexanderaziz_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_alexanderaziz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/alexanderaziz/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_anhminh3105_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_anhminh3105_pipeline_en.md new file mode 100644 index 00000000000000..ff91b49343baeb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_anhminh3105_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_anhminh3105_pipeline pipeline DistilBertForSequenceClassification from anhminh3105 +author: John Snow Labs +name: burmese_awesome_model_anhminh3105_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_anhminh3105_pipeline` is a English model originally trained by anhminh3105. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_anhminh3105_pipeline_en_5.5.0_3.0_1726677418784.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_anhminh3105_pipeline_en_5.5.0_3.0_1726677418784.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_anhminh3105_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_anhminh3105_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_anhminh3105_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/anhminh3105/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_dajulster_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_dajulster_en.md new file mode 100644 index 00000000000000..1dac3803a28dc8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_dajulster_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_dajulster DistilBertForSequenceClassification from DaJulster +author: John Snow Labs +name: burmese_awesome_model_dajulster +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_dajulster` is a English model originally trained by DaJulster. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_dajulster_en_5.5.0_3.0_1726630717134.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_dajulster_en_5.5.0_3.0_1726630717134.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_dajulster","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_dajulster", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_dajulster| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DaJulster/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_dajulster_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_dajulster_pipeline_en.md new file mode 100644 index 00000000000000..a3d1cea5cf526e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_dajulster_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_dajulster_pipeline pipeline DistilBertForSequenceClassification from DaJulster +author: John Snow Labs +name: burmese_awesome_model_dajulster_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_dajulster_pipeline` is a English model originally trained by DaJulster. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_dajulster_pipeline_en_5.5.0_3.0_1726630729759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_dajulster_pipeline_en_5.5.0_3.0_1726630729759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_dajulster_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_dajulster_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_dajulster_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DaJulster/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_darkshark77_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_darkshark77_en.md new file mode 100644 index 00000000000000..1caf5e4884c6b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_darkshark77_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_darkshark77 DistilBertForSequenceClassification from darkshark77 +author: John Snow Labs +name: burmese_awesome_model_darkshark77 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_darkshark77` is a English model originally trained by darkshark77. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_darkshark77_en_5.5.0_3.0_1726695643421.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_darkshark77_en_5.5.0_3.0_1726695643421.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_darkshark77","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_darkshark77", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_darkshark77| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/darkshark77/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_darkshark77_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_darkshark77_pipeline_en.md new file mode 100644 index 00000000000000..af20126a28670d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_darkshark77_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_darkshark77_pipeline pipeline DistilBertForSequenceClassification from darkshark77 +author: John Snow Labs +name: burmese_awesome_model_darkshark77_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_darkshark77_pipeline` is a English model originally trained by darkshark77. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_darkshark77_pipeline_en_5.5.0_3.0_1726695655958.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_darkshark77_pipeline_en_5.5.0_3.0_1726695655958.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_darkshark77_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_darkshark77_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_darkshark77_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/darkshark77/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_dguywill_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_dguywill_en.md new file mode 100644 index 00000000000000..bf4089a6ccc2c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_dguywill_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_dguywill DistilBertForSequenceClassification from dguywill +author: John Snow Labs +name: burmese_awesome_model_dguywill +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_dguywill` is a English model originally trained by dguywill. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_dguywill_en_5.5.0_3.0_1726694888765.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_dguywill_en_5.5.0_3.0_1726694888765.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_dguywill","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_dguywill", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_dguywill| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dguywill/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_dldnlee_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_dldnlee_en.md new file mode 100644 index 00000000000000..156ad52b18ed03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_dldnlee_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_dldnlee DistilBertForSequenceClassification from dldnlee +author: John Snow Labs +name: burmese_awesome_model_dldnlee +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_dldnlee` is a English model originally trained by dldnlee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_dldnlee_en_5.5.0_3.0_1726696432371.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_dldnlee_en_5.5.0_3.0_1726696432371.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_dldnlee","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_dldnlee", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_dldnlee| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dldnlee/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_duy221_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_duy221_en.md new file mode 100644 index 00000000000000..efb13b7164cf10 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_duy221_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_duy221 DistilBertForSequenceClassification from duy221 +author: John Snow Labs +name: burmese_awesome_model_duy221 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_duy221` is a English model originally trained by duy221. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_duy221_en_5.5.0_3.0_1726694860914.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_duy221_en_5.5.0_3.0_1726694860914.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_duy221","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_duy221", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_duy221| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/duy221/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_duy221_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_duy221_pipeline_en.md new file mode 100644 index 00000000000000..c86b874ed21df6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_duy221_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_duy221_pipeline pipeline DistilBertForSequenceClassification from duy221 +author: John Snow Labs +name: burmese_awesome_model_duy221_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_duy221_pipeline` is a English model originally trained by duy221. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_duy221_pipeline_en_5.5.0_3.0_1726694876391.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_duy221_pipeline_en_5.5.0_3.0_1726694876391.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_duy221_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_duy221_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_duy221_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/duy221/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_jayhook_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_jayhook_pipeline_en.md new file mode 100644 index 00000000000000..cf66aaa331a0c8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_jayhook_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_jayhook_pipeline pipeline DistilBertForSequenceClassification from jayhook +author: John Snow Labs +name: burmese_awesome_model_jayhook_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_jayhook_pipeline` is a English model originally trained by jayhook. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_jayhook_pipeline_en_5.5.0_3.0_1726625587461.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_jayhook_pipeline_en_5.5.0_3.0_1726625587461.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_jayhook_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_jayhook_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
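+
+Besides `transform`, a `PretrainedPipeline` can be queried directly with single texts; `fullAnnotate` keeps the annotation metadata, which for classification pipelines typically includes the per-label confidence scores. A minimal sketch; the `"class"` output column name and the exact metadata keys are assumptions based on the model card above:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("burmese_awesome_model_jayhook_pipeline", lang="en")
+# fullAnnotate returns one dict of Annotation objects per input text
+result = pipeline.fullAnnotate("I love spark-nlp")[0]
+print(result["class"][0].result, result["class"][0].metadata)
+```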
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_jayhook_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jayhook/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_neroism8422_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_neroism8422_en.md new file mode 100644 index 00000000000000..c2760db1bde9f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_neroism8422_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_neroism8422 DistilBertForSequenceClassification from Neroism8422 +author: John Snow Labs +name: burmese_awesome_model_neroism8422 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_neroism8422` is a English model originally trained by Neroism8422. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_neroism8422_en_5.5.0_3.0_1726680807555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_neroism8422_en_5.5.0_3.0_1726680807555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_neroism8422","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_neroism8422", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_neroism8422| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Neroism8422/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_neroism8422_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_neroism8422_pipeline_en.md new file mode 100644 index 00000000000000..3595a5077a449f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_neroism8422_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_neroism8422_pipeline pipeline DistilBertForSequenceClassification from Neroism8422 +author: John Snow Labs +name: burmese_awesome_model_neroism8422_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_neroism8422_pipeline` is a English model originally trained by Neroism8422. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_neroism8422_pipeline_en_5.5.0_3.0_1726680819826.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_neroism8422_pipeline_en_5.5.0_3.0_1726680819826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_neroism8422_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_neroism8422_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
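
The snippet above assumes an existing DataFrame `df` with a `text` column. A fuller, runnable sketch, assuming an active Spark session named `spark` and that the included DistilBertForSequenceClassification stage writes its predictions to a `class` column:

```python
from sparknlp.pretrained import PretrainedPipeline

# Build a small input DataFrame with the expected `text` column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("burmese_awesome_model_neroism8422_pipeline", lang="en")
annotations = pipeline.transform(df)

# `class` is assumed to be the classifier stage's output column
annotations.select("text", "class.result").show(truncate=False)
```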
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_neroism8422_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Neroism8422/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_pawannlp123_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_pawannlp123_en.md new file mode 100644 index 00000000000000..4461cd7d32da76 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_pawannlp123_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_pawannlp123 DistilBertForSequenceClassification from pawanNLP123 +author: John Snow Labs +name: burmese_awesome_model_pawannlp123 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_pawannlp123` is a English model originally trained by pawanNLP123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_pawannlp123_en_5.5.0_3.0_1726696044568.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_pawannlp123_en_5.5.0_3.0_1726696044568.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_pawannlp123","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_pawannlp123", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_pawannlp123| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/pawanNLP123/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_priority_2_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_priority_2_en.md new file mode 100644 index 00000000000000..8ee2c79df6669a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_priority_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_priority_2 DistilBertForSequenceClassification from lenate +author: John Snow Labs +name: burmese_awesome_model_priority_2 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_priority_2` is a English model originally trained by lenate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_priority_2_en_5.5.0_3.0_1726630985864.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_priority_2_en_5.5.0_3.0_1726630985864.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_priority_2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_priority_2", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_priority_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lenate/my_awesome_model_priority_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_sklug_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_sklug_pipeline_en.md new file mode 100644 index 00000000000000..570f48b889f3a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_sklug_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_sklug_pipeline pipeline DistilBertForSequenceClassification from sklug +author: John Snow Labs +name: burmese_awesome_model_sklug_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_sklug_pipeline` is a English model originally trained by sklug. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_sklug_pipeline_en_5.5.0_3.0_1726630283720.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_sklug_pipeline_en_5.5.0_3.0_1726630283720.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_sklug_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_sklug_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_sklug_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sklug/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_thebisso09_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_thebisso09_pipeline_en.md new file mode 100644 index 00000000000000..800cf579be0968 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_thebisso09_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_thebisso09_pipeline pipeline DistilBertForSequenceClassification from Thebisso09 +author: John Snow Labs +name: burmese_awesome_model_thebisso09_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_thebisso09_pipeline` is a English model originally trained by Thebisso09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_thebisso09_pipeline_en_5.5.0_3.0_1726681812015.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_thebisso09_pipeline_en_5.5.0_3.0_1726681812015.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_thebisso09_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_thebisso09_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_thebisso09_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Thebisso09/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_qa_model_ahmed13245_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_qa_model_ahmed13245_en.md new file mode 100644 index 00000000000000..1ed1e6957d4502 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_qa_model_ahmed13245_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_ahmed13245 DistilBertForQuestionAnswering from AHMED13245 +author: John Snow Labs +name: burmese_awesome_qa_model_ahmed13245 +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_ahmed13245` is a English model originally trained by AHMED13245. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_ahmed13245_en_5.5.0_3.0_1726641008931.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_ahmed13245_en_5.5.0_3.0_1726641008931.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_ahmed13245","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_ahmed13245", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
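
To read off the predicted answer spans, assuming the `pipelineDF` DataFrame from the example above:

```python
# `answer.result` holds the extracted answer text for each question/context pair
pipelineDF.select("document_question.result", "answer.result").show(truncate=False)
```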
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_ahmed13245| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/AHMED13245/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_finetuned_distilbert_on_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_finetuned_distilbert_on_imdb_en.md new file mode 100644 index 00000000000000..e310f41c1217cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_finetuned_distilbert_on_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_finetuned_distilbert_on_imdb DistilBertForSequenceClassification from gslshbs +author: John Snow Labs +name: burmese_finetuned_distilbert_on_imdb +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_finetuned_distilbert_on_imdb` is a English model originally trained by gslshbs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_finetuned_distilbert_on_imdb_en_5.5.0_3.0_1726680500836.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_finetuned_distilbert_on_imdb_en_5.5.0_3.0_1726680500836.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_finetuned_distilbert_on_imdb","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_finetuned_distilbert_on_imdb", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_finetuned_distilbert_on_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gslshbs/my_finetuned_DistilBERT_on_IMDb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_finetuned_distilbert_on_imdb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_finetuned_distilbert_on_imdb_pipeline_en.md new file mode 100644 index 00000000000000..115bd79aa9feb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_finetuned_distilbert_on_imdb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_finetuned_distilbert_on_imdb_pipeline pipeline DistilBertForSequenceClassification from gslshbs +author: John Snow Labs +name: burmese_finetuned_distilbert_on_imdb_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_finetuned_distilbert_on_imdb_pipeline` is a English model originally trained by gslshbs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_finetuned_distilbert_on_imdb_pipeline_en_5.5.0_3.0_1726680513696.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_finetuned_distilbert_on_imdb_pipeline_en_5.5.0_3.0_1726680513696.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_finetuned_distilbert_on_imdb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_finetuned_distilbert_on_imdb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_finetuned_distilbert_on_imdb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gslshbs/my_finetuned_DistilBERT_on_IMDb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_finetuned_emotion_distilbert_zijay_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_finetuned_emotion_distilbert_zijay_en.md new file mode 100644 index 00000000000000..1b8e204f8cad87 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_finetuned_emotion_distilbert_zijay_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_finetuned_emotion_distilbert_zijay DistilBertForSequenceClassification from zijay +author: John Snow Labs +name: burmese_finetuned_emotion_distilbert_zijay +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_finetuned_emotion_distilbert_zijay` is a English model originally trained by zijay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_finetuned_emotion_distilbert_zijay_en_5.5.0_3.0_1726696233540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_finetuned_emotion_distilbert_zijay_en_5.5.0_3.0_1726696233540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_finetuned_emotion_distilbert_zijay","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_finetuned_emotion_distilbert_zijay", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_finetuned_emotion_distilbert_zijay| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/zijay/my-finetuned-emotion-distilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_model_eperiment6_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_model_eperiment6_en.md new file mode 100644 index 00000000000000..09953cc634c008 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_model_eperiment6_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_model_eperiment6 DistilBertForSequenceClassification from HFFErica +author: John Snow Labs +name: burmese_model_eperiment6 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_model_eperiment6` is a English model originally trained by HFFErica. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_model_eperiment6_en_5.5.0_3.0_1726680746757.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_model_eperiment6_en_5.5.0_3.0_1726680746757.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_model_eperiment6","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_model_eperiment6", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_model_eperiment6| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/HFFErica/my_model_Eperiment6 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_model_parsawar_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_model_parsawar_pipeline_en.md new file mode 100644 index 00000000000000..38cc6a3044b76b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_model_parsawar_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_model_parsawar_pipeline pipeline DistilBertForSequenceClassification from parsawar +author: John Snow Labs +name: burmese_model_parsawar_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_model_parsawar_pipeline` is a English model originally trained by parsawar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_model_parsawar_pipeline_en_5.5.0_3.0_1726625541952.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_model_parsawar_pipeline_en_5.5.0_3.0_1726625541952.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_model_parsawar_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_model_parsawar_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_model_parsawar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/parsawar/my_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_not_somali_awesome_model_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_not_somali_awesome_model_en.md new file mode 100644 index 00000000000000..354b5299f6efc8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_not_somali_awesome_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_not_somali_awesome_model DistilBertForSequenceClassification from baris-yazici +author: John Snow Labs +name: burmese_not_somali_awesome_model +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_not_somali_awesome_model` is a English model originally trained by baris-yazici. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_not_somali_awesome_model_en_5.5.0_3.0_1726682050486.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_not_somali_awesome_model_en_5.5.0_3.0_1726682050486.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_not_somali_awesome_model","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_not_somali_awesome_model", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_not_somali_awesome_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/baris-yazici/my_not_so_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_not_somali_awesome_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_not_somali_awesome_model_pipeline_en.md new file mode 100644 index 00000000000000..731b4e215f2349 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_not_somali_awesome_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_not_somali_awesome_model_pipeline pipeline DistilBertForSequenceClassification from baris-yazici +author: John Snow Labs +name: burmese_not_somali_awesome_model_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_not_somali_awesome_model_pipeline` is a English model originally trained by baris-yazici. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_not_somali_awesome_model_pipeline_en_5.5.0_3.0_1726682065100.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_not_somali_awesome_model_pipeline_en_5.5.0_3.0_1726682065100.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_not_somali_awesome_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_not_somali_awesome_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_not_somali_awesome_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/baris-yazici/my_not_so_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-cat_ner_iw_4_en.md b/docs/_posts/ahmedlone127/2024-09-18-cat_ner_iw_4_en.md new file mode 100644 index 00000000000000..0be41f280715ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-cat_ner_iw_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cat_ner_iw_4 XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_ner_iw_4 +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_ner_iw_4` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_ner_iw_4_en_5.5.0_3.0_1726635971336.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_ner_iw_4_en_5.5.0_3.0_1726635971336.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("cat_ner_iw_4","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("cat_ner_iw_4", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
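
Token-level predictions can be paired with their tokens, assuming the `pipelineDF` DataFrame from the example above:

```python
# Zip each token with its predicted tag; both columns are annotation arrays of equal length
pipelineDF.selectExpr("arrays_zip(token.result, ner.result) AS token_tags").show(truncate=False)
```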
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_ner_iw_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|423.2 MB| + +## References + +https://huggingface.co/homersimpson/cat-ner-iw-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-category_1_delivery_cancellation_distilbert_base_uncased_v1_en.md b/docs/_posts/ahmedlone127/2024-09-18-category_1_delivery_cancellation_distilbert_base_uncased_v1_en.md new file mode 100644 index 00000000000000..17086111cce61d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-category_1_delivery_cancellation_distilbert_base_uncased_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English category_1_delivery_cancellation_distilbert_base_uncased_v1 DistilBertForSequenceClassification from chuuhtetnaing +author: John Snow Labs +name: category_1_delivery_cancellation_distilbert_base_uncased_v1 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`category_1_delivery_cancellation_distilbert_base_uncased_v1` is a English model originally trained by chuuhtetnaing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/category_1_delivery_cancellation_distilbert_base_uncased_v1_en_5.5.0_3.0_1726669508139.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/category_1_delivery_cancellation_distilbert_base_uncased_v1_en_5.5.0_3.0_1726669508139.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("category_1_delivery_cancellation_distilbert_base_uncased_v1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("category_1_delivery_cancellation_distilbert_base_uncased_v1", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|category_1_delivery_cancellation_distilbert_base_uncased_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chuuhtetnaing/category-1-delivery-cancellation-distilbert-base-uncased-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-category_1_delivery_cancellation_distilbert_base_uncased_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-category_1_delivery_cancellation_distilbert_base_uncased_v1_pipeline_en.md new file mode 100644 index 00000000000000..068cdea9055be9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-category_1_delivery_cancellation_distilbert_base_uncased_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English category_1_delivery_cancellation_distilbert_base_uncased_v1_pipeline pipeline DistilBertForSequenceClassification from chuuhtetnaing +author: John Snow Labs +name: category_1_delivery_cancellation_distilbert_base_uncased_v1_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`category_1_delivery_cancellation_distilbert_base_uncased_v1_pipeline` is a English model originally trained by chuuhtetnaing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/category_1_delivery_cancellation_distilbert_base_uncased_v1_pipeline_en_5.5.0_3.0_1726669521039.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/category_1_delivery_cancellation_distilbert_base_uncased_v1_pipeline_en_5.5.0_3.0_1726669521039.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("category_1_delivery_cancellation_distilbert_base_uncased_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("category_1_delivery_cancellation_distilbert_base_uncased_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|category_1_delivery_cancellation_distilbert_base_uncased_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chuuhtetnaing/category-1-delivery-cancellation-distilbert-base-uncased-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-ceva_en.md b/docs/_posts/ahmedlone127/2024-09-18-ceva_en.md new file mode 100644 index 00000000000000..9b896d8ed546c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-ceva_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ceva RoBertaForSequenceClassification from dianamihalache27 +author: John Snow Labs +name: ceva +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ceva` is a English model originally trained by dianamihalache27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ceva_en_5.5.0_3.0_1726650180046.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ceva_en_5.5.0_3.0_1726650180046.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("ceva","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("ceva", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ceva| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/dianamihalache27/ceva \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-clasificadormotivomora10_en.md b/docs/_posts/ahmedlone127/2024-09-18-clasificadormotivomora10_en.md new file mode 100644 index 00000000000000..eb4334f4ce8aca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-clasificadormotivomora10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English clasificadormotivomora10 RoBertaForSequenceClassification from Arodrigo +author: John Snow Labs +name: clasificadormotivomora10 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clasificadormotivomora10` is a English model originally trained by Arodrigo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clasificadormotivomora10_en_5.5.0_3.0_1726627794340.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clasificadormotivomora10_en_5.5.0_3.0_1726627794340.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("clasificadormotivomora10","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("clasificadormotivomora10", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clasificadormotivomora10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|464.5 MB| + +## References + +https://huggingface.co/Arodrigo/ClasificadorMotivoMora10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-classification_4_kfold_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-classification_4_kfold_v1_pipeline_en.md new file mode 100644 index 00000000000000..6353b69c6af850 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-classification_4_kfold_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English classification_4_kfold_v1_pipeline pipeline DistilBertForSequenceClassification from Pranavsenthilvel +author: John Snow Labs +name: classification_4_kfold_v1_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classification_4_kfold_v1_pipeline` is a English model originally trained by Pranavsenthilvel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classification_4_kfold_v1_pipeline_en_5.5.0_3.0_1726630496458.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classification_4_kfold_v1_pipeline_en_5.5.0_3.0_1726630496458.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("classification_4_kfold_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("classification_4_kfold_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classification_4_kfold_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Pranavsenthilvel/classification-4-kfold-V1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-climate_obstructive_narratives_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-climate_obstructive_narratives_pipeline_en.md new file mode 100644 index 00000000000000..b3ac50ee86eee6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-climate_obstructive_narratives_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English climate_obstructive_narratives_pipeline pipeline RoBertaForSequenceClassification from climate-nlp +author: John Snow Labs +name: climate_obstructive_narratives_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`climate_obstructive_narratives_pipeline` is a English model originally trained by climate-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/climate_obstructive_narratives_pipeline_en_5.5.0_3.0_1726689935785.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/climate_obstructive_narratives_pipeline_en_5.5.0_3.0_1726689935785.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("climate_obstructive_narratives_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("climate_obstructive_narratives_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|climate_obstructive_narratives_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/climate-nlp/climate-obstructive-narratives + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-clinicalbertqa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-clinicalbertqa_pipeline_en.md new file mode 100644 index 00000000000000..7545282372533f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-clinicalbertqa_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English clinicalbertqa_pipeline pipeline BertForQuestionAnswering from lanzv +author: John Snow Labs +name: clinicalbertqa_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clinicalbertqa_pipeline` is a English model originally trained by lanzv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinicalbertqa_pipeline_en_5.5.0_3.0_1726667918468.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinicalbertqa_pipeline_en_5.5.0_3.0_1726667918468.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained QA pipeline. The input DataFrame is assumed to carry the
# "question" and "context" columns consumed by the pipeline's first stage.
pipeline = PretrainedPipeline("clinicalbertqa_pipeline", lang = "en")
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained QA pipeline. The input DataFrame is assumed to carry the
// "question" and "context" columns consumed by the pipeline's first stage.
val pipeline = new PretrainedPipeline("clinicalbertqa_pipeline", lang = "en")
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val annotations = pipeline.transform(df)
```
</div>
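
For question answering it is usually the extracted answer span that matters. A minimal sketch, assuming the QA stage of this pipeline writes its annotations to an `answer` column:

```python
# Sketch only: show each question next to the predicted answer span.
# The "answer" output column name is an assumption, not verified against this pipeline.
annotations.selectExpr(
    "question",
    "answer.result as predicted_answer"
).show(truncate=False)
```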
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinicalbertqa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.3 MB| + +## References + +https://huggingface.co/lanzv/ClinicalBERTQA + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-coha1980s_en.md b/docs/_posts/ahmedlone127/2024-09-18-coha1980s_en.md new file mode 100644 index 00000000000000..f01fd757f14ae9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-coha1980s_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English coha1980s RoBertaEmbeddings from simonmun +author: John Snow Labs +name: coha1980s +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`coha1980s` is a English model originally trained by simonmun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/coha1980s_en_5.5.0_3.0_1726678702941.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/coha1980s_en_5.5.0_3.0_1726678702941.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then embed each token.
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("coha1980s","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("coha1980s","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
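
To use the vectors downstream, the token-level embeddings can be flattened out of the annotation structs. A minimal sketch based on the `pipelineDF` built above:

```python
# Sketch only: one row per token embedding vector, taken from the "embeddings"
# column configured on the RoBertaEmbeddings stage above.
pipelineDF.selectExpr(
    "explode(embeddings.embeddings) as token_vector"
).show(5, truncate=80)
```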
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|coha1980s| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.4 MB| + +## References + +https://huggingface.co/simonmun/COHA1980s \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_finetuned_convincingness_ibm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_finetuned_convincingness_ibm_pipeline_en.md new file mode 100644 index 00000000000000..84e5c584be848f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_finetuned_convincingness_ibm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cold_fusion_finetuned_convincingness_ibm_pipeline pipeline RoBertaForSequenceClassification from jakub014 +author: John Snow Labs +name: cold_fusion_finetuned_convincingness_ibm_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_finetuned_convincingness_ibm_pipeline` is a English model originally trained by jakub014. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_finetuned_convincingness_ibm_pipeline_en_5.5.0_3.0_1726641979111.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_finetuned_convincingness_ibm_pipeline_en_5.5.0_3.0_1726641979111.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("cold_fusion_finetuned_convincingness_ibm_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column.
val pipeline = new PretrainedPipeline("cold_fusion_finetuned_convincingness_ibm_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_finetuned_convincingness_ibm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/jakub014/ColD-Fusion-finetuned-convincingness-IBM + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr11_seed0_en.md b/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr11_seed0_en.md new file mode 100644 index 00000000000000..8584d3fbbf71ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr11_seed0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cold_fusion_itr11_seed0 RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr11_seed0 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr11_seed0` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr11_seed0_en_5.5.0_3.0_1726650362404.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr11_seed0_en_5.5.0_3.0_1726650362404.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained model.
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr11_seed0","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr11_seed0", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
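
For ad-hoc scoring of a few strings, a `LightPipeline` avoids the overhead of a full distributed transform. A minimal sketch reusing the fitted `pipelineModel` from above:

```python
from sparknlp.base import LightPipeline

# Sketch only: wrap the fitted PipelineModel for fast, in-memory inference.
light_pipeline = LightPipeline(pipelineModel)
print(light_pipeline.annotate("I love spark-nlp")["class"])
```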
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr11_seed0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|467.9 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr11-seed0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr13_seed4_en.md b/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr13_seed4_en.md new file mode 100644 index 00000000000000..6c6c171aa0baf0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr13_seed4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cold_fusion_itr13_seed4 RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr13_seed4 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr13_seed4` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr13_seed4_en_5.5.0_3.0_1726649517117.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr13_seed4_en_5.5.0_3.0_1726649517117.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained model.
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr13_seed4","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr13_seed4", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr13_seed4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|467.9 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr13-seed4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr15_seed3_en.md b/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr15_seed3_en.md new file mode 100644 index 00000000000000..20099b1a5bd1ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr15_seed3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cold_fusion_itr15_seed3 RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr15_seed3 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr15_seed3` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr15_seed3_en_5.5.0_3.0_1726649677590.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr15_seed3_en_5.5.0_3.0_1726649677590.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained model.
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr15_seed3","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr15_seed3", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr15_seed3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr15-seed3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr15_seed3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr15_seed3_pipeline_en.md new file mode 100644 index 00000000000000..d07522bfe6f3e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr15_seed3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cold_fusion_itr15_seed3_pipeline pipeline RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr15_seed3_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr15_seed3_pipeline` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr15_seed3_pipeline_en_5.5.0_3.0_1726649700149.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr15_seed3_pipeline_en_5.5.0_3.0_1726649700149.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("cold_fusion_itr15_seed3_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column.
val pipeline = new PretrainedPipeline("cold_fusion_itr15_seed3_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr15_seed3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr15-seed3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr24_seed1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr24_seed1_pipeline_en.md new file mode 100644 index 00000000000000..4b5c7fae850890 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr24_seed1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cold_fusion_itr24_seed1_pipeline pipeline RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr24_seed1_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr24_seed1_pipeline` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr24_seed1_pipeline_en_5.5.0_3.0_1726628210809.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr24_seed1_pipeline_en_5.5.0_3.0_1726628210809.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("cold_fusion_itr24_seed1_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column.
val pipeline = new PretrainedPipeline("cold_fusion_itr24_seed1_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr24_seed1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr24-seed1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-db_aca_1_1_en.md b/docs/_posts/ahmedlone127/2024-09-18-db_aca_1_1_en.md new file mode 100644 index 00000000000000..1b6a3d3d4450de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-db_aca_1_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English db_aca_1_1 DistilBertForSequenceClassification from exala +author: John Snow Labs +name: db_aca_1_1 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`db_aca_1_1` is a English model originally trained by exala. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/db_aca_1_1_en_5.5.0_3.0_1726682050217.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/db_aca_1_1_en_5.5.0_3.0_1726682050217.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained model.
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("db_aca_1_1","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("db_aca_1_1", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
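
Once fitted, the whole pipeline can be persisted and reloaded later, so the pretrained weights do not have to be downloaded again. A minimal sketch; the path is only illustrative:

```python
from pyspark.ml import PipelineModel

# Sketch only: save the fitted pipeline and load it back for reuse.
pipelineModel.write().overwrite().save("/tmp/db_aca_1_1_pipeline_model")
restored = PipelineModel.load("/tmp/db_aca_1_1_pipeline_model")
restored.transform(data).selectExpr("class.result").show(truncate=False)
```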
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|db_aca_1_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/exala/db_aca_1.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-db_aca_1_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-db_aca_1_1_pipeline_en.md new file mode 100644 index 00000000000000..9ed5dec453db20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-db_aca_1_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English db_aca_1_1_pipeline pipeline DistilBertForSequenceClassification from exala +author: John Snow Labs +name: db_aca_1_1_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`db_aca_1_1_pipeline` is a English model originally trained by exala. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/db_aca_1_1_pipeline_en_5.5.0_3.0_1726682064931.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/db_aca_1_1_pipeline_en_5.5.0_3.0_1726682064931.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("db_aca_1_1_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column.
val pipeline = new PretrainedPipeline("db_aca_1_1_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|db_aca_1_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/exala/db_aca_1.1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-db_mc_6a_89_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-db_mc_6a_89_pipeline_en.md new file mode 100644 index 00000000000000..27ab41a056a981 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-db_mc_6a_89_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English db_mc_6a_89_pipeline pipeline DistilBertForSequenceClassification from exala +author: John Snow Labs +name: db_mc_6a_89_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`db_mc_6a_89_pipeline` is a English model originally trained by exala. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/db_mc_6a_89_pipeline_en_5.5.0_3.0_1726676761452.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/db_mc_6a_89_pipeline_en_5.5.0_3.0_1726676761452.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("db_mc_6a_89_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column.
val pipeline = new PretrainedPipeline("db_mc_6a_89_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|db_mc_6a_89_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/exala/db_mc_6a-89 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-deep_pavlov__qa_model_en.md b/docs/_posts/ahmedlone127/2024-09-18-deep_pavlov__qa_model_en.md new file mode 100644 index 00000000000000..89586283feae41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-deep_pavlov__qa_model_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English deep_pavlov__qa_model BertForQuestionAnswering from greatakela +author: John Snow Labs +name: deep_pavlov__qa_model +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deep_pavlov__qa_model` is a English model originally trained by greatakela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deep_pavlov__qa_model_en_5.5.0_3.0_1726668301644.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deep_pavlov__qa_model_en_5.5.0_3.0_1726668301644.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Pair each question with its context, then extract the answer span.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("deep_pavlov__qa_model","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols("question", "context")
    .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("deep_pavlov__qa_model", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
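
Each answer annotation also carries the character span and metadata of the prediction. A minimal sketch over the `pipelineDF` built above:

```python
# Sketch only: unpack the answer annotations into answer text, span and metadata.
pipelineDF.selectExpr("explode(answer) as ans").selectExpr(
    "ans.result as predicted_answer",
    "ans.begin as start_char",
    "ans.`end` as end_char",
    "ans.metadata as metadata"
).show(truncate=False)
```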
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deep_pavlov__qa_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|664.3 MB| + +## References + +https://huggingface.co/greatakela/deep_pavlov__qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-did_the_doctor_give_you_his_name_bert_first128_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-did_the_doctor_give_you_his_name_bert_first128_pipeline_en.md new file mode 100644 index 00000000000000..12a4334bffa149 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-did_the_doctor_give_you_his_name_bert_first128_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English did_the_doctor_give_you_his_name_bert_first128_pipeline pipeline BertForSequenceClassification from etadevosyan +author: John Snow Labs +name: did_the_doctor_give_you_his_name_bert_first128_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`did_the_doctor_give_you_his_name_bert_first128_pipeline` is a English model originally trained by etadevosyan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/did_the_doctor_give_you_his_name_bert_first128_pipeline_en_5.5.0_3.0_1726624470534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/did_the_doctor_give_you_his_name_bert_first128_pipeline_en_5.5.0_3.0_1726624470534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("did_the_doctor_give_you_his_name_bert_first128_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column.
val pipeline = new PretrainedPipeline("did_the_doctor_give_you_his_name_bert_first128_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|did_the_doctor_give_you_his_name_bert_first128_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|666.5 MB| + +## References + +https://huggingface.co/etadevosyan/did_the_doctor_give_you_his_name_bert_First128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-disaster_tweets_electra_small_en.md b/docs/_posts/ahmedlone127/2024-09-18-disaster_tweets_electra_small_en.md new file mode 100644 index 00000000000000..25c6cab0205384 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-disaster_tweets_electra_small_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English disaster_tweets_electra_small RoBertaForSequenceClassification from Arsive +author: John Snow Labs +name: disaster_tweets_electra_small +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`disaster_tweets_electra_small` is a English model originally trained by Arsive. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/disaster_tweets_electra_small_en_5.5.0_3.0_1726622033084.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/disaster_tweets_electra_small_en_5.5.0_3.0_1726622033084.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained model.
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("disaster_tweets_electra_small","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("disaster_tweets_electra_small", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|disaster_tweets_electra_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/Arsive/disaster_tweets_electra_small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_multilingual_cased_language_detection_silvanus_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_multilingual_cased_language_detection_silvanus_pipeline_xx.md new file mode 100644 index 00000000000000..508116950e0bd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_multilingual_cased_language_detection_silvanus_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual distilbert_base_multilingual_cased_language_detection_silvanus_pipeline pipeline DistilBertForSequenceClassification from rollerhafeezh-amikom +author: John Snow Labs +name: distilbert_base_multilingual_cased_language_detection_silvanus_pipeline +date: 2024-09-18 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_multilingual_cased_language_detection_silvanus_pipeline` is a Multilingual model originally trained by rollerhafeezh-amikom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_language_detection_silvanus_pipeline_xx_5.5.0_3.0_1726630627203.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_language_detection_silvanus_pipeline_xx_5.5.0_3.0_1726630627203.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("distilbert_base_multilingual_cased_language_detection_silvanus_pipeline", lang = "xx")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column.
val pipeline = new PretrainedPipeline("distilbert_base_multilingual_cased_language_detection_silvanus_pipeline", lang = "xx")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
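
Since this pipeline is multilingual, a quick sanity check is to run it over short texts in a few languages. A minimal sketch, assuming the classifier stage writes to a `class` column; the actual label set depends on how the underlying model was fine-tuned:

```python
# Sketch only: score a few short texts in different languages.
samples = spark.createDataFrame(
    [["I love Spark NLP"], ["Me encanta Spark NLP"], ["Saya suka Spark NLP"]]
).toDF("text")

pipeline.transform(samples) \
    .selectExpr("text", "class.result as predicted_label") \
    .show(truncate=False)
```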
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_multilingual_cased_language_detection_silvanus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|507.6 MB| + +## References + +https://huggingface.co/rollerhafeezh-amikom/distilbert-base-multilingual-cased-language-detection-silvanus + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_multilingual_cased_language_detection_silvanus_xx.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_multilingual_cased_language_detection_silvanus_xx.md new file mode 100644 index 00000000000000..7cfffbe7390dd5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_multilingual_cased_language_detection_silvanus_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual distilbert_base_multilingual_cased_language_detection_silvanus DistilBertForSequenceClassification from rollerhafeezh-amikom +author: John Snow Labs +name: distilbert_base_multilingual_cased_language_detection_silvanus +date: 2024-09-18 +tags: [xx, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_multilingual_cased_language_detection_silvanus` is a Multilingual model originally trained by rollerhafeezh-amikom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_language_detection_silvanus_xx_5.5.0_3.0_1726630602005.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_language_detection_silvanus_xx_5.5.0_3.0_1726630602005.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained model.
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_multilingual_cased_language_detection_silvanus","xx") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_multilingual_cased_language_detection_silvanus", "xx")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_multilingual_cased_language_detection_silvanus| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|507.6 MB| + +## References + +https://huggingface.co/rollerhafeezh-amikom/distilbert-base-multilingual-cased-language-detection-silvanus \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_multilingual_cased_razones_especificas_esp_xx.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_multilingual_cased_razones_especificas_esp_xx.md new file mode 100644 index 00000000000000..93cd6f998129a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_multilingual_cased_razones_especificas_esp_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual distilbert_base_multilingual_cased_razones_especificas_esp DistilBertForSequenceClassification from rogelioplatt +author: John Snow Labs +name: distilbert_base_multilingual_cased_razones_especificas_esp +date: 2024-09-18 +tags: [xx, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_multilingual_cased_razones_especificas_esp` is a Multilingual model originally trained by rogelioplatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_razones_especificas_esp_xx_5.5.0_3.0_1726696041265.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_razones_especificas_esp_xx_5.5.0_3.0_1726696041265.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained model.
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_multilingual_cased_razones_especificas_esp","xx") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_multilingual_cased_razones_especificas_esp", "xx")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_multilingual_cased_razones_especificas_esp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|507.6 MB| + +## References + +https://huggingface.co/rogelioplatt/distilbert-base-multilingual-cased-Razones_Especificas_Esp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_turkish_cased_stance_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_turkish_cased_stance_pipeline_tr.md new file mode 100644 index 00000000000000..491ff304e09617 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_turkish_cased_stance_pipeline_tr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Turkish distilbert_base_turkish_cased_stance_pipeline pipeline DistilBertForSequenceClassification from byunal +author: John Snow Labs +name: distilbert_base_turkish_cased_stance_pipeline +date: 2024-09-18 +tags: [tr, open_source, pipeline, onnx] +task: Text Classification +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_turkish_cased_stance_pipeline` is a Turkish model originally trained by byunal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_turkish_cased_stance_pipeline_tr_5.5.0_3.0_1726669685693.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_turkish_cased_stance_pipeline_tr_5.5.0_3.0_1726669685693.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("distilbert_base_turkish_cased_stance_pipeline", lang = "tr")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column.
val pipeline = new PretrainedPipeline("distilbert_base_turkish_cased_stance_pipeline", lang = "tr")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_turkish_cased_stance_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|254.1 MB| + +## References + +https://huggingface.co/byunal/distilbert-base-turkish-cased-stance + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_credit_cards_zphr_0st72_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_credit_cards_zphr_0st72_pipeline_en.md new file mode 100644 index 00000000000000..dedad29c71a9e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_credit_cards_zphr_0st72_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_credit_cards_zphr_0st72_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_credit_cards_zphr_0st72_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_credit_cards_zphr_0st72_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_credit_cards_zphr_0st72_pipeline_en_5.5.0_3.0_1726680431065.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_credit_cards_zphr_0st72_pipeline_en_5.5.0_3.0_1726680431065.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("distilbert_base_uncased_credit_cards_zphr_0st72_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column.
val pipeline = new PretrainedPipeline("distilbert_base_uncased_credit_cards_zphr_0st72_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_credit_cards_zphr_0st72_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_credit_cards_zphr_0st72 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distil_fine_on_bioasq_50_50_shuffled_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distil_fine_on_bioasq_50_50_shuffled_pipeline_en.md new file mode 100644 index 00000000000000..f16a1da47a2a2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distil_fine_on_bioasq_50_50_shuffled_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_distil_fine_on_bioasq_50_50_shuffled_pipeline pipeline DistilBertForQuestionAnswering from jkhsong +author: John Snow Labs +name: distilbert_base_uncased_distil_fine_on_bioasq_50_50_shuffled_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distil_fine_on_bioasq_50_50_shuffled_pipeline` is a English model originally trained by jkhsong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distil_fine_on_bioasq_50_50_shuffled_pipeline_en_5.5.0_3.0_1726644163384.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distil_fine_on_bioasq_50_50_shuffled_pipeline_en_5.5.0_3.0_1726644163384.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained QA pipeline. The input DataFrame is assumed to carry the
# "question" and "context" columns consumed by the pipeline's first stage.
pipeline = PretrainedPipeline("distilbert_base_uncased_distil_fine_on_bioasq_50_50_shuffled_pipeline", lang = "en")
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained QA pipeline. The input DataFrame is assumed to carry the
// "question" and "context" columns consumed by the pipeline's first stage.
val pipeline = new PretrainedPipeline("distilbert_base_uncased_distil_fine_on_bioasq_50_50_shuffled_pipeline", lang = "en")
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distil_fine_on_bioasq_50_50_shuffled_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/jkhsong/distilbert-base-uncased-distil-fine-on-bioasq-50-50-shuffled + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distilled_clinc_aaa01101312_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distilled_clinc_aaa01101312_pipeline_en.md new file mode 100644 index 00000000000000..d662bcd4666c7a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distilled_clinc_aaa01101312_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_aaa01101312_pipeline pipeline DistilBertForSequenceClassification from AAA01101312 +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_aaa01101312_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_aaa01101312_pipeline` is a English model originally trained by AAA01101312. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_aaa01101312_pipeline_en_5.5.0_3.0_1726669535648.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_aaa01101312_pipeline_en_5.5.0_3.0_1726669535648.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("distilbert_base_uncased_distilled_clinc_aaa01101312_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column.
val pipeline = new PretrainedPipeline("distilbert_base_uncased_distilled_clinc_aaa01101312_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_aaa01101312_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/AAA01101312/distilbert-base-uncased-distilled-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distilled_clinc_cezeozue_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distilled_clinc_cezeozue_en.md new file mode 100644 index 00000000000000..b37ca3ee94b1c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distilled_clinc_cezeozue_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_cezeozue DistilBertForSequenceClassification from cezeozue +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_cezeozue +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_cezeozue` is a English model originally trained by cezeozue. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_cezeozue_en_5.5.0_3.0_1726696195007.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_cezeozue_en_5.5.0_3.0_1726696195007.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_cezeozue","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_cezeozue", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
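+
+The `class` column produced above holds Spark NLP annotations; one way to inspect the predicted labels (an illustrative step, not part of the original example) is:
+
+```python
+pipelineDF.select("text", "class.result").show(truncate=False)
+```
+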
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_cezeozue| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/cezeozue/distilbert-base-uncased-distilled-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distilled_clinc_mealduct_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distilled_clinc_mealduct_en.md new file mode 100644 index 00000000000000..2ff57e68868512 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distilled_clinc_mealduct_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_mealduct DistilBertForSequenceClassification from MealDuct +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_mealduct +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_mealduct` is a English model originally trained by MealDuct. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_mealduct_en_5.5.0_3.0_1726625359284.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_mealduct_en_5.5.0_3.0_1726625359284.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_mealduct","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_mealduct", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
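+
+The `class` column produced above holds Spark NLP annotations; one way to inspect the predicted labels (an illustrative step, not part of the original example) is:
+
+```python
+pipelineDF.select("text", "class.result").show(truncate=False)
+```
+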
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_mealduct| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/MealDuct/distilbert-base-uncased-distilled-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distilled_clinc_seddiktrk_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distilled_clinc_seddiktrk_en.md new file mode 100644 index 00000000000000..96c344ad337d74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distilled_clinc_seddiktrk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_seddiktrk DistilBertForSequenceClassification from seddiktrk +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_seddiktrk +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_seddiktrk` is a English model originally trained by seddiktrk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_seddiktrk_en_5.5.0_3.0_1726681953985.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_seddiktrk_en_5.5.0_3.0_1726681953985.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_seddiktrk","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_seddiktrk", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
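+
+The `class` column produced above holds Spark NLP annotations; one way to inspect the predicted labels (an illustrative step, not part of the original example) is:
+
+```python
+pipelineDF.select("text", "class.result").show(truncate=False)
+```
+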
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_seddiktrk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/seddiktrk/distilbert-base-uncased-distilled-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_enneagram_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_enneagram_en.md new file mode 100644 index 00000000000000..200371e2d53711 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_enneagram_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_enneagram DistilBertForSequenceClassification from LandersonMiguel +author: John Snow Labs +name: distilbert_base_uncased_enneagram +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_enneagram` is a English model originally trained by LandersonMiguel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_enneagram_en_5.5.0_3.0_1726669975661.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_enneagram_en_5.5.0_3.0_1726669975661.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_enneagram","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_enneagram", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
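+
+The `class` column produced above holds Spark NLP annotations; one way to inspect the predicted labels (an illustrative step, not part of the original example) is:
+
+```python
+pipelineDF.select("text", "class.result").show(truncate=False)
+```
+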
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_enneagram| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LandersonMiguel/distilbert-base-uncased-enneagram \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_enneagram_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_enneagram_pipeline_en.md new file mode 100644 index 00000000000000..717347c3f530e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_enneagram_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_enneagram_pipeline pipeline DistilBertForSequenceClassification from LandersonMiguel +author: John Snow Labs +name: distilbert_base_uncased_enneagram_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_enneagram_pipeline` is a English model originally trained by LandersonMiguel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_enneagram_pipeline_en_5.5.0_3.0_1726669989474.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_enneagram_pipeline_en_5.5.0_3.0_1726669989474.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_enneagram_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_enneagram_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
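+
+The snippet above assumes an active Spark NLP session and an input DataFrame `df` with a `text` column; a minimal sketch of that setup (the example sentence is only illustrative) is:
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+# start a Spark session with Spark NLP loaded
+spark = sparknlp.start()
+
+# any DataFrame with a "text" column can be passed to pipeline.transform
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+```
+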
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_enneagram_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LandersonMiguel/distilbert-base-uncased-enneagram + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_fine_tunning_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_fine_tunning_pipeline_en.md new file mode 100644 index 00000000000000..576ec8df13629c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_fine_tunning_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_fine_tunning_pipeline pipeline DistilBertForSequenceClassification from adolford +author: John Snow Labs +name: distilbert_base_uncased_fine_tunning_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_fine_tunning_pipeline` is a English model originally trained by adolford. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_fine_tunning_pipeline_en_5.5.0_3.0_1726695061240.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_fine_tunning_pipeline_en_5.5.0_3.0_1726695061240.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_fine_tunning_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_fine_tunning_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
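+
+The snippet above assumes an active Spark NLP session and an input DataFrame `df` with a `text` column; a minimal sketch of that setup (the example sentence is only illustrative) is:
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+# start a Spark session with Spark NLP loaded
+spark = sparknlp.start()
+
+# any DataFrame with a "text" column can be passed to pipeline.transform
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+```
+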
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_fine_tunning_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/adolford/distilbert-base-uncased_fine_tunning + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_balus_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_balus_en.md new file mode 100644 index 00000000000000..f23f5749f2e355 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_balus_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_balus DistilBertForSequenceClassification from balus +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_balus +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_balus` is a English model originally trained by balus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_balus_en_5.5.0_3.0_1726696538129.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_balus_en_5.5.0_3.0_1726696538129.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_balus","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_balus", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
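+
+The `class` column produced above holds Spark NLP annotations; one way to inspect the predicted labels (an illustrative step, not part of the original example) is:
+
+```python
+pipelineDF.select("text", "class.result").show(truncate=False)
+```
+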
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_balus| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/balus/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_balus_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_balus_pipeline_en.md new file mode 100644 index 00000000000000..f75e1b0997a5db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_balus_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_balus_pipeline pipeline DistilBertForSequenceClassification from balus +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_balus_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_balus_pipeline` is a English model originally trained by balus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_balus_pipeline_en_5.5.0_3.0_1726696550995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_balus_pipeline_en_5.5.0_3.0_1726696550995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_balus_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_balus_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
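+
+The snippet above assumes an active Spark NLP session and an input DataFrame `df` with a `text` column; a minimal sketch of that setup (the example sentence is only illustrative) is:
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+# start a Spark session with Spark NLP loaded
+spark = sparknlp.start()
+
+# any DataFrame with a "text" column can be passed to pipeline.transform
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+```
+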
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_balus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/balus/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_cheng_cherry_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_cheng_cherry_pipeline_en.md new file mode 100644 index 00000000000000..dc51c4c43aa8fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_cheng_cherry_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_cheng_cherry_pipeline pipeline DistilBertForSequenceClassification from cheng-cherry +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_cheng_cherry_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_cheng_cherry_pipeline` is a English model originally trained by cheng-cherry. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_cheng_cherry_pipeline_en_5.5.0_3.0_1726681591363.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_cheng_cherry_pipeline_en_5.5.0_3.0_1726681591363.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_cheng_cherry_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_cheng_cherry_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
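+
+The snippet above assumes an active Spark NLP session and an input DataFrame `df` with a `text` column; a minimal sketch of that setup (the example sentence is only illustrative) is:
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+# start a Spark session with Spark NLP loaded
+spark = sparknlp.start()
+
+# any DataFrame with a "text" column can be passed to pipeline.transform
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+```
+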
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_cheng_cherry_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/cheng-cherry/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_dro14_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_dro14_en.md new file mode 100644 index 00000000000000..05bd7306fb6e51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_dro14_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_dro14 DistilBertForSequenceClassification from dro14 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_dro14 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_dro14` is a English model originally trained by dro14. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_dro14_en_5.5.0_3.0_1726630168639.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_dro14_en_5.5.0_3.0_1726630168639.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_dro14","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_dro14", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
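+
+The `class` column produced above holds Spark NLP annotations; one way to inspect the predicted labels (an illustrative step, not part of the original example) is:
+
+```python
+pipelineDF.select("text", "class.result").show(truncate=False)
+```
+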
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_dro14| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/dro14/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_dro14_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_dro14_pipeline_en.md new file mode 100644 index 00000000000000..11b907d6610ba4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_dro14_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_dro14_pipeline pipeline DistilBertForSequenceClassification from dro14 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_dro14_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_dro14_pipeline` is a English model originally trained by dro14. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_dro14_pipeline_en_5.5.0_3.0_1726630182030.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_dro14_pipeline_en_5.5.0_3.0_1726630182030.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_dro14_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_dro14_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
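+
+The snippet above assumes an active Spark NLP session and an input DataFrame `df` with a `text` column; a minimal sketch of that setup (the example sentence is only illustrative) is:
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+# start a Spark session with Spark NLP loaded
+spark = sparknlp.start()
+
+# any DataFrame with a "text" column can be passed to pipeline.transform
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+```
+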
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_dro14_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/dro14/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_ehottl_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_ehottl_pipeline_en.md new file mode 100644 index 00000000000000..09c3a274c9766b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_ehottl_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_ehottl_pipeline pipeline DistilBertForSequenceClassification from ehottl +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_ehottl_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_ehottl_pipeline` is a English model originally trained by ehottl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_ehottl_pipeline_en_5.5.0_3.0_1726681911817.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_ehottl_pipeline_en_5.5.0_3.0_1726681911817.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_ehottl_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_ehottl_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
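+
+The snippet above assumes an active Spark NLP session and an input DataFrame `df` with a `text` column; a minimal sketch of that setup (the example sentence is only illustrative) is:
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+# start a Spark session with Spark NLP loaded
+spark = sparknlp.start()
+
+# any DataFrame with a "text" column can be passed to pipeline.transform
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+```
+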
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_ehottl_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/ehottl/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_anuj55_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_anuj55_pipeline_en.md new file mode 100644 index 00000000000000..8e0a6022646a81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_anuj55_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_anuj55_pipeline pipeline DistilBertForSequenceClassification from anuj55 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_anuj55_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_anuj55_pipeline` is a English model originally trained by anuj55. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_anuj55_pipeline_en_5.5.0_3.0_1726694876353.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_anuj55_pipeline_en_5.5.0_3.0_1726694876353.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_anuj55_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_anuj55_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
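+
+The snippet above assumes an active Spark NLP session and an input DataFrame `df` with a `text` column; a minimal sketch of that setup (the example sentence is only illustrative) is:
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+# start a Spark session with Spark NLP loaded
+spark = sparknlp.start()
+
+# any DataFrame with a "text" column can be passed to pipeline.transform
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+```
+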
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_anuj55_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/anuj55/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_hashemghanem_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_hashemghanem_en.md new file mode 100644 index 00000000000000..efe097173e4f8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_hashemghanem_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_hashemghanem DistilBertForSequenceClassification from Hashemghanem +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_hashemghanem +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_hashemghanem` is a English model originally trained by Hashemghanem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_hashemghanem_en_5.5.0_3.0_1726677343310.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_hashemghanem_en_5.5.0_3.0_1726677343310.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_hashemghanem","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_hashemghanem", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
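+
+The `class` column produced above holds Spark NLP annotations; one way to inspect the predicted labels (an illustrative step, not part of the original example) is:
+
+```python
+pipelineDF.select("text", "class.result").show(truncate=False)
+```
+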
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_hashemghanem| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Hashemghanem/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_hashemghanem_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_hashemghanem_pipeline_en.md new file mode 100644 index 00000000000000..bfd9a82bfa2662 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_hashemghanem_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_hashemghanem_pipeline pipeline DistilBertForSequenceClassification from Hashemghanem +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_hashemghanem_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_hashemghanem_pipeline` is a English model originally trained by Hashemghanem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_hashemghanem_pipeline_en_5.5.0_3.0_1726677355734.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_hashemghanem_pipeline_en_5.5.0_3.0_1726677355734.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_hashemghanem_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_hashemghanem_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
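+
+The snippet above assumes an active Spark NLP session and an input DataFrame `df` with a `text` column; a minimal sketch of that setup (the example sentence is only illustrative) is:
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+# start a Spark session with Spark NLP loaded
+spark = sparknlp.start()
+
+# any DataFrame with a "text" column can be passed to pipeline.transform
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+```
+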
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_hashemghanem_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Hashemghanem/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_hfdsajkfd_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_hfdsajkfd_en.md new file mode 100644 index 00000000000000..1d4224c78aaccc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_hfdsajkfd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_hfdsajkfd DistilBertForSequenceClassification from hfdsajkfd +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_hfdsajkfd +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_hfdsajkfd` is a English model originally trained by hfdsajkfd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_hfdsajkfd_en_5.5.0_3.0_1726630281320.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_hfdsajkfd_en_5.5.0_3.0_1726630281320.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_hfdsajkfd","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_hfdsajkfd", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
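+
+The `class` column produced above holds Spark NLP annotations; one way to inspect the predicted labels (an illustrative step, not part of the original example) is:
+
+```python
+pipelineDF.select("text", "class.result").show(truncate=False)
+```
+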
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_hfdsajkfd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hfdsajkfd/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_hfdsajkfd_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_hfdsajkfd_pipeline_en.md new file mode 100644 index 00000000000000..82057a6d579b0e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_hfdsajkfd_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_hfdsajkfd_pipeline pipeline DistilBertForSequenceClassification from hfdsajkfd +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_hfdsajkfd_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_hfdsajkfd_pipeline` is a English model originally trained by hfdsajkfd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_hfdsajkfd_pipeline_en_5.5.0_3.0_1726630295100.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_hfdsajkfd_pipeline_en_5.5.0_3.0_1726630295100.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_hfdsajkfd_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_hfdsajkfd_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
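+
+The snippet above assumes an active Spark NLP session and an input DataFrame `df` with a `text` column; a minimal sketch of that setup (the example sentence is only illustrative) is:
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+# start a Spark session with Spark NLP loaded
+spark = sparknlp.start()
+
+# any DataFrame with a "text" column can be passed to pipeline.transform
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+```
+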
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_hfdsajkfd_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hfdsajkfd/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_obudzecie_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_obudzecie_pipeline_en.md new file mode 100644 index 00000000000000..9ad183c42f0cd3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_obudzecie_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_obudzecie_pipeline pipeline DistilBertForSequenceClassification from obudzecie +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_obudzecie_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_obudzecie_pipeline` is a English model originally trained by obudzecie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_obudzecie_pipeline_en_5.5.0_3.0_1726681385569.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_obudzecie_pipeline_en_5.5.0_3.0_1726681385569.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_obudzecie_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_obudzecie_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
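+
+The snippet above assumes an active Spark NLP session and an input DataFrame `df` with a `text` column; a minimal sketch of that setup (the example sentence is only illustrative) is:
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+# start a Spark session with Spark NLP loaded
+spark = sparknlp.start()
+
+# any DataFrame with a "text" column can be passed to pipeline.transform
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+```
+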
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_obudzecie_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/obudzecie/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_santosale_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_santosale_en.md new file mode 100644 index 00000000000000..5cede84c72ad48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_santosale_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_santosale DistilBertForSequenceClassification from santosale +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_santosale +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_santosale` is a English model originally trained by santosale. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_santosale_en_5.5.0_3.0_1726676749499.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_santosale_en_5.5.0_3.0_1726676749499.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_santosale","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_santosale", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
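+
+The `class` column produced above holds Spark NLP annotations; one way to inspect the predicted labels (an illustrative step, not part of the original example) is:
+
+```python
+pipelineDF.select("text", "class.result").show(truncate=False)
+```
+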
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_santosale| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/santosale/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_santosale_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_santosale_pipeline_en.md new file mode 100644 index 00000000000000..5d6f0b8afb6d96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_santosale_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_santosale_pipeline pipeline DistilBertForSequenceClassification from santosale +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_santosale_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_santosale_pipeline` is a English model originally trained by santosale. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_santosale_pipeline_en_5.5.0_3.0_1726676766725.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_santosale_pipeline_en_5.5.0_3.0_1726676766725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_santosale_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_santosale_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
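+
+The snippet above assumes an active Spark NLP session and an input DataFrame `df` with a `text` column; a minimal sketch of that setup (the example sentence is only illustrative) is:
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+# start a Spark session with Spark NLP loaded
+spark = sparknlp.start()
+
+# any DataFrame with a "text" column can be passed to pipeline.transform
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+```
+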
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_santosale_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/santosale/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_whoopwhoop_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_whoopwhoop_en.md new file mode 100644 index 00000000000000..9787e58bf8487d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_whoopwhoop_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_whoopwhoop DistilBertForSequenceClassification from whoopwhoop +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_whoopwhoop +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_whoopwhoop` is a English model originally trained by whoopwhoop. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_whoopwhoop_en_5.5.0_3.0_1726695439703.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_whoopwhoop_en_5.5.0_3.0_1726695439703.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_whoopwhoop", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_whoopwhoop", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_whoopwhoop| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/whoopwhoop/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_zhihengjasontou_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_zhihengjasontou_en.md new file mode 100644 index 00000000000000..5ec2a279b08e0a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_zhihengjasontou_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_zhihengjasontou DistilBertForSequenceClassification from zhihengjasontou +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_zhihengjasontou +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_zhihengjasontou` is a English model originally trained by zhihengjasontou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_zhihengjasontou_en_5.5.0_3.0_1726676749399.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_zhihengjasontou_en_5.5.0_3.0_1726676749399.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_zhihengjasontou", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_zhihengjasontou", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_zhihengjasontou| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/zhihengjasontou/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_agonrod_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_agonrod_en.md new file mode 100644 index 00000000000000..ff21d69396efa6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_agonrod_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_agonrod DistilBertForSequenceClassification from agonrod +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_agonrod +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_agonrod` is a English model originally trained by agonrod. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_agonrod_en_5.5.0_3.0_1726695029404.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_agonrod_en_5.5.0_3.0_1726695029404.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_agonrod", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_agonrod", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
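+
+For low-latency inference on individual texts outside of a batch Spark job, the fitted pipeline can be wrapped in a `LightPipeline`. A minimal sketch, assuming the `pipelineModel` fitted in the Python example above:
+
+```python
+from sparknlp.base import LightPipeline
+
+# LightPipeline runs the same stages on plain strings, without building a DataFrame
+light = LightPipeline(pipelineModel)
+print(light.annotate("I love spark-nlp")["class"])
+```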
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_agonrod| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/agonrod/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_amirabedini_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_amirabedini_pipeline_en.md new file mode 100644 index 00000000000000..fbed2499258398 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_amirabedini_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_amirabedini_pipeline pipeline DistilBertForSequenceClassification from AmirAbedini +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_amirabedini_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_amirabedini_pipeline` is a English model originally trained by AmirAbedini. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_amirabedini_pipeline_en_5.5.0_3.0_1726696024732.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_amirabedini_pipeline_en_5.5.0_3.0_1726696024732.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# sample input; any DataFrame with a "text" column works
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_amirabedini_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// sample input; any DataFrame with a "text" column works
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_amirabedini_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_amirabedini_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AmirAbedini/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_en.md new file mode 100644 index 00000000000000..a651c25de007ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas DistilBertForSequenceClassification from arvindsinghmanhas +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas` is a English model originally trained by arvindsinghmanhas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_en_5.5.0_3.0_1726680398947.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_en_5.5.0_3.0_1726680398947.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/arvindsinghmanhas/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_pipeline_en.md new file mode 100644 index 00000000000000..bd863e97dcc143 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_pipeline pipeline DistilBertForSequenceClassification from arvindsinghmanhas +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_pipeline` is a English model originally trained by arvindsinghmanhas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_pipeline_en_5.5.0_3.0_1726680412193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_pipeline_en_5.5.0_3.0_1726680412193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# sample input; any DataFrame with a "text" column works
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// sample input; any DataFrame with a "text" column works
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/arvindsinghmanhas/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_bentanweihao_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_bentanweihao_en.md new file mode 100644 index 00000000000000..54f2eb63a37e5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_bentanweihao_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_bentanweihao DistilBertForSequenceClassification from bentanweihao +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_bentanweihao +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_bentanweihao` is a English model originally trained by bentanweihao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_bentanweihao_en_5.5.0_3.0_1726625641079.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_bentanweihao_en_5.5.0_3.0_1726625641079.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_bentanweihao", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_bentanweihao", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
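+
+The fitted `PipelineModel` can also be persisted and reloaded, so the pretrained weights do not have to be resolved again on every run. A minimal sketch with a hypothetical output path, assuming the `pipelineModel` and `data` from the Python example above:
+
+```python
+from pyspark.ml import PipelineModel
+
+# hypothetical path; point this at durable storage in practice
+model_path = "/tmp/distilbert_emotion_bentanweihao_pipeline"
+
+pipelineModel.write().overwrite().save(model_path)
+restored = PipelineModel.load(model_path)
+restored.transform(data).select("class.result").show(truncate=False)
+```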
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_bentanweihao| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bentanweihao/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_bentanweihao_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_bentanweihao_pipeline_en.md new file mode 100644 index 00000000000000..c813f2a89f9984 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_bentanweihao_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_bentanweihao_pipeline pipeline DistilBertForSequenceClassification from bentanweihao +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_bentanweihao_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_bentanweihao_pipeline` is a English model originally trained by bentanweihao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_bentanweihao_pipeline_en_5.5.0_3.0_1726625653971.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_bentanweihao_pipeline_en_5.5.0_3.0_1726625653971.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# sample input; any DataFrame with a "text" column works
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_bentanweihao_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// sample input; any DataFrame with a "text" column works
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_bentanweihao_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_bentanweihao_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bentanweihao/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_edosevering_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_edosevering_en.md new file mode 100644 index 00000000000000..1e4af32a5cf347 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_edosevering_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_edosevering DistilBertForSequenceClassification from edoSevering +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_edosevering +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_edosevering` is a English model originally trained by edoSevering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_edosevering_en_5.5.0_3.0_1726695058212.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_edosevering_en_5.5.0_3.0_1726695058212.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_edosevering", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_edosevering", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_edosevering| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/edoSevering/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_edosevering_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_edosevering_pipeline_en.md new file mode 100644 index 00000000000000..80839b11ff0928 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_edosevering_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_edosevering_pipeline pipeline DistilBertForSequenceClassification from edoSevering +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_edosevering_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_edosevering_pipeline` is a English model originally trained by edoSevering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_edosevering_pipeline_en_5.5.0_3.0_1726695070308.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_edosevering_pipeline_en_5.5.0_3.0_1726695070308.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# sample input; any DataFrame with a "text" column works
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_edosevering_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// sample input; any DataFrame with a "text" column works
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_edosevering_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_edosevering_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/edoSevering/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_facehugger69420_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_facehugger69420_pipeline_en.md new file mode 100644 index 00000000000000..e7f92e042340c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_facehugger69420_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_facehugger69420_pipeline pipeline DistilBertForSequenceClassification from FaceHugger69420 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_facehugger69420_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_facehugger69420_pipeline` is a English model originally trained by FaceHugger69420. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_facehugger69420_pipeline_en_5.5.0_3.0_1726695316515.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_facehugger69420_pipeline_en_5.5.0_3.0_1726695316515.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# sample input; any DataFrame with a "text" column works
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_facehugger69420_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// sample input; any DataFrame with a "text" column works
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_facehugger69420_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_facehugger69420_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/FaceHugger69420/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_helloyeew_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_helloyeew_en.md new file mode 100644 index 00000000000000..37f39938f0cf22 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_helloyeew_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_helloyeew DistilBertForSequenceClassification from helloyeew +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_helloyeew +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_helloyeew` is a English model originally trained by helloyeew. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_helloyeew_en_5.5.0_3.0_1726696432771.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_helloyeew_en_5.5.0_3.0_1726696432771.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_helloyeew", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_helloyeew", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
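+
+Beyond the predicted label, the annotation metadata usually carries the score assigned to each class, which can help when thresholding predictions. A minimal sketch, assuming the `pipelineDF` produced by the Python example above (the exact metadata keys depend on the model):
+
+```python
+from pyspark.sql import functions as F
+
+# explode the annotations and show each label alongside its metadata map
+pipelineDF.select(F.explode("class").alias("c")) \
+    .select("c.result", "c.metadata") \
+    .show(truncate=False)
+```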
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_helloyeew| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/helloyeew/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_en.md new file mode 100644 index 00000000000000..f323739e504976 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_hitoshinagaoka DistilBertForSequenceClassification from hitoshiNagaoka +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_hitoshinagaoka +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_hitoshinagaoka` is a English model originally trained by hitoshiNagaoka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_en_5.5.0_3.0_1726696021248.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_en_5.5.0_3.0_1726696021248.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_hitoshinagaoka", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_hitoshinagaoka", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_hitoshinagaoka| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hitoshiNagaoka/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_pipeline_en.md new file mode 100644 index 00000000000000..0fe12c0da2e1f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_pipeline pipeline DistilBertForSequenceClassification from hitoshiNagaoka +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_pipeline` is a English model originally trained by hitoshiNagaoka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_pipeline_en_5.5.0_3.0_1726696034810.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_pipeline_en_5.5.0_3.0_1726696034810.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# sample input; any DataFrame with a "text" column works
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// sample input; any DataFrame with a "text" column works
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hitoshiNagaoka/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_jennifer0804_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_jennifer0804_en.md new file mode 100644 index 00000000000000..573db6dcb6b647 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_jennifer0804_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jennifer0804 DistilBertForSequenceClassification from jennifer0804 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jennifer0804 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jennifer0804` is a English model originally trained by jennifer0804. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jennifer0804_en_5.5.0_3.0_1726681404724.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jennifer0804_en_5.5.0_3.0_1726681404724.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jennifer0804", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jennifer0804", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jennifer0804| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jennifer0804/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_kbrink_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_kbrink_pipeline_en.md new file mode 100644 index 00000000000000..862da8c8d87438 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_kbrink_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_kbrink_pipeline pipeline DistilBertForSequenceClassification from kbrink +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_kbrink_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_kbrink_pipeline` is a English model originally trained by kbrink. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_kbrink_pipeline_en_5.5.0_3.0_1726696627617.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_kbrink_pipeline_en_5.5.0_3.0_1726696627617.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# sample input; any DataFrame with a "text" column works
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_kbrink_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// sample input; any DataFrame with a "text" column works
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_kbrink_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_kbrink_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kbrink/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_kimsan1120_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_kimsan1120_en.md new file mode 100644 index 00000000000000..6d70ea3b0a2cdc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_kimsan1120_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_kimsan1120 DistilBertForSequenceClassification from kimsan1120 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_kimsan1120 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_kimsan1120` is a English model originally trained by kimsan1120. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_kimsan1120_en_5.5.0_3.0_1726695902901.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_kimsan1120_en_5.5.0_3.0_1726695902901.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_kimsan1120", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_kimsan1120", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_kimsan1120| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kimsan1120/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_kimsan1120_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_kimsan1120_pipeline_en.md new file mode 100644 index 00000000000000..fc06832df8b521 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_kimsan1120_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_kimsan1120_pipeline pipeline DistilBertForSequenceClassification from kimsan1120 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_kimsan1120_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_kimsan1120_pipeline` is a English model originally trained by kimsan1120. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_kimsan1120_pipeline_en_5.5.0_3.0_1726695918808.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_kimsan1120_pipeline_en_5.5.0_3.0_1726695918808.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# sample input; any DataFrame with a "text" column works
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_kimsan1120_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// sample input; any DataFrame with a "text" column works
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_kimsan1120_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_kimsan1120_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kimsan1120/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_lxlinghu_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_lxlinghu_en.md new file mode 100644 index 00000000000000..9849926b40b1f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_lxlinghu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_lxlinghu DistilBertForSequenceClassification from lxlinghu +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_lxlinghu +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_lxlinghu` is a English model originally trained by lxlinghu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_lxlinghu_en_5.5.0_3.0_1726630903317.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_lxlinghu_en_5.5.0_3.0_1726630903317.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_lxlinghu","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_lxlinghu", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
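
The predicted emotion label for each row can be read from the `class` output column of the transformed DataFrame. A minimal sketch, assuming the `pipelineDF` DataFrame produced by the Python example above:

```python
# Each "class" annotation stores the predicted label in its "result" field.
pipelineDF.select("text", "class.result").show(truncate=False)
```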
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_lxlinghu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lxlinghu/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_michaelsungboklee_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_michaelsungboklee_pipeline_en.md new file mode 100644 index 00000000000000..ff8e1a083778f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_michaelsungboklee_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_michaelsungboklee_pipeline pipeline DistilBertForSequenceClassification from michaelsungboklee +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_michaelsungboklee_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_michaelsungboklee_pipeline` is a English model originally trained by michaelsungboklee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_michaelsungboklee_pipeline_en_5.5.0_3.0_1726680299118.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_michaelsungboklee_pipeline_en_5.5.0_3.0_1726680299118.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_michaelsungboklee_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_michaelsungboklee_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_michaelsungboklee_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/michaelsungboklee/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_ms25_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_ms25_en.md new file mode 100644 index 00000000000000..a419bed2b7e585 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_ms25_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ms25 DistilBertForSequenceClassification from ms25 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ms25 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ms25` is a English model originally trained by ms25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ms25_en_5.5.0_3.0_1726676972979.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ms25_en_5.5.0_3.0_1726676972979.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ms25","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ms25", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ms25| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ms25/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_mu7annad_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_mu7annad_en.md new file mode 100644 index 00000000000000..f11545f673cf86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_mu7annad_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_mu7annad DistilBertForSequenceClassification from Mu7annad +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_mu7annad +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_mu7annad` is a English model originally trained by Mu7annad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_mu7annad_en_5.5.0_3.0_1726682028738.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_mu7annad_en_5.5.0_3.0_1726682028738.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_mu7annad","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_mu7annad", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_mu7annad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Mu7annad/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_naikola_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_naikola_en.md new file mode 100644 index 00000000000000..92def342bc8b3d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_naikola_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_naikola DistilBertForSequenceClassification from Naikola +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_naikola +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_naikola` is a English model originally trained by Naikola. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_naikola_en_5.5.0_3.0_1726677414879.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_naikola_en_5.5.0_3.0_1726677414879.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_naikola","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_naikola", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_naikola| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Naikola/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_naikola_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_naikola_pipeline_en.md new file mode 100644 index 00000000000000..927089a8215086 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_naikola_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_naikola_pipeline pipeline DistilBertForSequenceClassification from Naikola +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_naikola_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_naikola_pipeline` is a English model originally trained by Naikola. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_naikola_pipeline_en_5.5.0_3.0_1726677427431.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_naikola_pipeline_en_5.5.0_3.0_1726677427431.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_naikola_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_naikola_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_naikola_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Naikola/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_oturk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_oturk_pipeline_en.md new file mode 100644 index 00000000000000..23728e587d5532 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_oturk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_oturk_pipeline pipeline DistilBertForSequenceClassification from oturk +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_oturk_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_oturk_pipeline` is a English model originally trained by oturk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_oturk_pipeline_en_5.5.0_3.0_1726680629783.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_oturk_pipeline_en_5.5.0_3.0_1726680629783.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_oturk_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_oturk_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_oturk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/oturk/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_overall_2nd_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_overall_2nd_en.md new file mode 100644 index 00000000000000..3615489b08ef9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_overall_2nd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_overall_2nd DistilBertForSequenceClassification from LeBruse +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_overall_2nd +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_overall_2nd` is a English model originally trained by LeBruse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_overall_2nd_en_5.5.0_3.0_1726669285055.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_overall_2nd_en_5.5.0_3.0_1726669285055.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_overall_2nd","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_overall_2nd", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
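
For quick, low-latency predictions on single texts, the fitted pipeline can be wrapped in a LightPipeline rather than transforming a DataFrame. A minimal sketch, assuming the `pipelineModel` fitted in the Python example above:

```python
from sparknlp.base import LightPipeline

# LightPipeline executes the same stages on plain Python strings,
# skipping the overhead of a Spark DataFrame for one-off predictions.
light_pipeline = LightPipeline(pipelineModel)
print(light_pipeline.annotate("I love spark-nlp")["class"])
```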
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_overall_2nd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LeBruse/distilbert-base-uncased-finetuned-emotion-overall-2nd \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_overall_2nd_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_overall_2nd_pipeline_en.md new file mode 100644 index 00000000000000..562ed92edb6ddf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_overall_2nd_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_overall_2nd_pipeline pipeline DistilBertForSequenceClassification from LeBruse +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_overall_2nd_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_overall_2nd_pipeline` is a English model originally trained by LeBruse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_overall_2nd_pipeline_en_5.5.0_3.0_1726669304110.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_overall_2nd_pipeline_en_5.5.0_3.0_1726669304110.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_overall_2nd_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_overall_2nd_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_overall_2nd_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LeBruse/distilbert-base-uncased-finetuned-emotion-overall-2nd + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_rick72x5_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_rick72x5_en.md new file mode 100644 index 00000000000000..75c3b16fecb6e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_rick72x5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_rick72x5 DistilBertForSequenceClassification from rick72x5 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_rick72x5 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_rick72x5` is a English model originally trained by rick72x5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_rick72x5_en_5.5.0_3.0_1726677136216.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_rick72x5_en_5.5.0_3.0_1726677136216.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_rick72x5","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_rick72x5", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_rick72x5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rick72x5/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_rick72x5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_rick72x5_pipeline_en.md new file mode 100644 index 00000000000000..268b7afbfe8cf3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_rick72x5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_rick72x5_pipeline pipeline DistilBertForSequenceClassification from rick72x5 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_rick72x5_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_rick72x5_pipeline` is a English model originally trained by rick72x5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_rick72x5_pipeline_en_5.5.0_3.0_1726677148590.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_rick72x5_pipeline_en_5.5.0_3.0_1726677148590.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_rick72x5_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_rick72x5_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_rick72x5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rick72x5/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_ryanjyc_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_ryanjyc_en.md new file mode 100644 index 00000000000000..8f4db40b234315 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_ryanjyc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ryanjyc DistilBertForSequenceClassification from ryanjyc +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ryanjyc +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ryanjyc` is a English model originally trained by ryanjyc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ryanjyc_en_5.5.0_3.0_1726695063590.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ryanjyc_en_5.5.0_3.0_1726695063590.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ryanjyc","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ryanjyc", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ryanjyc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ryanjyc/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_shiv_pal_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_shiv_pal_en.md new file mode 100644 index 00000000000000..d92725eae8e7c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_shiv_pal_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_shiv_pal DistilBertForSequenceClassification from Shiv-Pal +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_shiv_pal +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_shiv_pal` is a English model originally trained by Shiv-Pal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_shiv_pal_en_5.5.0_3.0_1726625760900.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_shiv_pal_en_5.5.0_3.0_1726625760900.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_shiv_pal","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_shiv_pal", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_shiv_pal| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Shiv-Pal/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_sj1011_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_sj1011_pipeline_en.md new file mode 100644 index 00000000000000..9d7cf4bbaa6795 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_sj1011_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_sj1011_pipeline pipeline DistilBertForSequenceClassification from SJ1011 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_sj1011_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_sj1011_pipeline` is a English model originally trained by SJ1011. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_sj1011_pipeline_en_5.5.0_3.0_1726680823445.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_sj1011_pipeline_en_5.5.0_3.0_1726680823445.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_sj1011_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_sj1011_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_sj1011_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/SJ1011/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_skylord_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_skylord_pipeline_en.md new file mode 100644 index 00000000000000..f28409ae10032a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_skylord_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_skylord_pipeline pipeline DistilBertForSequenceClassification from skylord +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_skylord_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_skylord_pipeline` is a English model originally trained by skylord. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_skylord_pipeline_en_5.5.0_3.0_1726625658963.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_skylord_pipeline_en_5.5.0_3.0_1726625658963.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_skylord_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_skylord_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_skylord_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/skylord/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_suraj101_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_suraj101_en.md new file mode 100644 index 00000000000000..d7e852bac131e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_suraj101_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_suraj101 DistilBertForSequenceClassification from suraj101 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_suraj101 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_suraj101` is a English model originally trained by suraj101. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_suraj101_en_5.5.0_3.0_1726630375519.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_suraj101_en_5.5.0_3.0_1726630375519.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_suraj101","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_suraj101", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_suraj101| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/suraj101/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_tagch_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_tagch_en.md new file mode 100644 index 00000000000000..6b3445e97177cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_tagch_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_tagch DistilBertForSequenceClassification from TAGCH +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_tagch +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_tagch` is a English model originally trained by TAGCH. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_tagch_en_5.5.0_3.0_1726630910303.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_tagch_en_5.5.0_3.0_1726630910303.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_tagch","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_tagch", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_tagch| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/TAGCH/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_teraz_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_teraz_en.md new file mode 100644 index 00000000000000..f14b63ad8ad578 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_teraz_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_teraz DistilBertForSequenceClassification from Teraz +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_teraz +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_teraz` is a English model originally trained by Teraz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_teraz_en_5.5.0_3.0_1726630591885.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_teraz_en_5.5.0_3.0_1726630591885.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_teraz","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_teraz", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_teraz| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Teraz/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_teraz_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_teraz_pipeline_en.md new file mode 100644 index 00000000000000..03acdf1ba386b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_teraz_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_teraz_pipeline pipeline DistilBertForSequenceClassification from Teraz +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_teraz_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_teraz_pipeline` is a English model originally trained by Teraz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_teraz_pipeline_en_5.5.0_3.0_1726630604677.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_teraz_pipeline_en_5.5.0_3.0_1726630604677.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_teraz_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_teraz_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_teraz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Teraz/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_uijeong01_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_uijeong01_pipeline_en.md new file mode 100644 index 00000000000000..165289b075a61a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_uijeong01_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_uijeong01_pipeline pipeline DistilBertForSequenceClassification from uijeong01 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_uijeong01_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_uijeong01_pipeline` is a English model originally trained by uijeong01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_uijeong01_pipeline_en_5.5.0_3.0_1726669866463.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_uijeong01_pipeline_en_5.5.0_3.0_1726669866463.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_uijeong01_pipeline", lang = "en")
+# the pipeline's DocumentAssembler expects the input text in a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_uijeong01_pipeline", lang = "en")
+// the pipeline's DocumentAssembler expects the input text in a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
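+
+For quick, single-string checks the same pretrained pipeline can be used in-memory. A minimal sketch, assuming the `pipeline` object created above and that the included classifier writes to a `class` output column:
+
+```python
+# annotate one string and read back the predicted label
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])  # "class" is assumed to be the classifier's output column
+```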
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_uijeong01_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/uijeong01/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_uisikdag_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_uisikdag_en.md new file mode 100644 index 00000000000000..13ba3293e2c3a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_uisikdag_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_uisikdag DistilBertForSequenceClassification from uisikdag +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_uisikdag +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_uisikdag` is a English model originally trained by uisikdag. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_uisikdag_en_5.5.0_3.0_1726696426600.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_uisikdag_en_5.5.0_3.0_1726696426600.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_uisikdag","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_uisikdag", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
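+
+The predicted labels can be checked directly on the transformed DataFrame. This is a minimal inspection sketch, assuming the `pipelineDF` and the `class` output column defined above:
+
+```python
+# show each input text next to its predicted class label(s)
+pipelineDF.select("text", "class.result").show(truncate=False)
+```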
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_uisikdag| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/uisikdag/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_uisikdag_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_uisikdag_pipeline_en.md new file mode 100644 index 00000000000000..95d49a490e2018 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_uisikdag_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_uisikdag_pipeline pipeline DistilBertForSequenceClassification from uisikdag +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_uisikdag_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_uisikdag_pipeline` is a English model originally trained by uisikdag. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_uisikdag_pipeline_en_5.5.0_3.0_1726696438671.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_uisikdag_pipeline_en_5.5.0_3.0_1726696438671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_uisikdag_pipeline", lang = "en")
+# the pipeline's DocumentAssembler expects the input text in a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_uisikdag_pipeline", lang = "en")
+// the pipeline's DocumentAssembler expects the input text in a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
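+
+For quick, single-string checks the same pretrained pipeline can be used in-memory. A minimal sketch, assuming the `pipeline` object created above and that the included classifier writes to a `class` output column:
+
+```python
+# annotate one string and read back the predicted label
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])  # "class" is assumed to be the classifier's output column
+```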
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_uisikdag_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/uisikdag/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_ujjwalgarg_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_ujjwalgarg_en.md new file mode 100644 index 00000000000000..a1a953843fccfd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_ujjwalgarg_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ujjwalgarg DistilBertForSequenceClassification from ujjwalgarg +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ujjwalgarg +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ujjwalgarg` is a English model originally trained by ujjwalgarg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ujjwalgarg_en_5.5.0_3.0_1726695150871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ujjwalgarg_en_5.5.0_3.0_1726695150871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ujjwalgarg","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ujjwalgarg", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
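+
+The predicted labels can be checked directly on the transformed DataFrame. This is a minimal inspection sketch, assuming the `pipelineDF` and the `class` output column defined above:
+
+```python
+# show each input text next to its predicted class label(s)
+pipelineDF.select("text", "class.result").show(truncate=False)
+```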
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ujjwalgarg| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ujjwalgarg/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_waynesunwen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_waynesunwen_pipeline_en.md new file mode 100644 index 00000000000000..9b5e5be534f6eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_waynesunwen_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_waynesunwen_pipeline pipeline DistilBertForSequenceClassification from waynesunwen +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_waynesunwen_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_waynesunwen_pipeline` is a English model originally trained by waynesunwen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_waynesunwen_pipeline_en_5.5.0_3.0_1726696335722.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_waynesunwen_pipeline_en_5.5.0_3.0_1726696335722.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_waynesunwen_pipeline", lang = "en")
+# the pipeline's DocumentAssembler expects the input text in a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_waynesunwen_pipeline", lang = "en")
+// the pipeline's DocumentAssembler expects the input text in a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
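+
+For quick, single-string checks the same pretrained pipeline can be used in-memory. A minimal sketch, assuming the `pipeline` object created above and that the included classifier writes to a `class` output column:
+
+```python
+# annotate one string and read back the predicted label
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])  # "class" is assumed to be the classifier's output column
+```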
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_waynesunwen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/waynesunwen/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_xysj2012_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_xysj2012_pipeline_en.md new file mode 100644 index 00000000000000..6f7d2912a1b652 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_xysj2012_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_xysj2012_pipeline pipeline DistilBertForSequenceClassification from xysj2012 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_xysj2012_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_xysj2012_pipeline` is a English model originally trained by xysj2012. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_xysj2012_pipeline_en_5.5.0_3.0_1726630433981.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_xysj2012_pipeline_en_5.5.0_3.0_1726630433981.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_xysj2012_pipeline", lang = "en")
+# the pipeline's DocumentAssembler expects the input text in a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_xysj2012_pipeline", lang = "en")
+// the pipeline's DocumentAssembler expects the input text in a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
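+
+For quick, single-string checks the same pretrained pipeline can be used in-memory. A minimal sketch, assuming the `pipeline` object created above and that the included classifier writes to a `class` output column:
+
+```python
+# annotate one string and read back the predicted label
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])  # "class" is assumed to be the classifier's output column
+```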
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_xysj2012_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/xysj2012/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotions_jjwariror_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotions_jjwariror_en.md new file mode 100644 index 00000000000000..db2d7635908509 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotions_jjwariror_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotions_jjwariror DistilBertForSequenceClassification from JJWariror +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotions_jjwariror +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotions_jjwariror` is a English model originally trained by JJWariror. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_jjwariror_en_5.5.0_3.0_1726695901905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_jjwariror_en_5.5.0_3.0_1726695901905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotions_jjwariror","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotions_jjwariror", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
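+
+The predicted labels can be checked directly on the transformed DataFrame. This is a minimal inspection sketch, assuming the `pipelineDF` and the `class` output column defined above:
+
+```python
+# show each input text next to its predicted class label(s)
+pipelineDF.select("text", "class.result").show(truncate=False)
+```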
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotions_jjwariror| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/JJWariror/distilbert-base-uncased-finetuned-emotions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotions_sangeeths11_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotions_sangeeths11_pipeline_en.md new file mode 100644 index 00000000000000..b0b2f6edd62ab6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotions_sangeeths11_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotions_sangeeths11_pipeline pipeline DistilBertForSequenceClassification from sangeeths11 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotions_sangeeths11_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotions_sangeeths11_pipeline` is a English model originally trained by sangeeths11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_sangeeths11_pipeline_en_5.5.0_3.0_1726696432884.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_sangeeths11_pipeline_en_5.5.0_3.0_1726696432884.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotions_sangeeths11_pipeline", lang = "en")
+# the pipeline's DocumentAssembler expects the input text in a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotions_sangeeths11_pipeline", lang = "en")
+// the pipeline's DocumentAssembler expects the input text in a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
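+
+For quick, single-string checks the same pretrained pipeline can be used in-memory. A minimal sketch, assuming the `pipeline` object created above and that the included classifier writes to a `class` output column:
+
+```python
+# annotate one string and read back the predicted label
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])  # "class" is assumed to be the classifier's output column
+```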
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotions_sangeeths11_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sangeeths11/distilbert-base-uncased-finetuned-emotions + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_m_reach_seller_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_m_reach_seller_pipeline_en.md new file mode 100644 index 00000000000000..96a3eccf9d47bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_m_reach_seller_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_m_reach_seller_pipeline pipeline DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_m_reach_seller_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_m_reach_seller_pipeline` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_m_reach_seller_pipeline_en_5.5.0_3.0_1726694873187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_m_reach_seller_pipeline_en_5.5.0_3.0_1726694873187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_m_reach_seller_pipeline", lang = "en")
+# the pipeline's DocumentAssembler expects the input text in a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_m_reach_seller_pipeline", lang = "en")
+// the pipeline's DocumentAssembler expects the input text in a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
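+
+For quick, single-string checks the same pretrained pipeline can be used in-memory. A minimal sketch, assuming the `pipeline` object created above and that the included classifier writes to a `class` output column:
+
+```python
+# annotate one string and read back the predicted label
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])  # "class" is assumed to be the classifier's output column
+```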
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_m_reach_seller_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-m_reach_seller + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_mental_social_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_mental_social_en.md new file mode 100644 index 00000000000000..28bd7c72f3fe71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_mental_social_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_mental_social DistilBertForSequenceClassification from PriyankaDS +author: John Snow Labs +name: distilbert_base_uncased_finetuned_mental_social +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_mental_social` is a English model originally trained by PriyankaDS. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_mental_social_en_5.5.0_3.0_1726625793219.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_mental_social_en_5.5.0_3.0_1726625793219.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_mental_social","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_mental_social", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
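+
+The predicted labels can be checked directly on the transformed DataFrame. This is a minimal inspection sketch, assuming the `pipelineDF` and the `class` output column defined above:
+
+```python
+# show each input text next to its predicted class label(s)
+pipelineDF.select("text", "class.result").show(truncate=False)
+```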
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_mental_social| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/PriyankaDS/distilbert-base-uncased-finetuned-mental_social \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_mental_social_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_mental_social_pipeline_en.md new file mode 100644 index 00000000000000..2e1a7130bd13c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_mental_social_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_mental_social_pipeline pipeline DistilBertForSequenceClassification from PriyankaDS +author: John Snow Labs +name: distilbert_base_uncased_finetuned_mental_social_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_mental_social_pipeline` is a English model originally trained by PriyankaDS. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_mental_social_pipeline_en_5.5.0_3.0_1726625805601.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_mental_social_pipeline_en_5.5.0_3.0_1726625805601.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_mental_social_pipeline", lang = "en")
+# the pipeline's DocumentAssembler expects the input text in a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_mental_social_pipeline", lang = "en")
+// the pipeline's DocumentAssembler expects the input text in a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
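+
+For quick, single-string checks the same pretrained pipeline can be used in-memory. A minimal sketch, assuming the `pipeline` object created above and that the included classifier writes to a `class` output column:
+
+```python
+# annotate one string and read back the predicted label
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])  # "class" is assumed to be the classifier's output column
+```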
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_mental_social_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/PriyankaDS/distilbert-base-uncased-finetuned-mental_social + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_rile_v1_frozen_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_rile_v1_frozen_4_pipeline_en.md new file mode 100644 index 00000000000000..69d9b1211be423 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_rile_v1_frozen_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_rile_v1_frozen_4_pipeline pipeline DistilBertForSequenceClassification from kghanlon +author: John Snow Labs +name: distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_rile_v1_frozen_4_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_rile_v1_frozen_4_pipeline` is a English model originally trained by kghanlon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_rile_v1_frozen_4_pipeline_en_5.5.0_3.0_1726630185154.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_rile_v1_frozen_4_pipeline_en_5.5.0_3.0_1726630185154.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_rile_v1_frozen_4_pipeline", lang = "en")
+# the pipeline's DocumentAssembler expects the input text in a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_rile_v1_frozen_4_pipeline", lang = "en")
+// the pipeline's DocumentAssembler expects the input text in a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
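+
+For quick, single-string checks the same pretrained pipeline can be used in-memory. A minimal sketch, assuming the `pipeline` object created above and that the included classifier writes to a `class` output column:
+
+```python
+# annotate one string and read back the predicted label
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])  # "class" is assumed to be the classifier's output column
+```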
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_rile_v1_frozen_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kghanlon/distilbert-base-uncased-finetuned-MP-unannotated-half-frozen-v1-RILE-v1_frozen_4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_alexcoliveira_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_alexcoliveira_en.md new file mode 100644 index 00000000000000..e8fd244bebae97 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_alexcoliveira_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_alexcoliveira DistilBertForQuestionAnswering from alexcoliveira +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_alexcoliveira +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_alexcoliveira` is a English model originally trained by alexcoliveira. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_alexcoliveira_en_5.5.0_3.0_1726640977160.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_alexcoliveira_en_5.5.0_3.0_1726640977160.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_alexcoliveira","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_alexcoliveira", "en")
+  .setInputCols(Array("document_question","document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
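+
+The extracted answer spans can be read back from the transformed DataFrame. A minimal inspection sketch, assuming the `pipelineDF` and the `answer` output column defined above:
+
+```python
+# show each question next to its predicted answer span
+pipelineDF.select("question", "answer.result").show(truncate=False)
+```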
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_alexcoliveira| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/alexcoliveira/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_cinoss_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_cinoss_en.md new file mode 100644 index 00000000000000..4dc24cead2587a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_cinoss_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_cinoss DistilBertForQuestionAnswering from cinoss +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_cinoss +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_cinoss` is a English model originally trained by cinoss. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_cinoss_en_5.5.0_3.0_1726640727903.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_cinoss_en_5.5.0_3.0_1726640727903.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_cinoss","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_cinoss", "en")
+  .setInputCols(Array("document_question","document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
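+
+The extracted answer spans can be read back from the transformed DataFrame. A minimal inspection sketch, assuming the `pipelineDF` and the `answer` output column defined above:
+
+```python
+# show each question next to its predicted answer span
+pipelineDF.select("question", "answer.result").show(truncate=False)
+```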
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_cinoss| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/cinoss/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_cinoss_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_cinoss_pipeline_en.md new file mode 100644 index 00000000000000..e99ebab65773b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_cinoss_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_cinoss_pipeline pipeline DistilBertForQuestionAnswering from cinoss +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_cinoss_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_cinoss_pipeline` is a English model originally trained by cinoss. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_cinoss_pipeline_en_5.5.0_3.0_1726640740057.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_cinoss_pipeline_en_5.5.0_3.0_1726640740057.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_cinoss_pipeline", lang = "en")
+# column names are assumed to match the pipeline's MultiDocumentAssembler inputs
+df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_cinoss_pipeline", lang = "en")
+// column names are assumed to match the pipeline's MultiDocumentAssembler inputs
+val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val annotations = pipeline.transform(df)
+```
+
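+
+The predicted answers can then be read from the annotated DataFrame. A minimal sketch, assuming the `annotations` DataFrame produced above and that the included question-answering stage writes to an `answer` output column:
+
+```python
+# show the predicted answer spans
+annotations.select("answer.result").show(truncate=False)
+```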
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_cinoss_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/cinoss/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_d5716d28_alex_atelo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_d5716d28_alex_atelo_pipeline_en.md new file mode 100644 index 00000000000000..c46bb559ae9899 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_d5716d28_alex_atelo_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_d5716d28_alex_atelo_pipeline pipeline DistilBertForQuestionAnswering from alex-atelo +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_d5716d28_alex_atelo_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_d5716d28_alex_atelo_pipeline` is a English model originally trained by alex-atelo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_alex_atelo_pipeline_en_5.5.0_3.0_1726644033787.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_alex_atelo_pipeline_en_5.5.0_3.0_1726644033787.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_alex_atelo_pipeline", lang = "en")
+# column names are assumed to match the pipeline's MultiDocumentAssembler inputs
+df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_alex_atelo_pipeline", lang = "en")
+// column names are assumed to match the pipeline's MultiDocumentAssembler inputs
+val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val annotations = pipeline.transform(df)
+```
+
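+
+The predicted answers can then be read from the annotated DataFrame. A minimal sketch, assuming the `annotations` DataFrame produced above and that the included question-answering stage writes to an `answer` output column:
+
+```python
+# show the predicted answer spans
+annotations.select("answer.result").show(truncate=False)
+```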
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_d5716d28_alex_atelo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/alex-atelo/distilbert-base-uncased-finetuned-squad-d5716d28 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_d5716d28_mdance_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_d5716d28_mdance_en.md new file mode 100644 index 00000000000000..10d95a0024ed12 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_d5716d28_mdance_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_d5716d28_mdance DistilBertForQuestionAnswering from mdance +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_d5716d28_mdance +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_d5716d28_mdance` is a English model originally trained by mdance. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_mdance_en_5.5.0_3.0_1726641091398.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_mdance_en_5.5.0_3.0_1726641091398.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_d5716d28_mdance","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_d5716d28_mdance", "en")
+  .setInputCols(Array("document_question","document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
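+
+The extracted answer spans can be read back from the transformed DataFrame. A minimal inspection sketch, assuming the `pipelineDF` and the `answer` output column defined above:
+
+```python
+# show each question next to its predicted answer span
+pipelineDF.select("question", "answer.result").show(truncate=False)
+```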
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_d5716d28_mdance| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/mdance/distilbert-base-uncased-finetuned-squad-d5716d28 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_d5716d28_mdance_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_d5716d28_mdance_pipeline_en.md new file mode 100644 index 00000000000000..19e5016638aec1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_d5716d28_mdance_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_d5716d28_mdance_pipeline pipeline DistilBertForQuestionAnswering from mdance +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_d5716d28_mdance_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_d5716d28_mdance_pipeline` is a English model originally trained by mdance. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_mdance_pipeline_en_5.5.0_3.0_1726641103347.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_mdance_pipeline_en_5.5.0_3.0_1726641103347.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_mdance_pipeline", lang = "en")
+# column names are assumed to match the pipeline's MultiDocumentAssembler inputs
+df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_mdance_pipeline", lang = "en")
+// column names are assumed to match the pipeline's MultiDocumentAssembler inputs
+val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val annotations = pipeline.transform(df)
+```
+
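+
+The predicted answers can then be read from the annotated DataFrame. A minimal sketch, assuming the `annotations` DataFrame produced above and that the included question-answering stage writes to an `answer` output column:
+
+```python
+# show the predicted answer spans
+annotations.select("answer.result").show(truncate=False)
+```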
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_d5716d28_mdance_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/mdance/distilbert-base-uncased-finetuned-squad-d5716d28 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_d5716d28_simon580803_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_d5716d28_simon580803_en.md new file mode 100644 index 00000000000000..c8e6483ffafa4e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_d5716d28_simon580803_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_d5716d28_simon580803 DistilBertForQuestionAnswering from Simon580803 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_d5716d28_simon580803 +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_d5716d28_simon580803` is a English model originally trained by Simon580803. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_simon580803_en_5.5.0_3.0_1726644152760.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_simon580803_en_5.5.0_3.0_1726644152760.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_d5716d28_simon580803","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_d5716d28_simon580803", "en")
+  .setInputCols(Array("document_question","document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
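+
+The extracted answer spans can be read back from the transformed DataFrame. A minimal inspection sketch, assuming the `pipelineDF` and the `answer` output column defined above:
+
+```python
+# show each question next to its predicted answer span
+pipelineDF.select("question", "answer.result").show(truncate=False)
+```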
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_d5716d28_simon580803| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Simon580803/distilbert-base-uncased-finetuned-squad-d5716d28 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_rajkiran_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_rajkiran_en.md new file mode 100644 index 00000000000000..b512f67009d883 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_rajkiran_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_rajkiran DistilBertForQuestionAnswering from rajkiran +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_rajkiran +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_rajkiran` is a English model originally trained by rajkiran. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_rajkiran_en_5.5.0_3.0_1726640682899.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_rajkiran_en_5.5.0_3.0_1726640682899.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_rajkiran","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_rajkiran", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
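+
+Once the pipeline above has run, the prediction is stored as an annotation in the `answer` output column. A minimal way to read it back from `pipelineDF` (column name taken from the example itself):
+
+```python
+# Print the answer span predicted for the sample question/context pair
+pipelineDF.select("answer.result").show(truncate=False)
+```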
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_rajkiran| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/rajkiran/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_rajkiran_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_rajkiran_pipeline_en.md new file mode 100644 index 00000000000000..6c3b4cb8381eb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_rajkiran_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_rajkiran_pipeline pipeline DistilBertForQuestionAnswering from rajkiran +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_rajkiran_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_rajkiran_pipeline` is a English model originally trained by rajkiran. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_rajkiran_pipeline_en_5.5.0_3.0_1726640697554.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_rajkiran_pipeline_en_5.5.0_3.0_1726640697554.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_rajkiran_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_rajkiran_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
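+
+The snippet above assumes a DataFrame `df` already exists. Below is a minimal end-to-end sketch; the input columns `document_question`/`document_context` and the output column `answer` are assumptions based on the underlying DistilBertForQuestionAnswering model card, not values verified against this pipeline's metadata:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_rajkiran_pipeline", lang="en")
+
+# Assumed input columns for the question/context pair
+df = spark.createDataFrame(
+    [["What framework do I use?", "I use spark-nlp."]]
+).toDF("document_question", "document_context")
+
+annotations = pipeline.transform(df)
+annotations.select("answer.result").show(truncate=False)  # assumed output column
+```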
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_rajkiran_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/rajkiran/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_sanghakoh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_sanghakoh_pipeline_en.md new file mode 100644 index 00000000000000..89805e18d56d24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_sanghakoh_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_sanghakoh_pipeline pipeline DistilBertForQuestionAnswering from sanghakoh +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_sanghakoh_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_sanghakoh_pipeline` is a English model originally trained by sanghakoh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_sanghakoh_pipeline_en_5.5.0_3.0_1726640697607.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_sanghakoh_pipeline_en_5.5.0_3.0_1726640697607.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_sanghakoh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_sanghakoh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
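+
+Since `df` is not defined in the snippet above, a hedged end-to-end sketch is shown here. The `document_question`/`document_context` input columns and the `answer` output column are assumptions carried over from the base DistilBertForQuestionAnswering examples, not verified against this pipeline's metadata:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_sanghakoh_pipeline", lang="en")
+
+# Assumed input columns for the question/context pair
+df = spark.createDataFrame(
+    [["What framework do I use?", "I use spark-nlp."]]
+).toDF("document_question", "document_context")
+
+annotations = pipeline.transform(df)
+annotations.select("answer.result").show(truncate=False)  # assumed output column
+```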
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_sanghakoh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/sanghakoh/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_songhyundong_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_songhyundong_en.md new file mode 100644 index 00000000000000..452423f8029063 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_songhyundong_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_songhyundong DistilBertForQuestionAnswering from songhyundong +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_songhyundong +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_songhyundong` is a English model originally trained by songhyundong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_songhyundong_en_5.5.0_3.0_1726644170135.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_songhyundong_en_5.5.0_3.0_1726644170135.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_songhyundong","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_songhyundong", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
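+
+After transforming the data with the pipeline above, the extracted answer is available as an annotation in the `answer` column. Using the `pipelineDF` from the Python example:
+
+```python
+# Inspect the predicted answer span
+pipelineDF.select("answer.result").show(truncate=False)
+```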
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_songhyundong| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/songhyundong/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_soullllll_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_soullllll_en.md new file mode 100644 index 00000000000000..318852e59a5414 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_soullllll_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_soullllll DistilBertForQuestionAnswering from soullllll +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_soullllll +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_soullllll` is a English model originally trained by soullllll. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_soullllll_en_5.5.0_3.0_1726640794286.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_soullllll_en_5.5.0_3.0_1726640794286.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_soullllll","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_soullllll", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
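+
+The span classifier above writes its prediction to the `answer` annotation column. A short follow-up on the same `pipelineDF` shows the extracted text:
+
+```python
+# Show the answer extracted from the sample context
+pipelineDF.select("answer.result").show(truncate=False)
+```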
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_soullllll| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/soullllll/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_yeoni0208_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_yeoni0208_pipeline_en.md new file mode 100644 index 00000000000000..cc4eb6fc1dd7c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_yeoni0208_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_yeoni0208_pipeline pipeline DistilBertForQuestionAnswering from yeoni0208 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_yeoni0208_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_yeoni0208_pipeline` is a English model originally trained by yeoni0208. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_yeoni0208_pipeline_en_5.5.0_3.0_1726640803447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_yeoni0208_pipeline_en_5.5.0_3.0_1726640803447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_yeoni0208_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_yeoni0208_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
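+
+`df` is assumed to already exist in the snippet above. A minimal sketch of building that input and reading the prediction back is given below; the `document_question`/`document_context` input columns and the `answer` output column are assumptions based on the underlying question-answering model, not confirmed from this pipeline's metadata:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_yeoni0208_pipeline", lang="en")
+
+# Assumed input columns for the question/context pair
+df = spark.createDataFrame(
+    [["What framework do I use?", "I use spark-nlp."]]
+).toDF("document_question", "document_context")
+
+annotations = pipeline.transform(df)
+annotations.select("answer.result").show(truncate=False)  # assumed output column
+```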
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_yeoni0208_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/yeoni0208/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_sst2_dinhlnd1610_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_sst2_dinhlnd1610_en.md new file mode 100644 index 00000000000000..98ebeaabe7216b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_sst2_dinhlnd1610_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_sst2_dinhlnd1610 DistilBertForSequenceClassification from dinhlnd1610 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_sst2_dinhlnd1610 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_sst2_dinhlnd1610` is a English model originally trained by dinhlnd1610. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst2_dinhlnd1610_en_5.5.0_3.0_1726630803452.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst2_dinhlnd1610_en_5.5.0_3.0_1726630803452.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_sst2_dinhlnd1610","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_sst2_dinhlnd1610", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
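+
+Assuming the classifier's input columns are aligned with the assembler output (`document`, `token`), the predicted label for each row is written to the `class` annotation column set above. Reusing `pipelineDF` from the Python snippet:
+
+```python
+# Show the input text next to the predicted class label
+pipelineDF.select("text", "class.result").show(truncate=False)
+```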
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_sst2_dinhlnd1610| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dinhlnd1610/distilbert-base-uncased-finetuned-sst2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_t_generic_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_t_generic_en.md new file mode 100644 index 00000000000000..a0440ee23dbe5f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_t_generic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_t_generic DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_t_generic +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_t_generic` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_generic_en_5.5.0_3.0_1726696541782.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_generic_en_5.5.0_3.0_1726696541782.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_t_generic","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_t_generic", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
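+
+With the pipeline above fitted and applied (and assuming the classifier's input columns match the assembler output `document` and the tokenizer output `token`), the label prediction can be read from the `class` column of `pipelineDF`:
+
+```python
+# Display each input sentence with its predicted label
+pipelineDF.select("text", "class.result").show(truncate=False)
+```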
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_t_generic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-t_generic \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_t_generic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_t_generic_pipeline_en.md new file mode 100644 index 00000000000000..cdd87ee0f4475d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_t_generic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_t_generic_pipeline pipeline DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_t_generic_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_t_generic_pipeline` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_generic_pipeline_en_5.5.0_3.0_1726696554994.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_generic_pipeline_en_5.5.0_3.0_1726696554994.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_t_generic_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_t_generic_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
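+
+`df` in the example above is assumed to already exist. A minimal sketch follows; it assumes the pipeline reads a `text` column (as the included DocumentAssembler suggests) and writes its prediction to a `class` column:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_t_generic_pipeline", lang="en")
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # assumed input column
+annotations = pipeline.transform(df)
+annotations.select("text", "class.result").show(truncate=False)  # assumed output column
+```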
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_t_generic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-t_generic + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_en.md new file mode 100644 index 00000000000000..0c097960af7ab9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_en_5.5.0_3.0_1726680111837.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_en_5.5.0_3.0_1726680111837.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
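+
+Provided the classifier's input columns are consistent with the assembler and tokenizer outputs (`document`, `token`), the predicted label ends up in the `class` annotation column configured above and can be inspected directly from `pipelineDF`:
+
+```python
+# Show the predicted class for the sample text
+pipelineDF.select("text", "class.result").show(truncate=False)
+```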
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_pipeline_en.md new file mode 100644 index 00000000000000..c480a25899e953 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_pipeline_en_5.5.0_3.0_1726680128923.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_pipeline_en_5.5.0_3.0_1726680128923.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
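+
+The snippet above leaves `df` undefined. A minimal, hedged sketch of full usage is shown below, assuming the pipeline reads a `text` column (per the included DocumentAssembler) and emits a `class` column:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_pipeline", lang="en")
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # assumed input column
+annotations = pipeline.transform(df)
+annotations.select("text", "class.result").show(truncate=False)  # assumed output column
+```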
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_luciayn_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_luciayn_pipeline_en.md new file mode 100644 index 00000000000000..d59edfeacaca28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_luciayn_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_luciayn_pipeline pipeline DistilBertForSequenceClassification from luciayn +author: John Snow Labs +name: distilbert_base_uncased_luciayn_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_luciayn_pipeline` is a English model originally trained by luciayn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_luciayn_pipeline_en_5.5.0_3.0_1726680718235.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_luciayn_pipeline_en_5.5.0_3.0_1726680718235.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_luciayn_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_luciayn_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
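+
+As `df` is not created in the snippet above, a small self-contained sketch is added here. It assumes a `text` input column (matching the included DocumentAssembler) and a `class` output column:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_luciayn_pipeline", lang="en")
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # assumed input column
+annotations = pipeline.transform(df)
+annotations.select("text", "class.result").show(truncate=False)  # assumed output column
+```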
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_luciayn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/luciayn/distilbert_base_uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_en.md new file mode 100644 index 00000000000000..84e392c0e841be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_en_5.5.0_3.0_1726680420351.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_en_5.5.0_3.0_1726680420351.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
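+
+Assuming the classifier's input columns are kept consistent with the assembler output (`document`, `token`), the predicted label is stored in the `class` annotation column defined above and can be read back from `pipelineDF`:
+
+```python
+# Show the predicted label alongside the input text
+pipelineDF.select("text", "class.result").show(truncate=False)
+```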
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st10sd_ut72ut1large10PfxNf_simsp400_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_pipeline_en.md new file mode 100644 index 00000000000000..d02c70cd67226d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_pipeline_en_5.5.0_3.0_1726680433504.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_pipeline_en_5.5.0_3.0_1726680433504.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
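+
+`df` is assumed to exist in the example above. A minimal sketch of complete usage follows, assuming a `text` input column (per the included DocumentAssembler) and a `class` output column:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_pipeline", lang="en")
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # assumed input column
+annotations = pipeline.transform(df)
+annotations.select("text", "class.result").show(truncate=False)  # assumed output column
+```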
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st10sd_ut72ut1large10PfxNf_simsp400_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1largepfxnf_simsp300_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1largepfxnf_simsp300_clean200_en.md new file mode 100644 index 00000000000000..674dc7de5a9152 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1largepfxnf_simsp300_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1largepfxnf_simsp300_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1largepfxnf_simsp300_clean200 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1largepfxnf_simsp300_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1largepfxnf_simsp300_clean200_en_5.5.0_3.0_1726676869126.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1largepfxnf_simsp300_clean200_en_5.5.0_3.0_1726676869126.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1largepfxnf_simsp300_clean200","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1largepfxnf_simsp300_clean200", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
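+
+Once the pipeline above has been fitted and applied (with the classifier's input columns aligned to the assembler output `document` and tokenizer output `token`), the prediction sits in the `class` column of `pipelineDF`:
+
+```python
+# Inspect the predicted class label for each row
+pipelineDF.select("text", "class.result").show(truncate=False)
+```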
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1largepfxnf_simsp300_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st11sd_ut72ut1largePfxNf_simsp300_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1large12pfxnf_simsp400_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1large12pfxnf_simsp400_clean200_pipeline_en.md new file mode 100644 index 00000000000000..e6619bdfeef1a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1large12pfxnf_simsp400_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1large12pfxnf_simsp400_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1large12pfxnf_simsp400_clean200_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1large12pfxnf_simsp400_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1large12pfxnf_simsp400_clean200_pipeline_en_5.5.0_3.0_1726630600620.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1large12pfxnf_simsp400_clean200_pipeline_en_5.5.0_3.0_1726630600620.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1large12pfxnf_simsp400_clean200_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1large12pfxnf_simsp400_clean200_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
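+
+Since the snippet above does not build `df`, a minimal hedged sketch is included here; the `text` input column and `class` output column are assumptions based on the included DocumentAssembler and sequence classifier:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1large12pfxnf_simsp400_clean200_pipeline", lang="en")
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # assumed input column
+annotations = pipeline.transform(df)
+annotations.select("text", "class.result").show(truncate=False)  # assumed output column
+```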
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1large12pfxnf_simsp400_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st12sd_ut72ut1large12PfxNf_simsp400_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1largepfxnf_simsp300_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1largepfxnf_simsp300_clean100_pipeline_en.md new file mode 100644 index 00000000000000..5831eb1838973a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1largepfxnf_simsp300_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1largepfxnf_simsp300_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1largepfxnf_simsp300_clean100_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1largepfxnf_simsp300_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1largepfxnf_simsp300_clean100_pipeline_en_5.5.0_3.0_1726680129063.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1largepfxnf_simsp300_clean100_pipeline_en_5.5.0_3.0_1726680129063.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1largepfxnf_simsp300_clean100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1largepfxnf_simsp300_clean100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
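+
+`df` is left undefined in the example above, so a small end-to-end sketch is added below. It assumes the pipeline reads a `text` column and writes its prediction to a `class` column, based on the included DocumentAssembler and DistilBertForSequenceClassification stages:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1largepfxnf_simsp300_clean100_pipeline", lang="en")
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # assumed input column
+annotations = pipeline.transform(df)
+annotations.select("text", "class.result").show(truncate=False)  # assumed output column
+```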
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st12sd_ut72ut1largepfxnf_simsp300_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st12sd_ut72ut1largePfxNf_simsp300_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st1sd_ut12ut1_plprefix0stlarge_simsp100_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st1sd_ut12ut1_plprefix0stlarge_simsp100_en.md new file mode 100644 index 00000000000000..26b29d7b36529c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st1sd_ut12ut1_plprefix0stlarge_simsp100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st1sd_ut12ut1_plprefix0stlarge_simsp100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st1sd_ut12ut1_plprefix0stlarge_simsp100 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st1sd_ut12ut1_plprefix0stlarge_simsp100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st1sd_ut12ut1_plprefix0stlarge_simsp100_en_5.5.0_3.0_1726681925088.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st1sd_ut12ut1_plprefix0stlarge_simsp100_en_5.5.0_3.0_1726681925088.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st1sd_ut12ut1_plprefix0stlarge_simsp100","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st1sd_ut12ut1_plprefix0stlarge_simsp100", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
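+
+Provided the classifier's input columns match the assembler and tokenizer outputs (`document`, `token`), the predicted label can be retrieved from the `class` annotation column of the `pipelineDF` produced above:
+
+```python
+# Show the input text and its predicted label
+pipelineDF.select("text", "class.result").show(truncate=False)
+```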
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st1sd_ut12ut1_plprefix0stlarge_simsp100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st1sd_ut12ut1_PLPrefix0stlarge_simsp100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st23sd_ut72ut1_plprefix0stlarge23_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st23sd_ut72ut1_plprefix0stlarge23_simsp_pipeline_en.md new file mode 100644 index 00000000000000..ff7fba5caa95c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st23sd_ut72ut1_plprefix0stlarge23_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st23sd_ut72ut1_plprefix0stlarge23_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st23sd_ut72ut1_plprefix0stlarge23_simsp_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st23sd_ut72ut1_plprefix0stlarge23_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st23sd_ut72ut1_plprefix0stlarge23_simsp_pipeline_en_5.5.0_3.0_1726669977800.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st23sd_ut72ut1_plprefix0stlarge23_simsp_pipeline_en_5.5.0_3.0_1726669977800.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame that already contains the input text column(s)
# expected by this pipeline.
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st23sd_ut72ut1_plprefix0stlarge23_simsp_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame that already contains the input text column(s)
// expected by this pipeline.
val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st23sd_ut72ut1_plprefix0stlarge23_simsp_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
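
For quick experiments it can be handier to feed raw strings rather than a DataFrame. The snippet below is a sketch based on the generic Spark NLP `PretrainedPipeline.annotate` helper; it is not taken from this card, so treat the exact output keys as an assumption:

```python
# Sketch: annotate() runs the downloaded pipeline on plain text and returns a dict
# keyed by output column (e.g. "class" for this sequence-classification pipeline).
result = pipeline.annotate("I love spark-nlp")
print(result)
```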
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st23sd_ut72ut1_plprefix0stlarge23_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st23sd_ut72ut1_PLPrefix0stlarge23_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_en.md new file mode 100644 index 00000000000000..b1a3f27853e8d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_en_5.5.0_3.0_1726680302027.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_en_5.5.0_3.0_1726680302027.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st3sd_ut72ut1_PLPrefix0stlarge_simsp300_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_pipeline_en.md new file mode 100644 index 00000000000000..4ddb717dcee2b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_pipeline_en_5.5.0_3.0_1726680315037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_pipeline_en_5.5.0_3.0_1726680315037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame that already contains the input text column(s)
# expected by this pipeline.
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame that already contains the input text column(s)
// expected by this pipeline.
val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st3sd_ut72ut1_PLPrefix0stlarge_simsp300_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge21_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge21_simsp_pipeline_en.md new file mode 100644 index 00000000000000..e2879f5b534185 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge21_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge21_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge21_simsp_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge21_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge21_simsp_pipeline_en_5.5.0_3.0_1726677232702.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge21_simsp_pipeline_en_5.5.0_3.0_1726677232702.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame that already contains the input text column(s)
# expected by this pipeline.
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge21_simsp_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame that already contains the input text column(s)
// expected by this pipeline.
val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge21_simsp_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge21_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1_PLPrefix0stlarge21_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md new file mode 100644 index 00000000000000..1c1a061bb1909f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean200 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en_5.5.0_3.0_1726630381060.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en_5.5.0_3.0_1726630381060.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean200","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean200", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st5sd_ut72ut5_PLPrefix0stlarge_simsp100_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st8sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st8sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md new file mode 100644 index 00000000000000..049b89c8e6db7b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st8sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st8sd_ut72ut1largepfxnf_simsp300_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st8sd_ut72ut1largepfxnf_simsp300_clean200_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st8sd_ut72ut1largepfxnf_simsp300_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st8sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en_5.5.0_3.0_1726630805093.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st8sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en_5.5.0_3.0_1726630805093.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame that already contains the input text column(s)
# expected by this pipeline.
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st8sd_ut72ut1largepfxnf_simsp300_clean200_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame that already contains the input text column(s)
// expected by this pipeline.
val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st8sd_ut72ut1largepfxnf_simsp300_clean200_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st8sd_ut72ut1largepfxnf_simsp300_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st8sd_ut72ut1largePfxNf_simsp300_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md new file mode 100644 index 00000000000000..57e8f83c4e63d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en_5.5.0_3.0_1726681369576.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en_5.5.0_3.0_1726681369576.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st8sd_ut72ut5_PLPrefix0stlarge_simsp100_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en.md new file mode 100644 index 00000000000000..46d855f1f9b688 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en_5.5.0_3.0_1726681383300.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en_5.5.0_3.0_1726681383300.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame that already contains the input text column(s)
# expected by this pipeline.
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame that already contains the input text column(s)
// expected by this pipeline.
val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st8sd_ut72ut5_PLPrefix0stlarge_simsp100_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_lora_merged_p10_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_lora_merged_p10_en.md new file mode 100644 index 00000000000000..f91f7a1fa7f9ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_lora_merged_p10_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_lora_merged_p10 DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_lora_merged_p10 +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_lora_merged_p10` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_lora_merged_p10_en_5.5.0_3.0_1726640971252.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_lora_merged_p10_en_5.5.0_3.0_1726640971252.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_lora_merged_p10","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_lora_merged_p10", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
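
The answer span produced by the question-answering pipeline above lands in the `answer` annotation column. The following is a minimal sketch, not part of the original card, and assumes the `pipelineDF` DataFrame from the Python example:

```python
# Minimal sketch (assumes `pipelineDF` from the Python example above).
# The `result` field of each `answer` annotation holds the extracted answer text.
pipelineDF.select("answer.result").show(truncate=False)
```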
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_lora_merged_p10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|237.6 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-lora-merged-p10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_lora_merged_p10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_lora_merged_p10_pipeline_en.md new file mode 100644 index 00000000000000..80dd67b53bd55c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_lora_merged_p10_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_lora_merged_p10_pipeline pipeline DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_lora_merged_p10_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_lora_merged_p10_pipeline` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_lora_merged_p10_pipeline_en_5.5.0_3.0_1726640985564.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_lora_merged_p10_pipeline_en_5.5.0_3.0_1726640985564.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame that already contains the input column(s)
# expected by this pipeline (for question answering, the question and context texts).
pipeline = PretrainedPipeline("distilbert_base_uncased_squad2_lora_merged_p10_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame that already contains the input column(s)
// expected by this pipeline (for question answering, the question and context texts).
val pipeline = new PretrainedPipeline("distilbert_base_uncased_squad2_lora_merged_p10_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_lora_merged_p10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|237.6 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-lora-merged-p10 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_lora_merged_p30_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_lora_merged_p30_pipeline_en.md new file mode 100644 index 00000000000000..dce8d95f8d946c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_lora_merged_p30_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_lora_merged_p30_pipeline pipeline DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_lora_merged_p30_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_lora_merged_p30_pipeline` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_lora_merged_p30_pipeline_en_5.5.0_3.0_1726644157355.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_lora_merged_p30_pipeline_en_5.5.0_3.0_1726644157355.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame that already contains the input column(s)
# expected by this pipeline (for question answering, the question and context texts).
pipeline = PretrainedPipeline("distilbert_base_uncased_squad2_lora_merged_p30_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame that already contains the input column(s)
// expected by this pipeline (for question answering, the question and context texts).
val pipeline = new PretrainedPipeline("distilbert_base_uncased_squad2_lora_merged_p30_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_lora_merged_p30_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|213.2 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-lora-merged-p30 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_lora_test_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_lora_test_en.md new file mode 100644 index 00000000000000..7b320270f00517 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_lora_test_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_lora_test DistilBertForQuestionAnswering from JeukHwang +author: John Snow Labs +name: distilbert_base_uncased_squad2_lora_test +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_lora_test` is a English model originally trained by JeukHwang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_lora_test_en_5.5.0_3.0_1726640863668.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_lora_test_en_5.5.0_3.0_1726640863668.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_lora_test","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_lora_test", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_lora_test| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|156.4 MB| + +## References + +https://huggingface.co/JeukHwang/distilbert-base-uncased-squad2-lora-test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_lora_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_lora_test_pipeline_en.md new file mode 100644 index 00000000000000..95547920b1fe2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_lora_test_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_lora_test_pipeline pipeline DistilBertForQuestionAnswering from JeukHwang +author: John Snow Labs +name: distilbert_base_uncased_squad2_lora_test_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_lora_test_pipeline` is a English model originally trained by JeukHwang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_lora_test_pipeline_en_5.5.0_3.0_1726640911028.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_lora_test_pipeline_en_5.5.0_3.0_1726640911028.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame that already contains the input column(s)
# expected by this pipeline (for question answering, the question and context texts).
pipeline = PretrainedPipeline("distilbert_base_uncased_squad2_lora_test_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame that already contains the input column(s)
// expected by this pipeline (for question answering, the question and context texts).
val pipeline = new PretrainedPipeline("distilbert_base_uncased_squad2_lora_test_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_lora_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|156.4 MB| + +## References + +https://huggingface.co/JeukHwang/distilbert-base-uncased-squad2-lora-test + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_p85_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_p85_en.md new file mode 100644 index 00000000000000..027399554dabf1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_p85_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_p85 DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_p85 +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_p85` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p85_en_5.5.0_3.0_1726641153424.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p85_en_5.5.0_3.0_1726641153424.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_p85","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_p85", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_p85| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|130.7 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-p85 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_p85_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_p85_pipeline_en.md new file mode 100644 index 00000000000000..3cc1f415cbd89c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_p85_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_p85_pipeline pipeline DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_p85_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_p85_pipeline` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p85_pipeline_en_5.5.0_3.0_1726641165642.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p85_pipeline_en_5.5.0_3.0_1726641165642.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame that already contains the input column(s)
# expected by this pipeline (for question answering, the question and context texts).
pipeline = PretrainedPipeline("distilbert_base_uncased_squad2_p85_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame that already contains the input column(s)
// expected by this pipeline (for question answering, the question and context texts).
val pipeline = new PretrainedPipeline("distilbert_base_uncased_squad2_p85_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_p85_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|130.7 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-p85 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_text_classification_v6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_text_classification_v6_pipeline_en.md new file mode 100644 index 00000000000000..41d84a8576fbce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_text_classification_v6_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_text_classification_v6_pipeline pipeline DistilBertForSequenceClassification from arjuntheprogrammer +author: John Snow Labs +name: distilbert_base_uncased_text_classification_v6_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_text_classification_v6_pipeline` is a English model originally trained by arjuntheprogrammer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_text_classification_v6_pipeline_en_5.5.0_3.0_1726669839593.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_text_classification_v6_pipeline_en_5.5.0_3.0_1726669839593.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame that already contains the input text column(s)
# expected by this pipeline.
pipeline = PretrainedPipeline("distilbert_base_uncased_text_classification_v6_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame that already contains the input text column(s)
// expected by this pipeline.
val pipeline = new PretrainedPipeline("distilbert_base_uncased_text_classification_v6_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_text_classification_v6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/arjuntheprogrammer/distilbert-base-uncased-text-classification-v6 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_travel_zphr_0st_ut102ut10_plain_simsp_clean_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_travel_zphr_0st_ut102ut10_plain_simsp_clean_en.md new file mode 100644 index 00000000000000..e54afd9daa0699 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_travel_zphr_0st_ut102ut10_plain_simsp_clean_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut102ut10_plain_simsp_clean DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut102ut10_plain_simsp_clean +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut102ut10_plain_simsp_clean` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut102ut10_plain_simsp_clean_en_5.5.0_3.0_1726696170130.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut102ut10_plain_simsp_clean_en_5.5.0_3.0_1726696170130.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut102ut10_plain_simsp_clean","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut102ut10_plain_simsp_clean", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut102ut10_plain_simsp_clean| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut102ut10_plain_simsp_clean \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainvalprefixlora_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainvalprefixlora_simsp_pipeline_en.md new file mode 100644 index 00000000000000..a216e0e332f808 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainvalprefixlora_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainvalprefixlora_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainvalprefixlora_simsp_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainvalprefixlora_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainvalprefixlora_simsp_pipeline_en_5.5.0_3.0_1726696542690.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainvalprefixlora_simsp_pipeline_en_5.5.0_3.0_1726696542690.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame that already contains the input text column(s)
# expected by this pipeline.
pipeline = PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainvalprefixlora_simsp_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame that already contains the input text column(s)
// expected by this pipeline.
val pipeline = new PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainvalprefixlora_simsp_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainvalprefixlora_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut102ut1_plainValPrefixLora_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_en.md new file mode 100644 index 00000000000000..df8a38d657e02a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_en_5.5.0_3.0_1726696336287.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_en_5.5.0_3.0_1726696336287.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assemble text into documents, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
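
To read off the predicted label for each row, the `class` column produced above can be flattened. This is a small follow-up sketch that reuses `pipelineDF` from the Python example; the metadata field is assumed to carry per-label scores, as is typical for this annotator family.

```python
# "result" holds the predicted label for each input row
pipelineDF.select("text", "class.result").show(truncate=False)

# the annotation metadata is assumed to include a confidence score per label
pipelineDF.selectExpr("explode(class) as prediction") \
    .select("prediction.result", "prediction.metadata") \
    .show(truncate=False)
```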
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut72ut1_ad7_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_pipeline_en.md new file mode 100644 index 00000000000000..7302ca13c10efe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_pipeline_en_5.5.0_3.0_1726696349747.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_pipeline_en_5.5.0_3.0_1726696349747.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut72ut1_ad7_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_work_zphr_0st_ut102ut10_plain_simsp_clean_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_work_zphr_0st_ut102ut10_plain_simsp_clean_en.md new file mode 100644 index 00000000000000..7adf6f2e8855ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_work_zphr_0st_ut102ut10_plain_simsp_clean_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_work_zphr_0st_ut102ut10_plain_simsp_clean DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_work_zphr_0st_ut102ut10_plain_simsp_clean +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_work_zphr_0st_ut102ut10_plain_simsp_clean` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_work_zphr_0st_ut102ut10_plain_simsp_clean_en_5.5.0_3.0_1726680629620.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_work_zphr_0st_ut102ut10_plain_simsp_clean_en_5.5.0_3.0_1726680629620.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assemble text into documents, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_work_zphr_0st_ut102ut10_plain_simsp_clean","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_work_zphr_0st_ut102ut10_plain_simsp_clean", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
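
For low-latency scoring of individual sentences outside a DataFrame, Spark NLP's `LightPipeline` can wrap the fitted `pipelineModel` from the example above. A brief sketch; the example sentence is only illustrative.

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# returns a dict keyed by output column, e.g. "document", "token" and "class"
print(light.annotate("I love spark-nlp"))
```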
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_work_zphr_0st_ut102ut10_plain_simsp_clean| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_work_zphr_0st_ut102ut10_plain_simsp_clean \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_work_zphr_0st_ut52ut1_ad7_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_work_zphr_0st_ut52ut1_ad7_simsp_en.md new file mode 100644 index 00000000000000..8b6da3265537a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_work_zphr_0st_ut52ut1_ad7_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_work_zphr_0st_ut52ut1_ad7_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_work_zphr_0st_ut52ut1_ad7_simsp +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_work_zphr_0st_ut52ut1_ad7_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_work_zphr_0st_ut52ut1_ad7_simsp_en_5.5.0_3.0_1726677493037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_work_zphr_0st_ut52ut1_ad7_simsp_en_5.5.0_3.0_1726677493037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assemble text into documents, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_work_zphr_0st_ut52ut1_ad7_simsp","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_work_zphr_0st_ut52ut1_ad7_simsp", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_work_zphr_0st_ut52ut1_ad7_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_work_zphr_0st_ut52ut1_ad7_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_birads_eco_mamo_1_descartado_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_birads_eco_mamo_1_descartado_pipeline_en.md new file mode 100644 index 00000000000000..91bb86fa3e70e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_birads_eco_mamo_1_descartado_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_birads_eco_mamo_1_descartado_pipeline pipeline DistilBertForSequenceClassification from sara-m98 +author: John Snow Labs +name: distilbert_birads_eco_mamo_1_descartado_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_birads_eco_mamo_1_descartado_pipeline` is a English model originally trained by sara-m98. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_birads_eco_mamo_1_descartado_pipeline_en_5.5.0_3.0_1726676880035.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_birads_eco_mamo_1_descartado_pipeline_en_5.5.0_3.0_1726676880035.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("distilbert_birads_eco_mamo_1_descartado_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("distilbert_birads_eco_mamo_1_descartado_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
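
`PretrainedPipeline` also exposes `annotate` for quick, single-string inference without building a DataFrame. A minimal sketch; the sentence is only a placeholder and the `class` output key is assumed from the pipeline's included sequence classifier.

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_birads_eco_mamo_1_descartado_pipeline", lang = "en")
# returns a dict with one entry per output column of the pipeline
result = pipeline.annotate("Solid nodule with regular margins in the right breast.")
print(result["class"])
```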
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_birads_eco_mamo_1_descartado_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sara-m98/DISTILBERT_BIRADS_ECO_MAMO_1_DESCARTADO + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_emotion_muratkznc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_emotion_muratkznc_pipeline_en.md new file mode 100644 index 00000000000000..3e3a65258d4ccb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_emotion_muratkznc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_emotion_muratkznc_pipeline pipeline DistilBertForSequenceClassification from MuratKZNC +author: John Snow Labs +name: distilbert_emotion_muratkznc_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_muratkznc_pipeline` is a English model originally trained by MuratKZNC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_muratkznc_pipeline_en_5.5.0_3.0_1726677300026.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_muratkznc_pipeline_en_5.5.0_3.0_1726677300026.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("distilbert_emotion_muratkznc_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("distilbert_emotion_muratkznc_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_muratkznc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MuratKZNC/distilbert-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_emotion_neelaa_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_emotion_neelaa_en.md new file mode 100644 index 00000000000000..3fd9ca3b63ce01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_emotion_neelaa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotion_neelaa DistilBertForSequenceClassification from neelaa +author: John Snow Labs +name: distilbert_emotion_neelaa +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_neelaa` is a English model originally trained by neelaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_neelaa_en_5.5.0_3.0_1726625667056.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_neelaa_en_5.5.0_3.0_1726625667056.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assemble text into documents, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_neelaa","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_neelaa", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_neelaa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/neelaa/distilbert-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_emotion_neelaa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_emotion_neelaa_pipeline_en.md new file mode 100644 index 00000000000000..e0b03ecbc9b1a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_emotion_neelaa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_emotion_neelaa_pipeline pipeline DistilBertForSequenceClassification from neelaa +author: John Snow Labs +name: distilbert_emotion_neelaa_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_neelaa_pipeline` is a English model originally trained by neelaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_neelaa_pipeline_en_5.5.0_3.0_1726625679242.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_neelaa_pipeline_en_5.5.0_3.0_1726625679242.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("distilbert_emotion_neelaa_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("distilbert_emotion_neelaa_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
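
`annotate` also accepts a list of strings, which is handy for scoring a handful of texts at once. A short sketch; the example sentences are illustrative and the `class` output key is assumed from the included sequence classifier.

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_emotion_neelaa_pipeline", lang = "en")
texts = ["I am so happy today!", "This is really frustrating."]
# one result dict per input text
for row in pipeline.annotate(texts):
    print(row["class"])
```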
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_neelaa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/neelaa/distilbert-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_emotions_fellowship_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_emotions_fellowship_en.md new file mode 100644 index 00000000000000..b8f8d119d8df24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_emotions_fellowship_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotions_fellowship DistilBertForSequenceClassification from Valwolfor +author: John Snow Labs +name: distilbert_emotions_fellowship +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotions_fellowship` is a English model originally trained by Valwolfor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotions_fellowship_en_5.5.0_3.0_1726670155995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotions_fellowship_en_5.5.0_3.0_1726670155995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assemble text into documents, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotions_fellowship","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotions_fellowship", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotions_fellowship| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Valwolfor/distilbert_emotions_fellowship \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_foundation_category_c6_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_foundation_category_c6_en.md new file mode 100644 index 00000000000000..9c548dba194ff0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_foundation_category_c6_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_foundation_category_c6 DistilBertForSequenceClassification from eric-mc2 +author: John Snow Labs +name: distilbert_foundation_category_c6 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_foundation_category_c6` is a English model originally trained by eric-mc2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_foundation_category_c6_en_5.5.0_3.0_1726695903648.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_foundation_category_c6_en_5.5.0_3.0_1726695903648.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assemble text into documents, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_foundation_category_c6","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_foundation_category_c6", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
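
Beyond the predicted label, each `class` annotation carries metadata that, for this family of sequence classifiers, typically includes a score per label. A hedged sketch that reuses `pipelineDF` from the example above:

```python
from pyspark.sql import functions as F

# flatten the annotation array and show the label together with its metadata
pipelineDF.select(F.explode("class").alias("pred")) \
    .select("pred.result", "pred.metadata") \
    .show(truncate=False)
```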
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_foundation_category_c6| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/eric-mc2/distilbert-foundation-category-c6 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_imdb_huiang_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_imdb_huiang_en.md new file mode 100644 index 00000000000000..48c539aea054d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_imdb_huiang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_imdb_huiang DistilBertForSequenceClassification from huiang +author: John Snow Labs +name: distilbert_imdb_huiang +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_huiang` is a English model originally trained by huiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_huiang_en_5.5.0_3.0_1726630972201.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_huiang_en_5.5.0_3.0_1726630972201.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assemble text into documents, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_huiang","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_huiang", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
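
Since the fitted pipeline is an ordinary Spark `PipelineModel`, it can be reused on any DataFrame with a `text` column, for example a small batch of movie reviews. An illustrative sketch that reuses `pipelineModel` from the example above; the review texts are made up.

```python
reviews = spark.createDataFrame([
    ["One of the best films I have seen in years."],
    ["Dull plot and wooden acting, I walked out halfway."],
]).toDF("text")

pipelineModel.transform(reviews).select("text", "class.result").show(truncate=False)
```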
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_huiang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/huiang/distilbert-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_imdb_huiang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_imdb_huiang_pipeline_en.md new file mode 100644 index 00000000000000..ec2f5f29f975a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_imdb_huiang_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_imdb_huiang_pipeline pipeline DistilBertForSequenceClassification from huiang +author: John Snow Labs +name: distilbert_imdb_huiang_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_huiang_pipeline` is a English model originally trained by huiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_huiang_pipeline_en_5.5.0_3.0_1726630984324.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_huiang_pipeline_en_5.5.0_3.0_1726630984324.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("distilbert_imdb_huiang_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("distilbert_imdb_huiang_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_huiang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/huiang/distilbert-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_imdb_padding80model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_imdb_padding80model_pipeline_en.md new file mode 100644 index 00000000000000..7588ed499b7470 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_imdb_padding80model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_imdb_padding80model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_imdb_padding80model_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_padding80model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_padding80model_pipeline_en_5.5.0_3.0_1726680243855.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_padding80model_pipeline_en_5.5.0_3.0_1726680243855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("distilbert_imdb_padding80model_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("distilbert_imdb_padding80model_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_padding80model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_imdb_padding80model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_imdb_ranzuh_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_imdb_ranzuh_en.md new file mode 100644 index 00000000000000..5203ce11d060af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_imdb_ranzuh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_imdb_ranzuh DistilBertForSequenceClassification from ranzuh +author: John Snow Labs +name: distilbert_imdb_ranzuh +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_ranzuh` is a English model originally trained by ranzuh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_ranzuh_en_5.5.0_3.0_1726677195608.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_ranzuh_en_5.5.0_3.0_1726677195608.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assemble text into documents, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_ranzuh","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_ranzuh", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_ranzuh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ranzuh/distilbert-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_label_manipulation_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_label_manipulation_en.md new file mode 100644 index 00000000000000..ea234a1a280b8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_label_manipulation_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_label_manipulation DistilBertForSequenceClassification from EllipticCurve +author: John Snow Labs +name: distilbert_label_manipulation +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_label_manipulation` is a English model originally trained by EllipticCurve. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_label_manipulation_en_5.5.0_3.0_1726630679706.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_label_manipulation_en_5.5.0_3.0_1726630679706.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assemble text into documents, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_label_manipulation","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_label_manipulation", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_label_manipulation| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/EllipticCurve/DistilBERT-label-manipulation \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_lr_linear_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_lr_linear_en.md new file mode 100644 index 00000000000000..01fb1eb34e5458 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_lr_linear_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_lr_linear DistilBertForSequenceClassification from K-kiron +author: John Snow Labs +name: distilbert_lr_linear +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_lr_linear` is a English model originally trained by K-kiron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_lr_linear_en_5.5.0_3.0_1726625556126.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_lr_linear_en_5.5.0_3.0_1726625556126.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assemble text into documents, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_lr_linear","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_lr_linear", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_lr_linear| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/K-kiron/distilbert-lr-linear \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_mental_health_classification_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_mental_health_classification_en.md new file mode 100644 index 00000000000000..452381e21a221a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_mental_health_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_mental_health_classification DistilBertForSequenceClassification from AnuradhaPoddar +author: John Snow Labs +name: distilbert_mental_health_classification +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_mental_health_classification` is a English model originally trained by AnuradhaPoddar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_mental_health_classification_en_5.5.0_3.0_1726681577996.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_mental_health_classification_en_5.5.0_3.0_1726681577996.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assemble text into documents, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_mental_health_classification","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_mental_health_classification", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
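
The fitted pipeline can also be persisted and reloaded like any Spark ML model, which avoids re-downloading the pretrained weights on every run. A minimal sketch that reuses `pipelineModel` and `data` from the example above; the save path is only illustrative.

```python
from pyspark.ml import PipelineModel

# illustrative local path; an HDFS, S3 or DBFS location works as well
pipelineModel.write().overwrite().save("/tmp/distilbert_mental_health_pipeline")

reloaded = PipelineModel.load("/tmp/distilbert_mental_health_pipeline")
reloaded.transform(data).select("class.result").show()
```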
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_mental_health_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AnuradhaPoddar/distilbert_mental_health_classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_mental_health_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_mental_health_classification_pipeline_en.md new file mode 100644 index 00000000000000..7c4804179b245e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_mental_health_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_mental_health_classification_pipeline pipeline DistilBertForSequenceClassification from AnuradhaPoddar +author: John Snow Labs +name: distilbert_mental_health_classification_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_mental_health_classification_pipeline` is a English model originally trained by AnuradhaPoddar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_mental_health_classification_pipeline_en_5.5.0_3.0_1726681592769.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_mental_health_classification_pipeline_en_5.5.0_3.0_1726681592769.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("distilbert_mental_health_classification_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("distilbert_mental_health_classification_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_mental_health_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AnuradhaPoddar/distilbert_mental_health_classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_qa_pytorch_seed_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_qa_pytorch_seed_pipeline_en.md new file mode 100644 index 00000000000000..76b28cabd3418a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_qa_pytorch_seed_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_qa_pytorch_seed_pipeline pipeline DistilBertForQuestionAnswering from tyavika +author: John Snow Labs +name: distilbert_qa_pytorch_seed_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_qa_pytorch_seed_pipeline` is a English model originally trained by tyavika. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_qa_pytorch_seed_pipeline_en_5.5.0_3.0_1726641002776.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_qa_pytorch_seed_pipeline_en_5.5.0_3.0_1726641002776.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the question/context pairs expected by the
# pipeline's MultiDocumentAssembler (see the sketch after this example)
pipeline = PretrainedPipeline("distilbert_qa_pytorch_seed_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame holding the question/context pairs expected by the pipeline
val pipeline = new PretrainedPipeline("distilbert_qa_pytorch_seed_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
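
Unlike the classification pipelines above, this one wraps a MultiDocumentAssembler and a DistilBertForQuestionAnswering stage, so it expects a question together with its context. The sketch below assumes the assembler reads `question` and `context` columns and that the answering stage writes an `answer` column; these names follow the usual convention for such QA pipelines but are not confirmed by this card.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# assumed column names: "question" and "context"
df = spark.createDataFrame(
    [["What is my name?", "My name is Clara and I live in Berkeley."]]
).toDF("question", "context")

pipeline = PretrainedPipeline("distilbert_qa_pytorch_seed_pipeline", lang = "en")
pipeline.transform(df).select("answer.result").show(truncate=False)
```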
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_qa_pytorch_seed_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/tyavika/Distilbert-QA-Pytorch-seed + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_rte_192_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_rte_192_pipeline_en.md new file mode 100644 index 00000000000000..82ec8ef119c369 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_rte_192_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_rte_192_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_rte_192_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_rte_192_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_rte_192_pipeline_en_5.5.0_3.0_1726677578292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_rte_192_pipeline_en_5.5.0_3.0_1726677578292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_rte_192_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_rte_192_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_rte_192_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|52.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_data_aug_rte_192 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_192_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_192_pipeline_en.md new file mode 100644 index 00000000000000..5b8799db8d5401 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_192_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_192_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_192_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_192_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_192_pipeline_en_5.5.0_3.0_1726680212615.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_192_pipeline_en_5.5.0_3.0_1726680212615.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with the input text (typically in a column named "text")
pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_192_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame with the input text (typically in a column named "text")
val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_192_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_192_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|52.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_stsb_192 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_sentiment_analysis_ellipticcurve_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_sentiment_analysis_ellipticcurve_pipeline_en.md new file mode 100644 index 00000000000000..a963cb3c5498a3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_sentiment_analysis_ellipticcurve_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sentiment_analysis_ellipticcurve_pipeline pipeline DistilBertForSequenceClassification from EllipticCurve +author: John Snow Labs +name: distilbert_sentiment_analysis_ellipticcurve_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sentiment_analysis_ellipticcurve_pipeline` is a English model originally trained by EllipticCurve. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sentiment_analysis_ellipticcurve_pipeline_en_5.5.0_3.0_1726669899011.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sentiment_analysis_ellipticcurve_pipeline_en_5.5.0_3.0_1726669899011.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with the input text (typically in a column named "text")
pipeline = PretrainedPipeline("distilbert_sentiment_analysis_ellipticcurve_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame with the input text (typically in a column named "text")
val pipeline = new PretrainedPipeline("distilbert_sentiment_analysis_ellipticcurve_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sentiment_analysis_ellipticcurve_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/EllipticCurve/DistilBERT-sentiment-analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_toxicity_classification_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_toxicity_classification_en.md new file mode 100644 index 00000000000000..3c1ba7350c3add --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_toxicity_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_toxicity_classification DistilBertForSequenceClassification from newsmediabias +author: John Snow Labs +name: distilbert_toxicity_classification +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_toxicity_classification` is a English model originally trained by newsmediabias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_toxicity_classification_en_5.5.0_3.0_1726625254125.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_toxicity_classification_en_5.5.0_3.0_1726625254125.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier consumes the assembler/tokenizer outputs ("document", "token")
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_toxicity_classification","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_toxicity_classification", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
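
The transformed DataFrame stores predictions as annotations in the `class` column configured above; one quick way to pull out just the predicted labels:

```python
# Show the input text next to the predicted label(s) held in the "class" annotations.
pipelineDF.select("text", "class.result").show(truncate = False)
```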
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_toxicity_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/newsmediabias/DistilBert_Toxicity_Classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_toxicity_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_toxicity_classification_pipeline_en.md new file mode 100644 index 00000000000000..3d57b42e914677 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_toxicity_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_toxicity_classification_pipeline pipeline DistilBertForSequenceClassification from newsmediabias +author: John Snow Labs +name: distilbert_toxicity_classification_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_toxicity_classification_pipeline` is a English model originally trained by newsmediabias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_toxicity_classification_pipeline_en_5.5.0_3.0_1726625269654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_toxicity_classification_pipeline_en_5.5.0_3.0_1726625269654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with the input text (typically in a column named "text")
pipeline = PretrainedPipeline("distilbert_toxicity_classification_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame with the input text (typically in a column named "text")
val pipeline = new PretrainedPipeline("distilbert_toxicity_classification_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_toxicity_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/newsmediabias/DistilBert_Toxicity_Classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_turkish_turkish_news_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_turkish_turkish_news_pipeline_tr.md new file mode 100644 index 00000000000000..617611e3d88b53 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_turkish_turkish_news_pipeline_tr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Turkish distilbert_turkish_turkish_news_pipeline pipeline DistilBertForSequenceClassification from anilguven +author: John Snow Labs +name: distilbert_turkish_turkish_news_pipeline +date: 2024-09-18 +tags: [tr, open_source, pipeline, onnx] +task: Text Classification +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_turkish_turkish_news_pipeline` is a Turkish model originally trained by anilguven. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_turkish_turkish_news_pipeline_tr_5.5.0_3.0_1726676883591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_turkish_turkish_news_pipeline_tr_5.5.0_3.0_1726676883591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with the input text (typically in a column named "text")
pipeline = PretrainedPipeline("distilbert_turkish_turkish_news_pipeline", lang = "tr")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame with the input text (typically in a column named "text")
val pipeline = new PretrainedPipeline("distilbert_turkish_turkish_news_pipeline", lang = "tr")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_turkish_turkish_news_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|254.1 MB| + +## References + +https://huggingface.co/anilguven/distilbert_tr_turkish_news + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_turkish_turkish_news_tr.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_turkish_turkish_news_tr.md new file mode 100644 index 00000000000000..72ac1b7de7193a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_turkish_turkish_news_tr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Turkish distilbert_turkish_turkish_news DistilBertForSequenceClassification from anilguven +author: John Snow Labs +name: distilbert_turkish_turkish_news +date: 2024-09-18 +tags: [tr, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_turkish_turkish_news` is a Turkish model originally trained by anilguven. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_turkish_turkish_news_tr_5.5.0_3.0_1726676870194.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_turkish_turkish_news_tr_5.5.0_3.0_1726676870194.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier consumes the assembler/tokenizer outputs ("document", "token")
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_turkish_turkish_news","tr") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_turkish_turkish_news", "tr")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_turkish_turkish_news| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|tr| +|Size:|254.1 MB| + +## References + +https://huggingface.co/anilguven/distilbert_tr_turkish_news \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_twitterfin_padding90model_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_twitterfin_padding90model_en.md new file mode 100644 index 00000000000000..31eebf17de7682 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_twitterfin_padding90model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_twitterfin_padding90model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_twitterfin_padding90model +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_twitterfin_padding90model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding90model_en_5.5.0_3.0_1726695452851.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding90model_en_5.5.0_3.0_1726695452851.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier consumes the assembler/tokenizer outputs ("document", "token")
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_twitterfin_padding90model","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_twitterfin_padding90model", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_twitterfin_padding90model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_twitterfin_padding90model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbertfinetunehs3e8bhlr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbertfinetunehs3e8bhlr_pipeline_en.md new file mode 100644 index 00000000000000..f41fbe3ffb8f64 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbertfinetunehs3e8bhlr_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbertfinetunehs3e8bhlr_pipeline pipeline DistilBertForQuestionAnswering from KarthikAlagarsamy +author: John Snow Labs +name: distilbertfinetunehs3e8bhlr_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbertfinetunehs3e8bhlr_pipeline` is a English model originally trained by KarthikAlagarsamy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbertfinetunehs3e8bhlr_pipeline_en_5.5.0_3.0_1726644347206.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbertfinetunehs3e8bhlr_pipeline_en_5.5.0_3.0_1726644347206.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with the question/context inputs consumed by the pipeline's MultiDocumentAssembler
pipeline = PretrainedPipeline("distilbertfinetunehs3e8bhlr_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame with the question/context inputs consumed by the pipeline's MultiDocumentAssembler
val pipeline = new PretrainedPipeline("distilbertfinetunehs3e8bhlr_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
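
This is a question-answering pipeline, so `df` needs question/context inputs rather than a single text column. A hedged sketch of preparing such input is shown below; the `question`, `context`, and `answer` column names are assumptions and should be checked against the pipeline's stages:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Hypothetical column names for the MultiDocumentAssembler inputs.
df = spark.createDataFrame(
    [["What is my name?", "My name is Clara and I live in Berkeley."]]
).toDF("question", "context")

pipeline = PretrainedPipeline("distilbertfinetunehs3e8bhlr_pipeline", lang = "en")

# "answer" is the assumed output column of DistilBertForQuestionAnswering.
pipeline.transform(df).select("answer.result").show(truncate = False)
```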
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbertfinetunehs3e8bhlr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/KarthikAlagarsamy/distilbertfinetuneHS3E8BHLR + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbertforclassification_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbertforclassification_en.md new file mode 100644 index 00000000000000..faecc928514815 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbertforclassification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbertforclassification DistilBertForSequenceClassification from poooj +author: John Snow Labs +name: distilbertforclassification +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbertforclassification` is a English model originally trained by poooj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbertforclassification_en_5.5.0_3.0_1726680509003.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbertforclassification_en_5.5.0_3.0_1726680509003.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier consumes the assembler/tokenizer outputs ("document", "token")
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbertforclassification","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbertforclassification", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbertforclassification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/poooj/DistilBERTForClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbertmultilang_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbertmultilang_en.md new file mode 100644 index 00000000000000..2e70def74f9c6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbertmultilang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbertmultilang DistilBertForSequenceClassification from baihaqy +author: John Snow Labs +name: distilbertmultilang +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbertmultilang` is a English model originally trained by baihaqy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbertmultilang_en_5.5.0_3.0_1726696128351.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbertmultilang_en_5.5.0_3.0_1726696128351.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier consumes the assembler/tokenizer outputs ("document", "token")
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbertmultilang","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbertmultilang", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbertmultilang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/baihaqy/distilbertmultilang \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbertmultilang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbertmultilang_pipeline_en.md new file mode 100644 index 00000000000000..a9551589663f46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbertmultilang_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbertmultilang_pipeline pipeline DistilBertForSequenceClassification from baihaqy +author: John Snow Labs +name: distilbertmultilang_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbertmultilang_pipeline` is a English model originally trained by baihaqy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbertmultilang_pipeline_en_5.5.0_3.0_1726696153251.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbertmultilang_pipeline_en_5.5.0_3.0_1726696153251.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with the input text (typically in a column named "text")
pipeline = PretrainedPipeline("distilbertmultilang_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame with the input text (typically in a column named "text")
val pipeline = new PretrainedPipeline("distilbertmultilang_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbertmultilang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/baihaqy/distilbertmultilang + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilkobert_ep4_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilkobert_ep4_en.md new file mode 100644 index 00000000000000..a821b4a394611f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilkobert_ep4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilkobert_ep4 DistilBertForSequenceClassification from yeye776 +author: John Snow Labs +name: distilkobert_ep4 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilkobert_ep4` is a English model originally trained by yeye776. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilkobert_ep4_en_5.5.0_3.0_1726681677016.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilkobert_ep4_en_5.5.0_3.0_1726681677016.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier consumes the assembler/tokenizer outputs ("document", "token")
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilkobert_ep4","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilkobert_ep4", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilkobert_ep4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|106.5 MB| + +## References + +https://huggingface.co/yeye776/DistilKoBERT-ep4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_fb_housing_posts_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_fb_housing_posts_en.md new file mode 100644 index 00000000000000..67594f95127d0a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_fb_housing_posts_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_fb_housing_posts RoBertaForSequenceClassification from hoaj +author: John Snow Labs +name: distilroberta_base_fb_housing_posts +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_fb_housing_posts` is a English model originally trained by hoaj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_fb_housing_posts_en_5.5.0_3.0_1726622372318.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_fb_housing_posts_en_5.5.0_3.0_1726622372318.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier consumes the assembler/tokenizer outputs ("document", "token")
sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_base_fb_housing_posts","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_base_fb_housing_posts", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
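
For scoring a handful of strings without building a DataFrame, the fitted model from the example above can be wrapped in a `LightPipeline`; a small sketch:

```python
from sparknlp.base import LightPipeline

# In-memory inference on a single string; the returned dict is keyed by the output columns.
light = LightPipeline(pipelineModel)
print(light.annotate("I love spark-nlp")["class"])
```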
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_fb_housing_posts| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/hoaj/distilroberta-base-fb-housing-posts \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_fb_housing_posts_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_fb_housing_posts_pipeline_en.md new file mode 100644 index 00000000000000..3587788e6cdb6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_fb_housing_posts_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_fb_housing_posts_pipeline pipeline RoBertaForSequenceClassification from hoaj +author: John Snow Labs +name: distilroberta_base_fb_housing_posts_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_fb_housing_posts_pipeline` is a English model originally trained by hoaj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_fb_housing_posts_pipeline_en_5.5.0_3.0_1726622386912.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_fb_housing_posts_pipeline_en_5.5.0_3.0_1726622386912.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with the input text (typically in a column named "text")
pipeline = PretrainedPipeline("distilroberta_base_fb_housing_posts_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame with the input text (typically in a column named "text")
val pipeline = new PretrainedPipeline("distilroberta_base_fb_housing_posts_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_fb_housing_posts_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.1 MB| + +## References + +https://huggingface.co/hoaj/distilroberta-base-fb-housing-posts + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_finetuned_agnews_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_finetuned_agnews_en.md new file mode 100644 index 00000000000000..f2dcff2aadbbc8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_finetuned_agnews_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_finetuned_agnews RoBertaForSequenceClassification from tamhuynh27 +author: John Snow Labs +name: distilroberta_base_finetuned_agnews +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_agnews` is a English model originally trained by tamhuynh27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_agnews_en_5.5.0_3.0_1726665829805.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_agnews_en_5.5.0_3.0_1726665829805.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier consumes the assembler/tokenizer outputs ("document", "token")
sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_base_finetuned_agnews","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_base_finetuned_agnews", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_agnews| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.7 MB| + +## References + +https://huggingface.co/tamhuynh27/distilroberta-base-finetuned-agnews \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_ft_conservatives_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_ft_conservatives_pipeline_en.md new file mode 100644 index 00000000000000..f443ffe4e0987a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_ft_conservatives_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_ft_conservatives_pipeline pipeline RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_conservatives_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_conservatives_pipeline` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_conservatives_pipeline_en_5.5.0_3.0_1726677977584.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_conservatives_pipeline_en_5.5.0_3.0_1726677977584.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with the input text (typically in a column named "text")
pipeline = PretrainedPipeline("distilroberta_base_ft_conservatives_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame with the input text (typically in a column named "text")
val pipeline = new PretrainedPipeline("distilroberta_base_ft_conservatives_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_conservatives_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-conservatives + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_mrpc_glue_alvaro_castillo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_mrpc_glue_alvaro_castillo_pipeline_en.md new file mode 100644 index 00000000000000..70642d29983d3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_mrpc_glue_alvaro_castillo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_mrpc_glue_alvaro_castillo_pipeline pipeline RoBertaForSequenceClassification from Mrbanano +author: John Snow Labs +name: distilroberta_base_mrpc_glue_alvaro_castillo_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_mrpc_glue_alvaro_castillo_pipeline` is a English model originally trained by Mrbanano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_mrpc_glue_alvaro_castillo_pipeline_en_5.5.0_3.0_1726690251747.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_mrpc_glue_alvaro_castillo_pipeline_en_5.5.0_3.0_1726690251747.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with the input text (typically in a column named "text")
pipeline = PretrainedPipeline("distilroberta_base_mrpc_glue_alvaro_castillo_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame with the input text (typically in a column named "text")
val pipeline = new PretrainedPipeline("distilroberta_base_mrpc_glue_alvaro_castillo_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_mrpc_glue_alvaro_castillo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/Mrbanano/distilroberta-base-mrpc-glue-alvaro-castillo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_rb156k_ep40_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_rb156k_ep40_en.md new file mode 100644 index 00000000000000..2d6fecc548f9ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_rb156k_ep40_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_rb156k_ep40 RoBertaEmbeddings from judy93536 +author: John Snow Labs +name: distilroberta_base_rb156k_ep40 +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_rb156k_ep40` is a English model originally trained by judy93536. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_rb156k_ep40_en_5.5.0_3.0_1726626698360.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_rb156k_ep40_en_5.5.0_3.0_1726626698360.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_rb156k_ep40","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_rb156k_ep40","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
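
Each token receives a vector in the `embeddings` output column of `pipelineDF`; a quick way to flatten the annotations into token/vector pairs (a sketch based on the example above):

```python
from pyspark.sql.functions import explode

# One row per token annotation, with the token text and its embedding vector.
tokens = pipelineDF.select(explode("embeddings").alias("emb"))
tokens.selectExpr("emb.result AS token", "emb.embeddings AS vector").show(truncate = 80)
```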
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_rb156k_ep40| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.0 MB| + +## References + +https://huggingface.co/judy93536/distilroberta-base-rb156k-ep40 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_rocstories_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_rocstories_pipeline_en.md new file mode 100644 index 00000000000000..051d77262a2475 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_rocstories_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_rocstories_pipeline pipeline RoBertaForSequenceClassification from KeiHeityuu +author: John Snow Labs +name: distilroberta_base_rocstories_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_rocstories_pipeline` is a English model originally trained by KeiHeityuu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_rocstories_pipeline_en_5.5.0_3.0_1726665755403.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_rocstories_pipeline_en_5.5.0_3.0_1726665755403.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with the input text (typically in a column named "text")
pipeline = PretrainedPipeline("distilroberta_base_rocstories_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame with the input text (typically in a column named "text")
val pipeline = new PretrainedPipeline("distilroberta_base_rocstories_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_rocstories_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.7 MB| + +## References + +https://huggingface.co/KeiHeityuu/distilroberta-base-rocstories + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-divya_resume_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-divya_resume_model_pipeline_en.md new file mode 100644 index 00000000000000..947e0d9d46651a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-divya_resume_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English divya_resume_model_pipeline pipeline DistilBertForSequenceClassification from Divyaamith +author: John Snow Labs +name: divya_resume_model_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`divya_resume_model_pipeline` is a English model originally trained by Divyaamith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/divya_resume_model_pipeline_en_5.5.0_3.0_1726676875580.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/divya_resume_model_pipeline_en_5.5.0_3.0_1726676875580.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with the input text (typically in a column named "text")
pipeline = PretrainedPipeline("divya_resume_model_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame with the input text (typically in a column named "text")
val pipeline = new PretrainedPipeline("divya_resume_model_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|divya_resume_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Divyaamith/divya_resume_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-ecomm_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-ecomm_model_pipeline_en.md new file mode 100644 index 00000000000000..b9eb9fad00e69d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-ecomm_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ecomm_model_pipeline pipeline DistilBertForSequenceClassification from aaanhnht +author: John Snow Labs +name: ecomm_model_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ecomm_model_pipeline` is a English model originally trained by aaanhnht. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ecomm_model_pipeline_en_5.5.0_3.0_1726625870080.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ecomm_model_pipeline_en_5.5.0_3.0_1726625870080.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with the input text (typically in a column named "text")
pipeline = PretrainedPipeline("ecomm_model_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame with the input text (typically in a column named "text")
val pipeline = new PretrainedPipeline("ecomm_model_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ecomm_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aaanhnht/ecomm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-eemm_dimensions_12012024_en.md b/docs/_posts/ahmedlone127/2024-09-18-eemm_dimensions_12012024_en.md new file mode 100644 index 00000000000000..988cce21f65ca3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-eemm_dimensions_12012024_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English eemm_dimensions_12012024 DistilBertForSequenceClassification from chernandezc +author: John Snow Labs +name: eemm_dimensions_12012024 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`eemm_dimensions_12012024` is a English model originally trained by chernandezc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/eemm_dimensions_12012024_en_5.5.0_3.0_1726625457393.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/eemm_dimensions_12012024_en_5.5.0_3.0_1726625457393.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("eemm_dimensions_12012024","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("eemm_dimensions_12012024", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
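
After `transform`, the predictions live in the `class` column as Spark NLP annotations. A minimal sketch of reading them back out, reusing the `pipelineDF` from the example above (the `result` field holds the predicted label and `metadata` holds the per-label scores):

```python
# "class" is an array of annotations; pull out the label and its scores
pipelineDF.select("text", "class.result", "class.metadata").show(truncate=False)
```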
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|eemm_dimensions_12012024| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chernandezc/EEMM_Dimensions_12012024 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-elicitsbackgroundknowledge_a6000_0_00001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-elicitsbackgroundknowledge_a6000_0_00001_pipeline_en.md new file mode 100644 index 00000000000000..91b23dda34fc3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-elicitsbackgroundknowledge_a6000_0_00001_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English elicitsbackgroundknowledge_a6000_0_00001_pipeline pipeline RoBertaForSequenceClassification from rose-e-wang +author: John Snow Labs +name: elicitsbackgroundknowledge_a6000_0_00001_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`elicitsbackgroundknowledge_a6000_0_00001_pipeline` is a English model originally trained by rose-e-wang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/elicitsbackgroundknowledge_a6000_0_00001_pipeline_en_5.5.0_3.0_1726627656636.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/elicitsbackgroundknowledge_a6000_0_00001_pipeline_en_5.5.0_3.0_1726627656636.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with the input text in a column named "text"
pipeline = PretrainedPipeline("elicitsbackgroundknowledge_a6000_0_00001_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame with the input text in a column named "text"
val pipeline = new PretrainedPipeline("elicitsbackgroundknowledge_a6000_0_00001_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|elicitsbackgroundknowledge_a6000_0_00001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/rose-e-wang/elicitsBackgroundKnowledge_a6000_0.00001 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-ellis_v1_emotion_leadership12_en.md b/docs/_posts/ahmedlone127/2024-09-18-ellis_v1_emotion_leadership12_en.md new file mode 100644 index 00000000000000..6acb932a285428 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-ellis_v1_emotion_leadership12_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ellis_v1_emotion_leadership12 DistilBertForSequenceClassification from gsl22 +author: John Snow Labs +name: ellis_v1_emotion_leadership12 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ellis_v1_emotion_leadership12` is a English model originally trained by gsl22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ellis_v1_emotion_leadership12_en_5.5.0_3.0_1726696524612.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ellis_v1_emotion_leadership12_en_5.5.0_3.0_1726696524612.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("ellis_v1_emotion_leadership12","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("ellis_v1_emotion_leadership12", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ellis_v1_emotion_leadership12| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gsl22/ellis-v1-emotion-leadership12 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-emobert_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-emobert_english_pipeline_en.md new file mode 100644 index 00000000000000..b18c9281e475d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-emobert_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emobert_english_pipeline pipeline RoBertaForSequenceClassification from NLPinas +author: John Snow Labs +name: emobert_english_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emobert_english_pipeline` is a English model originally trained by NLPinas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emobert_english_pipeline_en_5.5.0_3.0_1726627525323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emobert_english_pipeline_en_5.5.0_3.0_1726627525323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with the input text in a column named "text"
pipeline = PretrainedPipeline("emobert_english_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame with the input text in a column named "text"
val pipeline = new PretrainedPipeline("emobert_english_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emobert_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|420.8 MB| + +## References + +https://huggingface.co/NLPinas/EMoBERT-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-emoji_emoji_random2_seed1_bernice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-emoji_emoji_random2_seed1_bernice_pipeline_en.md new file mode 100644 index 00000000000000..c15e08490e60ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-emoji_emoji_random2_seed1_bernice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emoji_emoji_random2_seed1_bernice_pipeline pipeline XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random2_seed1_bernice_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random2_seed1_bernice_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random2_seed1_bernice_pipeline_en_5.5.0_3.0_1726697662733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random2_seed1_bernice_pipeline_en_5.5.0_3.0_1726697662733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with the input text in a column named "text"
pipeline = PretrainedPipeline("emoji_emoji_random2_seed1_bernice_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame with the input text in a column named "text"
val pipeline = new PretrainedPipeline("emoji_emoji_random2_seed1_bernice_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random2_seed1_bernice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|825.2 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random2_seed1-bernice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-emotions_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-18-emotions_classifier_en.md new file mode 100644 index 00000000000000..5f3d0384300ed4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-emotions_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emotions_classifier DistilBertForSequenceClassification from XtraPatrick987 +author: John Snow Labs +name: emotions_classifier +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotions_classifier` is a English model originally trained by XtraPatrick987. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotions_classifier_en_5.5.0_3.0_1726680645783.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotions_classifier_en_5.5.0_3.0_1726680645783.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("emotions_classifier","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("emotions_classifier", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
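
For low-latency scoring of individual texts, the fitted `pipelineModel` from the example above can be wrapped in a `LightPipeline`, which runs locally and avoids launching a distributed job. A minimal sketch:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# Returns a dict mapping each output column to its annotation results
print(light.annotate("I love spark-nlp"))
```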
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotions_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/XtraPatrick987/emotions-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-emotions_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-emotions_classifier_pipeline_en.md new file mode 100644 index 00000000000000..ebc08cc6f144c8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-emotions_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emotions_classifier_pipeline pipeline DistilBertForSequenceClassification from XtraPatrick987 +author: John Snow Labs +name: emotions_classifier_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotions_classifier_pipeline` is a English model originally trained by XtraPatrick987. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotions_classifier_pipeline_en_5.5.0_3.0_1726680658193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotions_classifier_pipeline_en_5.5.0_3.0_1726680658193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with the input text in a column named "text"
pipeline = PretrainedPipeline("emotions_classifier_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame with the input text in a column named "text"
val pipeline = new PretrainedPipeline("emotions_classifier_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotions_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/XtraPatrick987/emotions-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-empathic_concern_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-empathic_concern_pipeline_en.md new file mode 100644 index 00000000000000..b2dd2372e367e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-empathic_concern_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English empathic_concern_pipeline pipeline RoBertaForSequenceClassification from codesj +author: John Snow Labs +name: empathic_concern_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`empathic_concern_pipeline` is a English model originally trained by codesj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/empathic_concern_pipeline_en_5.5.0_3.0_1726628450534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/empathic_concern_pipeline_en_5.5.0_3.0_1726628450534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with the input text in a column named "text"
pipeline = PretrainedPipeline("empathic_concern_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame with the input text in a column named "text"
val pipeline = new PretrainedPipeline("empathic_concern_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|empathic_concern_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|431.3 MB| + +## References + +https://huggingface.co/codesj/empathic-concern + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-enlm_roberta_81_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-18-enlm_roberta_81_imdb_en.md new file mode 100644 index 00000000000000..79cd253660bc36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-enlm_roberta_81_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English enlm_roberta_81_imdb XlmRoBertaForSequenceClassification from manirai91 +author: John Snow Labs +name: enlm_roberta_81_imdb +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`enlm_roberta_81_imdb` is a English model originally trained by manirai91. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/enlm_roberta_81_imdb_en_5.5.0_3.0_1726633200185.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/enlm_roberta_81_imdb_en_5.5.0_3.0_1726633200185.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("enlm_roberta_81_imdb","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("enlm_roberta_81_imdb", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
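
The classifier also stores its per-label confidence scores in the annotation metadata, so the probabilities behind a prediction can be inspected. A sketch reusing `pipelineDF` from the example above:

```python
from pyspark.sql.functions import explode

(pipelineDF
    .select(explode("class").alias("prediction"))
    .select("prediction.result", "prediction.metadata")
    .show(truncate=False))
```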
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|enlm_roberta_81_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|466.6 MB| + +## References + +https://huggingface.co/manirai91/enlm-roberta-81-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-esg_classification_bert_all_data_0509_other_v2_en.md b/docs/_posts/ahmedlone127/2024-09-18-esg_classification_bert_all_data_0509_other_v2_en.md new file mode 100644 index 00000000000000..92ba6347b87643 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-esg_classification_bert_all_data_0509_other_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English esg_classification_bert_all_data_0509_other_v2 RoBertaForSequenceClassification from dsmsb +author: John Snow Labs +name: esg_classification_bert_all_data_0509_other_v2 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`esg_classification_bert_all_data_0509_other_v2` is a English model originally trained by dsmsb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/esg_classification_bert_all_data_0509_other_v2_en_5.5.0_3.0_1726641819985.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/esg_classification_bert_all_data_0509_other_v2_en_5.5.0_3.0_1726641819985.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("esg_classification_bert_all_data_0509_other_v2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("esg_classification_bert_all_data_0509_other_v2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
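
Inference throughput and memory usage can be tuned directly on the classifier; the setters below are standard on Spark NLP transformer annotators, and the values shown are illustrative rather than recommendations:

```python
sequenceClassifier = RoBertaForSequenceClassification.pretrained("esg_classification_bert_all_data_0509_other_v2", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class") \
    .setBatchSize(8) \
    .setCaseSensitive(True) \
    .setMaxSentenceLength(512)
```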
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|esg_classification_bert_all_data_0509_other_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|444.7 MB| + +## References + +https://huggingface.co/dsmsb/esg-classification_bert_all_data_0509_other_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-fake_news_classifier_mahendrakharra_en.md b/docs/_posts/ahmedlone127/2024-09-18-fake_news_classifier_mahendrakharra_en.md new file mode 100644 index 00000000000000..87abc50b732344 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-fake_news_classifier_mahendrakharra_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fake_news_classifier_mahendrakharra DistilBertForSequenceClassification from Mahendrakharra +author: John Snow Labs +name: fake_news_classifier_mahendrakharra +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fake_news_classifier_mahendrakharra` is a English model originally trained by Mahendrakharra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fake_news_classifier_mahendrakharra_en_5.5.0_3.0_1726680844635.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fake_news_classifier_mahendrakharra_en_5.5.0_3.0_1726680844635.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("fake_news_classifier_mahendrakharra","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("fake_news_classifier_mahendrakharra", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fake_news_classifier_mahendrakharra| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Mahendrakharra/Fake-News-Classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-fake_news_classifier_mahendrakharra_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-fake_news_classifier_mahendrakharra_pipeline_en.md new file mode 100644 index 00000000000000..9b9bd72a28dec2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-fake_news_classifier_mahendrakharra_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fake_news_classifier_mahendrakharra_pipeline pipeline DistilBertForSequenceClassification from Mahendrakharra +author: John Snow Labs +name: fake_news_classifier_mahendrakharra_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fake_news_classifier_mahendrakharra_pipeline` is a English model originally trained by Mahendrakharra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fake_news_classifier_mahendrakharra_pipeline_en_5.5.0_3.0_1726680857362.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fake_news_classifier_mahendrakharra_pipeline_en_5.5.0_3.0_1726680857362.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with the input text in a column named "text"
pipeline = PretrainedPipeline("fake_news_classifier_mahendrakharra_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame with the input text in a column named "text"
val pipeline = new PretrainedPipeline("fake_news_classifier_mahendrakharra_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fake_news_classifier_mahendrakharra_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Mahendrakharra/Fake-News-Classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-fake_news_detect_en.md b/docs/_posts/ahmedlone127/2024-09-18-fake_news_detect_en.md new file mode 100644 index 00000000000000..acff6c2544f28b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-fake_news_detect_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fake_news_detect DistilBertForSequenceClassification from Hemg +author: John Snow Labs +name: fake_news_detect +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fake_news_detect` is a English model originally trained by Hemg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fake_news_detect_en_5.5.0_3.0_1726676988797.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fake_news_detect_en_5.5.0_3.0_1726676988797.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("fake_news_detect","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("fake_news_detect", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fake_news_detect| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Hemg/fake-news-detect \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-fidicbert_en.md b/docs/_posts/ahmedlone127/2024-09-18-fidicbert_en.md new file mode 100644 index 00000000000000..7e0f461c8952af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-fidicbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fidicbert RoBertaEmbeddings from Jzz +author: John Snow Labs +name: fidicbert +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fidicbert` is a English model originally trained by Jzz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fidicbert_en_5.5.0_3.0_1726626876863.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fidicbert_en_5.5.0_3.0_1726626876863.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("fidicbert","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("fidicbert", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
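
The `embeddings` column holds one vector per token as Spark NLP annotations. To hand the vectors to downstream Spark ML stages they are usually converted with an `EmbeddingsFinisher`; a minimal sketch extending the pipeline above:

```python
from sparknlp.base import EmbeddingsFinisher

finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, finisher])
result = pipeline.fit(data).transform(data)
result.select("finished_embeddings").show(truncate=False)
```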
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fidicbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/Jzz/FidicBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-film95000roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-film95000roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..feba03ca9e279e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-film95000roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English film95000roberta_base_pipeline pipeline RoBertaEmbeddings from AmaiaSolaun +author: John Snow Labs +name: film95000roberta_base_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`film95000roberta_base_pipeline` is a English model originally trained by AmaiaSolaun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/film95000roberta_base_pipeline_en_5.5.0_3.0_1726651634136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/film95000roberta_base_pipeline_en_5.5.0_3.0_1726651634136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with the input text in a column named "text"
pipeline = PretrainedPipeline("film95000roberta_base_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame with the input text in a column named "text"
val pipeline = new PretrainedPipeline("film95000roberta_base_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|film95000roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/AmaiaSolaun/film95000roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finace_nepal_bhasa_en.md b/docs/_posts/ahmedlone127/2024-09-18-finace_nepal_bhasa_en.md new file mode 100644 index 00000000000000..42c8e019ff3896 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finace_nepal_bhasa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finace_nepal_bhasa RoBertaForSequenceClassification from yzhangqs +author: John Snow Labs +name: finace_nepal_bhasa +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finace_nepal_bhasa` is a English model originally trained by yzhangqs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finace_nepal_bhasa_en_5.5.0_3.0_1726627910820.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finace_nepal_bhasa_en_5.5.0_3.0_1726627910820.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("finace_nepal_bhasa","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("finace_nepal_bhasa", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finace_nepal_bhasa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.7 MB| + +## References + +https://huggingface.co/yzhangqs/Finace_new \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-financial_model_en.md b/docs/_posts/ahmedlone127/2024-09-18-financial_model_en.md new file mode 100644 index 00000000000000..8287fcb0cb2420 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-financial_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English financial_model RoBertaEmbeddings from anablasi +author: John Snow Labs +name: financial_model +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`financial_model` is a English model originally trained by anablasi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/financial_model_en_5.5.0_3.0_1726618174288.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/financial_model_en_5.5.0_3.0_1726618174288.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("financial_model","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("financial_model", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
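
When one vector per document is more convenient than per-token vectors (for example for clustering or a downstream classifier), the token embeddings can be pooled with the `SentenceEmbeddings` annotator. A sketch building on the example above; averaging is one common pooling choice:

```python
from sparknlp.annotator import SentenceEmbeddings

sentenceEmbeddings = SentenceEmbeddings() \
    .setInputCols(["document", "embeddings"]) \
    .setOutputCol("sentence_embeddings") \
    .setPoolingStrategy("AVERAGE")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, sentenceEmbeddings])
result = pipeline.fit(data).transform(data)
result.select("sentence_embeddings.embeddings").show(truncate=False)
```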
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|financial_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/anablasi/financial_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-fine_tune_embeddnew_sih_2_en.md b/docs/_posts/ahmedlone127/2024-09-18-fine_tune_embeddnew_sih_2_en.md new file mode 100644 index 00000000000000..008668728ebc3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-fine_tune_embeddnew_sih_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fine_tune_embeddnew_sih_2 BertForSequenceClassification from shashaaa +author: John Snow Labs +name: fine_tune_embeddnew_sih_2 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tune_embeddnew_sih_2` is a English model originally trained by shashaaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tune_embeddnew_sih_2_en_5.5.0_3.0_1726624336321.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tune_embeddnew_sih_2_en_5.5.0_3.0_1726624336321.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("fine_tune_embeddnew_sih_2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("fine_tune_embeddnew_sih_2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tune_embeddnew_sih_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/shashaaa/fine_tune_embeddnew_SIH_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-fine_tune_roberta_exist_fine_grained_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-fine_tune_roberta_exist_fine_grained_pipeline_en.md new file mode 100644 index 00000000000000..88df708b55471a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-fine_tune_roberta_exist_fine_grained_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fine_tune_roberta_exist_fine_grained_pipeline pipeline RoBertaForSequenceClassification from nouman-10 +author: John Snow Labs +name: fine_tune_roberta_exist_fine_grained_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tune_roberta_exist_fine_grained_pipeline` is a English model originally trained by nouman-10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tune_roberta_exist_fine_grained_pipeline_en_5.5.0_3.0_1726642052126.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tune_roberta_exist_fine_grained_pipeline_en_5.5.0_3.0_1726642052126.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with the input text in a column named "text"
pipeline = PretrainedPipeline("fine_tune_roberta_exist_fine_grained_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame with the input text in a column named "text"
val pipeline = new PretrainedPipeline("fine_tune_roberta_exist_fine_grained_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tune_roberta_exist_fine_grained_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/nouman-10/fine-tune-roberta-exist-fine-grained + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-fine_tuned_bert_model_classification_en.md b/docs/_posts/ahmedlone127/2024-09-18-fine_tuned_bert_model_classification_en.md new file mode 100644 index 00000000000000..0213bbe13da071 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-fine_tuned_bert_model_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fine_tuned_bert_model_classification BertForSequenceClassification from KareenaBeniwal +author: John Snow Labs +name: fine_tuned_bert_model_classification +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_bert_model_classification` is a English model originally trained by KareenaBeniwal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_bert_model_classification_en_5.5.0_3.0_1726623752512.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_bert_model_classification_en_5.5.0_3.0_1726623752512.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("fine_tuned_bert_model_classification","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("fine_tuned_bert_model_classification", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_bert_model_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|410.2 MB| + +## References + +https://huggingface.co/KareenaBeniwal/Fine-tuned-bert-model-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetuned_geeks_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetuned_geeks_en.md new file mode 100644 index 00000000000000..ab95d01bf82c48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetuned_geeks_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_geeks BertForTokenClassification from sampurnr +author: John Snow Labs +name: finetuned_geeks +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_geeks` is a English model originally trained by sampurnr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_geeks_en_5.5.0_3.0_1726698934504.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_geeks_en_5.5.0_3.0_1726698934504.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = BertForTokenClassification.pretrained("finetuned_geeks","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("finetuned_geeks", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
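
The token classifier emits one IOB label per token in the `ner` column. Token/label pairs can be read straight from the transformed DataFrame, and a `NerConverter` stage can be added to group B-/I- tags into entity chunks. A minimal sketch reusing the names from the example above:

```python
from pyspark.sql.functions import arrays_zip, col, explode

# Token/label pairs from the transform output
pipelineDF.select(
    explode(arrays_zip(col("token.result"), col("ner.result"))).alias("token_label")
).show(truncate=False)

# Optional: group IOB tags into entity chunks inside the pipeline
from sparknlp.annotator import NerConverter

nerConverter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("entities")
```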
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_geeks| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/sampurnr/finetuned-geeks \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetuned_sentiment_distilbert_base_uncased_model_3000_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetuned_sentiment_distilbert_base_uncased_model_3000_en.md new file mode 100644 index 00000000000000..7abe5fd23ac130 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetuned_sentiment_distilbert_base_uncased_model_3000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_sentiment_distilbert_base_uncased_model_3000 DistilBertForSequenceClassification from iamsuman +author: John Snow Labs +name: finetuned_sentiment_distilbert_base_uncased_model_3000 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_sentiment_distilbert_base_uncased_model_3000` is a English model originally trained by iamsuman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_sentiment_distilbert_base_uncased_model_3000_en_5.5.0_3.0_1726681819603.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_sentiment_distilbert_base_uncased_model_3000_en_5.5.0_3.0_1726681819603.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_sentiment_distilbert_base_uncased_model_3000","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

# Assemble the pipeline and run it over a small example DataFrame
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_sentiment_distilbert_base_uncased_model_3000", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
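
The predicted sentiment label for each row is written to the `class` column of `pipelineDF`. As a minimal sketch (assuming the Python snippet above has been run as-is), it can be read back like this:

```python
# `class.result` holds the predicted label strings for each input row
pipelineDF.select("text", "class.result").show(truncate=False)
```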
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_sentiment_distilbert_base_uncased_model_3000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/iamsuman/finetuned-sentiment-distilbert-base-uncased-model-3000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetuned_sentiment_distilbert_base_uncased_model_3000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetuned_sentiment_distilbert_base_uncased_model_3000_pipeline_en.md new file mode 100644 index 00000000000000..9fde53c4d379d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetuned_sentiment_distilbert_base_uncased_model_3000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_sentiment_distilbert_base_uncased_model_3000_pipeline pipeline DistilBertForSequenceClassification from iamsuman +author: John Snow Labs +name: finetuned_sentiment_distilbert_base_uncased_model_3000_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_sentiment_distilbert_base_uncased_model_3000_pipeline` is a English model originally trained by iamsuman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_sentiment_distilbert_base_uncased_model_3000_pipeline_en_5.5.0_3.0_1726681832042.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_sentiment_distilbert_base_uncased_model_3000_pipeline_en_5.5.0_3.0_1726681832042.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input DataFrame; the pipeline expects the raw text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("finetuned_sentiment_distilbert_base_uncased_model_3000_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input DataFrame; the pipeline expects the raw text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("finetuned_sentiment_distilbert_base_uncased_model_3000_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_sentiment_distilbert_base_uncased_model_3000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/iamsuman/finetuned-sentiment-distilbert-base-uncased-model-3000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetunedmodel_review_sentimentanalysis_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetunedmodel_review_sentimentanalysis_en.md new file mode 100644 index 00000000000000..0ed36ddf463c3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetunedmodel_review_sentimentanalysis_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetunedmodel_review_sentimentanalysis DistilBertForSequenceClassification from hanyundudddd +author: John Snow Labs +name: finetunedmodel_review_sentimentanalysis +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetunedmodel_review_sentimentanalysis` is a English model originally trained by hanyundudddd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetunedmodel_review_sentimentanalysis_en_5.5.0_3.0_1726680233995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetunedmodel_review_sentimentanalysis_en_5.5.0_3.0_1726680233995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetunedmodel_review_sentimentanalysis","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

# Assemble the pipeline and run it over a small example DataFrame
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetunedmodel_review_sentimentanalysis", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
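
The review sentiment prediction ends up in the `class` column of `pipelineDF`. As a minimal sketch (assuming the Python snippet above has been run as-is), it can be inspected like this:

```python
# `class.result` holds the predicted label strings for each input row
pipelineDF.select("text", "class.result").show(truncate=False)
```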
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetunedmodel_review_sentimentanalysis| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hanyundudddd/FinetunedModel_Review_SentimentAnalysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetunedmodel_review_sentimentanalysis_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetunedmodel_review_sentimentanalysis_pipeline_en.md new file mode 100644 index 00000000000000..78beb99b827aa5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetunedmodel_review_sentimentanalysis_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetunedmodel_review_sentimentanalysis_pipeline pipeline DistilBertForSequenceClassification from hanyundudddd +author: John Snow Labs +name: finetunedmodel_review_sentimentanalysis_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetunedmodel_review_sentimentanalysis_pipeline` is a English model originally trained by hanyundudddd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetunedmodel_review_sentimentanalysis_pipeline_en_5.5.0_3.0_1726680246117.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetunedmodel_review_sentimentanalysis_pipeline_en_5.5.0_3.0_1726680246117.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input DataFrame; the pipeline expects the raw text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("finetunedmodel_review_sentimentanalysis_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input DataFrame; the pipeline expects the raw text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("finetunedmodel_review_sentimentanalysis_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetunedmodel_review_sentimentanalysis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hanyundudddd/FinetunedModel_Review_SentimentAnalysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_analysis_asif1997_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_analysis_asif1997_en.md new file mode 100644 index 00000000000000..c194d448778ef1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_analysis_asif1997_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_analysis_asif1997 DistilBertForSequenceClassification from Asif1997 +author: John Snow Labs +name: finetuning_sentiment_analysis_asif1997 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_analysis_asif1997` is a English model originally trained by Asif1997. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_asif1997_en_5.5.0_3.0_1726694954416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_asif1997_en_5.5.0_3.0_1726694954416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_analysis_asif1997","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

# Assemble the pipeline and run it over a small example DataFrame
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_analysis_asif1997", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
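
The sentiment prediction is stored in the `class` column of `pipelineDF`. As a minimal sketch (assuming the Python snippet above has been run as-is):

```python
# `class.result` holds the predicted label strings for each input row
pipelineDF.select("text", "class.result").show(truncate=False)
```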
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_analysis_asif1997| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Asif1997/finetuning-sentiment-analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_a00954334_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_a00954334_pipeline_en.md new file mode 100644 index 00000000000000..927ba847e71b67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_a00954334_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_a00954334_pipeline pipeline DistilBertForSequenceClassification from A00954334 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_a00954334_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_a00954334_pipeline` is a English model originally trained by A00954334. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_a00954334_pipeline_en_5.5.0_3.0_1726625933230.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_a00954334_pipeline_en_5.5.0_3.0_1726625933230.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input DataFrame; the pipeline expects the raw text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_a00954334_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input DataFrame; the pipeline expects the raw text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_a00954334_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_a00954334_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/A00954334/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_aguinrodriguezj_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_aguinrodriguezj_en.md new file mode 100644 index 00000000000000..e37eecfe0c6416 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_aguinrodriguezj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_aguinrodriguezj DistilBertForSequenceClassification from aguinrodriguezj +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_aguinrodriguezj +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_aguinrodriguezj` is a English model originally trained by aguinrodriguezj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_aguinrodriguezj_en_5.5.0_3.0_1726696542664.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_aguinrodriguezj_en_5.5.0_3.0_1726696542664.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_aguinrodriguezj","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

# Assemble the pipeline and run it over a small example DataFrame
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_aguinrodriguezj", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
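
The predicted label is written to the `class` column of `pipelineDF`. As a minimal sketch (assuming the Python snippet above has been run as-is):

```python
# `class.result` holds the predicted label strings for each input row
pipelineDF.select("text", "class.result").show(truncate=False)
```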
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_aguinrodriguezj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aguinrodriguezj/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_bberken_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_bberken_en.md new file mode 100644 index 00000000000000..1c7873a6afb5ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_bberken_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_bberken DistilBertForSequenceClassification from bberken +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_bberken +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_bberken` is a English model originally trained by bberken. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_bberken_en_5.5.0_3.0_1726680609942.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_bberken_en_5.5.0_3.0_1726680609942.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_bberken","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

# Assemble the pipeline and run it over a small example DataFrame
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_bberken", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
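
The sentiment label for each input row lands in the `class` column of `pipelineDF`. As a minimal sketch (assuming the Python snippet above has been run as-is):

```python
# `class.result` holds the predicted label strings for each input row
pipelineDF.select("text", "class.result").show(truncate=False)
```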
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_bberken| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bberken/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_bberken_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_bberken_pipeline_en.md new file mode 100644 index 00000000000000..e2445ab520b087 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_bberken_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_bberken_pipeline pipeline DistilBertForSequenceClassification from bberken +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_bberken_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_bberken_pipeline` is a English model originally trained by bberken. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_bberken_pipeline_en_5.5.0_3.0_1726680622871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_bberken_pipeline_en_5.5.0_3.0_1726680622871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input DataFrame; the pipeline expects the raw text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_bberken_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input DataFrame; the pipeline expects the raw text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_bberken_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_bberken_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bberken/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_denniswangxy_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_denniswangxy_en.md new file mode 100644 index 00000000000000..b0b9f01c67feed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_denniswangxy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_denniswangxy DistilBertForSequenceClassification from denniswangxy +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_denniswangxy +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_denniswangxy` is a English model originally trained by denniswangxy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_denniswangxy_en_5.5.0_3.0_1726630874888.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_denniswangxy_en_5.5.0_3.0_1726630874888.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_denniswangxy","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

# Assemble the pipeline and run it over a small example DataFrame
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_denniswangxy", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
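
The predicted label is stored in the `class` column of `pipelineDF`. As a minimal sketch (assuming the Python snippet above has been run as-is):

```python
# `class.result` holds the predicted label strings for each input row
pipelineDF.select("text", "class.result").show(truncate=False)
```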
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_denniswangxy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/denniswangxy/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_dn_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_dn_pipeline_en.md new file mode 100644 index 00000000000000..ffff759ec04f4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_dn_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_dn_pipeline pipeline DistilBertForSequenceClassification from jakeoko +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_dn_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_dn_pipeline` is a English model originally trained by jakeoko. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_dn_pipeline_en_5.5.0_3.0_1726681993796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_dn_pipeline_en_5.5.0_3.0_1726681993796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input DataFrame; the pipeline expects the raw text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_dn_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input DataFrame; the pipeline expects the raw text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_dn_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_dn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jakeoko/finetuning-sentiment-model-3000-samples-DN + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_mo27harakani_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_mo27harakani_pipeline_en.md new file mode 100644 index 00000000000000..dd6f4fd307d89f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_mo27harakani_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_mo27harakani_pipeline pipeline DistilBertForSequenceClassification from mo27harakani +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_mo27harakani_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_mo27harakani_pipeline` is a English model originally trained by mo27harakani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_mo27harakani_pipeline_en_5.5.0_3.0_1726677194488.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_mo27harakani_pipeline_en_5.5.0_3.0_1726677194488.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input DataFrame; the pipeline expects the raw text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_mo27harakani_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input DataFrame; the pipeline expects the raw text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_mo27harakani_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_mo27harakani_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mo27harakani/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_pouriaaa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_pouriaaa_pipeline_en.md new file mode 100644 index 00000000000000..558a176f9a7c65 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_pouriaaa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_pouriaaa_pipeline pipeline DistilBertForSequenceClassification from pouriaaa +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_pouriaaa_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_pouriaaa_pipeline` is a English model originally trained by pouriaaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_pouriaaa_pipeline_en_5.5.0_3.0_1726669679155.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_pouriaaa_pipeline_en_5.5.0_3.0_1726669679155.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input DataFrame; the pipeline expects the raw text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_pouriaaa_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input DataFrame; the pipeline expects the raw text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_pouriaaa_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_pouriaaa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/pouriaaa/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_sumittyagi25_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_sumittyagi25_en.md new file mode 100644 index 00000000000000..800ed8064b402d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_sumittyagi25_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_sumittyagi25 DistilBertForSequenceClassification from sumittyagi25 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_sumittyagi25 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_sumittyagi25` is a English model originally trained by sumittyagi25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_sumittyagi25_en_5.5.0_3.0_1726630816702.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_sumittyagi25_en_5.5.0_3.0_1726630816702.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_sumittyagi25","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

# Assemble the pipeline and run it over a small example DataFrame
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_sumittyagi25", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
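
The sentiment prediction ends up in the `class` column of `pipelineDF`. As a minimal sketch (assuming the Python snippet above has been run as-is):

```python
# `class.result` holds the predicted label strings for each input row
pipelineDF.select("text", "class.result").show(truncate=False)
```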
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_sumittyagi25| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sumittyagi25/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_social_media_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_social_media_en.md new file mode 100644 index 00000000000000..136cf5a66e4e50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_social_media_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_social_media DistilBertForSequenceClassification from MariaChzhen +author: John Snow Labs +name: finetuning_sentiment_model_social_media +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_social_media` is a English model originally trained by MariaChzhen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_social_media_en_5.5.0_3.0_1726625253472.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_social_media_en_5.5.0_3.0_1726625253472.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_social_media","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

# Assemble the pipeline and run it over a small example DataFrame
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_social_media", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
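
The predicted sentiment label is written to the `class` column of `pipelineDF`. As a minimal sketch (assuming the Python snippet above has been run as-is):

```python
# `class.result` holds the predicted label strings for each input row
pipelineDF.select("text", "class.result").show(truncate=False)
```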
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_social_media| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MariaChzhen/finetuning-sentiment-model-social-media \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_social_media_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_social_media_pipeline_en.md new file mode 100644 index 00000000000000..5ededfa281ea69 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_social_media_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_social_media_pipeline pipeline DistilBertForSequenceClassification from MariaChzhen +author: John Snow Labs +name: finetuning_sentiment_model_social_media_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_social_media_pipeline` is a English model originally trained by MariaChzhen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_social_media_pipeline_en_5.5.0_3.0_1726625267541.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_social_media_pipeline_en_5.5.0_3.0_1726625267541.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input DataFrame; the pipeline expects the raw text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("finetuning_sentiment_model_social_media_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input DataFrame; the pipeline expects the raw text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_social_media_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_social_media_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MariaChzhen/finetuning-sentiment-model-social-media + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finnews_sentimentanalysis_v3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-finnews_sentimentanalysis_v3_pipeline_en.md new file mode 100644 index 00000000000000..4d71f04cf68fce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finnews_sentimentanalysis_v3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finnews_sentimentanalysis_v3_pipeline pipeline DistilBertForSequenceClassification from ZephyruSalsify +author: John Snow Labs +name: finnews_sentimentanalysis_v3_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finnews_sentimentanalysis_v3_pipeline` is a English model originally trained by ZephyruSalsify. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finnews_sentimentanalysis_v3_pipeline_en_5.5.0_3.0_1726681580654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finnews_sentimentanalysis_v3_pipeline_en_5.5.0_3.0_1726681580654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input DataFrame; the pipeline expects the raw text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("finnews_sentimentanalysis_v3_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input DataFrame; the pipeline expects the raw text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("finnews_sentimentanalysis_v3_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finnews_sentimentanalysis_v3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/ZephyruSalsify/FinNews_SentimentAnalysis_v3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-gal_ner_iwcg_5_en.md b/docs/_posts/ahmedlone127/2024-09-18-gal_ner_iwcg_5_en.md new file mode 100644 index 00000000000000..24cb356c310abf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-gal_ner_iwcg_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English gal_ner_iwcg_5 XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: gal_ner_iwcg_5 +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gal_ner_iwcg_5` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_ner_iwcg_5_en_5.5.0_3.0_1726701802473.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_ner_iwcg_5_en_5.5.0_3.0_1726701802473.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_ner_iwcg_5","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

# Assemble the pipeline and run it over a small example DataFrame
pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_ner_iwcg_5", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
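
The model emits one entity tag per token, stored in the `ner` column of `pipelineDF`. As a minimal sketch (assuming the Python snippet above has been run as-is), the token/tag pairs can be inspected like this:

```python
# `token.result` and `ner.result` are aligned arrays of tokens and predicted tags
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```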
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_ner_iwcg_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|426.5 MB| + +## References + +https://huggingface.co/homersimpson/gal-ner-iwcg-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-gal_ner_iwcg_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-gal_ner_iwcg_5_pipeline_en.md new file mode 100644 index 00000000000000..069185d7fb3dfa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-gal_ner_iwcg_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English gal_ner_iwcg_5_pipeline pipeline XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: gal_ner_iwcg_5_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gal_ner_iwcg_5_pipeline` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_ner_iwcg_5_pipeline_en_5.5.0_3.0_1726701834647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_ner_iwcg_5_pipeline_en_5.5.0_3.0_1726701834647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input DataFrame; the pipeline expects the raw text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("gal_ner_iwcg_5_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input DataFrame; the pipeline expects the raw text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("gal_ner_iwcg_5_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_ner_iwcg_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|426.5 MB| + +## References + +https://huggingface.co/homersimpson/gal-ner-iwcg-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-gal_sayula_popoluca_xlmr_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-gal_sayula_popoluca_xlmr_4_pipeline_en.md new file mode 100644 index 00000000000000..e584259e4341df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-gal_sayula_popoluca_xlmr_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English gal_sayula_popoluca_xlmr_4_pipeline pipeline XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: gal_sayula_popoluca_xlmr_4_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gal_sayula_popoluca_xlmr_4_pipeline` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_sayula_popoluca_xlmr_4_pipeline_en_5.5.0_3.0_1726657420482.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_sayula_popoluca_xlmr_4_pipeline_en_5.5.0_3.0_1726657420482.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("gal_sayula_popoluca_xlmr_4_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("gal_sayula_popoluca_xlmr_4_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_sayula_popoluca_xlmr_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|803.5 MB| + +## References + +https://huggingface.co/homersimpson/gal-pos-xlmr-4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_en.md b/docs/_posts/ahmedlone127/2024-09-18-gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_en.md new file mode 100644 index 00000000000000..86d87abb5e91f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac BertForSequenceClassification from tanoManzo +author: John Snow Labs +name: gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac` is a English model originally trained by tanoManzo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_en_5.5.0_3.0_1726647910989.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_en_5.5.0_3.0_1726647910989.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
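
The predicted label lands in the `class` column as an array of annotations. A short follow-up, assuming the pipeline above has been run, pulls out the label strings:

```python
# "result" holds the predicted label inside each annotation struct
pipelineDF.select("text", "class.result").show(truncate=False)
```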
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|414.0 MB| + +## References + +https://huggingface.co/tanoManzo/gena-lm-bert-base-t2t_ft_Hepg2_1kbpHG19_DHSs_H3K27AC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline_en.md new file mode 100644 index 00000000000000..3bce6526e1b69e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline pipeline BertForSequenceClassification from tanoManzo +author: John Snow Labs +name: gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline` is a English model originally trained by tanoManzo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline_en_5.5.0_3.0_1726647930756.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline_en_5.5.0_3.0_1726647930756.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.0 MB| + +## References + +https://huggingface.co/tanoManzo/gena-lm-bert-base-t2t_ft_Hepg2_1kbpHG19_DHSs_H3K27AC + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-hate_hate_balance_random3_seed0_roberta_large_en.md b/docs/_posts/ahmedlone127/2024-09-18-hate_hate_balance_random3_seed0_roberta_large_en.md new file mode 100644 index 00000000000000..6e601a2f11460d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-hate_hate_balance_random3_seed0_roberta_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_balance_random3_seed0_roberta_large RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random3_seed0_roberta_large +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random3_seed0_roberta_large` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed0_roberta_large_en_5.5.0_3.0_1726649992939.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed0_roberta_large_en_5.5.0_3.0_1726649992939.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random3_seed0_roberta_large","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random3_seed0_roberta_large", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random3_seed0_roberta_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random3_seed0-roberta-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-hate_hate_random0_seed2_twitter_roberta_base_2019_90m_en.md b/docs/_posts/ahmedlone127/2024-09-18-hate_hate_random0_seed2_twitter_roberta_base_2019_90m_en.md new file mode 100644 index 00000000000000..415e2ada9f3ac3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-hate_hate_random0_seed2_twitter_roberta_base_2019_90m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_random0_seed2_twitter_roberta_base_2019_90m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_random0_seed2_twitter_roberta_base_2019_90m +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_random0_seed2_twitter_roberta_base_2019_90m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_random0_seed2_twitter_roberta_base_2019_90m_en_5.5.0_3.0_1726641406546.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_random0_seed2_twitter_roberta_base_2019_90m_en_5.5.0_3.0_1726641406546.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_random0_seed2_twitter_roberta_base_2019_90m","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_random0_seed2_twitter_roberta_base_2019_90m", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
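
For larger batches, the loaded classifier also exposes a few inference settings. A minimal sketch is shown below; the values are illustrative, not tuned recommendations:

```python
# Optional inference settings on the same classifier (illustrative values only)
sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_random0_seed2_twitter_roberta_base_2019_90m","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class") \
    .setBatchSize(8) \
    .setCaseSensitive(True) \
    .setMaxSentenceLength(128)
```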
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_random0_seed2_twitter_roberta_base_2019_90m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_random0_seed2-twitter-roberta-base-2019-90m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-hate_hate_random0_seed2_twitter_roberta_base_2019_90m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-hate_hate_random0_seed2_twitter_roberta_base_2019_90m_pipeline_en.md new file mode 100644 index 00000000000000..0370e7a95a73ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-hate_hate_random0_seed2_twitter_roberta_base_2019_90m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_random0_seed2_twitter_roberta_base_2019_90m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_random0_seed2_twitter_roberta_base_2019_90m_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_random0_seed2_twitter_roberta_base_2019_90m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_random0_seed2_twitter_roberta_base_2019_90m_pipeline_en_5.5.0_3.0_1726641434740.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_random0_seed2_twitter_roberta_base_2019_90m_pipeline_en_5.5.0_3.0_1726641434740.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("hate_hate_random0_seed2_twitter_roberta_base_2019_90m_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("hate_hate_random0_seed2_twitter_roberta_base_2019_90m_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_random0_seed2_twitter_roberta_base_2019_90m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_random0_seed2-twitter-roberta-base-2019-90m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-helper2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-helper2_pipeline_en.md new file mode 100644 index 00000000000000..c9c3cee5693e06 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-helper2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English helper2_pipeline pipeline RoBertaForSequenceClassification from raima2001 +author: John Snow Labs +name: helper2_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`helper2_pipeline` is a English model originally trained by raima2001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/helper2_pipeline_en_5.5.0_3.0_1726650130445.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/helper2_pipeline_en_5.5.0_3.0_1726650130445.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("helper2_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("helper2_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|helper2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|85.0 MB| + +## References + +https://huggingface.co/raima2001/helper2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-hf_repo_miteshkotak7_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-hf_repo_miteshkotak7_pipeline_en.md new file mode 100644 index 00000000000000..5d2f97a7c2fac6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-hf_repo_miteshkotak7_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hf_repo_miteshkotak7_pipeline pipeline DistilBertForSequenceClassification from miteshkotak7 +author: John Snow Labs +name: hf_repo_miteshkotak7_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hf_repo_miteshkotak7_pipeline` is a English model originally trained by miteshkotak7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hf_repo_miteshkotak7_pipeline_en_5.5.0_3.0_1726681698338.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hf_repo_miteshkotak7_pipeline_en_5.5.0_3.0_1726681698338.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("hf_repo_miteshkotak7_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("hf_repo_miteshkotak7_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hf_repo_miteshkotak7_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/miteshkotak7/hf-repo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-horai_medium_17k_roberta_large_30_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-horai_medium_17k_roberta_large_30_pipeline_en.md new file mode 100644 index 00000000000000..81018edf54c2e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-horai_medium_17k_roberta_large_30_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English horai_medium_17k_roberta_large_30_pipeline pipeline RoBertaForSequenceClassification from stealthwriter +author: John Snow Labs +name: horai_medium_17k_roberta_large_30_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`horai_medium_17k_roberta_large_30_pipeline` is a English model originally trained by stealthwriter. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/horai_medium_17k_roberta_large_30_pipeline_en_5.5.0_3.0_1726650578959.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/horai_medium_17k_roberta_large_30_pipeline_en_5.5.0_3.0_1726650578959.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("horai_medium_17k_roberta_large_30_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("horai_medium_17k_roberta_large_30_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|horai_medium_17k_roberta_large_30_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/stealthwriter/HorAI-medium-17k-roberta-large-30 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-ielts_grading_regression_en.md b/docs/_posts/ahmedlone127/2024-09-18-ielts_grading_regression_en.md new file mode 100644 index 00000000000000..e7c1faba6a7a33 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-ielts_grading_regression_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ielts_grading_regression RoBertaForSequenceClassification from sebasgaviria79 +author: John Snow Labs +name: ielts_grading_regression +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ielts_grading_regression` is a English model originally trained by sebasgaviria79. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ielts_grading_regression_en_5.5.0_3.0_1726621601328.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ielts_grading_regression_en_5.5.0_3.0_1726621601328.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("ielts_grading_regression","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("ielts_grading_regression", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ielts_grading_regression| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/sebasgaviria79/ielts-grading-regression \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-imdb_0_en.md b/docs/_posts/ahmedlone127/2024-09-18-imdb_0_en.md new file mode 100644 index 00000000000000..be288a2fbc8fa1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-imdb_0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English imdb_0 DistilBertForSequenceClassification from draghicivlad +author: John Snow Labs +name: imdb_0 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdb_0` is a English model originally trained by draghicivlad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdb_0_en_5.5.0_3.0_1726676855719.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdb_0_en_5.5.0_3.0_1726676855719.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("imdb_0","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("imdb_0", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
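
For quick, ad-hoc predictions on individual strings, the fitted model can also be wrapped in a `LightPipeline`, which skips the DataFrame round trip. A minimal sketch, assuming the code above has been run:

```python
from sparknlp.base import LightPipeline

# Annotate a raw string directly; returns a dict keyed by output column
light = LightPipeline(pipelineModel)
print(light.annotate("This movie was a complete waste of time."))
```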
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdb_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/draghicivlad/imdb_0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-imdb_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-imdb_0_pipeline_en.md new file mode 100644 index 00000000000000..5e570074cc6685 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-imdb_0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English imdb_0_pipeline pipeline DistilBertForSequenceClassification from draghicivlad +author: John Snow Labs +name: imdb_0_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdb_0_pipeline` is a English model originally trained by draghicivlad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdb_0_pipeline_en_5.5.0_3.0_1726676868190.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdb_0_pipeline_en_5.5.0_3.0_1726676868190.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("imdb_0_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("imdb_0_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdb_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/draghicivlad/imdb_0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-imdb_4_en.md b/docs/_posts/ahmedlone127/2024-09-18-imdb_4_en.md new file mode 100644 index 00000000000000..b43050a02261af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-imdb_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English imdb_4 DistilBertForSequenceClassification from draghicivlad +author: John Snow Labs +name: imdb_4 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdb_4` is a English model originally trained by draghicivlad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdb_4_en_5.5.0_3.0_1726677519729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdb_4_en_5.5.0_3.0_1726677519729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("imdb_4","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("imdb_4", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdb_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/draghicivlad/imdb_4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-intel_huggingface_workshop_bot_en.md b/docs/_posts/ahmedlone127/2024-09-18-intel_huggingface_workshop_bot_en.md new file mode 100644 index 00000000000000..f690da5a74833c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-intel_huggingface_workshop_bot_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English intel_huggingface_workshop_bot DistilBertForSequenceClassification from arjunraghunandanan +author: John Snow Labs +name: intel_huggingface_workshop_bot +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`intel_huggingface_workshop_bot` is a English model originally trained by arjunraghunandanan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/intel_huggingface_workshop_bot_en_5.5.0_3.0_1726694984231.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/intel_huggingface_workshop_bot_en_5.5.0_3.0_1726694984231.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("intel_huggingface_workshop_bot","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("intel_huggingface_workshop_bot", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|intel_huggingface_workshop_bot| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/arjunraghunandanan/intel-huggingface-workshop-bot \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-intel_huggingface_workshop_bot_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-intel_huggingface_workshop_bot_pipeline_en.md new file mode 100644 index 00000000000000..ab083ff1717595 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-intel_huggingface_workshop_bot_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English intel_huggingface_workshop_bot_pipeline pipeline DistilBertForSequenceClassification from arjunraghunandanan +author: John Snow Labs +name: intel_huggingface_workshop_bot_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`intel_huggingface_workshop_bot_pipeline` is a English model originally trained by arjunraghunandanan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/intel_huggingface_workshop_bot_pipeline_en_5.5.0_3.0_1726694996331.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/intel_huggingface_workshop_bot_pipeline_en_5.5.0_3.0_1726694996331.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("intel_huggingface_workshop_bot_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("intel_huggingface_workshop_bot_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|intel_huggingface_workshop_bot_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/arjunraghunandanan/intel-huggingface-workshop-bot + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-interlingua_detection_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-18-interlingua_detection_roberta_base_en.md new file mode 100644 index 00000000000000..0b6004a774b832 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-interlingua_detection_roberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English interlingua_detection_roberta_base RoBertaForSequenceClassification from arincon +author: John Snow Labs +name: interlingua_detection_roberta_base +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`interlingua_detection_roberta_base` is a English model originally trained by arincon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/interlingua_detection_roberta_base_en_5.5.0_3.0_1726666236549.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/interlingua_detection_roberta_base_en_5.5.0_3.0_1726666236549.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("interlingua_detection_roberta_base","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("interlingua_detection_roberta_base", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|interlingua_detection_roberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|460.4 MB| + +## References + +https://huggingface.co/arincon/ia-detection-roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-interlingua_detection_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-interlingua_detection_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..a653bf9e8d87e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-interlingua_detection_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English interlingua_detection_roberta_base_pipeline pipeline RoBertaForSequenceClassification from arincon +author: John Snow Labs +name: interlingua_detection_roberta_base_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`interlingua_detection_roberta_base_pipeline` is a English model originally trained by arincon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/interlingua_detection_roberta_base_pipeline_en_5.5.0_3.0_1726666259471.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/interlingua_detection_roberta_base_pipeline_en_5.5.0_3.0_1726666259471.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("interlingua_detection_roberta_base_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("interlingua_detection_roberta_base_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|interlingua_detection_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|460.4 MB| + +## References + +https://huggingface.co/arincon/ia-detection-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-it2_robertuito_l_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-it2_robertuito_l_pipeline_en.md new file mode 100644 index 00000000000000..26573350d85362 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-it2_robertuito_l_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English it2_robertuito_l_pipeline pipeline RoBertaForSequenceClassification from PEzquerra +author: John Snow Labs +name: it2_robertuito_l_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`it2_robertuito_l_pipeline` is a English model originally trained by PEzquerra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/it2_robertuito_l_pipeline_en_5.5.0_3.0_1726650416339.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/it2_robertuito_l_pipeline_en_5.5.0_3.0_1726650416339.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("it2_robertuito_l_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("it2_robertuito_l_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|it2_robertuito_l_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.3 MB| + +## References + +https://huggingface.co/PEzquerra/it2_robertuito_L + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-lao_roberta_base_pipeline_lo.md b/docs/_posts/ahmedlone127/2024-09-18-lao_roberta_base_pipeline_lo.md new file mode 100644 index 00000000000000..64181b2ff8e052 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-lao_roberta_base_pipeline_lo.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Lao lao_roberta_base_pipeline pipeline RoBertaEmbeddings from w11wo +author: John Snow Labs +name: lao_roberta_base_pipeline +date: 2024-09-18 +tags: [lo, open_source, pipeline, onnx] +task: Embeddings +language: lo +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lao_roberta_base_pipeline` is a Lao model originally trained by w11wo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lao_roberta_base_pipeline_lo_5.5.0_3.0_1726651570200.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lao_roberta_base_pipeline_lo_5.5.0_3.0_1726651570200.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("lao_roberta_base_pipeline", lang = "lo")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("lao_roberta_base_pipeline", lang = "lo")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
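
If you need to know exactly which annotators and output columns the downloaded pipeline contains, you can inspect the wrapped `PipelineModel`. This is a general Spark NLP pattern, sketched under the assumption that the `model` attribute of the loaded pipeline is available:

```python
# List the stages wrapped by the pretrained pipeline
for stage in pipeline.model.stages:
    print(stage)
```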
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lao_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|lo| +|Size:|465.8 MB| + +## References + +https://huggingface.co/w11wo/lao-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-left_assamese_train_context_roberta_large_20e_en.md b/docs/_posts/ahmedlone127/2024-09-18-left_assamese_train_context_roberta_large_20e_en.md new file mode 100644 index 00000000000000..1903e0dc6e7447 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-left_assamese_train_context_roberta_large_20e_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English left_assamese_train_context_roberta_large_20e RoBertaForSequenceClassification from kghanlon +author: John Snow Labs +name: left_assamese_train_context_roberta_large_20e +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`left_assamese_train_context_roberta_large_20e` is a English model originally trained by kghanlon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/left_assamese_train_context_roberta_large_20e_en_5.5.0_3.0_1726650681553.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/left_assamese_train_context_roberta_large_20e_en_5.5.0_3.0_1726650681553.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("left_assamese_train_context_roberta_large_20e","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("left_assamese_train_context_roberta_large_20e", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|left_assamese_train_context_roberta_large_20e| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/kghanlon/left_as_train_context_roberta-large_20e \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-legal_undedup_base_v1_5__checkpoint_last_en.md b/docs/_posts/ahmedlone127/2024-09-18-legal_undedup_base_v1_5__checkpoint_last_en.md new file mode 100644 index 00000000000000..c2897d115cdfc7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-legal_undedup_base_v1_5__checkpoint_last_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English legal_undedup_base_v1_5__checkpoint_last RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: legal_undedup_base_v1_5__checkpoint_last +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_undedup_base_v1_5__checkpoint_last` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_undedup_base_v1_5__checkpoint_last_en_5.5.0_3.0_1726618086891.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_undedup_base_v1_5__checkpoint_last_en_5.5.0_3.0_1726618086891.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("legal_undedup_base_v1_5__checkpoint_last","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("legal_undedup_base_v1_5__checkpoint_last","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
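
Each annotation in the `embeddings` column carries the token text and its vector. A short follow-up sketch, assuming the pipeline above has been run, flattens them into one row per token:

```python
from pyspark.sql import functions as F

# One row per token: the token string and its embedding vector
token_vectors = pipelineDF.select(F.explode("embeddings").alias("e")) \
    .select(F.col("e.result").alias("token"), F.col("e.embeddings").alias("vector"))
token_vectors.show(5, truncate=80)
```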
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_undedup_base_v1_5__checkpoint_last| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|296.3 MB| + +## References + +https://huggingface.co/eduagarcia-temp/legal_undedup_base_v1_5__checkpoint_last \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-legal_undedup_base_v1_5__checkpoint_last_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-legal_undedup_base_v1_5__checkpoint_last_pipeline_en.md new file mode 100644 index 00000000000000..9e7650e81311b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-legal_undedup_base_v1_5__checkpoint_last_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English legal_undedup_base_v1_5__checkpoint_last_pipeline pipeline RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: legal_undedup_base_v1_5__checkpoint_last_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_undedup_base_v1_5__checkpoint_last_pipeline` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_undedup_base_v1_5__checkpoint_last_pipeline_en_5.5.0_3.0_1726618178121.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_undedup_base_v1_5__checkpoint_last_pipeline_en_5.5.0_3.0_1726618178121.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("legal_undedup_base_v1_5__checkpoint_last_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("legal_undedup_base_v1_5__checkpoint_last_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
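
As a rough sketch, the downloaded pipeline can also be applied to a single string via `annotate`; the returned dictionary is keyed by the output columns of the included stages.

```python
# Sketch only: annotate a plain string instead of a DataFrame.
# The dictionary keys mirror the output columns of the stages in the pipeline.
result = pipeline.annotate("I love spark-nlp")
print(list(result.keys()))
```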
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_undedup_base_v1_5__checkpoint_last_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|296.3 MB| + +## References + +https://huggingface.co/eduagarcia-temp/legal_undedup_base_v1_5__checkpoint_last + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-llama_model_overfitted_reg_gpt_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-llama_model_overfitted_reg_gpt_pipeline_en.md new file mode 100644 index 00000000000000..367e32f8364db1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-llama_model_overfitted_reg_gpt_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English llama_model_overfitted_reg_gpt_pipeline pipeline MPNetEmbeddings from soksay +author: John Snow Labs +name: llama_model_overfitted_reg_gpt_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llama_model_overfitted_reg_gpt_pipeline` is a English model originally trained by soksay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llama_model_overfitted_reg_gpt_pipeline_en_5.5.0_3.0_1726675059230.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llama_model_overfitted_reg_gpt_pipeline_en_5.5.0_3.0_1726675059230.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("llama_model_overfitted_reg_gpt_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("llama_model_overfitted_reg_gpt_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llama_model_overfitted_reg_gpt_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/soksay/llama_model_overfitted_REG_GPT + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-mach_3_en.md b/docs/_posts/ahmedlone127/2024-09-18-mach_3_en.md new file mode 100644 index 00000000000000..39829232db9b99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-mach_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mach_3 RoBertaForSequenceClassification from BaronSch +author: John Snow Labs +name: mach_3 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mach_3` is a English model originally trained by BaronSch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mach_3_en_5.5.0_3.0_1726621973653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mach_3_en_5.5.0_3.0_1726621973653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("mach_3","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("mach_3", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
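
For a quick check of the output, the predicted label can be read from the `class` column. The following is a minimal sketch that assumes the column names used in the example above.

```python
# Minimal sketch: the predicted label sits in the "class" column's `result` field.
pipelineDF.select("text", "class.result").show(truncate=False)
```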
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mach_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/BaronSch/Mach_3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-mach_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-mach_3_pipeline_en.md new file mode 100644 index 00000000000000..3c4ccb2bc63941 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-mach_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mach_3_pipeline pipeline RoBertaForSequenceClassification from BaronSch +author: John Snow Labs +name: mach_3_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mach_3_pipeline` is a English model originally trained by BaronSch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mach_3_pipeline_en_5.5.0_3.0_1726621995994.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mach_3_pipeline_en_5.5.0_3.0_1726621995994.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("mach_3_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("mach_3_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mach_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/BaronSch/Mach_3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-mobilebert_add_pre_training_complete_en.md b/docs/_posts/ahmedlone127/2024-09-18-mobilebert_add_pre_training_complete_en.md new file mode 100644 index 00000000000000..deabd330cee46a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-mobilebert_add_pre_training_complete_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mobilebert_add_pre_training_complete BertEmbeddings from gokuls +author: John Snow Labs +name: mobilebert_add_pre_training_complete +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mobilebert_add_pre_training_complete` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mobilebert_add_pre_training_complete_en_5.5.0_3.0_1726673607658.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mobilebert_add_pre_training_complete_en_5.5.0_3.0_1726673607658.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("mobilebert_add_pre_training_complete","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("mobilebert_add_pre_training_complete","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
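
For single-document inference, wrapping the fitted model in a `LightPipeline` avoids the DataFrame round trip. The snippet below is only a sketch that reuses `pipelineModel` from the example above.

```python
# Sketch: low-latency inference on plain strings with LightPipeline,
# reusing the fitted pipelineModel from the example above.
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
annotated = light.fullAnnotate("I love spark-nlp")
# Keys follow the output columns of the pipeline; print the first two token vectors.
print(annotated[0]["embeddings"][:2])
```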
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mobilebert_add_pre_training_complete| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|92.5 MB| + +## References + +https://huggingface.co/gokuls/mobilebert_add_pre-training-complete \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-mobilebert_add_pre_training_complete_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-mobilebert_add_pre_training_complete_pipeline_en.md new file mode 100644 index 00000000000000..9244cc72804058 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-mobilebert_add_pre_training_complete_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mobilebert_add_pre_training_complete_pipeline pipeline BertEmbeddings from gokuls +author: John Snow Labs +name: mobilebert_add_pre_training_complete_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mobilebert_add_pre_training_complete_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mobilebert_add_pre_training_complete_pipeline_en_5.5.0_3.0_1726673612502.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mobilebert_add_pre_training_complete_pipeline_en_5.5.0_3.0_1726673612502.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("mobilebert_add_pre_training_complete_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("mobilebert_add_pre_training_complete_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mobilebert_add_pre_training_complete_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|92.5 MB| + +## References + +https://huggingface.co/gokuls/mobilebert_add_pre-training-complete + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-mongolian_roberta_large_en.md b/docs/_posts/ahmedlone127/2024-09-18-mongolian_roberta_large_en.md new file mode 100644 index 00000000000000..6ab3f22f00dacd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-mongolian_roberta_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mongolian_roberta_large RoBertaEmbeddings from bayartsogt +author: John Snow Labs +name: mongolian_roberta_large +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mongolian_roberta_large` is a English model originally trained by bayartsogt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mongolian_roberta_large_en_5.5.0_3.0_1726651443337.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mongolian_roberta_large_en_5.5.0_3.0_1726651443337.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("mongolian_roberta_large","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("mongolian_roberta_large","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mongolian_roberta_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/bayartsogt/mongolian-roberta-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-moviegenreprediction_en.md b/docs/_posts/ahmedlone127/2024-09-18-moviegenreprediction_en.md new file mode 100644 index 00000000000000..22f63c1c7c8d64 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-moviegenreprediction_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English moviegenreprediction DistilBertForSequenceClassification from shaggysus +author: John Snow Labs +name: moviegenreprediction +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`moviegenreprediction` is a English model originally trained by shaggysus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/moviegenreprediction_en_5.5.0_3.0_1726625728202.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/moviegenreprediction_en_5.5.0_3.0_1726625728202.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("moviegenreprediction","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("moviegenreprediction", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|moviegenreprediction| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/shaggysus/MovieGenrePrediction \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-moviegenreprediction_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-moviegenreprediction_pipeline_en.md new file mode 100644 index 00000000000000..30a895e86a4f99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-moviegenreprediction_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English moviegenreprediction_pipeline pipeline DistilBertForSequenceClassification from shaggysus +author: John Snow Labs +name: moviegenreprediction_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`moviegenreprediction_pipeline` is a English model originally trained by shaggysus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/moviegenreprediction_pipeline_en_5.5.0_3.0_1726625740388.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/moviegenreprediction_pipeline_en_5.5.0_3.0_1726625740388.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("moviegenreprediction_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("moviegenreprediction_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|moviegenreprediction_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/shaggysus/MovieGenrePrediction + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-movies_notice_en.md b/docs/_posts/ahmedlone127/2024-09-18-movies_notice_en.md new file mode 100644 index 00000000000000..1dca8b1b93ac75 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-movies_notice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English movies_notice DistilBertForSequenceClassification from hermione03 +author: John Snow Labs +name: movies_notice +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`movies_notice` is a English model originally trained by hermione03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/movies_notice_en_5.5.0_3.0_1726695668314.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/movies_notice_en_5.5.0_3.0_1726695668314.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("movies_notice","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("movies_notice", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|movies_notice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/hermione03/movies_notice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-mrlincolnberta_en.md b/docs/_posts/ahmedlone127/2024-09-18-mrlincolnberta_en.md new file mode 100644 index 00000000000000..a3db79bc219652 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-mrlincolnberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mrlincolnberta RoBertaEmbeddings from BigSalmon +author: John Snow Labs +name: mrlincolnberta +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mrlincolnberta` is a English model originally trained by BigSalmon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mrlincolnberta_en_5.5.0_3.0_1726651329667.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mrlincolnberta_en_5.5.0_3.0_1726651329667.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("mrlincolnberta","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("mrlincolnberta","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mrlincolnberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/BigSalmon/MrLincolnBerta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-mt5_base_visp_s1_96_en.md b/docs/_posts/ahmedlone127/2024-09-18-mt5_base_visp_s1_96_en.md new file mode 100644 index 00000000000000..f6194a1bb3589f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-mt5_base_visp_s1_96_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English mt5_base_visp_s1_96 T5Transformer from ngwgsang +author: John Snow Labs +name: mt5_base_visp_s1_96 +date: 2024-09-18 +tags: [en, open_source, onnx, t5, question_answering, summarization, translation, text_generation] +task: [Question Answering, Summarization, Translation, Text Generation] +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: T5Transformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained T5Transformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mt5_base_visp_s1_96` is a English model originally trained by ngwgsang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mt5_base_visp_s1_96_en_5.5.0_3.0_1726703439821.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mt5_base_visp_s1_96_en_5.5.0_3.0_1726703439821.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

t5 = T5Transformer.pretrained("mt5_base_visp_s1_96","en") \
    .setInputCols(["document"]) \
    .setOutputCol("output")

pipeline = Pipeline().setStages([documentAssembler, t5])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val t5 = T5Transformer.pretrained("mt5_base_visp_s1_96", "en")
  .setInputCols(Array("document"))
  .setOutputCol("output")

val pipeline = new Pipeline().setStages(Array(documentAssembler, t5))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
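
The generated text ends up in the `output` column's `result` field; a minimal sketch for reading it, assuming the column name set in the example above:

```python
# Sketch: show the generated sequence stored in the "output" column.
pipelineDF.select("output.result").show(truncate=False)
```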
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mt5_base_visp_s1_96| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[output]| +|Language:|en| +|Size:|2.2 GB| + +## References + +https://huggingface.co/ngwgsang/mt5-base-visp-s1-96 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-multilingual_e5_base_censor_v0_1_xx.md b/docs/_posts/ahmedlone127/2024-09-18-multilingual_e5_base_censor_v0_1_xx.md new file mode 100644 index 00000000000000..e8cd485c8dce7b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-multilingual_e5_base_censor_v0_1_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual multilingual_e5_base_censor_v0_1 XlmRoBertaForSequenceClassification from Data-Lab +author: John Snow Labs +name: multilingual_e5_base_censor_v0_1 +date: 2024-09-18 +tags: [xx, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multilingual_e5_base_censor_v0_1` is a Multilingual model originally trained by Data-Lab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multilingual_e5_base_censor_v0_1_xx_5.5.0_3.0_1726659848298.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multilingual_e5_base_censor_v0_1_xx_5.5.0_3.0_1726659848298.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("multilingual_e5_base_censor_v0_1","xx") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("multilingual_e5_base_censor_v0_1", "xx")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
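
Because the model is tagged as multilingual (`xx`), the same fitted pipeline can score non-English text. The sentence below is only an illustrative placeholder, not taken from the training data.

```python
# Sketch: score a non-English sentence with the fitted pipeline from above.
sample = spark.createDataFrame([["Me encanta spark-nlp"]]).toDF("text")
pipelineModel.transform(sample).select("text", "class.result").show(truncate=False)
```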
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multilingual_e5_base_censor_v0_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|802.4 MB| + +## References + +https://huggingface.co/Data-Lab/multilingual-e5-base_censor_v0.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-multilingual_xlm_roberta_for_ner_sedaorcin_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-18-multilingual_xlm_roberta_for_ner_sedaorcin_pipeline_xx.md new file mode 100644 index 00000000000000..da7c55d898ff0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-multilingual_xlm_roberta_for_ner_sedaorcin_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual multilingual_xlm_roberta_for_ner_sedaorcin_pipeline pipeline XlmRoBertaForTokenClassification from sedaorcin +author: John Snow Labs +name: multilingual_xlm_roberta_for_ner_sedaorcin_pipeline +date: 2024-09-18 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multilingual_xlm_roberta_for_ner_sedaorcin_pipeline` is a Multilingual model originally trained by sedaorcin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multilingual_xlm_roberta_for_ner_sedaorcin_pipeline_xx_5.5.0_3.0_1726636499064.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multilingual_xlm_roberta_for_ner_sedaorcin_pipeline_xx_5.5.0_3.0_1726636499064.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("multilingual_xlm_roberta_for_ner_sedaorcin_pipeline", lang = "xx")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("multilingual_xlm_roberta_for_ner_sedaorcin_pipeline", lang = "xx")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multilingual_xlm_roberta_for_ner_sedaorcin_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|853.8 MB| + +## References + +https://huggingface.co/sedaorcin/multilingual-xlm-roberta-for-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-mymodel_isom5240group17_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-mymodel_isom5240group17_pipeline_en.md new file mode 100644 index 00000000000000..18cd54bc0cb02a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-mymodel_isom5240group17_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mymodel_isom5240group17_pipeline pipeline RoBertaForSequenceClassification from isom5240group17 +author: John Snow Labs +name: mymodel_isom5240group17_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mymodel_isom5240group17_pipeline` is a English model originally trained by isom5240group17. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mymodel_isom5240group17_pipeline_en_5.5.0_3.0_1726621539817.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mymodel_isom5240group17_pipeline_en_5.5.0_3.0_1726621539817.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("mymodel_isom5240group17_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("mymodel_isom5240group17_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mymodel_isom5240group17_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.8 MB| + +## References + +https://huggingface.co/isom5240group17/myModel + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-n_distilbert_twitterfin_padding70model_en.md b/docs/_posts/ahmedlone127/2024-09-18-n_distilbert_twitterfin_padding70model_en.md new file mode 100644 index 00000000000000..391e6960781392 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-n_distilbert_twitterfin_padding70model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_distilbert_twitterfin_padding70model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_twitterfin_padding70model +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_twitterfin_padding70model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding70model_en_5.5.0_3.0_1726677316842.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding70model_en_5.5.0_3.0_1726677316842.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_twitterfin_padding70model","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_twitterfin_padding70model", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_twitterfin_padding70model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_twitterfin_padding70model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-nerd_nerd_random0_seed0_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-18-nerd_nerd_random0_seed0_bernice_en.md new file mode 100644 index 00000000000000..b6ab0f9ffbab8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-nerd_nerd_random0_seed0_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nerd_nerd_random0_seed0_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: nerd_nerd_random0_seed0_bernice +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerd_nerd_random0_seed0_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerd_nerd_random0_seed0_bernice_en_5.5.0_3.0_1726686617114.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerd_nerd_random0_seed0_bernice_en_5.5.0_3.0_1726686617114.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("nerd_nerd_random0_seed0_bernice","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("nerd_nerd_random0_seed0_bernice", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerd_nerd_random0_seed0_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|832.0 MB| + +## References + +https://huggingface.co/tweettemposhift/nerd-nerd_random0_seed0-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-nerd_nerd_random0_seed1_twitter_roberta_base_jun2021_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-nerd_nerd_random0_seed1_twitter_roberta_base_jun2021_pipeline_en.md new file mode 100644 index 00000000000000..7c63c0b199f38b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-nerd_nerd_random0_seed1_twitter_roberta_base_jun2021_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nerd_nerd_random0_seed1_twitter_roberta_base_jun2021_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: nerd_nerd_random0_seed1_twitter_roberta_base_jun2021_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerd_nerd_random0_seed1_twitter_roberta_base_jun2021_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerd_nerd_random0_seed1_twitter_roberta_base_jun2021_pipeline_en_5.5.0_3.0_1726650408758.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerd_nerd_random0_seed1_twitter_roberta_base_jun2021_pipeline_en_5.5.0_3.0_1726650408758.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("nerd_nerd_random0_seed1_twitter_roberta_base_jun2021_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("nerd_nerd_random0_seed1_twitter_roberta_base_jun2021_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerd_nerd_random0_seed1_twitter_roberta_base_jun2021_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/nerd-nerd_random0_seed1-twitter-roberta-base-jun2021 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-nerd_nerd_random3_seed2_twitter_roberta_base_2021_124m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-nerd_nerd_random3_seed2_twitter_roberta_base_2021_124m_pipeline_en.md new file mode 100644 index 00000000000000..a07c5a99ca8507 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-nerd_nerd_random3_seed2_twitter_roberta_base_2021_124m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nerd_nerd_random3_seed2_twitter_roberta_base_2021_124m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: nerd_nerd_random3_seed2_twitter_roberta_base_2021_124m_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerd_nerd_random3_seed2_twitter_roberta_base_2021_124m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerd_nerd_random3_seed2_twitter_roberta_base_2021_124m_pipeline_en_5.5.0_3.0_1726641806221.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerd_nerd_random3_seed2_twitter_roberta_base_2021_124m_pipeline_en_5.5.0_3.0_1726641806221.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("nerd_nerd_random3_seed2_twitter_roberta_base_2021_124m_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("nerd_nerd_random3_seed2_twitter_roberta_base_2021_124m_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerd_nerd_random3_seed2_twitter_roberta_base_2021_124m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/nerd-nerd_random3_seed2-twitter-roberta-base-2021-124m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-noise_memo_bert_3_01_en.md b/docs/_posts/ahmedlone127/2024-09-18-noise_memo_bert_3_01_en.md new file mode 100644 index 00000000000000..0ff1671f70325d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-noise_memo_bert_3_01_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English noise_memo_bert_3_01 XlmRoBertaForSequenceClassification from yemen2016 +author: John Snow Labs +name: noise_memo_bert_3_01 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`noise_memo_bert_3_01` is a English model originally trained by yemen2016. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/noise_memo_bert_3_01_en_5.5.0_3.0_1726660110833.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/noise_memo_bert_3_01_en_5.5.0_3.0_1726660110833.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("noise_memo_bert_3_01","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("noise_memo_bert_3_01", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|noise_memo_bert_3_01| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|466.6 MB| + +## References + +https://huggingface.co/yemen2016/Noise_MeMo_BERT-3_01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-norms_establish_check_reproducibility_16_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-norms_establish_check_reproducibility_16_pipeline_en.md new file mode 100644 index 00000000000000..dd17cc746f76a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-norms_establish_check_reproducibility_16_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English norms_establish_check_reproducibility_16_pipeline pipeline RoBertaForSequenceClassification from rose-e-wang +author: John Snow Labs +name: norms_establish_check_reproducibility_16_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`norms_establish_check_reproducibility_16_pipeline` is a English model originally trained by rose-e-wang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norms_establish_check_reproducibility_16_pipeline_en_5.5.0_3.0_1726642239636.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norms_establish_check_reproducibility_16_pipeline_en_5.5.0_3.0_1726642239636.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("norms_establish_check_reproducibility_16_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("norms_establish_check_reproducibility_16_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norms_establish_check_reproducibility_16_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/rose-e-wang/norms_establish_check_reproducibility_16 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-ooc_patch_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-ooc_patch_v1_pipeline_en.md new file mode 100644 index 00000000000000..f73b773b9e09ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-ooc_patch_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ooc_patch_v1_pipeline pipeline DistilBertForSequenceClassification from Ksgk-fy +author: John Snow Labs +name: ooc_patch_v1_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ooc_patch_v1_pipeline` is a English model originally trained by Ksgk-fy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ooc_patch_v1_pipeline_en_5.5.0_3.0_1726630788562.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ooc_patch_v1_pipeline_en_5.5.0_3.0_1726630788562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ooc_patch_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ooc_patch_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ooc_patch_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Ksgk-fy/ooc_patch_v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-opticalbert_qa_squadv1_cased_en.md b/docs/_posts/ahmedlone127/2024-09-18-opticalbert_qa_squadv1_cased_en.md new file mode 100644 index 00000000000000..a011795596225b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-opticalbert_qa_squadv1_cased_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English opticalbert_qa_squadv1_cased BertForQuestionAnswering from opticalmaterials +author: John Snow Labs +name: opticalbert_qa_squadv1_cased +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opticalbert_qa_squadv1_cased` is a English model originally trained by opticalmaterials. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opticalbert_qa_squadv1_cased_en_5.5.0_3.0_1726659146866.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opticalbert_qa_squadv1_cased_en_5.5.0_3.0_1726659146866.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("opticalbert_qa_squadv1_cased","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("opticalbert_qa_squadv1_cased", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
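
The `answer` column produced above holds standard Spark NLP annotations, so the predicted span can be inspected directly; a small illustrative follow-up, assuming the pipeline above ran as written:

```python
# "result" carries the question text and the predicted answer span per row
pipelineDF.select("document_question.result", "answer.result").show(truncate=False)
```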
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opticalbert_qa_squadv1_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.5 MB| + +## References + +https://huggingface.co/opticalmaterials/opticalbert_qa_squadv1_cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-opticalbert_qa_squadv1_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-opticalbert_qa_squadv1_cased_pipeline_en.md new file mode 100644 index 00000000000000..c183f891d35b5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-opticalbert_qa_squadv1_cased_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English opticalbert_qa_squadv1_cased_pipeline pipeline BertForQuestionAnswering from opticalmaterials +author: John Snow Labs +name: opticalbert_qa_squadv1_cased_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opticalbert_qa_squadv1_cased_pipeline` is a English model originally trained by opticalmaterials. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opticalbert_qa_squadv1_cased_pipeline_en_5.5.0_3.0_1726659166285.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opticalbert_qa_squadv1_cased_pipeline_en_5.5.0_3.0_1726659166285.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opticalbert_qa_squadv1_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opticalbert_qa_squadv1_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opticalbert_qa_squadv1_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.5 MB| + +## References + +https://huggingface.co/opticalmaterials/opticalbert_qa_squadv1_cased + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-org_bot_classifier_dstilbert_en.md b/docs/_posts/ahmedlone127/2024-09-18-org_bot_classifier_dstilbert_en.md new file mode 100644 index 00000000000000..e00b680a057327 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-org_bot_classifier_dstilbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English org_bot_classifier_dstilbert DistilBertForSequenceClassification from Jjzzzz +author: John Snow Labs +name: org_bot_classifier_dstilbert +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`org_bot_classifier_dstilbert` is a English model originally trained by Jjzzzz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/org_bot_classifier_dstilbert_en_5.5.0_3.0_1726625547651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/org_bot_classifier_dstilbert_en_5.5.0_3.0_1726625547651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("org_bot_classifier_dstilbert","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("org_bot_classifier_dstilbert", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
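
To inspect the prediction, the label assigned to each input row sits in the `result` field of the `class` annotations; a short illustrative check, assuming the pipeline above ran as written:

```python
# "class.result" holds the predicted label for each document
pipelineDF.select("text", "class.result").show(truncate=False)
```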
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|org_bot_classifier_dstilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Jjzzzz/org_bot_classifier_dstilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-org_bot_classifier_dstilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-org_bot_classifier_dstilbert_pipeline_en.md new file mode 100644 index 00000000000000..5e05a3a8cf5664 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-org_bot_classifier_dstilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English org_bot_classifier_dstilbert_pipeline pipeline DistilBertForSequenceClassification from Jjzzzz +author: John Snow Labs +name: org_bot_classifier_dstilbert_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`org_bot_classifier_dstilbert_pipeline` is a English model originally trained by Jjzzzz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/org_bot_classifier_dstilbert_pipeline_en_5.5.0_3.0_1726625560234.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/org_bot_classifier_dstilbert_pipeline_en_5.5.0_3.0_1726625560234.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("org_bot_classifier_dstilbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("org_bot_classifier_dstilbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|org_bot_classifier_dstilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Jjzzzz/org_bot_classifier_dstilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-pharma_classification_v2_en.md b/docs/_posts/ahmedlone127/2024-09-18-pharma_classification_v2_en.md new file mode 100644 index 00000000000000..f1dc0d6f1142b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-pharma_classification_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English pharma_classification_v2 DistilBertForSequenceClassification from NikhilBITS +author: John Snow Labs +name: pharma_classification_v2 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pharma_classification_v2` is a English model originally trained by NikhilBITS. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pharma_classification_v2_en_5.5.0_3.0_1726680619577.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pharma_classification_v2_en_5.5.0_3.0_1726680619577.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("pharma_classification_v2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("pharma_classification_v2", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pharma_classification_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/NikhilBITS/pharma_classification_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-platzi_distilroberta_base_mrpc_glue_carlos_venegas_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-platzi_distilroberta_base_mrpc_glue_carlos_venegas_pipeline_en.md new file mode 100644 index 00000000000000..6b57d95fddb335 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-platzi_distilroberta_base_mrpc_glue_carlos_venegas_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English platzi_distilroberta_base_mrpc_glue_carlos_venegas_pipeline pipeline RoBertaForSequenceClassification from platzi +author: John Snow Labs +name: platzi_distilroberta_base_mrpc_glue_carlos_venegas_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_mrpc_glue_carlos_venegas_pipeline` is a English model originally trained by platzi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_carlos_venegas_pipeline_en_5.5.0_3.0_1726649403634.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_carlos_venegas_pipeline_en_5.5.0_3.0_1726649403634.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("platzi_distilroberta_base_mrpc_glue_carlos_venegas_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("platzi_distilroberta_base_mrpc_glue_carlos_venegas_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_mrpc_glue_carlos_venegas_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/platzi/platzi-distilroberta-base-mrpc-glue-carlos-venegas + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-platzi_distilroberta_base_mrpc_glue_luis_rascon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-platzi_distilroberta_base_mrpc_glue_luis_rascon_pipeline_en.md new file mode 100644 index 00000000000000..da2c6f31402a1e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-platzi_distilroberta_base_mrpc_glue_luis_rascon_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English platzi_distilroberta_base_mrpc_glue_luis_rascon_pipeline pipeline RoBertaForSequenceClassification from platzi +author: John Snow Labs +name: platzi_distilroberta_base_mrpc_glue_luis_rascon_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_mrpc_glue_luis_rascon_pipeline` is a English model originally trained by platzi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_luis_rascon_pipeline_en_5.5.0_3.0_1726649859736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_luis_rascon_pipeline_en_5.5.0_3.0_1726649859736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("platzi_distilroberta_base_mrpc_glue_luis_rascon_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("platzi_distilroberta_base_mrpc_glue_luis_rascon_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_mrpc_glue_luis_rascon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/platzi/platzi-distilroberta-base-mrpc-glue-luis-rascon + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-platzi_distilroberta_base_mrpc_glue_saul_burgos_en.md b/docs/_posts/ahmedlone127/2024-09-18-platzi_distilroberta_base_mrpc_glue_saul_burgos_en.md new file mode 100644 index 00000000000000..19092b37d0e965 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-platzi_distilroberta_base_mrpc_glue_saul_burgos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English platzi_distilroberta_base_mrpc_glue_saul_burgos RoBertaForSequenceClassification from platzi +author: John Snow Labs +name: platzi_distilroberta_base_mrpc_glue_saul_burgos +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_mrpc_glue_saul_burgos` is a English model originally trained by platzi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_saul_burgos_en_5.5.0_3.0_1726666439544.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_saul_burgos_en_5.5.0_3.0_1726666439544.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_saul_burgos","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_saul_burgos", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_mrpc_glue_saul_burgos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/platzi/platzi-distilroberta-base-mrpc-glue-saul-burgos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-polarizer_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-polarizer_base_pipeline_en.md new file mode 100644 index 00000000000000..eaa8c0b3dc1401 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-polarizer_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English polarizer_base_pipeline pipeline RoBertaEmbeddings from kyungmin011029 +author: John Snow Labs +name: polarizer_base_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`polarizer_base_pipeline` is a English model originally trained by kyungmin011029. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/polarizer_base_pipeline_en_5.5.0_3.0_1726618118159.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/polarizer_base_pipeline_en_5.5.0_3.0_1726618118159.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("polarizer_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("polarizer_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|polarizer_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.8 MB| + +## References + +https://huggingface.co/kyungmin011029/Polarizer-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-polibert_malaysia_ver4_en.md b/docs/_posts/ahmedlone127/2024-09-18-polibert_malaysia_ver4_en.md new file mode 100644 index 00000000000000..11fffd3b3118cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-polibert_malaysia_ver4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English polibert_malaysia_ver4 BertForSequenceClassification from YagiASAFAS +author: John Snow Labs +name: polibert_malaysia_ver4 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`polibert_malaysia_ver4` is a English model originally trained by YagiASAFAS. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/polibert_malaysia_ver4_en_5.5.0_3.0_1726623785605.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/polibert_malaysia_ver4_en_5.5.0_3.0_1726623785605.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("polibert_malaysia_ver4","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("polibert_malaysia_ver4", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|polibert_malaysia_ver4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/YagiASAFAS/polibert-malaysia-ver4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-polibert_malaysia_ver4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-polibert_malaysia_ver4_pipeline_en.md new file mode 100644 index 00000000000000..3955da836de14c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-polibert_malaysia_ver4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English polibert_malaysia_ver4_pipeline pipeline BertForSequenceClassification from YagiASAFAS +author: John Snow Labs +name: polibert_malaysia_ver4_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`polibert_malaysia_ver4_pipeline` is a English model originally trained by YagiASAFAS. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/polibert_malaysia_ver4_pipeline_en_5.5.0_3.0_1726623805293.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/polibert_malaysia_ver4_pipeline_en_5.5.0_3.0_1726623805293.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("polibert_malaysia_ver4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("polibert_malaysia_ver4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|polibert_malaysia_ver4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/YagiASAFAS/polibert-malaysia-ver4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-ptcrawl_plus_legal_large_v1_6__checkpoint_last_en.md b/docs/_posts/ahmedlone127/2024-09-18-ptcrawl_plus_legal_large_v1_6__checkpoint_last_en.md new file mode 100644 index 00000000000000..d498c9be8b211e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-ptcrawl_plus_legal_large_v1_6__checkpoint_last_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ptcrawl_plus_legal_large_v1_6__checkpoint_last RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: ptcrawl_plus_legal_large_v1_6__checkpoint_last +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ptcrawl_plus_legal_large_v1_6__checkpoint_last` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ptcrawl_plus_legal_large_v1_6__checkpoint_last_en_5.5.0_3.0_1726678638354.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ptcrawl_plus_legal_large_v1_6__checkpoint_last_en_5.5.0_3.0_1726678638354.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("ptcrawl_plus_legal_large_v1_6__checkpoint_last","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("ptcrawl_plus_legal_large_v1_6__checkpoint_last","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
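
To work with the raw vectors, each entry in the `embeddings` column follows the standard Spark NLP annotation schema, with the token text in `result` and its vector in `embeddings`; a hedged sketch of pulling them out (field names come from that generic schema, not from anything specific to this model):

```python
from pyspark.sql.functions import explode

# One row per token: the token string and its embedding vector
pipelineDF.select(explode("embeddings").alias("emb")) \
    .selectExpr("emb.result as token", "emb.embeddings as embedding") \
    .show(truncate=False)
```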
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ptcrawl_plus_legal_large_v1_6__checkpoint_last| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|843.9 MB| + +## References + +https://huggingface.co/eduagarcia-temp/ptcrawl_plus_legal_large_v1_6__checkpoint_last \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-qa_model_rachelle_en.md b/docs/_posts/ahmedlone127/2024-09-18-qa_model_rachelle_en.md new file mode 100644 index 00000000000000..f28464177aab66 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-qa_model_rachelle_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English qa_model_rachelle DistilBertForQuestionAnswering from RachelLe +author: John Snow Labs +name: qa_model_rachelle +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_model_rachelle` is a English model originally trained by RachelLe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_model_rachelle_en_5.5.0_3.0_1726640810642.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_model_rachelle_en_5.5.0_3.0_1726640810642.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("qa_model_rachelle","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("qa_model_rachelle", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_model_rachelle| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/RachelLe/qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-reberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-reberta_pipeline_en.md new file mode 100644 index 00000000000000..771f1a99b7ae3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-reberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English reberta_pipeline pipeline XlmRoBertaForSequenceClassification from achDev +author: John Snow Labs +name: reberta_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`reberta_pipeline` is a English model originally trained by achDev. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/reberta_pipeline_en_5.5.0_3.0_1726697963422.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/reberta_pipeline_en_5.5.0_3.0_1726697963422.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("reberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("reberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|reberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/achDev/reberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-request_type_classifier_roberta_26_08_2024_en.md b/docs/_posts/ahmedlone127/2024-09-18-request_type_classifier_roberta_26_08_2024_en.md new file mode 100644 index 00000000000000..af2d2dc4602d8d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-request_type_classifier_roberta_26_08_2024_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English request_type_classifier_roberta_26_08_2024 RoBertaForSequenceClassification from venkynavs +author: John Snow Labs +name: request_type_classifier_roberta_26_08_2024 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`request_type_classifier_roberta_26_08_2024` is a English model originally trained by venkynavs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/request_type_classifier_roberta_26_08_2024_en_5.5.0_3.0_1726621641063.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/request_type_classifier_roberta_26_08_2024_en_5.5.0_3.0_1726621641063.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("request_type_classifier_roberta_26_08_2024","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("request_type_classifier_roberta_26_08_2024", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|request_type_classifier_roberta_26_08_2024| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/venkynavs/Request_Type_Classifier_RoBERTa_26_08_2024 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-results_neihc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-results_neihc_pipeline_en.md new file mode 100644 index 00000000000000..ef5f8e68b4d982 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-results_neihc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English results_neihc_pipeline pipeline DistilBertForTokenClassification from neihc +author: John Snow Labs +name: results_neihc_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`results_neihc_pipeline` is a English model originally trained by neihc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/results_neihc_pipeline_en_5.5.0_3.0_1726645054524.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/results_neihc_pipeline_en_5.5.0_3.0_1726645054524.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("results_neihc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("results_neihc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|results_neihc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/neihc/results + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-review_classification_frithureiks_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-review_classification_frithureiks_pipeline_en.md new file mode 100644 index 00000000000000..e43518c48d9fca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-review_classification_frithureiks_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English review_classification_frithureiks_pipeline pipeline DistilBertForSequenceClassification from frithureiks +author: John Snow Labs +name: review_classification_frithureiks_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`review_classification_frithureiks_pipeline` is a English model originally trained by frithureiks. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/review_classification_frithureiks_pipeline_en_5.5.0_3.0_1726630710484.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/review_classification_frithureiks_pipeline_en_5.5.0_3.0_1726630710484.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("review_classification_frithureiks_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("review_classification_frithureiks_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|review_classification_frithureiks_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/frithureiks/review_classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-review_classification_josephjose025_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-review_classification_josephjose025_pipeline_en.md new file mode 100644 index 00000000000000..c9c9a5968831eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-review_classification_josephjose025_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English review_classification_josephjose025_pipeline pipeline DistilBertForSequenceClassification from josephjose025 +author: John Snow Labs +name: review_classification_josephjose025_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`review_classification_josephjose025_pipeline` is a English model originally trained by josephjose025. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/review_classification_josephjose025_pipeline_en_5.5.0_3.0_1726669414949.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/review_classification_josephjose025_pipeline_en_5.5.0_3.0_1726669414949.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("review_classification_josephjose025_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("review_classification_josephjose025_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|review_classification_josephjose025_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/josephjose025/review_classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_bne_finetuned_amazon_practica_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_bne_finetuned_amazon_practica_pipeline_en.md new file mode 100644 index 00000000000000..1f240c04e62318 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_bne_finetuned_amazon_practica_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_amazon_practica_pipeline pipeline RoBertaForSequenceClassification from Jhandry +author: John Snow Labs +name: roberta_base_bne_finetuned_amazon_practica_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_amazon_practica_pipeline` is a English model originally trained by Jhandry. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_amazon_practica_pipeline_en_5.5.0_3.0_1726642398775.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_amazon_practica_pipeline_en_5.5.0_3.0_1726642398775.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_bne_finetuned_amazon_practica_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_bne_finetuned_amazon_practica_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_amazon_practica_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|446.8 MB| + +## References + +https://huggingface.co/Jhandry/roberta-base-bne-finetuned-amazon_practica + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_bne_finetuned_amazon_reviews_spanish_03_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_bne_finetuned_amazon_reviews_spanish_03_en.md new file mode 100644 index 00000000000000..fcd557a2f33efa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_bne_finetuned_amazon_reviews_spanish_03_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_amazon_reviews_spanish_03 RoBertaForSequenceClassification from DevCar +author: John Snow Labs +name: roberta_base_bne_finetuned_amazon_reviews_spanish_03 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_amazon_reviews_spanish_03` is a English model originally trained by DevCar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_amazon_reviews_spanish_03_en_5.5.0_3.0_1726628472902.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_amazon_reviews_spanish_03_en_5.5.0_3.0_1726628472902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_finetuned_amazon_reviews_spanish_03","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_finetuned_amazon_reviews_spanish_03", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_amazon_reviews_spanish_03| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|442.7 MB| + +## References + +https://huggingface.co/DevCar/roberta-base-bne-finetuned-amazon_reviews_es_03 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_bne_finetuned_amazon_reviews_spanish_03_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_bne_finetuned_amazon_reviews_spanish_03_pipeline_en.md new file mode 100644 index 00000000000000..f1113d76d939fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_bne_finetuned_amazon_reviews_spanish_03_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_amazon_reviews_spanish_03_pipeline pipeline RoBertaForSequenceClassification from DevCar +author: John Snow Labs +name: roberta_base_bne_finetuned_amazon_reviews_spanish_03_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_amazon_reviews_spanish_03_pipeline` is a English model originally trained by DevCar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_amazon_reviews_spanish_03_pipeline_en_5.5.0_3.0_1726628501394.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_amazon_reviews_spanish_03_pipeline_en_5.5.0_3.0_1726628501394.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a DataFrame of input documents (typically with the text in a column named "text")
+pipeline = PretrainedPipeline("roberta_base_bne_finetuned_amazon_reviews_spanish_03_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a DataFrame of input documents (typically with the text in a column named "text")
+val pipeline = new PretrainedPipeline("roberta_base_bne_finetuned_amazon_reviews_spanish_03_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
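+
+For quick experiments on a single string, `PretrainedPipeline` also exposes `annotate()`, which returns the outputs as a plain Python dict rather than a DataFrame. A small sketch, assuming the `pipeline` object created above:
+
+```python
+# the keys of the returned dict are the pipeline's output columns, e.g. "class"
+print(pipeline.annotate("I love spark-nlp"))
+```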
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_amazon_reviews_spanish_03_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|442.7 MB| + +## References + +https://huggingface.co/DevCar/roberta-base-bne-finetuned-amazon_reviews_es_03 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_bne_finetuned_nepal_bhasa_oriya_used_title_gonchisi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_bne_finetuned_nepal_bhasa_oriya_used_title_gonchisi_pipeline_en.md new file mode 100644 index 00000000000000..95e00c93e2a5fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_bne_finetuned_nepal_bhasa_oriya_used_title_gonchisi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_nepal_bhasa_oriya_used_title_gonchisi_pipeline pipeline RoBertaForSequenceClassification from gonchisi +author: John Snow Labs +name: roberta_base_bne_finetuned_nepal_bhasa_oriya_used_title_gonchisi_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_nepal_bhasa_oriya_used_title_gonchisi_pipeline` is a English model originally trained by gonchisi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_nepal_bhasa_oriya_used_title_gonchisi_pipeline_en_5.5.0_3.0_1726627500997.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_nepal_bhasa_oriya_used_title_gonchisi_pipeline_en_5.5.0_3.0_1726627500997.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a DataFrame of input documents (typically with the text in a column named "text")
+pipeline = PretrainedPipeline("roberta_base_bne_finetuned_nepal_bhasa_oriya_used_title_gonchisi_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a DataFrame of input documents (typically with the text in a column named "text")
+val pipeline = new PretrainedPipeline("roberta_base_bne_finetuned_nepal_bhasa_oriya_used_title_gonchisi_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_nepal_bhasa_oriya_used_title_gonchisi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|439.4 MB| + +## References + +https://huggingface.co/gonchisi/roberta-base-bne-finetuned-new_or_used_title + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_cased_portuguese_c_corpus_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_cased_portuguese_c_corpus_en.md new file mode 100644 index 00000000000000..4f0444a249b71a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_cased_portuguese_c_corpus_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_cased_portuguese_c_corpus RoBertaEmbeddings from rosimeirecosta +author: John Snow Labs +name: roberta_base_cased_portuguese_c_corpus +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_cased_portuguese_c_corpus` is a English model originally trained by rosimeirecosta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_cased_portuguese_c_corpus_en_5.5.0_3.0_1726618050564.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_cased_portuguese_c_corpus_en_5.5.0_3.0_1726618050564.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("roberta_base_cased_portuguese_c_corpus","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("roberta_base_cased_portuguese_c_corpus","en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
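+
+Each annotation in the `embeddings` output column carries the token text in its `result` field and the vector in its `embeddings` field. A minimal sketch for pulling the vectors out of the `pipelineDF` DataFrame produced above:
+
+```python
+# one row per token, each with its RoBERTa embedding vector
+pipelineDF.selectExpr("explode(embeddings) as embedding") \
+    .selectExpr("embedding.result as token", "embedding.embeddings as vector") \
+    .show(truncate=False)
+```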
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_cased_portuguese_c_corpus| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|590.6 MB| + +## References + +https://huggingface.co/rosimeirecosta/roberta-base-cased-pt-c-corpus \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_classification_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_classification_en.md new file mode 100644 index 00000000000000..28716d9ae165f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_classification RoBertaForSequenceClassification from Ahmed235 +author: John Snow Labs +name: roberta_base_classification +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_classification` is a English model originally trained by Ahmed235. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_classification_en_5.5.0_3.0_1726689775998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_classification_en_5.5.0_3.0_1726689775998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_classification","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_classification", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|653.2 MB| + +## References + +https://huggingface.co/Ahmed235/roberta-base-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_epoch_70_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_epoch_70_pipeline_en.md new file mode 100644 index 00000000000000..b9590162958565 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_epoch_70_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_70_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_70_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_70_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_70_pipeline_en_5.5.0_3.0_1726678343365.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_70_pipeline_en_5.5.0_3.0_1726678343365.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a DataFrame of input documents (typically with the text in a column named "text")
+pipeline = PretrainedPipeline("roberta_base_epoch_70_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a DataFrame of input documents (typically with the text in a column named "text")
+val pipeline = new PretrainedPipeline("roberta_base_epoch_70_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_70_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_70 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_epoch_73_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_epoch_73_pipeline_en.md new file mode 100644 index 00000000000000..7eb2c18a84429e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_epoch_73_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_73_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_73_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_73_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_73_pipeline_en_5.5.0_3.0_1726651997394.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_73_pipeline_en_5.5.0_3.0_1726651997394.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a DataFrame of input documents (typically with the text in a column named "text")
+pipeline = PretrainedPipeline("roberta_base_epoch_73_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a DataFrame of input documents (typically with the text in a column named "text")
+val pipeline = new PretrainedPipeline("roberta_base_epoch_73_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_73_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_73 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_1_pipeline_en.md new file mode 100644 index 00000000000000..7cc8698b7e798c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_1_pipeline pipeline RoBertaForSequenceClassification from sara-nabhani +author: John Snow Labs +name: roberta_base_finetuned_1_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_1_pipeline` is a English model originally trained by sara-nabhani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_1_pipeline_en_5.5.0_3.0_1726622284435.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_1_pipeline_en_5.5.0_3.0_1726622284435.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a DataFrame of input documents (typically with the text in a column named "text")
+pipeline = PretrainedPipeline("roberta_base_finetuned_1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a DataFrame of input documents (typically with the text in a column named "text")
+val pipeline = new PretrainedPipeline("roberta_base_finetuned_1_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|415.3 MB| + +## References + +https://huggingface.co/sara-nabhani/roberta-base-finetuned-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_bible_accelerate_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_bible_accelerate_pipeline_en.md new file mode 100644 index 00000000000000..f7fabf8d12820a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_bible_accelerate_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_bible_accelerate_pipeline pipeline RoBertaEmbeddings from Pragash-Mohanarajah +author: John Snow Labs +name: roberta_base_finetuned_bible_accelerate_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_bible_accelerate_pipeline` is a English model originally trained by Pragash-Mohanarajah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_bible_accelerate_pipeline_en_5.5.0_3.0_1726651396259.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_bible_accelerate_pipeline_en_5.5.0_3.0_1726651396259.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a DataFrame of input documents (typically with the text in a column named "text")
+pipeline = PretrainedPipeline("roberta_base_finetuned_bible_accelerate_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a DataFrame of input documents (typically with the text in a column named "text")
+val pipeline = new PretrainedPipeline("roberta_base_finetuned_bible_accelerate_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_bible_accelerate_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/Pragash-Mohanarajah/roberta-base-finetuned-bible-accelerate + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_college_reviews_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_college_reviews_en.md new file mode 100644 index 00000000000000..795322f0ee2ab8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_college_reviews_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_college_reviews RoBertaEmbeddings from Mohit09gupta +author: John Snow Labs +name: roberta_base_finetuned_college_reviews +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_college_reviews` is a English model originally trained by Mohit09gupta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_college_reviews_en_5.5.0_3.0_1726678451807.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_college_reviews_en_5.5.0_3.0_1726678451807.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_college_reviews","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_college_reviews","en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_college_reviews| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|467.1 MB| + +## References + +https://huggingface.co/Mohit09gupta/roberta-base-finetuned-College-Reviews \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_mrpc_vitaliivrublevskyi_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_mrpc_vitaliivrublevskyi_en.md new file mode 100644 index 00000000000000..6785cf61fd7c52 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_mrpc_vitaliivrublevskyi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_mrpc_vitaliivrublevskyi RoBertaForSequenceClassification from VitaliiVrublevskyi +author: John Snow Labs +name: roberta_base_finetuned_mrpc_vitaliivrublevskyi +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_mrpc_vitaliivrublevskyi` is a English model originally trained by VitaliiVrublevskyi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_mrpc_vitaliivrublevskyi_en_5.5.0_3.0_1726666766462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_mrpc_vitaliivrublevskyi_en_5.5.0_3.0_1726666766462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_mrpc_vitaliivrublevskyi","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_mrpc_vitaliivrublevskyi", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_mrpc_vitaliivrublevskyi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|442.4 MB| + +## References + +https://huggingface.co/VitaliiVrublevskyi/roberta-base-finetuned-mrpc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_frozen_generics_mlm_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_frozen_generics_mlm_en.md new file mode 100644 index 00000000000000..dfc5de448402ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_frozen_generics_mlm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_frozen_generics_mlm RoBertaEmbeddings from sello-ralethe +author: John Snow Labs +name: roberta_base_frozen_generics_mlm +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_frozen_generics_mlm` is a English model originally trained by sello-ralethe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_frozen_generics_mlm_en_5.5.0_3.0_1726678597715.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_frozen_generics_mlm_en_5.5.0_3.0_1726678597715.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("roberta_base_frozen_generics_mlm","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("roberta_base_frozen_generics_mlm","en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_frozen_generics_mlm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|298.2 MB| + +## References + +https://huggingface.co/sello-ralethe/roberta-base-frozen-generics-mlm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_imdb_aypan17_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_imdb_aypan17_pipeline_en.md new file mode 100644 index 00000000000000..889d8b5b473e95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_imdb_aypan17_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_imdb_aypan17_pipeline pipeline RoBertaForSequenceClassification from aypan17 +author: John Snow Labs +name: roberta_base_imdb_aypan17_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_imdb_aypan17_pipeline` is a English model originally trained by aypan17. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_imdb_aypan17_pipeline_en_5.5.0_3.0_1726690469274.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_imdb_aypan17_pipeline_en_5.5.0_3.0_1726690469274.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a DataFrame of input documents (typically with the text in a column named "text")
+pipeline = PretrainedPipeline("roberta_base_imdb_aypan17_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a DataFrame of input documents (typically with the text in a column named "text")
+val pipeline = new PretrainedPipeline("roberta_base_imdb_aypan17_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_imdb_aypan17_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.0 MB| + +## References + +https://huggingface.co/aypan17/roberta-base-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_pretrained_marathi_marh_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_pretrained_marathi_marh_2_pipeline_en.md new file mode 100644 index 00000000000000..eb3d6b881d95a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_pretrained_marathi_marh_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_pretrained_marathi_marh_2_pipeline pipeline RoBertaEmbeddings from DeadBeast +author: John Snow Labs +name: roberta_base_pretrained_marathi_marh_2_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_pretrained_marathi_marh_2_pipeline` is a English model originally trained by DeadBeast. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_pretrained_marathi_marh_2_pipeline_en_5.5.0_3.0_1726651640689.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_pretrained_marathi_marh_2_pipeline_en_5.5.0_3.0_1726651640689.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a DataFrame of input documents (typically with the text in a column named "text")
+pipeline = PretrainedPipeline("roberta_base_pretrained_marathi_marh_2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a DataFrame of input documents (typically with the text in a column named "text")
+val pipeline = new PretrainedPipeline("roberta_base_pretrained_marathi_marh_2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_pretrained_marathi_marh_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/DeadBeast/roberta-base-pretrained-mr-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_baseline_finetuned_atis_3pct_v0_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_baseline_finetuned_atis_3pct_v0_en.md new file mode 100644 index 00000000000000..4a2c0b3de8a812 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_baseline_finetuned_atis_3pct_v0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_baseline_finetuned_atis_3pct_v0 RoBertaForSequenceClassification from benayas +author: John Snow Labs +name: roberta_baseline_finetuned_atis_3pct_v0 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_baseline_finetuned_atis_3pct_v0` is a English model originally trained by benayas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_baseline_finetuned_atis_3pct_v0_en_5.5.0_3.0_1726641701471.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_baseline_finetuned_atis_3pct_v0_en_5.5.0_3.0_1726641701471.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_baseline_finetuned_atis_3pct_v0","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_baseline_finetuned_atis_3pct_v0", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_baseline_finetuned_atis_3pct_v0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|417.1 MB| + +## References + +https://huggingface.co/benayas/roberta-baseline-finetuned-atis_3pct_v0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_dnd_intents_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_dnd_intents_en.md new file mode 100644 index 00000000000000..ef0316f989d369 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_dnd_intents_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_dnd_intents RoBertaForSequenceClassification from neurae +author: John Snow Labs +name: roberta_dnd_intents +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_dnd_intents` is a English model originally trained by neurae. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_dnd_intents_en_5.5.0_3.0_1726641406128.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_dnd_intents_en_5.5.0_3.0_1726641406128.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_dnd_intents","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_dnd_intents", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_dnd_intents| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/neurae/roberta-dnd-intents \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_english_annualreport_tuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_english_annualreport_tuned_pipeline_en.md new file mode 100644 index 00000000000000..e230b55f16cffe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_english_annualreport_tuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_english_annualreport_tuned_pipeline pipeline RoBertaEmbeddings from CCCCC5 +author: John Snow Labs +name: roberta_english_annualreport_tuned_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_english_annualreport_tuned_pipeline` is a English model originally trained by CCCCC5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_english_annualreport_tuned_pipeline_en_5.5.0_3.0_1726678865481.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_english_annualreport_tuned_pipeline_en_5.5.0_3.0_1726678865481.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a DataFrame of input documents (typically with the text in a column named "text")
+pipeline = PretrainedPipeline("roberta_english_annualreport_tuned_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a DataFrame of input documents (typically with the text in a column named "text")
+val pipeline = new PretrainedPipeline("roberta_english_annualreport_tuned_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_english_annualreport_tuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|425.5 MB| + +## References + +https://huggingface.co/CCCCC5/RoBERTa_English_AnnualReport_tuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_fine_tuned_text_classification_slovene_data_augmentation_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_fine_tuned_text_classification_slovene_data_augmentation_en.md new file mode 100644 index 00000000000000..85e864c036929b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_fine_tuned_text_classification_slovene_data_augmentation_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_fine_tuned_text_classification_slovene_data_augmentation RoBertaForSequenceClassification from Sleoruiz +author: John Snow Labs +name: roberta_fine_tuned_text_classification_slovene_data_augmentation +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_fine_tuned_text_classification_slovene_data_augmentation` is a English model originally trained by Sleoruiz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_fine_tuned_text_classification_slovene_data_augmentation_en_5.5.0_3.0_1726649890818.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_fine_tuned_text_classification_slovene_data_augmentation_en_5.5.0_3.0_1726649890818.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_fine_tuned_text_classification_slovene_data_augmentation","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_fine_tuned_text_classification_slovene_data_augmentation", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_fine_tuned_text_classification_slovene_data_augmentation| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|461.3 MB| + +## References + +https://huggingface.co/Sleoruiz/roberta-fine-tuned-text-classification-SL-data-augmentation \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_finetuned_intention_prediction_spanish_jsevisal_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_finetuned_intention_prediction_spanish_jsevisal_pipeline_en.md new file mode 100644 index 00000000000000..863f27058f8411 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_finetuned_intention_prediction_spanish_jsevisal_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_finetuned_intention_prediction_spanish_jsevisal_pipeline pipeline RoBertaForTokenClassification from Jsevisal +author: John Snow Labs +name: roberta_finetuned_intention_prediction_spanish_jsevisal_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_intention_prediction_spanish_jsevisal_pipeline` is a English model originally trained by Jsevisal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_intention_prediction_spanish_jsevisal_pipeline_en_5.5.0_3.0_1726652994586.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_intention_prediction_spanish_jsevisal_pipeline_en_5.5.0_3.0_1726652994586.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a DataFrame of input documents (typically with the text in a column named "text")
+pipeline = PretrainedPipeline("roberta_finetuned_intention_prediction_spanish_jsevisal_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a DataFrame of input documents (typically with the text in a column named "text")
+val pipeline = new PretrainedPipeline("roberta_finetuned_intention_prediction_spanish_jsevisal_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_intention_prediction_spanish_jsevisal_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|471.1 MB| + +## References + +https://huggingface.co/Jsevisal/roberta-finetuned-intention-prediction-es + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_finetuned_sensitive_keywords_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_finetuned_sensitive_keywords_pipeline_en.md new file mode 100644 index 00000000000000..65e0f0d2cf39db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_finetuned_sensitive_keywords_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_finetuned_sensitive_keywords_pipeline pipeline RoBertaForQuestionAnswering from Mourya +author: John Snow Labs +name: roberta_finetuned_sensitive_keywords_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_sensitive_keywords_pipeline` is a English model originally trained by Mourya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_sensitive_keywords_pipeline_en_5.5.0_3.0_1726619717087.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_sensitive_keywords_pipeline_en_5.5.0_3.0_1726619717087.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df must provide the input columns this question-answering pipeline expects (typically a question and its context)
+pipeline = PretrainedPipeline("roberta_finetuned_sensitive_keywords_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df must provide the input columns this question-answering pipeline expects (typically a question and its context)
+val pipeline = new PretrainedPipeline("roberta_finetuned_sensitive_keywords_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_sensitive_keywords_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/Mourya/roberta-finetuned-sensitive-keywords + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_for_pii_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_for_pii_en.md new file mode 100644 index 00000000000000..9ca8441bdb97b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_for_pii_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_for_pii RoBertaForTokenClassification from moo3030 +author: John Snow Labs +name: roberta_for_pii +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_for_pii` is a English model originally trained by moo3030. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_for_pii_en_5.5.0_3.0_1726652660748.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_for_pii_en_5.5.0_3.0_1726652660748.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_for_pii","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_for_pii", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
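+
+The predicted entity tags are written, one per token, to the `result` field of the `ner` output column. A minimal sketch for viewing tokens and tags side by side, assuming the `pipelineDF` DataFrame from the Python example:
+
+```python
+# token.result and ner.result are parallel arrays: one tag per token
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```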
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_for_pii| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|438.5 MB| + +## References + +https://huggingface.co/moo3030/roberta-for-pii \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_large_finetuned_m_avoid_harm_seler_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_large_finetuned_m_avoid_harm_seler_en.md new file mode 100644 index 00000000000000..10dabb999751ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_large_finetuned_m_avoid_harm_seler_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_finetuned_m_avoid_harm_seler RoBertaForSequenceClassification from Gregorig +author: John Snow Labs +name: roberta_large_finetuned_m_avoid_harm_seler +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_m_avoid_harm_seler` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_m_avoid_harm_seler_en_5.5.0_3.0_1726666085739.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_m_avoid_harm_seler_en_5.5.0_3.0_1726666085739.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_finetuned_m_avoid_harm_seler","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_finetuned_m_avoid_harm_seler", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_m_avoid_harm_seler| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Gregorig/roberta-large-finetuned-m_avoid_harm_seler \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_large_finetuned_ner_single_label_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_large_finetuned_ner_single_label_pipeline_en.md new file mode 100644 index 00000000000000..9a8a24a2a679ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_large_finetuned_ner_single_label_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_finetuned_ner_single_label_pipeline pipeline RoBertaForTokenClassification from DDDacc +author: John Snow Labs +name: roberta_large_finetuned_ner_single_label_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_ner_single_label_pipeline` is a English model originally trained by DDDacc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_ner_single_label_pipeline_en_5.5.0_3.0_1726638382672.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_ner_single_label_pipeline_en_5.5.0_3.0_1726638382672.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_finetuned_ner_single_label_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_finetuned_ner_single_label_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
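+
+Here `df` is assumed to be a Spark DataFrame holding the raw text column the pipeline expects. For a lightweight check on a single string (without building a DataFrame), the pipeline's `annotate` method can be used; the sentence below is purely illustrative:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("roberta_large_finetuned_ner_single_label_pipeline", lang = "en")
+# Returns a dict of output columns (tokens, NER labels, ...) for the given string.
+result = pipeline.annotate("My name is John and I live in Berlin.")
+```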
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_ner_single_label_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/DDDacc/RoBERTa-Large-finetuned-ner-single-label + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_large_finetuned_t_overall_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_large_finetuned_t_overall_pipeline_en.md new file mode 100644 index 00000000000000..749bc925b0c9c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_large_finetuned_t_overall_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_finetuned_t_overall_pipeline pipeline RoBertaForSequenceClassification from Gregorig +author: John Snow Labs +name: roberta_large_finetuned_t_overall_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_t_overall_pipeline` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_t_overall_pipeline_en_5.5.0_3.0_1726627778877.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_t_overall_pipeline_en_5.5.0_3.0_1726627778877.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_finetuned_t_overall_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_finetuned_t_overall_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_t_overall_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Gregorig/roberta-large-finetuned-t_overall + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_large_hoax_classifier_defs_1h10r_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_large_hoax_classifier_defs_1h10r_en.md new file mode 100644 index 00000000000000..9e0d4e0796c104 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_large_hoax_classifier_defs_1h10r_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_hoax_classifier_defs_1h10r RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_large_hoax_classifier_defs_1h10r +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_hoax_classifier_defs_1h10r` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_hoax_classifier_defs_1h10r_en_5.5.0_3.0_1726666656144.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_hoax_classifier_defs_1h10r_en_5.5.0_3.0_1726666656144.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_hoax_classifier_defs_1h10r","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_hoax_classifier_defs_1h10r", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_hoax_classifier_defs_1h10r| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/research-dump/roberta-large_hoax_classifier_defs_1h10r \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_large_temp_classifier_bootstrapped_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_large_temp_classifier_bootstrapped_pipeline_en.md new file mode 100644 index 00000000000000..4679b479798e02 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_large_temp_classifier_bootstrapped_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_temp_classifier_bootstrapped_pipeline pipeline RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_large_temp_classifier_bootstrapped_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_temp_classifier_bootstrapped_pipeline` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_temp_classifier_bootstrapped_pipeline_en_5.5.0_3.0_1726649553694.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_temp_classifier_bootstrapped_pipeline_en_5.5.0_3.0_1726649553694.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_temp_classifier_bootstrapped_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_temp_classifier_bootstrapped_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_temp_classifier_bootstrapped_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/research-dump/roberta_large_temp_classifier_bootstrapped + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_largeweighted_hoax_classifier_definition_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_largeweighted_hoax_classifier_definition_en.md new file mode 100644 index 00000000000000..65d16940980c5e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_largeweighted_hoax_classifier_definition_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_largeweighted_hoax_classifier_definition RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_largeweighted_hoax_classifier_definition +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_largeweighted_hoax_classifier_definition` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_largeweighted_hoax_classifier_definition_en_5.5.0_3.0_1726628721218.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_largeweighted_hoax_classifier_definition_en_5.5.0_3.0_1726628721218.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_largeweighted_hoax_classifier_definition","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_largeweighted_hoax_classifier_definition", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_largeweighted_hoax_classifier_definition| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/research-dump/roberta-largeweighted_hoax_classifier_definition \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_movie_review_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_movie_review_pipeline_en.md new file mode 100644 index 00000000000000..47841c7e64230b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_movie_review_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_movie_review_pipeline pipeline RoBertaForSequenceClassification from imalexianne +author: John Snow Labs +name: roberta_movie_review_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_movie_review_pipeline` is a English model originally trained by imalexianne. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_movie_review_pipeline_en_5.5.0_3.0_1726621843881.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_movie_review_pipeline_en_5.5.0_3.0_1726621843881.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_movie_review_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_movie_review_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_movie_review_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|454.3 MB| + +## References + +https://huggingface.co/imalexianne/Roberta-Movie_Review + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_mrqa_v2_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_mrqa_v2_en.md new file mode 100644 index 00000000000000..59fac57a649f60 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_mrqa_v2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_mrqa_v2 RoBertaForQuestionAnswering from enriquesaou +author: John Snow Labs +name: roberta_mrqa_v2 +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_mrqa_v2` is a English model originally trained by enriquesaou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_mrqa_v2_en_5.5.0_3.0_1726619683795.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_mrqa_v2_en_5.5.0_3.0_1726619683795.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_mrqa_v2","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_mrqa_v2", "en")
+    .setInputCols(Array("document_question", "document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
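+
+The extracted answer span ends up in the `answer` column configured above. A minimal way to inspect it (the actual answer text naturally depends on the question/context pair fed in):
+
+```python
+# One answer span per question/context row.
+pipelineDF.select("answer.result").show(truncate=False)
+```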
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_mrqa_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|464.2 MB| + +## References + +https://huggingface.co/enriquesaou/roberta_mrqa_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_mrqa_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_mrqa_v2_pipeline_en.md new file mode 100644 index 00000000000000..306927f8171990 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_mrqa_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_mrqa_v2_pipeline pipeline RoBertaForQuestionAnswering from enriquesaou +author: John Snow Labs +name: roberta_mrqa_v2_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_mrqa_v2_pipeline` is a English model originally trained by enriquesaou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_mrqa_v2_pipeline_en_5.5.0_3.0_1726619705657.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_mrqa_v2_pipeline_en_5.5.0_3.0_1726619705657.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_mrqa_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_mrqa_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_mrqa_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.2 MB| + +## References + +https://huggingface.co/enriquesaou/roberta_mrqa_v2 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_nli_group71_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_nli_group71_en.md new file mode 100644 index 00000000000000..4ff3ed7656fc2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_nli_group71_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_nli_group71 RoBertaForSequenceClassification from awashh +author: John Snow Labs +name: roberta_nli_group71 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_nli_group71` is a English model originally trained by awashh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_nli_group71_en_5.5.0_3.0_1726650265104.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_nli_group71_en_5.5.0_3.0_1726650265104.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_nli_group71","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_nli_group71", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_nli_group71| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|456.1 MB| + +## References + +https://huggingface.co/awashh/RoBERTa-NLI-Group71 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_nli_group71_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_nli_group71_pipeline_en.md new file mode 100644 index 00000000000000..e2dd68e06a3f3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_nli_group71_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_nli_group71_pipeline pipeline RoBertaForSequenceClassification from awashh +author: John Snow Labs +name: roberta_nli_group71_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_nli_group71_pipeline` is a English model originally trained by awashh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_nli_group71_pipeline_en_5.5.0_3.0_1726650290207.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_nli_group71_pipeline_en_5.5.0_3.0_1726650290207.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_nli_group71_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_nli_group71_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_nli_group71_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|456.1 MB| + +## References + +https://huggingface.co/awashh/RoBERTa-NLI-Group71 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_poetry_happiness_crpo_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_poetry_happiness_crpo_en.md new file mode 100644 index 00000000000000..bbc0943669399d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_poetry_happiness_crpo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_poetry_happiness_crpo RoBertaEmbeddings from andreipb +author: John Snow Labs +name: roberta_poetry_happiness_crpo +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_poetry_happiness_crpo` is a English model originally trained by andreipb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_poetry_happiness_crpo_en_5.5.0_3.0_1726651871262.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_poetry_happiness_crpo_en_5.5.0_3.0_1726651871262.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_poetry_happiness_crpo","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_poetry_happiness_crpo","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
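+
+Each token receives one embedding vector in the `embeddings` column set above. As a minimal sanity check (vector values will differ per input; the column paths assume the pipeline sketched above), the number and width of the vectors can be inspected:
+
+```python
+from pyspark.sql import functions as F
+
+# Number of token vectors per document and the dimensionality of the first vector.
+pipelineDF.select(
+    F.size("embeddings.embeddings").alias("num_token_vectors"),
+    F.size(F.col("embeddings.embeddings")[0]).alias("embedding_dim")
+).show()
+```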
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_poetry_happiness_crpo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/andreipb/roberta-poetry-happiness-crpo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_poetry_life_crpo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_poetry_life_crpo_pipeline_en.md new file mode 100644 index 00000000000000..6c0d64221af965 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_poetry_life_crpo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_poetry_life_crpo_pipeline pipeline RoBertaEmbeddings from andreipb +author: John Snow Labs +name: roberta_poetry_life_crpo_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_poetry_life_crpo_pipeline` is a English model originally trained by andreipb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_poetry_life_crpo_pipeline_en_5.5.0_3.0_1726652096644.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_poetry_life_crpo_pipeline_en_5.5.0_3.0_1726652096644.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_poetry_life_crpo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_poetry_life_crpo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_poetry_life_crpo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/andreipb/roberta-poetry-life-crpo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_poetry_love_crpo_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_poetry_love_crpo_en.md new file mode 100644 index 00000000000000..d3863b40080017 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_poetry_love_crpo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_poetry_love_crpo RoBertaEmbeddings from andreipb +author: John Snow Labs +name: roberta_poetry_love_crpo +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_poetry_love_crpo` is a English model originally trained by andreipb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_poetry_love_crpo_en_5.5.0_3.0_1726677972899.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_poetry_love_crpo_en_5.5.0_3.0_1726677972899.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_poetry_love_crpo","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_poetry_love_crpo","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_poetry_love_crpo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/andreipb/roberta-poetry-love-crpo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_poetry_nature_crpo_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_poetry_nature_crpo_en.md new file mode 100644 index 00000000000000..7573c07c70d6ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_poetry_nature_crpo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_poetry_nature_crpo RoBertaEmbeddings from andreipb +author: John Snow Labs +name: roberta_poetry_nature_crpo +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_poetry_nature_crpo` is a English model originally trained by andreipb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_poetry_nature_crpo_en_5.5.0_3.0_1726678322484.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_poetry_nature_crpo_en_5.5.0_3.0_1726678322484.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_poetry_nature_crpo","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_poetry_nature_crpo","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_poetry_nature_crpo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/andreipb/roberta-poetry-nature-crpo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_retrained_350k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_retrained_350k_pipeline_en.md new file mode 100644 index 00000000000000..b74ed14e53d940 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_retrained_350k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_retrained_350k_pipeline pipeline RoBertaEmbeddings from bitsanlp +author: John Snow Labs +name: roberta_retrained_350k_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_retrained_350k_pipeline` is a English model originally trained by bitsanlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_retrained_350k_pipeline_en_5.5.0_3.0_1726678438848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_retrained_350k_pipeline_en_5.5.0_3.0_1726678438848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_retrained_350k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_retrained_350k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_retrained_350k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.8 MB| + +## References + +https://huggingface.co/bitsanlp/roberta-retrained-350k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_similarity_mudasiryasin_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_similarity_mudasiryasin_pipeline_en.md new file mode 100644 index 00000000000000..98e67ffbcc1299 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_similarity_mudasiryasin_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_similarity_mudasiryasin_pipeline pipeline RoBertaForSequenceClassification from mudasiryasin +author: John Snow Labs +name: roberta_similarity_mudasiryasin_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_similarity_mudasiryasin_pipeline` is a English model originally trained by mudasiryasin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_similarity_mudasiryasin_pipeline_en_5.5.0_3.0_1726641438641.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_similarity_mudasiryasin_pipeline_en_5.5.0_3.0_1726641438641.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_similarity_mudasiryasin_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_similarity_mudasiryasin_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_similarity_mudasiryasin_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|440.6 MB| + +## References + +https://huggingface.co/mudasiryasin/roberta-similarity + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_span_detection_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_span_detection_en.md new file mode 100644 index 00000000000000..01e9631bab526e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_span_detection_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_span_detection RoBertaForTokenClassification from AntoineBlanot +author: John Snow Labs +name: roberta_span_detection +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_span_detection` is a English model originally trained by AntoineBlanot. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_span_detection_en_5.5.0_3.0_1726653201619.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_span_detection_en_5.5.0_3.0_1726653201619.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_span_detection","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_span_detection", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_span_detection| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|842.5 MB| + +## References + +https://huggingface.co/AntoineBlanot/roberta-span-detection \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_spanish_clinical_trials_medic_attr_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_spanish_clinical_trials_medic_attr_ner_pipeline_en.md new file mode 100644 index 00000000000000..fb658109f40067 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_spanish_clinical_trials_medic_attr_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_spanish_clinical_trials_medic_attr_ner_pipeline pipeline RoBertaForTokenClassification from medspaner +author: John Snow Labs +name: roberta_spanish_clinical_trials_medic_attr_ner_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_spanish_clinical_trials_medic_attr_ner_pipeline` is a English model originally trained by medspaner. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_spanish_clinical_trials_medic_attr_ner_pipeline_en_5.5.0_3.0_1726653453128.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_spanish_clinical_trials_medic_attr_ner_pipeline_en_5.5.0_3.0_1726653453128.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_spanish_clinical_trials_medic_attr_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_spanish_clinical_trials_medic_attr_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_spanish_clinical_trials_medic_attr_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|442.3 MB| + +## References + +https://huggingface.co/medspaner/roberta-es-clinical-trials-medic-attr-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_tagalog_base_ft_udpos213_ukrainian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_tagalog_base_ft_udpos213_ukrainian_pipeline_en.md new file mode 100644 index 00000000000000..82f10000cd6eea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_tagalog_base_ft_udpos213_ukrainian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_tagalog_base_ft_udpos213_ukrainian_pipeline pipeline RoBertaForTokenClassification from hellojimson +author: John Snow Labs +name: roberta_tagalog_base_ft_udpos213_ukrainian_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tagalog_base_ft_udpos213_ukrainian_pipeline` is a English model originally trained by hellojimson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_ukrainian_pipeline_en_5.5.0_3.0_1726653129913.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_ukrainian_pipeline_en_5.5.0_3.0_1726653129913.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_tagalog_base_ft_udpos213_ukrainian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_tagalog_base_ft_udpos213_ukrainian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tagalog_base_ft_udpos213_ukrainian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/hellojimson/roberta-tagalog-base-ft-udpos213-uk + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_twitterfin_padding30model_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_twitterfin_padding30model_en.md new file mode 100644 index 00000000000000..48b399e1f97599 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_twitterfin_padding30model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_twitterfin_padding30model RoBertaForSequenceClassification from Realgon +author: John Snow Labs +name: roberta_twitterfin_padding30model +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_twitterfin_padding30model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_twitterfin_padding30model_en_5.5.0_3.0_1726649674389.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_twitterfin_padding30model_en_5.5.0_3.0_1726649674389.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_twitterfin_padding30model","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_twitterfin_padding30model", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_twitterfin_padding30model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|443.0 MB| + +## References + +https://huggingface.co/Realgon/roberta_twitterfin_padding30model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_untrained_1eps_seed291_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_untrained_1eps_seed291_en.md new file mode 100644 index 00000000000000..526de12c571da8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_untrained_1eps_seed291_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_untrained_1eps_seed291 RoBertaForSequenceClassification from custeau +author: John Snow Labs +name: roberta_untrained_1eps_seed291 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_untrained_1eps_seed291` is a English model originally trained by custeau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_untrained_1eps_seed291_en_5.5.0_3.0_1726622172427.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_untrained_1eps_seed291_en_5.5.0_3.0_1726622172427.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_untrained_1eps_seed291","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_untrained_1eps_seed291", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_untrained_1eps_seed291| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|447.8 MB| + +## References + +https://huggingface.co/custeau/roberta_untrained_1eps_seed291 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_wikiann_conll_finetuned_chuvash_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_wikiann_conll_finetuned_chuvash_pipeline_en.md new file mode 100644 index 00000000000000..6dd13300f7fc13 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_wikiann_conll_finetuned_chuvash_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_wikiann_conll_finetuned_chuvash_pipeline pipeline RoBertaForTokenClassification from mrfirdauss +author: John Snow Labs +name: roberta_wikiann_conll_finetuned_chuvash_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_wikiann_conll_finetuned_chuvash_pipeline` is a English model originally trained by mrfirdauss. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_wikiann_conll_finetuned_chuvash_pipeline_en_5.5.0_3.0_1726652802266.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_wikiann_conll_finetuned_chuvash_pipeline_en_5.5.0_3.0_1726652802266.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_wikiann_conll_finetuned_chuvash_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_wikiann_conll_finetuned_chuvash_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
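The snippet above references a DataFrame `df` that is not defined in it. A minimal sketch, assuming a Spark session started with `sparknlp.start()` and the `pipeline` object loaded as shown; the example sentence is purely illustrative:

```python
# df needs a "text" column, which the pipeline's DocumentAssembler stage reads
df = spark.createDataFrame([["John Snow Labs is a company based in Delaware."]]).toDF("text")

annotations = pipeline.transform(df)
annotations.printSchema()   # columns added by DocumentAssembler, Tokenizer and the token classifier
annotations.show(truncate=False)
```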
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_wikiann_conll_finetuned_chuvash_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.9 MB| + +## References + +https://huggingface.co/mrfirdauss/roberta_wikiann_conll_finetuned_cv + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-robertuito_3_j_en.md b/docs/_posts/ahmedlone127/2024-09-18-robertuito_3_j_en.md new file mode 100644 index 00000000000000..76dfef65465159 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-robertuito_3_j_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robertuito_3_j RoBertaForSequenceClassification from PEzquerra +author: John Snow Labs +name: robertuito_3_j +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertuito_3_j` is a English model originally trained by PEzquerra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertuito_3_j_en_5.5.0_3.0_1726666549722.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertuito_3_j_en_5.5.0_3.0_1726666549722.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("robertuito_3_j","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("robertuito_3_j", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertuito_3_j| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|408.3 MB| + +## References + +https://huggingface.co/PEzquerra/robertuito_3_J \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roebrta_base_val_test_en.md b/docs/_posts/ahmedlone127/2024-09-18-roebrta_base_val_test_en.md new file mode 100644 index 00000000000000..4169cb5f006eb4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roebrta_base_val_test_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roebrta_base_val_test RoBertaEmbeddings from Emanuel +author: John Snow Labs +name: roebrta_base_val_test +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roebrta_base_val_test` is a English model originally trained by Emanuel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roebrta_base_val_test_en_5.5.0_3.0_1726678032419.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roebrta_base_val_test_en_5.5.0_3.0_1726678032419.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roebrta_base_val_test","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roebrta_base_val_test","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
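Each row of the `embeddings` column contains one annotation per token, with the vector stored in the annotation's `embeddings` field. A short sketch, assuming the Python pipeline above has been run, for checking the tokens and the vector dimensionality:

```python
from pyspark.sql.functions import explode

# One annotation per token; "embeddings" holds that token's vector
token_vectors = pipelineDF.select(explode("embeddings").alias("ann"))
token_vectors.selectExpr("ann.result as token", "size(ann.embeddings) as dimensions").show(truncate=False)
```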
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roebrta_base_val_test| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/Emanuel/roebrta-base-val-test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roebrta_base_val_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roebrta_base_val_test_pipeline_en.md new file mode 100644 index 00000000000000..8391a6bc6e906e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roebrta_base_val_test_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roebrta_base_val_test_pipeline pipeline RoBertaEmbeddings from Emanuel +author: John Snow Labs +name: roebrta_base_val_test_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roebrta_base_val_test_pipeline` is a English model originally trained by Emanuel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roebrta_base_val_test_pipeline_en_5.5.0_3.0_1726678055423.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roebrta_base_val_test_pipeline_en_5.5.0_3.0_1726678055423.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roebrta_base_val_test_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roebrta_base_val_test_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roebrta_base_val_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/Emanuel/roebrta-base-val-test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-rottentomato_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-18-rottentomato_classifier_en.md new file mode 100644 index 00000000000000..825becad9f04f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-rottentomato_classifier_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English rottentomato_classifier DistilBertForSequenceClassification from tkurtulus +author: John Snow Labs +name: rottentomato_classifier +date: 2024-09-18 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rottentomato_classifier` is a English model originally trained by tkurtulus. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rottentomato_classifier_en_5.5.0_3.0_1726669916511.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rottentomato_classifier_en_5.5.0_3.0_1726669916511.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer()\
    .setInputCols("document")\
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("rottentomato_classifier","en")\
    .setInputCols(["document","token"])\
    .setOutputCol("class")

pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier])

data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")

result = pipeline.fit(data).transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("rottentomato_classifier","en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))

val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
```
</div>
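`PUT YOUR STRING HERE` is only a placeholder; any review-style text can be supplied. A hedged sketch of scoring a couple of sentences and reading back the predictions (the actual label names depend on how the original model was trained and are not listed on this card):

```python
reviews = spark.createDataFrame(
    [["A beautifully shot film with a script that never quite lands."],
     ["An absolute joy from start to finish."]]
).toDF("text")

predictions = pipeline.fit(reviews).transform(reviews)
predictions.select("text", "class.result").show(truncate=False)
```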
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rottentomato_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +References + +https://huggingface.co/tkurtulus/rottentomato-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sad_pipeline_en.md new file mode 100644 index 00000000000000..f4a44553285341 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sad_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sad_pipeline pipeline BertForSequenceClassification from Tianlin668 +author: John Snow Labs +name: sad_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sad_pipeline` is a English model originally trained by Tianlin668. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sad_pipeline_en_5.5.0_3.0_1726624310651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sad_pipeline_en_5.5.0_3.0_1726624310651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
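For quick experiments, `PretrainedPipeline` also exposes `annotate`, which takes a plain string and returns a dictionary keyed by output column. A minimal sketch reusing the `pipeline` object above; the exact keys follow the stages listed under Included Models:

```python
result = pipeline.annotate("I have been feeling down for weeks.")
print(result.keys())   # typically document, token and the classifier's output column
print(result)
```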
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.9 MB| + +## References + +https://huggingface.co/Tianlin668/SAD + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sberbank_rubert_base_collection3_pipeline_ru.md b/docs/_posts/ahmedlone127/2024-09-18-sberbank_rubert_base_collection3_pipeline_ru.md new file mode 100644 index 00000000000000..3b871a84eac78a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sberbank_rubert_base_collection3_pipeline_ru.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Russian sberbank_rubert_base_collection3_pipeline pipeline BertForTokenClassification from viktoroo +author: John Snow Labs +name: sberbank_rubert_base_collection3_pipeline +date: 2024-09-18 +tags: [ru, open_source, pipeline, onnx] +task: Named Entity Recognition +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sberbank_rubert_base_collection3_pipeline` is a Russian model originally trained by viktoroo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sberbank_rubert_base_collection3_pipeline_ru_5.5.0_3.0_1726699109092.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sberbank_rubert_base_collection3_pipeline_ru_5.5.0_3.0_1726699109092.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sberbank_rubert_base_collection3_pipeline", lang = "ru") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sberbank_rubert_base_collection3_pipeline", lang = "ru") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sberbank_rubert_base_collection3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ru| +|Size:|667.1 MB| + +## References + +https://huggingface.co/viktoroo/sberbank-rubert-base-collection3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-scenario_non_kd_scr_copy_cdf_all_d2_data_cardiffnlp_tweet_sentiment_multilingual_xx.md b/docs/_posts/ahmedlone127/2024-09-18-scenario_non_kd_scr_copy_cdf_all_d2_data_cardiffnlp_tweet_sentiment_multilingual_xx.md new file mode 100644 index 00000000000000..31dcce902f10c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-scenario_non_kd_scr_copy_cdf_all_d2_data_cardiffnlp_tweet_sentiment_multilingual_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual scenario_non_kd_scr_copy_cdf_all_d2_data_cardiffnlp_tweet_sentiment_multilingual XlmRoBertaForSequenceClassification from haryoaw +author: John Snow Labs +name: scenario_non_kd_scr_copy_cdf_all_d2_data_cardiffnlp_tweet_sentiment_multilingual +date: 2024-09-18 +tags: [xx, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scenario_non_kd_scr_copy_cdf_all_d2_data_cardiffnlp_tweet_sentiment_multilingual` is a Multilingual model originally trained by haryoaw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scenario_non_kd_scr_copy_cdf_all_d2_data_cardiffnlp_tweet_sentiment_multilingual_xx_5.5.0_3.0_1726686318835.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scenario_non_kd_scr_copy_cdf_all_d2_data_cardiffnlp_tweet_sentiment_multilingual_xx_5.5.0_3.0_1726686318835.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("scenario_non_kd_scr_copy_cdf_all_d2_data_cardiffnlp_tweet_sentiment_multilingual","xx") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("scenario_non_kd_scr_copy_cdf_all_d2_data_cardiffnlp_tweet_sentiment_multilingual", "xx")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
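Because the underlying checkpoint is multilingual, the same fitted model can score texts in several languages in one pass. A sketch that reuses `pipelineModel` from the Python block above; the label set comes from the original cardiffnlp training data and is not reproduced here:

```python
texts = spark.createDataFrame(
    [["I love spark-nlp"],
     ["Me encanta esta biblioteca"],
     ["Je n'aime pas ce film"]]
).toDF("text")

pipelineModel.transform(texts).select("text", "class.result").show(truncate=False)
```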
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scenario_non_kd_scr_copy_cdf_all_d2_data_cardiffnlp_tweet_sentiment_multilingual| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|883.8 MB| + +## References + +https://huggingface.co/haryoaw/scenario-NON-KD-SCR-COPY-CDF-ALL-D2_data-cardiffnlp_tweet_sentiment_multilingual \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-scenario_non_kd_scr_copy_cdf_english_d2_data_english_cardiff_eng_only_gamma_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-scenario_non_kd_scr_copy_cdf_english_d2_data_english_cardiff_eng_only_gamma_pipeline_en.md new file mode 100644 index 00000000000000..f4507ba2eea30c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-scenario_non_kd_scr_copy_cdf_english_d2_data_english_cardiff_eng_only_gamma_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English scenario_non_kd_scr_copy_cdf_english_d2_data_english_cardiff_eng_only_gamma_pipeline pipeline XlmRoBertaForSequenceClassification from haryoaw +author: John Snow Labs +name: scenario_non_kd_scr_copy_cdf_english_d2_data_english_cardiff_eng_only_gamma_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scenario_non_kd_scr_copy_cdf_english_d2_data_english_cardiff_eng_only_gamma_pipeline` is a English model originally trained by haryoaw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scenario_non_kd_scr_copy_cdf_english_d2_data_english_cardiff_eng_only_gamma_pipeline_en_5.5.0_3.0_1726671714294.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scenario_non_kd_scr_copy_cdf_english_d2_data_english_cardiff_eng_only_gamma_pipeline_en_5.5.0_3.0_1726671714294.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("scenario_non_kd_scr_copy_cdf_english_d2_data_english_cardiff_eng_only_gamma_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("scenario_non_kd_scr_copy_cdf_english_d2_data_english_cardiff_eng_only_gamma_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scenario_non_kd_scr_copy_cdf_english_d2_data_english_cardiff_eng_only_gamma_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|883.9 MB| + +## References + +https://huggingface.co/haryoaw/scenario-NON-KD-SCR-COPY-CDF-EN-D2_data-en-cardiff_eng_only_gamma + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-scenario_tcr_data_cl_cardiff_cl_only20271_en.md b/docs/_posts/ahmedlone127/2024-09-18-scenario_tcr_data_cl_cardiff_cl_only20271_en.md new file mode 100644 index 00000000000000..96d792c4d65fbc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-scenario_tcr_data_cl_cardiff_cl_only20271_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English scenario_tcr_data_cl_cardiff_cl_only20271 XlmRoBertaForSequenceClassification from haryoaw +author: John Snow Labs +name: scenario_tcr_data_cl_cardiff_cl_only20271 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scenario_tcr_data_cl_cardiff_cl_only20271` is a English model originally trained by haryoaw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scenario_tcr_data_cl_cardiff_cl_only20271_en_5.5.0_3.0_1726697062037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scenario_tcr_data_cl_cardiff_cl_only20271_en_5.5.0_3.0_1726697062037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("scenario_tcr_data_cl_cardiff_cl_only20271","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("scenario_tcr_data_cl_cardiff_cl_only20271", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scenario_tcr_data_cl_cardiff_cl_only20271| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|840.5 MB| + +## References + +https://huggingface.co/haryoaw/scenario-TCR_data-cl-cardiff_cl_only20271 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-scenario_tcr_data_cl_cardiff_cl_only20271_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-scenario_tcr_data_cl_cardiff_cl_only20271_pipeline_en.md new file mode 100644 index 00000000000000..719965bde72af9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-scenario_tcr_data_cl_cardiff_cl_only20271_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English scenario_tcr_data_cl_cardiff_cl_only20271_pipeline pipeline XlmRoBertaForSequenceClassification from haryoaw +author: John Snow Labs +name: scenario_tcr_data_cl_cardiff_cl_only20271_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scenario_tcr_data_cl_cardiff_cl_only20271_pipeline` is a English model originally trained by haryoaw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scenario_tcr_data_cl_cardiff_cl_only20271_pipeline_en_5.5.0_3.0_1726697159548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scenario_tcr_data_cl_cardiff_cl_only20271_pipeline_en_5.5.0_3.0_1726697159548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("scenario_tcr_data_cl_cardiff_cl_only20271_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("scenario_tcr_data_cl_cardiff_cl_only20271_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scenario_tcr_data_cl_cardiff_cl_only20271_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.6 MB| + +## References + +https://huggingface.co/haryoaw/scenario-TCR_data-cl-cardiff_cl_only20271 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-second_qa_en.md b/docs/_posts/ahmedlone127/2024-09-18-second_qa_en.md new file mode 100644 index 00000000000000..ceb73a7deb9487 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-second_qa_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English second_qa DistilBertForQuestionAnswering from mattwanjia +author: John Snow Labs +name: second_qa +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`second_qa` is a English model originally trained by mattwanjia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/second_qa_en_5.5.0_3.0_1726644023638.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/second_qa_en_5.5.0_3.0_1726644023638.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("second_qa","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("second_qa", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?","I use spark-nlp.")).toDS.toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
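The `answer` column again holds annotations, with the extracted span in the `result` field. A short sketch, assuming the Python pipeline above has been run:

```python
pipelineDF.selectExpr(
    "document_question.result as question",
    "answer.result as answer"
).show(truncate=False)
```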
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|second_qa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/mattwanjia/second_qa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-second_qa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-second_qa_pipeline_en.md new file mode 100644 index 00000000000000..e87b2f1be4a908 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-second_qa_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English second_qa_pipeline pipeline DistilBertForQuestionAnswering from mattwanjia +author: John Snow Labs +name: second_qa_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`second_qa_pipeline` is a English model originally trained by mattwanjia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/second_qa_pipeline_en_5.5.0_3.0_1726644036981.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/second_qa_pipeline_en_5.5.0_3.0_1726644036981.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("second_qa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("second_qa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
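Unlike the single-column pipelines on other cards, this pipeline starts with a MultiDocumentAssembler, so the input DataFrame needs separate question and context columns. A hedged sketch; the column names below are assumptions, so check `pipeline.model.stages` if the assembler was saved with different input columns:

```python
# Hypothetical input columns "question" and "context"
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")

pipeline.transform(df).selectExpr("answer.result").show(truncate=False)
```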
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|second_qa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/mattwanjia/second_qa + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_arabertmo_base_v3_ar.md b/docs/_posts/ahmedlone127/2024-09-18-sent_arabertmo_base_v3_ar.md new file mode 100644 index 00000000000000..4811d707423a28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_arabertmo_base_v3_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic sent_arabertmo_base_v3 BertSentenceEmbeddings from Ebtihal +author: John Snow Labs +name: sent_arabertmo_base_v3 +date: 2024-09-18 +tags: [ar, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_arabertmo_base_v3` is a Arabic model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v3_ar_5.5.0_3.0_1726675949156.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v3_ar_5.5.0_3.0_1726675949156.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_arabertmo_base_v3","ar") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_arabertmo_base_v3","ar") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
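To turn the sentence annotations into plain vectors, for example to feed a downstream classifier, Spark NLP's `EmbeddingsFinisher` can be applied to the transformed DataFrame. A hedged sketch building on the Python example above:

```python
from sparknlp.base import EmbeddingsFinisher

finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["sentence_vectors"]) \
    .setOutputAsVector(True)

finished = finisher.transform(pipelineDF)
finished.selectExpr("explode(sentence_vectors) as vector").show()
```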
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_arabertmo_base_v3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ar| +|Size:|407.9 MB| + +## References + +https://huggingface.co/Ebtihal/AraBertMo_base_V3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_arabertmo_base_v3_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-18-sent_arabertmo_base_v3_pipeline_ar.md new file mode 100644 index 00000000000000..e6786b8a8095c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_arabertmo_base_v3_pipeline_ar.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Arabic sent_arabertmo_base_v3_pipeline pipeline BertSentenceEmbeddings from Ebtihal +author: John Snow Labs +name: sent_arabertmo_base_v3_pipeline +date: 2024-09-18 +tags: [ar, open_source, pipeline, onnx] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_arabertmo_base_v3_pipeline` is a Arabic model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v3_pipeline_ar_5.5.0_3.0_1726675969186.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v3_pipeline_ar_5.5.0_3.0_1726675969186.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_arabertmo_base_v3_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_arabertmo_base_v3_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_arabertmo_base_v3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|408.4 MB| + +## References + +https://huggingface.co/Ebtihal/AraBertMo_base_V3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_arabertmo_base_v5_ar.md b/docs/_posts/ahmedlone127/2024-09-18-sent_arabertmo_base_v5_ar.md new file mode 100644 index 00000000000000..f9f9a2979891af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_arabertmo_base_v5_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic sent_arabertmo_base_v5 BertSentenceEmbeddings from Ebtihal +author: John Snow Labs +name: sent_arabertmo_base_v5 +date: 2024-09-18 +tags: [ar, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_arabertmo_base_v5` is a Arabic model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v5_ar_5.5.0_3.0_1726687552748.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v5_ar_5.5.0_3.0_1726687552748.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_arabertmo_base_v5","ar") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_arabertmo_base_v5","ar") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_arabertmo_base_v5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ar| +|Size:|407.9 MB| + +## References + +https://huggingface.co/Ebtihal/AraBertMo_base_V5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_25lang_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_25lang_cased_pipeline_en.md new file mode 100644 index 00000000000000..e9c286245fa159 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_25lang_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_25lang_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_25lang_cased_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_25lang_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_25lang_cased_pipeline_en_5.5.0_3.0_1726687077805.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_25lang_cased_pipeline_en_5.5.0_3.0_1726687077805.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_25lang_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_25lang_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_25lang_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|565.8 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-25lang-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_english_spanish_cased_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_english_spanish_cased_en.md new file mode 100644 index 00000000000000..f26a7d6d14063e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_english_spanish_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_spanish_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_spanish_cased +date: 2024-09-18 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_spanish_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_spanish_cased_en_5.5.0_3.0_1726675863733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_spanish_cased_en_5.5.0_3.0_1726675863733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_spanish_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_spanish_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_spanish_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|422.2 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-es-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_english_spanish_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_english_spanish_cased_pipeline_en.md new file mode 100644 index 00000000000000..123e58074ab19d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_english_spanish_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_spanish_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_spanish_cased_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_spanish_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_spanish_cased_pipeline_en_5.5.0_3.0_1726675883932.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_spanish_cased_pipeline_en_5.5.0_3.0_1726675883932.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_english_spanish_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_english_spanish_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_spanish_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|422.8 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-es-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_english_vietnamese_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_english_vietnamese_cased_pipeline_en.md new file mode 100644 index 00000000000000..d0770b3eae2d31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_english_vietnamese_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_vietnamese_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_vietnamese_cased_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_vietnamese_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_vietnamese_cased_pipeline_en_5.5.0_3.0_1726675963015.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_vietnamese_cased_pipeline_en_5.5.0_3.0_1726675963015.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_english_vietnamese_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_english_vietnamese_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_vietnamese_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.4 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-vi-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_multilingual_cased_finetuned_naija_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_multilingual_cased_finetuned_naija_pipeline_xx.md new file mode 100644 index 00000000000000..7393540e498e03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_multilingual_cased_finetuned_naija_pipeline_xx.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Multilingual sent_bert_base_multilingual_cased_finetuned_naija_pipeline pipeline BertSentenceEmbeddings from Davlan +author: John Snow Labs +name: sent_bert_base_multilingual_cased_finetuned_naija_pipeline +date: 2024-09-18 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_multilingual_cased_finetuned_naija_pipeline` is a Multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_naija_pipeline_xx_5.5.0_3.0_1726676277620.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_naija_pipeline_xx_5.5.0_3.0_1726676277620.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any Spark DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_multilingual_cased_finetuned_naija_pipeline", lang = "xx")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: any Spark DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_multilingual_cased_finetuned_naija_pipeline", lang = "xx")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_multilingual_cased_finetuned_naija_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.6 MB| + +## References + +https://huggingface.co/Davlan/bert-base-multilingual-cased-finetuned-naija + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_uncased_finetuned_academic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_uncased_finetuned_academic_pipeline_en.md new file mode 100644 index 00000000000000..98a80f3e58beb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_uncased_finetuned_academic_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_academic_pipeline pipeline BertSentenceEmbeddings from egumasa +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_academic_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_academic_pipeline` is a English model originally trained by egumasa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_academic_pipeline_en_5.5.0_3.0_1726687058487.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_academic_pipeline_en_5.5.0_3.0_1726687058487.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any Spark DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_academic_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: any Spark DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_academic_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_academic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/egumasa/bert-base-uncased-finetuned-academic + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_uncased_finetuned_imdb_shushuile_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_uncased_finetuned_imdb_shushuile_en.md new file mode 100644 index 00000000000000..8cae986b85eac9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_uncased_finetuned_imdb_shushuile_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_imdb_shushuile BertSentenceEmbeddings from shushuile +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_imdb_shushuile +date: 2024-09-18 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_imdb_shushuile` is a English model originally trained by shushuile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_imdb_shushuile_en_5.5.0_3.0_1726687337128.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_imdb_shushuile_en_5.5.0_3.0_1726687337128.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_imdb_shushuile","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_imdb_shushuile","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
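The `sent_bert_base_uncased_finetuned_imdb_shushuile` snippet above assumes an already running Spark session with Spark NLP loaded and the usual imports, which the card omits. A minimal Python setup sketch using the standard `spark-nlp` entry points:

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

# Starts (or attaches to) a SparkSession with Spark NLP on the classpath
spark = sparknlp.start()
```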
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_imdb_shushuile| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/shushuile/bert-base-uncased-finetuned-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_uncased_issues_128_martinwunderlich_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_uncased_issues_128_martinwunderlich_en.md new file mode 100644 index 00000000000000..309874f631de29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_uncased_issues_128_martinwunderlich_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_martinwunderlich BertSentenceEmbeddings from martinwunderlich +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_martinwunderlich +date: 2024-09-18 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_martinwunderlich` is a English model originally trained by martinwunderlich. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_martinwunderlich_en_5.5.0_3.0_1726694214492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_martinwunderlich_en_5.5.0_3.0_1726694214492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_martinwunderlich","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_martinwunderlich","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
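As above, the `sent_bert_base_uncased_issues_128_martinwunderlich` example assumes an active Spark session with Spark NLP and the usual imports. A minimal setup sketch (Python, standard `spark-nlp` entry points):

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

# Starts (or attaches to) a SparkSession with Spark NLP on the classpath
spark = sparknlp.start()
```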
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_martinwunderlich| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/martinwunderlich/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_uncased_issues_128_martinwunderlich_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_uncased_issues_128_martinwunderlich_pipeline_en.md new file mode 100644 index 00000000000000..a6d49520426ba3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_uncased_issues_128_martinwunderlich_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_martinwunderlich_pipeline pipeline BertSentenceEmbeddings from martinwunderlich +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_martinwunderlich_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_martinwunderlich_pipeline` is a English model originally trained by martinwunderlich. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_martinwunderlich_pipeline_en_5.5.0_3.0_1726694233934.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_martinwunderlich_pipeline_en_5.5.0_3.0_1726694233934.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any Spark DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_martinwunderlich_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: any Spark DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_martinwunderlich_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_martinwunderlich_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/martinwunderlich/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_bert_based_ner_models_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_based_ner_models_pipeline_en.md new file mode 100644 index 00000000000000..e625036bb2f7ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_based_ner_models_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_based_ner_models_pipeline pipeline BertSentenceEmbeddings from pragnakalp +author: John Snow Labs +name: sent_bert_based_ner_models_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_based_ner_models_pipeline` is a English model originally trained by pragnakalp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_based_ner_models_pipeline_en_5.5.0_3.0_1726661674559.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_based_ner_models_pipeline_en_5.5.0_3.0_1726661674559.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any Spark DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_based_ner_models_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: any Spark DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_based_ner_models_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_based_ner_models_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.2 MB| + +## References + +https://huggingface.co/pragnakalp/bert_based_ner_models + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_bert_double_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_double_pipeline_en.md new file mode 100644 index 00000000000000..10c91eef602851 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_double_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_double_pipeline pipeline BertSentenceEmbeddings from casehold +author: John Snow Labs +name: sent_bert_double_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_double_pipeline` is a English model originally trained by casehold. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_double_pipeline_en_5.5.0_3.0_1726687226606.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_double_pipeline_en_5.5.0_3.0_1726687226606.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any Spark DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_double_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: any Spark DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_double_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_double_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.5 MB| + +## References + +https://huggingface.co/casehold/bert-double + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_bert_large_nli_ct_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_large_nli_ct_en.md new file mode 100644 index 00000000000000..0d1480e779fcde --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_large_nli_ct_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_large_nli_ct BertSentenceEmbeddings from Contrastive-Tension +author: John Snow Labs +name: sent_bert_large_nli_ct +date: 2024-09-18 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_nli_ct` is a English model originally trained by Contrastive-Tension. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_nli_ct_en_5.5.0_3.0_1726675978906.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_nli_ct_en_5.5.0_3.0_1726675978906.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_nli_ct","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_nli_ct","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
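The `sent_bert_large_nli_ct` snippet above likewise assumes a running Spark session with Spark NLP and the usual imports. A minimal setup sketch (Python, standard `spark-nlp` entry points):

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

# Starts (or attaches to) a SparkSession with Spark NLP on the classpath
spark = sparknlp.start()
```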
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_nli_ct| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Contrastive-Tension/BERT-Large-NLI-CT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_conflibert_cont_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_conflibert_cont_uncased_pipeline_en.md new file mode 100644 index 00000000000000..0df845b6f32dbc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_conflibert_cont_uncased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_conflibert_cont_uncased_pipeline pipeline BertSentenceEmbeddings from snowood1 +author: John Snow Labs +name: sent_conflibert_cont_uncased_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_conflibert_cont_uncased_pipeline` is a English model originally trained by snowood1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_conflibert_cont_uncased_pipeline_en_5.5.0_3.0_1726676225810.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_conflibert_cont_uncased_pipeline_en_5.5.0_3.0_1726676225810.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any Spark DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_conflibert_cont_uncased_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: any Spark DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_conflibert_cont_uncased_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_conflibert_cont_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.4 MB| + +## References + +https://huggingface.co/snowood1/ConfliBERT-cont-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_congretimbau_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_congretimbau_en.md new file mode 100644 index 00000000000000..4550df181933e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_congretimbau_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_congretimbau BertSentenceEmbeddings from belisards +author: John Snow Labs +name: sent_congretimbau +date: 2024-09-18 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_congretimbau` is a English model originally trained by belisards. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_congretimbau_en_5.5.0_3.0_1726676334967.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_congretimbau_en_5.5.0_3.0_1726676334967.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_congretimbau","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_congretimbau","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
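The `sent_congretimbau` example above assumes an active Spark session with Spark NLP and the usual imports, which the snippet omits. A minimal setup sketch (Python, standard `spark-nlp` entry points):

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

# Starts (or attaches to) a SparkSession with Spark NLP on the classpath
spark = sparknlp.start()
```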
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_congretimbau| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/belisards/congretimbau \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_congretimbau_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_congretimbau_pipeline_en.md new file mode 100644 index 00000000000000..e2d97b727ef30a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_congretimbau_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_congretimbau_pipeline pipeline BertSentenceEmbeddings from belisards +author: John Snow Labs +name: sent_congretimbau_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_congretimbau_pipeline` is a English model originally trained by belisards. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_congretimbau_pipeline_en_5.5.0_3.0_1726676397245.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_congretimbau_pipeline_en_5.5.0_3.0_1726676397245.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any Spark DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_congretimbau_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: any Spark DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_congretimbau_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_congretimbau_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/belisards/congretimbau + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_convbert_base_generator_finnish_fi.md b/docs/_posts/ahmedlone127/2024-09-18-sent_convbert_base_generator_finnish_fi.md new file mode 100644 index 00000000000000..e4b304471910a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_convbert_base_generator_finnish_fi.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Finnish sent_convbert_base_generator_finnish BertSentenceEmbeddings from Finnish-NLP +author: John Snow Labs +name: sent_convbert_base_generator_finnish +date: 2024-09-18 +tags: [fi, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: fi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_convbert_base_generator_finnish` is a Finnish model originally trained by Finnish-NLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_convbert_base_generator_finnish_fi_5.5.0_3.0_1726676163191.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_convbert_base_generator_finnish_fi_5.5.0_3.0_1726676163191.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_convbert_base_generator_finnish","fi") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_convbert_base_generator_finnish","fi") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
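As with the other cards, the `sent_convbert_base_generator_finnish` snippet assumes a running Spark session with Spark NLP and the usual imports. A minimal setup sketch (Python, standard `spark-nlp` entry points):

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

# Starts (or attaches to) a SparkSession with Spark NLP on the classpath
spark = sparknlp.start()
```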
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_convbert_base_generator_finnish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|fi| +|Size:|181.3 MB| + +## References + +https://huggingface.co/Finnish-NLP/convbert-base-generator-finnish \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_distilbert_finetuned_imdb_neural_net_rahul_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_distilbert_finetuned_imdb_neural_net_rahul_en.md new file mode 100644 index 00000000000000..14c002da4e1cb3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_distilbert_finetuned_imdb_neural_net_rahul_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_distilbert_finetuned_imdb_neural_net_rahul BertSentenceEmbeddings from neural-net-rahul +author: John Snow Labs +name: sent_distilbert_finetuned_imdb_neural_net_rahul +date: 2024-09-18 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_distilbert_finetuned_imdb_neural_net_rahul` is a English model originally trained by neural-net-rahul. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_distilbert_finetuned_imdb_neural_net_rahul_en_5.5.0_3.0_1726676342331.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_distilbert_finetuned_imdb_neural_net_rahul_en_5.5.0_3.0_1726676342331.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_distilbert_finetuned_imdb_neural_net_rahul","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_distilbert_finetuned_imdb_neural_net_rahul","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
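The `sent_distilbert_finetuned_imdb_neural_net_rahul` example above assumes an active Spark session with Spark NLP and the usual imports. A minimal setup sketch (Python, standard `spark-nlp` entry points):

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

# Starts (or attaches to) a SparkSession with Spark NLP on the classpath
spark = sparknlp.start()
```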
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_distilbert_finetuned_imdb_neural_net_rahul| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/neural-net-rahul/distilbert-finetuned-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_divehi_labse_dv.md b/docs/_posts/ahmedlone127/2024-09-18-sent_divehi_labse_dv.md new file mode 100644 index 00000000000000..f0fcb66ff4124a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_divehi_labse_dv.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian sent_divehi_labse BertSentenceEmbeddings from monsoon-nlp +author: John Snow Labs +name: sent_divehi_labse +date: 2024-09-18 +tags: [dv, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_divehi_labse` is a Dhivehi, Divehi, Maldivian model originally trained by monsoon-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_divehi_labse_dv_5.5.0_3.0_1726687331827.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_divehi_labse_dv_5.5.0_3.0_1726687331827.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_divehi_labse","dv") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_divehi_labse","dv") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
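The `sent_divehi_labse` snippet above assumes a running Spark session with Spark NLP and the usual imports, which the card omits. A minimal setup sketch (Python, standard `spark-nlp` entry points):

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

# Starts (or attaches to) a SparkSession with Spark NLP on the classpath
spark = sparknlp.start()
```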
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_divehi_labse| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|dv| +|Size:|1.9 GB| + +## References + +https://huggingface.co/monsoon-nlp/dv-labse \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_divehi_labse_pipeline_dv.md b/docs/_posts/ahmedlone127/2024-09-18-sent_divehi_labse_pipeline_dv.md new file mode 100644 index 00000000000000..a8330070094add --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_divehi_labse_pipeline_dv.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian sent_divehi_labse_pipeline pipeline BertSentenceEmbeddings from monsoon-nlp +author: John Snow Labs +name: sent_divehi_labse_pipeline +date: 2024-09-18 +tags: [dv, open_source, pipeline, onnx] +task: Embeddings +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_divehi_labse_pipeline` is a Dhivehi, Divehi, Maldivian model originally trained by monsoon-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_divehi_labse_pipeline_dv_5.5.0_3.0_1726687422412.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_divehi_labse_pipeline_dv_5.5.0_3.0_1726687422412.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any Spark DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_divehi_labse_pipeline", lang = "dv")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: any Spark DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_divehi_labse_pipeline", lang = "dv")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_divehi_labse_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|dv| +|Size:|1.9 GB| + +## References + +https://huggingface.co/monsoon-nlp/dv-labse + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_esmlmt60_10000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_esmlmt60_10000_pipeline_en.md new file mode 100644 index 00000000000000..b1c5efd43e78d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_esmlmt60_10000_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_esmlmt60_10000_pipeline pipeline BertSentenceEmbeddings from hjkim811 +author: John Snow Labs +name: sent_esmlmt60_10000_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_esmlmt60_10000_pipeline` is a English model originally trained by hjkim811. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_esmlmt60_10000_pipeline_en_5.5.0_3.0_1726675897096.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_esmlmt60_10000_pipeline_en_5.5.0_3.0_1726675897096.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any Spark DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_esmlmt60_10000_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: any Spark DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_esmlmt60_10000_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_esmlmt60_10000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.2 MB| + +## References + +https://huggingface.co/hjkim811/esmlmt60-10000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_greeksocialbert_base_greek_uncased_v1_el.md b/docs/_posts/ahmedlone127/2024-09-18-sent_greeksocialbert_base_greek_uncased_v1_el.md new file mode 100644 index 00000000000000..e53d167c294d2e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_greeksocialbert_base_greek_uncased_v1_el.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Modern Greek (1453-) sent_greeksocialbert_base_greek_uncased_v1 BertSentenceEmbeddings from gealexandri +author: John Snow Labs +name: sent_greeksocialbert_base_greek_uncased_v1 +date: 2024-09-18 +tags: [el, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: el +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_greeksocialbert_base_greek_uncased_v1` is a Modern Greek (1453-) model originally trained by gealexandri. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_greeksocialbert_base_greek_uncased_v1_el_5.5.0_3.0_1726694204097.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_greeksocialbert_base_greek_uncased_v1_el_5.5.0_3.0_1726694204097.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_greeksocialbert_base_greek_uncased_v1","el") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_greeksocialbert_base_greek_uncased_v1","el") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
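The `sent_greeksocialbert_base_greek_uncased_v1` snippet above assumes a running Spark session with Spark NLP and the usual imports. A minimal setup sketch (Python, standard `spark-nlp` entry points):

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

# Starts (or attaches to) a SparkSession with Spark NLP on the classpath
spark = sparknlp.start()
```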
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_greeksocialbert_base_greek_uncased_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|el| +|Size:|421.3 MB| + +## References + +https://huggingface.co/gealexandri/greeksocialbert-base-greek-uncased-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_marathi_bert_small_pipeline_mr.md b/docs/_posts/ahmedlone127/2024-09-18-sent_marathi_bert_small_pipeline_mr.md new file mode 100644 index 00000000000000..cc8c5b601f85d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_marathi_bert_small_pipeline_mr.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Marathi sent_marathi_bert_small_pipeline pipeline BertSentenceEmbeddings from l3cube-pune +author: John Snow Labs +name: sent_marathi_bert_small_pipeline +date: 2024-09-18 +tags: [mr, open_source, pipeline, onnx] +task: Embeddings +language: mr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_marathi_bert_small_pipeline` is a Marathi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_marathi_bert_small_pipeline_mr_5.5.0_3.0_1726694422267.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_marathi_bert_small_pipeline_mr_5.5.0_3.0_1726694422267.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any Spark DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_marathi_bert_small_pipeline", lang = "mr")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: any Spark DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_marathi_bert_small_pipeline", lang = "mr")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_marathi_bert_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mr| +|Size:|311.7 MB| + +## References + +https://huggingface.co/l3cube-pune/marathi-bert-small + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_nusabert_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_nusabert_large_pipeline_en.md new file mode 100644 index 00000000000000..77281d6d8ca183 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_nusabert_large_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_nusabert_large_pipeline pipeline BertSentenceEmbeddings from LazarusNLP +author: John Snow Labs +name: sent_nusabert_large_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_nusabert_large_pipeline` is a English model originally trained by LazarusNLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_nusabert_large_pipeline_en_5.5.0_3.0_1726662286462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_nusabert_large_pipeline_en_5.5.0_3.0_1726662286462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any Spark DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_nusabert_large_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: any Spark DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_nusabert_large_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_nusabert_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/LazarusNLP/NusaBERT-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_parlbert_german_v2_de.md b/docs/_posts/ahmedlone127/2024-09-18-sent_parlbert_german_v2_de.md new file mode 100644 index 00000000000000..3166343e7f2dc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_parlbert_german_v2_de.md @@ -0,0 +1,94 @@ +--- +layout: model +title: German sent_parlbert_german_v2 BertSentenceEmbeddings from chkla +author: John Snow Labs +name: sent_parlbert_german_v2 +date: 2024-09-18 +tags: [de, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_parlbert_german_v2` is a German model originally trained by chkla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_parlbert_german_v2_de_5.5.0_3.0_1726662156708.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_parlbert_german_v2_de_5.5.0_3.0_1726662156708.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_parlbert_german_v2","de") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_parlbert_german_v2","de") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
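The `sent_parlbert_german_v2` example above assumes an active Spark session with Spark NLP and the usual imports. A minimal setup sketch (Python, standard `spark-nlp` entry points):

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

# Starts (or attaches to) a SparkSession with Spark NLP on the classpath
spark = sparknlp.start()
```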
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_parlbert_german_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|de| +|Size:|409.7 MB| + +## References + +https://huggingface.co/chkla/parlbert-german-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_protaugment_lm_liu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_protaugment_lm_liu_pipeline_en.md new file mode 100644 index 00000000000000..37fc9d116a1857 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_protaugment_lm_liu_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_protaugment_lm_liu_pipeline pipeline BertSentenceEmbeddings from tdopierre +author: John Snow Labs +name: sent_protaugment_lm_liu_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_protaugment_lm_liu_pipeline` is a English model originally trained by tdopierre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_protaugment_lm_liu_pipeline_en_5.5.0_3.0_1726675783025.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_protaugment_lm_liu_pipeline_en_5.5.0_3.0_1726675783025.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any Spark DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_protaugment_lm_liu_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: any Spark DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_protaugment_lm_liu_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_protaugment_lm_liu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.9 MB| + +## References + +https://huggingface.co/tdopierre/ProtAugment-LM-Liu + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_roberta_small_word_chinese_cluecorpussmall_zh.md b/docs/_posts/ahmedlone127/2024-09-18-sent_roberta_small_word_chinese_cluecorpussmall_zh.md new file mode 100644 index 00000000000000..76061c47ab150c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_roberta_small_word_chinese_cluecorpussmall_zh.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Chinese sent_roberta_small_word_chinese_cluecorpussmall BertSentenceEmbeddings from uer +author: John Snow Labs +name: sent_roberta_small_word_chinese_cluecorpussmall +date: 2024-09-18 +tags: [zh, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_roberta_small_word_chinese_cluecorpussmall` is a Chinese model originally trained by uer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_roberta_small_word_chinese_cluecorpussmall_zh_5.5.0_3.0_1726675749236.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_roberta_small_word_chinese_cluecorpussmall_zh_5.5.0_3.0_1726675749236.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_roberta_small_word_chinese_cluecorpussmall","zh") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_roberta_small_word_chinese_cluecorpussmall","zh") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
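Once the pipeline above has run, the sentence vectors live in the `embeddings` field of each annotation. A minimal sketch for pulling them out (assuming the Python pipeline above has been executed; since the model is Chinese, a sentence such as "我喜欢自然语言处理" is a more representative input than the English placeholder):

```python
from pyspark.sql import functions as F

# Each element of the "embeddings" column is one sentence-level annotation;
# its "embeddings" field holds the dense vector for that sentence.
vectors = (
    pipelineDF
    .select(F.explode("embeddings").alias("sent"))
    .select(
        F.col("sent.result").alias("sentence"),
        F.col("sent.embeddings").alias("vector"),
    )
)
vectors.show(truncate=80)
```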
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_roberta_small_word_chinese_cluecorpussmall| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|zh| +|Size:|240.2 MB| + +## References + +https://huggingface.co/uer/roberta-small-word-chinese-cluecorpussmall \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_ssci_bert_e4_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_ssci_bert_e4_en.md new file mode 100644 index 00000000000000..1d1655157967ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_ssci_bert_e4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_ssci_bert_e4 BertSentenceEmbeddings from KM4STfulltext +author: John Snow Labs +name: sent_ssci_bert_e4 +date: 2024-09-18 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_ssci_bert_e4` is a English model originally trained by KM4STfulltext. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_ssci_bert_e4_en_5.5.0_3.0_1726687206830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_ssci_bert_e4_en_5.5.0_3.0_1726687206830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_ssci_bert_e4","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_ssci_bert_e4","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_ssci_bert_e4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/KM4STfulltext/SSCI-BERT-e4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_ssci_bert_e4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_ssci_bert_e4_pipeline_en.md new file mode 100644 index 00000000000000..d31d8ee7def661 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_ssci_bert_e4_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_ssci_bert_e4_pipeline pipeline BertSentenceEmbeddings from KM4STfulltext +author: John Snow Labs +name: sent_ssci_bert_e4_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_ssci_bert_e4_pipeline` is a English model originally trained by KM4STfulltext. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_ssci_bert_e4_pipeline_en_5.5.0_3.0_1726687228577.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_ssci_bert_e4_pipeline_en_5.5.0_3.0_1726687228577.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_ssci_bert_e4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_ssci_bert_e4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_ssci_bert_e4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.1 MB| + +## References + +https://huggingface.co/KM4STfulltext/SSCI-BERT-e4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_tod_bert_jnt_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_tod_bert_jnt_v1_pipeline_en.md new file mode 100644 index 00000000000000..943107437d4ac2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_tod_bert_jnt_v1_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_tod_bert_jnt_v1_pipeline pipeline BertSentenceEmbeddings from TODBERT +author: John Snow Labs +name: sent_tod_bert_jnt_v1_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_tod_bert_jnt_v1_pipeline` is a English model originally trained by TODBERT. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_tod_bert_jnt_v1_pipeline_en_5.5.0_3.0_1726661910714.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_tod_bert_jnt_v1_pipeline_en_5.5.0_3.0_1726661910714.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_tod_bert_jnt_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_tod_bert_jnt_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_tod_bert_jnt_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.5 MB| + +## References + +https://huggingface.co/TODBERT/TOD-BERT-JNT-V1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sentiment_roberta_large_e12_b16_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sentiment_roberta_large_e12_b16_pipeline_en.md new file mode 100644 index 00000000000000..d821424357830b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sentiment_roberta_large_e12_b16_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_roberta_large_e12_b16_pipeline pipeline RoBertaForSequenceClassification from JerryYanJiang +author: John Snow Labs +name: sentiment_roberta_large_e12_b16_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_roberta_large_e12_b16_pipeline` is a English model originally trained by JerryYanJiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_roberta_large_e12_b16_pipeline_en_5.5.0_3.0_1726650725143.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_roberta_large_e12_b16_pipeline_en_5.5.0_3.0_1726650725143.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_roberta_large_e12_b16_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_roberta_large_e12_b16_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
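Beyond DataFrame-based scoring, a pretrained pipeline can also be called on single strings, which is convenient for quick checks. A sketch, assuming the `pipeline` object created above (the keys of the returned dictionary depend on the stages listed under "Included Models" below):

```python
# annotate() returns plain Python lists per output column;
# fullAnnotate() returns full Annotation objects with offsets and metadata.
quick = pipeline.annotate("The new release works great, highly recommended!")
print(quick.keys())

detailed = pipeline.fullAnnotate("The new release works great, highly recommended!")[0]
for annotation in detailed.get("class", []):
    print(annotation.result, annotation.metadata)
```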
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_roberta_large_e12_b16_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/JerryYanJiang/sentiment-roberta-large-e12-b16 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sentiment_sentiment_small_random0_seed0_twitter_roberta_base_2021_124m_en.md b/docs/_posts/ahmedlone127/2024-09-18-sentiment_sentiment_small_random0_seed0_twitter_roberta_base_2021_124m_en.md new file mode 100644 index 00000000000000..2ea4c90342650e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sentiment_sentiment_small_random0_seed0_twitter_roberta_base_2021_124m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_sentiment_small_random0_seed0_twitter_roberta_base_2021_124m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random0_seed0_twitter_roberta_base_2021_124m +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random0_seed0_twitter_roberta_base_2021_124m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random0_seed0_twitter_roberta_base_2021_124m_en_5.5.0_3.0_1726628631916.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random0_seed0_twitter_roberta_base_2021_124m_en_5.5.0_3.0_1726628631916.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random0_seed0_twitter_roberta_base_2021_124m","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random0_seed0_twitter_roberta_base_2021_124m", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
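The predicted label ends up in the `class` column as an annotation. A short sketch for reading it back out of `pipelineDF` (the label names themselves come from the upstream fine-tuning and are not documented on this card):

```python
# "class.result" is the predicted label per document; the annotation's
# metadata carries the per-label confidence scores.
pipelineDF.selectExpr(
    "text",
    "class.result as prediction",
    "class[0].metadata as scores"
).show(truncate=False)
```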
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random0_seed0_twitter_roberta_base_2021_124m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random0_seed0-twitter-roberta-base-2021-124m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sentiment_ufakz_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sentiment_ufakz_pipeline_en.md new file mode 100644 index 00000000000000..80d4a3ebca0597 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sentiment_ufakz_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_ufakz_pipeline pipeline DistilBertForSequenceClassification from ufakz +author: John Snow Labs +name: sentiment_ufakz_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_ufakz_pipeline` is a English model originally trained by ufakz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_ufakz_pipeline_en_5.5.0_3.0_1726676766688.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_ufakz_pipeline_en_5.5.0_3.0_1726676766688.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_ufakz_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_ufakz_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_ufakz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ufakz/sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-seven_emotion_finetuned_twitter_xlm_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-18-seven_emotion_finetuned_twitter_xlm_roberta_base_en.md new file mode 100644 index 00000000000000..28007223064806 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-seven_emotion_finetuned_twitter_xlm_roberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English seven_emotion_finetuned_twitter_xlm_roberta_base XlmRoBertaForSequenceClassification from 02shanky +author: John Snow Labs +name: seven_emotion_finetuned_twitter_xlm_roberta_base +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`seven_emotion_finetuned_twitter_xlm_roberta_base` is a English model originally trained by 02shanky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/seven_emotion_finetuned_twitter_xlm_roberta_base_en_5.5.0_3.0_1726686586847.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/seven_emotion_finetuned_twitter_xlm_roberta_base_en_5.5.0_3.0_1726686586847.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("seven_emotion_finetuned_twitter_xlm_roberta_base","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("seven_emotion_finetuned_twitter_xlm_roberta_base", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|seven_emotion_finetuned_twitter_xlm_roberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/02shanky/seven-emotion-finetuned-twitter-xlm-roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-seven_emotion_finetuned_twitter_xlm_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-seven_emotion_finetuned_twitter_xlm_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..bc860d4bb40bd7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-seven_emotion_finetuned_twitter_xlm_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English seven_emotion_finetuned_twitter_xlm_roberta_base_pipeline pipeline XlmRoBertaForSequenceClassification from 02shanky +author: John Snow Labs +name: seven_emotion_finetuned_twitter_xlm_roberta_base_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`seven_emotion_finetuned_twitter_xlm_roberta_base_pipeline` is a English model originally trained by 02shanky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/seven_emotion_finetuned_twitter_xlm_roberta_base_pipeline_en_5.5.0_3.0_1726686636263.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/seven_emotion_finetuned_twitter_xlm_roberta_base_pipeline_en_5.5.0_3.0_1726686636263.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("seven_emotion_finetuned_twitter_xlm_roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("seven_emotion_finetuned_twitter_xlm_roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|seven_emotion_finetuned_twitter_xlm_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/02shanky/seven-emotion-finetuned-twitter-xlm-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-social_media_fake_news_detection_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-social_media_fake_news_detection_pipeline_en.md new file mode 100644 index 00000000000000..328d953d484c60 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-social_media_fake_news_detection_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English social_media_fake_news_detection_pipeline pipeline RoBertaForSequenceClassification from ljz512187207 +author: John Snow Labs +name: social_media_fake_news_detection_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`social_media_fake_news_detection_pipeline` is a English model originally trained by ljz512187207. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/social_media_fake_news_detection_pipeline_en_5.5.0_3.0_1726621693674.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/social_media_fake_news_detection_pipeline_en_5.5.0_3.0_1726621693674.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("social_media_fake_news_detection_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("social_media_fake_news_detection_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|social_media_fake_news_detection_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.8 MB| + +## References + +https://huggingface.co/ljz512187207/Social_Media_Fake_News_Detection + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-soongsilbert_nsmc_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-soongsilbert_nsmc_base_pipeline_en.md new file mode 100644 index 00000000000000..069d343fbc3b8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-soongsilbert_nsmc_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English soongsilbert_nsmc_base_pipeline pipeline RoBertaForSequenceClassification from jason9693 +author: John Snow Labs +name: soongsilbert_nsmc_base_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`soongsilbert_nsmc_base_pipeline` is a English model originally trained by jason9693. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/soongsilbert_nsmc_base_pipeline_en_5.5.0_3.0_1726690004866.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/soongsilbert_nsmc_base_pipeline_en_5.5.0_3.0_1726690004866.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("soongsilbert_nsmc_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("soongsilbert_nsmc_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|soongsilbert_nsmc_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|368.7 MB| + +## References + +https://huggingface.co/jason9693/SoongsilBERT-nsmc-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sst5_padding40model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sst5_padding40model_pipeline_en.md new file mode 100644 index 00000000000000..de4bf7d6ded53b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sst5_padding40model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sst5_padding40model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: sst5_padding40model_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sst5_padding40model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sst5_padding40model_pipeline_en_5.5.0_3.0_1726630180420.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sst5_padding40model_pipeline_en_5.5.0_3.0_1726630180420.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sst5_padding40model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sst5_padding40model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sst5_padding40model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/sst5_padding40model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-stego_classifier_checkpoint_epoch_0_2024_07_26_14_26_52_en.md b/docs/_posts/ahmedlone127/2024-09-18-stego_classifier_checkpoint_epoch_0_2024_07_26_14_26_52_en.md new file mode 100644 index 00000000000000..48b203c72fe202 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-stego_classifier_checkpoint_epoch_0_2024_07_26_14_26_52_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_0_2024_07_26_14_26_52 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_0_2024_07_26_14_26_52 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_0_2024_07_26_14_26_52` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_0_2024_07_26_14_26_52_en_5.5.0_3.0_1726625717043.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_0_2024_07_26_14_26_52_en_5.5.0_3.0_1726625717043.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_0_2024_07_26_14_26_52","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_0_2024_07_26_14_26_52", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
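If the model is fetched through the Download button instead of `pretrained()`, the extracted folder can be loaded directly with `load()`. A sketch under the assumption that the archive has been unzipped locally (the path below is hypothetical):

```python
from sparknlp.annotator import DistilBertForSequenceClassification

# Hypothetical path where the downloaded zip has been extracted.
model_path = "/tmp/stego_classifier_checkpoint_epoch_0_2024_07_26_14_26_52_en"

sequenceClassifier = DistilBertForSequenceClassification.load(model_path) \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")
```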
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_0_2024_07_26_14_26_52| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-0-2024-07-26_14-26-52 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-t_103_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-t_103_pipeline_en.md new file mode 100644 index 00000000000000..1066756009c757 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-t_103_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English t_103_pipeline pipeline RoBertaForSequenceClassification from Pablojmed +author: John Snow Labs +name: t_103_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t_103_pipeline` is a English model originally trained by Pablojmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t_103_pipeline_en_5.5.0_3.0_1726650419310.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t_103_pipeline_en_5.5.0_3.0_1726650419310.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("t_103_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("t_103_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t_103_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.2 MB| + +## References + +https://huggingface.co/Pablojmed/t_103 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-t_8_en.md b/docs/_posts/ahmedlone127/2024-09-18-t_8_en.md new file mode 100644 index 00000000000000..418699e5a9dea6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-t_8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English t_8 RoBertaForSequenceClassification from Pablojmed +author: John Snow Labs +name: t_8 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t_8` is a English model originally trained by Pablojmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t_8_en_5.5.0_3.0_1726642340469.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t_8_en_5.5.0_3.0_1726642340469.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("t_8","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("t_8", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t_8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|438.2 MB| + +## References + +https://huggingface.co/Pablojmed/t_8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-t_8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-t_8_pipeline_en.md new file mode 100644 index 00000000000000..a8023768400444 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-t_8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English t_8_pipeline pipeline RoBertaForSequenceClassification from Pablojmed +author: John Snow Labs +name: t_8_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t_8_pipeline` is a English model originally trained by Pablojmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t_8_pipeline_en_5.5.0_3.0_1726642365902.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t_8_pipeline_en_5.5.0_3.0_1726642365902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("t_8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("t_8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t_8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.2 MB| + +## References + +https://huggingface.co/Pablojmed/t_8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-test4_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-test4_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..5ec38e735ed859 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-test4_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test4_distilbert_pipeline pipeline DistilBertForSequenceClassification from Daniel246 +author: John Snow Labs +name: test4_distilbert_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test4_distilbert_pipeline` is a English model originally trained by Daniel246. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test4_distilbert_pipeline_en_5.5.0_3.0_1726669913613.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test4_distilbert_pipeline_en_5.5.0_3.0_1726669913613.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test4_distilbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test4_distilbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test4_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Daniel246/test4_distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-test_finetuned__roberta_base_bne__augmented_ultrasounds_ner_en.md b/docs/_posts/ahmedlone127/2024-09-18-test_finetuned__roberta_base_bne__augmented_ultrasounds_ner_en.md new file mode 100644 index 00000000000000..402f8d17e7cb40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-test_finetuned__roberta_base_bne__augmented_ultrasounds_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_finetuned__roberta_base_bne__augmented_ultrasounds_ner RoBertaForTokenClassification from manucos +author: John Snow Labs +name: test_finetuned__roberta_base_bne__augmented_ultrasounds_ner +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_finetuned__roberta_base_bne__augmented_ultrasounds_ner` is a English model originally trained by manucos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_finetuned__roberta_base_bne__augmented_ultrasounds_ner_en_5.5.0_3.0_1726652421147.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_finetuned__roberta_base_bne__augmented_ultrasounds_ner_en_5.5.0_3.0_1726652421147.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("test_finetuned__roberta_base_bne__augmented_ultrasounds_ner","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("test_finetuned__roberta_base_bne__augmented_ultrasounds_ner", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
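To line tokens up with their predicted tags, the `token` and `ner` annotation columns can be zipped together. A minimal sketch, assuming the Python pipeline above has been run (the tag set, e.g. the B-/I- entity labels, comes from the upstream fine-tuning and is not listed on this card):

```python
from pyspark.sql import functions as F

# Pair each token with the tag predicted for it.
pairs = (
    pipelineDF
    .select(
        F.col("token.result").alias("tokens"),
        F.col("ner.result").alias("tags"),
    )
    .select(F.explode(F.arrays_zip("tokens", "tags")).alias("pair"))
    .select(F.col("pair.tokens").alias("token"), F.col("pair.tags").alias("tag"))
)
pairs.show(truncate=False)
```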
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_finetuned__roberta_base_bne__augmented_ultrasounds_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|463.7 MB| + +## References + +https://huggingface.co/manucos/test-finetuned__roberta-base-bne__augmented-ultrasounds-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-test_forwarder1121_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-test_forwarder1121_pipeline_en.md new file mode 100644 index 00000000000000..b8730ba34100a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-test_forwarder1121_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_forwarder1121_pipeline pipeline DistilBertForSequenceClassification from forwarder1121 +author: John Snow Labs +name: test_forwarder1121_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_forwarder1121_pipeline` is a English model originally trained by forwarder1121. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_forwarder1121_pipeline_en_5.5.0_3.0_1726681828226.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_forwarder1121_pipeline_en_5.5.0_3.0_1726681828226.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_forwarder1121_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_forwarder1121_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_forwarder1121_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/forwarder1121/test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-test_model_cynthiaaaaaaaa_en.md b/docs/_posts/ahmedlone127/2024-09-18-test_model_cynthiaaaaaaaa_en.md new file mode 100644 index 00000000000000..923f93f96eb9f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-test_model_cynthiaaaaaaaa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_model_cynthiaaaaaaaa DistilBertForSequenceClassification from Cynthiaaaaaaaa +author: John Snow Labs +name: test_model_cynthiaaaaaaaa +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_model_cynthiaaaaaaaa` is a English model originally trained by Cynthiaaaaaaaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_model_cynthiaaaaaaaa_en_5.5.0_3.0_1726669790814.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_model_cynthiaaaaaaaa_en_5.5.0_3.0_1726669790814.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_model_cynthiaaaaaaaa","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_model_cynthiaaaaaaaa", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_model_cynthiaaaaaaaa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Cynthiaaaaaaaa/test_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-test_model_cynthiaaaaaaaa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-test_model_cynthiaaaaaaaa_pipeline_en.md new file mode 100644 index 00000000000000..8d742e998d9ae2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-test_model_cynthiaaaaaaaa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_model_cynthiaaaaaaaa_pipeline pipeline DistilBertForSequenceClassification from Cynthiaaaaaaaa +author: John Snow Labs +name: test_model_cynthiaaaaaaaa_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_model_cynthiaaaaaaaa_pipeline` is a English model originally trained by Cynthiaaaaaaaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_model_cynthiaaaaaaaa_pipeline_en_5.5.0_3.0_1726669803395.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_model_cynthiaaaaaaaa_pipeline_en_5.5.0_3.0_1726669803395.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_model_cynthiaaaaaaaa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_model_cynthiaaaaaaaa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_model_cynthiaaaaaaaa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Cynthiaaaaaaaa/test_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-tibetan_roberta_g_v1_867238_en.md b/docs/_posts/ahmedlone127/2024-09-18-tibetan_roberta_g_v1_867238_en.md new file mode 100644 index 00000000000000..e9214fb9b5adf1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-tibetan_roberta_g_v1_867238_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tibetan_roberta_g_v1_867238 RoBertaEmbeddings from spsither +author: John Snow Labs +name: tibetan_roberta_g_v1_867238 +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tibetan_roberta_g_v1_867238` is a English model originally trained by spsither. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tibetan_roberta_g_v1_867238_en_5.5.0_3.0_1726626677602.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tibetan_roberta_g_v1_867238_en_5.5.0_3.0_1726626677602.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("tibetan_roberta_g_v1_867238","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("tibetan_roberta_g_v1_867238","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
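The `embeddings` column produced above holds one annotation per token. A sketch (assuming the Python pipeline above has been run) that collects the token vectors and mean-pools them into a single document vector with NumPy:

```python
import numpy as np

# Each annotation's "embeddings" field is the contextual vector of one token.
for row in pipelineDF.select("embeddings").collect():
    token_vectors = np.array([anno.embeddings for anno in row.embeddings])
    doc_vector = token_vectors.mean(axis=0)  # simple mean pooling
    print(token_vectors.shape, doc_vector[:5])
```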
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tibetan_roberta_g_v1_867238| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|412.1 MB| + +## References + +https://huggingface.co/spsither/tibetan_RoBERTa_G_v1_867238 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-tibetan_roberta_s_e2_en.md b/docs/_posts/ahmedlone127/2024-09-18-tibetan_roberta_s_e2_en.md new file mode 100644 index 00000000000000..029dd9ef64ec27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-tibetan_roberta_s_e2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tibetan_roberta_s_e2 RoBertaEmbeddings from spsither +author: John Snow Labs +name: tibetan_roberta_s_e2 +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tibetan_roberta_s_e2` is a English model originally trained by spsither. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tibetan_roberta_s_e2_en_5.5.0_3.0_1726626757795.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tibetan_roberta_s_e2_en_5.5.0_3.0_1726626757795.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("tibetan_roberta_s_e2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("tibetan_roberta_s_e2","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tibetan_roberta_s_e2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.7 MB| + +## References + +https://huggingface.co/spsither/tibetan_RoBERTa_S_e2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-tibetan_roberta_s_e2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-tibetan_roberta_s_e2_pipeline_en.md new file mode 100644 index 00000000000000..09ea46f25457e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-tibetan_roberta_s_e2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tibetan_roberta_s_e2_pipeline pipeline RoBertaEmbeddings from spsither +author: John Snow Labs +name: tibetan_roberta_s_e2_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tibetan_roberta_s_e2_pipeline` is a English model originally trained by spsither. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tibetan_roberta_s_e2_pipeline_en_5.5.0_3.0_1726626772664.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tibetan_roberta_s_e2_pipeline_en_5.5.0_3.0_1726626772664.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pretrained pipeline reads the input text from a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("tibetan_roberta_s_e2_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pretrained pipeline reads the input text from a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("tibetan_roberta_s_e2_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tibetan_roberta_s_e2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.7 MB| + +## References + +https://huggingface.co/spsither/tibetan_RoBERTa_S_e2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-tmp_trainer_moisesdiazm_en.md b/docs/_posts/ahmedlone127/2024-09-18-tmp_trainer_moisesdiazm_en.md new file mode 100644 index 00000000000000..84e1cb4d5786f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-tmp_trainer_moisesdiazm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tmp_trainer_moisesdiazm DistilBertForSequenceClassification from moisesdiazm +author: John Snow Labs +name: tmp_trainer_moisesdiazm +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tmp_trainer_moisesdiazm` is a English model originally trained by moisesdiazm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tmp_trainer_moisesdiazm_en_5.5.0_3.0_1726696111952.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tmp_trainer_moisesdiazm_en_5.5.0_3.0_1726696111952.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("tmp_trainer_moisesdiazm","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("tmp_trainer_moisesdiazm", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
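
Continuing from the Python example above, the predicted labels can be read from the `class` column, whose annotations expose the label in their `result` field. A minimal sketch:

```python
from pyspark.sql.functions import col

# "class" is an array of annotations; result holds the predicted label for each document
pipelineDF.select(col("text"), col("class.result").alias("prediction")).show(truncate = False)
```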
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tmp_trainer_moisesdiazm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/moisesdiazm/tmp_trainer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-topic_topic_random2_seed0_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-18-topic_topic_random2_seed0_bernice_en.md new file mode 100644 index 00000000000000..de3f0d92fffda0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-topic_topic_random2_seed0_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English topic_topic_random2_seed0_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random2_seed0_bernice +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random2_seed0_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random2_seed0_bernice_en_5.5.0_3.0_1726660832243.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random2_seed0_bernice_en_5.5.0_3.0_1726660832243.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("topic_topic_random2_seed0_bernice","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("topic_topic_random2_seed0_bernice", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
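
For single texts or small lists, a `LightPipeline` wrapped around the fitted model avoids launching full Spark jobs. A short sketch, reusing `pipelineModel` from the Python example above; the `class` key assumes the output column set there:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# annotate() returns a dict keyed by the output columns of the pipeline stages
print(light.annotate("I love spark-nlp")["class"])
```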
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random2_seed0_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|805.5 MB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random2_seed0-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-topic_topic_random3_seed2_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-18-topic_topic_random3_seed2_bernice_en.md new file mode 100644 index 00000000000000..f22860de3c59c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-topic_topic_random3_seed2_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English topic_topic_random3_seed2_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random3_seed2_bernice +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random3_seed2_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random3_seed2_bernice_en_5.5.0_3.0_1726672577417.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random3_seed2_bernice_en_5.5.0_3.0_1726672577417.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("topic_topic_random3_seed2_bernice","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("topic_topic_random3_seed2_bernice", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random3_seed2_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|805.6 MB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random3_seed2-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-topic_topic_random3_seed2_bernice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-topic_topic_random3_seed2_bernice_pipeline_en.md new file mode 100644 index 00000000000000..1a6a88fe9b0d68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-topic_topic_random3_seed2_bernice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English topic_topic_random3_seed2_bernice_pipeline pipeline XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random3_seed2_bernice_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random3_seed2_bernice_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random3_seed2_bernice_pipeline_en_5.5.0_3.0_1726672715966.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random3_seed2_bernice_pipeline_en_5.5.0_3.0_1726672715966.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pretrained pipeline reads the input text from a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("topic_topic_random3_seed2_bernice_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pretrained pipeline reads the input text from a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("topic_topic_random3_seed2_bernice_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random3_seed2_bernice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|805.6 MB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random3_seed2-bernice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-toxicbot_en.md b/docs/_posts/ahmedlone127/2024-09-18-toxicbot_en.md new file mode 100644 index 00000000000000..77f1428fb38ec9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-toxicbot_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English toxicbot RoBertaForSequenceClassification from CobaltAlchemist +author: John Snow Labs +name: toxicbot +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`toxicbot` is a English model originally trained by CobaltAlchemist. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/toxicbot_en_5.5.0_3.0_1726665705723.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/toxicbot_en_5.5.0_3.0_1726665705723.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("toxicbot","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("toxicbot", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
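
Classifier annotations typically carry per-label scores in their metadata map. Continuing from the Python example above, something like the following exposes both the label and the scores; treat the exact metadata keys as model-dependent:

```python
from pyspark.sql.functions import explode

# one row per prediction annotation, with its label and metadata map
pipelineDF.select(explode("class").alias("pred")) \
    .select("pred.result", "pred.metadata") \
    .show(truncate = False)
```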
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|toxicbot| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|467.9 MB| + +## References + +https://huggingface.co/CobaltAlchemist/Toxicbot \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-toxicdistilbert_en.md b/docs/_posts/ahmedlone127/2024-09-18-toxicdistilbert_en.md new file mode 100644 index 00000000000000..eb2d056083cd72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-toxicdistilbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English toxicdistilbert DistilBertForSequenceClassification from YuryCHep +author: John Snow Labs +name: toxicdistilbert +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`toxicdistilbert` is a English model originally trained by YuryCHep. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/toxicdistilbert_en_5.5.0_3.0_1726695273249.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/toxicdistilbert_en_5.5.0_3.0_1726695273249.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("toxicdistilbert","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("toxicdistilbert", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|toxicdistilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/YuryCHep/TOXICDISTILBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-trainer2f_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-trainer2f_pipeline_en.md new file mode 100644 index 00000000000000..21cfddb68e3131 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-trainer2f_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trainer2f_pipeline pipeline DistilBertForSequenceClassification from SimoneJLaudani +author: John Snow Labs +name: trainer2f_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trainer2f_pipeline` is a English model originally trained by SimoneJLaudani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trainer2f_pipeline_en_5.5.0_3.0_1726695258517.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trainer2f_pipeline_en_5.5.0_3.0_1726695258517.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pretrained pipeline reads the input text from a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("trainer2f_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pretrained pipeline reads the input text from a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("trainer2f_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trainer2f_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/SimoneJLaudani/trainer2F + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-trainer5c_en.md b/docs/_posts/ahmedlone127/2024-09-18-trainer5c_en.md new file mode 100644 index 00000000000000..eec873b9fb048d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-trainer5c_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trainer5c DistilBertForSequenceClassification from SimoneJLaudani +author: John Snow Labs +name: trainer5c +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trainer5c` is a English model originally trained by SimoneJLaudani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trainer5c_en_5.5.0_3.0_1726680524461.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trainer5c_en_5.5.0_3.0_1726680524461.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("trainer5c","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("trainer5c", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trainer5c| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/SimoneJLaudani/trainer5c \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-transient_data_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-transient_data_pipeline_en.md new file mode 100644 index 00000000000000..3d609ea4a63da3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-transient_data_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English transient_data_pipeline pipeline DistilBertForSequenceClassification from Jingni +author: John Snow Labs +name: transient_data_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`transient_data_pipeline` is a English model originally trained by Jingni. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/transient_data_pipeline_en_5.5.0_3.0_1726670097054.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/transient_data_pipeline_en_5.5.0_3.0_1726670097054.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pretrained pipeline reads the input text from a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("transient_data_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pretrained pipeline reads the input text from a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("transient_data_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|transient_data_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Jingni/transient_data + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-twitter_roberta_base_sentence_itr0_1e_05_all_01_03_2022_13_38_07_en.md b/docs/_posts/ahmedlone127/2024-09-18-twitter_roberta_base_sentence_itr0_1e_05_all_01_03_2022_13_38_07_en.md new file mode 100644 index 00000000000000..4c31999b670551 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-twitter_roberta_base_sentence_itr0_1e_05_all_01_03_2022_13_38_07_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twitter_roberta_base_sentence_itr0_1e_05_all_01_03_2022_13_38_07 RoBertaForSequenceClassification from ali2066 +author: John Snow Labs +name: twitter_roberta_base_sentence_itr0_1e_05_all_01_03_2022_13_38_07 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_sentence_itr0_1e_05_all_01_03_2022_13_38_07` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_sentence_itr0_1e_05_all_01_03_2022_13_38_07_en_5.5.0_3.0_1726689606081.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_sentence_itr0_1e_05_all_01_03_2022_13_38_07_en_5.5.0_3.0_1726689606081.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("twitter_roberta_base_sentence_itr0_1e_05_all_01_03_2022_13_38_07","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("twitter_roberta_base_sentence_itr0_1e_05_all_01_03_2022_13_38_07", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
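
The fitted pipeline is an ordinary Spark ML `PipelineModel`, so it can be persisted and reloaded without downloading the model again. A sketch continuing from the Python example above, with an illustrative path:

```python
from pyspark.ml import PipelineModel

pipelineModel.write().overwrite().save("/tmp/twitter_roberta_sentence_pipeline")  # illustrative path
reloaded = PipelineModel.load("/tmp/twitter_roberta_sentence_pipeline")
reloaded.transform(data).select("class.result").show(truncate = False)
```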
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_sentence_itr0_1e_05_all_01_03_2022_13_38_07| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/ali2066/twitter-roberta-base_sentence_itr0_1e-05_all_01_03_2022-13_38_07 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-twitterfin_padding80model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-twitterfin_padding80model_pipeline_en.md new file mode 100644 index 00000000000000..5e72048700b0ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-twitterfin_padding80model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitterfin_padding80model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: twitterfin_padding80model_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitterfin_padding80model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitterfin_padding80model_pipeline_en_5.5.0_3.0_1726625369293.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitterfin_padding80model_pipeline_en_5.5.0_3.0_1726625369293.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pretrained pipeline reads the input text from a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("twitterfin_padding80model_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pretrained pipeline reads the input text from a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("twitterfin_padding80model_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
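
Beyond `transform`, `fullAnnotate` keeps the complete annotation objects, which is convenient for inspecting metadata such as scores. A minimal Python sketch; the `class` key assumes the bundled classifier writes to that column:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("twitterfin_padding80model_pipeline", lang = "en")

# fullAnnotate returns a list with one entry per input text
annotations = pipeline.fullAnnotate("I love spark-nlp")[0]
for annotation in annotations["class"]:
    print(annotation.result, annotation.metadata)
```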
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitterfin_padding80model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/twitterfin_padding80model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-uned_tfg_08_57_mas_frecuentes_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-uned_tfg_08_57_mas_frecuentes_pipeline_en.md new file mode 100644 index 00000000000000..5271c95685689c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-uned_tfg_08_57_mas_frecuentes_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English uned_tfg_08_57_mas_frecuentes_pipeline pipeline RoBertaForSequenceClassification from alexisdr +author: John Snow Labs +name: uned_tfg_08_57_mas_frecuentes_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`uned_tfg_08_57_mas_frecuentes_pipeline` is a English model originally trained by alexisdr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/uned_tfg_08_57_mas_frecuentes_pipeline_en_5.5.0_3.0_1726628734094.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/uned_tfg_08_57_mas_frecuentes_pipeline_en_5.5.0_3.0_1726628734094.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pretrained pipeline reads the input text from a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("uned_tfg_08_57_mas_frecuentes_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pretrained pipeline reads the input text from a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("uned_tfg_08_57_mas_frecuentes_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|uned_tfg_08_57_mas_frecuentes_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|425.7 MB| + +## References + +https://huggingface.co/alexisdr/uned-tfg-08.57_mas_frecuentes + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-uned_tfg_08_86_mas_frecuentes_en.md b/docs/_posts/ahmedlone127/2024-09-18-uned_tfg_08_86_mas_frecuentes_en.md new file mode 100644 index 00000000000000..69987c415b5796 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-uned_tfg_08_86_mas_frecuentes_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English uned_tfg_08_86_mas_frecuentes RoBertaForSequenceClassification from alexisdr +author: John Snow Labs +name: uned_tfg_08_86_mas_frecuentes +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`uned_tfg_08_86_mas_frecuentes` is a English model originally trained by alexisdr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/uned_tfg_08_86_mas_frecuentes_en_5.5.0_3.0_1726666545021.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/uned_tfg_08_86_mas_frecuentes_en_5.5.0_3.0_1726666545021.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("uned_tfg_08_86_mas_frecuentes","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("uned_tfg_08_86_mas_frecuentes", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|uned_tfg_08_86_mas_frecuentes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|427.5 MB| + +## References + +https://huggingface.co/alexisdr/uned-tfg-08.86_mas_frecuentes \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-vp_visobert_syl_viwikifc_en.md b/docs/_posts/ahmedlone127/2024-09-18-vp_visobert_syl_viwikifc_en.md new file mode 100644 index 00000000000000..662c8abc7c24a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-vp_visobert_syl_viwikifc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English vp_visobert_syl_viwikifc XlmRoBertaForSequenceClassification from tringuyen-uit +author: John Snow Labs +name: vp_visobert_syl_viwikifc +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`vp_visobert_syl_viwikifc` is a English model originally trained by tringuyen-uit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/vp_visobert_syl_viwikifc_en_5.5.0_3.0_1726696944388.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/vp_visobert_syl_viwikifc_en_5.5.0_3.0_1726696944388.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("vp_visobert_syl_viwikifc","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("vp_visobert_syl_viwikifc", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|vp_visobert_syl_viwikifc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|365.9 MB| + +## References + +https://huggingface.co/tringuyen-uit/VP_ViSoBERT_syl_ViWikiFC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-withinapps_ndd_mrbs_test_content_cwadj_en.md b/docs/_posts/ahmedlone127/2024-09-18-withinapps_ndd_mrbs_test_content_cwadj_en.md new file mode 100644 index 00000000000000..ec6683a45e9cfa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-withinapps_ndd_mrbs_test_content_cwadj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English withinapps_ndd_mrbs_test_content_cwadj DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_mrbs_test_content_cwadj +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_mrbs_test_content_cwadj` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mrbs_test_content_cwadj_en_5.5.0_3.0_1726696003827.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mrbs_test_content_cwadj_en_5.5.0_3.0_1726696003827.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_mrbs_test_content_cwadj","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_mrbs_test_content_cwadj", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_mrbs_test_content_cwadj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-mrbs_test-content-CWAdj \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-withinapps_ndd_pagekit_test_content_cwadj_en.md b/docs/_posts/ahmedlone127/2024-09-18-withinapps_ndd_pagekit_test_content_cwadj_en.md new file mode 100644 index 00000000000000..4571f0aca93ed0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-withinapps_ndd_pagekit_test_content_cwadj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English withinapps_ndd_pagekit_test_content_cwadj DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_pagekit_test_content_cwadj +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_pagekit_test_content_cwadj` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_pagekit_test_content_cwadj_en_5.5.0_3.0_1726630469491.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_pagekit_test_content_cwadj_en_5.5.0_3.0_1726630469491.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_pagekit_test_content_cwadj","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_pagekit_test_content_cwadj", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_pagekit_test_content_cwadj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-pagekit_test-content-CWAdj \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-withinapps_ndd_pagekit_test_content_cwadj_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-withinapps_ndd_pagekit_test_content_cwadj_pipeline_en.md new file mode 100644 index 00000000000000..663f88a919c1ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-withinapps_ndd_pagekit_test_content_cwadj_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English withinapps_ndd_pagekit_test_content_cwadj_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_pagekit_test_content_cwadj_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_pagekit_test_content_cwadj_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_pagekit_test_content_cwadj_pipeline_en_5.5.0_3.0_1726630481737.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_pagekit_test_content_cwadj_pipeline_en_5.5.0_3.0_1726630481737.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pretrained pipeline reads the input text from a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("withinapps_ndd_pagekit_test_content_cwadj_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pretrained pipeline reads the input text from a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("withinapps_ndd_pagekit_test_content_cwadj_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_pagekit_test_content_cwadj_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-pagekit_test-content-CWAdj + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_clickbait_mattzid_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_clickbait_mattzid_pipeline_en.md new file mode 100644 index 00000000000000..90293d8477d8e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_clickbait_mattzid_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_clickbait_mattzid_pipeline pipeline XlmRoBertaForSequenceClassification from MattZid +author: John Snow Labs +name: xlm_roberta_base_clickbait_mattzid_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_clickbait_mattzid_pipeline` is a English model originally trained by MattZid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_clickbait_mattzid_pipeline_en_5.5.0_3.0_1726685513238.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_clickbait_mattzid_pipeline_en_5.5.0_3.0_1726685513238.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pretrained pipeline reads the input text from a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_clickbait_mattzid_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pretrained pipeline reads the input text from a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_clickbait_mattzid_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_clickbait_mattzid_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|856.5 MB| + +## References + +https://huggingface.co/MattZid/xlm-roberta-base-clickbait + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_kinte_tweet_finetuned_kinyarwanda_sent2_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_kinte_tweet_finetuned_kinyarwanda_sent2_en.md new file mode 100644 index 00000000000000..e16a7230e179e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_kinte_tweet_finetuned_kinyarwanda_sent2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_kinte_tweet_finetuned_kinyarwanda_sent2 XlmRoBertaForSequenceClassification from RogerB +author: John Snow Labs +name: xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_kinte_tweet_finetuned_kinyarwanda_sent2 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_kinte_tweet_finetuned_kinyarwanda_sent2` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_kinte_tweet_finetuned_kinyarwanda_sent2_en_5.5.0_3.0_1726685424102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_kinte_tweet_finetuned_kinyarwanda_sent2_en_5.5.0_3.0_1726685424102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier reads the assembled document and its tokens, and predicts one label per document
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_kinte_tweet_finetuned_kinyarwanda_sent2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// the classifier reads the assembled document and its tokens, and predicts one label per document
val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_kinte_tweet_finetuned_kinyarwanda_sent2", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
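
The predicted label for each row lands in the `class` annotation column configured above. A small sketch of reading it back next to the input text (column names follow the snippet above):

```python
from pyspark.sql import functions as F

# class.result is an array with one label per document; take the first element
pipelineDF.select(
    F.col("text"),
    F.col("class.result").getItem(0).alias("predicted_label")
).show(truncate=False)
```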
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_kinte_tweet_finetuned_kinyarwanda_sent2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/RogerB/xlm-roberta-base-finetuned-kinyarwanda-kin-finetuned-kinte-tweet-finetuned-kin-sent2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_language_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_language_pipeline_en.md new file mode 100644 index 00000000000000..a675fd63411335 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_language_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_language_pipeline pipeline XlmRoBertaForSequenceClassification from ms25 +author: John Snow Labs +name: xlm_roberta_base_finetuned_language_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_language_pipeline` is a English model originally trained by ms25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_language_pipeline_en_5.5.0_3.0_1726672675542.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_language_pipeline_en_5.5.0_3.0_1726672675542.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_language_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_language_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_language_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|904.3 MB| + +## References + +https://huggingface.co/ms25/xlm-roberta-base-finetuned-language + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_marc_english_tkesonia_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_marc_english_tkesonia_pipeline_en.md new file mode 100644 index 00000000000000..2315b7b7f2db76 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_marc_english_tkesonia_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_marc_english_tkesonia_pipeline pipeline XlmRoBertaForSequenceClassification from tkesonia +author: John Snow Labs +name: xlm_roberta_base_finetuned_marc_english_tkesonia_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_marc_english_tkesonia_pipeline` is a English model originally trained by tkesonia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_english_tkesonia_pipeline_en_5.5.0_3.0_1726633080295.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_english_tkesonia_pipeline_en_5.5.0_3.0_1726633080295.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_marc_english_tkesonia_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_marc_english_tkesonia_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_marc_english_tkesonia_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|833.5 MB| + +## References + +https://huggingface.co/tkesonia/xlm-roberta-base-finetuned-marc-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_all_k4west_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_all_k4west_en.md new file mode 100644 index 00000000000000..37244840af02b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_all_k4west_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_k4west XlmRoBertaForTokenClassification from k4west +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_k4west +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_k4west` is a English model originally trained by k4west. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_k4west_en_5.5.0_3.0_1726657082863.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_k4west_en_5.5.0_3.0_1726657082863.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier reads the assembled document and its tokens, and writes one tag per token
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_k4west","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// the classifier reads the assembled document and its tokens, and writes one tag per token
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_k4west", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
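
The token classifier writes one IOB-style tag per token into the `ner` column configured above; `token.result` and `ner.result` are parallel arrays, so the i-th tag belongs to the i-th token. A minimal sketch of inspecting them side by side (column names follow the snippet above; the tag set itself depends on the PAN-X labels the model was fine-tuned on):

```python
from pyspark.sql import functions as F

# Parallel arrays: tokens and their predicted tags, one row per input text
pipelineDF.select(
    F.col("token.result").alias("tokens"),
    F.col("ner.result").alias("ner_tags")
).show(truncate=False)
```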
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_k4west| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/k4west/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_all_k4west_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_all_k4west_pipeline_en.md new file mode 100644 index 00000000000000..aeeedf80c6d1b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_all_k4west_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_k4west_pipeline pipeline XlmRoBertaForTokenClassification from k4west +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_k4west_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_k4west_pipeline` is a English model originally trained by k4west. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_k4west_pipeline_en_5.5.0_3.0_1726657167112.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_k4west_pipeline_en_5.5.0_3.0_1726657167112.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_k4west_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_k4west_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_k4west_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/k4west/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_all_vantaa32_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_all_vantaa32_pipeline_en.md new file mode 100644 index 00000000000000..c48a3a73155e2e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_all_vantaa32_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_vantaa32_pipeline pipeline XlmRoBertaForTokenClassification from vantaa32 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_vantaa32_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_vantaa32_pipeline` is a English model originally trained by vantaa32. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_vantaa32_pipeline_en_5.5.0_3.0_1726701812115.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_vantaa32_pipeline_en_5.5.0_3.0_1726701812115.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_vantaa32_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_vantaa32_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_vantaa32_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/vantaa32/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_all_zardian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_all_zardian_pipeline_en.md new file mode 100644 index 00000000000000..3d11e2823faa59 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_all_zardian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_zardian_pipeline pipeline XlmRoBertaForTokenClassification from Zardian +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_zardian_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_zardian_pipeline` is a English model originally trained by Zardian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_zardian_pipeline_en_5.5.0_3.0_1726635774845.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_zardian_pipeline_en_5.5.0_3.0_1726635774845.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_zardian_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_zardian_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_zardian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/Zardian/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_edwardjross_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_edwardjross_en.md new file mode 100644 index 00000000000000..0ab82603e16c31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_edwardjross_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_edwardjross XlmRoBertaForTokenClassification from edwardjross +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_edwardjross +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_edwardjross` is a English model originally trained by edwardjross. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_edwardjross_en_5.5.0_3.0_1726657361764.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_edwardjross_en_5.5.0_3.0_1726657361764.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier reads the assembled document and its tokens, and writes one tag per token
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_edwardjross","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// the classifier reads the assembled document and its tokens, and writes one tag per token
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_edwardjross", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_edwardjross| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|827.2 MB| + +## References + +https://huggingface.co/edwardjross/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_edwardjross_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_edwardjross_pipeline_en.md new file mode 100644 index 00000000000000..4569196131db2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_edwardjross_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_edwardjross_pipeline pipeline XlmRoBertaForTokenClassification from edwardjross +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_edwardjross_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_edwardjross_pipeline` is a English model originally trained by edwardjross. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_edwardjross_pipeline_en_5.5.0_3.0_1726657450029.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_edwardjross_pipeline_en_5.5.0_3.0_1726657450029.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_edwardjross_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_edwardjross_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_edwardjross_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.2 MB| + +## References + +https://huggingface.co/edwardjross/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_jbreunig_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_jbreunig_pipeline_en.md new file mode 100644 index 00000000000000..37595361794584 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_jbreunig_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_jbreunig_pipeline pipeline XlmRoBertaForTokenClassification from jbreunig +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_jbreunig_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_jbreunig_pipeline` is a English model originally trained by jbreunig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_jbreunig_pipeline_en_5.5.0_3.0_1726636620953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_jbreunig_pipeline_en_5.5.0_3.0_1726636620953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_jbreunig_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_jbreunig_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_jbreunig_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/jbreunig/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_praboda_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_praboda_en.md new file mode 100644 index 00000000000000..228b4369bd34d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_praboda_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_praboda XlmRoBertaForTokenClassification from Praboda +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_praboda +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_praboda` is a English model originally trained by Praboda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_praboda_en_5.5.0_3.0_1726635605141.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_praboda_en_5.5.0_3.0_1726635605141.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier reads the assembled document and its tokens, and writes one tag per token
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_praboda","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// the classifier reads the assembled document and its tokens, and writes one tag per token
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_praboda", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_praboda| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/Praboda/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_v3rx2000_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_v3rx2000_en.md new file mode 100644 index 00000000000000..c030ff78b72639 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_v3rx2000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_v3rx2000 XlmRoBertaForTokenClassification from V3RX2000 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_v3rx2000 +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_v3rx2000` is a English model originally trained by V3RX2000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_v3rx2000_en_5.5.0_3.0_1726657318938.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_v3rx2000_en_5.5.0_3.0_1726657318938.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier reads the assembled document and its tokens, and writes one tag per token
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_v3rx2000","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// the classifier reads the assembled document and its tokens, and writes one tag per token
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_v3rx2000", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_v3rx2000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/V3RX2000/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_v3rx2000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_v3rx2000_pipeline_en.md new file mode 100644 index 00000000000000..5799a91017f652 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_v3rx2000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_v3rx2000_pipeline pipeline XlmRoBertaForTokenClassification from V3RX2000 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_v3rx2000_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_v3rx2000_pipeline` is a English model originally trained by V3RX2000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_v3rx2000_pipeline_en_5.5.0_3.0_1726657417487.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_v3rx2000_pipeline_en_5.5.0_3.0_1726657417487.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_v3rx2000_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_v3rx2000_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_v3rx2000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/V3RX2000/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_abdelkareem_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_abdelkareem_pipeline_en.md new file mode 100644 index 00000000000000..87ca1c616dd992 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_abdelkareem_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_abdelkareem_pipeline pipeline XlmRoBertaForTokenClassification from Abdelkareem +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_abdelkareem_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_abdelkareem_pipeline` is a English model originally trained by Abdelkareem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_abdelkareem_pipeline_en_5.5.0_3.0_1726635187036.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_abdelkareem_pipeline_en_5.5.0_3.0_1726635187036.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_abdelkareem_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_abdelkareem_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_abdelkareem_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/Abdelkareem/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_ajit_transformer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_ajit_transformer_pipeline_en.md new file mode 100644 index 00000000000000..e6d5c6fff8bc12 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_ajit_transformer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_ajit_transformer_pipeline pipeline XlmRoBertaForTokenClassification from ajit-transformer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_ajit_transformer_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_ajit_transformer_pipeline` is a English model originally trained by ajit-transformer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ajit_transformer_pipeline_en_5.5.0_3.0_1726664010280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ajit_transformer_pipeline_en_5.5.0_3.0_1726664010280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_ajit_transformer_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_ajit_transformer_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_ajit_transformer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.9 MB| + +## References + +https://huggingface.co/ajit-transformer/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_drigb_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_drigb_en.md new file mode 100644 index 00000000000000..d529783c464f21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_drigb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_drigb XlmRoBertaForTokenClassification from drigb +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_drigb +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_drigb` is a English model originally trained by drigb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_drigb_en_5.5.0_3.0_1726657436243.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_drigb_en_5.5.0_3.0_1726657436243.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier reads the assembled document and its tokens, and writes one tag per token
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_drigb","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// the classifier reads the assembled document and its tokens, and writes one tag per token
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_drigb", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_drigb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/drigb/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_drigb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_drigb_pipeline_en.md new file mode 100644 index 00000000000000..d8d3934c8b4a7e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_drigb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_drigb_pipeline pipeline XlmRoBertaForTokenClassification from drigb +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_drigb_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_drigb_pipeline` is a English model originally trained by drigb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_drigb_pipeline_en_5.5.0_3.0_1726657521717.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_drigb_pipeline_en_5.5.0_3.0_1726657521717.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_drigb_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_drigb_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_drigb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/drigb/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_mealduct_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_mealduct_en.md new file mode 100644 index 00000000000000..16bb6df36de895 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_mealduct_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_mealduct XlmRoBertaForTokenClassification from MealDuct +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_mealduct +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_mealduct` is a English model originally trained by MealDuct. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_mealduct_en_5.5.0_3.0_1726656599257.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_mealduct_en_5.5.0_3.0_1726656599257.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier reads the assembled document and its tokens, and writes one tag per token
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_mealduct","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// the classifier reads the assembled document and its tokens, and writes one tag per token
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_mealduct", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_mealduct| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/MealDuct/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_mealduct_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_mealduct_pipeline_en.md new file mode 100644 index 00000000000000..4e6b4cd0a5569f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_mealduct_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_mealduct_pipeline pipeline XlmRoBertaForTokenClassification from MealDuct +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_mealduct_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_mealduct_pipeline` is a English model originally trained by MealDuct. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_mealduct_pipeline_en_5.5.0_3.0_1726656693508.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_mealduct_pipeline_en_5.5.0_3.0_1726656693508.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_mealduct_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is an existing Spark DataFrame with the text to annotate in a column named "text"
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_mealduct_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_mealduct_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/MealDuct/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_mikhab_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_mikhab_pipeline_en.md new file mode 100644 index 00000000000000..7f5333d8e9e876 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_mikhab_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_mikhab_pipeline pipeline XlmRoBertaForTokenClassification from mikhab +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_mikhab_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_mikhab_pipeline` is a English model originally trained by mikhab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_mikhab_pipeline_en_5.5.0_3.0_1726663059750.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_mikhab_pipeline_en_5.5.0_3.0_1726663059750.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_mikhab_pipeline", lang = "en")
# df: any DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_mikhab_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_mikhab_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/mikhab/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_reinoudbosch_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_reinoudbosch_pipeline_en.md new file mode 100644 index 00000000000000..758dff0eae4487 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_reinoudbosch_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_reinoudbosch_pipeline pipeline XlmRoBertaForTokenClassification from reinoudbosch +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_reinoudbosch_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_reinoudbosch_pipeline` is a English model originally trained by reinoudbosch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_reinoudbosch_pipeline_en_5.5.0_3.0_1726635646117.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_reinoudbosch_pipeline_en_5.5.0_3.0_1726635646117.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_reinoudbosch_pipeline", lang = "en")
# df: any DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_reinoudbosch_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_reinoudbosch_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/reinoudbosch/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_ryo_hsgw_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_ryo_hsgw_pipeline_en.md new file mode 100644 index 00000000000000..61018fe4855211 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_ryo_hsgw_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_ryo_hsgw_pipeline pipeline XlmRoBertaForTokenClassification from ryo-hsgw +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_ryo_hsgw_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_ryo_hsgw_pipeline` is a English model originally trained by ryo-hsgw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ryo_hsgw_pipeline_en_5.5.0_3.0_1726656026207.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ryo_hsgw_pipeline_en_5.5.0_3.0_1726656026207.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_ryo_hsgw_pipeline", lang = "en")
# df: any DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_ryo_hsgw_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_ryo_hsgw_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/ryo-hsgw/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_cykim_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_cykim_en.md new file mode 100644 index 00000000000000..a9c8c77f1fcb81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_cykim_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_cykim XlmRoBertaForTokenClassification from cykim +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_cykim +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_cykim` is a English model originally trained by cykim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_cykim_en_5.5.0_3.0_1726636497759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_cykim_en_5.5.0_3.0_1726636497759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_cykim","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_cykim", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
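The example above stops at token-level NER tags. If full entity spans are needed, a `NerConverter` stage can be appended to merge the IOB tags into chunks. The sketch below is not part of the original card; it reuses the Python stages defined above and assumes the standard `NerConverter` annotator from Spark NLP.

```python
from sparknlp.annotator import NerConverter

# Merges token-level IOB tags (B-PER, I-PER, ...) into full entity chunks
nerConverter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

chunkPipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, nerConverter])
chunkDF = chunkPipeline.fit(data).transform(data)
chunkDF.selectExpr("explode(ner_chunk.result) as entity").show(truncate=False)
```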
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_cykim| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.4 MB| + +## References + +https://huggingface.co/cykim/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_abdelkareem_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_abdelkareem_pipeline_en.md new file mode 100644 index 00000000000000..993518f9be3df2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_abdelkareem_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_abdelkareem_pipeline pipeline XlmRoBertaForTokenClassification from Abdelkareem +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_abdelkareem_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_abdelkareem_pipeline` is a English model originally trained by Abdelkareem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_abdelkareem_pipeline_en_5.5.0_3.0_1726663773276.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_abdelkareem_pipeline_en_5.5.0_3.0_1726663773276.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_abdelkareem_pipeline", lang = "en")
# df: any DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_abdelkareem_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_abdelkareem_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|838.4 MB| + +## References + +https://huggingface.co/Abdelkareem/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_conorjudge_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_conorjudge_en.md new file mode 100644 index 00000000000000..329f5235b1a568 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_conorjudge_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_conorjudge XlmRoBertaForTokenClassification from conorjudge +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_conorjudge +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_conorjudge` is a English model originally trained by conorjudge. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_conorjudge_en_5.5.0_3.0_1726657086323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_conorjudge_en_5.5.0_3.0_1726657086323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_conorjudge","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_conorjudge", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_conorjudge| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/conorjudge/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_conorjudge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_conorjudge_pipeline_en.md new file mode 100644 index 00000000000000..79e7a6dfa52a45 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_conorjudge_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_conorjudge_pipeline pipeline XlmRoBertaForTokenClassification from conorjudge +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_conorjudge_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_conorjudge_pipeline` is a English model originally trained by conorjudge. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_conorjudge_pipeline_en_5.5.0_3.0_1726657154146.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_conorjudge_pipeline_en_5.5.0_3.0_1726657154146.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_conorjudge_pipeline", lang = "en")
# df: any DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_conorjudge_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_conorjudge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/conorjudge/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_gus07ven_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_gus07ven_en.md new file mode 100644 index 00000000000000..f5ea4b695c4cb0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_gus07ven_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_gus07ven XlmRoBertaForTokenClassification from gus07ven +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_gus07ven +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_gus07ven` is a English model originally trained by gus07ven. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_gus07ven_en_5.5.0_3.0_1726664007515.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_gus07ven_en_5.5.0_3.0_1726664007515.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_gus07ven","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_gus07ven", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_gus07ven| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/gus07ven/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_gus07ven_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_gus07ven_pipeline_en.md new file mode 100644 index 00000000000000..a2f04cda7b7ae7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_gus07ven_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_gus07ven_pipeline pipeline XlmRoBertaForTokenClassification from gus07ven +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_gus07ven_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_gus07ven_pipeline` is a English model originally trained by gus07ven. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_gus07ven_pipeline_en_5.5.0_3.0_1726664077690.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_gus07ven_pipeline_en_5.5.0_3.0_1726664077690.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_gus07ven_pipeline", lang = "en")
# df: any DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_gus07ven_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_gus07ven_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/gus07ven/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_isaacp_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_isaacp_en.md new file mode 100644 index 00000000000000..ee84ccbf003426 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_isaacp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_isaacp XlmRoBertaForTokenClassification from Isaacp +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_isaacp +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_isaacp` is a English model originally trained by Isaacp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_isaacp_en_5.5.0_3.0_1726635836369.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_isaacp_en_5.5.0_3.0_1726635836369.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_isaacp","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_isaacp", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_isaacp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Isaacp/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_km0228kr_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_km0228kr_en.md new file mode 100644 index 00000000000000..7f1662c21e1d10 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_km0228kr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_km0228kr XlmRoBertaForTokenClassification from km0228kr +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_km0228kr +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_km0228kr` is a English model originally trained by km0228kr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_km0228kr_en_5.5.0_3.0_1726635255555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_km0228kr_en_5.5.0_3.0_1726635255555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_km0228kr","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_km0228kr", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_km0228kr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/km0228kr/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_postrational_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_postrational_en.md new file mode 100644 index 00000000000000..6dc720adf595d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_postrational_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_postrational XlmRoBertaForTokenClassification from postrational +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_postrational +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_postrational` is a English model originally trained by postrational. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_postrational_en_5.5.0_3.0_1726656945035.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_postrational_en_5.5.0_3.0_1726656945035.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_postrational","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_postrational", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_postrational| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/postrational/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_stevevee0101_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_stevevee0101_en.md new file mode 100644 index 00000000000000..4961e827e25396 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_stevevee0101_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_stevevee0101 XlmRoBertaForTokenClassification from stevevee0101 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_stevevee0101 +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_stevevee0101` is a English model originally trained by stevevee0101. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_stevevee0101_en_5.5.0_3.0_1726655796943.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_stevevee0101_en_5.5.0_3.0_1726655796943.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_stevevee0101","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_stevevee0101", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_stevevee0101| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/stevevee0101/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_yujini_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_yujini_en.md new file mode 100644 index 00000000000000..0d3a6d4503e7d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_yujini_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_yujini XlmRoBertaForTokenClassification from yujini +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_yujini +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_yujini` is a English model originally trained by yujini. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_yujini_en_5.5.0_3.0_1726656515148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_yujini_en_5.5.0_3.0_1726656515148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_yujini","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_yujini", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_yujini| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.8 MB| + +## References + +https://huggingface.co/yujini/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_yujini_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_yujini_pipeline_en.md new file mode 100644 index 00000000000000..fefb8357e8a817 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_yujini_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_yujini_pipeline pipeline XlmRoBertaForTokenClassification from yujini +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_yujini_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_yujini_pipeline` is a English model originally trained by yujini. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_yujini_pipeline_en_5.5.0_3.0_1726656579429.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_yujini_pipeline_en_5.5.0_3.0_1726656579429.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_yujini_pipeline", lang = "en")
# df: any DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_yujini_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_yujini_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.8 MB| + +## References + +https://huggingface.co/yujini/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_italian_bessho_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_italian_bessho_en.md new file mode 100644 index 00000000000000..dca65e38452dff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_italian_bessho_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_bessho XlmRoBertaForTokenClassification from bessho +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_bessho +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_bessho` is a English model originally trained by bessho. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_bessho_en_5.5.0_3.0_1726664072532.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_bessho_en_5.5.0_3.0_1726664072532.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_bessho","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_bessho", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_bessho| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/bessho/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_italian_jbreunig_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_italian_jbreunig_en.md new file mode 100644 index 00000000000000..de01bae6ce3d67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_italian_jbreunig_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_jbreunig XlmRoBertaForTokenClassification from jbreunig +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_jbreunig +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_jbreunig` is a English model originally trained by jbreunig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_jbreunig_en_5.5.0_3.0_1726656233749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_jbreunig_en_5.5.0_3.0_1726656233749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_jbreunig","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_jbreunig", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_jbreunig| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/jbreunig/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_italian_jjglilleberg_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_italian_jjglilleberg_pipeline_en.md new file mode 100644 index 00000000000000..fb6fa458976c0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_italian_jjglilleberg_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_jjglilleberg_pipeline pipeline XlmRoBertaForTokenClassification from jjglilleberg +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_jjglilleberg_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_jjglilleberg_pipeline` is a English model originally trained by jjglilleberg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_jjglilleberg_pipeline_en_5.5.0_3.0_1726701846152.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_jjglilleberg_pipeline_en_5.5.0_3.0_1726701846152.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_jjglilleberg_pipeline", lang = "en")
# df: any DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_jjglilleberg_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_jjglilleberg_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|816.8 MB| + +## References + +https://huggingface.co/jjglilleberg/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_raw_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_raw_en.md new file mode 100644 index 00000000000000..8117f7f2bbed51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_raw_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_raw XlmRoBertaForSequenceClassification from 5imp5on +author: John Snow Labs +name: xlm_roberta_base_finetuned_raw +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_raw` is a English model originally trained by 5imp5on. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_raw_en_5.5.0_3.0_1726697090645.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_raw_en_5.5.0_3.0_1726697090645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_raw","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_raw", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
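The classifier writes its prediction to the `class` column as a Spark NLP annotation. A quick way to inspect the predicted labels, shown as a sketch that reuses `pipelineDF` from the Python example above (not part of the original card):

```python
# The annotation's `result` field holds the predicted label for each row
pipelineDF.select("text", "class.result").show(truncate=False)
```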
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_raw| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|823.7 MB| + +## References + +https://huggingface.co/5imp5on/xlm-roberta-base-finetuned-raw \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_headtuned_panx_german_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_headtuned_panx_german_en.md new file mode 100644 index 00000000000000..9885193633ab93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_headtuned_panx_german_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_headtuned_panx_german XlmRoBertaForTokenClassification from jeremygf +author: John Snow Labs +name: xlm_roberta_base_headtuned_panx_german +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_headtuned_panx_german` is a English model originally trained by jeremygf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_headtuned_panx_german_en_5.5.0_3.0_1726656032019.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_headtuned_panx_german_en_5.5.0_3.0_1726656032019.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_headtuned_panx_german","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_headtuned_panx_german", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
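
The example above ends at `transform`; to see the token-level tags, the `ner` output column set above can be inspected from the resulting `pipelineDF`, for example:

```python
# Display the predicted NER tags for each input row
pipelineDF.select("ner.result").show(truncate=False)
```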
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_headtuned_panx_german| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|653.0 MB| + +## References + +https://huggingface.co/jeremygf/xlm-roberta-base-headtuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_headtuned_panx_german_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_headtuned_panx_german_pipeline_en.md new file mode 100644 index 00000000000000..6538cf94666ef6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_headtuned_panx_german_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_headtuned_panx_german_pipeline pipeline XlmRoBertaForTokenClassification from jeremygf +author: John Snow Labs +name: xlm_roberta_base_headtuned_panx_german_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_headtuned_panx_german_pipeline` is a English model originally trained by jeremygf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_headtuned_panx_german_pipeline_en_5.5.0_3.0_1726656227746.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_headtuned_panx_german_pipeline_en_5.5.0_3.0_1726656227746.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_headtuned_panx_german_pipeline", lang = "en")
# example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_headtuned_panx_german_pipeline", lang = "en")
// example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_headtuned_panx_german_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|653.0 MB| + +## References + +https://huggingface.co/jeremygf/xlm-roberta-base-headtuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_imdb_muzammil_eds_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_imdb_muzammil_eds_en.md new file mode 100644 index 00000000000000..c0f8573175e218 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_imdb_muzammil_eds_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_imdb_muzammil_eds XlmRoBertaForSequenceClassification from muzammil-eds +author: John Snow Labs +name: xlm_roberta_base_imdb_muzammil_eds +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_imdb_muzammil_eds` is a English model originally trained by muzammil-eds. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_imdb_muzammil_eds_en_5.5.0_3.0_1726660831500.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_imdb_muzammil_eds_en_5.5.0_3.0_1726660831500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_imdb_muzammil_eds","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_imdb_muzammil_eds", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
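
The example above ends at `transform`; to see what the model predicted, the `class` output column set above can be inspected from the resulting `pipelineDF`, for example:

```python
# Display the predicted label(s) for each input row
pipelineDF.select("class.result").show(truncate=False)
```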
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_imdb_muzammil_eds| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|860.9 MB| + +## References + +https://huggingface.co/muzammil-eds/xlm-roberta-base-IMDB \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_lr0_0001_seed42_basic_original_kinyarwanda_hau_eng_train_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_lr0_0001_seed42_basic_original_kinyarwanda_hau_eng_train_en.md new file mode 100644 index 00000000000000..2112f2e17c601f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_lr0_0001_seed42_basic_original_kinyarwanda_hau_eng_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_lr0_0001_seed42_basic_original_kinyarwanda_hau_eng_train XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr0_0001_seed42_basic_original_kinyarwanda_hau_eng_train +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr0_0001_seed42_basic_original_kinyarwanda_hau_eng_train` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_0001_seed42_basic_original_kinyarwanda_hau_eng_train_en_5.5.0_3.0_1726685932292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_0001_seed42_basic_original_kinyarwanda_hau_eng_train_en_5.5.0_3.0_1726685932292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr0_0001_seed42_basic_original_kinyarwanda_hau_eng_train","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr0_0001_seed42_basic_original_kinyarwanda_hau_eng_train", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
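
The example above ends at `transform`; to see what the model predicted, the `class` output column set above can be inspected from the resulting `pipelineDF`, for example:

```python
# Display the predicted label(s) for each input row
pipelineDF.select("class.result").show(truncate=False)
```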
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr0_0001_seed42_basic_original_kinyarwanda_hau_eng_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|815.5 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr0.0001_seed42_basic_original_kin-hau-eng_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_lr2e_05_seed42_basic_eng_train_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_lr2e_05_seed42_basic_eng_train_pipeline_en.md new file mode 100644 index 00000000000000..85a187875cb400 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_lr2e_05_seed42_basic_eng_train_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_lr2e_05_seed42_basic_eng_train_pipeline pipeline XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr2e_05_seed42_basic_eng_train_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr2e_05_seed42_basic_eng_train_pipeline` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr2e_05_seed42_basic_eng_train_pipeline_en_5.5.0_3.0_1726685797226.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr2e_05_seed42_basic_eng_train_pipeline_en_5.5.0_3.0_1726685797226.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_lr2e_05_seed42_basic_eng_train_pipeline", lang = "en")
# example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_lr2e_05_seed42_basic_eng_train_pipeline", lang = "en")
// example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr2e_05_seed42_basic_eng_train_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|791.0 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr2e-05_seed42_basic_eng_train + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_nepal_bhasa_vietnam_train_1_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_nepal_bhasa_vietnam_train_1_en.md new file mode 100644 index 00000000000000..a0a040a9443863 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_nepal_bhasa_vietnam_train_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_nepal_bhasa_vietnam_train_1 XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_nepal_bhasa_vietnam_train_1 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_nepal_bhasa_vietnam_train_1` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_train_1_en_5.5.0_3.0_1726660349372.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_train_1_en_5.5.0_3.0_1726660349372.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_nepal_bhasa_vietnam_train_1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_nepal_bhasa_vietnam_train_1", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
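
The example above ends at `transform`; to see what the model predicted, the `class` output column set above can be inspected from the resulting `pipelineDF`, for example:

```python
# Display the predicted label(s) for each input row
pipelineDF.select("class.result").show(truncate=False)
```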
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_nepal_bhasa_vietnam_train_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|794.3 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-New_VietNam-train-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_rte_10_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_rte_10_en.md new file mode 100644 index 00000000000000..0ebea33b332167 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_rte_10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_rte_10 XlmRoBertaForSequenceClassification from tmnam20 +author: John Snow Labs +name: xlm_roberta_base_rte_10 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_rte_10` is a English model originally trained by tmnam20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_rte_10_en_5.5.0_3.0_1726697613753.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_rte_10_en_5.5.0_3.0_1726697613753.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_rte_10","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_rte_10", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
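
The example above ends at `transform`; to see what the model predicted, the `class` output column set above can be inspected from the resulting `pipelineDF`, for example:

```python
# Display the predicted label(s) for each input row
pipelineDF.select("class.result").show(truncate=False)
```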
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_rte_10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|784.1 MB| + +## References + +https://huggingface.co/tmnam20/xlm-roberta-base-rte-10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_trimmed_arabic_tweet_sentiment_arabic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_trimmed_arabic_tweet_sentiment_arabic_pipeline_en.md new file mode 100644 index 00000000000000..41cefee79387a3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_trimmed_arabic_tweet_sentiment_arabic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_arabic_tweet_sentiment_arabic_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_arabic_tweet_sentiment_arabic_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_arabic_tweet_sentiment_arabic_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_arabic_tweet_sentiment_arabic_pipeline_en_5.5.0_3.0_1726660767462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_arabic_tweet_sentiment_arabic_pipeline_en_5.5.0_3.0_1726660767462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_trimmed_arabic_tweet_sentiment_arabic_pipeline", lang = "en")
# example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_trimmed_arabic_tweet_sentiment_arabic_pipeline", lang = "en")
// example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_arabic_tweet_sentiment_arabic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|423.5 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-ar-tweet-sentiment-ar + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_trimmed_english_10000_xnli_english_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_trimmed_english_10000_xnli_english_en.md new file mode 100644 index 00000000000000..9da22216b5fbca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_trimmed_english_10000_xnli_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_english_10000_xnli_english XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_english_10000_xnli_english +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_english_10000_xnli_english` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_10000_xnli_english_en_5.5.0_3.0_1726671560733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_10000_xnli_english_en_5.5.0_3.0_1726671560733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_english_10000_xnli_english","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_english_10000_xnli_english", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
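
The example above ends at `transform`; to see what the model predicted, the `class` output column set above can be inspected from the resulting `pipelineDF`, for example:

```python
# Display the predicted label(s) for each input row
pipelineDF.select("class.result").show(truncate=False)
```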
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_english_10000_xnli_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|353.6 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-en-10000-xnli-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_trimmed_spanish_60000_tweet_sentiment_spanish_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_trimmed_spanish_60000_tweet_sentiment_spanish_pipeline_en.md new file mode 100644 index 00000000000000..a210b0c175b54b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_trimmed_spanish_60000_tweet_sentiment_spanish_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_spanish_60000_tweet_sentiment_spanish_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_spanish_60000_tweet_sentiment_spanish_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_spanish_60000_tweet_sentiment_spanish_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_spanish_60000_tweet_sentiment_spanish_pipeline_en_5.5.0_3.0_1726672308232.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_spanish_60000_tweet_sentiment_spanish_pipeline_en_5.5.0_3.0_1726672308232.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_trimmed_spanish_60000_tweet_sentiment_spanish_pipeline", lang = "en")
# example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_trimmed_spanish_60000_tweet_sentiment_spanish_pipeline", lang = "en")
// example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_spanish_60000_tweet_sentiment_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|442.0 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-es-60000-tweet-sentiment-es + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_ukraine_waray_philippines_official_v2_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_ukraine_waray_philippines_official_v2_en.md new file mode 100644 index 00000000000000..ffaa21ac2a7887 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_ukraine_waray_philippines_official_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_ukraine_waray_philippines_official_v2 XlmRoBertaForSequenceClassification from YaraKyrychenko +author: John Snow Labs +name: xlm_roberta_base_ukraine_waray_philippines_official_v2 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_ukraine_waray_philippines_official_v2` is a English model originally trained by YaraKyrychenko. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ukraine_waray_philippines_official_v2_en_5.5.0_3.0_1726698185551.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ukraine_waray_philippines_official_v2_en_5.5.0_3.0_1726698185551.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_ukraine_waray_philippines_official_v2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_ukraine_waray_philippines_official_v2", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
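
The example above ends at `transform`; to see what the model predicted, the `class` output column set above can be inspected from the resulting `pipelineDF`, for example:

```python
# Display the predicted label(s) for each input row
pipelineDF.select("class.result").show(truncate=False)
```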
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_ukraine_waray_philippines_official_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|880.6 MB| + +## References + +https://huggingface.co/YaraKyrychenko/xlm-roberta-base-ukraine-war-official-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_ukraine_waray_philippines_official_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_ukraine_waray_philippines_official_v2_pipeline_en.md new file mode 100644 index 00000000000000..94ca9e021ba792 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_ukraine_waray_philippines_official_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_ukraine_waray_philippines_official_v2_pipeline pipeline XlmRoBertaForSequenceClassification from YaraKyrychenko +author: John Snow Labs +name: xlm_roberta_base_ukraine_waray_philippines_official_v2_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_ukraine_waray_philippines_official_v2_pipeline` is a English model originally trained by YaraKyrychenko. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ukraine_waray_philippines_official_v2_pipeline_en_5.5.0_3.0_1726698248470.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ukraine_waray_philippines_official_v2_pipeline_en_5.5.0_3.0_1726698248470.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_ukraine_waray_philippines_official_v2_pipeline", lang = "en")
# example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_ukraine_waray_philippines_official_v2_pipeline", lang = "en")
// example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_ukraine_waray_philippines_official_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|880.6 MB| + +## References + +https://huggingface.co/YaraKyrychenko/xlm-roberta-base-ukraine-war-official-v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_english_trimmed_english_15000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_english_trimmed_english_15000_pipeline_en.md new file mode 100644 index 00000000000000..dd85f08528e4e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_english_trimmed_english_15000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_xnli_english_trimmed_english_15000_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_xnli_english_trimmed_english_15000_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_xnli_english_trimmed_english_15000_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_english_trimmed_english_15000_pipeline_en_5.5.0_3.0_1726672312483.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_english_trimmed_english_15000_pipeline_en_5.5.0_3.0_1726672312483.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_xnli_english_trimmed_english_15000_pipeline", lang = "en")
# example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_xnli_english_trimmed_english_15000_pipeline", lang = "en")
// example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_xnli_english_trimmed_english_15000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|367.4 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-xnli-en-trimmed-en-15000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_french_trimmed_french_10000_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_french_trimmed_french_10000_en.md new file mode 100644 index 00000000000000..3c80d5c326151a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_french_trimmed_french_10000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_xnli_french_trimmed_french_10000 XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_xnli_french_trimmed_french_10000 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_xnli_french_trimmed_french_10000` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_french_trimmed_french_10000_en_5.5.0_3.0_1726659561011.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_french_trimmed_french_10000_en_5.5.0_3.0_1726659561011.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_french_trimmed_french_10000","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_french_trimmed_french_10000", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
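
The example above ends at `transform`; to see what the model predicted, the `class` output column set above can be inspected from the resulting `pipelineDF`, for example:

```python
# Display the predicted label(s) for each input row
pipelineDF.select("class.result").show(truncate=False)
```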
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_xnli_french_trimmed_french_10000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|353.5 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-xnli-fr-trimmed-fr-10000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_french_trimmed_french_10000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_french_trimmed_french_10000_pipeline_en.md new file mode 100644 index 00000000000000..7c7b80272a63b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_french_trimmed_french_10000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_xnli_french_trimmed_french_10000_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_xnli_french_trimmed_french_10000_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_xnli_french_trimmed_french_10000_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_french_trimmed_french_10000_pipeline_en_5.5.0_3.0_1726659578880.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_french_trimmed_french_10000_pipeline_en_5.5.0_3.0_1726659578880.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_xnli_french_trimmed_french_10000_pipeline", lang = "en")
# example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_xnli_french_trimmed_french_10000_pipeline", lang = "en")
// example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_xnli_french_trimmed_french_10000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|353.5 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-xnli-fr-trimmed-fr-10000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_german_trimmed_german_10000_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_german_trimmed_german_10000_en.md new file mode 100644 index 00000000000000..5ce6c06d77177d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_german_trimmed_german_10000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_xnli_german_trimmed_german_10000 XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_xnli_german_trimmed_german_10000 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_xnli_german_trimmed_german_10000` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_german_trimmed_german_10000_en_5.5.0_3.0_1726660354839.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_german_trimmed_german_10000_en_5.5.0_3.0_1726660354839.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_german_trimmed_german_10000","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_german_trimmed_german_10000", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
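
The example above ends at `transform`; to see what the model predicted, the `class` output column set above can be inspected from the resulting `pipelineDF`, for example:

```python
# Display the predicted label(s) for each input row
pipelineDF.select("class.result").show(truncate=False)
```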
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_xnli_german_trimmed_german_10000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|353.6 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-xnli-de-trimmed-de-10000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_en.md new file mode 100644 index 00000000000000..8022de68702a66 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_xnli_spanish_trimmed_spanish_15000 XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_xnli_spanish_trimmed_spanish_15000 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_xnli_spanish_trimmed_spanish_15000` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_en_5.5.0_3.0_1726672372049.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_en_5.5.0_3.0_1726672372049.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_spanish_trimmed_spanish_15000","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_spanish_trimmed_spanish_15000", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
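
The example above ends at `transform`; to see what the model predicted, the `class` output column set above can be inspected from the resulting `pipelineDF`, for example:

```python
# Display the predicted label(s) for each input row
pipelineDF.select("class.result").show(truncate=False)
```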
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_xnli_spanish_trimmed_spanish_15000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|367.4 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-xnli-es-trimmed-es-15000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_pipeline_en.md new file mode 100644 index 00000000000000..862d97abbafe7d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_pipeline_en_5.5.0_3.0_1726672390020.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_pipeline_en_5.5.0_3.0_1726672390020.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_pipeline", lang = "en")
# example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_pipeline", lang = "en")
// example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|367.5 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-xnli-es-trimmed-es-15000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_zero_shot_classifier_xnli_anli_mnli_snli_xx.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_zero_shot_classifier_xnli_anli_mnli_snli_xx.md new file mode 100644 index 00000000000000..4cb29b24e4b65a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_zero_shot_classifier_xnli_anli_mnli_snli_xx.md @@ -0,0 +1,107 @@ +--- +layout: model +title: XlmRoBertaZero-Shot Classification Base xlm_roberta_base_zero_shot_classifier_xnli_anli_mnli_snli +author: John Snow Labs +name: xlm_roberta_base_zero_shot_classifier_xnli_anli_mnli_snli +date: 2024-09-18 +tags: [token_classification, xlm_roberta, openvino, xx, open_source] +task: Zero-Shot Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: openvino +annotator: XlmRoBertaForZeroShotClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +“ + +This model is intended to be used for zero-shot text classification, especially in English. It is fine-tuned on NLI by using XlmRoberta Large model. + +XlmRoBertaForZeroShotClassificationusing a ModelForSequenceClassification trained on NLI (natural language inference) tasks. Equivalent of TFXLMRoBertaForZeroShotClassification models, but these models don’t require a hardcoded number of potential classes, they can be chosen at runtime. It usually means it’s slower but it is much more flexible. + +We used TFXLMRobertaForSequenceClassification to train this model and used XlmRoBertaForZeroShotClassification annotator in Spark NLP 🚀 for prediction at scale! + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_zero_shot_classifier_xnli_anli_mnli_snli_xx_5.5.0_3.0_1726659257571.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_zero_shot_classifier_xnli_anli_mnli_snli_xx_5.5.0_3.0_1726659257571.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

zeroShotClassifier = XlmRoBertaForZeroShotClassification \
    .pretrained('xlm_roberta_base_zero_shot_classifier_xnli_anli_mnli_snli', 'xx') \
    .setInputCols(['token', 'document']) \
    .setOutputCol('class') \
    .setCaseSensitive(True) \
    .setMaxSentenceLength(512) \
    .setCandidateLabels(["urgent", "mobile", "travel", "movie", "music", "sport", "weather", "technology"])

pipeline = Pipeline(stages=[
    document_assembler,
    tokenizer,
    zeroShotClassifier
])

example = spark.createDataFrame([['I have a problem with my iphone that needs to be resolved asap!!']]).toDF("text")
result = pipeline.fit(example).transform(example)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val zeroShotClassifier = XlmRoBertaForZeroShotClassification.pretrained("xlm_roberta_base_zero_shot_classifier_xnli_anli_mnli_snli", "xx")
  .setInputCols("document", "token")
  .setOutputCol("class")
  .setCaseSensitive(true)
  .setMaxSentenceLength(512)
  .setCandidateLabels(Array("urgent", "mobile", "travel", "movie", "music", "sport", "weather", "technology"))

val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, zeroShotClassifier))
val example = Seq("I have a problem with my iphone that needs to be resolved asap!!").toDS.toDF("text")
val result = pipeline.fit(example).transform(example)

```
</div>
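
To check which of the candidate labels the model assigns to the example sentence, the `class` output column of `result` can be inspected, for example:

```python
# Display the predicted zero-shot label for each input row
result.select("class.result").show(truncate=False)
```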
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_zero_shot_classifier_xnli_anli_mnli_snli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[token, document]| +|Output Labels:|[label]| +|Language:|xx| +|Size:|900.0 MB| +|Case sensitive:|true| \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlmr_base_finetuned_igbo_2e_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlmr_base_finetuned_igbo_2e_4_pipeline_en.md new file mode 100644 index 00000000000000..8a96d87a356e29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlmr_base_finetuned_igbo_2e_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_base_finetuned_igbo_2e_4_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: xlmr_base_finetuned_igbo_2e_4_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_base_finetuned_igbo_2e_4_pipeline` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_base_finetuned_igbo_2e_4_pipeline_en_5.5.0_3.0_1726663592733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_base_finetuned_igbo_2e_4_pipeline_en_5.5.0_3.0_1726663592733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("xlmr_base_finetuned_igbo_2e_4_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("xlmr_base_finetuned_igbo_2e_4_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
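+
+For quick checks on a single string, the same pipeline object also exposes a lightweight `annotate()` method; a small sketch (the example sentence is only an illustration, and the available keys depend on how the stored pipeline names its output columns):
+
+```python
+# annotate() returns a dict mapping output column names to lists of results for one string
+light_result = pipeline.annotate("Chinua Achebe wrote Things Fall Apart.")
+print(light_result.keys())
+```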
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_base_finetuned_igbo_2e_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|865.0 MB| + +## References + +https://huggingface.co/grace-pro/xlmr-base-finetuned-igbo-2e-4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlmr_base_trained_conll2002_english_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlmr_base_trained_conll2002_english_en.md new file mode 100644 index 00000000000000..ce657dd27f79b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlmr_base_trained_conll2002_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_base_trained_conll2002_english XlmRoBertaForTokenClassification from DeepaPeri +author: John Snow Labs +name: xlmr_base_trained_conll2002_english +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_base_trained_conll2002_english` is a English model originally trained by DeepaPeri. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_base_trained_conll2002_english_en_5.5.0_3.0_1726656852413.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_base_trained_conll2002_english_en_5.5.0_3.0_1726656852413.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmr_base_trained_conll2002_english", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmr_base_trained_conll2002_english", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
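+
+The token classifier emits one IOB tag per token in the `ner` column. If entity spans are needed instead of per-token tags, a `NerConverter` stage can be appended to the same pipeline; a minimal sketch using the column names from the example above:
+
+```python
+from sparknlp.annotator import NerConverter
+
+# Groups consecutive B-/I- tags from the "ner" column into entity chunks
+converter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("entities")
+```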
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_base_trained_conll2002_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|792.1 MB| + +## References + +https://huggingface.co/DeepaPeri/XLMR-BASE-TRAINED-CONLL2002-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlmr_model_name_finetuned_panx_german_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlmr_model_name_finetuned_panx_german_en.md new file mode 100644 index 00000000000000..32c7e6ed684947 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlmr_model_name_finetuned_panx_german_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_model_name_finetuned_panx_german XlmRoBertaForTokenClassification from Denilah +author: John Snow Labs +name: xlmr_model_name_finetuned_panx_german +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_model_name_finetuned_panx_german` is a English model originally trained by Denilah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_model_name_finetuned_panx_german_en_5.5.0_3.0_1726701746949.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_model_name_finetuned_panx_german_en_5.5.0_3.0_1726701746949.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmr_model_name_finetuned_panx_german", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmr_model_name_finetuned_panx_german", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_model_name_finetuned_panx_german| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.7 MB| + +## References + +https://huggingface.co/Denilah/xlmr_model_name-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlmr_model_name_finetuned_panx_german_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlmr_model_name_finetuned_panx_german_pipeline_en.md new file mode 100644 index 00000000000000..57bcddd8a74435 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlmr_model_name_finetuned_panx_german_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_model_name_finetuned_panx_german_pipeline pipeline XlmRoBertaForTokenClassification from Denilah +author: John Snow Labs +name: xlmr_model_name_finetuned_panx_german_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_model_name_finetuned_panx_german_pipeline` is a English model originally trained by Denilah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_model_name_finetuned_panx_german_pipeline_en_5.5.0_3.0_1726701813783.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_model_name_finetuned_panx_german_pipeline_en_5.5.0_3.0_1726701813783.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("xlmr_model_name_finetuned_panx_german_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("xlmr_model_name_finetuned_panx_german_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_model_name_finetuned_panx_german_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.7 MB| + +## References + +https://huggingface.co/Denilah/xlmr_model_name-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlmr_nepali_english_norwegian_shuffled_orig_test1000_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlmr_nepali_english_norwegian_shuffled_orig_test1000_en.md new file mode 100644 index 00000000000000..22628ac0c856c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlmr_nepali_english_norwegian_shuffled_orig_test1000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_nepali_english_norwegian_shuffled_orig_test1000 XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_nepali_english_norwegian_shuffled_orig_test1000 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_nepali_english_norwegian_shuffled_orig_test1000` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_nepali_english_norwegian_shuffled_orig_test1000_en_5.5.0_3.0_1726660031400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_nepali_english_norwegian_shuffled_orig_test1000_en_5.5.0_3.0_1726660031400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_nepali_english_norwegian_shuffled_orig_test1000", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_nepali_english_norwegian_shuffled_orig_test1000", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_nepali_english_norwegian_shuffled_orig_test1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|819.8 MB| + +## References + +https://huggingface.co/patpizio/xlmr-ne-en-no_shuffled-orig-test1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlmrobertanews_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlmrobertanews_en.md new file mode 100644 index 00000000000000..35116eaf3716d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlmrobertanews_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmrobertanews XlmRoBertaForSequenceClassification from oe2015 +author: John Snow Labs +name: xlmrobertanews +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmrobertanews` is a English model originally trained by oe2015. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmrobertanews_en_5.5.0_3.0_1726697982106.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmrobertanews_en_5.5.0_3.0_1726697982106.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmrobertanews", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmrobertanews", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmrobertanews| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|828.9 MB| + +## References + +https://huggingface.co/oe2015/XLMRobertaNews \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xnli_xlm_r_only_chinese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xnli_xlm_r_only_chinese_pipeline_en.md new file mode 100644 index 00000000000000..e9f6e69ba397d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xnli_xlm_r_only_chinese_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xnli_xlm_r_only_chinese_pipeline pipeline XlmRoBertaForSequenceClassification from semindan +author: John Snow Labs +name: xnli_xlm_r_only_chinese_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xnli_xlm_r_only_chinese_pipeline` is a English model originally trained by semindan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_chinese_pipeline_en_5.5.0_3.0_1726633084035.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_chinese_pipeline_en_5.5.0_3.0_1726633084035.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("xnli_xlm_r_only_chinese_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("xnli_xlm_r_only_chinese_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xnli_xlm_r_only_chinese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|807.1 MB| + +## References + +https://huggingface.co/semindan/xnli_xlm_r_only_zh + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-yelp_polarity_roberta_base_seed_3_en.md b/docs/_posts/ahmedlone127/2024-09-18-yelp_polarity_roberta_base_seed_3_en.md new file mode 100644 index 00000000000000..d181543be0e96b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-yelp_polarity_roberta_base_seed_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English yelp_polarity_roberta_base_seed_3 RoBertaForSequenceClassification from utahnlp +author: John Snow Labs +name: yelp_polarity_roberta_base_seed_3 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`yelp_polarity_roberta_base_seed_3` is a English model originally trained by utahnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/yelp_polarity_roberta_base_seed_3_en_5.5.0_3.0_1726628331831.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/yelp_polarity_roberta_base_seed_3_en_5.5.0_3.0_1726628331831.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("yelp_polarity_roberta_base_seed_3", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("yelp_polarity_roberta_base_seed_3", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|yelp_polarity_roberta_base_seed_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|465.8 MB| + +## References + +https://huggingface.co/utahnlp/yelp_polarity_roberta-base_seed-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-yelp_polarity_roberta_large_seed_2_en.md b/docs/_posts/ahmedlone127/2024-09-18-yelp_polarity_roberta_large_seed_2_en.md new file mode 100644 index 00000000000000..6d49840b41aebb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-yelp_polarity_roberta_large_seed_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English yelp_polarity_roberta_large_seed_2 RoBertaForSequenceClassification from utahnlp +author: John Snow Labs +name: yelp_polarity_roberta_large_seed_2 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`yelp_polarity_roberta_large_seed_2` is a English model originally trained by utahnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/yelp_polarity_roberta_large_seed_2_en_5.5.0_3.0_1726649776701.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/yelp_polarity_roberta_large_seed_2_en_5.5.0_3.0_1726649776701.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("yelp_polarity_roberta_large_seed_2", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("yelp_polarity_roberta_large_seed_2", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|yelp_polarity_roberta_large_seed_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/utahnlp/yelp_polarity_roberta-large_seed-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-yiyang_test_en.md b/docs/_posts/ahmedlone127/2024-09-18-yiyang_test_en.md new file mode 100644 index 00000000000000..3d533a886e0907 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-yiyang_test_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English yiyang_test BertForSequenceClassification from yiyang0101 +author: John Snow Labs +name: yiyang_test +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`yiyang_test` is a English model originally trained by yiyang0101. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/yiyang_test_en_5.5.0_3.0_1726624162910.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/yiyang_test_en_5.5.0_3.0_1726624162910.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = BertForSequenceClassification.pretrained("yiyang_test", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("yiyang_test", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|yiyang_test| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yiyang0101/yiyang-test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-4_datasets_fake_news_with_balanced_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-4_datasets_fake_news_with_balanced_5_pipeline_en.md new file mode 100644 index 00000000000000..1f90a4d816adc3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-4_datasets_fake_news_with_balanced_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 4_datasets_fake_news_with_balanced_5_pipeline pipeline DistilBertForSequenceClassification from littlepinhorse +author: John Snow Labs +name: 4_datasets_fake_news_with_balanced_5_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`4_datasets_fake_news_with_balanced_5_pipeline` is a English model originally trained by littlepinhorse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/4_datasets_fake_news_with_balanced_5_pipeline_en_5.5.0_3.0_1726744064435.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/4_datasets_fake_news_with_balanced_5_pipeline_en_5.5.0_3.0_1726744064435.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("4_datasets_fake_news_with_balanced_5_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("4_datasets_fake_news_with_balanced_5_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|4_datasets_fake_news_with_balanced_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/littlepinhorse/4_datasets_fake_news_with_balanced_5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-abstractclassifier_en.md b/docs/_posts/ahmedlone127/2024-09-19-abstractclassifier_en.md new file mode 100644 index 00000000000000..197c9b1e353f90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-abstractclassifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English abstractclassifier DistilBertForSequenceClassification from abehandlerorg +author: John Snow Labs +name: abstractclassifier +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`abstractclassifier` is a English model originally trained by abehandlerorg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/abstractclassifier_en_5.5.0_3.0_1726764031479.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/abstractclassifier_en_5.5.0_3.0_1726764031479.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("abstractclassifier", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("abstractclassifier", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
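+
+To see which labels this fine-tuned classifier can emit, the loaded annotator can be queried directly; a small sketch (label names and their order are model-specific):
+
+```python
+# Lists the class labels stored with the fine-tuned model
+print(sequenceClassifier.getClasses())
+```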
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|abstractclassifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abehandlerorg/abstractclassifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-abstractclassifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-abstractclassifier_pipeline_en.md new file mode 100644 index 00000000000000..f204ca8e03b6e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-abstractclassifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English abstractclassifier_pipeline pipeline DistilBertForSequenceClassification from abehandlerorg +author: John Snow Labs +name: abstractclassifier_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`abstractclassifier_pipeline` is a English model originally trained by abehandlerorg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/abstractclassifier_pipeline_en_5.5.0_3.0_1726764044204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/abstractclassifier_pipeline_en_5.5.0_3.0_1726764044204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("abstractclassifier_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("abstractclassifier_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|abstractclassifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abehandlerorg/abstractclassifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-aift_model_review_multiple_label_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-aift_model_review_multiple_label_classification_pipeline_en.md new file mode 100644 index 00000000000000..b57fadbcec6638 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-aift_model_review_multiple_label_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English aift_model_review_multiple_label_classification_pipeline pipeline DistilBertForSequenceClassification from Cielciel +author: John Snow Labs +name: aift_model_review_multiple_label_classification_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`aift_model_review_multiple_label_classification_pipeline` is a English model originally trained by Cielciel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/aift_model_review_multiple_label_classification_pipeline_en_5.5.0_3.0_1726763556694.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/aift_model_review_multiple_label_classification_pipeline_en_5.5.0_3.0_1726763556694.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("aift_model_review_multiple_label_classification_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("aift_model_review_multiple_label_classification_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|aift_model_review_multiple_label_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Cielciel/aift-model-review-multiple-label-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-alberta_base_nbolton04_en.md b/docs/_posts/ahmedlone127/2024-09-19-alberta_base_nbolton04_en.md new file mode 100644 index 00000000000000..4918c17fd8fef0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-alberta_base_nbolton04_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English alberta_base_nbolton04 RoBertaForSequenceClassification from nbolton04 +author: John Snow Labs +name: alberta_base_nbolton04 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`alberta_base_nbolton04` is a English model originally trained by nbolton04. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/alberta_base_nbolton04_en_5.5.0_3.0_1726726447654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/alberta_base_nbolton04_en_5.5.0_3.0_1726726447654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("alberta_base_nbolton04", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("alberta_base_nbolton04", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|alberta_base_nbolton04| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|441.6 MB| + +## References + +https://huggingface.co/nbolton04/alberta_base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-alberta_base_nbolton04_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-alberta_base_nbolton04_pipeline_en.md new file mode 100644 index 00000000000000..b968bb82452d1f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-alberta_base_nbolton04_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English alberta_base_nbolton04_pipeline pipeline RoBertaForSequenceClassification from nbolton04 +author: John Snow Labs +name: alberta_base_nbolton04_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`alberta_base_nbolton04_pipeline` is a English model originally trained by nbolton04. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/alberta_base_nbolton04_pipeline_en_5.5.0_3.0_1726726474074.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/alberta_base_nbolton04_pipeline_en_5.5.0_3.0_1726726474074.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("alberta_base_nbolton04_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("alberta_base_nbolton04_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|alberta_base_nbolton04_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|441.7 MB| + +## References + +https://huggingface.co/nbolton04/alberta_base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-all_roberta_large_v1_work_9_16_5_en.md b/docs/_posts/ahmedlone127/2024-09-19-all_roberta_large_v1_work_9_16_5_en.md new file mode 100644 index 00000000000000..5de85d51428477 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-all_roberta_large_v1_work_9_16_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_work_9_16_5 RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_work_9_16_5 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_work_9_16_5` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_work_9_16_5_en_5.5.0_3.0_1726726553874.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_work_9_16_5_en_5.5.0_3.0_1726726553874.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_work_9_16_5", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_work_9_16_5", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_work_9_16_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-work-9-16-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-ambert_en.md b/docs/_posts/ahmedlone127/2024-09-19-ambert_en.md new file mode 100644 index 00000000000000..6681e7b82b3794 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-ambert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ambert RoBertaEmbeddings from surafelkindu +author: John Snow Labs +name: ambert +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ambert` is a English model originally trained by surafelkindu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ambert_en_5.5.0_3.0_1726748980805.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ambert_en_5.5.0_3.0_1726748980805.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("ambert","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("ambert","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
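+
+The model produces one embedding per token. If a single vector per document is needed (for example, for clustering or similarity), a pooling stage can be appended after the embeddings; a minimal sketch using the column names from the example above:
+
+```python
+from sparknlp.annotator import SentenceEmbeddings
+
+# Average-pools the token embeddings into one vector per document
+sentenceEmbeddings = SentenceEmbeddings() \
+    .setInputCols(["document", "embeddings"]) \
+    .setOutputCol("sentence_embeddings") \
+    .setPoolingStrategy("AVERAGE")
+```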
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ambert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.5 MB| + +## References + +https://huggingface.co/surafelkindu/AmBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-angela_punc_diacritics_eval_en.md b/docs/_posts/ahmedlone127/2024-09-19-angela_punc_diacritics_eval_en.md new file mode 100644 index 00000000000000..fb490673920dfd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-angela_punc_diacritics_eval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English angela_punc_diacritics_eval XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_punc_diacritics_eval +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`angela_punc_diacritics_eval` is a English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_punc_diacritics_eval_en_5.5.0_3.0_1726708862780.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_punc_diacritics_eval_en_5.5.0_3.0_1726708862780.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_punc_diacritics_eval", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_punc_diacritics_eval", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_punc_diacritics_eval| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_punc_diacritics_eval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_en.md b/docs/_posts/ahmedlone127/2024-09-19-arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_en.md new file mode 100644 index 00000000000000..1cb78be8e222cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32 BertEmbeddings from mmcleige +author: John Snow Labs +name: arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32 +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32` is a English model originally trained by mmcleige. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_en_5.5.0_3.0_1726744433568.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_en_5.5.0_3.0_1726744433568.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = BertEmbeddings.pretrained("arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = BertEmbeddings.pretrained("arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
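+
+Each output row carries one annotation per token, and the vector itself sits in the `embeddings` field of the annotation. A minimal sketch for pulling token/vector pairs out of the result above (names follow the pipeline defined in this example):
+
+```python
+from pyspark.sql.functions import explode
+
+# One row per token: the token text and its embedding vector
+pipelineDF.select(explode("embeddings").alias("emb")) \
+    .selectExpr("emb.result as token", "emb.embeddings as vector") \
+    .show(truncate=80)
+```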
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|403.1 MB| + +## References + +https://huggingface.co/mmcleige/arbovirus_bert_base_DYLR.1e-5_N.10_BS.32 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_pipeline_en.md new file mode 100644 index 00000000000000..8c3e2fb4c8f846 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_pipeline pipeline BertEmbeddings from mmcleige +author: John Snow Labs +name: arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_pipeline` is a English model originally trained by mmcleige. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_pipeline_en_5.5.0_3.0_1726744452421.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_pipeline_en_5.5.0_3.0_1726744452421.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame with a "text" column to annotate
+pipeline = PretrainedPipeline("arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame with a "text" column to annotate
+val pipeline = new PretrainedPipeline("arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
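+
+`transform` expects a Spark DataFrame with a `text` column, while `annotate` accepts a plain string for quick experiments. A minimal end-to-end sketch, assuming Spark NLP is installed and can start a local session:
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+pipeline = PretrainedPipeline("arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_pipeline", lang="en")
+
+# DataFrame-based usage
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+# Or annotate a single string and get a plain Python dict back
+print(pipeline.annotate("I love spark-nlp"))
+```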
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.1 MB| + +## References + +https://huggingface.co/mmcleige/arbovirus_bert_base_DYLR.1e-5_N.10_BS.32 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-article_sentiment_analysis_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-article_sentiment_analysis_model_pipeline_en.md new file mode 100644 index 00000000000000..ecb349b6684d7d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-article_sentiment_analysis_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English article_sentiment_analysis_model_pipeline pipeline DistilBertForSequenceClassification from jfr139 +author: John Snow Labs +name: article_sentiment_analysis_model_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`article_sentiment_analysis_model_pipeline` is a English model originally trained by jfr139. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/article_sentiment_analysis_model_pipeline_en_5.5.0_3.0_1726763738024.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/article_sentiment_analysis_model_pipeline_en_5.5.0_3.0_1726763738024.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame with a "text" column to annotate
+pipeline = PretrainedPipeline("article_sentiment_analysis_model_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame with a "text" column to annotate
+val pipeline = new PretrainedPipeline("article_sentiment_analysis_model_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|article_sentiment_analysis_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jfr139/article-sentiment-analysis-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-autotrain_g10wr_ryb7t_en.md b/docs/_posts/ahmedlone127/2024-09-19-autotrain_g10wr_ryb7t_en.md new file mode 100644 index 00000000000000..a73e52c3379a9f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-autotrain_g10wr_ryb7t_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English autotrain_g10wr_ryb7t RoBertaForTokenClassification from bikashpatra +author: John Snow Labs +name: autotrain_g10wr_ryb7t +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_g10wr_ryb7t` is a English model originally trained by bikashpatra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_g10wr_ryb7t_en_5.5.0_3.0_1726730984867.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_g10wr_ryb7t_en_5.5.0_3.0_1726730984867.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("autotrain_g10wr_ryb7t","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("autotrain_g10wr_ryb7t", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_g10wr_ryb7t| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/bikashpatra/autotrain-g10wr-ryb7t \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-babylm_jde_5_en.md b/docs/_posts/ahmedlone127/2024-09-19-babylm_jde_5_en.md new file mode 100644 index 00000000000000..bac0b38c4a583f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-babylm_jde_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English babylm_jde_5 RoBertaEmbeddings from jdebene +author: John Snow Labs +name: babylm_jde_5 +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babylm_jde_5` is a English model originally trained by jdebene. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babylm_jde_5_en_5.5.0_3.0_1726778195830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babylm_jde_5_en_5.5.0_3.0_1726778195830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("babylm_jde_5","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("babylm_jde_5","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babylm_jde_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|257.5 MB| + +## References + +https://huggingface.co/jdebene/BabyLM-jde-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-babylm_jde_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-babylm_jde_5_pipeline_en.md new file mode 100644 index 00000000000000..c4c98112f33ca3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-babylm_jde_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English babylm_jde_5_pipeline pipeline RoBertaEmbeddings from jdebene +author: John Snow Labs +name: babylm_jde_5_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babylm_jde_5_pipeline` is a English model originally trained by jdebene. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babylm_jde_5_pipeline_en_5.5.0_3.0_1726778208447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babylm_jde_5_pipeline_en_5.5.0_3.0_1726778208447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame with a "text" column to annotate
+pipeline = PretrainedPipeline("babylm_jde_5_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame with a "text" column to annotate
+val pipeline = new PretrainedPipeline("babylm_jde_5_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babylm_jde_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|257.6 MB| + +## References + +https://huggingface.co/jdebene/BabyLM-jde-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bailii_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-19-bailii_roberta_en.md new file mode 100644 index 00000000000000..7da94db83ea99f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bailii_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bailii_roberta RoBertaEmbeddings from tsantosh7 +author: John Snow Labs +name: bailii_roberta +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bailii_roberta` is a English model originally trained by tsantosh7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bailii_roberta_en_5.5.0_3.0_1726747868726.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bailii_roberta_en_5.5.0_3.0_1726747868726.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("bailii_roberta","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("bailii_roberta","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bailii_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.8 MB| + +## References + +https://huggingface.co/tsantosh7/Bailii-Roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bailii_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bailii_roberta_pipeline_en.md new file mode 100644 index 00000000000000..19296991e7fa15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bailii_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bailii_roberta_pipeline pipeline RoBertaEmbeddings from tsantosh7 +author: John Snow Labs +name: bailii_roberta_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bailii_roberta_pipeline` is a English model originally trained by tsantosh7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bailii_roberta_pipeline_en_5.5.0_3.0_1726747892086.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bailii_roberta_pipeline_en_5.5.0_3.0_1726747892086.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame with a "text" column to annotate
+pipeline = PretrainedPipeline("bailii_roberta_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame with a "text" column to annotate
+val pipeline = new PretrainedPipeline("bailii_roberta_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bailii_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.8 MB| + +## References + +https://huggingface.co/tsantosh7/Bailii-Roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_en.md b/docs/_posts/ahmedlone127/2024-09-19-base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_en.md new file mode 100644 index 00000000000000..f3d8bdd1b31a1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English base_english_combined_v4_1_0_8_1e_06_restful_sweep_5 WhisperForCTC from saahith +author: John Snow Labs +name: base_english_combined_v4_1_0_8_1e_06_restful_sweep_5 +date: 2024-09-19 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_english_combined_v4_1_0_8_1e_06_restful_sweep_5` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_en_5.5.0_3.0_1726758374750.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_en_5.5.0_3.0_1726758374750.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# data is a Spark DataFrame with an "audio_content" column of float samples (see the loading sketch below)
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("base_english_combined_v4_1_0_8_1e_06_restful_sweep_5","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// data is a Spark DataFrame with an "audio_content" column of float samples
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("base_english_combined_v4_1_0_8_1e_06_restful_sweep_5", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
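+
+The `data` DataFrame referenced above needs an `audio_content` column holding raw float samples. A minimal loading sketch, assuming `librosa` is available and the audio is resampled to 16 kHz mono (the rate Whisper checkpoints expect); any loader that yields float samples would work equally well:
+
+```python
+import librosa  # assumed available; not part of Spark NLP itself
+
+samples, _ = librosa.load("sample.wav", sr=16000)  # 16 kHz mono float samples
+data = spark.createDataFrame([[samples.tolist()]], ["audio_content"])
+```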
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_english_combined_v4_1_0_8_1e_06_restful_sweep_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|646.1 MB| + +## References + +https://huggingface.co/saahith/base.en-combined_v4-1-0-8-1e-06-restful-sweep-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_pipeline_en.md new file mode 100644 index 00000000000000..68d6f5ddb7caef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_pipeline pipeline WhisperForCTC from saahith +author: John Snow Labs +name: base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_pipeline` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_pipeline_en_5.5.0_3.0_1726758409649.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_pipeline_en_5.5.0_3.0_1726758409649.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame with an "audio_content" column of float audio samples
+pipeline = PretrainedPipeline("base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame with an "audio_content" column of float audio samples
+val pipeline = new PretrainedPipeline("base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|646.1 MB| + +## References + +https://huggingface.co/saahith/base.en-combined_v4-1-0-8-1e-06-restful-sweep-5 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_blbooks_cased_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_blbooks_cased_en.md new file mode 100644 index 00000000000000..dde3db1a234653 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_blbooks_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_blbooks_cased BertEmbeddings from bigscience-historical-texts +author: John Snow Labs +name: bert_base_blbooks_cased +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_blbooks_cased` is a English model originally trained by bigscience-historical-texts. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_blbooks_cased_en_5.5.0_3.0_1726705673902.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_blbooks_cased_en_5.5.0_3.0_1726705673902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = BertEmbeddings.pretrained("bert_base_blbooks_cased","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = BertEmbeddings.pretrained("bert_base_blbooks_cased","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_blbooks_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|412.3 MB| + +## References + +https://huggingface.co/bigscience-historical-texts/bert-base-blbooks-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_cased_mlm_chemistry_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_cased_mlm_chemistry_en.md new file mode 100644 index 00000000000000..2e5de951c8662e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_cased_mlm_chemistry_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_cased_mlm_chemistry BertEmbeddings from jonas-luehrs +author: John Snow Labs +name: bert_base_cased_mlm_chemistry +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_mlm_chemistry` is a English model originally trained by jonas-luehrs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_mlm_chemistry_en_5.5.0_3.0_1726744568016.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_mlm_chemistry_en_5.5.0_3.0_1726744568016.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = BertEmbeddings.pretrained("bert_base_cased_mlm_chemistry","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = BertEmbeddings.pretrained("bert_base_cased_mlm_chemistry","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_mlm_chemistry| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/jonas-luehrs/bert-base-cased-MLM-chemistry \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_cased_mlm_chemistry_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_cased_mlm_chemistry_pipeline_en.md new file mode 100644 index 00000000000000..650e8e6511dbdd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_cased_mlm_chemistry_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_cased_mlm_chemistry_pipeline pipeline BertEmbeddings from jonas-luehrs +author: John Snow Labs +name: bert_base_cased_mlm_chemistry_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_mlm_chemistry_pipeline` is a English model originally trained by jonas-luehrs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_mlm_chemistry_pipeline_en_5.5.0_3.0_1726744587675.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_mlm_chemistry_pipeline_en_5.5.0_3.0_1726744587675.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame with a "text" column to annotate
+pipeline = PretrainedPipeline("bert_base_cased_mlm_chemistry_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame with a "text" column to annotate
+val pipeline = new PretrainedPipeline("bert_base_cased_mlm_chemistry_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_mlm_chemistry_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/jonas-luehrs/bert-base-cased-MLM-chemistry + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_multilingual_uncased_finetuned_hp_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_multilingual_uncased_finetuned_hp_pipeline_xx.md new file mode 100644 index 00000000000000..40cb46cb8ae0d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_multilingual_uncased_finetuned_hp_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_finetuned_hp_pipeline pipeline BertEmbeddings from rman-rahimi-29 +author: John Snow Labs +name: bert_base_multilingual_uncased_finetuned_hp_pipeline +date: 2024-09-19 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_finetuned_hp_pipeline` is a Multilingual model originally trained by rman-rahimi-29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_finetuned_hp_pipeline_xx_5.5.0_3.0_1726731805654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_finetuned_hp_pipeline_xx_5.5.0_3.0_1726731805654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame with a "text" column to annotate
+pipeline = PretrainedPipeline("bert_base_multilingual_uncased_finetuned_hp_pipeline", lang = "xx")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame with a "text" column to annotate
+val pipeline = new PretrainedPipeline("bert_base_multilingual_uncased_finetuned_hp_pipeline", lang = "xx")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_finetuned_hp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|625.5 MB| + +## References + +https://huggingface.co/rman-rahimi-29/bert-base-multilingual-uncased-finetuned-hp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_spanish_analysis_app_questions_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_spanish_analysis_app_questions_en.md new file mode 100644 index 00000000000000..2f12e5eb2477f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_spanish_analysis_app_questions_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_spanish_analysis_app_questions BertForSequenceClassification from devdroide +author: John Snow Labs +name: bert_base_spanish_analysis_app_questions +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_analysis_app_questions` is a English model originally trained by devdroide. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_analysis_app_questions_en_5.5.0_3.0_1726770634994.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_analysis_app_questions_en_5.5.0_3.0_1726770634994.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_spanish_analysis_app_questions","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_spanish_analysis_app_questions", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
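+
+The predicted label for each input document is stored in the `result` field of the `class` column; the label set itself comes from the model's original fine-tuning data. A minimal sketch for reading the predictions from the pipeline output above:
+
+```python
+# One predicted class per input row
+pipelineDF.select("text", "class.result").show(truncate=False)
+```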
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_analysis_app_questions| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|411.9 MB| + +## References + +https://huggingface.co/devdroide/bert-base-spanish-analysis-app-questions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_1802_r1_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_1802_r1_en.md new file mode 100644 index 00000000000000..4b06029fa662fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_1802_r1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_1802_r1 BertEmbeddings from JamesKim +author: John Snow Labs +name: bert_base_uncased_1802_r1 +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_1802_r1` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_1802_r1_en_5.5.0_3.0_1726744741322.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_1802_r1_en_5.5.0_3.0_1726744741322.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = BertEmbeddings.pretrained("bert_base_uncased_1802_r1","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = BertEmbeddings.pretrained("bert_base_uncased_1802_r1","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_1802_r1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/JamesKim/bert-base-uncased_1802_r1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_1802_r1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_1802_r1_pipeline_en.md new file mode 100644 index 00000000000000..ad57abdd7c8f12 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_1802_r1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_1802_r1_pipeline pipeline BertEmbeddings from JamesKim +author: John Snow Labs +name: bert_base_uncased_1802_r1_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_1802_r1_pipeline` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_1802_r1_pipeline_en_5.5.0_3.0_1726744760610.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_1802_r1_pipeline_en_5.5.0_3.0_1726744760610.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame with a "text" column to annotate
+pipeline = PretrainedPipeline("bert_base_uncased_1802_r1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame with a "text" column to annotate
+val pipeline = new PretrainedPipeline("bert_base_uncased_1802_r1_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_1802_r1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/JamesKim/bert-base-uncased_1802_r1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_ag_news_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_ag_news_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..fe7da2ff18168c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_ag_news_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_ag_news_finetuned_pipeline pipeline BertForSequenceClassification from odunola +author: John Snow Labs +name: bert_base_uncased_ag_news_finetuned_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ag_news_finetuned_pipeline` is a English model originally trained by odunola. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ag_news_finetuned_pipeline_en_5.5.0_3.0_1726739835962.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ag_news_finetuned_pipeline_en_5.5.0_3.0_1726739835962.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame with a "text" column to annotate
+pipeline = PretrainedPipeline("bert_base_uncased_ag_news_finetuned_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame with a "text" column to annotate
+val pipeline = new PretrainedPipeline("bert_base_uncased_ag_news_finetuned_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ag_news_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/odunola/bert-base-uncased-ag-news-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_ear_misogyny_italian_it.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_ear_misogyny_italian_it.md new file mode 100644 index 00000000000000..432fa65b1da855 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_ear_misogyny_italian_it.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Italian bert_base_uncased_ear_misogyny_italian BertForSequenceClassification from MilaNLProc +author: John Snow Labs +name: bert_base_uncased_ear_misogyny_italian +date: 2024-09-19 +tags: [it, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ear_misogyny_italian` is a Italian model originally trained by MilaNLProc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ear_misogyny_italian_it_5.5.0_3.0_1726736444083.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ear_misogyny_italian_it_5.5.0_3.0_1726736444083.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_ear_misogyny_italian","it") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_ear_misogyny_italian", "it")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ear_misogyny_italian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|it| +|Size:|412.0 MB| + +## References + +https://huggingface.co/MilaNLProc/bert-base-uncased-ear-misogyny-italian \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_ear_misogyny_italian_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_ear_misogyny_italian_pipeline_it.md new file mode 100644 index 00000000000000..d696ce5b9abb4a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_ear_misogyny_italian_pipeline_it.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Italian bert_base_uncased_ear_misogyny_italian_pipeline pipeline BertForSequenceClassification from MilaNLProc +author: John Snow Labs +name: bert_base_uncased_ear_misogyny_italian_pipeline +date: 2024-09-19 +tags: [it, open_source, pipeline, onnx] +task: Text Classification +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ear_misogyny_italian_pipeline` is a Italian model originally trained by MilaNLProc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ear_misogyny_italian_pipeline_it_5.5.0_3.0_1726736465190.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ear_misogyny_italian_pipeline_it_5.5.0_3.0_1726736465190.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame with a "text" column to annotate
+pipeline = PretrainedPipeline("bert_base_uncased_ear_misogyny_italian_pipeline", lang = "it")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame with a "text" column to annotate
+val pipeline = new PretrainedPipeline("bert_base_uncased_ear_misogyny_italian_pipeline", lang = "it")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ear_misogyny_italian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|412.0 MB| + +## References + +https://huggingface.co/MilaNLProc/bert-base-uncased-ear-misogyny-italian + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_few_shot_k_64_finetuned_squad_seed_8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_few_shot_k_64_finetuned_squad_seed_8_pipeline_en.md new file mode 100644 index 00000000000000..257bd25e40c630 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_few_shot_k_64_finetuned_squad_seed_8_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_few_shot_k_64_finetuned_squad_seed_8_pipeline pipeline BertForQuestionAnswering from anas-awadalla +author: John Snow Labs +name: bert_base_uncased_few_shot_k_64_finetuned_squad_seed_8_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_few_shot_k_64_finetuned_squad_seed_8_pipeline` is a English model originally trained by anas-awadalla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_few_shot_k_64_finetuned_squad_seed_8_pipeline_en_5.5.0_3.0_1726710470114.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_few_shot_k_64_finetuned_squad_seed_8_pipeline_en_5.5.0_3.0_1726710470114.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame with the input columns this QA pipeline expects (typically "question" and "context")
+pipeline = PretrainedPipeline("bert_base_uncased_few_shot_k_64_finetuned_squad_seed_8_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame with the input columns this QA pipeline expects (typically "question" and "context")
+val pipeline = new PretrainedPipeline("bert_base_uncased_few_shot_k_64_finetuned_squad_seed_8_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_few_shot_k_64_finetuned_squad_seed_8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/anas-awadalla/bert-base-uncased-few-shot-k-64-finetuned-squad-seed-8 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_bible_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_bible_pipeline_en.md new file mode 100644 index 00000000000000..2bb6fd1961ccb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_bible_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_bible_pipeline pipeline BertEmbeddings from Pragash-Mohanarajah +author: John Snow Labs +name: bert_base_uncased_finetuned_bible_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_bible_pipeline` is a English model originally trained by Pragash-Mohanarajah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_bible_pipeline_en_5.5.0_3.0_1726717677937.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_bible_pipeline_en_5.5.0_3.0_1726717677937.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_bible_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_bible_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_bible_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Pragash-Mohanarajah/bert-base-uncased-finetuned-bible + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_news_1973_1974_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_news_1973_1974_en.md new file mode 100644 index 00000000000000..6e6063b66ff2c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_news_1973_1974_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_news_1973_1974 BertEmbeddings from sally9805 +author: John Snow Labs +name: bert_base_uncased_finetuned_news_1973_1974 +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_news_1973_1974` is a English model originally trained by sally9805. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_news_1973_1974_en_5.5.0_3.0_1726734597497.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_news_1973_1974_en_5.5.0_3.0_1726734597497.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_finetuned_news_1973_1974","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_finetuned_news_1973_1974","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
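To look at the vectors produced by the Python example above, the transformed DataFrame can be unpacked with standard Spark SQL functions. This is only a sketch of one way to inspect the output, continuing from `pipelineDF` and the `embeddings` column configured above.

```python
from pyspark.sql import functions as F

# Each annotation carries the token text in `result` and its vector in `embeddings`
pipelineDF.select(F.explode("embeddings").alias("emb")) \
    .select(
        F.col("emb.result").alias("token"),
        F.col("emb.embeddings").alias("vector")
    ) \
    .show(5, truncate=80)
```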
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_news_1973_1974| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sally9805/bert-base-uncased-finetuned-news-1973-1974 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_news_1973_1974_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_news_1973_1974_pipeline_en.md new file mode 100644 index 00000000000000..9e1106b713eb91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_news_1973_1974_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_news_1973_1974_pipeline pipeline BertEmbeddings from sally9805 +author: John Snow Labs +name: bert_base_uncased_finetuned_news_1973_1974_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_news_1973_1974_pipeline` is a English model originally trained by sally9805. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_news_1973_1974_pipeline_en_5.5.0_3.0_1726734617602.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_news_1973_1974_pipeline_en_5.5.0_3.0_1726734617602.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_news_1973_1974_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_news_1973_1974_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
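As elsewhere on this page, `df` is assumed to already exist. A hedged end-to-end sketch, assuming the pipeline reads raw text from a `text` column (the convention of the DocumentAssembler it includes) and writes an `embeddings` column as in the standalone model card:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_base_uncased_finetuned_news_1973_1974_pipeline", lang="en")
annotations = pipeline.transform(df)

# Unpack token text and vectors from the assumed "embeddings" output column
annotations.selectExpr("explode(embeddings) as emb") \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(5, truncate=False)
```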
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_news_1973_1974_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sally9805/bert-base-uncased-finetuned-news-1973-1974 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_news_2010_2015_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_news_2010_2015_en.md new file mode 100644 index 00000000000000..36a1e89feb0a01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_news_2010_2015_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_news_2010_2015 BertEmbeddings from sally9805 +author: John Snow Labs +name: bert_base_uncased_finetuned_news_2010_2015 +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_news_2010_2015` is a English model originally trained by sally9805. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_news_2010_2015_en_5.5.0_3.0_1726717751883.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_news_2010_2015_en_5.5.0_3.0_1726717751883.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_finetuned_news_2010_2015","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_finetuned_news_2010_2015","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_news_2010_2015| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sally9805/bert-base-uncased-finetuned-news-2010-2015 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_news_2010_2015_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_news_2010_2015_pipeline_en.md new file mode 100644 index 00000000000000..79abacacdc4366 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_news_2010_2015_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_news_2010_2015_pipeline pipeline BertEmbeddings from sally9805 +author: John Snow Labs +name: bert_base_uncased_finetuned_news_2010_2015_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_news_2010_2015_pipeline` is a English model originally trained by sally9805. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_news_2010_2015_pipeline_en_5.5.0_3.0_1726717771103.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_news_2010_2015_pipeline_en_5.5.0_3.0_1726717771103.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_news_2010_2015_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_news_2010_2015_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_news_2010_2015_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sally9805/bert-base-uncased-finetuned-news-2010-2015 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_squad_sonalh_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_squad_sonalh_en.md new file mode 100644 index 00000000000000..d611537fd1ba9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_squad_sonalh_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_squad_sonalh BertForQuestionAnswering from SonalH +author: John Snow Labs +name: bert_base_uncased_finetuned_squad_sonalh +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_squad_sonalh` is a English model originally trained by SonalH. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_squad_sonalh_en_5.5.0_3.0_1726765269370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_squad_sonalh_en_5.5.0_3.0_1726765269370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetuned_squad_sonalh","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetuned_squad_sonalh", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
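A short sketch of reading the predicted answer span back from `pipelineDF`, continuing the Python example above; it relies only on the `answer` output column configured there and the standard Spark NLP annotation fields.

```python
# Each annotation exposes the predicted text plus its character offsets
pipelineDF.selectExpr("explode(answer) as ans") \
    .selectExpr("ans.result as answer", "ans.begin", "ans.end") \
    .show(truncate=False)
```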
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_squad_sonalh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/SonalH/bert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_squad_sonalh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_squad_sonalh_pipeline_en.md new file mode 100644 index 00000000000000..9dd71f62af5279 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_squad_sonalh_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_squad_sonalh_pipeline pipeline BertForQuestionAnswering from SonalH +author: John Snow Labs +name: bert_base_uncased_finetuned_squad_sonalh_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_squad_sonalh_pipeline` is a English model originally trained by SonalH. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_squad_sonalh_pipeline_en_5.5.0_3.0_1726765289608.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_squad_sonalh_pipeline_en_5.5.0_3.0_1726765289608.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_squad_sonalh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_squad_sonalh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_squad_sonalh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/SonalH/bert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_git_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_git_pipeline_zh.md new file mode 100644 index 00000000000000..0ba47970a1f032 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_git_pipeline_zh.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Chinese bert_base_uncased_git_pipeline pipeline BertEmbeddings from littlebird13 +author: John Snow Labs +name: bert_base_uncased_git_pipeline +date: 2024-09-19 +tags: [zh, open_source, pipeline, onnx] +task: Embeddings +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_git_pipeline` is a Chinese model originally trained by littlebird13. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_git_pipeline_zh_5.5.0_3.0_1726717457999.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_git_pipeline_zh_5.5.0_3.0_1726717457999.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_git_pipeline", lang = "zh") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_git_pipeline", lang = "zh") +val annotations = pipeline.transform(df) + +``` +
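A hedged usage sketch for this Chinese pipeline, assuming it takes raw text from a `text` column and emits a BERT `embeddings` column like the other embedding pipelines on this page; the sample sentence is purely illustrative.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

df = spark.createDataFrame([["我喜欢使用 Spark NLP"]]).toDF("text")

pipeline = PretrainedPipeline("bert_base_uncased_git_pipeline", lang="zh")
annotations = pipeline.transform(df)

# Tokens as segmented by the pipeline's tokenizer (assumed output column "embeddings")
annotations.select("embeddings.result").show(truncate=False)
```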
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_git_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|407.2 MB| + +## References + +https://huggingface.co/littlebird13/bert-base-uncased-git + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_sclarge_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_sclarge_en.md new file mode 100644 index 00000000000000..805e93f3252bb7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_sclarge_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_sclarge BertEmbeddings from CambridgeMolecularEngineering +author: John Snow Labs +name: bert_base_uncased_sclarge +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_sclarge` is a English model originally trained by CambridgeMolecularEngineering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_sclarge_en_5.5.0_3.0_1726734685884.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_sclarge_en_5.5.0_3.0_1726734685884.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_sclarge","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_sclarge","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_sclarge| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/CambridgeMolecularEngineering/bert-base-uncased-sclarge \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_sclarge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_sclarge_pipeline_en.md new file mode 100644 index 00000000000000..b30902bd95a693 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_sclarge_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_sclarge_pipeline pipeline BertEmbeddings from CambridgeMolecularEngineering +author: John Snow Labs +name: bert_base_uncased_sclarge_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_sclarge_pipeline` is a English model originally trained by CambridgeMolecularEngineering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_sclarge_pipeline_en_5.5.0_3.0_1726734705697.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_sclarge_pipeline_en_5.5.0_3.0_1726734705697.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_sclarge_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_sclarge_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_sclarge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/CambridgeMolecularEngineering/bert-base-uncased-sclarge + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_sijia_w_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_sijia_w_pipeline_en.md new file mode 100644 index 00000000000000..88aaf5f7e4b89f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_sijia_w_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_sijia_w_pipeline pipeline BertEmbeddings from sijia-w +author: John Snow Labs +name: bert_base_uncased_sijia_w_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_sijia_w_pipeline` is a English model originally trained by sijia-w. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_sijia_w_pipeline_en_5.5.0_3.0_1726731962786.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_sijia_w_pipeline_en_5.5.0_3.0_1726731962786.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_sijia_w_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_sijia_w_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_sijia_w_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sijia-w/bert-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_covid_emotion_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_covid_emotion_classifier_en.md new file mode 100644 index 00000000000000..aff9855fdda32b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_covid_emotion_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_covid_emotion_classifier DistilBertForSequenceClassification from PFraud +author: John Snow Labs +name: bert_covid_emotion_classifier +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_covid_emotion_classifier` is a English model originally trained by PFraud. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_covid_emotion_classifier_en_5.5.0_3.0_1726763667918.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_covid_emotion_classifier_en_5.5.0_3.0_1726763667918.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_covid_emotion_classifier","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_covid_emotion_classifier", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
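Continuing the Python example above, the predicted emotion label can be read straight from the `class` output column; the sketch below also lists the label vocabulary, which fine-tuned classifiers in Spark NLP expose via `getClasses()`.

```python
# Predicted label(s) for each input row
pipelineDF.select("text", "class.result").show(truncate=False)

# The label set baked into the fine-tuned model
print(sequenceClassifier.getClasses())
```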
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_covid_emotion_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/PFraud/BERT_Covid_Emotion_Classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_covid_emotion_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_covid_emotion_classifier_pipeline_en.md new file mode 100644 index 00000000000000..34e819c5323871 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_covid_emotion_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_covid_emotion_classifier_pipeline pipeline DistilBertForSequenceClassification from PFraud +author: John Snow Labs +name: bert_covid_emotion_classifier_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_covid_emotion_classifier_pipeline` is a English model originally trained by PFraud. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_covid_emotion_classifier_pipeline_en_5.5.0_3.0_1726763680343.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_covid_emotion_classifier_pipeline_en_5.5.0_3.0_1726763680343.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_covid_emotion_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_covid_emotion_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
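A minimal sketch of running this pipeline end to end, assuming the input DataFrame carries raw text in a `text` column and the classifier writes to a `class` column (both assumptions, mirroring the standalone model card above):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_covid_emotion_classifier_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.select("text", "class.result").show(truncate=False)
```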
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_covid_emotion_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/PFraud/BERT_Covid_Emotion_Classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_distilled_multi_teacher_model_random_tweet_emotion_epoch7_alpha0_8_refined_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_distilled_multi_teacher_model_random_tweet_emotion_epoch7_alpha0_8_refined_pipeline_en.md new file mode 100644 index 00000000000000..1f27b5a33a73fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_distilled_multi_teacher_model_random_tweet_emotion_epoch7_alpha0_8_refined_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_distilled_multi_teacher_model_random_tweet_emotion_epoch7_alpha0_8_refined_pipeline pipeline DistilBertForSequenceClassification from ArafatBHossain +author: John Snow Labs +name: bert_distilled_multi_teacher_model_random_tweet_emotion_epoch7_alpha0_8_refined_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_distilled_multi_teacher_model_random_tweet_emotion_epoch7_alpha0_8_refined_pipeline` is a English model originally trained by ArafatBHossain. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_distilled_multi_teacher_model_random_tweet_emotion_epoch7_alpha0_8_refined_pipeline_en_5.5.0_3.0_1726741554074.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_distilled_multi_teacher_model_random_tweet_emotion_epoch7_alpha0_8_refined_pipeline_en_5.5.0_3.0_1726741554074.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_distilled_multi_teacher_model_random_tweet_emotion_epoch7_alpha0_8_refined_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_distilled_multi_teacher_model_random_tweet_emotion_epoch7_alpha0_8_refined_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_distilled_multi_teacher_model_random_tweet_emotion_epoch7_alpha0_8_refined_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ArafatBHossain/bert-distilled-multi_teacher_model_random_tweet_emotion_epoch7_alpha0.8_refined + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_finetuned_ner4_ikram11_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_finetuned_ner4_ikram11_pipeline_en.md new file mode 100644 index 00000000000000..27e2062941eee1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_finetuned_ner4_ikram11_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_ner4_ikram11_pipeline pipeline BertForTokenClassification from Ikram11 +author: John Snow Labs +name: bert_finetuned_ner4_ikram11_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner4_ikram11_pipeline` is a English model originally trained by Ikram11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner4_ikram11_pipeline_en_5.5.0_3.0_1726774375006.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner4_ikram11_pipeline_en_5.5.0_3.0_1726774375006.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_ner4_ikram11_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_ner4_ikram11_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
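A hedged sketch for this NER pipeline; the `text` input column and the `ner` output column are assumptions based on the included DocumentAssembler and BertForTokenClassification stages, and the sample sentence is illustrative only.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

df = spark.createDataFrame([["John Snow Labs is based in Delaware"]]).toDF("text")

pipeline = PretrainedPipeline("bert_finetuned_ner4_ikram11_pipeline", lang="en")
annotations = pipeline.transform(df)

# One IOB tag per token, in token order (assumed output column "ner")
annotations.select("ner.result").show(truncate=False)
```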
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner4_ikram11_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Ikram11/bert-finetuned-ner4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_finetuned_squad_ashaduzzaman_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_finetuned_squad_ashaduzzaman_en.md new file mode 100644 index 00000000000000..efd9e23e506ed5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_finetuned_squad_ashaduzzaman_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_finetuned_squad_ashaduzzaman BertForQuestionAnswering from ashaduzzaman +author: John Snow Labs +name: bert_finetuned_squad_ashaduzzaman +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_squad_ashaduzzaman` is a English model originally trained by ashaduzzaman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_ashaduzzaman_en_5.5.0_3.0_1726765858826.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_ashaduzzaman_en_5.5.0_3.0_1726765858826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_squad_ashaduzzaman","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_squad_ashaduzzaman", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_squad_ashaduzzaman| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/ashaduzzaman/bert-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_finetuned_squad_ashaduzzaman_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_finetuned_squad_ashaduzzaman_pipeline_en.md new file mode 100644 index 00000000000000..9fb2e9f3cf2fca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_finetuned_squad_ashaduzzaman_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_finetuned_squad_ashaduzzaman_pipeline pipeline BertForQuestionAnswering from ashaduzzaman +author: John Snow Labs +name: bert_finetuned_squad_ashaduzzaman_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_squad_ashaduzzaman_pipeline` is a English model originally trained by ashaduzzaman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_ashaduzzaman_pipeline_en_5.5.0_3.0_1726765877820.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_ashaduzzaman_pipeline_en_5.5.0_3.0_1726765877820.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_squad_ashaduzzaman_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_squad_ashaduzzaman_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_squad_ashaduzzaman_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/ashaduzzaman/bert-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_l10_h256_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_l10_h256_uncased_en.md new file mode 100644 index 00000000000000..18546d8c97ca33 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_l10_h256_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_l10_h256_uncased BertEmbeddings from gaunernst +author: John Snow Labs +name: bert_l10_h256_uncased +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_l10_h256_uncased` is a English model originally trained by gaunernst. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_l10_h256_uncased_en_5.5.0_3.0_1726734315861.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_l10_h256_uncased_en_5.5.0_3.0_1726734315861.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_l10_h256_uncased","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_l10_h256_uncased","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_l10_h256_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|59.6 MB| + +## References + +https://huggingface.co/gaunernst/bert-L10-H256-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_l12_h256_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_l12_h256_uncased_pipeline_en.md new file mode 100644 index 00000000000000..f70508c8bceeeb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_l12_h256_uncased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_l12_h256_uncased_pipeline pipeline BertEmbeddings from gaunernst +author: John Snow Labs +name: bert_l12_h256_uncased_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_l12_h256_uncased_pipeline` is a English model originally trained by gaunernst. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_l12_h256_uncased_pipeline_en_5.5.0_3.0_1726744700213.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_l12_h256_uncased_pipeline_en_5.5.0_3.0_1726744700213.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_l12_h256_uncased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_l12_h256_uncased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_l12_h256_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|65.5 MB| + +## References + +https://huggingface.co/gaunernst/bert-L12-H256-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_question_answering_squad_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_question_answering_squad_en.md new file mode 100644 index 00000000000000..ca045830151578 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_question_answering_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_question_answering_squad BertForQuestionAnswering from MattBoraske +author: John Snow Labs +name: bert_question_answering_squad +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_question_answering_squad` is a English model originally trained by MattBoraske. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_question_answering_squad_en_5.5.0_3.0_1726710011504.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_question_answering_squad_en_5.5.0_3.0_1726710011504.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_question_answering_squad","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_question_answering_squad", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_question_answering_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/MattBoraske/BERT-question-answering-SQuAD \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_racial_cross_validation_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_racial_cross_validation_en.md new file mode 100644 index 00000000000000..90421fe070f830 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_racial_cross_validation_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_racial_cross_validation DistilBertForSequenceClassification from jamnik99 +author: John Snow Labs +name: bert_racial_cross_validation +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_racial_cross_validation` is a English model originally trained by jamnik99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_racial_cross_validation_en_5.5.0_3.0_1726719083250.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_racial_cross_validation_en_5.5.0_3.0_1726719083250.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_racial_cross_validation","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_racial_cross_validation", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_racial_cross_validation| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jamnik99/BERT_racial_cross_validation \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_racial_cross_validation_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_racial_cross_validation_pipeline_en.md new file mode 100644 index 00000000000000..5b4eace3a17ebe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_racial_cross_validation_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_racial_cross_validation_pipeline pipeline DistilBertForSequenceClassification from jamnik99 +author: John Snow Labs +name: bert_racial_cross_validation_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_racial_cross_validation_pipeline` is a English model originally trained by jamnik99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_racial_cross_validation_pipeline_en_5.5.0_3.0_1726719096016.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_racial_cross_validation_pipeline_en_5.5.0_3.0_1726719096016.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_racial_cross_validation_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_racial_cross_validation_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
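The snippet above assumes an existing Spark DataFrame `df` with a `text` column and an active `spark` session. A minimal sketch of how that might look, assuming the `pipeline` object from the example above (the input row is made up for illustration):

```python
# Hypothetical input data; any DataFrame with a "text" column works.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

annotations = pipeline.transform(df)
# The classifier's output column is typically "class", but check annotations.columns to be sure.
annotations.select("class.result").show(truncate=False)
```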
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_racial_cross_validation_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jamnik99/BERT_racial_cross_validation + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_sbic_offensive_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_sbic_offensive_en.md new file mode 100644 index 00000000000000..aee1c573645ce1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_sbic_offensive_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_sbic_offensive BertForSequenceClassification from Cameron +author: John Snow Labs +name: bert_sbic_offensive +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_sbic_offensive` is a English model originally trained by Cameron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sbic_offensive_en_5.5.0_3.0_1726781767716.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sbic_offensive_en_5.5.0_3.0_1726781767716.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained model.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_sbic_offensive","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the pretrained model.
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_sbic_offensive", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sbic_offensive| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Cameron/BERT-SBIC-offensive \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_sxie3333_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_sxie3333_en.md new file mode 100644 index 00000000000000..5ec856ff3d40ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_sxie3333_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_sxie3333 BertForSequenceClassification from sxie3333 +author: John Snow Labs +name: bert_sxie3333 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_sxie3333` is a English model originally trained by sxie3333. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sxie3333_en_5.5.0_3.0_1726706948659.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sxie3333_en_5.5.0_3.0_1726706948659.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained model.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_sxie3333","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the pretrained model.
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_sxie3333", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sxie3333| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/sxie3333/BERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_sxie3333_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_sxie3333_pipeline_en.md new file mode 100644 index 00000000000000..6b407364537898 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_sxie3333_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_sxie3333_pipeline pipeline BertForSequenceClassification from sxie3333 +author: John Snow Labs +name: bert_sxie3333_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_sxie3333_pipeline` is a English model originally trained by sxie3333. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sxie3333_pipeline_en_5.5.0_3.0_1726706968327.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sxie3333_pipeline_en_5.5.0_3.0_1726706968327.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_sxie3333_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_sxie3333_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sxie3333_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/sxie3333/BERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_two_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_two_en.md new file mode 100644 index 00000000000000..d845c5c6a86334 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_two_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_two BertEmbeddings from emma7897 +author: John Snow Labs +name: bert_two +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_two` is a English model originally trained by emma7897. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_two_en_5.5.0_3.0_1726744584268.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_two_en_5.5.0_3.0_1726744584268.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_two","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_two","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
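To look at the token-level vectors the annotator produces, you can explode the `embeddings` annotations. A minimal sketch, assuming the `pipelineDF` from the example above:

```python
from pyspark.sql.functions import explode, col

# One row per token: the token text and its embedding vector.
pipelineDF.select(explode(col("embeddings")).alias("emb")) \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(truncate=80)
```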
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_two| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/emma7897/bert_two \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_two_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_two_pipeline_en.md new file mode 100644 index 00000000000000..b9b988bd70cedb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_two_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_two_pipeline pipeline BertEmbeddings from emma7897 +author: John Snow Labs +name: bert_two_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_two_pipeline` is a English model originally trained by emma7897. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_two_pipeline_en_5.5.0_3.0_1726744603574.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_two_pipeline_en_5.5.0_3.0_1726744603574.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_two_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_two_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_two_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/emma7897/bert_two + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_vllm_gemma2b_llmoversight_1_0_nodropsus_13_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_vllm_gemma2b_llmoversight_1_0_nodropsus_13_en.md new file mode 100644 index 00000000000000..12be4708c19ee5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_vllm_gemma2b_llmoversight_1_0_nodropsus_13_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_vllm_gemma2b_llmoversight_1_0_nodropsus_13 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_vllm_gemma2b_llmoversight_1_0_nodropsus_13 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_vllm_gemma2b_llmoversight_1_0_nodropsus_13` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_llmoversight_1_0_nodropsus_13_en_5.5.0_3.0_1726719240069.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_llmoversight_1_0_nodropsus_13_en_5.5.0_3.0_1726719240069.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained model.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_vllm_gemma2b_llmoversight_1_0_nodropsus_13","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the pretrained model.
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_vllm_gemma2b_llmoversight_1_0_nodropsus_13", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_vllm_gemma2b_llmoversight_1_0_nodropsus_13| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_vllm-gemma2b-llmOversight-1.0-noDropSus_13 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_vllm_gemma2b_llmoversight_1_0_nodropsus_13_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_vllm_gemma2b_llmoversight_1_0_nodropsus_13_pipeline_en.md new file mode 100644 index 00000000000000..0f2cc241aa5c8e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_vllm_gemma2b_llmoversight_1_0_nodropsus_13_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_vllm_gemma2b_llmoversight_1_0_nodropsus_13_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_vllm_gemma2b_llmoversight_1_0_nodropsus_13_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_vllm_gemma2b_llmoversight_1_0_nodropsus_13_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_llmoversight_1_0_nodropsus_13_pipeline_en_5.5.0_3.0_1726719252584.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_llmoversight_1_0_nodropsus_13_pipeline_en_5.5.0_3.0_1726719252584.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_vllm_gemma2b_llmoversight_1_0_nodropsus_13_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_vllm_gemma2b_llmoversight_1_0_nodropsus_13_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_vllm_gemma2b_llmoversight_1_0_nodropsus_13_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_vllm-gemma2b-llmOversight-1.0-noDropSus_13 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_vllm_gemma2b_stringmatcher_newdataset_2_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_vllm_gemma2b_stringmatcher_newdataset_2_en.md new file mode 100644 index 00000000000000..d98939b614d987 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_vllm_gemma2b_stringmatcher_newdataset_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_vllm_gemma2b_stringmatcher_newdataset_2 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_vllm_gemma2b_stringmatcher_newdataset_2 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_vllm_gemma2b_stringmatcher_newdataset_2` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_stringmatcher_newdataset_2_en_5.5.0_3.0_1726763564004.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_stringmatcher_newdataset_2_en_5.5.0_3.0_1726763564004.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained model.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_vllm_gemma2b_stringmatcher_newdataset_2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the pretrained model.
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_vllm_gemma2b_stringmatcher_newdataset_2", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_vllm_gemma2b_stringmatcher_newdataset_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_vllm-gemma2b-stringMatcher-newDataset_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_vllm_gemma2b_stringmatcher_newdataset_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_vllm_gemma2b_stringmatcher_newdataset_2_pipeline_en.md new file mode 100644 index 00000000000000..2b3297e1812e20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_vllm_gemma2b_stringmatcher_newdataset_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_vllm_gemma2b_stringmatcher_newdataset_2_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_vllm_gemma2b_stringmatcher_newdataset_2_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_vllm_gemma2b_stringmatcher_newdataset_2_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_stringmatcher_newdataset_2_pipeline_en_5.5.0_3.0_1726763576248.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_stringmatcher_newdataset_2_pipeline_en_5.5.0_3.0_1726763576248.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_vllm_gemma2b_stringmatcher_newdataset_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_vllm_gemma2b_stringmatcher_newdataset_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_vllm_gemma2b_stringmatcher_newdataset_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_vllm-gemma2b-stringMatcher-newDataset_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bertbased_hatespeech_pretrain_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bertbased_hatespeech_pretrain_pipeline_en.md new file mode 100644 index 00000000000000..7611408bf2906c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bertbased_hatespeech_pretrain_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bertbased_hatespeech_pretrain_pipeline pipeline BertEmbeddings from agvidit1 +author: John Snow Labs +name: bertbased_hatespeech_pretrain_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertbased_hatespeech_pretrain_pipeline` is a English model originally trained by agvidit1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertbased_hatespeech_pretrain_pipeline_en_5.5.0_3.0_1726705484685.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertbased_hatespeech_pretrain_pipeline_en_5.5.0_3.0_1726705484685.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bertbased_hatespeech_pretrain_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bertbased_hatespeech_pretrain_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertbased_hatespeech_pretrain_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/agvidit1/BertBased_HateSpeech_pretrain + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bfsee_en.md b/docs/_posts/ahmedlone127/2024-09-19-bfsee_en.md new file mode 100644 index 00000000000000..4705369c130abf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bfsee_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bfsee DistilBertForSequenceClassification from talktojustintoday +author: John Snow Labs +name: bfsee +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bfsee` is a English model originally trained by talktojustintoday. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bfsee_en_5.5.0_3.0_1726763878963.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bfsee_en_5.5.0_3.0_1726763878963.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained model.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("bfsee","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the pretrained model.
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bfsee", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bfsee| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.7 MB| + +## References + +https://huggingface.co/talktojustintoday/bfsee \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bfsee_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bfsee_pipeline_en.md new file mode 100644 index 00000000000000..745d8ce323aa41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bfsee_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bfsee_pipeline pipeline DistilBertForSequenceClassification from talktojustintoday +author: John Snow Labs +name: bfsee_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bfsee_pipeline` is a English model originally trained by talktojustintoday. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bfsee_pipeline_en_5.5.0_3.0_1726763904228.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bfsee_pipeline_en_5.5.0_3.0_1726763904228.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bfsee_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bfsee_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bfsee_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.7 MB| + +## References + +https://huggingface.co/talktojustintoday/bfsee + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bge_large_eedi_2024_en.md b/docs/_posts/ahmedlone127/2024-09-19-bge_large_eedi_2024_en.md new file mode 100644 index 00000000000000..16e492a2781b22 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bge_large_eedi_2024_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_eedi_2024 BGEEmbeddings from Gurveer05 +author: John Snow Labs +name: bge_large_eedi_2024 +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_large_eedi_2024` is a English model originally trained by Gurveer05. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_eedi_2024_en_5.5.0_3.0_1726764715725.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_eedi_2024_en_5.5.0_3.0_1726764715725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, then compute sentence embeddings with the pretrained BGE model.
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

embeddings = BGEEmbeddings.pretrained("bge_large_eedi_2024","en") \
    .setInputCols(["document"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.embeddings.BGEEmbeddings
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, then compute sentence embeddings with the pretrained BGE model.
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val embeddings = BGEEmbeddings.pretrained("bge_large_eedi_2024","en")
    .setInputCols(Array("document"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
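BGE produces sentence-level vectors, so each input document yields a single embedding in the `embeddings` column. A minimal sketch for pulling it out, assuming the `pipelineDF` from the example above:

```python
# Each annotation carries the full sentence vector in its `embeddings` field.
pipelineDF.selectExpr("explode(embeddings.embeddings) as vector").show(truncate=80)
```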
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_eedi_2024| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Gurveer05/bge-large-eedi-2024 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bge_large_repmus_matryoshka_en.md b/docs/_posts/ahmedlone127/2024-09-19-bge_large_repmus_matryoshka_en.md new file mode 100644 index 00000000000000..96607e5804accb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bge_large_repmus_matryoshka_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_repmus_matryoshka BGEEmbeddings from tessimago +author: John Snow Labs +name: bge_large_repmus_matryoshka +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_large_repmus_matryoshka` is a English model originally trained by tessimago. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_repmus_matryoshka_en_5.5.0_3.0_1726720225267.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_repmus_matryoshka_en_5.5.0_3.0_1726720225267.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, then compute sentence embeddings with the pretrained BGE model.
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

embeddings = BGEEmbeddings.pretrained("bge_large_repmus_matryoshka","en") \
    .setInputCols(["document"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.embeddings.BGEEmbeddings
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, then compute sentence embeddings with the pretrained BGE model.
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val embeddings = BGEEmbeddings.pretrained("bge_large_repmus_matryoshka","en")
    .setInputCols(Array("document"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_repmus_matryoshka| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/tessimago/bge-large-repmus-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bge_large_repmus_matryoshka_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bge_large_repmus_matryoshka_pipeline_en.md new file mode 100644 index 00000000000000..e00658796151c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bge_large_repmus_matryoshka_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_repmus_matryoshka_pipeline pipeline BGEEmbeddings from tessimago +author: John Snow Labs +name: bge_large_repmus_matryoshka_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_large_repmus_matryoshka_pipeline` is a English model originally trained by tessimago. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_repmus_matryoshka_pipeline_en_5.5.0_3.0_1726720294653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_repmus_matryoshka_pipeline_en_5.5.0_3.0_1726720294653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_repmus_matryoshka_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_repmus_matryoshka_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_repmus_matryoshka_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/tessimago/bge-large-repmus-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bluebert_pubmed_mimic_uncased_l_12_h_768_a_12_finetuned_squad_en.md b/docs/_posts/ahmedlone127/2024-09-19-bluebert_pubmed_mimic_uncased_l_12_h_768_a_12_finetuned_squad_en.md new file mode 100644 index 00000000000000..7d97922fdd5a37 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bluebert_pubmed_mimic_uncased_l_12_h_768_a_12_finetuned_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bluebert_pubmed_mimic_uncased_l_12_h_768_a_12_finetuned_squad BertForQuestionAnswering from rsml +author: John Snow Labs +name: bluebert_pubmed_mimic_uncased_l_12_h_768_a_12_finetuned_squad +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bluebert_pubmed_mimic_uncased_l_12_h_768_a_12_finetuned_squad` is a English model originally trained by rsml. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bluebert_pubmed_mimic_uncased_l_12_h_768_a_12_finetuned_squad_en_5.5.0_3.0_1726765295557.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bluebert_pubmed_mimic_uncased_l_12_h_768_a_12_finetuned_squad_en_5.5.0_3.0_1726765295557.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble question/context pairs, then extract the answer span with the pretrained model.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bluebert_pubmed_mimic_uncased_l_12_h_768_a_12_finetuned_squad","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble question/context pairs, then extract the answer span with the pretrained model.
val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bluebert_pubmed_mimic_uncased_l_12_h_768_a_12_finetuned_squad", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
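Once the pipeline has run, the extracted span sits in the `answer` column. A minimal sketch, assuming the `pipelineDF` from the example above:

```python
# Question alongside the span the model extracted from the context.
pipelineDF.select("document_question.result", "answer.result").show(truncate=False)
```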
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bluebert_pubmed_mimic_uncased_l_12_h_768_a_12_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/rsml/bluebert_pubmed_mimic_uncased_L-12_H-768_A-12-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-brwac_v1_3__checkpoint4_en.md b/docs/_posts/ahmedlone127/2024-09-19-brwac_v1_3__checkpoint4_en.md new file mode 100644 index 00000000000000..f0aebe37ca2c3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-brwac_v1_3__checkpoint4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English brwac_v1_3__checkpoint4 RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: brwac_v1_3__checkpoint4 +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`brwac_v1_3__checkpoint4` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/brwac_v1_3__checkpoint4_en_5.5.0_3.0_1726747028178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/brwac_v1_3__checkpoint4_en_5.5.0_3.0_1726747028178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("brwac_v1_3__checkpoint4","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("brwac_v1_3__checkpoint4","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|brwac_v1_3__checkpoint4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|298.7 MB| + +## References + +https://huggingface.co/eduagarcia-temp/brwac_v1_3__checkpoint4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-brwac_v1_3__checkpoint4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-brwac_v1_3__checkpoint4_pipeline_en.md new file mode 100644 index 00000000000000..d6f260e939284b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-brwac_v1_3__checkpoint4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English brwac_v1_3__checkpoint4_pipeline pipeline RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: brwac_v1_3__checkpoint4_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`brwac_v1_3__checkpoint4_pipeline` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/brwac_v1_3__checkpoint4_pipeline_en_5.5.0_3.0_1726747119424.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/brwac_v1_3__checkpoint4_pipeline_en_5.5.0_3.0_1726747119424.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("brwac_v1_3__checkpoint4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("brwac_v1_3__checkpoint4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|brwac_v1_3__checkpoint4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|298.7 MB| + +## References + +https://huggingface.co/eduagarcia-temp/brwac_v1_3__checkpoint4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-brwac_v1_5__checkpoint_17_62000_en.md b/docs/_posts/ahmedlone127/2024-09-19-brwac_v1_5__checkpoint_17_62000_en.md new file mode 100644 index 00000000000000..0254fdfb111cd3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-brwac_v1_5__checkpoint_17_62000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English brwac_v1_5__checkpoint_17_62000 RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: brwac_v1_5__checkpoint_17_62000 +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`brwac_v1_5__checkpoint_17_62000` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/brwac_v1_5__checkpoint_17_62000_en_5.5.0_3.0_1726778555961.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/brwac_v1_5__checkpoint_17_62000_en_5.5.0_3.0_1726778555961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("brwac_v1_5__checkpoint_17_62000","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("brwac_v1_5__checkpoint_17_62000","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|brwac_v1_5__checkpoint_17_62000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.1 MB| + +## References + +https://huggingface.co/eduagarcia-temp/brwac_v1_5__checkpoint_17_62000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-brwac_v1_5__checkpoint_17_62000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-brwac_v1_5__checkpoint_17_62000_pipeline_en.md new file mode 100644 index 00000000000000..2709afa64033bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-brwac_v1_5__checkpoint_17_62000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English brwac_v1_5__checkpoint_17_62000_pipeline pipeline RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: brwac_v1_5__checkpoint_17_62000_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`brwac_v1_5__checkpoint_17_62000_pipeline` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/brwac_v1_5__checkpoint_17_62000_pipeline_en_5.5.0_3.0_1726778644134.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/brwac_v1_5__checkpoint_17_62000_pipeline_en_5.5.0_3.0_1726778644134.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("brwac_v1_5__checkpoint_17_62000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("brwac_v1_5__checkpoint_17_62000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|brwac_v1_5__checkpoint_17_62000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.1 MB| + +## References + +https://huggingface.co/eduagarcia-temp/brwac_v1_5__checkpoint_17_62000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bsc_bio_ehr_spanish_symptemist_fasttext_9_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bsc_bio_ehr_spanish_symptemist_fasttext_9_ner_pipeline_en.md new file mode 100644 index 00000000000000..7a825b8db7f15d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bsc_bio_ehr_spanish_symptemist_fasttext_9_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bsc_bio_ehr_spanish_symptemist_fasttext_9_ner_pipeline pipeline RoBertaForTokenClassification from Rodrigo1771 +author: John Snow Labs +name: bsc_bio_ehr_spanish_symptemist_fasttext_9_ner_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_symptemist_fasttext_9_ner_pipeline` is a English model originally trained by Rodrigo1771. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_fasttext_9_ner_pipeline_en_5.5.0_3.0_1726730279214.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_fasttext_9_ner_pipeline_en_5.5.0_3.0_1726730279214.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Download the pretrained pipeline and run it over a DataFrame with a "text" column
+pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_symptemist_fasttext_9_ner_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Download the pretrained pipeline and run it over a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_symptemist_fasttext_9_ner_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_symptemist_fasttext_9_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|434.9 MB| + +## References + +https://huggingface.co/Rodrigo1771/bsc-bio-ehr-es-symptemist-fasttext-9-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_abdelrahman_hassan_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_abdelrahman_hassan_1_pipeline_en.md new file mode 100644 index 00000000000000..4ec4ca7648d23e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_abdelrahman_hassan_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_abdelrahman_hassan_1_pipeline pipeline DistilBertForSequenceClassification from Abdelrahman-Hassan-1 +author: John Snow Labs +name: burmese_awesome_model_abdelrahman_hassan_1_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_abdelrahman_hassan_1_pipeline` is a English model originally trained by Abdelrahman-Hassan-1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_abdelrahman_hassan_1_pipeline_en_5.5.0_3.0_1726741016534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_abdelrahman_hassan_1_pipeline_en_5.5.0_3.0_1726741016534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Download the pretrained pipeline and run it over a DataFrame with a "text" column
+pipeline = PretrainedPipeline("burmese_awesome_model_abdelrahman_hassan_1_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Download the pretrained pipeline and run it over a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("burmese_awesome_model_abdelrahman_hassan_1_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_abdelrahman_hassan_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Abdelrahman-Hassan-1/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_adithya5243_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_adithya5243_en.md new file mode 100644 index 00000000000000..e2478d8b067e73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_adithya5243_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_adithya5243 DistilBertForSequenceClassification from adithya5243 +author: John Snow Labs +name: burmese_awesome_model_adithya5243 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_adithya5243` is a English model originally trained by adithya5243. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_adithya5243_en_5.5.0_3.0_1726704469224.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_adithya5243_en_5.5.0_3.0_1726704469224.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_adithya5243","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_adithya5243", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
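+
+The predicted label lands in the `result` field of the `class` annotation column configured above; a minimal sketch for reading it back out of `pipelineDF`:
+
+```python
+# Each row carries an array of annotations; "class.result" extracts the predicted labels
+pipelineDF.select("text", "class.result").show(truncate=False)
+```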
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_adithya5243| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/adithya5243/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_bazgha_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_bazgha_en.md new file mode 100644 index 00000000000000..0bd91346abe204 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_bazgha_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_bazgha DistilBertForSequenceClassification from bazgha +author: John Snow Labs +name: burmese_awesome_model_bazgha +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_bazgha` is a English model originally trained by bazgha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_bazgha_en_5.5.0_3.0_1726742602513.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_bazgha_en_5.5.0_3.0_1726742602513.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_bazgha","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_bazgha", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_bazgha| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bazgha/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_bazgha_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_bazgha_pipeline_en.md new file mode 100644 index 00000000000000..ab0afb8b2ecf27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_bazgha_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_bazgha_pipeline pipeline DistilBertForSequenceClassification from bazgha +author: John Snow Labs +name: burmese_awesome_model_bazgha_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_bazgha_pipeline` is a English model originally trained by bazgha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_bazgha_pipeline_en_5.5.0_3.0_1726742614799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_bazgha_pipeline_en_5.5.0_3.0_1726742614799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Download the pretrained pipeline and run it over a DataFrame with a "text" column
+pipeline = PretrainedPipeline("burmese_awesome_model_bazgha_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Download the pretrained pipeline and run it over a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("burmese_awesome_model_bazgha_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_bazgha_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bazgha/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_fold_5_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_fold_5_en.md new file mode 100644 index 00000000000000..46f2ae4ce9b1b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_fold_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_fold_5 DistilBertForSequenceClassification from Thebisso09 +author: John Snow Labs +name: burmese_awesome_model_fold_5 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_fold_5` is a English model originally trained by Thebisso09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_fold_5_en_5.5.0_3.0_1726763700122.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_fold_5_en_5.5.0_3.0_1726763700122.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_fold_5","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_fold_5", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_fold_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Thebisso09/my_awesome_model_fold_5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_fold_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_fold_5_pipeline_en.md new file mode 100644 index 00000000000000..8ad301bfc05c55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_fold_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_fold_5_pipeline pipeline DistilBertForSequenceClassification from Thebisso09 +author: John Snow Labs +name: burmese_awesome_model_fold_5_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_fold_5_pipeline` is a English model originally trained by Thebisso09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_fold_5_pipeline_en_5.5.0_3.0_1726763713233.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_fold_5_pipeline_en_5.5.0_3.0_1726763713233.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Download the pretrained pipeline and run it over a DataFrame with a "text" column
+pipeline = PretrainedPipeline("burmese_awesome_model_fold_5_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Download the pretrained pipeline and run it over a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("burmese_awesome_model_fold_5_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_fold_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Thebisso09/my_awesome_model_fold_5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_highcodger10_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_highcodger10_en.md new file mode 100644 index 00000000000000..21daa670b0049d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_highcodger10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_highcodger10 DistilBertForSequenceClassification from highcodger10 +author: John Snow Labs +name: burmese_awesome_model_highcodger10 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_highcodger10` is a English model originally trained by highcodger10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_highcodger10_en_5.5.0_3.0_1726704791725.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_highcodger10_en_5.5.0_3.0_1726704791725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_highcodger10","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_highcodger10", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_highcodger10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/highcodger10/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_highcodger10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_highcodger10_pipeline_en.md new file mode 100644 index 00000000000000..ef73f79c1841ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_highcodger10_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_highcodger10_pipeline pipeline DistilBertForSequenceClassification from highcodger10 +author: John Snow Labs +name: burmese_awesome_model_highcodger10_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_highcodger10_pipeline` is a English model originally trained by highcodger10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_highcodger10_pipeline_en_5.5.0_3.0_1726704804062.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_highcodger10_pipeline_en_5.5.0_3.0_1726704804062.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Download the pretrained pipeline and run it over a DataFrame with a "text" column
+pipeline = PretrainedPipeline("burmese_awesome_model_highcodger10_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Download the pretrained pipeline and run it over a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("burmese_awesome_model_highcodger10_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_highcodger10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/highcodger10/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_jacobmakar_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_jacobmakar_en.md new file mode 100644 index 00000000000000..20ada008bc5518 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_jacobmakar_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_jacobmakar DistilBertForSequenceClassification from jacobmakar +author: John Snow Labs +name: burmese_awesome_model_jacobmakar +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_jacobmakar` is a English model originally trained by jacobmakar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_jacobmakar_en_5.5.0_3.0_1726704483168.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_jacobmakar_en_5.5.0_3.0_1726704483168.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_jacobmakar","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_jacobmakar", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_jacobmakar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jacobmakar/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_kiliemah_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_kiliemah_en.md new file mode 100644 index 00000000000000..86df50033fcd89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_kiliemah_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_kiliemah DistilBertForSequenceClassification from Kiliemah +author: John Snow Labs +name: burmese_awesome_model_kiliemah +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_kiliemah` is a English model originally trained by Kiliemah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_kiliemah_en_5.5.0_3.0_1726763346429.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_kiliemah_en_5.5.0_3.0_1726763346429.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_kiliemah","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_kiliemah", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
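+
+For quick, single-string inference without building a Spark DataFrame, the fitted model can also be wrapped in a `LightPipeline`. A minimal sketch, reusing `pipelineModel` from the example above:
+
+```python
+from sparknlp.base import LightPipeline
+
+# annotate() returns a plain dict mapping output column names to their results
+light = LightPipeline(pipelineModel)
+print(light.annotate("I love spark-nlp"))
+```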
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_kiliemah| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Kiliemah/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_kiliemah_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_kiliemah_pipeline_en.md new file mode 100644 index 00000000000000..b514defbafbfdb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_kiliemah_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_kiliemah_pipeline pipeline DistilBertForSequenceClassification from Kiliemah +author: John Snow Labs +name: burmese_awesome_model_kiliemah_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_kiliemah_pipeline` is a English model originally trained by Kiliemah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_kiliemah_pipeline_en_5.5.0_3.0_1726763361299.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_kiliemah_pipeline_en_5.5.0_3.0_1726763361299.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Download the pretrained pipeline and run it over a DataFrame with a "text" column
+pipeline = PretrainedPipeline("burmese_awesome_model_kiliemah_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Download the pretrained pipeline and run it over a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("burmese_awesome_model_kiliemah_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_kiliemah_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Kiliemah/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_lxl2023_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_lxl2023_en.md new file mode 100644 index 00000000000000..f2926609cd8a25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_lxl2023_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_lxl2023 DistilBertForSequenceClassification from lxl2023 +author: John Snow Labs +name: burmese_awesome_model_lxl2023 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_lxl2023` is a English model originally trained by lxl2023. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_lxl2023_en_5.5.0_3.0_1726743079364.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_lxl2023_en_5.5.0_3.0_1726743079364.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_lxl2023","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_lxl2023", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_lxl2023| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lxl2023/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_lxl2023_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_lxl2023_pipeline_en.md new file mode 100644 index 00000000000000..d2983b22f66d87 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_lxl2023_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_lxl2023_pipeline pipeline DistilBertForSequenceClassification from lxl2023 +author: John Snow Labs +name: burmese_awesome_model_lxl2023_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_lxl2023_pipeline` is a English model originally trained by lxl2023. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_lxl2023_pipeline_en_5.5.0_3.0_1726743091921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_lxl2023_pipeline_en_5.5.0_3.0_1726743091921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Download the pretrained pipeline and run it over a DataFrame with a "text" column
+pipeline = PretrainedPipeline("burmese_awesome_model_lxl2023_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Download the pretrained pipeline and run it over a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("burmese_awesome_model_lxl2023_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_lxl2023_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lxl2023/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_philgrey_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_philgrey_en.md new file mode 100644 index 00000000000000..d06ebeaedfea39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_philgrey_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_philgrey DistilBertForSequenceClassification from philgrey +author: John Snow Labs +name: burmese_awesome_model_philgrey +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_philgrey` is a English model originally trained by philgrey. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_philgrey_en_5.5.0_3.0_1726741016052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_philgrey_en_5.5.0_3.0_1726741016052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_philgrey","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_philgrey", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_philgrey| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/philgrey/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_philgrey_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_philgrey_pipeline_en.md new file mode 100644 index 00000000000000..25d91849731c92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_philgrey_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_philgrey_pipeline pipeline DistilBertForSequenceClassification from philgrey +author: John Snow Labs +name: burmese_awesome_model_philgrey_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_philgrey_pipeline` is a English model originally trained by philgrey. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_philgrey_pipeline_en_5.5.0_3.0_1726741028275.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_philgrey_pipeline_en_5.5.0_3.0_1726741028275.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Download the pretrained pipeline and run it over a DataFrame with a "text" column
+pipeline = PretrainedPipeline("burmese_awesome_model_philgrey_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Download the pretrained pipeline and run it over a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("burmese_awesome_model_philgrey_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
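+
+Besides `transform`, a downloaded `PretrainedPipeline` can also annotate a plain string directly, which is handy for quick spot checks. A minimal sketch, reusing the `pipeline` object above:
+
+```python
+# Returns a dict keyed by the pipeline's output column names
+print(pipeline.annotate("I love spark-nlp"))
+```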
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_philgrey_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/philgrey/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_thomas628_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_thomas628_en.md new file mode 100644 index 00000000000000..5cedafb8615607 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_thomas628_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_thomas628 DistilBertForSequenceClassification from thomas628 +author: John Snow Labs +name: burmese_awesome_model_thomas628 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_thomas628` is a English model originally trained by thomas628. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_thomas628_en_5.5.0_3.0_1726719495480.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_thomas628_en_5.5.0_3.0_1726719495480.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_thomas628","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_thomas628", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_thomas628| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thomas628/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_thomas628_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_thomas628_pipeline_en.md new file mode 100644 index 00000000000000..4df803f16361fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_thomas628_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_thomas628_pipeline pipeline DistilBertForSequenceClassification from thomas628 +author: John Snow Labs +name: burmese_awesome_model_thomas628_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_thomas628_pipeline` is a English model originally trained by thomas628. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_thomas628_pipeline_en_5.5.0_3.0_1726719507847.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_thomas628_pipeline_en_5.5.0_3.0_1726719507847.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Download the pretrained pipeline and run it over a DataFrame with a "text" column
+pipeline = PretrainedPipeline("burmese_awesome_model_thomas628_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Download the pretrained pipeline and run it over a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("burmese_awesome_model_thomas628_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_thomas628_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thomas628/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_qa_model_colabdash_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_qa_model_colabdash_en.md new file mode 100644 index 00000000000000..d4c25e11bf2b61 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_qa_model_colabdash_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_colabdash DistilBertForQuestionAnswering from colabdash +author: John Snow Labs +name: burmese_awesome_qa_model_colabdash +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_colabdash` is a English model originally trained by colabdash. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_colabdash_en_5.5.0_3.0_1726727741660.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_colabdash_en_5.5.0_3.0_1726727741660.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_colabdash","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.MultiDocumentAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_colabdash", "en")
+    .setInputCols(Array("document_question", "document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
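+
+The extracted answer span is stored in the `result` field of the `answer` column set above; a minimal sketch to display it alongside the inputs:
+
+```python
+# "answer.result" pulls the predicted answer text out of the annotation structs
+pipelineDF.select("question", "context", "answer.result").show(truncate=False)
+```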
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_colabdash| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/colabdash/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_qa_model_torchat_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_qa_model_torchat_en.md new file mode 100644 index 00000000000000..5050680d1d369b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_qa_model_torchat_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_torchat DistilBertForQuestionAnswering from torchat +author: John Snow Labs +name: burmese_awesome_qa_model_torchat +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_torchat` is a English model originally trained by torchat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_torchat_en_5.5.0_3.0_1726785824150.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_torchat_en_5.5.0_3.0_1726785824150.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_torchat","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.MultiDocumentAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_torchat", "en")
+    .setInputCols(Array("document_question", "document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_torchat| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/torchat/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_qa_model_torchat_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_qa_model_torchat_pipeline_en.md new file mode 100644 index 00000000000000..5fc05a50eff80d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_qa_model_torchat_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_torchat_pipeline pipeline DistilBertForQuestionAnswering from torchat +author: John Snow Labs +name: burmese_awesome_qa_model_torchat_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_torchat_pipeline` is a English model originally trained by torchat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_torchat_pipeline_en_5.5.0_3.0_1726785835301.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_torchat_pipeline_en_5.5.0_3.0_1726785835301.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the question/context text columns that the
# pipeline's internal MultiDocumentAssembler expects
pipeline = PretrainedPipeline("burmese_awesome_qa_model_torchat_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame holding the question/context text columns that the
// pipeline's internal MultiDocumentAssembler expects
val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_torchat_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_torchat_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/torchat/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_distilbert_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_distilbert_imdb_en.md new file mode 100644 index 00000000000000..9c8f499862aa85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_distilbert_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_distilbert_imdb DistilBertForSequenceClassification from nnhwin +author: John Snow Labs +name: burmese_distilbert_imdb +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_distilbert_imdb` is a English model originally trained by nnhwin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_distilbert_imdb_en_5.5.0_3.0_1726741088665.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_distilbert_imdb_en_5.5.0_3.0_1726741088665.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_distilbert_imdb","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_distilbert_imdb", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
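To read the prediction back out, the `class` column produced above carries the predicted label in `result` and, typically, the per-label scores in the annotation `metadata`. A minimal sketch under those assumptions:

```python
# Predicted label for each input row
pipelineDF.selectExpr("explode(class.result) as label").show()

# Per-label scores are usually exposed in the annotation metadata
pipelineDF.selectExpr("explode(class.metadata) as scores").show(truncate=False)
```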
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_distilbert_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/nnhwin/my-distilbert-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_distilbert_imdb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_distilbert_imdb_pipeline_en.md new file mode 100644 index 00000000000000..495e519b72498f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_distilbert_imdb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_distilbert_imdb_pipeline pipeline DistilBertForSequenceClassification from nnhwin +author: John Snow Labs +name: burmese_distilbert_imdb_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_distilbert_imdb_pipeline` is a English model originally trained by nnhwin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_distilbert_imdb_pipeline_en_5.5.0_3.0_1726741101256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_distilbert_imdb_pipeline_en_5.5.0_3.0_1726741101256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_distilbert_imdb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_distilbert_imdb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
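The `df` referenced above is not defined in the snippet. A minimal end-to-end sketch, assuming the pipeline's bundled DocumentAssembler reads a column named `text` (the usual convention for these classification pipelines):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("burmese_distilbert_imdb_pipeline", lang="en")

# Assumption: the bundled DocumentAssembler expects a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
result = pipeline.transform(df)
result.select("class.result").show(truncate=False)

# For quick checks on a single string, annotate() returns a plain Python dict.
print(pipeline.annotate("I love spark-nlp"))
```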
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_distilbert_imdb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/nnhwin/my-distilbert-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-cat_sayula_popoluca_iw_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-cat_sayula_popoluca_iw_3_pipeline_en.md new file mode 100644 index 00000000000000..1decd9748f02ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-cat_sayula_popoluca_iw_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cat_sayula_popoluca_iw_3_pipeline pipeline XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_sayula_popoluca_iw_3_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_sayula_popoluca_iw_3_pipeline` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_sayula_popoluca_iw_3_pipeline_en_5.5.0_3.0_1726710899674.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_sayula_popoluca_iw_3_pipeline_en_5.5.0_3.0_1726710899674.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cat_sayula_popoluca_iw_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cat_sayula_popoluca_iw_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_sayula_popoluca_iw_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|424.4 MB| + +## References + +https://huggingface.co/homersimpson/cat-pos-iw-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-category_1_balanced_distilbert_base_uncased_v4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-category_1_balanced_distilbert_base_uncased_v4_pipeline_en.md new file mode 100644 index 00000000000000..7293bde7e44f9c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-category_1_balanced_distilbert_base_uncased_v4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English category_1_balanced_distilbert_base_uncased_v4_pipeline pipeline DistilBertForSequenceClassification from chuuhtetnaing +author: John Snow Labs +name: category_1_balanced_distilbert_base_uncased_v4_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`category_1_balanced_distilbert_base_uncased_v4_pipeline` is a English model originally trained by chuuhtetnaing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/category_1_balanced_distilbert_base_uncased_v4_pipeline_en_5.5.0_3.0_1726763462710.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/category_1_balanced_distilbert_base_uncased_v4_pipeline_en_5.5.0_3.0_1726763462710.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("category_1_balanced_distilbert_base_uncased_v4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("category_1_balanced_distilbert_base_uncased_v4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|category_1_balanced_distilbert_base_uncased_v4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chuuhtetnaing/category-1-balanced-distilbert-base-uncased-v4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-chinese_sentiment_analysis_fund_direction_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-19-chinese_sentiment_analysis_fund_direction_pipeline_zh.md new file mode 100644 index 00000000000000..b0fb9cb7a7c8b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-chinese_sentiment_analysis_fund_direction_pipeline_zh.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Chinese chinese_sentiment_analysis_fund_direction_pipeline pipeline BertForSequenceClassification from sanshizhang +author: John Snow Labs +name: chinese_sentiment_analysis_fund_direction_pipeline +date: 2024-09-19 +tags: [zh, open_source, pipeline, onnx] +task: Text Classification +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chinese_sentiment_analysis_fund_direction_pipeline` is a Chinese model originally trained by sanshizhang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chinese_sentiment_analysis_fund_direction_pipeline_zh_5.5.0_3.0_1726706970859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chinese_sentiment_analysis_fund_direction_pipeline_zh_5.5.0_3.0_1726706970859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("chinese_sentiment_analysis_fund_direction_pipeline", lang = "zh") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("chinese_sentiment_analysis_fund_direction_pipeline", lang = "zh") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chinese_sentiment_analysis_fund_direction_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|383.3 MB| + +## References + +https://huggingface.co/sanshizhang/Chinese-Sentiment-Analysis-Fund-Direction + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-chinese_sentiment_analysis_fund_direction_zh.md b/docs/_posts/ahmedlone127/2024-09-19-chinese_sentiment_analysis_fund_direction_zh.md new file mode 100644 index 00000000000000..2ea2c2bf13578e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-chinese_sentiment_analysis_fund_direction_zh.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Chinese chinese_sentiment_analysis_fund_direction BertForSequenceClassification from sanshizhang +author: John Snow Labs +name: chinese_sentiment_analysis_fund_direction +date: 2024-09-19 +tags: [zh, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chinese_sentiment_analysis_fund_direction` is a Chinese model originally trained by sanshizhang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chinese_sentiment_analysis_fund_direction_zh_5.5.0_3.0_1726706952776.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chinese_sentiment_analysis_fund_direction_zh_5.5.0_3.0_1726706952776.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("chinese_sentiment_analysis_fund_direction","zh") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("chinese_sentiment_analysis_fund_direction", "zh")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
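The generic example feeds English text into a model trained on Chinese fund-sentiment data; in practice you would pass Chinese input. A minimal sketch with an illustrative Chinese sentence (the sentence and any expected label are assumptions, not taken from the model card):

```python
# Illustrative Chinese input: "This fund has performed very well this year."
data = spark.createDataFrame([["今年这只基金的表现非常好"]]).toDF("text")
result = pipelineModel.transform(data)
result.select("class.result").show(truncate=False)
```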
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chinese_sentiment_analysis_fund_direction| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|zh| +|Size:|383.3 MB| + +## References + +https://huggingface.co/sanshizhang/Chinese-Sentiment-Analysis-Fund-Direction \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-cino_base_v2_tusa_en.md b/docs/_posts/ahmedlone127/2024-09-19-cino_base_v2_tusa_en.md new file mode 100644 index 00000000000000..5f23a699a47d17 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-cino_base_v2_tusa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cino_base_v2_tusa XlmRoBertaForSequenceClassification from UTibetNLP +author: John Snow Labs +name: cino_base_v2_tusa +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cino_base_v2_tusa` is a English model originally trained by UTibetNLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cino_base_v2_tusa_en_5.5.0_3.0_1726752070964.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cino_base_v2_tusa_en_5.5.0_3.0_1726752070964.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("cino_base_v2_tusa","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("cino_base_v2_tusa", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cino_base_v2_tusa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|712.2 MB| + +## References + +https://huggingface.co/UTibetNLP/cino-base-v2_TUSA \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-cino_base_v2_tusa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-cino_base_v2_tusa_pipeline_en.md new file mode 100644 index 00000000000000..a0dcaeff6bf97e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-cino_base_v2_tusa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cino_base_v2_tusa_pipeline pipeline XlmRoBertaForSequenceClassification from UTibetNLP +author: John Snow Labs +name: cino_base_v2_tusa_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cino_base_v2_tusa_pipeline` is a English model originally trained by UTibetNLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cino_base_v2_tusa_pipeline_en_5.5.0_3.0_1726752108920.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cino_base_v2_tusa_pipeline_en_5.5.0_3.0_1726752108920.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cino_base_v2_tusa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cino_base_v2_tusa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cino_base_v2_tusa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|712.2 MB| + +## References + +https://huggingface.co/UTibetNLP/cino-base-v2_TUSA + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-citation_polarity_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-19-citation_polarity_roberta_base_en.md new file mode 100644 index 00000000000000..8257043216a66c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-citation_polarity_roberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English citation_polarity_roberta_base RoBertaForSequenceClassification from coltekin +author: John Snow Labs +name: citation_polarity_roberta_base +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`citation_polarity_roberta_base` is a English model originally trained by coltekin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/citation_polarity_roberta_base_en_5.5.0_3.0_1726733629262.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/citation_polarity_roberta_base_en_5.5.0_3.0_1726733629262.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("citation_polarity_roberta_base","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("citation_polarity_roberta_base", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|citation_polarity_roberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|426.9 MB| + +## References + +https://huggingface.co/coltekin/citation-polarity-roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-climatebert_rereranker_f_cf_ipcc_en.md b/docs/_posts/ahmedlone127/2024-09-19-climatebert_rereranker_f_cf_ipcc_en.md new file mode 100644 index 00000000000000..415cc807f57538 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-climatebert_rereranker_f_cf_ipcc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English climatebert_rereranker_f_cf_ipcc RoBertaForSequenceClassification from iestynmullinor +author: John Snow Labs +name: climatebert_rereranker_f_cf_ipcc +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`climatebert_rereranker_f_cf_ipcc` is a English model originally trained by iestynmullinor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/climatebert_rereranker_f_cf_ipcc_en_5.5.0_3.0_1726726218620.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/climatebert_rereranker_f_cf_ipcc_en_5.5.0_3.0_1726726218620.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("climatebert_rereranker_f_cf_ipcc","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("climatebert_rereranker_f_cf_ipcc", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|climatebert_rereranker_f_cf_ipcc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.6 MB| + +## References + +https://huggingface.co/iestynmullinor/climatebert-rereranker-f-cf-ipcc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-code_bert_small_finetuned_v2_en.md b/docs/_posts/ahmedlone127/2024-09-19-code_bert_small_finetuned_v2_en.md new file mode 100644 index 00000000000000..48a4bbee8f25ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-code_bert_small_finetuned_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English code_bert_small_finetuned_v2 RoBertaEmbeddings from mshn74 +author: John Snow Labs +name: code_bert_small_finetuned_v2 +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`code_bert_small_finetuned_v2` is a English model originally trained by mshn74. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/code_bert_small_finetuned_v2_en_5.5.0_3.0_1726747082491.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/code_bert_small_finetuned_v2_en_5.5.0_3.0_1726747082491.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("code_bert_small_finetuned_v2","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("code_bert_small_finetuned_v2","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
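If you need plain vectors rather than annotation structs, Spark NLP's `EmbeddingsFinisher` can flatten the `embeddings` column produced above into Spark ML vectors. A minimal sketch, reusing `pipelineDF` from the example:

```python
from sparknlp.base import EmbeddingsFinisher

finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

# One vector per token of the input text
finisher.transform(pipelineDF) \
    .selectExpr("explode(finished_embeddings) as token_vector") \
    .show(truncate=True)
```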
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|code_bert_small_finetuned_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.8 MB| + +## References + +https://huggingface.co/mshn74/code_bert_small-finetuned-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-code_bert_small_finetuned_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-code_bert_small_finetuned_v2_pipeline_en.md new file mode 100644 index 00000000000000..4df2f49b6f97a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-code_bert_small_finetuned_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English code_bert_small_finetuned_v2_pipeline pipeline RoBertaEmbeddings from mshn74 +author: John Snow Labs +name: code_bert_small_finetuned_v2_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`code_bert_small_finetuned_v2_pipeline` is a English model originally trained by mshn74. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/code_bert_small_finetuned_v2_pipeline_en_5.5.0_3.0_1726747098257.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/code_bert_small_finetuned_v2_pipeline_en_5.5.0_3.0_1726747098257.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("code_bert_small_finetuned_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("code_bert_small_finetuned_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|code_bert_small_finetuned_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.8 MB| + +## References + +https://huggingface.co/mshn74/code_bert_small-finetuned-v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-coha1880s_en.md b/docs/_posts/ahmedlone127/2024-09-19-coha1880s_en.md new file mode 100644 index 00000000000000..cbf92813d4441f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-coha1880s_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English coha1880s RoBertaEmbeddings from simonmun +author: John Snow Labs +name: coha1880s +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`coha1880s` is a English model originally trained by simonmun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/coha1880s_en_5.5.0_3.0_1726778015563.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/coha1880s_en_5.5.0_3.0_1726778015563.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("coha1880s","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("coha1880s","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|coha1880s| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.8 MB| + +## References + +https://huggingface.co/simonmun/COHA1880s \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-coha1880s_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-coha1880s_pipeline_en.md new file mode 100644 index 00000000000000..75efa16542d781 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-coha1880s_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English coha1880s_pipeline pipeline RoBertaEmbeddings from simonmun +author: John Snow Labs +name: coha1880s_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`coha1880s_pipeline` is a English model originally trained by simonmun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/coha1880s_pipeline_en_5.5.0_3.0_1726778035696.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/coha1880s_pipeline_en_5.5.0_3.0_1726778035696.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("coha1880s_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("coha1880s_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|coha1880s_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.9 MB| + +## References + +https://huggingface.co/simonmun/COHA1880s + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-cold_fusion_itr25_seed1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-cold_fusion_itr25_seed1_pipeline_en.md new file mode 100644 index 00000000000000..9865a0b39b6080 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-cold_fusion_itr25_seed1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cold_fusion_itr25_seed1_pipeline pipeline RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr25_seed1_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr25_seed1_pipeline` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr25_seed1_pipeline_en_5.5.0_3.0_1726726274550.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr25_seed1_pipeline_en_5.5.0_3.0_1726726274550.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cold_fusion_itr25_seed1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cold_fusion_itr25_seed1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr25_seed1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr25-seed1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-common_voice_lt.md b/docs/_posts/ahmedlone127/2024-09-19-common_voice_lt.md new file mode 100644 index 00000000000000..91e0aadbfd3116 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-common_voice_lt.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Lithuanian common_voice WhisperForCTC from Tomas1234 +author: John Snow Labs +name: common_voice +date: 2024-09-19 +tags: [lt, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: lt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`common_voice` is a Lithuanian model originally trained by Tomas1234. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/common_voice_lt_5.5.0_3.0_1726757233936.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/common_voice_lt_5.5.0_3.0_1726757233936.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("common_voice","lt") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# data: a DataFrame with an "audio_content" column holding the raw audio floats
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("common_voice", "lt")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

// data: a DataFrame with an "audio_content" column holding the raw audio floats
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
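The `data` variable in the example is assumed to already exist. WhisperForCTC consumes raw audio floats through the `audio_content` column read by the AudioAssembler; a minimal sketch of building that DataFrame, where the file path and the use of librosa are assumptions about your environment:

```python
import librosa

# Whisper checkpoints expect 16 kHz mono audio
audio, _ = librosa.load("sample.flac", sr=16000, mono=True)

data = spark.createDataFrame([[audio.tolist()]]).toDF("audio_content")

result = pipelineModel.transform(data)
result.select("text.result").show(truncate=False)
```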
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|common_voice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|lt| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Tomas1234/common_voice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-common_voice_pipeline_lt.md b/docs/_posts/ahmedlone127/2024-09-19-common_voice_pipeline_lt.md new file mode 100644 index 00000000000000..e84b5b5e79c24d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-common_voice_pipeline_lt.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Lithuanian common_voice_pipeline pipeline WhisperForCTC from Tomas1234 +author: John Snow Labs +name: common_voice_pipeline +date: 2024-09-19 +tags: [lt, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: lt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`common_voice_pipeline` is a Lithuanian model originally trained by Tomas1234. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/common_voice_pipeline_lt_5.5.0_3.0_1726757318113.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/common_voice_pipeline_lt_5.5.0_3.0_1726757318113.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("common_voice_pipeline", lang = "lt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("common_voice_pipeline", lang = "lt") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|common_voice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|lt| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Tomas1234/common_voice + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-conflibert_scr_cased_en.md b/docs/_posts/ahmedlone127/2024-09-19-conflibert_scr_cased_en.md new file mode 100644 index 00000000000000..e5c18954f97527 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-conflibert_scr_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English conflibert_scr_cased BertEmbeddings from snowood1 +author: John Snow Labs +name: conflibert_scr_cased +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`conflibert_scr_cased` is a English model originally trained by snowood1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/conflibert_scr_cased_en_5.5.0_3.0_1726717422367.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/conflibert_scr_cased_en_5.5.0_3.0_1726717422367.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("conflibert_scr_cased","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("conflibert_scr_cased","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|conflibert_scr_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/snowood1/ConfliBERT-scr-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-ct_m1_complete_en.md b/docs/_posts/ahmedlone127/2024-09-19-ct_m1_complete_en.md new file mode 100644 index 00000000000000..01b980ab9fbdf1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-ct_m1_complete_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ct_m1_complete RoBertaEmbeddings from crisistransformers +author: John Snow Labs +name: ct_m1_complete +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ct_m1_complete` is a English model originally trained by crisistransformers. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ct_m1_complete_en_5.5.0_3.0_1726746961595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ct_m1_complete_en_5.5.0_3.0_1726746961595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("ct_m1_complete","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("ct_m1_complete","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
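Token-level RoBERTa embeddings are often pooled into a single vector per document before downstream use. A minimal sketch using Spark NLP's `SentenceEmbeddings` annotator with average pooling, reusing the `document` and `embeddings` columns from the example above:

```python
from sparknlp.annotator import SentenceEmbeddings

sentenceEmbeddings = SentenceEmbeddings() \
    .setInputCols(["document", "embeddings"]) \
    .setOutputCol("sentence_embeddings") \
    .setPoolingStrategy("AVERAGE")

# One averaged vector per document
sentenceEmbeddings.transform(pipelineDF) \
    .selectExpr("explode(sentence_embeddings.embeddings) as doc_vector") \
    .show(truncate=True)
```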
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ct_m1_complete| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|504.7 MB| + +## References + +https://huggingface.co/crisistransformers/CT-M1-Complete \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-ct_m1_complete_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-ct_m1_complete_pipeline_en.md new file mode 100644 index 00000000000000..43e988b8b748c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-ct_m1_complete_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ct_m1_complete_pipeline pipeline RoBertaEmbeddings from crisistransformers +author: John Snow Labs +name: ct_m1_complete_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ct_m1_complete_pipeline` is a English model originally trained by crisistransformers. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ct_m1_complete_pipeline_en_5.5.0_3.0_1726746987857.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ct_m1_complete_pipeline_en_5.5.0_3.0_1726746987857.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ct_m1_complete_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ct_m1_complete_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ct_m1_complete_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|504.7 MB| + +## References + +https://huggingface.co/crisistransformers/CT-M1-Complete + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-ct_m2_onelook_en.md b/docs/_posts/ahmedlone127/2024-09-19-ct_m2_onelook_en.md new file mode 100644 index 00000000000000..ec97f883d45eb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-ct_m2_onelook_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ct_m2_onelook RoBertaEmbeddings from crisistransformers +author: John Snow Labs +name: ct_m2_onelook +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ct_m2_onelook` is a English model originally trained by crisistransformers. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ct_m2_onelook_en_5.5.0_3.0_1726747236316.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ct_m2_onelook_en_5.5.0_3.0_1726747236316.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("ct_m2_onelook","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("ct_m2_onelook","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ct_m2_onelook| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/crisistransformers/CT-M2-OneLook \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-ct_m2_onelook_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-ct_m2_onelook_pipeline_en.md new file mode 100644 index 00000000000000..ed6264cf9c0445 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-ct_m2_onelook_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ct_m2_onelook_pipeline pipeline RoBertaEmbeddings from crisistransformers +author: John Snow Labs +name: ct_m2_onelook_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ct_m2_onelook_pipeline` is a English model originally trained by crisistransformers. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ct_m2_onelook_pipeline_en_5.5.0_3.0_1726747259647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ct_m2_onelook_pipeline_en_5.5.0_3.0_1726747259647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ct_m2_onelook_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ct_m2_onelook_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
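The snippet above assumes a running Spark NLP session and an input DataFrame `df` with a `text` column. A minimal setup sketch is shown below; the example sentence is illustrative only.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# The pretrained pipeline reads its input from a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("ct_m2_onelook_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.show(truncate=False)
```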
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ct_m2_onelook_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/crisistransformers/CT-M2-OneLook + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-cv9_special_batch8_lr6_small_id.md b/docs/_posts/ahmedlone127/2024-09-19-cv9_special_batch8_lr6_small_id.md new file mode 100644 index 00000000000000..f888115fd5a089 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-cv9_special_batch8_lr6_small_id.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Indonesian cv9_special_batch8_lr6_small WhisperForCTC from TheRains +author: John Snow Labs +name: cv9_special_batch8_lr6_small +date: 2024-09-19 +tags: [id, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cv9_special_batch8_lr6_small` is a Indonesian model originally trained by TheRains. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cv9_special_batch8_lr6_small_id_5.5.0_3.0_1726756809654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cv9_special_batch8_lr6_small_id_5.5.0_3.0_1726756809654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# `data` is expected to be a DataFrame with an "audio_content" column holding the raw audio samples
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("cv9_special_batch8_lr6_small","id") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// `data` is expected to be a DataFrame with an "audio_content" column holding the raw audio samples
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("cv9_special_batch8_lr6_small", "id")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
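Both snippets assume an existing `data` DataFrame. A minimal sketch for building one from a local audio file is shown below; the file name, the `librosa` dependency, and the 16 kHz mono resampling are assumptions, not part of the original example.

```python
import librosa

# Assumption: Whisper checkpoints expect 16 kHz mono audio; "sample.wav" is a placeholder path
waveform, _ = librosa.load("sample.wav", sr=16000, mono=True)

# Wrap the raw samples in the "audio_content" column expected by AudioAssembler
data = spark.createDataFrame([[waveform.tolist()]]).toDF("audio_content")
```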
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cv9_special_batch8_lr6_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|id| +|Size:|1.7 GB| + +## References + +https://huggingface.co/TheRains/cv9-special-batch8-lr6-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-cv9_special_batch8_lr6_small_pipeline_id.md b/docs/_posts/ahmedlone127/2024-09-19-cv9_special_batch8_lr6_small_pipeline_id.md new file mode 100644 index 00000000000000..d7478ddf46acae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-cv9_special_batch8_lr6_small_pipeline_id.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Indonesian cv9_special_batch8_lr6_small_pipeline pipeline WhisperForCTC from TheRains +author: John Snow Labs +name: cv9_special_batch8_lr6_small_pipeline +date: 2024-09-19 +tags: [id, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cv9_special_batch8_lr6_small_pipeline` is a Indonesian model originally trained by TheRains. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cv9_special_batch8_lr6_small_pipeline_id_5.5.0_3.0_1726756893012.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cv9_special_batch8_lr6_small_pipeline_id_5.5.0_3.0_1726756893012.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cv9_special_batch8_lr6_small_pipeline", lang = "id") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cv9_special_batch8_lr6_small_pipeline", lang = "id") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cv9_special_batch8_lr6_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|id| +|Size:|1.7 GB| + +## References + +https://huggingface.co/TheRains/cv9-special-batch8-lr6-small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-cybert_our_data_en.md b/docs/_posts/ahmedlone127/2024-09-19-cybert_our_data_en.md new file mode 100644 index 00000000000000..16b36c53c34ca8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-cybert_our_data_en.md @@ -0,0 +1,100 @@ +--- +layout: model +title: English cybert_our_data RoBertaForTokenClassification from anonymouspd +author: John Snow Labs +name: cybert_our_data +date: 2024-09-19 +tags: [roberta, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cybert_our_data` is a English model originally trained by anonymouspd. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cybert_our_data_en_5.5.0_3.0_1726729914290.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cybert_our_data_en_5.5.0_3.0_1726729914290.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("cybert_our_data","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])

data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipelineModel = pipeline.fit(data)

pipelineDF = pipelineModel.transform(data)
```
```scala
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification
    .pretrained("cybert_our_data", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))

val data = Seq("I love spark-nlp").toDF("text")

val pipelineModel = pipeline.fit(data)

val pipelineDF = pipelineModel.transform(data)
```
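To turn the token-level `ner` tags into entity chunks, Spark NLP's `NerConverter` can be appended to the same pipeline. The sketch below is a common usage pattern and an assumption about how this model would typically be consumed, not part of the original card.

```python
from sparknlp.annotator import NerConverter

# Groups IOB-style token tags from "ner" into full entity chunks
nerConverter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, nerConverter])
```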
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cybert_our_data| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|311.4 MB| + +## References + +References + +https://huggingface.co/anonymouspd/CyBERT-our-data \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-czech_roberta_base_1_en.md b/docs/_posts/ahmedlone127/2024-09-19-czech_roberta_base_1_en.md new file mode 100644 index 00000000000000..20234b242478ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-czech_roberta_base_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English czech_roberta_base_1 RoBertaForSequenceClassification from Adammz +author: John Snow Labs +name: czech_roberta_base_1 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`czech_roberta_base_1` is a English model originally trained by Adammz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/czech_roberta_base_1_en_5.5.0_3.0_1726732618776.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/czech_roberta_base_1_en_5.5.0_3.0_1726732618776.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("czech_roberta_base_1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("czech_roberta_base_1", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
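A short usage sketch for reading the predicted label back out of the `class` column follows; the column access is standard Spark and not specific to this model.

```python
# Usage sketch: show the predicted label for each input row
pipelineDF.select("class.result").show(truncate=False)
```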
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|czech_roberta_base_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/Adammz/cs_roberta_base-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-darija_test8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-darija_test8_pipeline_en.md new file mode 100644 index 00000000000000..f6569d906923f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-darija_test8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English darija_test8_pipeline pipeline BertForSequenceClassification from smerchi +author: John Snow Labs +name: darija_test8_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`darija_test8_pipeline` is a English model originally trained by smerchi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/darija_test8_pipeline_en_5.5.0_3.0_1726736452528.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/darija_test8_pipeline_en_5.5.0_3.0_1726736452528.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("darija_test8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("darija_test8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|darija_test8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|553.7 MB| + +## References + +https://huggingface.co/smerchi/darija_test8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-defsent_roberta_base_cls_en.md b/docs/_posts/ahmedlone127/2024-09-19-defsent_roberta_base_cls_en.md new file mode 100644 index 00000000000000..1861d1b648164e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-defsent_roberta_base_cls_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English defsent_roberta_base_cls RoBertaEmbeddings from cl-nagoya +author: John Snow Labs +name: defsent_roberta_base_cls +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`defsent_roberta_base_cls` is a English model originally trained by cl-nagoya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/defsent_roberta_base_cls_en_5.5.0_3.0_1726778292225.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/defsent_roberta_base_cls_en_5.5.0_3.0_1726778292225.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("defsent_roberta_base_cls","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("defsent_roberta_base_cls","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
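If the vectors are needed as plain arrays rather than annotation structs, an `EmbeddingsFinisher` stage can be appended to the pipeline above. The sketch below is a common pattern and an assumption about typical usage, not part of the original card.

```python
from sparknlp.base import EmbeddingsFinisher

# Converts the annotation structs in "embeddings" into plain float arrays
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(False)

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, finisher])
```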
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|defsent_roberta_base_cls| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|413.1 MB| + +## References + +https://huggingface.co/cl-nagoya/defsent-roberta-base-cls \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-defsent_roberta_base_cls_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-defsent_roberta_base_cls_pipeline_en.md new file mode 100644 index 00000000000000..9736473c3add47 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-defsent_roberta_base_cls_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English defsent_roberta_base_cls_pipeline pipeline RoBertaEmbeddings from cl-nagoya +author: John Snow Labs +name: defsent_roberta_base_cls_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`defsent_roberta_base_cls_pipeline` is a English model originally trained by cl-nagoya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/defsent_roberta_base_cls_pipeline_en_5.5.0_3.0_1726778336060.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/defsent_roberta_base_cls_pipeline_en_5.5.0_3.0_1726778336060.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("defsent_roberta_base_cls_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("defsent_roberta_base_cls_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|defsent_roberta_base_cls_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|413.1 MB| + +## References + +https://huggingface.co/cl-nagoya/defsent-roberta-base-cls + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-depress_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-19-depress_roberta_en.md new file mode 100644 index 00000000000000..b4cb61d8301075 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-depress_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English depress_roberta RoBertaForSequenceClassification from tiya1012 +author: John Snow Labs +name: depress_roberta +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`depress_roberta` is a English model originally trained by tiya1012. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/depress_roberta_en_5.5.0_3.0_1726733555154.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/depress_roberta_en_5.5.0_3.0_1726733555154.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("depress_roberta","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("depress_roberta", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|depress_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tiya1012/depress_roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-depress_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-depress_roberta_pipeline_en.md new file mode 100644 index 00000000000000..2eb5e20f3e3572 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-depress_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English depress_roberta_pipeline pipeline RoBertaForSequenceClassification from tiya1012 +author: John Snow Labs +name: depress_roberta_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`depress_roberta_pipeline` is a English model originally trained by tiya1012. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/depress_roberta_pipeline_en_5.5.0_3.0_1726733577673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/depress_roberta_pipeline_en_5.5.0_3.0_1726733577673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("depress_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("depress_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|depress_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tiya1012/depress_roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-discord_philosophy_medium_en.md b/docs/_posts/ahmedlone127/2024-09-19-discord_philosophy_medium_en.md new file mode 100644 index 00000000000000..435219a8bbad5e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-discord_philosophy_medium_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English discord_philosophy_medium RoBertaEmbeddings from TheDiamondKing +author: John Snow Labs +name: discord_philosophy_medium +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`discord_philosophy_medium` is a English model originally trained by TheDiamondKing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/discord_philosophy_medium_en_5.5.0_3.0_1726747813878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/discord_philosophy_medium_en_5.5.0_3.0_1726747813878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("discord_philosophy_medium","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("discord_philosophy_medium","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|discord_philosophy_medium| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|469.7 MB| + +## References + +https://huggingface.co/TheDiamondKing/Discord-Philosophy-Medium \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-discord_philosophy_medium_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-discord_philosophy_medium_pipeline_en.md new file mode 100644 index 00000000000000..4c998f5d96b4e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-discord_philosophy_medium_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English discord_philosophy_medium_pipeline pipeline RoBertaEmbeddings from TheDiamondKing +author: John Snow Labs +name: discord_philosophy_medium_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`discord_philosophy_medium_pipeline` is a English model originally trained by TheDiamondKing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/discord_philosophy_medium_pipeline_en_5.5.0_3.0_1726747837858.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/discord_philosophy_medium_pipeline_en_5.5.0_3.0_1726747837858.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("discord_philosophy_medium_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("discord_philosophy_medium_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|discord_philosophy_medium_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|469.7 MB| + +## References + +https://huggingface.co/TheDiamondKing/Discord-Philosophy-Medium + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distalbert_arabic_classification_en.md b/docs/_posts/ahmedlone127/2024-09-19-distalbert_arabic_classification_en.md new file mode 100644 index 00000000000000..9d93d22ffba406 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distalbert_arabic_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distalbert_arabic_classification DistilBertForSequenceClassification from abd95 +author: John Snow Labs +name: distalbert_arabic_classification +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distalbert_arabic_classification` is a English model originally trained by abd95. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distalbert_arabic_classification_en_5.5.0_3.0_1726763469305.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distalbert_arabic_classification_en_5.5.0_3.0_1726763469305.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distalbert_arabic_classification","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distalbert_arabic_classification", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distalbert_arabic_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abd95/distalbert-ar-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distiberthatespeech_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distiberthatespeech_pipeline_en.md new file mode 100644 index 00000000000000..94f7b6175dae75 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distiberthatespeech_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distiberthatespeech_pipeline pipeline DistilBertForSequenceClassification from GalMargo +author: John Snow Labs +name: distiberthatespeech_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distiberthatespeech_pipeline` is a English model originally trained by GalMargo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distiberthatespeech_pipeline_en_5.5.0_3.0_1726743443870.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distiberthatespeech_pipeline_en_5.5.0_3.0_1726743443870.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distiberthatespeech_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distiberthatespeech_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distiberthatespeech_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/GalMargo/DistiBERTHateSpeech + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distil_bert_v3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distil_bert_v3_pipeline_en.md new file mode 100644 index 00000000000000..cdbd03d090ab03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distil_bert_v3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distil_bert_v3_pipeline pipeline DistilBertForSequenceClassification from KayraAksit +author: John Snow Labs +name: distil_bert_v3_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distil_bert_v3_pipeline` is a English model originally trained by KayraAksit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_bert_v3_pipeline_en_5.5.0_3.0_1726744187131.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_bert_v3_pipeline_en_5.5.0_3.0_1726744187131.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distil_bert_v3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distil_bert_v3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distil_bert_v3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KayraAksit/distil_bert_v3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distil_news_finetune2_en.md b/docs/_posts/ahmedlone127/2024-09-19-distil_news_finetune2_en.md new file mode 100644 index 00000000000000..8cc4175a4a536f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distil_news_finetune2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distil_news_finetune2 DistilBertForSequenceClassification from anggari +author: John Snow Labs +name: distil_news_finetune2 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distil_news_finetune2` is a English model originally trained by anggari. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_news_finetune2_en_5.5.0_3.0_1726763630796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_news_finetune2_en_5.5.0_3.0_1726763630796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distil_news_finetune2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distil_news_finetune2", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distil_news_finetune2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/anggari/distil_news_finetune2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distil_news_finetune2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distil_news_finetune2_pipeline_en.md new file mode 100644 index 00000000000000..9891c81e994e65 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distil_news_finetune2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distil_news_finetune2_pipeline pipeline DistilBertForSequenceClassification from anggari +author: John Snow Labs +name: distil_news_finetune2_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distil_news_finetune2_pipeline` is a English model originally trained by anggari. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_news_finetune2_pipeline_en_5.5.0_3.0_1726763643349.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_news_finetune2_pipeline_en_5.5.0_3.0_1726763643349.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distil_news_finetune2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distil_news_finetune2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distil_news_finetune2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/anggari/distil_news_finetune2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distil_task_b_3_en.md b/docs/_posts/ahmedlone127/2024-09-19-distil_task_b_3_en.md new file mode 100644 index 00000000000000..5ad163bdc785ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distil_task_b_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distil_task_b_3 DistilBertForSequenceClassification from sheduele +author: John Snow Labs +name: distil_task_b_3 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distil_task_b_3` is a English model originally trained by sheduele. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_task_b_3_en_5.5.0_3.0_1726742803768.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_task_b_3_en_5.5.0_3.0_1726742803768.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distil_task_b_3","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distil_task_b_3", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distil_task_b_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sheduele/distil_task_B_3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distil_task_b_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distil_task_b_3_pipeline_en.md new file mode 100644 index 00000000000000..e365cb79e305f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distil_task_b_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distil_task_b_3_pipeline pipeline DistilBertForSequenceClassification from sheduele +author: John Snow Labs +name: distil_task_b_3_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distil_task_b_3_pipeline` is a English model originally trained by sheduele. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_task_b_3_pipeline_en_5.5.0_3.0_1726742816085.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_task_b_3_pipeline_en_5.5.0_3.0_1726742816085.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distil_task_b_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distil_task_b_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distil_task_b_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sheduele/distil_task_B_3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_70k_qa_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_70k_qa_model_pipeline_en.md new file mode 100644 index 00000000000000..8335497c87014d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_70k_qa_model_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_70k_qa_model_pipeline pipeline DistilBertForQuestionAnswering from Vasanth +author: John Snow Labs +name: distilbert_70k_qa_model_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_70k_qa_model_pipeline` is a English model originally trained by Vasanth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_70k_qa_model_pipeline_en_5.5.0_3.0_1726748426855.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_70k_qa_model_pipeline_en_5.5.0_3.0_1726748426855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_70k_qa_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_70k_qa_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_70k_qa_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/Vasanth/distilbert_70k_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_anandharaju_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_anandharaju_en.md new file mode 100644 index 00000000000000..bfd2cdb1205482 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_anandharaju_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_anandharaju DistilBertForQuestionAnswering from Anandharaju +author: John Snow Labs +name: distilbert_anandharaju +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_anandharaju` is a English model originally trained by Anandharaju. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_anandharaju_en_5.5.0_3.0_1726786046576.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_anandharaju_en_5.5.0_3.0_1726786046576.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_anandharaju","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_anandharaju", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?","I use spark-nlp.")).toDS.toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
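A short usage sketch for reading the extracted answer back out of the `answer` column follows; the column access is standard Spark and not specific to this model.

```python
# Usage sketch: the extracted answer span is stored in the annotation's `result` field
pipelineDF.select("answer.result").show(truncate=False)
```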
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_anandharaju| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Anandharaju/distilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_anandharaju_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_anandharaju_pipeline_en.md new file mode 100644 index 00000000000000..2d0780cd1fab4a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_anandharaju_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_anandharaju_pipeline pipeline DistilBertForQuestionAnswering from Anandharaju +author: John Snow Labs +name: distilbert_anandharaju_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_anandharaju_pipeline` is a English model originally trained by Anandharaju. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_anandharaju_pipeline_en_5.5.0_3.0_1726786057907.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_anandharaju_pipeline_en_5.5.0_3.0_1726786057907.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_anandharaju_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_anandharaju_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_anandharaju_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Anandharaju/distilbert + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_cased_distilled_squad_finetuned_squad_test3_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_cased_distilled_squad_finetuned_squad_test3_en.md new file mode 100644 index 00000000000000..cb0bb0fcfc1a96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_cased_distilled_squad_finetuned_squad_test3_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_cased_distilled_squad_finetuned_squad_test3 DistilBertForQuestionAnswering from allistair99 +author: John Snow Labs +name: distilbert_base_cased_distilled_squad_finetuned_squad_test3 +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_distilled_squad_finetuned_squad_test3` is a English model originally trained by allistair99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_distilled_squad_finetuned_squad_test3_en_5.5.0_3.0_1726727801103.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_distilled_squad_finetuned_squad_test3_en_5.5.0_3.0_1726727801103.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+     .setInputCols(["question", "context"]) \
+     .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_cased_distilled_squad_finetuned_squad_test3","en") \
+     .setInputCols(["document_question","document_context"]) \
+     .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_cased_distilled_squad_finetuned_squad_test3", "en")
+  .setInputCols(Array("document_question","document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
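+
+As a small follow-up sketch (not part of the original card), the predicted answers can be read from the `answer` output column configured above; `result` is the standard Spark NLP annotation field:
+
+```python
+# Assumes pipelineDF from the Python example above
+pipelineDF.select("answer.result").show(truncate=False)
+```
+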
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_distilled_squad_finetuned_squad_test3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/allistair99/distilbert-base-cased-distilled-squad-finetuned-squad-test3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_cased_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_cased_en.md new file mode 100644 index 00000000000000..69dc9869565953 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: DistilBERT base model (cased) +author: John Snow Labs +name: distilbert_base_cased +date: 2024-09-19 +tags: [distilbert, en, english, open_source, embeddings, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This model is a distilled version of the [BERT base model](https://huggingface.co/bert-base-cased). It was introduced in [this paper](https://arxiv.org/abs/1910.01108). The code for the distillation process can be found [here](https://github.com/huggingface/transformers/tree/master/examples/research_projects/distillation). This model is cased: it does make a difference between english and English. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_en_5.5.0_3.0_1726742707299.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_en_5.5.0_3.0_1726742707299.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + +{:.model-param} + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +embeddings = DistilBertEmbeddings.pretrained("distilbert_base_cased", "en") \ +.setInputCols("sentence", "token") \ +.setOutputCol("embeddings") +nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings]) +``` +```scala +val embeddings = DistilBertEmbeddings.pretrained("distilbert_base_cased", "en") +.setInputCols("sentence", "token") +.setOutputCol("embeddings") +val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings)) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.embed.distilbert").predict("""Put your text here.""") +``` +
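+
+The pipeline above references `document_assembler`, `sentence_detector`, and `tokenizer` stages that the card does not define; below is a minimal sketch of how such upstream stages are typically built, with column names chosen to match the example's `setInputCols("sentence", "token")` call:
+
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import SentenceDetector, Tokenizer
+
+# Upstream stages assumed by the embeddings example above
+document_assembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentence_detector = SentenceDetector() \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("token")
+```
+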
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +References + +[https://huggingface.co/distilbert-base-cased](https://huggingface.co/distilbert-base-cased) + +## Benchmarking + +```bash + +Benchmarking + + +When fine-tuned on downstream tasks, this model achieves the following results: + +Glue test results: + +| Task | MNLI | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE | +|:----:|:----:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:| +| | 81.5 | 87.8 | 88.2 | 90.4 | 47.2 | 85.5 | 85.6 | 60.6 | +``` \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_cased_kallidavidson_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_cased_kallidavidson_en.md new file mode 100644 index 00000000000000..a7b17bc1316b2c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_cased_kallidavidson_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_cased_kallidavidson DistilBertForQuestionAnswering from kallidavidson +author: John Snow Labs +name: distilbert_base_cased_kallidavidson +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_kallidavidson` is a English model originally trained by kallidavidson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_kallidavidson_en_5.5.0_3.0_1726766671555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_kallidavidson_en_5.5.0_3.0_1726766671555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+     .setInputCols(["question", "context"]) \
+     .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_cased_kallidavidson","en") \
+     .setInputCols(["document_question","document_context"]) \
+     .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_cased_kallidavidson", "en")
+  .setInputCols(Array("document_question","document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_kallidavidson| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/kallidavidson/distilbert-base-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_cased_pipeline_en.md new file mode 100644 index 00000000000000..80fc46b17c68bf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_cased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_cased_pipeline pipeline DistilBertForSequenceClassification from xshubhamx +author: John Snow Labs +name: distilbert_base_cased_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_pipeline` is a English model originally trained by xshubhamx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_pipeline_en_5.5.0_3.0_1726742720077.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_pipeline_en_5.5.0_3.0_1726742720077.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
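+
+The `df` referenced above is assumed to be a Spark DataFrame holding the input text; a minimal, illustrative way to build one and run the pipeline (the `text` column name is an assumption matching the usual DocumentAssembler default):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame for the pretrained pipeline above
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_cased_pipeline", lang="en")
+annotations = pipeline.transform(df)
+annotations.show(truncate=False)
+```
+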
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.1 MB| + +## References + +https://huggingface.co/xshubhamx/distilbert-base-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_multilingual_cased_actitud_german_tener_latin_razon_esp_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_multilingual_cased_actitud_german_tener_latin_razon_esp_pipeline_xx.md new file mode 100644 index 00000000000000..e3a1a83f0cc202 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_multilingual_cased_actitud_german_tener_latin_razon_esp_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual distilbert_base_multilingual_cased_actitud_german_tener_latin_razon_esp_pipeline pipeline DistilBertForSequenceClassification from rogelioplatt +author: John Snow Labs +name: distilbert_base_multilingual_cased_actitud_german_tener_latin_razon_esp_pipeline +date: 2024-09-19 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_multilingual_cased_actitud_german_tener_latin_razon_esp_pipeline` is a Multilingual model originally trained by rogelioplatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_actitud_german_tener_latin_razon_esp_pipeline_xx_5.5.0_3.0_1726764163574.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_actitud_german_tener_latin_razon_esp_pipeline_xx_5.5.0_3.0_1726764163574.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_multilingual_cased_actitud_german_tener_latin_razon_esp_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_multilingual_cased_actitud_german_tener_latin_razon_esp_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_multilingual_cased_actitud_german_tener_latin_razon_esp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|507.6 MB| + +## References + +https://huggingface.co/rogelioplatt/distilbert-base-multilingual-cased-Actitud_de_tener_la_razon_Esp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_multilingual_cased_regression_finetuned_news_all_xx.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_multilingual_cased_regression_finetuned_news_all_xx.md new file mode 100644 index 00000000000000..9f15f085547009 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_multilingual_cased_regression_finetuned_news_all_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual distilbert_base_multilingual_cased_regression_finetuned_news_all DistilBertForSequenceClassification from Mou11209203 +author: John Snow Labs +name: distilbert_base_multilingual_cased_regression_finetuned_news_all +date: 2024-09-19 +tags: [xx, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_multilingual_cased_regression_finetuned_news_all` is a Multilingual model originally trained by Mou11209203. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_regression_finetuned_news_all_xx_5.5.0_3.0_1726743601435.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_regression_finetuned_news_all_xx_5.5.0_3.0_1726743601435.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_multilingual_cased_regression_finetuned_news_all","xx") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_multilingual_cased_regression_finetuned_news_all", "xx")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
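+
+As a brief follow-up sketch (not part of the original card), the predicted labels can be read from the `class` output column configured above:
+
+```python
+# Assumes pipelineDF from the Python example above
+pipelineDF.select("class.result").show(truncate=False)
+```
+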
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_multilingual_cased_regression_finetuned_news_all| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|507.7 MB| + +## References + +https://huggingface.co/Mou11209203/distilbert-base-multilingual-cased_regression_finetuned_news_all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_thai_cased_finetuned_sentiment_th.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_thai_cased_finetuned_sentiment_th.md new file mode 100644 index 00000000000000..1eafb25bd2f494 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_thai_cased_finetuned_sentiment_th.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Thai distilbert_base_thai_cased_finetuned_sentiment DistilBertForSequenceClassification from FlukeTJ +author: John Snow Labs +name: distilbert_base_thai_cased_finetuned_sentiment +date: 2024-09-19 +tags: [th, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: th +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_thai_cased_finetuned_sentiment` is a Thai model originally trained by FlukeTJ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_thai_cased_finetuned_sentiment_th_5.5.0_3.0_1726763693712.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_thai_cased_finetuned_sentiment_th_5.5.0_3.0_1726763693712.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_thai_cased_finetuned_sentiment","th") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_thai_cased_finetuned_sentiment", "th")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_thai_cased_finetuned_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|th| +|Size:|507.6 MB| + +## References + +https://huggingface.co/FlukeTJ/distilbert-base-thai-cased-finetuned-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_3epoch7_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_3epoch7_2_pipeline_en.md new file mode 100644 index 00000000000000..325dae0bdca8a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_3epoch7_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_3epoch7_2_pipeline pipeline DistilBertForSequenceClassification from dianamihalache27 +author: John Snow Labs +name: distilbert_base_uncased_3epoch7_2_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_3epoch7_2_pipeline` is a English model originally trained by dianamihalache27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_3epoch7_2_pipeline_en_5.5.0_3.0_1726704745918.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_3epoch7_2_pipeline_en_5.5.0_3.0_1726704745918.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_3epoch7_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_3epoch7_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_3epoch7_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dianamihalache27/distilbert-base-uncased_3epoch7.2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_amazon_polarity_google_colab_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_amazon_polarity_google_colab_en.md new file mode 100644 index 00000000000000..dcaa4e32697793 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_amazon_polarity_google_colab_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_amazon_polarity_google_colab DistilBertForSequenceClassification from jigarcpatel +author: John Snow Labs +name: distilbert_base_uncased_amazon_polarity_google_colab +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_amazon_polarity_google_colab` is a English model originally trained by jigarcpatel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_amazon_polarity_google_colab_en_5.5.0_3.0_1726718982826.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_amazon_polarity_google_colab_en_5.5.0_3.0_1726718982826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_amazon_polarity_google_colab","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_amazon_polarity_google_colab", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_amazon_polarity_google_colab| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jigarcpatel/distilbert-base-uncased-amazon-polarity-google-colab \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_amazon_polarity_google_colab_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_amazon_polarity_google_colab_pipeline_en.md new file mode 100644 index 00000000000000..3da8f567603dbd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_amazon_polarity_google_colab_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_amazon_polarity_google_colab_pipeline pipeline DistilBertForSequenceClassification from jigarcpatel +author: John Snow Labs +name: distilbert_base_uncased_amazon_polarity_google_colab_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_amazon_polarity_google_colab_pipeline` is a English model originally trained by jigarcpatel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_amazon_polarity_google_colab_pipeline_en_5.5.0_3.0_1726718999524.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_amazon_polarity_google_colab_pipeline_en_5.5.0_3.0_1726718999524.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_amazon_polarity_google_colab_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_amazon_polarity_google_colab_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_amazon_polarity_google_colab_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jigarcpatel/distilbert-base-uncased-amazon-polarity-google-colab + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean_en.md new file mode 100644 index 00000000000000..24aed23b1a9dd8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean_en_5.5.0_3.0_1726743219666.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean_en_5.5.0_3.0_1726743219666.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean_pipeline_en.md new file mode 100644 index 00000000000000..bbaf96f12c88ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean_pipeline_en_5.5.0_3.0_1726743231922.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean_pipeline_en_5.5.0_3.0_1726743231922.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_banking_zphr_0st72_ut102ut10_plain_simsp_clean + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_emotion_ft_0416_peter4_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_emotion_ft_0416_peter4_en.md new file mode 100644 index 00000000000000..693ab3d8d6b6f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_emotion_ft_0416_peter4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_emotion_ft_0416_peter4 DistilBertForSequenceClassification from Peter4 +author: John Snow Labs +name: distilbert_base_uncased_emotion_ft_0416_peter4 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_emotion_ft_0416_peter4` is a English model originally trained by Peter4. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_ft_0416_peter4_en_5.5.0_3.0_1726719088633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_ft_0416_peter4_en_5.5.0_3.0_1726719088633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_emotion_ft_0416_peter4","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_emotion_ft_0416_peter4", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_emotion_ft_0416_peter4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Peter4/distilbert-base-uncased_emotion_ft_0416 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_emotion_ft_0416_peter4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_emotion_ft_0416_peter4_pipeline_en.md new file mode 100644 index 00000000000000..c4e708982792e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_emotion_ft_0416_peter4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_emotion_ft_0416_peter4_pipeline pipeline DistilBertForSequenceClassification from Peter4 +author: John Snow Labs +name: distilbert_base_uncased_emotion_ft_0416_peter4_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_emotion_ft_0416_peter4_pipeline` is a English model originally trained by Peter4. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_ft_0416_peter4_pipeline_en_5.5.0_3.0_1726719101222.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_ft_0416_peter4_pipeline_en_5.5.0_3.0_1726719101222.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_emotion_ft_0416_peter4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_emotion_ft_0416_peter4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_emotion_ft_0416_peter4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Peter4/distilbert-base-uncased_emotion_ft_0416 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_findtuned_emotion_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_findtuned_emotion_en.md new file mode 100644 index 00000000000000..1211cc232d2fa2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_findtuned_emotion_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English distilbert_base_uncased_findtuned_emotion DistilBertForSequenceClassification from ctojang +author: John Snow Labs +name: distilbert_base_uncased_findtuned_emotion +date: 2024-09-19 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_findtuned_emotion` is a English model originally trained by ctojang. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_findtuned_emotion_en_5.5.0_3.0_1726742796845.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_findtuned_emotion_en_5.5.0_3.0_1726742796845.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+document_assembler = DocumentAssembler()\
+    .setInputCol("text")\
+    .setOutputCol("document")
+
+tokenizer = Tokenizer()\
+    .setInputCols("document")\
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_findtuned_emotion","en")\
+    .setInputCols(["document","token"])\
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val document_assembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_findtuned_emotion","en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, sequenceClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
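+
+For single-string inference, a LightPipeline can wrap the fitted model from the example above; this is a minimal sketch, and the keys of the returned dictionary follow the output columns configured in the pipeline (here `class`):
+
+```python
+from sparknlp.base import LightPipeline
+
+# Wraps the fitted pipeline from the example above for fast, single-document inference
+light_pipeline = LightPipeline(pipeline.fit(data))
+print(light_pipeline.annotate("PUT YOUR STRING HERE"))
+```
+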
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_findtuned_emotion| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +References + +https://huggingface.co/ctojang/distilbert-base-uncased-findtuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_adl_hw1_binga288_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_adl_hw1_binga288_en.md new file mode 100644 index 00000000000000..9933bdf088bf47 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_adl_hw1_binga288_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_adl_hw1_binga288 DistilBertForSequenceClassification from Binga288 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_adl_hw1_binga288 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_adl_hw1_binga288` is a English model originally trained by Binga288. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw1_binga288_en_5.5.0_3.0_1726764127257.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw1_binga288_en_5.5.0_3.0_1726764127257.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_adl_hw1_binga288","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_adl_hw1_binga288", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_adl_hw1_binga288| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/Binga288/distilbert-base-uncased-finetuned-adl_hw1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_adl_hw1_binga288_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_adl_hw1_binga288_pipeline_en.md new file mode 100644 index 00000000000000..eb6f71e14369a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_adl_hw1_binga288_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_adl_hw1_binga288_pipeline pipeline DistilBertForSequenceClassification from Binga288 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_adl_hw1_binga288_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_adl_hw1_binga288_pipeline` is a English model originally trained by Binga288. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw1_binga288_pipeline_en_5.5.0_3.0_1726764139410.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw1_binga288_pipeline_en_5.5.0_3.0_1726764139410.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_adl_hw1_binga288_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_adl_hw1_binga288_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_adl_hw1_binga288_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/Binga288/distilbert-base-uncased-finetuned-adl_hw1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_adl_hw1_wennycooper_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_adl_hw1_wennycooper_pipeline_en.md new file mode 100644 index 00000000000000..788d30d948b35a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_adl_hw1_wennycooper_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_adl_hw1_wennycooper_pipeline pipeline DistilBertForSequenceClassification from wennycooper +author: John Snow Labs +name: distilbert_base_uncased_finetuned_adl_hw1_wennycooper_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_adl_hw1_wennycooper_pipeline` is a English model originally trained by wennycooper. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw1_wennycooper_pipeline_en_5.5.0_3.0_1726743691010.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw1_wennycooper_pipeline_en_5.5.0_3.0_1726743691010.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_adl_hw1_wennycooper_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_adl_hw1_wennycooper_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_adl_hw1_wennycooper_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/wennycooper/distilbert-base-uncased-finetuned-adl_hw1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_bentleyjeon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_bentleyjeon_pipeline_en.md new file mode 100644 index 00000000000000..c0704c45deb24b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_bentleyjeon_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_bentleyjeon_pipeline pipeline DistilBertForSequenceClassification from bentleyjeon +author: John Snow Labs +name: distilbert_base_uncased_finetuned_bentleyjeon_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_bentleyjeon_pipeline` is a English model originally trained by bentleyjeon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_bentleyjeon_pipeline_en_5.5.0_3.0_1726704404450.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_bentleyjeon_pipeline_en_5.5.0_3.0_1726704404450.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_bentleyjeon_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_bentleyjeon_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_bentleyjeon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bentleyjeon/distilbert-base-uncased-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_abushady5_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_abushady5_en.md new file mode 100644 index 00000000000000..602bb690c36b5a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_abushady5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_abushady5 DistilBertForSequenceClassification from Abushady5 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_abushady5 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_abushady5` is a English model originally trained by Abushady5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_abushady5_en_5.5.0_3.0_1726719324321.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_abushady5_en_5.5.0_3.0_1726719324321.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier reads the "document" and "token" columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_abushady5", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_abushady5", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_abushady5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Abushady5/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_abushady5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_abushady5_pipeline_en.md new file mode 100644 index 00000000000000..7abee68d4325bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_abushady5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_abushady5_pipeline pipeline DistilBertForSequenceClassification from Abushady5 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_abushady5_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_abushady5_pipeline` is a English model originally trained by Abushady5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_abushady5_pipeline_en_5.5.0_3.0_1726719336468.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_abushady5_pipeline_en_5.5.0_3.0_1726719336468.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# any DataFrame with a "text" column can be annotated
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_abushady5_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_abushady5_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_abushady5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Abushady5/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_alexmy2023_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_alexmy2023_en.md new file mode 100644 index 00000000000000..5039393a1ab16c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_alexmy2023_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_alexmy2023 DistilBertForSequenceClassification from alexmy2023 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_alexmy2023 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_alexmy2023` is a English model originally trained by alexmy2023. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_alexmy2023_en_5.5.0_3.0_1726763531700.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_alexmy2023_en_5.5.0_3.0_1726763531700.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier reads the "document" and "token" columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_alexmy2023", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_alexmy2023", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_alexmy2023| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/alexmy2023/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_alexmy2023_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_alexmy2023_pipeline_en.md new file mode 100644 index 00000000000000..4b6aff09f39d83 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_alexmy2023_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_alexmy2023_pipeline pipeline DistilBertForSequenceClassification from alexmy2023 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_alexmy2023_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_alexmy2023_pipeline` is a English model originally trained by alexmy2023. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_alexmy2023_pipeline_en_5.5.0_3.0_1726763544886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_alexmy2023_pipeline_en_5.5.0_3.0_1726763544886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# any DataFrame with a "text" column can be annotated
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_alexmy2023_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_alexmy2023_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_alexmy2023_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/alexmy2023/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_bwy071_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_bwy071_en.md new file mode 100644 index 00000000000000..f33c1556051585 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_bwy071_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_bwy071 DistilBertForSequenceClassification from bwy071 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_bwy071 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_bwy071` is a English model originally trained by bwy071. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_bwy071_en_5.5.0_3.0_1726704741223.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_bwy071_en_5.5.0_3.0_1726704741223.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier reads the "document" and "token" columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_bwy071", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_bwy071", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_bwy071| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bwy071/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_bwy071_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_bwy071_pipeline_en.md new file mode 100644 index 00000000000000..70def8e719d47f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_bwy071_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_bwy071_pipeline pipeline DistilBertForSequenceClassification from bwy071 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_bwy071_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_bwy071_pipeline` is a English model originally trained by bwy071. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_bwy071_pipeline_en_5.5.0_3.0_1726704752870.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_bwy071_pipeline_en_5.5.0_3.0_1726704752870.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# any DataFrame with a "text" column can be annotated
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_bwy071_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_bwy071_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_bwy071_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bwy071/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_elyziumm_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_elyziumm_en.md new file mode 100644 index 00000000000000..375c48c864eb50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_elyziumm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_elyziumm DistilBertForSequenceClassification from elyziumm +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_elyziumm +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_elyziumm` is a English model originally trained by elyziumm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_elyziumm_en_5.5.0_3.0_1726742797856.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_elyziumm_en_5.5.0_3.0_1726742797856.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier reads the "document" and "token" columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_elyziumm", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_elyziumm", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_elyziumm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/elyziumm/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_hanzla107_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_hanzla107_en.md new file mode 100644 index 00000000000000..f07c5a5fc43d90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_hanzla107_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_hanzla107 DistilBertForSequenceClassification from hanzla107 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_hanzla107 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_hanzla107` is a English model originally trained by hanzla107. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_hanzla107_en_5.5.0_3.0_1726718982866.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_hanzla107_en_5.5.0_3.0_1726718982866.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier reads the "document" and "token" columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_hanzla107", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_hanzla107", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_hanzla107| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hanzla107/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_hanzla107_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_hanzla107_pipeline_en.md new file mode 100644 index 00000000000000..16b3601c381f44 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_hanzla107_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_hanzla107_pipeline pipeline DistilBertForSequenceClassification from hanzla107 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_hanzla107_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_hanzla107_pipeline` is a English model originally trained by hanzla107. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_hanzla107_pipeline_en_5.5.0_3.0_1726718995054.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_hanzla107_pipeline_en_5.5.0_3.0_1726718995054.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# any DataFrame with a "text" column can be annotated
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_hanzla107_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_hanzla107_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_hanzla107_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hanzla107/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_jbgao_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_jbgao_pipeline_en.md new file mode 100644 index 00000000000000..bb56e1e3f41c99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_jbgao_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_jbgao_pipeline pipeline DistilBertForSequenceClassification from jbgao +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_jbgao_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_jbgao_pipeline` is a English model originally trained by jbgao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_jbgao_pipeline_en_5.5.0_3.0_1726742427021.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_jbgao_pipeline_en_5.5.0_3.0_1726742427021.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# any DataFrame with a "text" column can be annotated
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_jbgao_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_jbgao_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_jbgao_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jbgao/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emo_une_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emo_une_pipeline_en.md new file mode 100644 index 00000000000000..678ca6fa6fe8ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emo_une_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emo_une_pipeline pipeline DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emo_une_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emo_une_pipeline` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emo_une_pipeline_en_5.5.0_3.0_1726743156699.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emo_une_pipeline_en_5.5.0_3.0_1726743156699.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# any DataFrame with a "text" column can be annotated
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emo_une_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emo_une_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emo_une_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-emo_une + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_aicoder009_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_aicoder009_en.md new file mode 100644 index 00000000000000..8997c2ac6eb8ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_aicoder009_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_aicoder009 DistilBertForSequenceClassification from AICODER009 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_aicoder009 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_aicoder009` is a English model originally trained by AICODER009. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_aicoder009_en_5.5.0_3.0_1726704675610.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_aicoder009_en_5.5.0_3.0_1726704675610.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier reads the "document" and "token" columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_aicoder009", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_aicoder009", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_aicoder009| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AICODER009/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_ayushvaish2000_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_ayushvaish2000_en.md new file mode 100644 index 00000000000000..ae73062f79a4b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_ayushvaish2000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ayushvaish2000 DistilBertForSequenceClassification from ayushvaish2000 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ayushvaish2000 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ayushvaish2000` is a English model originally trained by ayushvaish2000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ayushvaish2000_en_5.5.0_3.0_1726742577976.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ayushvaish2000_en_5.5.0_3.0_1726742577976.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier reads the "document" and "token" columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ayushvaish2000", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ayushvaish2000", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ayushvaish2000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ayushvaish2000/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_benshafat_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_benshafat_en.md new file mode 100644 index 00000000000000..d2911b600d5477 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_benshafat_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_benshafat DistilBertForSequenceClassification from benshafat +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_benshafat +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_benshafat` is a English model originally trained by benshafat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_benshafat_en_5.5.0_3.0_1726763559066.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_benshafat_en_5.5.0_3.0_1726763559066.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier reads the "document" and "token" columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_benshafat", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_benshafat", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_benshafat| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/benshafat/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_benshafat_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_benshafat_pipeline_en.md new file mode 100644 index 00000000000000..52336bd09fae7c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_benshafat_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_benshafat_pipeline pipeline DistilBertForSequenceClassification from benshafat +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_benshafat_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_benshafat_pipeline` is a English model originally trained by benshafat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_benshafat_pipeline_en_5.5.0_3.0_1726763571286.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_benshafat_pipeline_en_5.5.0_3.0_1726763571286.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# any DataFrame with a "text" column can be annotated
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_benshafat_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_benshafat_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_benshafat_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/benshafat/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_bingyizh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_bingyizh_pipeline_en.md new file mode 100644 index 00000000000000..cf02c509bf73cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_bingyizh_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_bingyizh_pipeline pipeline DistilBertForSequenceClassification from bingyizh +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_bingyizh_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_bingyizh_pipeline` is a English model originally trained by bingyizh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_bingyizh_pipeline_en_5.5.0_3.0_1726704884403.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_bingyizh_pipeline_en_5.5.0_3.0_1726704884403.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# any DataFrame with a "text" column can be annotated
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_bingyizh_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_bingyizh_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_bingyizh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bingyizh/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_cokuun_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_cokuun_en.md new file mode 100644 index 00000000000000..6412f4abbbf757 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_cokuun_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_cokuun DistilBertForSequenceClassification from cokuun +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_cokuun +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_cokuun` is a English model originally trained by cokuun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cokuun_en_5.5.0_3.0_1726741235584.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cokuun_en_5.5.0_3.0_1726741235584.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier reads the "document" and "token" columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_cokuun", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_cokuun", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_cokuun| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cokuun/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_cokuun_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_cokuun_pipeline_en.md new file mode 100644 index 00000000000000..ad3458116858c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_cokuun_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_cokuun_pipeline pipeline DistilBertForSequenceClassification from cokuun +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_cokuun_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_cokuun_pipeline` is a English model originally trained by cokuun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cokuun_pipeline_en_5.5.0_3.0_1726741248606.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cokuun_pipeline_en_5.5.0_3.0_1726741248606.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# any DataFrame with a "text" column can be annotated
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_cokuun_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_cokuun_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_cokuun_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cokuun/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_cramade_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_cramade_en.md new file mode 100644 index 00000000000000..4fb0941f3b3212 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_cramade_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_cramade DistilBertForSequenceClassification from cramade +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_cramade +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_cramade` is a English model originally trained by cramade. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cramade_en_5.5.0_3.0_1726741025878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cramade_en_5.5.0_3.0_1726741025878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier reads the "document" and "token" columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_cramade", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_cramade", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_cramade| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cramade/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_cramade_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_cramade_pipeline_en.md new file mode 100644 index 00000000000000..b68704ef2bc117 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_cramade_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_cramade_pipeline pipeline DistilBertForSequenceClassification from cramade +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_cramade_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_cramade_pipeline` is a English model originally trained by cramade. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cramade_pipeline_en_5.5.0_3.0_1726741038222.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cramade_pipeline_en_5.5.0_3.0_1726741038222.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# an example DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_cramade_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// an example DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_cramade_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
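
For ad-hoc inference, the same pretrained pipeline can also be run on a plain string via `annotate` (an illustrative sketch; the pipeline name and language are the ones used above):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_cramade_pipeline", lang = "en")

# returns a dict keyed by output column, e.g. result["class"] holds the predicted label(s)
result = pipeline.annotate("I love spark-nlp")
```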
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_cramade_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cramade/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_es_k_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_es_k_en.md new file mode 100644 index 00000000000000..071cb7d7dfaefe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_es_k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_es_k DistilBertForSequenceClassification from es-k +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_es_k +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_es_k` is a English model originally trained by es-k. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_es_k_en_5.5.0_3.0_1726704377979.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_es_k_en_5.5.0_3.0_1726704377979.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier consumes the document and token columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_es_k","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_es_k", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_es_k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/es-k/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_es_k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_es_k_pipeline_en.md new file mode 100644 index 00000000000000..5daebc59b4c4ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_es_k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_es_k_pipeline pipeline DistilBertForSequenceClassification from es-k +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_es_k_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_es_k_pipeline` is a English model originally trained by es-k. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_es_k_pipeline_en_5.5.0_3.0_1726704390181.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_es_k_pipeline_en_5.5.0_3.0_1726704390181.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# an example DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_es_k_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// an example DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_es_k_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_es_k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/es-k/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_farisanki_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_farisanki_en.md new file mode 100644 index 00000000000000..a6299fb2352e39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_farisanki_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_farisanki DistilBertForSequenceClassification from farisanki +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_farisanki +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_farisanki` is a English model originally trained by farisanki. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_farisanki_en_5.5.0_3.0_1726742966951.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_farisanki_en_5.5.0_3.0_1726742966951.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier consumes the document and token columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_farisanki","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_farisanki", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_farisanki| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/farisanki/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_farisanki_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_farisanki_pipeline_en.md new file mode 100644 index 00000000000000..8508650e6c04a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_farisanki_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_farisanki_pipeline pipeline DistilBertForSequenceClassification from farisanki +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_farisanki_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_farisanki_pipeline` is a English model originally trained by farisanki. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_farisanki_pipeline_en_5.5.0_3.0_1726742979347.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_farisanki_pipeline_en_5.5.0_3.0_1726742979347.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# an example DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_farisanki_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// an example DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_farisanki_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_farisanki_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/farisanki/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_fonzie_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_fonzie_en.md new file mode 100644 index 00000000000000..f4294e8fd12b2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_fonzie_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_fonzie DistilBertForSequenceClassification from Fonzie +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_fonzie +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_fonzie` is a English model originally trained by Fonzie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_fonzie_en_5.5.0_3.0_1726741352092.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_fonzie_en_5.5.0_3.0_1726741352092.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier consumes the document and token columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_fonzie","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_fonzie", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_fonzie| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Fonzie/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_fonzie_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_fonzie_pipeline_en.md new file mode 100644 index 00000000000000..97bc80a36f315e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_fonzie_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_fonzie_pipeline pipeline DistilBertForSequenceClassification from Fonzie +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_fonzie_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_fonzie_pipeline` is a English model originally trained by Fonzie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_fonzie_pipeline_en_5.5.0_3.0_1726741364413.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_fonzie_pipeline_en_5.5.0_3.0_1726741364413.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# an example DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_fonzie_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// an example DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_fonzie_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_fonzie_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Fonzie/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_fyl1_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_fyl1_en.md new file mode 100644 index 00000000000000..70887986f805e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_fyl1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_fyl1 DistilBertForSequenceClassification from fyl1 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_fyl1 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_fyl1` is a English model originally trained by fyl1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_fyl1_en_5.5.0_3.0_1726743949004.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_fyl1_en_5.5.0_3.0_1726743949004.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier consumes the document and token columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_fyl1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_fyl1", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_fyl1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/fyl1/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_gopidon_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_gopidon_en.md new file mode 100644 index 00000000000000..72449868b1ff64 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_gopidon_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_gopidon DistilBertForSequenceClassification from gopidon +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_gopidon +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_gopidon` is a English model originally trained by gopidon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_gopidon_en_5.5.0_3.0_1726763552890.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_gopidon_en_5.5.0_3.0_1726763552890.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier consumes the document and token columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_gopidon","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_gopidon", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_gopidon| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gopidon/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_gopidon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_gopidon_pipeline_en.md new file mode 100644 index 00000000000000..99083ca387436a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_gopidon_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_gopidon_pipeline pipeline DistilBertForSequenceClassification from gopidon +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_gopidon_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_gopidon_pipeline` is a English model originally trained by gopidon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_gopidon_pipeline_en_5.5.0_3.0_1726763565178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_gopidon_pipeline_en_5.5.0_3.0_1726763565178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# an example DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_gopidon_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// an example DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_gopidon_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_gopidon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gopidon/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_lch34677_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_lch34677_en.md new file mode 100644 index 00000000000000..57b4261e65723f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_lch34677_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_lch34677 DistilBertForSequenceClassification from lch34677 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_lch34677 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_lch34677` is a English model originally trained by lch34677. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_lch34677_en_5.5.0_3.0_1726744071233.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_lch34677_en_5.5.0_3.0_1726744071233.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier consumes the document and token columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_lch34677","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_lch34677", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_lch34677| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lch34677/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_mile48_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_mile48_en.md new file mode 100644 index 00000000000000..7218422728289c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_mile48_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_mile48 DistilBertForSequenceClassification from mile48 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_mile48 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_mile48` is a English model originally trained by mile48. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_mile48_en_5.5.0_3.0_1726743762349.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_mile48_en_5.5.0_3.0_1726743762349.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier consumes the document and token columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_mile48","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_mile48", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_mile48| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mile48/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_nickrth_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_nickrth_en.md new file mode 100644 index 00000000000000..8169dbcbe0cde3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_nickrth_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_nickrth DistilBertForSequenceClassification from nickrth +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_nickrth +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_nickrth` is a English model originally trained by nickrth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_nickrth_en_5.5.0_3.0_1726719507945.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_nickrth_en_5.5.0_3.0_1726719507945.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier consumes the document and token columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_nickrth","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_nickrth", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_nickrth| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/nickrth/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_nickrth_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_nickrth_pipeline_en.md new file mode 100644 index 00000000000000..08a792d37f3d47 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_nickrth_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_nickrth_pipeline pipeline DistilBertForSequenceClassification from nickrth +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_nickrth_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_nickrth_pipeline` is a English model originally trained by nickrth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_nickrth_pipeline_en_5.5.0_3.0_1726719521201.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_nickrth_pipeline_en_5.5.0_3.0_1726719521201.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# an example DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_nickrth_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// an example DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_nickrth_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_nickrth_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/nickrth/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_qixing_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_qixing_en.md new file mode 100644 index 00000000000000..a23e3ebe077705 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_qixing_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_qixing DistilBertForSequenceClassification from qixing +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_qixing +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_qixing` is a English model originally trained by qixing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_qixing_en_5.5.0_3.0_1726719569467.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_qixing_en_5.5.0_3.0_1726719569467.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier consumes the document and token columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_qixing","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_qixing", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_qixing| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/qixing/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_qixing_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_qixing_pipeline_en.md new file mode 100644 index 00000000000000..64f14f2361f7ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_qixing_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_qixing_pipeline pipeline DistilBertForSequenceClassification from qixing +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_qixing_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_qixing_pipeline` is a English model originally trained by qixing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_qixing_pipeline_en_5.5.0_3.0_1726719581470.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_qixing_pipeline_en_5.5.0_3.0_1726719581470.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# an example DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_qixing_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// an example DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_qixing_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_qixing_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/qixing/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_rairachit_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_rairachit_en.md new file mode 100644 index 00000000000000..5b5c9a76612f14 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_rairachit_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_rairachit DistilBertForSequenceClassification from RaiRachit +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_rairachit +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_rairachit` is a English model originally trained by RaiRachit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_rairachit_en_5.5.0_3.0_1726719209059.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_rairachit_en_5.5.0_3.0_1726719209059.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier consumes the document and token columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_rairachit","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_rairachit", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_rairachit| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/RaiRachit/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_rairachit_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_rairachit_pipeline_en.md new file mode 100644 index 00000000000000..61d6d7ce939ba0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_rairachit_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_rairachit_pipeline pipeline DistilBertForSequenceClassification from RaiRachit +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_rairachit_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_rairachit_pipeline` is a English model originally trained by RaiRachit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_rairachit_pipeline_en_5.5.0_3.0_1726719221704.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_rairachit_pipeline_en_5.5.0_3.0_1726719221704.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# an example DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_rairachit_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// an example DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_rairachit_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_rairachit_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/RaiRachit/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_redglasses_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_redglasses_en.md new file mode 100644 index 00000000000000..7c18436567bf5c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_redglasses_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_redglasses DistilBertForSequenceClassification from RedGlasses +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_redglasses +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_redglasses` is a English model originally trained by RedGlasses. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_redglasses_en_5.5.0_3.0_1726704353336.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_redglasses_en_5.5.0_3.0_1726704353336.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier consumes the document and token columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_redglasses","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_redglasses", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_redglasses| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/RedGlasses/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_sayanote_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_sayanote_pipeline_en.md new file mode 100644 index 00000000000000..1fed1bc7685f51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_sayanote_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_sayanote_pipeline pipeline DistilBertForSequenceClassification from Sayanote +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_sayanote_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_sayanote_pipeline` is a English model originally trained by Sayanote. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_sayanote_pipeline_en_5.5.0_3.0_1726743555011.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_sayanote_pipeline_en_5.5.0_3.0_1726743555011.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_sayanote_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_sayanote_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
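+
+For quick checks on a few strings, the same pretrained pipeline can also be called without building a DataFrame first. A small sketch, assuming Spark NLP has already been started with `sparknlp.start()`:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_sayanote_pipeline", lang="en")
+
+# annotate() returns a dict keyed by output column; "class" should hold the predicted label(s).
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])
+```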
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_sayanote_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Sayanote/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotions_jjfumero_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotions_jjfumero_en.md new file mode 100644 index 00000000000000..6bf7b2c3ea4002 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotions_jjfumero_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotions_jjfumero DistilBertForSequenceClassification from jjfumero +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotions_jjfumero +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotions_jjfumero` is a English model originally trained by jjfumero. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_jjfumero_en_5.5.0_3.0_1726764116961.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_jjfumero_en_5.5.0_3.0_1726764116961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotions_jjfumero","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotions_jjfumero", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotions_jjfumero| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jjfumero/distilbert-base-uncased-finetuned-emotions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotions_jjfumero_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotions_jjfumero_pipeline_en.md new file mode 100644 index 00000000000000..730220966a7595 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotions_jjfumero_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotions_jjfumero_pipeline pipeline DistilBertForSequenceClassification from jjfumero +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotions_jjfumero_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotions_jjfumero_pipeline` is a English model originally trained by jjfumero. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_jjfumero_pipeline_en_5.5.0_3.0_1726764128924.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_jjfumero_pipeline_en_5.5.0_3.0_1726764128924.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotions_jjfumero_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotions_jjfumero_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotions_jjfumero_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jjfumero/distilbert-base-uncased-finetuned-emotions + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_imdb_classifier_nlp_course_chapter7_section2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_imdb_classifier_nlp_course_chapter7_section2_pipeline_en.md new file mode 100644 index 00000000000000..0a2e48b6f524c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_imdb_classifier_nlp_course_chapter7_section2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_imdb_classifier_nlp_course_chapter7_section2_pipeline pipeline DistilBertForSequenceClassification from BanUrsus +author: John Snow Labs +name: distilbert_base_uncased_finetuned_imdb_classifier_nlp_course_chapter7_section2_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_imdb_classifier_nlp_course_chapter7_section2_pipeline` is a English model originally trained by BanUrsus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_classifier_nlp_course_chapter7_section2_pipeline_en_5.5.0_3.0_1726719406223.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_classifier_nlp_course_chapter7_section2_pipeline_en_5.5.0_3.0_1726719406223.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_classifier_nlp_course_chapter7_section2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_classifier_nlp_course_chapter7_section2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_imdb_classifier_nlp_course_chapter7_section2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BanUrsus/distilbert-base-uncased-finetuned-imdb-classifier_nlp-course-chapter7-section2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_nlp_letters_text_all_class_weighted_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_nlp_letters_text_all_class_weighted_en.md new file mode 100644 index 00000000000000..e73f66a5e7946a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_nlp_letters_text_all_class_weighted_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_nlp_letters_text_all_class_weighted DistilBertForSequenceClassification from ben-yu +author: John Snow Labs +name: distilbert_base_uncased_finetuned_nlp_letters_text_all_class_weighted +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_nlp_letters_text_all_class_weighted` is a English model originally trained by ben-yu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_nlp_letters_text_all_class_weighted_en_5.5.0_3.0_1726742979203.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_nlp_letters_text_all_class_weighted_en_5.5.0_3.0_1726742979203.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_nlp_letters_text_all_class_weighted","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_nlp_letters_text_all_class_weighted", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
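+
+The snippet above assumes a running Spark session with Spark NLP attached and the relevant classes already imported; it does not show that preamble. A minimal setup along these lines is usually sufficient (a sketch, not part of the original card):
+
+```python
+import sparknlp
+from sparknlp.base import *       # DocumentAssembler
+from sparknlp.annotator import *  # Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+# Starts (or attaches to) a Spark session with the Spark NLP jars on the classpath.
+spark = sparknlp.start()
+```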
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_nlp_letters_text_all_class_weighted| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ben-yu/distilbert-base-uncased-finetuned-nlp-letters-TEXT-all-class-weighted \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_sanskrit_saskta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_sanskrit_saskta_pipeline_en.md new file mode 100644 index 00000000000000..2da93930894085 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_sanskrit_saskta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_sanskrit_saskta_pipeline pipeline DistilBertForSequenceClassification from datht +author: John Snow Labs +name: distilbert_base_uncased_finetuned_sanskrit_saskta_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_sanskrit_saskta_pipeline` is a English model originally trained by datht. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sanskrit_saskta_pipeline_en_5.5.0_3.0_1726704814395.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sanskrit_saskta_pipeline_en_5.5.0_3.0_1726704814395.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_sanskrit_saskta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_sanskrit_saskta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_sanskrit_saskta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/datht/distilbert-base-uncased-finetuned-SA + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_amna1015_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_amna1015_en.md new file mode 100644 index 00000000000000..90ee657c18f5c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_amna1015_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_amna1015 DistilBertForQuestionAnswering from Amna1015 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_amna1015 +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_amna1015` is a English model originally trained by Amna1015. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_amna1015_en_5.5.0_3.0_1726727561754.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_amna1015_en_5.5.0_3.0_1726727561754.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_amna1015","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_amna1015", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
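+
+After the pipeline has run, the extracted span is stored as an annotation in the `answer` column; its `result` field holds the answer text. A brief inspection step, assuming the `pipelineDF` from the snippet above:
+
+```python
+# Show the extracted answer for each question/context pair.
+pipelineDF.select("answer.result").show(truncate=False)
+```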
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_amna1015| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Amna1015/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_en.md new file mode 100644 index 00000000000000..05056287c1bac3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_en.md @@ -0,0 +1,92 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db DistilBertEmbeddings from coreyabs-db +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db +date: 2024-09-19 +tags: [distilbert, en, open_source, fill_mask, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db` is a English model originally trained by coreyabs-db. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_en_5.5.0_3.0_1726727597479.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_en_5.5.0_3.0_1726727597479.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+document_assembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+embeddings = DistilBertEmbeddings.pretrained("distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([document_assembler, tokenizer, embeddings])
+
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val document_assembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val embeddings = DistilBertEmbeddings
+    .pretrained("distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, embeddings))
+
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
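+
+If the token embeddings need to feed downstream Spark ML stages, an `EmbeddingsFinisher` can be appended to convert the annotations into plain vectors. A sketch under that assumption, reusing the stages defined above:
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+# Append the finisher as a final stage:
+# pipeline = Pipeline().setStages([document_assembler, tokenizer, embeddings, finisher])
+```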
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +References + +https://huggingface.co/coreyabs-db/distilbert-base-uncased-finetuned-squad-d5716d28 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_pipeline_en.md new file mode 100644 index 00000000000000..0b102a324bb772 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_pipeline pipeline DistilBertForQuestionAnswering from coreyabs-db +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_pipeline` is a English model originally trained by coreyabs-db. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_pipeline_en_5.5.0_3.0_1726727609746.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_pipeline_en_5.5.0_3.0_1726727609746.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/coreyabs-db/distilbert-base-uncased-finetuned-squad-d5716d28 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_d5716d28_kiwihead15_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_d5716d28_kiwihead15_en.md new file mode 100644 index 00000000000000..dc56e5e52a360b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_d5716d28_kiwihead15_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_d5716d28_kiwihead15 DistilBertForQuestionAnswering from Kiwihead15 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_d5716d28_kiwihead15 +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_d5716d28_kiwihead15` is a English model originally trained by Kiwihead15. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_kiwihead15_en_5.5.0_3.0_1726766763265.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_kiwihead15_en_5.5.0_3.0_1726766763265.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_d5716d28_kiwihead15","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_d5716d28_kiwihead15", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_d5716d28_kiwihead15| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Kiwihead15/distilbert-base-uncased-finetuned-squad-d5716d28 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_d5716d28_sgr23_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_d5716d28_sgr23_pipeline_en.md new file mode 100644 index 00000000000000..b24a633151640c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_d5716d28_sgr23_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_d5716d28_sgr23_pipeline pipeline DistilBertForQuestionAnswering from sgr23 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_d5716d28_sgr23_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_d5716d28_sgr23_pipeline` is a English model originally trained by sgr23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_sgr23_pipeline_en_5.5.0_3.0_1726727623230.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_sgr23_pipeline_en_5.5.0_3.0_1726727623230.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_sgr23_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_sgr23_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_d5716d28_sgr23_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/sgr23/distilbert-base-uncased-finetuned-squad-d5716d28 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_d5716d28_sober_clever_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_d5716d28_sober_clever_en.md new file mode 100644 index 00000000000000..5ea3fa73a77eeb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_d5716d28_sober_clever_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_d5716d28_sober_clever DistilBertForQuestionAnswering from Sober-Clever +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_d5716d28_sober_clever +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_d5716d28_sober_clever` is a English model originally trained by Sober-Clever. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_sober_clever_en_5.5.0_3.0_1726786041906.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_sober_clever_en_5.5.0_3.0_1726786041906.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_d5716d28_sober_clever","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_d5716d28_sober_clever", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_d5716d28_sober_clever| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Sober-Clever/distilbert-base-uncased-finetuned-squad-d5716d28 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_d5716d28_sober_clever_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_d5716d28_sober_clever_pipeline_en.md new file mode 100644 index 00000000000000..d569abcbca1d2c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_d5716d28_sober_clever_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_d5716d28_sober_clever_pipeline pipeline DistilBertForQuestionAnswering from Sober-Clever +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_d5716d28_sober_clever_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_d5716d28_sober_clever_pipeline` is a English model originally trained by Sober-Clever. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_sober_clever_pipeline_en_5.5.0_3.0_1726786052946.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_sober_clever_pipeline_en_5.5.0_3.0_1726786052946.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_sober_clever_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_sober_clever_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_d5716d28_sober_clever_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Sober-Clever/distilbert-base-uncased-finetuned-squad-d5716d28 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_gplaza91_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_gplaza91_en.md new file mode 100644 index 00000000000000..d727cc84e0d174 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_gplaza91_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_gplaza91 DistilBertForQuestionAnswering from gplaza91 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_gplaza91 +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_gplaza91` is a English model originally trained by gplaza91. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_gplaza91_en_5.5.0_3.0_1726748417556.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_gplaza91_en_5.5.0_3.0_1726748417556.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_gplaza91","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_gplaza91", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_gplaza91| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/gplaza91/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_gplaza91_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_gplaza91_pipeline_en.md new file mode 100644 index 00000000000000..5ec5a2347c7f15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_gplaza91_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_gplaza91_pipeline pipeline DistilBertForQuestionAnswering from gplaza91 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_gplaza91_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_gplaza91_pipeline` is a English model originally trained by gplaza91. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_gplaza91_pipeline_en_5.5.0_3.0_1726748430961.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_gplaza91_pipeline_en_5.5.0_3.0_1726748430961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_gplaza91_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_gplaza91_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_gplaza91_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/gplaza91/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_jmoonyoung_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_jmoonyoung_en.md new file mode 100644 index 00000000000000..b5c366d1c6843a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_jmoonyoung_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_jmoonyoung DistilBertForQuestionAnswering from jmoonyoung +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_jmoonyoung +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_jmoonyoung` is a English model originally trained by jmoonyoung. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_jmoonyoung_en_5.5.0_3.0_1726748417741.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_jmoonyoung_en_5.5.0_3.0_1726748417741.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_jmoonyoung","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_jmoonyoung", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_jmoonyoung| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/jmoonyoung/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_mfenner_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_mfenner_en.md new file mode 100644 index 00000000000000..2b1c72259fb519 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_mfenner_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_mfenner DistilBertForQuestionAnswering from mfenner +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_mfenner +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_mfenner` is a English model originally trained by mfenner. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_mfenner_en_5.5.0_3.0_1726766672248.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_mfenner_en_5.5.0_3.0_1726766672248.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_mfenner","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_mfenner", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
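+
+For low-latency, single-pair inference the fitted `pipelineModel` can be wrapped in a `LightPipeline`, which avoids building a DataFrame per request. A sketch under the assumption that the installed Spark NLP version supports the two-argument `fullAnnotate(question, context)` form used with multi-document pipelines:
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+
+# The second argument feeds the context column of the MultiDocumentAssembler.
+result = light.fullAnnotate("What framework do I use?", "I use spark-nlp.")
+print(result)
+```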
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_mfenner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/mfenner/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_msamahero_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_msamahero_en.md new file mode 100644 index 00000000000000..4bf08773d4fef9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_msamahero_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_msamahero DistilBertForQuestionAnswering from msamahero +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_msamahero +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_msamahero` is a English model originally trained by msamahero. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_msamahero_en_5.5.0_3.0_1726727855733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_msamahero_en_5.5.0_3.0_1726727855733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_msamahero","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_msamahero", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_msamahero| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/msamahero/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_tuner23652_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_tuner23652_en.md new file mode 100644 index 00000000000000..343a83b8167f09 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_tuner23652_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_tuner23652 DistilBertForQuestionAnswering from tuner23652 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_tuner23652 +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_tuner23652` is a English model originally trained by tuner23652. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_tuner23652_en_5.5.0_3.0_1726766673187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_tuner23652_en_5.5.0_3.0_1726766673187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_tuner23652","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_tuner23652", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_tuner23652| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/tuner23652/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_sst_2_english_toxicity_ft_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_sst_2_english_toxicity_ft_en.md new file mode 100644 index 00000000000000..5d50d12dc58a2e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_sst_2_english_toxicity_ft_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_sst_2_english_toxicity_ft DistilBertForSequenceClassification from fatmhd1995 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_sst_2_english_toxicity_ft +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_sst_2_english_toxicity_ft` is a English model originally trained by fatmhd1995. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst_2_english_toxicity_ft_en_5.5.0_3.0_1726740692106.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst_2_english_toxicity_ft_en_5.5.0_3.0_1726740692106.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_sst_2_english_toxicity_ft","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_sst_2_english_toxicity_ft", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
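For ad-hoc predictions on a handful of strings, the fitted pipeline can also be wrapped in a `LightPipeline`, which avoids building a DataFrame. A minimal sketch, assuming the `pipelineModel` fitted above:

```python
from sparknlp.base import LightPipeline

light_pipeline = LightPipeline(pipelineModel)
# Returns a dict keyed by output column; the predicted label appears under "class"
print(light_pipeline.annotate("I love spark-nlp"))
```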
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_sst_2_english_toxicity_ft| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/fatmhd1995/distilbert-base-uncased-finetuned-sst-2-english-TOXICITY-FT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_t_communication_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_t_communication_en.md new file mode 100644 index 00000000000000..9af156543f1fb4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_t_communication_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_t_communication DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_t_communication +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_t_communication` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_communication_en_5.5.0_3.0_1726742860008.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_communication_en_5.5.0_3.0_1726742860008.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_t_communication","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_t_communication", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_t_communication| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-t_communication \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_t_communication_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_t_communication_pipeline_en.md new file mode 100644 index 00000000000000..f1d232b481ea9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_t_communication_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_t_communication_pipeline pipeline DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_t_communication_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_t_communication_pipeline` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_communication_pipeline_en_5.5.0_3.0_1726742872625.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_communication_pipeline_en_5.5.0_3.0_1726742872625.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an input DataFrame with a "text" column, for example:
# df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_t_communication_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an input DataFrame with a "text" column, for example:
// val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_t_communication_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
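The `annotations` DataFrame returned by the pipeline carries one column per included annotator. A minimal sketch for reading off the predicted label, assuming the classifier in this pipeline writes to a column named `class` (consistent with the standalone model card, but an assumption here):

```python
# ".result" unwraps the label string from each classification annotation
annotations.select("class.result").show(truncate=False)
```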
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_t_communication_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-t_communication + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_fold_4_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_fold_4_en.md new file mode 100644 index 00000000000000..a89f887330f10d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_fold_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_fold_4 DistilBertForSequenceClassification from research-dump +author: John Snow Labs +name: distilbert_base_uncased_fold_4 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_fold_4` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_fold_4_en_5.5.0_3.0_1726743731338.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_fold_4_en_5.5.0_3.0_1726743731338.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_fold_4","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_fold_4", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_fold_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/research-dump/distilbert-base-uncased_fold_4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_fold_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_fold_4_pipeline_en.md new file mode 100644 index 00000000000000..216d8f744e3c28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_fold_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_fold_4_pipeline pipeline DistilBertForSequenceClassification from research-dump +author: John Snow Labs +name: distilbert_base_uncased_fold_4_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_fold_4_pipeline` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_fold_4_pipeline_en_5.5.0_3.0_1726743744582.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_fold_4_pipeline_en_5.5.0_3.0_1726743744582.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an input DataFrame with a "text" column, for example:
# df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_fold_4_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an input DataFrame with a "text" column, for example:
// val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_fold_4_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_fold_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/research-dump/distilbert-base-uncased_fold_4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_md_gender_bias_trained_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_md_gender_bias_trained_pipeline_en.md new file mode 100644 index 00000000000000..5df6d9cb95ebdd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_md_gender_bias_trained_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_md_gender_bias_trained_pipeline pipeline DistilBertForSequenceClassification from JakobKaiser +author: John Snow Labs +name: distilbert_base_uncased_md_gender_bias_trained_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_md_gender_bias_trained_pipeline` is a English model originally trained by JakobKaiser. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_md_gender_bias_trained_pipeline_en_5.5.0_3.0_1726704582944.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_md_gender_bias_trained_pipeline_en_5.5.0_3.0_1726704582944.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an input DataFrame with a "text" column, for example:
# df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_md_gender_bias_trained_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an input DataFrame with a "text" column, for example:
// val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_md_gender_bias_trained_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_md_gender_bias_trained_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/JakobKaiser/distilbert-base-uncased-md_gender_bias-trained + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut5_plprefix0stlarge_simsp100_clean300_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut5_plprefix0stlarge_simsp100_clean300_en.md new file mode 100644 index 00000000000000..f9aaf3203a4626 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut5_plprefix0stlarge_simsp100_clean300_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st15sd_ut72ut5_plprefix0stlarge_simsp100_clean300 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st15sd_ut72ut5_plprefix0stlarge_simsp100_clean300 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st15sd_ut72ut5_plprefix0stlarge_simsp100_clean300` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st15sd_ut72ut5_plprefix0stlarge_simsp100_clean300_en_5.5.0_3.0_1726742731419.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st15sd_ut72ut5_plprefix0stlarge_simsp100_clean300_en_5.5.0_3.0_1726742731419.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st15sd_ut72ut5_plprefix0stlarge_simsp100_clean300","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st15sd_ut72ut5_plprefix0stlarge_simsp100_clean300", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
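Transformer annotators in Spark NLP batch sentences before inference, so throughput on larger datasets can be tuned via the batch size. A minimal sketch; the value 16 is an arbitrary example, not a recommended setting:

```python
# Larger batches generally improve hardware utilization at the cost of memory
sequenceClassifier = DistilBertForSequenceClassification.pretrained(
        "distilbert_base_uncased_odm_zphr_0st15sd_ut72ut5_plprefix0stlarge_simsp100_clean300", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class") \
    .setBatchSize(16)
```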
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st15sd_ut72ut5_plprefix0stlarge_simsp100_clean300| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st15sd_ut72ut5_PLPrefix0stlarge_simsp100_clean300 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1large16pfxnf_simsp400_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1large16pfxnf_simsp400_clean100_pipeline_en.md new file mode 100644 index 00000000000000..9f2080336174c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1large16pfxnf_simsp400_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1large16pfxnf_simsp400_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1large16pfxnf_simsp400_clean100_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1large16pfxnf_simsp400_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1large16pfxnf_simsp400_clean100_pipeline_en_5.5.0_3.0_1726743009343.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1large16pfxnf_simsp400_clean100_pipeline_en_5.5.0_3.0_1726743009343.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an input DataFrame with a "text" column, for example:
# df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1large16pfxnf_simsp400_clean100_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an input DataFrame with a "text" column, for example:
// val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1large16pfxnf_simsp400_clean100_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1large16pfxnf_simsp400_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st16sd_ut72ut1large16PfxNf_simsp400_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge_simsp300_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge_simsp300_clean100_en.md new file mode 100644 index 00000000000000..98e0f3994887be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge_simsp300_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge_simsp300_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge_simsp300_clean100 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge_simsp300_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge_simsp300_clean100_en_5.5.0_3.0_1726743424443.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge_simsp300_clean100_en_5.5.0_3.0_1726743424443.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge_simsp300_clean100","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge_simsp300_clean100", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge_simsp300_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st17sd_ut72ut1_PLPrefix0stlarge_simsp300_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en.md new file mode 100644 index 00000000000000..1d0c2dbba3e270 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en_5.5.0_3.0_1726743443983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en_5.5.0_3.0_1726743443983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an input DataFrame with a "text" column, for example:
# df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an input DataFrame with a "text" column, for example:
// val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st17sd_ut72ut1_PLPrefix0stlarge_simsp300_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1largepfxnf_simsp300_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1largepfxnf_simsp300_clean200_en.md new file mode 100644 index 00000000000000..8868cb121c1bca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1largepfxnf_simsp300_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1largepfxnf_simsp300_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1largepfxnf_simsp300_clean200 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1largepfxnf_simsp300_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1largepfxnf_simsp300_clean200_en_5.5.0_3.0_1726743858444.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1largepfxnf_simsp300_clean200_en_5.5.0_3.0_1726743858444.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1largepfxnf_simsp300_clean200","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1largepfxnf_simsp300_clean200", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1largepfxnf_simsp300_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st17sd_ut72ut1largePfxNf_simsp300_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge19_simsp100_clean300_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge19_simsp100_clean300_en.md new file mode 100644 index 00000000000000..4d2bbd75ec3027 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge19_simsp100_clean300_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge19_simsp100_clean300 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge19_simsp100_clean300 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge19_simsp100_clean300` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge19_simsp100_clean300_en_5.5.0_3.0_1726742907682.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge19_simsp100_clean300_en_5.5.0_3.0_1726742907682.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge19_simsp100_clean300","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge19_simsp100_clean300", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge19_simsp100_clean300| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st19sd_ut72ut5_PLPrefix0stlarge19_simsp100_clean300 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_en.md new file mode 100644 index 00000000000000..39d2ceb0b132d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_en_5.5.0_3.0_1726742680811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_en_5.5.0_3.0_1726742680811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st7sd_ut72ut1large7PfxNf_simsp400_clean300 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_pipeline_en.md new file mode 100644 index 00000000000000..0ab39e2a10a667 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_pipeline_en_5.5.0_3.0_1726742693449.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_pipeline_en_5.5.0_3.0_1726742693449.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an input DataFrame with a "text" column, for example:
# df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an input DataFrame with a "text" column, for example:
// val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st7sd_ut72ut1large7PfxNf_simsp400_clean300 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_qa_mash_covid_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_qa_mash_covid_pipeline_en.md new file mode 100644 index 00000000000000..93eeabdc56cee8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_qa_mash_covid_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_qa_mash_covid_pipeline pipeline DistilBertForQuestionAnswering from Eurosmart +author: John Snow Labs +name: distilbert_base_uncased_qa_mash_covid_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_qa_mash_covid_pipeline` is a English model originally trained by Eurosmart. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_qa_mash_covid_pipeline_en_5.5.0_3.0_1726766785777.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_qa_mash_covid_pipeline_en_5.5.0_3.0_1726766785777.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an input DataFrame holding the question and its context
# (see the sketch after this block for an example layout)
pipeline = PretrainedPipeline("distilbert_base_uncased_qa_mash_covid_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an input DataFrame holding the question and its context
val pipeline = new PretrainedPipeline("distilbert_base_uncased_qa_mash_covid_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
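Because this pipeline starts from a MultiDocumentAssembler, `df` needs both a question and a context column. A minimal sketch of a plausible input layout; the column names and the `answer` output column are assumptions based on the standalone model cards and should be checked against the saved pipeline's settings:

```python
# Hypothetical two-column input: one question paired with its supporting context
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")

annotations = pipeline.transform(df)
annotations.select("answer.result").show(truncate=False)
```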
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_qa_mash_covid_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Eurosmart/distilbert-base-uncased-qa-mash-covid + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_rile_v1_frozen_4_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_rile_v1_frozen_4_en.md new file mode 100644 index 00000000000000..68034f8e198488 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_rile_v1_frozen_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_rile_v1_frozen_4 DistilBertForSequenceClassification from kghanlon +author: John Snow Labs +name: distilbert_base_uncased_rile_v1_frozen_4 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_rile_v1_frozen_4` is a English model originally trained by kghanlon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_rile_v1_frozen_4_en_5.5.0_3.0_1726744167857.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_rile_v1_frozen_4_en_5.5.0_3.0_1726744167857.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_rile_v1_frozen_4","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_rile_v1_frozen_4", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_rile_v1_frozen_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kghanlon/distilbert-base-uncased-RILE-v1_frozen_4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_rile_v1_frozen_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_rile_v1_frozen_4_pipeline_en.md new file mode 100644 index 00000000000000..596eb381c121b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_rile_v1_frozen_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_rile_v1_frozen_4_pipeline pipeline DistilBertForSequenceClassification from kghanlon +author: John Snow Labs +name: distilbert_base_uncased_rile_v1_frozen_4_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_rile_v1_frozen_4_pipeline` is a English model originally trained by kghanlon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_rile_v1_frozen_4_pipeline_en_5.5.0_3.0_1726744179807.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_rile_v1_frozen_4_pipeline_en_5.5.0_3.0_1726744179807.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an input DataFrame with a "text" column, for example:
# df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_rile_v1_frozen_4_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an input DataFrame with a "text" column, for example:
// val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_rile_v1_frozen_4_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_rile_v1_frozen_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kghanlon/distilbert-base-uncased-RILE-v1_frozen_4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_sgd_zphr_0st42sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_sgd_zphr_0st42sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en.md new file mode 100644 index 00000000000000..4d8ce2a1ab1345 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_sgd_zphr_0st42sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_sgd_zphr_0st42sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_sgd_zphr_0st42sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_sgd_zphr_0st42sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_sgd_zphr_0st42sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en_5.5.0_3.0_1726704719638.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_sgd_zphr_0st42sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en_5.5.0_3.0_1726704719638.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_sgd_zphr_0st42sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_sgd_zphr_0st42sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_sgd_zphr_0st42sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_sgd_zphr_0st42sd_ut72ut5_PLPrefix0stlarge_simsp100_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_simpleeng_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_simpleeng_classifier_en.md new file mode 100644 index 00000000000000..8343f3e630d09b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_simpleeng_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_simpleeng_classifier DistilBertForSequenceClassification from saradiaz +author: John Snow Labs +name: distilbert_base_uncased_simpleeng_classifier +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_simpleeng_classifier` is a English model originally trained by saradiaz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_simpleeng_classifier_en_5.5.0_3.0_1726763824010.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_simpleeng_classifier_en_5.5.0_3.0_1726763824010.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_simpleeng_classifier","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_simpleeng_classifier", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
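The `transform` call above returns `class` as an array of Spark NLP annotations. A small follow-up sketch for pulling out just the predicted label (field names assumed from Spark NLP's standard annotation schema):

```python
from pyspark.sql import functions as F

# pipelineDF comes from the Python example above; "class.result" holds the predicted label(s)
pipelineDF.select(
    F.col("text"),
    F.col("class.result").alias("prediction")
).show(truncate=False)
```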
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_simpleeng_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/saradiaz/distilbert-base-uncased-simpleEng-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_simpleeng_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_simpleeng_classifier_pipeline_en.md new file mode 100644 index 00000000000000..fd94b2aa61c288 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_simpleeng_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_simpleeng_classifier_pipeline pipeline DistilBertForSequenceClassification from saradiaz +author: John Snow Labs +name: distilbert_base_uncased_simpleeng_classifier_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_simpleeng_classifier_pipeline` is a English model originally trained by saradiaz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_simpleeng_classifier_pipeline_en_5.5.0_3.0_1726763836503.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_simpleeng_classifier_pipeline_en_5.5.0_3.0_1726763836503.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_simpleeng_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_simpleeng_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_simpleeng_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/saradiaz/distilbert-base-uncased-simpleEng-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_small_talk_zphr_0st_ut12ut1_plain_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_small_talk_zphr_0st_ut12ut1_plain_simsp_en.md new file mode 100644 index 00000000000000..c68a37bd17f148 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_small_talk_zphr_0st_ut12ut1_plain_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_small_talk_zphr_0st_ut12ut1_plain_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_small_talk_zphr_0st_ut12ut1_plain_simsp +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_small_talk_zphr_0st_ut12ut1_plain_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_small_talk_zphr_0st_ut12ut1_plain_simsp_en_5.5.0_3.0_1726744178540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_small_talk_zphr_0st_ut12ut1_plain_simsp_en_5.5.0_3.0_1726744178540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_small_talk_zphr_0st_ut12ut1_plain_simsp","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_small_talk_zphr_0st_ut12ut1_plain_simsp", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_small_talk_zphr_0st_ut12ut1_plain_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_small_talk_zphr_0st_ut12ut1_plain_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_small_talk_zphr_2st_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_small_talk_zphr_2st_en.md new file mode 100644 index 00000000000000..bd0f5c5a5caf4c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_small_talk_zphr_2st_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_small_talk_zphr_2st DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_small_talk_zphr_2st +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_small_talk_zphr_2st` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_small_talk_zphr_2st_en_5.5.0_3.0_1726764016565.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_small_talk_zphr_2st_en_5.5.0_3.0_1726764016565.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_small_talk_zphr_2st","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_small_talk_zphr_2st", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_small_talk_zphr_2st| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_small_talk_zphr_2st \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_squad2_pruned_p45_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_squad2_pruned_p45_en.md new file mode 100644 index 00000000000000..48853029966e0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_squad2_pruned_p45_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_pruned_p45 DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_pruned_p45 +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_pruned_p45` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_pruned_p45_en_5.5.0_3.0_1726727880486.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_pruned_p45_en_5.5.0_3.0_1726727880486.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_pruned_p45","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_pruned_p45", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
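After `transform`, the extracted answer span lives in the `answer` column. A brief sketch for reading it back out alongside the raw question (column names assumed from the example above):

```python
# pipelineDF comes from the Python example above; "answer" is an array of annotations
pipelineDF.selectExpr(
    "question",
    "answer.result as answer"
).show(truncate=False)
```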
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_pruned_p45| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|192.6 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-pruned-p45 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_squad2_pruned_p45_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_squad2_pruned_p45_pipeline_en.md new file mode 100644 index 00000000000000..a6e6bb34aa8808 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_squad2_pruned_p45_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_pruned_p45_pipeline pipeline DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_pruned_p45_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_pruned_p45_pipeline` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_pruned_p45_pipeline_en_5.5.0_3.0_1726727895991.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_pruned_p45_pipeline_en_5.5.0_3.0_1726727895991.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_squad2_pruned_p45_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_squad2_pruned_p45_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_pruned_p45_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|192.6 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-pruned-p45 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_squad2_pruned_p50_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_squad2_pruned_p50_en.md new file mode 100644 index 00000000000000..2d529022e74824 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_squad2_pruned_p50_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_pruned_p50 DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_pruned_p50 +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_pruned_p50` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_pruned_p50_en_5.5.0_3.0_1726748506237.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_pruned_p50_en_5.5.0_3.0_1726748506237.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_pruned_p50","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_pruned_p50", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_pruned_p50| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|185.2 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-pruned-p50 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_simsp_en.md new file mode 100644 index 00000000000000..d394deebce64ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_simsp +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_simsp_en_5.5.0_3.0_1726704588976.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_simsp_en_5.5.0_3.0_1726704588976.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_simsp","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_simsp", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut12ut1_plain_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_travel_zphr_1st_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_travel_zphr_1st_en.md new file mode 100644 index 00000000000000..1467fa63715e34 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_travel_zphr_1st_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_1st DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_1st +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_1st` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_1st_en_5.5.0_3.0_1726719402788.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_1st_en_5.5.0_3.0_1726719402788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_1st","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_1st", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_1st| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_1st \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_college_experience_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_college_experience_classifier_en.md new file mode 100644 index 00000000000000..bd2eb8aab385a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_college_experience_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_college_experience_classifier DistilBertForSequenceClassification from jasonchay +author: John Snow Labs +name: distilbert_college_experience_classifier +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_college_experience_classifier` is a English model originally trained by jasonchay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_college_experience_classifier_en_5.5.0_3.0_1726742905743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_college_experience_classifier_en_5.5.0_3.0_1726742905743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_college_experience_classifier","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_college_experience_classifier", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_college_experience_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jasonchay/distilbert-college-experience-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_college_experience_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_college_experience_classifier_pipeline_en.md new file mode 100644 index 00000000000000..ea1b2d09b9c0c8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_college_experience_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_college_experience_classifier_pipeline pipeline DistilBertForSequenceClassification from jasonchay +author: John Snow Labs +name: distilbert_college_experience_classifier_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_college_experience_classifier_pipeline` is a English model originally trained by jasonchay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_college_experience_classifier_pipeline_en_5.5.0_3.0_1726742918779.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_college_experience_classifier_pipeline_en_5.5.0_3.0_1726742918779.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_college_experience_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_college_experience_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_college_experience_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jasonchay/distilbert-college-experience-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_emotion_bilalinan_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_emotion_bilalinan_en.md new file mode 100644 index 00000000000000..4dc8fc061872c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_emotion_bilalinan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotion_bilalinan DistilBertForSequenceClassification from bilalinan +author: John Snow Labs +name: distilbert_emotion_bilalinan +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_bilalinan` is a English model originally trained by bilalinan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_bilalinan_en_5.5.0_3.0_1726704281278.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_bilalinan_en_5.5.0_3.0_1726704281278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_bilalinan","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_bilalinan", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_bilalinan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bilalinan/distilbert-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_emotion_bilalinan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_emotion_bilalinan_pipeline_en.md new file mode 100644 index 00000000000000..428ca82b22e92f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_emotion_bilalinan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_emotion_bilalinan_pipeline pipeline DistilBertForSequenceClassification from bilalinan +author: John Snow Labs +name: distilbert_emotion_bilalinan_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_bilalinan_pipeline` is a English model originally trained by bilalinan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_bilalinan_pipeline_en_5.5.0_3.0_1726704293519.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_bilalinan_pipeline_en_5.5.0_3.0_1726704293519.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_emotion_bilalinan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_emotion_bilalinan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_bilalinan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bilalinan/distilbert-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_fine_tuned_rte_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_fine_tuned_rte_en.md new file mode 100644 index 00000000000000..fba84d78a4ed11 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_fine_tuned_rte_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_fine_tuned_rte DistilBertForSequenceClassification from rycecorn +author: John Snow Labs +name: distilbert_fine_tuned_rte +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_fine_tuned_rte` is a English model originally trained by rycecorn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_fine_tuned_rte_en_5.5.0_3.0_1726704234954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_fine_tuned_rte_en_5.5.0_3.0_1726704234954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_fine_tuned_rte","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_fine_tuned_rte", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_fine_tuned_rte| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rycecorn/DistilBert-fine-tuned-RTE \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_finetuned_imdb_sentiment_dipanjans_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_finetuned_imdb_sentiment_dipanjans_en.md new file mode 100644 index 00000000000000..af14298fdc6a18 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_finetuned_imdb_sentiment_dipanjans_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_finetuned_imdb_sentiment_dipanjans DistilBertForSequenceClassification from dipanjanS +author: John Snow Labs +name: distilbert_finetuned_imdb_sentiment_dipanjans +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_imdb_sentiment_dipanjans` is a English model originally trained by dipanjanS. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_sentiment_dipanjans_en_5.5.0_3.0_1726741466694.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_sentiment_dipanjans_en_5.5.0_3.0_1726741466694.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_finetuned_imdb_sentiment_dipanjans","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_finetuned_imdb_sentiment_dipanjans", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_imdb_sentiment_dipanjans| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dipanjanS/distilbert-finetuned-imdb-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_imdb_en.md new file mode 100644 index 00000000000000..bb1ba90e9eee60 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_imdb_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English distilbert_imdb DistilBertForSequenceClassification from songyi-ng +author: John Snow Labs +name: distilbert_imdb +date: 2024-09-19 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb` is a English model originally trained by songyi-ng. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_en_5.5.0_3.0_1726780339502.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_en_5.5.0_3.0_1726780339502.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +tokenizer = Tokenizer()\ + .setInputCols("document")\ + .setOutputCol("token") + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb","en")\ + .setInputCols(["document","token"])\ + .setOutputCol("class") + +pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb","en") + .setInputCols(Array("document","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
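For quick, ad-hoc scoring of individual strings, the fitted pipeline can also be wrapped in Spark NLP's `LightPipeline`, which skips the DataFrame round trip. A minimal sketch, assuming the `pipeline` and `data` objects from the Python example above and that the classifier writes to the `class` column:

```python
from sparknlp.base import LightPipeline

# Fit once on the small bootstrap DataFrame, then reuse for in-memory inference
fitted = pipeline.fit(data)
light = LightPipeline(fitted)

# annotate() returns a dict keyed by output column name ("class" assumed here)
print(light.annotate("This movie was surprisingly good!")["class"])
```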
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|461.8 MB| + +## References + +References + +https://huggingface.co/songyi-ng/distilbert_IMDB \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9_en.md new file mode 100644 index 00000000000000..0fc773afe37edf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9 DistilBertForSequenceClassification from aniket-jain-9 +author: John Snow Labs +name: distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9` is a English model originally trained by aniket-jain-9. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9_en_5.5.0_3.0_1726741119425.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9_en_5.5.0_3.0_1726741119425.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
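Once fitted, the pipeline can be persisted with the standard Spark ML writer and reloaded later without rebuilding the stages. A short sketch, assuming the `pipelineModel` and `data` objects from the Python example above; the path below is a placeholder:

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline to disk (path is illustrative only)
pipelineModel.write().overwrite().save("/tmp/distilbert_sentiment_pipeline")

# Reload it in a later session and reuse it directly
restored = PipelineModel.load("/tmp/distilbert_sentiment_pipeline")
restored.transform(data).select("class.result").show(truncate=False)
```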
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aniket-jain-9/distilbert-lora-finetuned-merged-imdb-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9_pipeline_en.md new file mode 100644 index 00000000000000..4a46066b53a304 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9_pipeline pipeline DistilBertForSequenceClassification from aniket-jain-9 +author: John Snow Labs +name: distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9_pipeline` is a English model originally trained by aniket-jain-9. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9_pipeline_en_5.5.0_3.0_1726741131924.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9_pipeline_en_5.5.0_3.0_1726741131924.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
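+
+For quick experiments on a single string, the same pretrained pipeline can be used without building a DataFrame first. A minimal sketch, assuming the pipeline has been downloaded as above (the dictionary key follows the classifier's output column, `class` here):
+
+```python
+# annotate() runs the full pipeline on one text and returns a dict of annotator outputs
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])
+```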
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aniket-jain-9/distilbert-lora-finetuned-merged-imdb-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_netincomeloss_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_netincomeloss_en.md new file mode 100644 index 00000000000000..34f44d3d680c8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_netincomeloss_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_netincomeloss DistilBertForSequenceClassification from lenguyen +author: John Snow Labs +name: distilbert_netincomeloss +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_netincomeloss` is a English model originally trained by lenguyen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_netincomeloss_en_5.5.0_3.0_1726742545962.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_netincomeloss_en_5.5.0_3.0_1726742545962.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_netincomeloss","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_netincomeloss", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_netincomeloss| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|411.0 MB| + +## References + +https://huggingface.co/lenguyen/distilbert_NetIncomeLoss \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_256_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_256_en.md new file mode 100644 index 00000000000000..9b11239a5a2f11 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_256_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_256 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_256 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_256` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_256_en_5.5.0_3.0_1726742882540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_256_en_5.5.0_3.0_1726742882540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_256","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_256", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_256| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|71.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_data_aug_qnli_256 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_256_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_256_pipeline_en.md new file mode 100644 index 00000000000000..2e500f81fb1986 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_256_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_256_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_256_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_256_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_256_pipeline_en_5.5.0_3.0_1726742886670.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_256_pipeline_en_5.5.0_3.0_1726742886670.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_256_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_256_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_256_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|71.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_data_aug_qnli_256 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_logit_kd_sst2_96_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_logit_kd_sst2_96_en.md new file mode 100644 index 00000000000000..df128abccd5fcc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_logit_kd_sst2_96_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_sst2_96 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_sst2_96 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_sst2_96` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_sst2_96_en_5.5.0_3.0_1726743840199.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_sst2_96_en_5.5.0_3.0_1726743840199.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_sst2_96","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_sst2_96", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_sst2_96| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|25.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_sst2_96 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_mnli_192_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_mnli_192_en.md new file mode 100644 index 00000000000000..fd58ea0eecc66c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_mnli_192_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_mnli_192 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_mnli_192 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_mnli_192` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_mnli_192_en_5.5.0_3.0_1726742896953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_mnli_192_en_5.5.0_3.0_1726742896953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_mnli_192","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_mnli_192", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_mnli_192| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|52.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_mnli_192 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_mnli_192_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_mnli_192_pipeline_en.md new file mode 100644 index 00000000000000..330a92927e95b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_mnli_192_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_mnli_192_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_mnli_192_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_mnli_192_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_mnli_192_pipeline_en_5.5.0_3.0_1726742900045.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_mnli_192_pipeline_en_5.5.0_3.0_1726742900045.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_mnli_192_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_mnli_192_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_mnli_192_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|52.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_mnli_192 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_rte_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_rte_pipeline_en.md new file mode 100644 index 00000000000000..e7390d3747c8e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_rte_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_rte_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_rte_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_rte_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_rte_pipeline_en_5.5.0_3.0_1726743765377.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_rte_pipeline_en_5.5.0_3.0_1726743765377.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_rte_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_rte_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_rte_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|250.8 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_rte + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_sst2_padding90model_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_sst2_padding90model_en.md new file mode 100644 index 00000000000000..3dc7962ef62c9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_sst2_padding90model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sst2_padding90model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_sst2_padding90model +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sst2_padding90model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sst2_padding90model_en_5.5.0_3.0_1726719285012.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sst2_padding90model_en_5.5.0_3.0_1726719285012.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sst2_padding90model","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sst2_padding90model", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
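+
+Beyond the predicted label, each annotation in the `class` column carries metadata with per-label scores. A minimal sketch for surfacing them, with the field layout assumed from the standard Spark NLP annotation schema:
+
+```python
+from pyspark.sql.functions import explode
+
+# One row per classified document: predicted label plus the raw score map
+pipelineDF.select(explode("class").alias("c")) \
+    .selectExpr("c.result AS label", "c.metadata AS scores") \
+    .show(truncate=False)
+```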
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sst2_padding90model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_sst2_padding90model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_sst2_padding90model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_sst2_padding90model_pipeline_en.md new file mode 100644 index 00000000000000..943b7d1b1c5533 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_sst2_padding90model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sst2_padding90model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_sst2_padding90model_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sst2_padding90model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sst2_padding90model_pipeline_en_5.5.0_3.0_1726719298265.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sst2_padding90model_pipeline_en_5.5.0_3.0_1726719298265.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_sst2_padding90model_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_sst2_padding90model_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sst2_padding90model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_sst2_padding90model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_turk_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_turk_en.md new file mode 100644 index 00000000000000..1061f08b234bd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_turk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_turk DistilBertForSequenceClassification from alionder +author: John Snow Labs +name: distilbert_turk +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_turk` is a English model originally trained by alionder. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_turk_en_5.5.0_3.0_1726743002032.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_turk_en_5.5.0_3.0_1726743002032.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_turk","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_turk", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_turk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|254.1 MB| + +## References + +https://huggingface.co/alionder/distilbert_turk \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_turk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_turk_pipeline_en.md new file mode 100644 index 00000000000000..de9087a89efb92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_turk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_turk_pipeline pipeline DistilBertForSequenceClassification from alionder +author: John Snow Labs +name: distilbert_turk_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_turk_pipeline` is a English model originally trained by alionder. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_turk_pipeline_en_5.5.0_3.0_1726743014804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_turk_pipeline_en_5.5.0_3.0_1726743014804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_turk_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_turk_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_turk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|254.1 MB| + +## References + +https://huggingface.co/alionder/distilbert_turk + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_turkish_qa_turkish_squad_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_turkish_qa_turkish_squad_pipeline_tr.md new file mode 100644 index 00000000000000..ec3f92411690a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_turkish_qa_turkish_squad_pipeline_tr.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Turkish distilbert_turkish_qa_turkish_squad_pipeline pipeline DistilBertForQuestionAnswering from anilguven +author: John Snow Labs +name: distilbert_turkish_qa_turkish_squad_pipeline +date: 2024-09-19 +tags: [tr, open_source, pipeline, onnx] +task: Question Answering +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_turkish_qa_turkish_squad_pipeline` is a Turkish model originally trained by anilguven. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_turkish_qa_turkish_squad_pipeline_tr_5.5.0_3.0_1726727797276.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_turkish_qa_turkish_squad_pipeline_tr_5.5.0_3.0_1726727797276.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_turkish_qa_turkish_squad_pipeline", lang = "tr")
+# This QA pipeline starts with a MultiDocumentAssembler, so the input is assumed
+# to carry separate question and context columns.
+df = spark.createDataFrame([["Başkent neresidir?", "Türkiye'nin başkenti Ankara'dır."]]).toDF("question", "context")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_turkish_qa_turkish_squad_pipeline", lang = "tr")
+// Assumed question/context input columns for the MultiDocumentAssembler stage
+val df = Seq(("Başkent neresidir?", "Türkiye'nin başkenti Ankara'dır.")).toDF("question", "context")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_turkish_qa_turkish_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|251.8 MB| + +## References + +https://huggingface.co/anilguven/distilbert_tr_qa_turkish_squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_turkish_sentiment11_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_turkish_sentiment11_en.md new file mode 100644 index 00000000000000..67e78f15281f1d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_turkish_sentiment11_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_turkish_sentiment11 DistilBertForSequenceClassification from balciberin +author: John Snow Labs +name: distilbert_turkish_sentiment11 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_turkish_sentiment11` is a English model originally trained by balciberin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_turkish_sentiment11_en_5.5.0_3.0_1726743161153.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_turkish_sentiment11_en_5.5.0_3.0_1726743161153.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_turkish_sentiment11","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_turkish_sentiment11", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_turkish_sentiment11| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/balciberin/distilbert_turkish_sentiment11 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_turkish_sentiment11_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_turkish_sentiment11_pipeline_en.md new file mode 100644 index 00000000000000..a6cc0d4398857b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_turkish_sentiment11_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_turkish_sentiment11_pipeline pipeline DistilBertForSequenceClassification from balciberin +author: John Snow Labs +name: distilbert_turkish_sentiment11_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_turkish_sentiment11_pipeline` is a English model originally trained by balciberin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_turkish_sentiment11_pipeline_en_5.5.0_3.0_1726743173090.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_turkish_sentiment11_pipeline_en_5.5.0_3.0_1726743173090.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_turkish_sentiment11_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_turkish_sentiment11_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_turkish_sentiment11_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/balciberin/distilbert_turkish_sentiment11 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_turkish_turkish_tweet_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_turkish_turkish_tweet_pipeline_tr.md new file mode 100644 index 00000000000000..f52aa3d83a03e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_turkish_turkish_tweet_pipeline_tr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Turkish distilbert_turkish_turkish_tweet_pipeline pipeline DistilBertForSequenceClassification from anilguven +author: John Snow Labs +name: distilbert_turkish_turkish_tweet_pipeline +date: 2024-09-19 +tags: [tr, open_source, pipeline, onnx] +task: Text Classification +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_turkish_turkish_tweet_pipeline` is a Turkish model originally trained by anilguven. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_turkish_turkish_tweet_pipeline_tr_5.5.0_3.0_1726740873841.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_turkish_turkish_tweet_pipeline_tr_5.5.0_3.0_1726740873841.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_turkish_turkish_tweet_pipeline", lang = "tr")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_turkish_turkish_tweet_pipeline", lang = "tr")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_turkish_turkish_tweet_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|254.1 MB| + +## References + +https://huggingface.co/anilguven/distilbert_tr_turkish_tweet + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilkobert_ep2_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilkobert_ep2_en.md new file mode 100644 index 00000000000000..99153d4625f89c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilkobert_ep2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilkobert_ep2 DistilBertForSequenceClassification from yeye776 +author: John Snow Labs +name: distilkobert_ep2 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilkobert_ep2` is a English model originally trained by yeye776. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilkobert_ep2_en_5.5.0_3.0_1726763333004.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilkobert_ep2_en_5.5.0_3.0_1726763333004.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilkobert_ep2","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilkobert_ep2", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilkobert_ep2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|106.5 MB| + +## References + +https://huggingface.co/yeye776/DistilKoBERT-ep2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilkobert_ep2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilkobert_ep2_pipeline_en.md new file mode 100644 index 00000000000000..af18809981227e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilkobert_ep2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilkobert_ep2_pipeline pipeline DistilBertForSequenceClassification from yeye776 +author: John Snow Labs +name: distilkobert_ep2_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilkobert_ep2_pipeline` is a English model originally trained by yeye776. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilkobert_ep2_pipeline_en_5.5.0_3.0_1726763338735.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilkobert_ep2_pipeline_en_5.5.0_3.0_1726763338735.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilkobert_ep2_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilkobert_ep2_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilkobert_ep2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|106.5 MB| + +## References + +https://huggingface.co/yeye776/DistilKoBERT-ep2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilroberta_base_ft_summonerschool_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilroberta_base_ft_summonerschool_en.md new file mode 100644 index 00000000000000..f8196531fc33a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilroberta_base_ft_summonerschool_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_ft_summonerschool RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_summonerschool +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_summonerschool` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_summonerschool_en_5.5.0_3.0_1726747572996.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_summonerschool_en_5.5.0_3.0_1726747572996.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_summonerschool","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_summonerschool","en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
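+
+Each token's vector is stored in the `embeddings` field of the annotations in the `embeddings` output column. A minimal sketch for pulling tokens and their vectors into a flat view:
+
+```python
+from pyspark.sql.functions import explode
+
+# One row per token: the token text and its RoBERTa embedding vector
+pipelineDF.select(explode("embeddings").alias("emb")) \
+    .selectExpr("emb.result AS token", "emb.embeddings AS vector") \
+    .show(truncate=80)
+```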
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_summonerschool| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-summonerschool \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilroberta_base_ft_summonerschool_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilroberta_base_ft_summonerschool_pipeline_en.md new file mode 100644 index 00000000000000..86ec6b4b9c7aa4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilroberta_base_ft_summonerschool_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_ft_summonerschool_pipeline pipeline RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_summonerschool_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_summonerschool_pipeline` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_summonerschool_pipeline_en_5.5.0_3.0_1726747588720.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_summonerschool_pipeline_en_5.5.0_3.0_1726747588720.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilroberta_base_ft_summonerschool_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilroberta_base_ft_summonerschool_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_summonerschool_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-summonerschool + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilroberta_rb156k_opt15_ep40_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilroberta_rb156k_opt15_ep40_en.md new file mode 100644 index 00000000000000..382dd759e0adef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilroberta_rb156k_opt15_ep40_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_rb156k_opt15_ep40 RoBertaEmbeddings from judy93536 +author: John Snow Labs +name: distilroberta_rb156k_opt15_ep40 +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_rb156k_opt15_ep40` is a English model originally trained by judy93536. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_rb156k_opt15_ep40_en_5.5.0_3.0_1726778375556.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_rb156k_opt15_ep40_en_5.5.0_3.0_1726778375556.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("distilroberta_rb156k_opt15_ep40","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("distilroberta_rb156k_opt15_ep40","en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_rb156k_opt15_ep40| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|305.9 MB| + +## References + +https://huggingface.co/judy93536/distilroberta-rb156k-opt15-ep40 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilroberta_rb156k_opt15_ep40_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilroberta_rb156k_opt15_ep40_pipeline_en.md new file mode 100644 index 00000000000000..80c0eb9c8de314 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilroberta_rb156k_opt15_ep40_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_rb156k_opt15_ep40_pipeline pipeline RoBertaEmbeddings from judy93536 +author: John Snow Labs +name: distilroberta_rb156k_opt15_ep40_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_rb156k_opt15_ep40_pipeline` is a English model originally trained by judy93536. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_rb156k_opt15_ep40_pipeline_en_5.5.0_3.0_1726778390404.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_rb156k_opt15_ep40_pipeline_en_5.5.0_3.0_1726778390404.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilroberta_rb156k_opt15_ep40_pipeline", lang = "en")
+# The pipeline expects a DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilroberta_rb156k_opt15_ep40_pipeline", lang = "en")
+// The pipeline expects a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_rb156k_opt15_ep40_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|305.9 MB| + +## References + +https://huggingface.co/judy93536/distilroberta-rb156k-opt15-ep40 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_en.md b/docs/_posts/ahmedlone127/2024-09-19-emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_en.md new file mode 100644 index 00000000000000..55bc85b1f3ba7c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3 RoBertaForSequenceClassification from Abdelrahman-Rezk +author: John Snow Labs +name: emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3` is a English model originally trained by Abdelrahman-Rezk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_en_5.5.0_3.0_1726725985729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_en_5.5.0_3.0_1726725985729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
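+
+After `transform`, the predicted label lives in the `result` field of the `class` column. A short sketch for reading it back, assuming the column names used above:
+
+```python
+# "class" is an array of annotations; "result" holds the predicted label per document.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```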
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/Abdelrahman-Rezk/emotion-english-distilroberta-base-fine_tuned_for_amazon_reviews_english_3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_pipeline_en.md new file mode 100644 index 00000000000000..c9cef6acff12b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_pipeline pipeline RoBertaForSequenceClassification from Abdelrahman-Rezk +author: John Snow Labs +name: emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_pipeline` is a English model originally trained by Abdelrahman-Rezk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_pipeline_en_5.5.0_3.0_1726726000160.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_pipeline_en_5.5.0_3.0_1726726000160.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_pipeline", lang = "en")
+# The pipeline expects a DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_pipeline", lang = "en")
+// The pipeline expects a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.1 MB| + +## References + +https://huggingface.co/Abdelrahman-Rezk/emotion-english-distilroberta-base-fine_tuned_for_amazon_reviews_english_3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-emotion_predictor_for_emotion_chat_bot_en.md b/docs/_posts/ahmedlone127/2024-09-19-emotion_predictor_for_emotion_chat_bot_en.md new file mode 100644 index 00000000000000..46a6d22dee3399 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-emotion_predictor_for_emotion_chat_bot_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emotion_predictor_for_emotion_chat_bot RoBertaForSequenceClassification from Shotaro30678 +author: John Snow Labs +name: emotion_predictor_for_emotion_chat_bot +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_predictor_for_emotion_chat_bot` is a English model originally trained by Shotaro30678. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_predictor_for_emotion_chat_bot_en_5.5.0_3.0_1726726692215.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_predictor_for_emotion_chat_bot_en_5.5.0_3.0_1726726692215.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("emotion_predictor_for_emotion_chat_bot","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("emotion_predictor_for_emotion_chat_bot", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_predictor_for_emotion_chat_bot| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.8 MB| + +## References + +https://huggingface.co/Shotaro30678/emotion_predictor_for_emotion_chat_bot \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-emotion_predictor_for_emotion_chat_bot_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-emotion_predictor_for_emotion_chat_bot_pipeline_en.md new file mode 100644 index 00000000000000..a75b2972f85ef6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-emotion_predictor_for_emotion_chat_bot_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emotion_predictor_for_emotion_chat_bot_pipeline pipeline RoBertaForSequenceClassification from Shotaro30678 +author: John Snow Labs +name: emotion_predictor_for_emotion_chat_bot_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_predictor_for_emotion_chat_bot_pipeline` is a English model originally trained by Shotaro30678. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_predictor_for_emotion_chat_bot_pipeline_en_5.5.0_3.0_1726726706574.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_predictor_for_emotion_chat_bot_pipeline_en_5.5.0_3.0_1726726706574.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("emotion_predictor_for_emotion_chat_bot_pipeline", lang = "en")
+# The pipeline expects a DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("emotion_predictor_for_emotion_chat_bot_pipeline", lang = "en")
+// The pipeline expects a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_predictor_for_emotion_chat_bot_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.9 MB| + +## References + +https://huggingface.co/Shotaro30678/emotion_predictor_for_emotion_chat_bot + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-emotions_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-emotions_pipeline_en.md new file mode 100644 index 00000000000000..e78ada88e3e200 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-emotions_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emotions_pipeline pipeline RoBertaForSequenceClassification from HARSHU550 +author: John Snow Labs +name: emotions_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotions_pipeline` is a English model originally trained by HARSHU550. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotions_pipeline_en_5.5.0_3.0_1726733526067.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotions_pipeline_en_5.5.0_3.0_1726733526067.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("emotions_pipeline", lang = "en")
+# The pipeline expects a DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("emotions_pipeline", lang = "en")
+// The pipeline expects a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotions_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.9 MB| + +## References + +https://huggingface.co/HARSHU550/Emotions + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-fakenews_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-fakenews_pipeline_en.md new file mode 100644 index 00000000000000..6088efe5bf3e35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-fakenews_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fakenews_pipeline pipeline DistilBertForSequenceClassification from reetghosh1 +author: John Snow Labs +name: fakenews_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fakenews_pipeline` is a English model originally trained by reetghosh1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fakenews_pipeline_en_5.5.0_3.0_1726719427942.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fakenews_pipeline_en_5.5.0_3.0_1726719427942.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("fakenews_pipeline", lang = "en")
+# The pipeline expects a DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("fakenews_pipeline", lang = "en")
+// The pipeline expects a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fakenews_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/reetghosh1/FakeNews + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-fakenewsmodel_en.md b/docs/_posts/ahmedlone127/2024-09-19-fakenewsmodel_en.md new file mode 100644 index 00000000000000..ddc2013d9dabda --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-fakenewsmodel_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fakenewsmodel RoBertaForSequenceClassification from magnusgp +author: John Snow Labs +name: fakenewsmodel +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fakenewsmodel` is a English model originally trained by magnusgp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fakenewsmodel_en_5.5.0_3.0_1726750895650.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fakenewsmodel_en_5.5.0_3.0_1726750895650.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("fakenewsmodel","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("fakenewsmodel", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fakenewsmodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|425.7 MB| + +## References + +https://huggingface.co/magnusgp/fakenewsmodel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-fin_sentiment_en.md b/docs/_posts/ahmedlone127/2024-09-19-fin_sentiment_en.md new file mode 100644 index 00000000000000..293358d3091a15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-fin_sentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fin_sentiment XlmRoBertaForSequenceClassification from thaidv96 +author: John Snow Labs +name: fin_sentiment +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fin_sentiment` is a English model originally trained by thaidv96. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fin_sentiment_en_5.5.0_3.0_1726752776357.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fin_sentiment_en_5.5.0_3.0_1726752776357.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("fin_sentiment","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("fin_sentiment", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
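+
+Beyond the top label, each annotation's `metadata` map typically carries the score assigned to every class, which is useful for thresholding. A sketch under the same column assumptions as the example above:
+
+```python
+from pyspark.sql import functions as F
+
+# Explode the annotations and inspect the per-class scores kept in metadata.
+pipelineDF.select(F.explode("class").alias("ann")) \
+    .select(F.col("ann.result").alias("label"), F.col("ann.metadata").alias("scores")) \
+    .show(truncate=False)
+```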
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fin_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|847.9 MB| + +## References + +https://huggingface.co/thaidv96/fin-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-fin_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-fin_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..2027739d179ef7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-fin_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fin_sentiment_pipeline pipeline XlmRoBertaForSequenceClassification from thaidv96 +author: John Snow Labs +name: fin_sentiment_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fin_sentiment_pipeline` is a English model originally trained by thaidv96. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fin_sentiment_pipeline_en_5.5.0_3.0_1726752848555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fin_sentiment_pipeline_en_5.5.0_3.0_1726752848555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("fin_sentiment_pipeline", lang = "en")
+# The pipeline expects a DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("fin_sentiment_pipeline", lang = "en")
+// The pipeline expects a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fin_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|847.9 MB| + +## References + +https://huggingface.co/thaidv96/fin-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-financial_sentiment_distilbert_en.md b/docs/_posts/ahmedlone127/2024-09-19-financial_sentiment_distilbert_en.md new file mode 100644 index 00000000000000..acd54be0d13320 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-financial_sentiment_distilbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English financial_sentiment_distilbert DistilBertForSequenceClassification from runaksh +author: John Snow Labs +name: financial_sentiment_distilbert +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`financial_sentiment_distilbert` is a English model originally trained by runaksh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/financial_sentiment_distilbert_en_5.5.0_3.0_1726704648580.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/financial_sentiment_distilbert_en_5.5.0_3.0_1726704648580.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("financial_sentiment_distilbert","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("financial_sentiment_distilbert", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
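+
+For low-latency scoring of individual sentences, the fitted pipeline can be wrapped in a `LightPipeline`, which runs the same stages on plain strings without a distributed job. A minimal sketch, reusing `pipelineModel` from the example:
+
+```python
+from sparknlp.base import LightPipeline
+
+# LightPipeline annotates Python strings directly; the "class" key holds the predicted label.
+light = LightPipeline(pipelineModel)
+print(light.annotate("Quarterly revenue beat expectations and margins improved."))
+```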
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|financial_sentiment_distilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/runaksh/financial_sentiment_distilBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-financial_sentiment_model_1500_samples_en.md b/docs/_posts/ahmedlone127/2024-09-19-financial_sentiment_model_1500_samples_en.md new file mode 100644 index 00000000000000..b785d15b7578ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-financial_sentiment_model_1500_samples_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English financial_sentiment_model_1500_samples DistilBertForSequenceClassification from kevinwlip +author: John Snow Labs +name: financial_sentiment_model_1500_samples +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`financial_sentiment_model_1500_samples` is a English model originally trained by kevinwlip. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/financial_sentiment_model_1500_samples_en_5.5.0_3.0_1726719106847.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/financial_sentiment_model_1500_samples_en_5.5.0_3.0_1726719106847.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("financial_sentiment_model_1500_samples","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("financial_sentiment_model_1500_samples", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|financial_sentiment_model_1500_samples| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kevinwlip/financial-sentiment-model-1500-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-financial_sentiment_model_1500_samples_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-financial_sentiment_model_1500_samples_pipeline_en.md new file mode 100644 index 00000000000000..f300c2c8ac3415 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-financial_sentiment_model_1500_samples_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English financial_sentiment_model_1500_samples_pipeline pipeline DistilBertForSequenceClassification from kevinwlip +author: John Snow Labs +name: financial_sentiment_model_1500_samples_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`financial_sentiment_model_1500_samples_pipeline` is a English model originally trained by kevinwlip. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/financial_sentiment_model_1500_samples_pipeline_en_5.5.0_3.0_1726719119424.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/financial_sentiment_model_1500_samples_pipeline_en_5.5.0_3.0_1726719119424.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("financial_sentiment_model_1500_samples_pipeline", lang = "en")
+# The pipeline expects a DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("financial_sentiment_model_1500_samples_pipeline", lang = "en")
+// The pipeline expects a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|financial_sentiment_model_1500_samples_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kevinwlip/financial-sentiment-model-1500-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finbert_netcashprovidedbyusedininvestingactivities_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-finbert_netcashprovidedbyusedininvestingactivities_pipeline_en.md new file mode 100644 index 00000000000000..6c819eea5c1090 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finbert_netcashprovidedbyusedininvestingactivities_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finbert_netcashprovidedbyusedininvestingactivities_pipeline pipeline DistilBertForSequenceClassification from lenguyen +author: John Snow Labs +name: finbert_netcashprovidedbyusedininvestingactivities_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finbert_netcashprovidedbyusedininvestingactivities_pipeline` is a English model originally trained by lenguyen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finbert_netcashprovidedbyusedininvestingactivities_pipeline_en_5.5.0_3.0_1726741163115.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finbert_netcashprovidedbyusedininvestingactivities_pipeline_en_5.5.0_3.0_1726741163115.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("finbert_netcashprovidedbyusedininvestingactivities_pipeline", lang = "en")
+# The pipeline expects a DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("finbert_netcashprovidedbyusedininvestingactivities_pipeline", lang = "en")
+// The pipeline expects a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finbert_netcashprovidedbyusedininvestingactivities_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.0 MB| + +## References + +https://huggingface.co/lenguyen/finbert_NetCashProvidedByUsedInInvestingActivities + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finbert_revenuefromcontractwithcustomerexcludingassessedtax_en.md b/docs/_posts/ahmedlone127/2024-09-19-finbert_revenuefromcontractwithcustomerexcludingassessedtax_en.md new file mode 100644 index 00000000000000..74869de1e87647 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finbert_revenuefromcontractwithcustomerexcludingassessedtax_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finbert_revenuefromcontractwithcustomerexcludingassessedtax DistilBertForSequenceClassification from lenguyen +author: John Snow Labs +name: finbert_revenuefromcontractwithcustomerexcludingassessedtax +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finbert_revenuefromcontractwithcustomerexcludingassessedtax` is a English model originally trained by lenguyen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finbert_revenuefromcontractwithcustomerexcludingassessedtax_en_5.5.0_3.0_1726741490485.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finbert_revenuefromcontractwithcustomerexcludingassessedtax_en_5.5.0_3.0_1726741490485.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finbert_revenuefromcontractwithcustomerexcludingassessedtax","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finbert_revenuefromcontractwithcustomerexcludingassessedtax", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finbert_revenuefromcontractwithcustomerexcludingassessedtax| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|410.9 MB| + +## References + +https://huggingface.co/lenguyen/finbert_RevenueFromContractWithCustomerExcludingAssessedTax \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finbert_revenuefromcontractwithcustomerexcludingassessedtax_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-finbert_revenuefromcontractwithcustomerexcludingassessedtax_pipeline_en.md new file mode 100644 index 00000000000000..ab5970684363c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finbert_revenuefromcontractwithcustomerexcludingassessedtax_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finbert_revenuefromcontractwithcustomerexcludingassessedtax_pipeline pipeline DistilBertForSequenceClassification from lenguyen +author: John Snow Labs +name: finbert_revenuefromcontractwithcustomerexcludingassessedtax_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finbert_revenuefromcontractwithcustomerexcludingassessedtax_pipeline` is a English model originally trained by lenguyen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finbert_revenuefromcontractwithcustomerexcludingassessedtax_pipeline_en_5.5.0_3.0_1726741510145.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finbert_revenuefromcontractwithcustomerexcludingassessedtax_pipeline_en_5.5.0_3.0_1726741510145.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("finbert_revenuefromcontractwithcustomerexcludingassessedtax_pipeline", lang = "en")
+# The pipeline expects a DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("finbert_revenuefromcontractwithcustomerexcludingassessedtax_pipeline", lang = "en")
+// The pipeline expects a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finbert_revenuefromcontractwithcustomerexcludingassessedtax_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.0 MB| + +## References + +https://huggingface.co/lenguyen/finbert_RevenueFromContractWithCustomerExcludingAssessedTax + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuned_demo_sagax_sagacis_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuned_demo_sagax_sagacis_en.md new file mode 100644 index 00000000000000..08b93985284dcb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuned_demo_sagax_sagacis_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_demo_sagax_sagacis DistilBertForSequenceClassification from sagax-sagacis +author: John Snow Labs +name: finetuned_demo_sagax_sagacis +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_demo_sagax_sagacis` is a English model originally trained by sagax-sagacis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_demo_sagax_sagacis_en_5.5.0_3.0_1726763772341.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_demo_sagax_sagacis_en_5.5.0_3.0_1726763772341.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_demo_sagax_sagacis","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_demo_sagax_sagacis", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_demo_sagax_sagacis| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/sagax-sagacis/finetuned_demo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuned_fakenewsdetect_robertabasedl_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuned_fakenewsdetect_robertabasedl_en.md new file mode 100644 index 00000000000000..e38ff3d515b84a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuned_fakenewsdetect_robertabasedl_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_fakenewsdetect_robertabasedl RoBertaForSequenceClassification from Johnson-Olakanmi +author: John Snow Labs +name: finetuned_fakenewsdetect_robertabasedl +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_fakenewsdetect_robertabasedl` is a English model originally trained by Johnson-Olakanmi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_fakenewsdetect_robertabasedl_en_5.5.0_3.0_1726750926432.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_fakenewsdetect_robertabasedl_en_5.5.0_3.0_1726750926432.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("finetuned_fakenewsdetect_robertabasedl","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("finetuned_fakenewsdetect_robertabasedl", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_fakenewsdetect_robertabasedl| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|458.5 MB| + +## References + +https://huggingface.co/Johnson-Olakanmi/finetuned_fakenewsDetect_RobertaBaseDL \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuned_roberta_base_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuned_roberta_base_model_pipeline_en.md new file mode 100644 index 00000000000000..bd95b28eac06d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuned_roberta_base_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_roberta_base_model_pipeline pipeline RoBertaForSequenceClassification from KwabenaMufasa +author: John Snow Labs +name: finetuned_roberta_base_model_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_roberta_base_model_pipeline` is a English model originally trained by KwabenaMufasa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_roberta_base_model_pipeline_en_5.5.0_3.0_1726780253334.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_roberta_base_model_pipeline_en_5.5.0_3.0_1726780253334.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("finetuned_roberta_base_model_pipeline", lang = "en")
+# The pipeline expects a DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("finetuned_roberta_base_model_pipeline", lang = "en")
+// The pipeline expects a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_roberta_base_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|436.3 MB| + +## References + +https://huggingface.co/KwabenaMufasa/Finetuned-Roberta-base-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_covidsenti_bert_model_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_covidsenti_bert_model_en.md new file mode 100644 index 00000000000000..30f25bdad659d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_covidsenti_bert_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_covidsenti_bert_model DistilBertForSequenceClassification from Letrica +author: John Snow Labs +name: finetuning_covidsenti_bert_model +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_covidsenti_bert_model` is a English model originally trained by Letrica. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_covidsenti_bert_model_en_5.5.0_3.0_1726763832636.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_covidsenti_bert_model_en_5.5.0_3.0_1726763832636.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_covidsenti_bert_model","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_covidsenti_bert_model", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_covidsenti_bert_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Letrica/finetuning-COVIDSenti-bert-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_distillbert_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_distillbert_imdb_en.md new file mode 100644 index 00000000000000..a2e090fa367f30 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_distillbert_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_distillbert_imdb DistilBertForSequenceClassification from kaustavbhattacharjee +author: John Snow Labs +name: finetuning_distillbert_imdb +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_distillbert_imdb` is a English model originally trained by kaustavbhattacharjee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_distillbert_imdb_en_5.5.0_3.0_1726742454117.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_distillbert_imdb_en_5.5.0_3.0_1726742454117.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_distillbert_imdb","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_distillbert_imdb", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
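+
+The predicted label for each input row lands in the `result` field of the `class` annotation column. A minimal sketch of inspecting it, continuing from the Python example above (the label set depends on the upstream fine-tuning and is not documented on this page):
+
+```python
+# show the predicted label next to the input text
+pipelineDF.select("text", "class.result").show(truncate = False)
+```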
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_distillbert_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kaustavbhattacharjee/finetuning-DistillBERT-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_distillbert_imdb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_distillbert_imdb_pipeline_en.md new file mode 100644 index 00000000000000..d6d2bd9250ae18 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_distillbert_imdb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_distillbert_imdb_pipeline pipeline DistilBertForSequenceClassification from kaustavbhattacharjee +author: John Snow Labs +name: finetuning_distillbert_imdb_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_distillbert_imdb_pipeline` is a English model originally trained by kaustavbhattacharjee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_distillbert_imdb_pipeline_en_5.5.0_3.0_1726742466111.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_distillbert_imdb_pipeline_en_5.5.0_3.0_1726742466111.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# any DataFrame with a "text" column can serve as input
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("finetuning_distillbert_imdb_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// any DataFrame with a "text" column can serve as input
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+
+val pipeline = new PretrainedPipeline("finetuning_distillbert_imdb_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
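+
+This page does not list the columns the pretrained pipeline adds to the input DataFrame; continuing from the Python example above, a quick way to inspect them:
+
+```python
+# list the annotation columns the pipeline added to the input DataFrame
+annotations.printSchema()
+```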
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_distillbert_imdb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kaustavbhattacharjee/finetuning-DistillBERT-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_analysis_siebert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_analysis_siebert_pipeline_en.md new file mode 100644 index 00000000000000..b7d065dfb47be7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_analysis_siebert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_analysis_siebert_pipeline pipeline RoBertaForSequenceClassification from aruca +author: John Snow Labs +name: finetuning_sentiment_analysis_siebert_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_analysis_siebert_pipeline` is a English model originally trained by aruca. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_siebert_pipeline_en_5.5.0_3.0_1726726308079.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_siebert_pipeline_en_5.5.0_3.0_1726726308079.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# any DataFrame with a "text" column can serve as input
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("finetuning_sentiment_analysis_siebert_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// any DataFrame with a "text" column can serve as input
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+
+val pipeline = new PretrainedPipeline("finetuning_sentiment_analysis_siebert_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
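+
+This page does not list the columns the pretrained pipeline adds to the input DataFrame; continuing from the Python example above, a quick way to inspect them:
+
+```python
+# list the annotation columns the pipeline added to the input DataFrame
+annotations.printSchema()
+```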
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_analysis_siebert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/aruca/finetuning-sentiment-analysis-siebert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_8_pipeline_en.md new file mode 100644 index 00000000000000..59ef2e9e0d2b5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_8_pipeline pipeline DistilBertForSequenceClassification from mamledes +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_8_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_8_pipeline` is a English model originally trained by mamledes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_8_pipeline_en_5.5.0_3.0_1726763435783.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_8_pipeline_en_5.5.0_3.0_1726763435783.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# any DataFrame with a "text" column can serve as input
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_8_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// any DataFrame with a "text" column can serve as input
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_8_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
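+
+This page does not list the columns the pretrained pipeline adds to the input DataFrame; continuing from the Python example above, a quick way to inspect them:
+
+```python
+# list the annotation columns the pipeline added to the input DataFrame
+annotations.printSchema()
+```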
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mamledes/finetuning-sentiment-model-3000-samples_8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_abrario_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_abrario_en.md new file mode 100644 index 00000000000000..e99c664662445b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_abrario_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_abrario DistilBertForSequenceClassification from abrario +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_abrario +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_abrario` is a English model originally trained by abrario. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_abrario_en_5.5.0_3.0_1726719150321.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_abrario_en_5.5.0_3.0_1726719150321.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_abrario","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_abrario", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
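+
+The predicted label for each input row lands in the `result` field of the `class` annotation column. A minimal sketch of inspecting it, continuing from the Python example above (the label set depends on the upstream fine-tuning and is not documented on this page):
+
+```python
+# show the predicted label next to the input text
+pipelineDF.select("text", "class.result").show(truncate = False)
+```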
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_abrario| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abrario/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_abrario_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_abrario_pipeline_en.md new file mode 100644 index 00000000000000..206d2991e5ceef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_abrario_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_abrario_pipeline pipeline DistilBertForSequenceClassification from abrario +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_abrario_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_abrario_pipeline` is a English model originally trained by abrario. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_abrario_pipeline_en_5.5.0_3.0_1726719162665.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_abrario_pipeline_en_5.5.0_3.0_1726719162665.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# any DataFrame with a "text" column can serve as input
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_abrario_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// any DataFrame with a "text" column can serve as input
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_abrario_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
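+
+This page does not list the columns the pretrained pipeline adds to the input DataFrame; continuing from the Python example above, a quick way to inspect them:
+
+```python
+# list the annotation columns the pipeline added to the input DataFrame
+annotations.printSchema()
+```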
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_abrario_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abrario/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_dantecesar3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_dantecesar3_pipeline_en.md new file mode 100644 index 00000000000000..85a8db74596003 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_dantecesar3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_dantecesar3_pipeline pipeline DistilBertForSequenceClassification from dantecesar3 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_dantecesar3_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_dantecesar3_pipeline` is a English model originally trained by dantecesar3. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_dantecesar3_pipeline_en_5.5.0_3.0_1726704288268.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_dantecesar3_pipeline_en_5.5.0_3.0_1726704288268.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# any DataFrame with a "text" column can serve as input
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_dantecesar3_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// any DataFrame with a "text" column can serve as input
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_dantecesar3_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
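+
+This page does not list the columns the pretrained pipeline adds to the input DataFrame; continuing from the Python example above, a quick way to inspect them:
+
+```python
+# list the annotation columns the pipeline added to the input DataFrame
+annotations.printSchema()
+```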
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_dantecesar3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dantecesar3/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_freedino_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_freedino_en.md new file mode 100644 index 00000000000000..3759fd53a7d655 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_freedino_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_freedino DistilBertForSequenceClassification from Freedino +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_freedino +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_freedino` is a English model originally trained by Freedino. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_freedino_en_5.5.0_3.0_1726719194691.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_freedino_en_5.5.0_3.0_1726719194691.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_freedino","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_freedino", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
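+
+The predicted label for each input row lands in the `result` field of the `class` annotation column. A minimal sketch of inspecting it, continuing from the Python example above (the label set depends on the upstream fine-tuning and is not documented on this page):
+
+```python
+# show the predicted label next to the input text
+pipelineDF.select("text", "class.result").show(truncate = False)
+```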
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_freedino| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Freedino/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_jamnik99_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_jamnik99_pipeline_en.md new file mode 100644 index 00000000000000..ce33267a501239 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_jamnik99_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_jamnik99_pipeline pipeline DistilBertForSequenceClassification from jamnik99 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_jamnik99_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_jamnik99_pipeline` is a English model originally trained by jamnik99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_jamnik99_pipeline_en_5.5.0_3.0_1726743212341.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_jamnik99_pipeline_en_5.5.0_3.0_1726743212341.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# any DataFrame with a "text" column can serve as input
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_jamnik99_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// any DataFrame with a "text" column can serve as input
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_jamnik99_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
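+
+This page does not list the columns the pretrained pipeline adds to the input DataFrame; continuing from the Python example above, a quick way to inspect them:
+
+```python
+# list the annotation columns the pipeline added to the input DataFrame
+annotations.printSchema()
+```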
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_jamnik99_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jamnik99/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_nickomania_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_nickomania_en.md new file mode 100644 index 00000000000000..1755fa81a0d520 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_nickomania_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_nickomania DistilBertForSequenceClassification from nickomania +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_nickomania +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_nickomania` is a English model originally trained by nickomania. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_nickomania_en_5.5.0_3.0_1726743424385.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_nickomania_en_5.5.0_3.0_1726743424385.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_nickomania","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_nickomania", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
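+
+The predicted label for each input row lands in the `result` field of the `class` annotation column. A minimal sketch of inspecting it, continuing from the Python example above (the label set depends on the upstream fine-tuning and is not documented on this page):
+
+```python
+# show the predicted label next to the input text
+pipelineDF.select("text", "class.result").show(truncate = False)
+```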
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_nickomania| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/nickomania/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_nickomania_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_nickomania_pipeline_en.md new file mode 100644 index 00000000000000..52185fb2310057 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_nickomania_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_nickomania_pipeline pipeline DistilBertForSequenceClassification from nickomania +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_nickomania_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_nickomania_pipeline` is a English model originally trained by nickomania. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_nickomania_pipeline_en_5.5.0_3.0_1726743443952.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_nickomania_pipeline_en_5.5.0_3.0_1726743443952.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# any DataFrame with a "text" column can serve as input
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_nickomania_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// any DataFrame with a "text" column can serve as input
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_nickomania_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
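+
+This page does not list the columns the pretrained pipeline adds to the input DataFrame; continuing from the Python example above, a quick way to inspect them:
+
+```python
+# list the annotation columns the pipeline added to the input DataFrame
+annotations.printSchema()
+```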
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_nickomania_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/nickomania/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_rjmrajababu_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_rjmrajababu_en.md new file mode 100644 index 00000000000000..88beae3371db45 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_rjmrajababu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_rjmrajababu DistilBertForSequenceClassification from RjmRajaBabu +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_rjmrajababu +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_rjmrajababu` is a English model originally trained by RjmRajaBabu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_rjmrajababu_en_5.5.0_3.0_1726764054518.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_rjmrajababu_en_5.5.0_3.0_1726764054518.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_rjmrajababu","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_rjmrajababu", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
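+
+The predicted label for each input row lands in the `result` field of the `class` annotation column. A minimal sketch of inspecting it, continuing from the Python example above (the label set depends on the upstream fine-tuning and is not documented on this page):
+
+```python
+# show the predicted label next to the input text
+pipelineDF.select("text", "class.result").show(truncate = False)
+```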
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_rjmrajababu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/RjmRajaBabu/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_rjmrajababu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_rjmrajababu_pipeline_en.md new file mode 100644 index 00000000000000..b6e6ae5651e391 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_rjmrajababu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_rjmrajababu_pipeline pipeline DistilBertForSequenceClassification from RjmRajaBabu +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_rjmrajababu_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_rjmrajababu_pipeline` is a English model originally trained by RjmRajaBabu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_rjmrajababu_pipeline_en_5.5.0_3.0_1726764067358.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_rjmrajababu_pipeline_en_5.5.0_3.0_1726764067358.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# any DataFrame with a "text" column can serve as input
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_rjmrajababu_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// any DataFrame with a "text" column can serve as input
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_rjmrajababu_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
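+
+This page does not list the columns the pretrained pipeline adds to the input DataFrame; continuing from the Python example above, a quick way to inspect them:
+
+```python
+# list the annotation columns the pipeline added to the input DataFrame
+annotations.printSchema()
+```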
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_rjmrajababu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/RjmRajaBabu/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_zhian66_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_zhian66_en.md new file mode 100644 index 00000000000000..d289aad1302210 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_zhian66_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_zhian66 DistilBertForSequenceClassification from Zhian66 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_zhian66 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_zhian66` is a English model originally trained by Zhian66. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_zhian66_en_5.5.0_3.0_1726719185627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_zhian66_en_5.5.0_3.0_1726719185627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_zhian66","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_zhian66", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
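+
+The predicted label for each input row lands in the `result` field of the `class` annotation column. A minimal sketch of inspecting it, continuing from the Python example above (the label set depends on the upstream fine-tuning and is not documented on this page):
+
+```python
+# show the predicted label next to the input text
+pipelineDF.select("text", "class.result").show(truncate = False)
+```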
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_zhian66| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Zhian66/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_zhian66_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_zhian66_pipeline_en.md new file mode 100644 index 00000000000000..07fe5797a128fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_zhian66_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_zhian66_pipeline pipeline DistilBertForSequenceClassification from Zhian66 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_zhian66_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_zhian66_pipeline` is a English model originally trained by Zhian66. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_zhian66_pipeline_en_5.5.0_3.0_1726719198846.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_zhian66_pipeline_en_5.5.0_3.0_1726719198846.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# any DataFrame with a "text" column can serve as input
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_zhian66_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// any DataFrame with a "text" column can serve as input
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_zhian66_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
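+
+This page does not list the columns the pretrained pipeline adds to the input DataFrame; continuing from the Python example above, a quick way to inspect them:
+
+```python
+# list the annotation columns the pipeline added to the input DataFrame
+annotations.printSchema()
+```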
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_zhian66_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Zhian66/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_5000_amazon_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_5000_amazon_en.md new file mode 100644 index 00000000000000..a7995d9b4c529a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_5000_amazon_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_5000_amazon DistilBertForSequenceClassification from abyesses +author: John Snow Labs +name: finetuning_sentiment_model_5000_amazon +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_5000_amazon` is a English model originally trained by abyesses. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazon_en_5.5.0_3.0_1726741473681.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazon_en_5.5.0_3.0_1726741473681.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_5000_amazon","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_5000_amazon", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
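+
+The predicted label for each input row lands in the `result` field of the `class` annotation column. A minimal sketch of inspecting it, continuing from the Python example above (the label set depends on the upstream fine-tuning and is not documented on this page):
+
+```python
+# show the predicted label next to the input text
+pipelineDF.select("text", "class.result").show(truncate = False)
+```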
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_5000_amazon| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abyesses/finetuning-sentiment-model-5000-amazon \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_5000_amazon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_5000_amazon_pipeline_en.md new file mode 100644 index 00000000000000..304ed042ebdb15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_5000_amazon_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_5000_amazon_pipeline pipeline DistilBertForSequenceClassification from abyesses +author: John Snow Labs +name: finetuning_sentiment_model_5000_amazon_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_5000_amazon_pipeline` is a English model originally trained by abyesses. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazon_pipeline_en_5.5.0_3.0_1726741485848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazon_pipeline_en_5.5.0_3.0_1726741485848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# any DataFrame with a "text" column can serve as input
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("finetuning_sentiment_model_5000_amazon_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// any DataFrame with a "text" column can serve as input
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_5000_amazon_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
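+
+This page does not list the columns the pretrained pipeline adds to the input DataFrame; continuing from the Python example above, a quick way to inspect them:
+
+```python
+# list the annotation columns the pipeline added to the input DataFrame
+annotations.printSchema()
+```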
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_5000_amazon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abyesses/finetuning-sentiment-model-5000-amazon + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_6000_samples_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_6000_samples_pipeline_en.md new file mode 100644 index 00000000000000..3c44982542eec7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_6000_samples_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_6000_samples_pipeline pipeline DistilBertForSequenceClassification from WHL2001 +author: John Snow Labs +name: finetuning_sentiment_model_6000_samples_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_6000_samples_pipeline` is a English model originally trained by WHL2001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_6000_samples_pipeline_en_5.5.0_3.0_1726763859965.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_6000_samples_pipeline_en_5.5.0_3.0_1726763859965.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# any DataFrame with a "text" column can serve as input
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("finetuning_sentiment_model_6000_samples_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// any DataFrame with a "text" column can serve as input
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_6000_samples_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
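+
+This page does not list the columns the pretrained pipeline adds to the input DataFrame; continuing from the Python example above, a quick way to inspect them:
+
+```python
+# list the annotation columns the pipeline added to the input DataFrame
+annotations.printSchema()
+```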
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_6000_samples_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/WHL2001/finetuning-sentiment-model-6000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_dscoder25_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_dscoder25_en.md new file mode 100644 index 00000000000000..3ec216dfae21f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_dscoder25_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_dscoder25 DistilBertForSequenceClassification from dscoder25 +author: John Snow Labs +name: finetuning_sentiment_model_dscoder25 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_dscoder25` is a English model originally trained by dscoder25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_dscoder25_en_5.5.0_3.0_1726742410446.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_dscoder25_en_5.5.0_3.0_1726742410446.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_dscoder25","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_dscoder25", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
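+
+The predicted label for each input row lands in the `result` field of the `class` annotation column. A minimal sketch of inspecting it, continuing from the Python example above (the label set depends on the upstream fine-tuning and is not documented on this page):
+
+```python
+# show the predicted label next to the input text
+pipelineDF.select("text", "class.result").show(truncate = False)
+```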
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_dscoder25| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dscoder25/finetuning-sentiment-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_dscoder25_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_dscoder25_pipeline_en.md new file mode 100644 index 00000000000000..b34fee6c9c2cb2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_dscoder25_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_dscoder25_pipeline pipeline DistilBertForSequenceClassification from dscoder25 +author: John Snow Labs +name: finetuning_sentiment_model_dscoder25_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_dscoder25_pipeline` is a English model originally trained by dscoder25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_dscoder25_pipeline_en_5.5.0_3.0_1726742426963.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_dscoder25_pipeline_en_5.5.0_3.0_1726742426963.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("finetuning_sentiment_model_dscoder25_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_dscoder25_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
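
For ad-hoc testing you can also feed a single string to the pretrained pipeline instead of a DataFrame. A minimal sketch, assuming the `pipeline` object above and that the output key mirrors the classifier's `class` column:

```python
# Annotate one string directly; returns a dict keyed by the pipeline's output columns
result = pipeline.annotate("I love spark-nlp")
print(result["class"])
```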
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_dscoder25_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dscoder25/finetuning-sentiment-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_imdb_3000_samples_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_imdb_3000_samples_pipeline_en.md new file mode 100644 index 00000000000000..1edbbe32200bd0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_imdb_3000_samples_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_imdb_3000_samples_pipeline pipeline DistilBertForSequenceClassification from akshataupadhye +author: John Snow Labs +name: finetuning_sentiment_model_imdb_3000_samples_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_imdb_3000_samples_pipeline` is a English model originally trained by akshataupadhye. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_imdb_3000_samples_pipeline_en_5.5.0_3.0_1726743440603.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_imdb_3000_samples_pipeline_en_5.5.0_3.0_1726743440603.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("finetuning_sentiment_model_imdb_3000_samples_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_imdb_3000_samples_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_imdb_3000_samples_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/akshataupadhye/finetuning-sentiment-model-imdb-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_lr_2e_05_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_lr_2e_05_en.md new file mode 100644 index 00000000000000..d92e5f782599f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_lr_2e_05_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_lr_2e_05 DistilBertForSequenceClassification from ash-akjp-ga +author: John Snow Labs +name: finetuning_sentiment_model_lr_2e_05 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_lr_2e_05` is a English model originally trained by ash-akjp-ga. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_lr_2e_05_en_5.5.0_3.0_1726743735188.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_lr_2e_05_en_5.5.0_3.0_1726743735188.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_lr_2e_05","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_lr_2e_05", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_lr_2e_05| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ash-akjp-ga/finetuning-sentiment-model_lr_2e-05 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_mnamon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_mnamon_pipeline_en.md new file mode 100644 index 00000000000000..7f8e28cb33a488 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_mnamon_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_mnamon_pipeline pipeline DistilBertForSequenceClassification from mnamon +author: John Snow Labs +name: finetuning_sentiment_model_mnamon_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_mnamon_pipeline` is a English model originally trained by mnamon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_mnamon_pipeline_en_5.5.0_3.0_1726704514397.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_mnamon_pipeline_en_5.5.0_3.0_1726704514397.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("finetuning_sentiment_model_mnamon_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_mnamon_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_mnamon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mnamon/finetuning-sentiment-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_reddit_3000_samples_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_reddit_3000_samples_en.md new file mode 100644 index 00000000000000..c0c474094dd405 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_reddit_3000_samples_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_reddit_3000_samples DistilBertForSequenceClassification from akshataupadhye +author: John Snow Labs +name: finetuning_sentiment_model_reddit_3000_samples +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_reddit_3000_samples` is a English model originally trained by akshataupadhye. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_reddit_3000_samples_en_5.5.0_3.0_1726743640586.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_reddit_3000_samples_en_5.5.0_3.0_1726743640586.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_reddit_3000_samples","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_reddit_3000_samples", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_reddit_3000_samples| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/akshataupadhye/finetuning-sentiment-model-reddit-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-flame_italian_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-19-flame_italian_pipeline_it.md new file mode 100644 index 00000000000000..3942d3d0372e64 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-flame_italian_pipeline_it.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Italian flame_italian_pipeline pipeline BertForSequenceClassification from aequa-tech +author: John Snow Labs +name: flame_italian_pipeline +date: 2024-09-19 +tags: [it, open_source, pipeline, onnx] +task: Text Classification +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`flame_italian_pipeline` is a Italian model originally trained by aequa-tech. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/flame_italian_pipeline_it_5.5.0_3.0_1726770876132.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/flame_italian_pipeline_it_5.5.0_3.0_1726770876132.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("flame_italian_pipeline", lang = "it")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("flame_italian_pipeline", lang = "it")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|flame_italian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|691.9 MB| + +## References + +https://huggingface.co/aequa-tech/flame-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-formalrobertalincoln_en.md b/docs/_posts/ahmedlone127/2024-09-19-formalrobertalincoln_en.md new file mode 100644 index 00000000000000..edaf9b712c68c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-formalrobertalincoln_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English formalrobertalincoln RoBertaEmbeddings from BigSalmon +author: John Snow Labs +name: formalrobertalincoln +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`formalrobertalincoln` is a English model originally trained by BigSalmon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/formalrobertalincoln_en_5.5.0_3.0_1726778014154.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/formalrobertalincoln_en_5.5.0_3.0_1726778014154.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("formalrobertalincoln","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("formalrobertalincoln","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
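
Each token in the input receives one embedding vector, stored in the `embeddings` annotation column. A minimal way to flatten them for downstream use, assuming the `pipelineDF` variable from the snippet above:

```python
# One row per token embedding; each value is an array of floats
pipelineDF.selectExpr("explode(embeddings.embeddings) as embedding").show(1, truncate=80)
```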
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|formalrobertalincoln| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/BigSalmon/FormalRobertaLincoln \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-ft_10m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-ft_10m_pipeline_en.md new file mode 100644 index 00000000000000..ab864759de1378 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-ft_10m_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English ft_10m_pipeline pipeline WhisperForCTC from xuliu15 +author: John Snow Labs +name: ft_10m_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ft_10m_pipeline` is a English model originally trained by xuliu15. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ft_10m_pipeline_en_5.5.0_3.0_1726716384619.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ft_10m_pipeline_en_5.5.0_3.0_1726716384619.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("ft_10m_pipeline", lang = "en")
# df should hold raw audio samples (floats) in the pipeline's expected input column
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("ft_10m_pipeline", lang = "en")
// df should hold raw audio samples (floats) in the pipeline's expected input column
val annotations = pipeline.transform(df)
```
</div>
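
Since this is a speech-recognition pipeline, the input DataFrame must contain raw audio rather than text. A hypothetical way to prepare it from a local WAV file, assuming `librosa` is available and that the pipeline reads from an `audio_content` column (adjust the file name and column name to your setup):

```python
import librosa

# Load mono audio at 16 kHz and wrap the float samples in a Spark DataFrame
audio, _ = librosa.load("sample.wav", sr=16000)
df = spark.createDataFrame([[audio.tolist()]]).toDF("audio_content")
```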
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ft_10m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/xuliu15/FT-10m + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-ft_danish_distilbert_gdp_en.md b/docs/_posts/ahmedlone127/2024-09-19-ft_danish_distilbert_gdp_en.md new file mode 100644 index 00000000000000..ae5da3bb8a2d85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-ft_danish_distilbert_gdp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ft_danish_distilbert_gdp DistilBertForSequenceClassification from gc394 +author: John Snow Labs +name: ft_danish_distilbert_gdp +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ft_danish_distilbert_gdp` is a English model originally trained by gc394. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ft_danish_distilbert_gdp_en_5.5.0_3.0_1726743645359.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ft_danish_distilbert_gdp_en_5.5.0_3.0_1726743645359.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("ft_danish_distilbert_gdp","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("ft_danish_distilbert_gdp", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ft_danish_distilbert_gdp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.4 MB| + +## References + +https://huggingface.co/gc394/ft_da_distilbert_gdp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-furina_seed42_eng_esp_hau_basic_5e_06_en.md b/docs/_posts/ahmedlone127/2024-09-19-furina_seed42_eng_esp_hau_basic_5e_06_en.md new file mode 100644 index 00000000000000..1303671fb25b21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-furina_seed42_eng_esp_hau_basic_5e_06_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English furina_seed42_eng_esp_hau_basic_5e_06 XlmRoBertaForSequenceClassification from Shijia +author: John Snow Labs +name: furina_seed42_eng_esp_hau_basic_5e_06 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`furina_seed42_eng_esp_hau_basic_5e_06` is a English model originally trained by Shijia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/furina_seed42_eng_esp_hau_basic_5e_06_en_5.5.0_3.0_1726721613696.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/furina_seed42_eng_esp_hau_basic_5e_06_en_5.5.0_3.0_1726721613696.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("furina_seed42_eng_esp_hau_basic_5e_06","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("furina_seed42_eng_esp_hau_basic_5e_06", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|furina_seed42_eng_esp_hau_basic_5e_06| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/Shijia/furina_seed42_eng_esp_hau_basic_5e-06 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-gal_sayula_popoluca_iwcg_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-gal_sayula_popoluca_iwcg_3_pipeline_en.md new file mode 100644 index 00000000000000..febde0f28afa55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-gal_sayula_popoluca_iwcg_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English gal_sayula_popoluca_iwcg_3_pipeline pipeline XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: gal_sayula_popoluca_iwcg_3_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gal_sayula_popoluca_iwcg_3_pipeline` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_sayula_popoluca_iwcg_3_pipeline_en_5.5.0_3.0_1726711414753.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_sayula_popoluca_iwcg_3_pipeline_en_5.5.0_3.0_1726711414753.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("gal_sayula_popoluca_iwcg_3_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("gal_sayula_popoluca_iwcg_3_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_sayula_popoluca_iwcg_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|424.0 MB| + +## References + +https://huggingface.co/homersimpson/gal-pos-iwcg-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-germanic_languages_qna_en.md b/docs/_posts/ahmedlone127/2024-09-19-germanic_languages_qna_en.md new file mode 100644 index 00000000000000..214540acd8035b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-germanic_languages_qna_en.md @@ -0,0 +1,92 @@ +--- +layout: model +title: English germanic_languages_qna DistilBertForQuestionAnswering from zuu +author: John Snow Labs +name: germanic_languages_qna +date: 2024-09-19 +tags: [distilbert, en, open_source, question_answering, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`germanic_languages_qna` is a English model originally trained by zuu. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/germanic_languages_qna_en_5.5.0_3.0_1726727687765.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/germanic_languages_qna_en_5.5.0_3.0_1726727687765.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

document_assembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("germanic_languages_qna","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([document_assembler, spanClassifier])

data = spark.createDataFrame([["What is my name?", "My name is Clara and I live in Berkeley."]]) \
    .toDF("question", "context")

pipelineModel = pipeline.fit(data)

pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val document_assembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering
  .pretrained("germanic_languages_qna", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(document_assembler, spanClassifier))

val data = Seq(("What is my name?", "My name is Clara and I live in Berkeley."))
  .toDF("question", "context")

val pipelineModel = pipeline.fit(data)

val pipelineDF = pipelineModel.transform(data)
```
</div>
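
The extracted answer span is written to the `answer` column. To display it, assuming the `pipelineDF` variable from the snippet above:

```python
# Print the answer text predicted for each question/context pair
pipelineDF.select("answer.result").show(truncate=False)
```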
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|germanic_languages_qna| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +References + +https://huggingface.co/zuu/gem-qna \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-hate_detect_distilbert_en.md b/docs/_posts/ahmedlone127/2024-09-19-hate_detect_distilbert_en.md new file mode 100644 index 00000000000000..7d4efdaaa2dffc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-hate_detect_distilbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_detect_distilbert DistilBertForSequenceClassification from vipulkumar49 +author: John Snow Labs +name: hate_detect_distilbert +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_detect_distilbert` is a English model originally trained by vipulkumar49. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_detect_distilbert_en_5.5.0_3.0_1726743852217.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_detect_distilbert_en_5.5.0_3.0_1726743852217.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("hate_detect_distilbert","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("hate_detect_distilbert", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_detect_distilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/vipulkumar49/hate_detect_distilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-hate_hate_random0_seed2_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-19-hate_hate_random0_seed2_roberta_base_en.md new file mode 100644 index 00000000000000..8f43d972e52070 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-hate_hate_random0_seed2_roberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_random0_seed2_roberta_base RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_random0_seed2_roberta_base +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_random0_seed2_roberta_base` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_random0_seed2_roberta_base_en_5.5.0_3.0_1726732500404.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_random0_seed2_roberta_base_en_5.5.0_3.0_1726732500404.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_random0_seed2_roberta_base","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_random0_seed2_roberta_base", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_random0_seed2_roberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|427.7 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_random0_seed2-roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-hate_hate_random2_seed1_twitter_roberta_base_2019_90m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-hate_hate_random2_seed1_twitter_roberta_base_2019_90m_pipeline_en.md new file mode 100644 index 00000000000000..24f22aad7a886e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-hate_hate_random2_seed1_twitter_roberta_base_2019_90m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_random2_seed1_twitter_roberta_base_2019_90m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_random2_seed1_twitter_roberta_base_2019_90m_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_random2_seed1_twitter_roberta_base_2019_90m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_random2_seed1_twitter_roberta_base_2019_90m_pipeline_en_5.5.0_3.0_1726751330761.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_random2_seed1_twitter_roberta_base_2019_90m_pipeline_en_5.5.0_3.0_1726751330761.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("hate_hate_random2_seed1_twitter_roberta_base_2019_90m_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("hate_hate_random2_seed1_twitter_roberta_base_2019_90m_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_random2_seed1_twitter_roberta_base_2019_90m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_random2_seed1-twitter-roberta-base-2019-90m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-hausa_sentiment_analysis_ha.md b/docs/_posts/ahmedlone127/2024-09-19-hausa_sentiment_analysis_ha.md new file mode 100644 index 00000000000000..97f19e5726b76f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-hausa_sentiment_analysis_ha.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Hausa hausa_sentiment_analysis BertForSequenceClassification from Kumshe +author: John Snow Labs +name: hausa_sentiment_analysis +date: 2024-09-19 +tags: [ha, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: ha +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hausa_sentiment_analysis` is a Hausa model originally trained by Kumshe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hausa_sentiment_analysis_ha_5.5.0_3.0_1726736488226.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hausa_sentiment_analysis_ha_5.5.0_3.0_1726736488226.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("hausa_sentiment_analysis","ha") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("hausa_sentiment_analysis", "ha")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hausa_sentiment_analysis| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|ha| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Kumshe/Hausa-sentiment-analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-hausa_sentiment_analysis_pipeline_ha.md b/docs/_posts/ahmedlone127/2024-09-19-hausa_sentiment_analysis_pipeline_ha.md new file mode 100644 index 00000000000000..ada00bc7e8c956 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-hausa_sentiment_analysis_pipeline_ha.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Hausa hausa_sentiment_analysis_pipeline pipeline BertForSequenceClassification from Kumshe +author: John Snow Labs +name: hausa_sentiment_analysis_pipeline +date: 2024-09-19 +tags: [ha, open_source, pipeline, onnx] +task: Text Classification +language: ha +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hausa_sentiment_analysis_pipeline` is a Hausa model originally trained by Kumshe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hausa_sentiment_analysis_pipeline_ha_5.5.0_3.0_1726736507386.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hausa_sentiment_analysis_pipeline_ha_5.5.0_3.0_1726736507386.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("hausa_sentiment_analysis_pipeline", lang = "ha")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("hausa_sentiment_analysis_pipeline", lang = "ha")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hausa_sentiment_analysis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ha| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Kumshe/Hausa-sentiment-analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-hupd_distilbert_2023_02_16_13_20_en.md b/docs/_posts/ahmedlone127/2024-09-19-hupd_distilbert_2023_02_16_13_20_en.md new file mode 100644 index 00000000000000..fa34d1d5fa84a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-hupd_distilbert_2023_02_16_13_20_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hupd_distilbert_2023_02_16_13_20 RoBertaForSequenceClassification from leeju +author: John Snow Labs +name: hupd_distilbert_2023_02_16_13_20 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hupd_distilbert_2023_02_16_13_20` is a English model originally trained by leeju. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hupd_distilbert_2023_02_16_13_20_en_5.5.0_3.0_1726751046189.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hupd_distilbert_2023_02_16_13_20_en_5.5.0_3.0_1726751046189.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("hupd_distilbert_2023_02_16_13_20","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("hupd_distilbert_2023_02_16_13_20", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hupd_distilbert_2023_02_16_13_20| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|312.3 MB| + +## References + +https://huggingface.co/leeju/HUPD_distilbert_2023-02-16_13-20 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-hw001_lastsmile_en.md b/docs/_posts/ahmedlone127/2024-09-19-hw001_lastsmile_en.md new file mode 100644 index 00000000000000..fab12bad31020d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-hw001_lastsmile_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hw001_lastsmile DistilBertForSequenceClassification from LastSmile +author: John Snow Labs +name: hw001_lastsmile +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hw001_lastsmile` is a English model originally trained by LastSmile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hw001_lastsmile_en_5.5.0_3.0_1726719308297.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hw001_lastsmile_en_5.5.0_3.0_1726719308297.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("hw001_lastsmile","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("hw001_lastsmile", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hw001_lastsmile| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LastSmile/HW001 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-hw01_albertttt_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-hw01_albertttt_pipeline_en.md new file mode 100644 index 00000000000000..0beb5226cb73fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-hw01_albertttt_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hw01_albertttt_pipeline pipeline DistilBertForSequenceClassification from albertttt +author: John Snow Labs +name: hw01_albertttt_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hw01_albertttt_pipeline` is a English model originally trained by albertttt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hw01_albertttt_pipeline_en_5.5.0_3.0_1726764101999.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hw01_albertttt_pipeline_en_5.5.0_3.0_1726764101999.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("hw01_albertttt_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("hw01_albertttt_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hw01_albertttt_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/albertttt/HW01 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-icf_domains_nl.md b/docs/_posts/ahmedlone127/2024-09-19-icf_domains_nl.md new file mode 100644 index 00000000000000..ea5919ad5b1c8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-icf_domains_nl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Dutch, Flemish icf_domains RoBertaForSequenceClassification from CLTL +author: John Snow Labs +name: icf_domains +date: 2024-09-19 +tags: [nl, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`icf_domains` is a Dutch, Flemish model originally trained by CLTL. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/icf_domains_nl_5.5.0_3.0_1726726043979.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/icf_domains_nl_5.5.0_3.0_1726726043979.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("icf_domains","nl") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("icf_domains", "nl")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|icf_domains| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|nl| +|Size:|472.0 MB| + +## References + +https://huggingface.co/CLTL/icf-domains \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-imdb_binary_classifier_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-19-imdb_binary_classifier_roberta_base_en.md new file mode 100644 index 00000000000000..1336d5199a25f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-imdb_binary_classifier_roberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English imdb_binary_classifier_roberta_base RoBertaForSequenceClassification from againeureka +author: John Snow Labs +name: imdb_binary_classifier_roberta_base +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdb_binary_classifier_roberta_base` is a English model originally trained by againeureka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdb_binary_classifier_roberta_base_en_5.5.0_3.0_1726725835175.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdb_binary_classifier_roberta_base_en_5.5.0_3.0_1726725835175.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("imdb_binary_classifier_roberta_base","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("imdb_binary_classifier_roberta_base", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdb_binary_classifier_roberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|464.2 MB| + +## References + +https://huggingface.co/againeureka/imdb_binary_classifier_roberta_base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-imdb_binary_classifier_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-imdb_binary_classifier_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..b43c4b28f0ed21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-imdb_binary_classifier_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English imdb_binary_classifier_roberta_base_pipeline pipeline RoBertaForSequenceClassification from againeureka +author: John Snow Labs +name: imdb_binary_classifier_roberta_base_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdb_binary_classifier_roberta_base_pipeline` is a English model originally trained by againeureka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdb_binary_classifier_roberta_base_pipeline_en_5.5.0_3.0_1726725858512.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdb_binary_classifier_roberta_base_pipeline_en_5.5.0_3.0_1726725858512.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("imdb_binary_classifier_roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("imdb_binary_classifier_roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdb_binary_classifier_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.2 MB| + +## References + +https://huggingface.co/againeureka/imdb_binary_classifier_roberta_base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-imdbreviews_classification_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-19-imdbreviews_classification_roberta_base_en.md new file mode 100644 index 00000000000000..f7e73990132deb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-imdbreviews_classification_roberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English imdbreviews_classification_roberta_base RoBertaForSequenceClassification from JmGarzonv +author: John Snow Labs +name: imdbreviews_classification_roberta_base +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdbreviews_classification_roberta_base` is a English model originally trained by JmGarzonv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_roberta_base_en_5.5.0_3.0_1726779864207.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_roberta_base_en_5.5.0_3.0_1726779864207.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("imdbreviews_classification_roberta_base","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("imdbreviews_classification_roberta_base", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdbreviews_classification_roberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|463.8 MB| + +## References + +https://huggingface.co/JmGarzonv/imdbreviews_classification_roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-imdbreviews_classification_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-imdbreviews_classification_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..2f8737fc6e6046 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-imdbreviews_classification_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English imdbreviews_classification_roberta_base_pipeline pipeline RoBertaForSequenceClassification from JmGarzonv +author: John Snow Labs +name: imdbreviews_classification_roberta_base_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdbreviews_classification_roberta_base_pipeline` is a English model originally trained by JmGarzonv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_roberta_base_pipeline_en_5.5.0_3.0_1726779888587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_roberta_base_pipeline_en_5.5.0_3.0_1726779888587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("imdbreviews_classification_roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("imdbreviews_classification_roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdbreviews_classification_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.9 MB| + +## References + +https://huggingface.co/JmGarzonv/imdbreviews_classification_roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-insta_sentiment_distill_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-19-insta_sentiment_distill_roberta_en.md new file mode 100644 index 00000000000000..0cbfcb6f0c8221 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-insta_sentiment_distill_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English insta_sentiment_distill_roberta RoBertaForSequenceClassification from davin45 +author: John Snow Labs +name: insta_sentiment_distill_roberta +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`insta_sentiment_distill_roberta` is a English model originally trained by davin45. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/insta_sentiment_distill_roberta_en_5.5.0_3.0_1726732696995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/insta_sentiment_distill_roberta_en_5.5.0_3.0_1726732696995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("insta_sentiment_distill_roberta","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("insta_sentiment_distill_roberta", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|insta_sentiment_distill_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/davin45/insta-sentiment-distill-roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-insta_sentiment_distill_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-insta_sentiment_distill_roberta_pipeline_en.md new file mode 100644 index 00000000000000..3bc30711317ba4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-insta_sentiment_distill_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English insta_sentiment_distill_roberta_pipeline pipeline RoBertaForSequenceClassification from davin45 +author: John Snow Labs +name: insta_sentiment_distill_roberta_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`insta_sentiment_distill_roberta_pipeline` is a English model originally trained by davin45. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/insta_sentiment_distill_roberta_pipeline_en_5.5.0_3.0_1726732711769.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/insta_sentiment_distill_roberta_pipeline_en_5.5.0_3.0_1726732711769.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("insta_sentiment_distill_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("insta_sentiment_distill_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|insta_sentiment_distill_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.1 MB| + +## References + +https://huggingface.co/davin45/insta-sentiment-distill-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-intercalado_id23_en.md b/docs/_posts/ahmedlone127/2024-09-19-intercalado_id23_en.md new file mode 100644 index 00000000000000..93f564fe1a91aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-intercalado_id23_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English intercalado_id23 DistilBertForSequenceClassification from manarea +author: John Snow Labs +name: intercalado_id23 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`intercalado_id23` is a English model originally trained by manarea. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/intercalado_id23_en_5.5.0_3.0_1726740809802.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/intercalado_id23_en_5.5.0_3.0_1726740809802.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("intercalado_id23","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("intercalado_id23", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|intercalado_id23| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|223.0 MB| + +## References + +https://huggingface.co/manarea/Intercalado-ID23 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-jerteh355sentneg2_en.md b/docs/_posts/ahmedlone127/2024-09-19-jerteh355sentneg2_en.md new file mode 100644 index 00000000000000..4f08727622be91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-jerteh355sentneg2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English jerteh355sentneg2 RoBertaForSequenceClassification from Tanor +author: John Snow Labs +name: jerteh355sentneg2 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jerteh355sentneg2` is a English model originally trained by Tanor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jerteh355sentneg2_en_5.5.0_3.0_1726732599344.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jerteh355sentneg2_en_5.5.0_3.0_1726732599344.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("jerteh355sentneg2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("jerteh355sentneg2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jerteh355sentneg2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Tanor/Jerteh355SENTNEG2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-jerteh355sentneg2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-jerteh355sentneg2_pipeline_en.md new file mode 100644 index 00000000000000..cbc67ccd806f28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-jerteh355sentneg2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English jerteh355sentneg2_pipeline pipeline RoBertaForSequenceClassification from Tanor +author: John Snow Labs +name: jerteh355sentneg2_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jerteh355sentneg2_pipeline` is a English model originally trained by Tanor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jerteh355sentneg2_pipeline_en_5.5.0_3.0_1726732670152.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jerteh355sentneg2_pipeline_en_5.5.0_3.0_1726732670152.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("jerteh355sentneg2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("jerteh355sentneg2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jerteh355sentneg2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Tanor/Jerteh355SENTNEG2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-just_another_emotion_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-19-just_another_emotion_classifier_en.md new file mode 100644 index 00000000000000..48d111964134d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-just_another_emotion_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English just_another_emotion_classifier BertForSequenceClassification from bdotloh +author: John Snow Labs +name: just_another_emotion_classifier +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`just_another_emotion_classifier` is a English model originally trained by bdotloh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/just_another_emotion_classifier_en_5.5.0_3.0_1726707131531.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/just_another_emotion_classifier_en_5.5.0_3.0_1726707131531.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("just_another_emotion_classifier","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("just_another_emotion_classifier", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|just_another_emotion_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|410.1 MB| + +## References + +https://huggingface.co/bdotloh/just-another-emotion-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-khmer_sentence_segmentation_en.md b/docs/_posts/ahmedlone127/2024-09-19-khmer_sentence_segmentation_en.md new file mode 100644 index 00000000000000..a5db4aa56d9159 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-khmer_sentence_segmentation_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English khmer_sentence_segmentation XlmRoBertaForTokenClassification from seanghay +author: John Snow Labs +name: khmer_sentence_segmentation +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`khmer_sentence_segmentation` is a English model originally trained by seanghay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/khmer_sentence_segmentation_en_5.5.0_3.0_1726737888102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/khmer_sentence_segmentation_en_5.5.0_3.0_1726737888102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("khmer_sentence_segmentation","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("khmer_sentence_segmentation", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|khmer_sentence_segmentation| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|838.6 MB| + +## References + +https://huggingface.co/seanghay/khmer-sentence-segmentation \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-khmer_sentence_segmentation_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-khmer_sentence_segmentation_pipeline_en.md new file mode 100644 index 00000000000000..246f6e6565453d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-khmer_sentence_segmentation_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English khmer_sentence_segmentation_pipeline pipeline XlmRoBertaForTokenClassification from seanghay +author: John Snow Labs +name: khmer_sentence_segmentation_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`khmer_sentence_segmentation_pipeline` is a English model originally trained by seanghay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/khmer_sentence_segmentation_pipeline_en_5.5.0_3.0_1726737956234.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/khmer_sentence_segmentation_pipeline_en_5.5.0_3.0_1726737956234.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("khmer_sentence_segmentation_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("khmer_sentence_segmentation_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|khmer_sentence_segmentation_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|838.6 MB| + +## References + +https://huggingface.co/seanghay/khmer-sentence-segmentation + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-lang_transcribe_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-19-lang_transcribe_pipeline_hi.md new file mode 100644 index 00000000000000..1c11305f82f072 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-lang_transcribe_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi lang_transcribe_pipeline pipeline WhisperForCTC from bimamuhammad +author: John Snow Labs +name: lang_transcribe_pipeline +date: 2024-09-19 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lang_transcribe_pipeline` is a Hindi model originally trained by bimamuhammad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lang_transcribe_pipeline_hi_5.5.0_3.0_1726715371821.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lang_transcribe_pipeline_hi_5.5.0_3.0_1726715371821.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lang_transcribe_pipeline", lang = "hi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lang_transcribe_pipeline", lang = "hi") +val annotations = pipeline.transform(df) + +``` +
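
Unlike the text pipelines, this ASR pipeline expects a column of raw audio samples rather than a `text` column. A minimal sketch, assuming `audio_floats` is a Python list of float samples from a 16 kHz mono recording (loaded with any audio library) and that the transcription lands in a `text` output column:

```python
from sparknlp.pretrained import PretrainedPipeline

# Assumed input: float samples of a 16 kHz mono recording (one second of silence here, purely illustrative)
audio_floats = [0.0] * 16000
df = spark.createDataFrame([[audio_floats]]).toDF("audio_content")

pipeline = PretrainedPipeline("lang_transcribe_pipeline", lang="hi")
result = pipeline.transform(df)

# "text" is assumed to be the transcription column produced by the Whisper stage
result.select("text.result").show(truncate=False)
```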
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lang_transcribe_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/bimamuhammad/lang_transcribe + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-legal_base_v1_5__checkpoint_last_en.md b/docs/_posts/ahmedlone127/2024-09-19-legal_base_v1_5__checkpoint_last_en.md new file mode 100644 index 00000000000000..2dd4e65688ce51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-legal_base_v1_5__checkpoint_last_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English legal_base_v1_5__checkpoint_last RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: legal_base_v1_5__checkpoint_last +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_base_v1_5__checkpoint_last` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_base_v1_5__checkpoint_last_en_5.5.0_3.0_1726747763871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_base_v1_5__checkpoint_last_en_5.5.0_3.0_1726747763871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("legal_base_v1_5__checkpoint_last","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("legal_base_v1_5__checkpoint_last","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
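
To get the actual vectors out of the annotation column, a short sketch reusing `pipelineDF` from the snippet above (the `embeddings` column name matches `setOutputCol` there):

```python
from pyspark.sql import functions as F

# Each token annotation stores its vector in the `embeddings` field of the annotation struct
pipelineDF.select(
    F.explode("embeddings.embeddings").alias("token_embedding")
).show(1, truncate=80)
```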
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_base_v1_5__checkpoint_last| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|296.3 MB| + +## References + +https://huggingface.co/eduagarcia-temp/legal_base_v1_5__checkpoint_last \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-legal_base_v1_5__checkpoint_last_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-legal_base_v1_5__checkpoint_last_pipeline_en.md new file mode 100644 index 00000000000000..9dfbf279b59061 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-legal_base_v1_5__checkpoint_last_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English legal_base_v1_5__checkpoint_last_pipeline pipeline RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: legal_base_v1_5__checkpoint_last_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_base_v1_5__checkpoint_last_pipeline` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_base_v1_5__checkpoint_last_pipeline_en_5.5.0_3.0_1726747852710.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_base_v1_5__checkpoint_last_pipeline_en_5.5.0_3.0_1726747852710.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("legal_base_v1_5__checkpoint_last_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("legal_base_v1_5__checkpoint_last_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_base_v1_5__checkpoint_last_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|296.3 MB| + +## References + +https://huggingface.co/eduagarcia-temp/legal_base_v1_5__checkpoint_last + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-legal_longformer_base_8192_spanish_en.md b/docs/_posts/ahmedlone127/2024-09-19-legal_longformer_base_8192_spanish_en.md new file mode 100644 index 00000000000000..85132d8fb3ed96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-legal_longformer_base_8192_spanish_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English legal_longformer_base_8192_spanish RoBertaEmbeddings from clibrain +author: John Snow Labs +name: legal_longformer_base_8192_spanish +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_longformer_base_8192_spanish` is a English model originally trained by clibrain. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_longformer_base_8192_spanish_en_5.5.0_3.0_1726778162217.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_longformer_base_8192_spanish_en_5.5.0_3.0_1726778162217.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("legal_longformer_base_8192_spanish","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("legal_longformer_base_8192_spanish","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_longformer_base_8192_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|490.6 MB| + +## References + +https://huggingface.co/clibrain/legal-longformer-base-8192-spanish \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-maghriberta_en.md b/docs/_posts/ahmedlone127/2024-09-19-maghriberta_en.md new file mode 100644 index 00000000000000..365d2201824105 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-maghriberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English maghriberta RoBertaEmbeddings from nboudad +author: John Snow Labs +name: maghriberta +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`maghriberta` is a English model originally trained by nboudad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/maghriberta_en_5.5.0_3.0_1726747055219.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/maghriberta_en_5.5.0_3.0_1726747055219.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("maghriberta","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("maghriberta","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|maghriberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|346.5 MB| + +## References + +https://huggingface.co/nboudad/Maghriberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-maghriberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-maghriberta_pipeline_en.md new file mode 100644 index 00000000000000..807b9d87bc7471 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-maghriberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English maghriberta_pipeline pipeline RoBertaEmbeddings from nboudad +author: John Snow Labs +name: maghriberta_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`maghriberta_pipeline` is a English model originally trained by nboudad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/maghriberta_pipeline_en_5.5.0_3.0_1726747072905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/maghriberta_pipeline_en_5.5.0_3.0_1726747072905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("maghriberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("maghriberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|maghriberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|346.6 MB| + +## References + +https://huggingface.co/nboudad/Maghriberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-magpie_portuguese_xlm_en.md b/docs/_posts/ahmedlone127/2024-09-19-magpie_portuguese_xlm_en.md new file mode 100644 index 00000000000000..1376ea1bc8d364 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-magpie_portuguese_xlm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English magpie_portuguese_xlm XlmRoBertaForSequenceClassification from mediabiasgroup +author: John Snow Labs +name: magpie_portuguese_xlm +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`magpie_portuguese_xlm` is a English model originally trained by mediabiasgroup. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/magpie_portuguese_xlm_en_5.5.0_3.0_1726720712337.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/magpie_portuguese_xlm_en_5.5.0_3.0_1726720712337.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("magpie_portuguese_xlm","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("magpie_portuguese_xlm", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|magpie_portuguese_xlm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|878.8 MB| + +## References + +https://huggingface.co/mediabiasgroup/magpie-pt-xlm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-magpie_portuguese_xlm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-magpie_portuguese_xlm_pipeline_en.md new file mode 100644 index 00000000000000..f7c5f0075fb3b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-magpie_portuguese_xlm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English magpie_portuguese_xlm_pipeline pipeline XlmRoBertaForSequenceClassification from mediabiasgroup +author: John Snow Labs +name: magpie_portuguese_xlm_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`magpie_portuguese_xlm_pipeline` is a English model originally trained by mediabiasgroup. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/magpie_portuguese_xlm_pipeline_en_5.5.0_3.0_1726720798472.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/magpie_portuguese_xlm_pipeline_en_5.5.0_3.0_1726720798472.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("magpie_portuguese_xlm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("magpie_portuguese_xlm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|magpie_portuguese_xlm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|878.8 MB| + +## References + +https://huggingface.co/mediabiasgroup/magpie-pt-xlm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-marathi_marh_small_whisper_oi_pipeline_mr.md b/docs/_posts/ahmedlone127/2024-09-19-marathi_marh_small_whisper_oi_pipeline_mr.md new file mode 100644 index 00000000000000..fd64448a01968e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-marathi_marh_small_whisper_oi_pipeline_mr.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Marathi marathi_marh_small_whisper_oi_pipeline pipeline WhisperForCTC from simran14 +author: John Snow Labs +name: marathi_marh_small_whisper_oi_pipeline +date: 2024-09-19 +tags: [mr, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: mr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marathi_marh_small_whisper_oi_pipeline` is a Marathi model originally trained by simran14. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marathi_marh_small_whisper_oi_pipeline_mr_5.5.0_3.0_1726713128040.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marathi_marh_small_whisper_oi_pipeline_mr_5.5.0_3.0_1726713128040.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an existing Spark DataFrame holding the input audio (typically 16 kHz float samples in an "audio_content" column)
pipeline = PretrainedPipeline("marathi_marh_small_whisper_oi_pipeline", lang = "mr")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an existing Spark DataFrame holding the input audio (typically 16 kHz float samples in an "audio_content" column)
val pipeline = new PretrainedPipeline("marathi_marh_small_whisper_oi_pipeline", lang = "mr")
val annotations = pipeline.transform(df)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marathi_marh_small_whisper_oi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mr| +|Size:|1.7 GB| + +## References + +https://huggingface.co/simran14/mr-small-whisper-oi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-marathi_marh_val_dn_mr.md b/docs/_posts/ahmedlone127/2024-09-19-marathi_marh_val_dn_mr.md new file mode 100644 index 00000000000000..58e5a0bacffa3d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-marathi_marh_val_dn_mr.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Marathi marathi_marh_val_dn WhisperForCTC from simran14 +author: John Snow Labs +name: marathi_marh_val_dn +date: 2024-09-19 +tags: [mr, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: mr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marathi_marh_val_dn` is a Marathi model originally trained by simran14. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marathi_marh_val_dn_mr_5.5.0_3.0_1726714260416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marathi_marh_val_dn_mr_5.5.0_3.0_1726714260416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# data is a Spark DataFrame with 16 kHz float audio samples in an "audio_content" column
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("marathi_marh_val_dn","mr") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// data is a Spark DataFrame with 16 kHz float audio samples in an "audio_content" column
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("marathi_marh_val_dn", "mr")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div><div class="h3-box" markdown="1">
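
A sketch of how the `data` DataFrame used above could be prepared, assuming the `librosa` library is available for decoding audio and that `example.wav` is a placeholder path; Whisper checkpoints expect 16 kHz mono float samples. After running the pipeline, the transcription can be read from `pipelineDF.select("text.result")`.

```python
import librosa

# Decode and resample the recording to 16 kHz mono float samples.
waveform, _ = librosa.load("example.wav", sr=16000)

# One row per recording; the column name must match AudioAssembler's input column.
data = spark.createDataFrame([[waveform.tolist()]]).toDF("audio_content")
```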
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marathi_marh_val_dn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|mr| +|Size:|1.7 GB| + +## References + +https://huggingface.co/simran14/mr-val-dn \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-mdt_ie_ner_baseline_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-mdt_ie_ner_baseline_pipeline_en.md new file mode 100644 index 00000000000000..0bafcc9ace8530 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-mdt_ie_ner_baseline_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mdt_ie_ner_baseline_pipeline pipeline XlmRoBertaForTokenClassification from OSainz +author: John Snow Labs +name: mdt_ie_ner_baseline_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mdt_ie_ner_baseline_pipeline` is a English model originally trained by OSainz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mdt_ie_ner_baseline_pipeline_en_5.5.0_3.0_1726754764234.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mdt_ie_ner_baseline_pipeline_en_5.5.0_3.0_1726754764234.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an existing Spark DataFrame with the input text in a column named "text"
pipeline = PretrainedPipeline("mdt_ie_ner_baseline_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an existing Spark DataFrame with the input text in a column named "text"
val pipeline = new PretrainedPipeline("mdt_ie_ner_baseline_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mdt_ie_ner_baseline_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|789.5 MB| + +## References + +https://huggingface.co/OSainz/mdt-ie-ner-baseline + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-mental_health_model_ilabutk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-mental_health_model_ilabutk_pipeline_en.md new file mode 100644 index 00000000000000..9c8ac969b957fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-mental_health_model_ilabutk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mental_health_model_ilabutk_pipeline pipeline DistilBertForSequenceClassification from iLabUtk +author: John Snow Labs +name: mental_health_model_ilabutk_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mental_health_model_ilabutk_pipeline` is a English model originally trained by iLabUtk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mental_health_model_ilabutk_pipeline_en_5.5.0_3.0_1726743084570.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mental_health_model_ilabutk_pipeline_en_5.5.0_3.0_1726743084570.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an existing Spark DataFrame with the input text in a column named "text"
pipeline = PretrainedPipeline("mental_health_model_ilabutk_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an existing Spark DataFrame with the input text in a column named "text"
val pipeline = new PretrainedPipeline("mental_health_model_ilabutk_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mental_health_model_ilabutk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/iLabUtk/mental_health_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-mlm_finetunedmodel_test_en.md b/docs/_posts/ahmedlone127/2024-09-19-mlm_finetunedmodel_test_en.md new file mode 100644 index 00000000000000..b63dce5a80ca56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-mlm_finetunedmodel_test_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mlm_finetunedmodel_test RoBertaEmbeddings from shradha01 +author: John Snow Labs +name: mlm_finetunedmodel_test +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mlm_finetunedmodel_test` is a English model originally trained by shradha01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mlm_finetunedmodel_test_en_5.5.0_3.0_1726746959343.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mlm_finetunedmodel_test_en_5.5.0_3.0_1726746959343.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("mlm_finetunedmodel_test","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("mlm_finetunedmodel_test","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mlm_finetunedmodel_test| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/shradha01/MLM_FinetunedModel_test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-model_name_jenniferlimpopo_en.md b/docs/_posts/ahmedlone127/2024-09-19-model_name_jenniferlimpopo_en.md new file mode 100644 index 00000000000000..4da5fe2848c03d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-model_name_jenniferlimpopo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_name_jenniferlimpopo DistilBertForSequenceClassification from Jenniferlimpopo +author: John Snow Labs +name: model_name_jenniferlimpopo +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_name_jenniferlimpopo` is a English model originally trained by Jenniferlimpopo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_name_jenniferlimpopo_en_5.5.0_3.0_1726719089884.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_name_jenniferlimpopo_en_5.5.0_3.0_1726719089884.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# document and token annotations feed the classifier; predictions land in the "class" column
sequenceClassifier = DistilBertForSequenceClassification.pretrained("model_name_jenniferlimpopo","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("model_name_jenniferlimpopo", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_name_jenniferlimpopo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Jenniferlimpopo/model_name \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-model_name_jenniferlimpopo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-model_name_jenniferlimpopo_pipeline_en.md new file mode 100644 index 00000000000000..d55aa05c63638d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-model_name_jenniferlimpopo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_name_jenniferlimpopo_pipeline pipeline DistilBertForSequenceClassification from Jenniferlimpopo +author: John Snow Labs +name: model_name_jenniferlimpopo_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_name_jenniferlimpopo_pipeline` is a English model originally trained by Jenniferlimpopo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_name_jenniferlimpopo_pipeline_en_5.5.0_3.0_1726719102927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_name_jenniferlimpopo_pipeline_en_5.5.0_3.0_1726719102927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an existing Spark DataFrame with the input text in a column named "text"
pipeline = PretrainedPipeline("model_name_jenniferlimpopo_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an existing Spark DataFrame with the input text in a column named "text"
val pipeline = new PretrainedPipeline("model_name_jenniferlimpopo_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_name_jenniferlimpopo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Jenniferlimpopo/model_name + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-movie_overview_classification_en.md b/docs/_posts/ahmedlone127/2024-09-19-movie_overview_classification_en.md new file mode 100644 index 00000000000000..06cf3d7d55a108 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-movie_overview_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English movie_overview_classification DistilBertForSequenceClassification from mocboch +author: John Snow Labs +name: movie_overview_classification +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`movie_overview_classification` is a English model originally trained by mocboch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/movie_overview_classification_en_5.5.0_3.0_1726741114905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/movie_overview_classification_en_5.5.0_3.0_1726741114905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# document and token annotations feed the classifier; predictions land in the "class" column
sequenceClassifier = DistilBertForSequenceClassification.pretrained("movie_overview_classification","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("movie_overview_classification", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|movie_overview_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mocboch/movie_overview_classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-multilingual_xlm_roberta_for_ner_ertyazilim_xx.md b/docs/_posts/ahmedlone127/2024-09-19-multilingual_xlm_roberta_for_ner_ertyazilim_xx.md new file mode 100644 index 00000000000000..07dff2af492647 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-multilingual_xlm_roberta_for_ner_ertyazilim_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual multilingual_xlm_roberta_for_ner_ertyazilim XlmRoBertaForTokenClassification from ertyazilim +author: John Snow Labs +name: multilingual_xlm_roberta_for_ner_ertyazilim +date: 2024-09-19 +tags: [xx, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multilingual_xlm_roberta_for_ner_ertyazilim` is a Multilingual model originally trained by ertyazilim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multilingual_xlm_roberta_for_ner_ertyazilim_xx_5.5.0_3.0_1726737425165.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multilingual_xlm_roberta_for_ner_ertyazilim_xx_5.5.0_3.0_1726737425165.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# document and token annotations feed the classifier; token-level tags land in the "ner" column
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("multilingual_xlm_roberta_for_ner_ertyazilim","xx") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("multilingual_xlm_roberta_for_ner_ertyazilim", "xx")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div><div class="h3-box" markdown="1">
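
Once the pipeline above has run, the token-level predictions can be inspected directly from the annotation columns; a quick sketch (column names as configured above):

```python
# Raw tokens and their predicted tags, side by side.
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```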
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multilingual_xlm_roberta_for_ner_ertyazilim| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|840.8 MB| + +## References + +https://huggingface.co/ertyazilim/multilingual-xlm-roberta-for-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-n_roberta_imdb_padding90model_en.md b/docs/_posts/ahmedlone127/2024-09-19-n_roberta_imdb_padding90model_en.md new file mode 100644 index 00000000000000..c16da52f63a146 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-n_roberta_imdb_padding90model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_roberta_imdb_padding90model RoBertaForSequenceClassification from Realgon +author: John Snow Labs +name: n_roberta_imdb_padding90model +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_roberta_imdb_padding90model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_roberta_imdb_padding90model_en_5.5.0_3.0_1726780472885.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_roberta_imdb_padding90model_en_5.5.0_3.0_1726780472885.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# document and token annotations feed the classifier; predictions land in the "class" column
sequenceClassifier = RoBertaForSequenceClassification.pretrained("n_roberta_imdb_padding90model","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("n_roberta_imdb_padding90model", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_roberta_imdb_padding90model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|463.2 MB| + +## References + +https://huggingface.co/Realgon/N_roberta_imdb_padding90model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-n_roberta_imdb_padding90model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-n_roberta_imdb_padding90model_pipeline_en.md new file mode 100644 index 00000000000000..c7efe3e82b62f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-n_roberta_imdb_padding90model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_roberta_imdb_padding90model_pipeline pipeline RoBertaForSequenceClassification from Realgon +author: John Snow Labs +name: n_roberta_imdb_padding90model_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_roberta_imdb_padding90model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_roberta_imdb_padding90model_pipeline_en_5.5.0_3.0_1726780496853.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_roberta_imdb_padding90model_pipeline_en_5.5.0_3.0_1726780496853.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an existing Spark DataFrame with the input text in a column named "text"
pipeline = PretrainedPipeline("n_roberta_imdb_padding90model_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an existing Spark DataFrame with the input text in a column named "text"
val pipeline = new PretrainedPipeline("n_roberta_imdb_padding90model_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_roberta_imdb_padding90model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.3 MB| + +## References + +https://huggingface.co/Realgon/N_roberta_imdb_padding90model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-ndd_claroline_test_tags_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-ndd_claroline_test_tags_pipeline_en.md new file mode 100644 index 00000000000000..e444da9ef38fee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-ndd_claroline_test_tags_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ndd_claroline_test_tags_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: ndd_claroline_test_tags_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ndd_claroline_test_tags_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ndd_claroline_test_tags_pipeline_en_5.5.0_3.0_1726742426946.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ndd_claroline_test_tags_pipeline_en_5.5.0_3.0_1726742426946.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an existing Spark DataFrame with the input text in a column named "text"
pipeline = PretrainedPipeline("ndd_claroline_test_tags_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an existing Spark DataFrame with the input text in a column named "text"
val pipeline = new PretrainedPipeline("ndd_claroline_test_tags_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ndd_claroline_test_tags_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/NDD-claroline_test-tags + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-ndd_phoenix_test_content_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-ndd_phoenix_test_content_pipeline_en.md new file mode 100644 index 00000000000000..c7fff2529aebff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-ndd_phoenix_test_content_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ndd_phoenix_test_content_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: ndd_phoenix_test_content_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ndd_phoenix_test_content_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ndd_phoenix_test_content_pipeline_en_5.5.0_3.0_1726763660569.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ndd_phoenix_test_content_pipeline_en_5.5.0_3.0_1726763660569.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an existing Spark DataFrame with the input text in a column named "text"
pipeline = PretrainedPipeline("ndd_phoenix_test_content_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an existing Spark DataFrame with the input text in a column named "text"
val pipeline = new PretrainedPipeline("ndd_phoenix_test_content_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ndd_phoenix_test_content_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/NDD-phoenix_test-content + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-nepal_bhasa_bert_en.md b/docs/_posts/ahmedlone127/2024-09-19-nepal_bhasa_bert_en.md new file mode 100644 index 00000000000000..a84c04a4cb70da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-nepal_bhasa_bert_en.md @@ -0,0 +1,92 @@ +--- +layout: model +title: English nepal_bhasa_bert BertEmbeddings from onlydj96 +author: John Snow Labs +name: nepal_bhasa_bert +date: 2024-09-19 +tags: [bert, en, open_source, fill_mask, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_bert` is a English model originally trained by onlydj96. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_bert_en_5.5.0_3.0_1726705482586.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_bert_en_5.5.0_3.0_1726705482586.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("nepal_bhasa_bert","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = BertEmbeddings
    .pretrained("nepal_bhasa_bert", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div><div class="h3-box" markdown="1">
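
After the pipeline above has run, each token carries one vector in the `embeddings` column; a short sketch of how to look at them (the 768-dimensional size is typical of a BERT base checkpoint):

```python
from pyspark.sql.functions import col, explode

# One row per token embedding; each value is an array of floats.
pipelineDF.select(explode(col("embeddings.embeddings")).alias("token_embedding")).show(truncate=80)
```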
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|403.6 MB| + +## References + +References + +https://huggingface.co/onlydj96/new_bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-nepal_bhasa_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-nepal_bhasa_bert_pipeline_en.md new file mode 100644 index 00000000000000..24462ddb197d9c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-nepal_bhasa_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nepal_bhasa_bert_pipeline pipeline BertEmbeddings from searchfind +author: John Snow Labs +name: nepal_bhasa_bert_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_bert_pipeline` is a English model originally trained by searchfind. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_bert_pipeline_en_5.5.0_3.0_1726705501728.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_bert_pipeline_en_5.5.0_3.0_1726705501728.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an existing Spark DataFrame with the input text in a column named "text"
pipeline = PretrainedPipeline("nepal_bhasa_bert_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an existing Spark DataFrame with the input text in a column named "text"
val pipeline = new PretrainedPipeline("nepal_bhasa_bert_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/searchfind/New_BERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-ner_chunkyun_en.md b/docs/_posts/ahmedlone127/2024-09-19-ner_chunkyun_en.md new file mode 100644 index 00000000000000..8f1bb1ea2ddd6b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-ner_chunkyun_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_chunkyun RoBertaForTokenClassification from Chunkyun +author: John Snow Labs +name: ner_chunkyun +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_chunkyun` is a English model originally trained by Chunkyun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_chunkyun_en_5.5.0_3.0_1726730636226.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_chunkyun_en_5.5.0_3.0_1726730636226.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# document and token annotations feed the classifier; token-level tags land in the "ner" column
tokenClassifier = RoBertaForTokenClassification.pretrained("ner_chunkyun","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("ner_chunkyun", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_chunkyun| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|465.0 MB| + +## References + +https://huggingface.co/Chunkyun/ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-nerd_nerd_random1_seed1_twitter_roberta_large_2022_154m_en.md b/docs/_posts/ahmedlone127/2024-09-19-nerd_nerd_random1_seed1_twitter_roberta_large_2022_154m_en.md new file mode 100644 index 00000000000000..126c1397aeaa21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-nerd_nerd_random1_seed1_twitter_roberta_large_2022_154m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nerd_nerd_random1_seed1_twitter_roberta_large_2022_154m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: nerd_nerd_random1_seed1_twitter_roberta_large_2022_154m +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerd_nerd_random1_seed1_twitter_roberta_large_2022_154m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerd_nerd_random1_seed1_twitter_roberta_large_2022_154m_en_5.5.0_3.0_1726732884708.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerd_nerd_random1_seed1_twitter_roberta_large_2022_154m_en_5.5.0_3.0_1726732884708.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# document and token annotations feed the classifier; predictions land in the "class" column
sequenceClassifier = RoBertaForSequenceClassification.pretrained("nerd_nerd_random1_seed1_twitter_roberta_large_2022_154m","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("nerd_nerd_random1_seed1_twitter_roberta_large_2022_154m", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerd_nerd_random1_seed1_twitter_roberta_large_2022_154m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/nerd-nerd_random1_seed1-twitter-roberta-large-2022-154m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-nerd_nerd_random1_seed2_twitter_roberta_base_2022_154m_en.md b/docs/_posts/ahmedlone127/2024-09-19-nerd_nerd_random1_seed2_twitter_roberta_base_2022_154m_en.md new file mode 100644 index 00000000000000..5a618163b859f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-nerd_nerd_random1_seed2_twitter_roberta_base_2022_154m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nerd_nerd_random1_seed2_twitter_roberta_base_2022_154m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: nerd_nerd_random1_seed2_twitter_roberta_base_2022_154m +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerd_nerd_random1_seed2_twitter_roberta_base_2022_154m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerd_nerd_random1_seed2_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1726750787953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerd_nerd_random1_seed2_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1726750787953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# document and token annotations feed the classifier; predictions land in the "class" column
sequenceClassifier = RoBertaForSequenceClassification.pretrained("nerd_nerd_random1_seed2_twitter_roberta_base_2022_154m","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("nerd_nerd_random1_seed2_twitter_roberta_base_2022_154m", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerd_nerd_random1_seed2_twitter_roberta_base_2022_154m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/tweettemposhift/nerd-nerd_random1_seed2-twitter-roberta-base-2022-154m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-nerd_nerd_random2_seed0_twitter_roberta_base_2022_154m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-nerd_nerd_random2_seed0_twitter_roberta_base_2022_154m_pipeline_en.md new file mode 100644 index 00000000000000..c7c58d062f5bc3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-nerd_nerd_random2_seed0_twitter_roberta_base_2022_154m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nerd_nerd_random2_seed0_twitter_roberta_base_2022_154m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: nerd_nerd_random2_seed0_twitter_roberta_base_2022_154m_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerd_nerd_random2_seed0_twitter_roberta_base_2022_154m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerd_nerd_random2_seed0_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1726750603997.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerd_nerd_random2_seed0_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1726750603997.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an existing Spark DataFrame with the input text in a column named "text"
pipeline = PretrainedPipeline("nerd_nerd_random2_seed0_twitter_roberta_base_2022_154m_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an existing Spark DataFrame with the input text in a column named "text"
val pipeline = new PretrainedPipeline("nerd_nerd_random2_seed0_twitter_roberta_base_2022_154m_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerd_nerd_random2_seed0_twitter_roberta_base_2022_154m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/tweettemposhift/nerd-nerd_random2_seed0-twitter-roberta-base-2022-154m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-neuronale_crew_a6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-neuronale_crew_a6_pipeline_en.md new file mode 100644 index 00000000000000..655ccade27ebcb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-neuronale_crew_a6_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English neuronale_crew_a6_pipeline pipeline DistilBertForSequenceClassification from ninjeanne-hka +author: John Snow Labs +name: neuronale_crew_a6_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`neuronale_crew_a6_pipeline` is a English model originally trained by ninjeanne-hka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/neuronale_crew_a6_pipeline_en_5.5.0_3.0_1726743552974.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/neuronale_crew_a6_pipeline_en_5.5.0_3.0_1726743552974.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an existing Spark DataFrame with the input text in a column named "text"
pipeline = PretrainedPipeline("neuronale_crew_a6_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an existing Spark DataFrame with the input text in a column named "text"
val pipeline = new PretrainedPipeline("neuronale_crew_a6_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|neuronale_crew_a6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ninjeanne-hka/neuronale_crew_a6 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-nim_h_rivenb_nainizchii_klasufikaciya_en.md b/docs/_posts/ahmedlone127/2024-09-19-nim_h_rivenb_nainizchii_klasufikaciya_en.md new file mode 100644 index 00000000000000..4faf45ae788c70 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-nim_h_rivenb_nainizchii_klasufikaciya_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nim_h_rivenb_nainizchii_klasufikaciya DistilBertForSequenceClassification from yevhenkost +author: John Snow Labs +name: nim_h_rivenb_nainizchii_klasufikaciya +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nim_h_rivenb_nainizchii_klasufikaciya` is a English model originally trained by yevhenkost. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nim_h_rivenb_nainizchii_klasufikaciya_en_5.5.0_3.0_1726741242970.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nim_h_rivenb_nainizchii_klasufikaciya_en_5.5.0_3.0_1726741242970.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# document and token annotations feed the classifier; predictions land in the "class" column
sequenceClassifier = DistilBertForSequenceClassification.pretrained("nim_h_rivenb_nainizchii_klasufikaciya","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("nim_h_rivenb_nainizchii_klasufikaciya", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nim_h_rivenb_nainizchii_klasufikaciya| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|255.4 MB| + +## References + +https://huggingface.co/yevhenkost/nim_h_rivenb_nainizchii_klasufikaciya \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-nim_h_rivenb_nainizchii_klasufikaciya_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-nim_h_rivenb_nainizchii_klasufikaciya_pipeline_en.md new file mode 100644 index 00000000000000..4489119d36a3e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-nim_h_rivenb_nainizchii_klasufikaciya_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nim_h_rivenb_nainizchii_klasufikaciya_pipeline pipeline DistilBertForSequenceClassification from yevhenkost +author: John Snow Labs +name: nim_h_rivenb_nainizchii_klasufikaciya_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nim_h_rivenb_nainizchii_klasufikaciya_pipeline` is a English model originally trained by yevhenkost. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nim_h_rivenb_nainizchii_klasufikaciya_pipeline_en_5.5.0_3.0_1726741256817.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nim_h_rivenb_nainizchii_klasufikaciya_pipeline_en_5.5.0_3.0_1726741256817.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("nim_h_rivenb_nainizchii_klasufikaciya_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("nim_h_rivenb_nainizchii_klasufikaciya_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
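+
+For quick experiments on a single string, the pretrained pipeline also exposes an `annotate` helper, so no DataFrame is needed. A minimal sketch (the keys of the returned dictionary follow the stages listed under Included Models below):
+
+```python
+# Sketch: annotate one string and print the resulting dictionary of annotations
+result = pipeline.annotate("I love spark-nlp")
+print(result)
+```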
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nim_h_rivenb_nainizchii_klasufikaciya_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|255.4 MB| + +## References + +https://huggingface.co/yevhenkost/nim_h_rivenb_nainizchii_klasufikaciya + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-notiberto_en.md b/docs/_posts/ahmedlone127/2024-09-19-notiberto_en.md new file mode 100644 index 00000000000000..2f364248509a34 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-notiberto_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English notiberto RoBertaEmbeddings from GioReg +author: John Snow Labs +name: notiberto +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`notiberto` is a English model originally trained by GioReg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/notiberto_en_5.5.0_3.0_1726778410768.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/notiberto_en_5.5.0_3.0_1726778410768.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("notiberto","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("notiberto","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|notiberto| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.0 MB| + +## References + +https://huggingface.co/GioReg/notiBERTo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-notiberto_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-notiberto_pipeline_en.md new file mode 100644 index 00000000000000..9a8f6fe9a572a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-notiberto_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English notiberto_pipeline pipeline RoBertaEmbeddings from GioReg +author: John Snow Labs +name: notiberto_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`notiberto_pipeline` is a English model originally trained by GioReg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/notiberto_pipeline_en_5.5.0_3.0_1726778426256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/notiberto_pipeline_en_5.5.0_3.0_1726778426256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("notiberto_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("notiberto_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|notiberto_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.1 MB| + +## References + +https://huggingface.co/GioReg/notiBERTo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-nsina_category_sinbert_small_pipeline_si.md b/docs/_posts/ahmedlone127/2024-09-19-nsina_category_sinbert_small_pipeline_si.md new file mode 100644 index 00000000000000..bcf19e0f998dd9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-nsina_category_sinbert_small_pipeline_si.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Sinhala, Sinhalese nsina_category_sinbert_small_pipeline pipeline RoBertaForSequenceClassification from sinhala-nlp +author: John Snow Labs +name: nsina_category_sinbert_small_pipeline +date: 2024-09-19 +tags: [si, open_source, pipeline, onnx] +task: Text Classification +language: si +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nsina_category_sinbert_small_pipeline` is a Sinhala, Sinhalese model originally trained by sinhala-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nsina_category_sinbert_small_pipeline_si_5.5.0_3.0_1726750944443.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nsina_category_sinbert_small_pipeline_si_5.5.0_3.0_1726750944443.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("nsina_category_sinbert_small_pipeline", lang = "si")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("nsina_category_sinbert_small_pipeline", lang = "si")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
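+
+For quick experiments on a single string, the pretrained pipeline also exposes an `annotate` helper, so no DataFrame is needed. A minimal sketch (the keys of the returned dictionary follow the stages listed under Included Models below):
+
+```python
+# Sketch: annotate one string and print the resulting dictionary of annotations
+result = pipeline.annotate("I love spark-nlp")
+print(result)
+```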
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nsina_category_sinbert_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|si| +|Size:|249.4 MB| + +## References + +https://huggingface.co/sinhala-nlp/NSINA-Category-sinbert-small + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-nsina_category_sinbert_small_si.md b/docs/_posts/ahmedlone127/2024-09-19-nsina_category_sinbert_small_si.md new file mode 100644 index 00000000000000..7a6228bbee7e54 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-nsina_category_sinbert_small_si.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Sinhala, Sinhalese nsina_category_sinbert_small RoBertaForSequenceClassification from sinhala-nlp +author: John Snow Labs +name: nsina_category_sinbert_small +date: 2024-09-19 +tags: [si, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: si +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nsina_category_sinbert_small` is a Sinhala, Sinhalese model originally trained by sinhala-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nsina_category_sinbert_small_si_5.5.0_3.0_1726750932592.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nsina_category_sinbert_small_si_5.5.0_3.0_1726750932592.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("nsina_category_sinbert_small","si") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("nsina_category_sinbert_small", "si")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
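+
+The predicted category for each input row is written to the `class` column configured above. As a minimal sketch (assuming the Python snippet above has been run, so `pipelineDF` is the transformed DataFrame), the predictions can be inspected like this:
+
+```python
+# Sketch: show each input text next to its predicted label(s);
+# "class.result" extracts the `result` field from the annotation structs.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```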
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nsina_category_sinbert_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|si| +|Size:|249.4 MB| + +## References + +https://huggingface.co/sinhala-nlp/NSINA-Category-sinbert-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-nuner_v1_fewnerd_coarse_super_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-nuner_v1_fewnerd_coarse_super_pipeline_en.md new file mode 100644 index 00000000000000..416129af2350d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-nuner_v1_fewnerd_coarse_super_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nuner_v1_fewnerd_coarse_super_pipeline pipeline RoBertaForTokenClassification from guishe +author: John Snow Labs +name: nuner_v1_fewnerd_coarse_super_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nuner_v1_fewnerd_coarse_super_pipeline` is a English model originally trained by guishe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nuner_v1_fewnerd_coarse_super_pipeline_en_5.5.0_3.0_1726730197447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nuner_v1_fewnerd_coarse_super_pipeline_en_5.5.0_3.0_1726730197447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("nuner_v1_fewnerd_coarse_super_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("nuner_v1_fewnerd_coarse_super_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
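+
+For quick experiments on a single string, the pretrained pipeline also exposes an `annotate` helper, so no DataFrame is needed. A minimal sketch (the keys of the returned dictionary follow the stages listed under Included Models below):
+
+```python
+# Sketch: annotate one string and print the resulting dictionary of annotations
+result = pipeline.annotate("I love spark-nlp")
+print(result)
+```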
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nuner_v1_fewnerd_coarse_super_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|462.4 MB| + +## References + +https://huggingface.co/guishe/nuner-v1_fewnerd_coarse_super + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-odsc_sawyer_reward_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-odsc_sawyer_reward_pipeline_en.md new file mode 100644 index 00000000000000..8d94982b0ef9c8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-odsc_sawyer_reward_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English odsc_sawyer_reward_pipeline pipeline RoBertaForSequenceClassification from profoz +author: John Snow Labs +name: odsc_sawyer_reward_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`odsc_sawyer_reward_pipeline` is a English model originally trained by profoz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/odsc_sawyer_reward_pipeline_en_5.5.0_3.0_1726726250257.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/odsc_sawyer_reward_pipeline_en_5.5.0_3.0_1726726250257.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("odsc_sawyer_reward_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("odsc_sawyer_reward_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|odsc_sawyer_reward_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|462.2 MB| + +## References + +https://huggingface.co/profoz/odsc-sawyer-reward + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-openai_whisper_base_spanish_ecu911_pasobajo_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-19-openai_whisper_base_spanish_ecu911_pasobajo_pipeline_es.md new file mode 100644 index 00000000000000..397a3c10326d95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-openai_whisper_base_spanish_ecu911_pasobajo_pipeline_es.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Castilian, Spanish openai_whisper_base_spanish_ecu911_pasobajo_pipeline pipeline WhisperForCTC from DanielMarquez +author: John Snow Labs +name: openai_whisper_base_spanish_ecu911_pasobajo_pipeline +date: 2024-09-19 +tags: [es, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`openai_whisper_base_spanish_ecu911_pasobajo_pipeline` is a Castilian, Spanish model originally trained by DanielMarquez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/openai_whisper_base_spanish_ecu911_pasobajo_pipeline_es_5.5.0_3.0_1726714452253.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/openai_whisper_base_spanish_ecu911_pasobajo_pipeline_es_5.5.0_3.0_1726714452253.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("openai_whisper_base_spanish_ecu911_pasobajo_pipeline", lang = "es")
+# df: a DataFrame with a raw audio (float array) column, as expected by the pipeline's AudioAssembler
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("openai_whisper_base_spanish_ecu911_pasobajo_pipeline", lang = "es")
+// df: a DataFrame with a raw audio (float array) column, as expected by the pipeline's AudioAssembler
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|openai_whisper_base_spanish_ecu911_pasobajo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|615.7 MB| + +## References + +https://huggingface.co/DanielMarquez/openai-whisper-base-es_ecu911-PasoBajo + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-opp_115_user_choice_control_en.md b/docs/_posts/ahmedlone127/2024-09-19-opp_115_user_choice_control_en.md new file mode 100644 index 00000000000000..6ada2d597bd5e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-opp_115_user_choice_control_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opp_115_user_choice_control RoBertaForSequenceClassification from jakariamd +author: John Snow Labs +name: opp_115_user_choice_control +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opp_115_user_choice_control` is a English model originally trained by jakariamd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opp_115_user_choice_control_en_5.5.0_3.0_1726780458471.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opp_115_user_choice_control_en_5.5.0_3.0_1726780458471.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("opp_115_user_choice_control","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("opp_115_user_choice_control", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
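+
+The predicted label for each input row is written to the `class` column configured above. As a minimal sketch (assuming the Python snippet above has been run, so `pipelineDF` is the transformed DataFrame), the predictions can be inspected like this:
+
+```python
+# Sketch: show each input text next to its predicted label(s);
+# "class.result" extracts the `result` field from the annotation structs.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```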
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opp_115_user_choice_control| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/jakariamd/opp_115_user_choice_control \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-opp_115_user_choice_control_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-opp_115_user_choice_control_pipeline_en.md new file mode 100644 index 00000000000000..61a4d748907ba6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-opp_115_user_choice_control_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opp_115_user_choice_control_pipeline pipeline RoBertaForSequenceClassification from jakariamd +author: John Snow Labs +name: opp_115_user_choice_control_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opp_115_user_choice_control_pipeline` is a English model originally trained by jakariamd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opp_115_user_choice_control_pipeline_en_5.5.0_3.0_1726780480799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opp_115_user_choice_control_pipeline_en_5.5.0_3.0_1726780480799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("opp_115_user_choice_control_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("opp_115_user_choice_control_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
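+
+For quick experiments on a single string, the pretrained pipeline also exposes an `annotate` helper, so no DataFrame is needed. A minimal sketch (the keys of the returned dictionary follow the stages listed under Included Models below):
+
+```python
+# Sketch: annotate one string and print the resulting dictionary of annotations
+result = pipeline.annotate("I love spark-nlp")
+print(result)
+```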
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opp_115_user_choice_control_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.6 MB| + +## References + +https://huggingface.co/jakariamd/opp_115_user_choice_control + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-paludistilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-paludistilbert_pipeline_en.md new file mode 100644 index 00000000000000..697f09cdf43d1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-paludistilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English paludistilbert_pipeline pipeline DistilBertForSequenceClassification from Palu001 +author: John Snow Labs +name: paludistilbert_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`paludistilbert_pipeline` is a English model originally trained by Palu001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/paludistilbert_pipeline_en_5.5.0_3.0_1726742584799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/paludistilbert_pipeline_en_5.5.0_3.0_1726742584799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("paludistilbert_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("paludistilbert_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|paludistilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Palu001/PaluDistilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-patent_ner_test_noisyocr_version_en.md b/docs/_posts/ahmedlone127/2024-09-19-patent_ner_test_noisyocr_version_en.md new file mode 100644 index 00000000000000..5378722583dd4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-patent_ner_test_noisyocr_version_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English patent_ner_test_noisyocr_version RoBertaForTokenClassification from matthewleechen +author: John Snow Labs +name: patent_ner_test_noisyocr_version +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`patent_ner_test_noisyocr_version` is a English model originally trained by matthewleechen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/patent_ner_test_noisyocr_version_en_5.5.0_3.0_1726729583954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/patent_ner_test_noisyocr_version_en_5.5.0_3.0_1726729583954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("patent_ner_test_noisyocr_version","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("patent_ner_test_noisyocr_version", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
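+
+The per-token entity tags are written to the `ner` column configured above. As a minimal sketch (assuming the Python snippet above has been run, so `pipelineDF` is the transformed DataFrame), the tokens and their predicted tags can be inspected like this:
+
+```python
+# Sketch: show the tokens next to their predicted tags;
+# ".result" extracts the `result` field from the annotation structs.
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```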
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|patent_ner_test_noisyocr_version| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/matthewleechen/patent_ner_test_noisyocr_version \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-patent_ner_test_noisyocr_version_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-patent_ner_test_noisyocr_version_pipeline_en.md new file mode 100644 index 00000000000000..c5a80e68f4d774 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-patent_ner_test_noisyocr_version_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English patent_ner_test_noisyocr_version_pipeline pipeline RoBertaForTokenClassification from matthewleechen +author: John Snow Labs +name: patent_ner_test_noisyocr_version_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`patent_ner_test_noisyocr_version_pipeline` is a English model originally trained by matthewleechen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/patent_ner_test_noisyocr_version_pipeline_en_5.5.0_3.0_1726729672508.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/patent_ner_test_noisyocr_version_pipeline_en_5.5.0_3.0_1726729672508.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("patent_ner_test_noisyocr_version_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("patent_ner_test_noisyocr_version_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|patent_ner_test_noisyocr_version_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/matthewleechen/patent_ner_test_noisyocr_version + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-platzi_distilroberta_base_mrpc_glue_andres_galvis_en.md b/docs/_posts/ahmedlone127/2024-09-19-platzi_distilroberta_base_mrpc_glue_andres_galvis_en.md new file mode 100644 index 00000000000000..923c100cd0429b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-platzi_distilroberta_base_mrpc_glue_andres_galvis_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English platzi_distilroberta_base_mrpc_glue_andres_galvis RoBertaForSequenceClassification from platzi +author: John Snow Labs +name: platzi_distilroberta_base_mrpc_glue_andres_galvis +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_mrpc_glue_andres_galvis` is a English model originally trained by platzi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_andres_galvis_en_5.5.0_3.0_1726751072961.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_andres_galvis_en_5.5.0_3.0_1726751072961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_andres_galvis","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_andres_galvis", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
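+
+The predicted label for each input row is written to the `class` column configured above. As a minimal sketch (assuming the Python snippet above has been run, so `pipelineDF` is the transformed DataFrame), the predictions can be inspected like this:
+
+```python
+# Sketch: show each input text next to its predicted label(s);
+# "class.result" extracts the `result` field from the annotation structs.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```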
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_mrpc_glue_andres_galvis| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/platzi/platzi-distilroberta-base-mrpc-glue-andres-galvis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-platzi_distilroberta_base_mrpc_glue_andres_galvis_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-platzi_distilroberta_base_mrpc_glue_andres_galvis_pipeline_en.md new file mode 100644 index 00000000000000..315218ea3691b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-platzi_distilroberta_base_mrpc_glue_andres_galvis_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English platzi_distilroberta_base_mrpc_glue_andres_galvis_pipeline pipeline RoBertaForSequenceClassification from platzi +author: John Snow Labs +name: platzi_distilroberta_base_mrpc_glue_andres_galvis_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_mrpc_glue_andres_galvis_pipeline` is a English model originally trained by platzi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_andres_galvis_pipeline_en_5.5.0_3.0_1726751088130.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_andres_galvis_pipeline_en_5.5.0_3.0_1726751088130.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("platzi_distilroberta_base_mrpc_glue_andres_galvis_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("platzi_distilroberta_base_mrpc_glue_andres_galvis_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
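+
+For quick experiments on a single string, the pretrained pipeline also exposes an `annotate` helper, so no DataFrame is needed. A minimal sketch (the keys of the returned dictionary follow the stages listed under Included Models below):
+
+```python
+# Sketch: annotate one string and print the resulting dictionary of annotations
+result = pipeline.annotate("I love spark-nlp")
+print(result)
+```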
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_mrpc_glue_andres_galvis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/platzi/platzi-distilroberta-base-mrpc-glue-andres-galvis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-pmp_h768_zh.md b/docs/_posts/ahmedlone127/2024-09-19-pmp_h768_zh.md new file mode 100644 index 00000000000000..98435d00f640d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-pmp_h768_zh.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Chinese pmp_h768 BertForTokenClassification from rickltt +author: John Snow Labs +name: pmp_h768 +date: 2024-09-19 +tags: [zh, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pmp_h768` is a Chinese model originally trained by rickltt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pmp_h768_zh_5.5.0_3.0_1726771429597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pmp_h768_zh_5.5.0_3.0_1726771429597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("pmp_h768","zh") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("pmp_h768", "zh")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
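+
+The per-token tags are written to the `ner` column configured above. As a minimal sketch (assuming the Python snippet above has been run, so `pipelineDF` is the transformed DataFrame), the tokens and their predicted tags can be inspected like this:
+
+```python
+# Sketch: show the tokens next to their predicted tags;
+# ".result" extracts the `result` field from the annotation structs.
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```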
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pmp_h768| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|221.9 MB| + +## References + +https://huggingface.co/rickltt/pmp-h768 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-polarizer_bert_base_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-19-polarizer_bert_base_uncased_en.md new file mode 100644 index 00000000000000..1076dde1ebadaa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-polarizer_bert_base_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English polarizer_bert_base_uncased BertEmbeddings from kyungmin011029 +author: John Snow Labs +name: polarizer_bert_base_uncased +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`polarizer_bert_base_uncased` is a English model originally trained by kyungmin011029. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/polarizer_bert_base_uncased_en_5.5.0_3.0_1726744755315.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/polarizer_bert_base_uncased_en_5.5.0_3.0_1726744755315.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("polarizer_bert_base_uncased","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("polarizer_bert_base_uncased","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|polarizer_bert_base_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/kyungmin011029/Polarizer-bert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-polarizer_roberta_large_en.md b/docs/_posts/ahmedlone127/2024-09-19-polarizer_roberta_large_en.md new file mode 100644 index 00000000000000..1fc838daeb9785 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-polarizer_roberta_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English polarizer_roberta_large RoBertaEmbeddings from kyungmin011029 +author: John Snow Labs +name: polarizer_roberta_large +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`polarizer_roberta_large` is a English model originally trained by kyungmin011029. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/polarizer_roberta_large_en_5.5.0_3.0_1726748968819.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/polarizer_roberta_large_en_5.5.0_3.0_1726748968819.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("polarizer_roberta_large","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("polarizer_roberta_large","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|polarizer_roberta_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/kyungmin011029/Polarizer-roberta-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-polarizer_roberta_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-polarizer_roberta_large_pipeline_en.md new file mode 100644 index 00000000000000..bf315ff2443b1e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-polarizer_roberta_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English polarizer_roberta_large_pipeline pipeline RoBertaEmbeddings from kyungmin011029 +author: John Snow Labs +name: polarizer_roberta_large_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`polarizer_roberta_large_pipeline` is a English model originally trained by kyungmin011029. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/polarizer_roberta_large_pipeline_en_5.5.0_3.0_1726749034823.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/polarizer_roberta_large_pipeline_en_5.5.0_3.0_1726749034823.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("polarizer_roberta_large_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("polarizer_roberta_large_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|polarizer_roberta_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/kyungmin011029/Polarizer-roberta-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-projectsentimentalanalysis_en.md b/docs/_posts/ahmedlone127/2024-09-19-projectsentimentalanalysis_en.md new file mode 100644 index 00000000000000..fcc5fbe4cbed39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-projectsentimentalanalysis_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English projectsentimentalanalysis DistilBertForSequenceClassification from Stevenchee +author: John Snow Labs +name: projectsentimentalanalysis +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`projectsentimentalanalysis` is a English model originally trained by Stevenchee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/projectsentimentalanalysis_en_5.5.0_3.0_1726744065305.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/projectsentimentalanalysis_en_5.5.0_3.0_1726744065305.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("projectsentimentalanalysis","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("projectsentimentalanalysis", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
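+
+The predicted sentiment label for each input row is written to the `class` column configured above. As a minimal sketch (assuming the Python snippet above has been run, so `pipelineDF` is the transformed DataFrame), the predictions can be inspected like this:
+
+```python
+# Sketch: show each input text next to its predicted label(s);
+# "class.result" extracts the `result` field from the annotation structs.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```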
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|projectsentimentalanalysis| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|248.4 MB| + +## References + +https://huggingface.co/Stevenchee/projectsentimentalanalysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-projectsentimentalanalysis_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-projectsentimentalanalysis_pipeline_en.md new file mode 100644 index 00000000000000..c13ffc9855934e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-projectsentimentalanalysis_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English projectsentimentalanalysis_pipeline pipeline DistilBertForSequenceClassification from Stevenchee +author: John Snow Labs +name: projectsentimentalanalysis_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`projectsentimentalanalysis_pipeline` is a English model originally trained by Stevenchee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/projectsentimentalanalysis_pipeline_en_5.5.0_3.0_1726744078648.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/projectsentimentalanalysis_pipeline_en_5.5.0_3.0_1726744078648.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("projectsentimentalanalysis_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("projectsentimentalanalysis_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
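+
+For quick experiments on a single string, the pretrained pipeline also exposes an `annotate` helper, so no DataFrame is needed. A minimal sketch (the keys of the returned dictionary follow the stages listed under Included Models below):
+
+```python
+# Sketch: annotate one string and print the resulting dictionary of annotations
+result = pipeline.annotate("I love spark-nlp")
+print(result)
+```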
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|projectsentimentalanalysis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|248.5 MB| + +## References + +https://huggingface.co/Stevenchee/projectsentimentalanalysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-ptcrawl_base_v2_5__checkpoint_2_26000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-ptcrawl_base_v2_5__checkpoint_2_26000_pipeline_en.md new file mode 100644 index 00000000000000..a2238971258f95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-ptcrawl_base_v2_5__checkpoint_2_26000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ptcrawl_base_v2_5__checkpoint_2_26000_pipeline pipeline RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: ptcrawl_base_v2_5__checkpoint_2_26000_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ptcrawl_base_v2_5__checkpoint_2_26000_pipeline` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ptcrawl_base_v2_5__checkpoint_2_26000_pipeline_en_5.5.0_3.0_1726749467407.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ptcrawl_base_v2_5__checkpoint_2_26000_pipeline_en_5.5.0_3.0_1726749467407.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("ptcrawl_base_v2_5__checkpoint_2_26000_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("ptcrawl_base_v2_5__checkpoint_2_26000_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ptcrawl_base_v2_5__checkpoint_2_26000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|296.6 MB| + +## References + +https://huggingface.co/eduagarcia-temp/ptcrawl_base-v2_5__checkpoint_2_26000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-pubchem10m_smiles_bpe_450k_en.md b/docs/_posts/ahmedlone127/2024-09-19-pubchem10m_smiles_bpe_450k_en.md new file mode 100644 index 00000000000000..891421afdb7410 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-pubchem10m_smiles_bpe_450k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English pubchem10m_smiles_bpe_450k RoBertaEmbeddings from seyonec +author: John Snow Labs +name: pubchem10m_smiles_bpe_450k +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pubchem10m_smiles_bpe_450k` is a English model originally trained by seyonec. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pubchem10m_smiles_bpe_450k_en_5.5.0_3.0_1726778206617.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pubchem10m_smiles_bpe_450k_en_5.5.0_3.0_1726778206617.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("pubchem10m_smiles_bpe_450k","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("pubchem10m_smiles_bpe_450k","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
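
As a rough sketch of how the output of the example above could be inspected (building on `pipelineDF`; the field names follow Spark NLP's standard annotation schema):

```python
from pyspark.sql import functions as F

# Each row of "embeddings" holds one annotation per token; "result" is the
# token text and "embeddings" is its vector.
token_vectors = pipelineDF.select(F.explode("embeddings").alias("ann")) \
    .select(F.col("ann.result").alias("token"),
            F.col("ann.embeddings").alias("vector"))

token_vectors.show(truncate=80)
```

Note that this particular model was trained on SMILES strings, so meaningful embeddings come from chemical-structure inputs rather than natural-language text.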
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pubchem10m_smiles_bpe_450k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.0 MB| + +## References + +https://huggingface.co/seyonec/PubChem10M_SMILES_BPE_450k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-regr_2_en.md b/docs/_posts/ahmedlone127/2024-09-19-regr_2_en.md new file mode 100644 index 00000000000000..c87a45f6534d9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-regr_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English regr_2 RoBertaForSequenceClassification from BaronSch +author: John Snow Labs +name: regr_2 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`regr_2` is a English model originally trained by BaronSch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/regr_2_en_5.5.0_3.0_1726780180628.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/regr_2_en_5.5.0_3.0_1726780180628.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("regr_2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("regr_2", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
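
As a small follow-on sketch, the predicted label for each row can be read from the `result` field of the `class` column (this builds on `pipelineDF` from the example above; per-label scores, when available, are stored in the annotation metadata):

```python
# "class.result" holds the predicted label for each input document.
pipelineDF.select("text", "class.result").show(truncate=False)
```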
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|regr_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/BaronSch/Regr_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-repo_18_06_mlops_en.md b/docs/_posts/ahmedlone127/2024-09-19-repo_18_06_mlops_en.md new file mode 100644 index 00000000000000..ee863f45676a01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-repo_18_06_mlops_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English repo_18_06_mlops DistilBertForSequenceClassification from AliMokh +author: John Snow Labs +name: repo_18_06_mlops +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`repo_18_06_mlops` is a English model originally trained by AliMokh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/repo_18_06_mlops_en_5.5.0_3.0_1726743090052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/repo_18_06_mlops_en_5.5.0_3.0_1726743090052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("repo_18_06_mlops","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("repo_18_06_mlops", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|repo_18_06_mlops| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AliMokh/repo-18-06-MLOps \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-repo_18_06_mlops_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-repo_18_06_mlops_pipeline_en.md new file mode 100644 index 00000000000000..f36b6d32dee5b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-repo_18_06_mlops_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English repo_18_06_mlops_pipeline pipeline DistilBertForSequenceClassification from AliMokh +author: John Snow Labs +name: repo_18_06_mlops_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`repo_18_06_mlops_pipeline` is a English model originally trained by AliMokh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/repo_18_06_mlops_pipeline_en_5.5.0_3.0_1726743102484.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/repo_18_06_mlops_pipeline_en_5.5.0_3.0_1726743102484.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("repo_18_06_mlops_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("repo_18_06_mlops_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
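
For quick experiments on single strings, the same pipeline can be used without building a DataFrame first; `annotate` returns a plain Python dict keyed by output column. This is a sketch assuming a running Spark NLP session:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("repo_18_06_mlops_pipeline", lang="en")

# annotate() runs the whole pipeline on one string and returns a dict
# mapping each output column to its results.
result = pipeline.annotate("I love spark-nlp")
print(result)
```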
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|repo_18_06_mlops_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AliMokh/repo-18-06-MLOps + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_ancient_greek_mlm_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_ancient_greek_mlm_en.md new file mode 100644 index 00000000000000..f8d8ff3b3141e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_ancient_greek_mlm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_ancient_greek_mlm RoBertaEmbeddings from wantuta +author: John Snow Labs +name: roberta_ancient_greek_mlm +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_ancient_greek_mlm` is a English model originally trained by wantuta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_ancient_greek_mlm_en_5.5.0_3.0_1726747458258.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_ancient_greek_mlm_en_5.5.0_3.0_1726747458258.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_ancient_greek_mlm","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_ancient_greek_mlm","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_ancient_greek_mlm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/wantuta/roberta_ancient_greek_mlm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_aptner_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_aptner_en.md new file mode 100644 index 00000000000000..6cefcadf352f6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_aptner_en.md @@ -0,0 +1,100 @@ +--- +layout: model +title: English roberta_aptner RoBertaForTokenClassification from anonymouspd +author: John Snow Labs +name: roberta_aptner +date: 2024-09-19 +tags: [roberta, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_aptner` is a English model originally trained by anonymouspd. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_aptner_en_5.5.0_3.0_1726731172548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_aptner_en_5.5.0_3.0_1726731172548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_aptner","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])

data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipelineModel = pipeline.fit(data)

pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification
    .pretrained("roberta_aptner", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))

val data = Seq("I love spark-nlp").toDF("text")

val pipelineModel = pipeline.fit(data)

val pipelineDF = pipelineModel.transform(data)
```
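
Token-level tags are often easier to consume as entity chunks. A possible continuation of the example above uses `NerConverter` to group tagged tokens into chunks (assuming the model emits IOB-style labels):

```python
from sparknlp.annotator import NerConverter

# Groups consecutive B-/I- tagged tokens into entity chunks.
converter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

chunks = converter.transform(pipelineDF)
chunks.select("ner_chunk.result", "ner_chunk.metadata").show(truncate=False)
```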
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_aptner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|430.3 MB| + +## References + +References + +https://huggingface.co/anonymouspd/RoBERTa-APTNER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_bne_finetuned_amazon_reviews_spanish_02_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_bne_finetuned_amazon_reviews_spanish_02_pipeline_en.md new file mode 100644 index 00000000000000..8513fc1b8457b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_bne_finetuned_amazon_reviews_spanish_02_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_amazon_reviews_spanish_02_pipeline pipeline RoBertaForSequenceClassification from DevCar +author: John Snow Labs +name: roberta_base_bne_finetuned_amazon_reviews_spanish_02_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_amazon_reviews_spanish_02_pipeline` is a English model originally trained by DevCar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_amazon_reviews_spanish_02_pipeline_en_5.5.0_3.0_1726750899552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_amazon_reviews_spanish_02_pipeline_en_5.5.0_3.0_1726750899552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_bne_finetuned_amazon_reviews_spanish_02_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_bne_finetuned_amazon_reviews_spanish_02_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_amazon_reviews_spanish_02_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|454.0 MB| + +## References + +https://huggingface.co/DevCar/roberta-base-bne-finetuned-amazon_reviews_es_02 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_bne_finetuned_tripadvisor_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_bne_finetuned_tripadvisor_en.md new file mode 100644 index 00000000000000..1be2a08ffc813d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_bne_finetuned_tripadvisor_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_tripadvisor RoBertaEmbeddings from vg055 +author: John Snow Labs +name: roberta_base_bne_finetuned_tripadvisor +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_tripadvisor` is a English model originally trained by vg055. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_tripadvisor_en_5.5.0_3.0_1726747101775.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_tripadvisor_en_5.5.0_3.0_1726747101775.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_bne_finetuned_tripadvisor","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_bne_finetuned_tripadvisor","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_tripadvisor| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|463.7 MB| + +## References + +https://huggingface.co/vg055/roberta-base-bne-finetuned-tripAdvisor \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_bne_finetuned_tripadvisor_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_bne_finetuned_tripadvisor_pipeline_en.md new file mode 100644 index 00000000000000..7cc0d629055a2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_bne_finetuned_tripadvisor_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_tripadvisor_pipeline pipeline RoBertaEmbeddings from vg055 +author: John Snow Labs +name: roberta_base_bne_finetuned_tripadvisor_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_tripadvisor_pipeline` is a English model originally trained by vg055. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_tripadvisor_pipeline_en_5.5.0_3.0_1726747125273.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_tripadvisor_pipeline_en_5.5.0_3.0_1726747125273.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_bne_finetuned_tripadvisor_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_bne_finetuned_tripadvisor_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_tripadvisor_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.7 MB| + +## References + +https://huggingface.co/vg055/roberta-base-bne-finetuned-tripAdvisor + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_epoch_10_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_epoch_10_en.md new file mode 100644 index 00000000000000..d5b2a6fa971354 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_epoch_10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_10 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_10 +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_10` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_10_en_5.5.0_3.0_1726747298844.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_10_en_5.5.0_3.0_1726747298844.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_10","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_10","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.0 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_epoch_10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_epoch_10_pipeline_en.md new file mode 100644 index 00000000000000..c741d0de63c698 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_epoch_10_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_10_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_10_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_10_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_10_pipeline_en_5.5.0_3.0_1726747388742.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_10_pipeline_en_5.5.0_3.0_1726747388742.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_10_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_10_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.0 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_10 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_epoch_13_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_epoch_13_en.md new file mode 100644 index 00000000000000..7e2370f839a869 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_epoch_13_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_13 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_13 +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_13` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_13_en_5.5.0_3.0_1726747674915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_13_en_5.5.0_3.0_1726747674915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_13","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_13","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_13| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.1 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_13 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_epoch_13_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_epoch_13_pipeline_en.md new file mode 100644 index 00000000000000..bf45742975b123 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_epoch_13_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_13_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_13_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_13_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_13_pipeline_en_5.5.0_3.0_1726747764936.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_13_pipeline_en_5.5.0_3.0_1726747764936.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_13_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_13_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_13_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.2 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_13 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_epoch_61_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_epoch_61_pipeline_en.md new file mode 100644 index 00000000000000..dfa9abe67f34f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_epoch_61_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_61_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_61_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_61_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_61_pipeline_en_5.5.0_3.0_1726778606464.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_61_pipeline_en_5.5.0_3.0_1726778606464.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_61_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_61_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_61_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_61 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_lower_fabric_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_lower_fabric_en.md new file mode 100644 index 00000000000000..c4fa38d8887bbe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_lower_fabric_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_lower_fabric RoBertaForSequenceClassification from Cournane +author: John Snow Labs +name: roberta_base_finetuned_lower_fabric +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_lower_fabric` is a English model originally trained by Cournane. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_lower_fabric_en_5.5.0_3.0_1726732857587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_lower_fabric_en_5.5.0_3.0_1726732857587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_lower_fabric","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_lower_fabric", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_lower_fabric| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|427.2 MB| + +## References + +https://huggingface.co/Cournane/roberta-base-finetuned-Lower_fabric \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_mp_unannotated_half_frozen_v1_rile_v1_frozen_8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_mp_unannotated_half_frozen_v1_rile_v1_frozen_8_pipeline_en.md new file mode 100644 index 00000000000000..2d2bc81c66ffc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_mp_unannotated_half_frozen_v1_rile_v1_frozen_8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_mp_unannotated_half_frozen_v1_rile_v1_frozen_8_pipeline pipeline RoBertaForSequenceClassification from kghanlon +author: John Snow Labs +name: roberta_base_finetuned_mp_unannotated_half_frozen_v1_rile_v1_frozen_8_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_mp_unannotated_half_frozen_v1_rile_v1_frozen_8_pipeline` is a English model originally trained by kghanlon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_mp_unannotated_half_frozen_v1_rile_v1_frozen_8_pipeline_en_5.5.0_3.0_1726726213948.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_mp_unannotated_half_frozen_v1_rile_v1_frozen_8_pipeline_en_5.5.0_3.0_1726726213948.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_mp_unannotated_half_frozen_v1_rile_v1_frozen_8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_mp_unannotated_half_frozen_v1_rile_v1_frozen_8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_mp_unannotated_half_frozen_v1_rile_v1_frozen_8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|358.0 MB| + +## References + +https://huggingface.co/kghanlon/roberta-base-finetuned-MP-unannotated-half-frozen-v1-RILE-v1_frozen_8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_ner_minhminh09_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_ner_minhminh09_pipeline_en.md new file mode 100644 index 00000000000000..2321ec8b44c64e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_ner_minhminh09_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_ner_minhminh09_pipeline pipeline RoBertaForTokenClassification from MinhMinh09 +author: John Snow Labs +name: roberta_base_finetuned_ner_minhminh09_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_ner_minhminh09_pipeline` is a English model originally trained by MinhMinh09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_ner_minhminh09_pipeline_en_5.5.0_3.0_1726729963263.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_ner_minhminh09_pipeline_en_5.5.0_3.0_1726729963263.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_ner_minhminh09_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_ner_minhminh09_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_ner_minhminh09_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|419.0 MB| + +## References + +https://huggingface.co/MinhMinh09/roberta-base-finetuned-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_ring_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_ring_en.md new file mode 100644 index 00000000000000..e24883e9d7c964 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_ring_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_ring RoBertaForSequenceClassification from Cournane +author: John Snow Labs +name: roberta_base_finetuned_ring +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_ring` is a English model originally trained by Cournane. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_ring_en_5.5.0_3.0_1726750737389.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_ring_en_5.5.0_3.0_1726750737389.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_ring","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_ring", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_ring| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|427.2 MB| + +## References + +https://huggingface.co/Cournane/roberta-base-finetuned-Ring \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_wallisian_whisper_1ep_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_wallisian_whisper_1ep_en.md new file mode 100644 index 00000000000000..bbb7d8d1cce1cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_wallisian_whisper_1ep_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_whisper_1ep RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_whisper_1ep +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_whisper_1ep` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_whisper_1ep_en_5.5.0_3.0_1726747692580.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_whisper_1ep_en_5.5.0_3.0_1726747692580.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_wallisian_whisper_1ep","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_wallisian_whisper_1ep","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_whisper_1ep| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.2 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-whisper-1ep \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_wallisian_whisper_1ep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_wallisian_whisper_1ep_pipeline_en.md new file mode 100644 index 00000000000000..e7635f24b53af7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_wallisian_whisper_1ep_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_whisper_1ep_pipeline pipeline RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_whisper_1ep_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_whisper_1ep_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_whisper_1ep_pipeline_en_5.5.0_3.0_1726747716215.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_whisper_1ep_pipeline_en_5.5.0_3.0_1726747716215.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_wallisian_whisper_1ep_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_wallisian_whisper_1ep_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_whisper_1ep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.3 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-whisper-1ep + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_genia_ner_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_genia_ner_en.md new file mode 100644 index 00000000000000..1eccd9862b71c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_genia_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_genia_ner RoBertaForTokenClassification from CheccoCando +author: John Snow Labs +name: roberta_base_genia_ner +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_genia_ner` is a English model originally trained by CheccoCando. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_genia_ner_en_5.5.0_3.0_1726745762571.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_genia_ner_en_5.5.0_3.0_1726745762571.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_genia_ner","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_genia_ner", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
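
To see which tag was assigned to each token, the aligned `token` and `ner` arrays can be zipped together. This is a sketch building on `pipelineDF` from the example above:

```python
from pyspark.sql import functions as F

# "token.result" and "ner.result" are position-aligned: one predicted tag per token.
pipelineDF.select(
    F.arrays_zip("token.result", "ner.result").alias("token_tags")
).show(truncate=False)
```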
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_genia_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|436.0 MB| + +## References + +https://huggingface.co/CheccoCando/roberta-base_GENIA_NER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_genia_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_genia_ner_pipeline_en.md new file mode 100644 index 00000000000000..231fdb3cbc1c0e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_genia_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_genia_ner_pipeline pipeline RoBertaForTokenClassification from CheccoCando +author: John Snow Labs +name: roberta_base_genia_ner_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_genia_ner_pipeline` is a English model originally trained by CheccoCando. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_genia_ner_pipeline_en_5.5.0_3.0_1726745789247.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_genia_ner_pipeline_en_5.5.0_3.0_1726745789247.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_genia_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_genia_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_genia_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|436.0 MB| + +## References + +https://huggingface.co/CheccoCando/roberta-base_GENIA_NER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_hoax_classifier_fulltext_1h2r_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_hoax_classifier_fulltext_1h2r_en.md new file mode 100644 index 00000000000000..6e58370cf1afa4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_hoax_classifier_fulltext_1h2r_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_hoax_classifier_fulltext_1h2r RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_base_hoax_classifier_fulltext_1h2r +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_hoax_classifier_fulltext_1h2r` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_hoax_classifier_fulltext_1h2r_en_5.5.0_3.0_1726750303667.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_hoax_classifier_fulltext_1h2r_en_5.5.0_3.0_1726750303667.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_hoax_classifier_fulltext_1h2r","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_hoax_classifier_fulltext_1h2r", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
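+The classifier emits one prediction per input document in the `class` column. A hedged sketch of reading the predicted label back out (column names follow the snippet above):
+
+```python
+# Predicted class label for each input text
+pipelineDF.select("text", "class.result").show(truncate = False)
+```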
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_hoax_classifier_fulltext_1h2r| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|456.1 MB| + +## References + +https://huggingface.co/research-dump/roberta-base_hoax_classifier_fulltext_1h2r \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_last_9_chars_acl2023_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_last_9_chars_acl2023_en.md new file mode 100644 index 00000000000000..1da3949366ba45 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_last_9_chars_acl2023_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_last_9_chars_acl2023 RoBertaEmbeddings from hitachi-nlp +author: John Snow Labs +name: roberta_base_last_9_chars_acl2023 +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_last_9_chars_acl2023` is a English model originally trained by hitachi-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_last_9_chars_acl2023_en_5.5.0_3.0_1726747683843.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_last_9_chars_acl2023_en_5.5.0_3.0_1726747683843.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_last_9_chars_acl2023","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_last_9_chars_acl2023","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
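+The `embeddings` column holds one dense vector per token. As a rough sketch (assuming the `pipelineDF` from the snippet above), the vectors can be unpacked into one row per token:
+
+```python
+from pyspark.sql.functions import explode
+
+# Each element of embeddings.embeddings is the RoBERTa vector for one token
+pipelineDF.select(explode("embeddings.embeddings").alias("token_embedding")).show(truncate = False)
+```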
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_last_9_chars_acl2023| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.3 MB| + +## References + +https://huggingface.co/hitachi-nlp/roberta-base_last-9-chars_acl2023 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_last_9_chars_acl2023_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_last_9_chars_acl2023_pipeline_en.md new file mode 100644 index 00000000000000..bf70ccc4f5b630 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_last_9_chars_acl2023_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_last_9_chars_acl2023_pipeline pipeline RoBertaEmbeddings from hitachi-nlp +author: John Snow Labs +name: roberta_base_last_9_chars_acl2023_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_last_9_chars_acl2023_pipeline` is a English model originally trained by hitachi-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_last_9_chars_acl2023_pipeline_en_5.5.0_3.0_1726747706664.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_last_9_chars_acl2023_pipeline_en_5.5.0_3.0_1726747706664.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_last_9_chars_acl2023_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_last_9_chars_acl2023_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_last_9_chars_acl2023_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.3 MB| + +## References + +https://huggingface.co/hitachi-nlp/roberta-base_last-9-chars_acl2023 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_ner_akramhec_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_ner_akramhec_pipeline_en.md new file mode 100644 index 00000000000000..203c782f33cae5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_ner_akramhec_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_ner_akramhec_pipeline pipeline RoBertaForTokenClassification from AkramHec +author: John Snow Labs +name: roberta_base_ner_akramhec_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_ner_akramhec_pipeline` is a English model originally trained by AkramHec. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ner_akramhec_pipeline_en_5.5.0_3.0_1726730094657.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ner_akramhec_pipeline_en_5.5.0_3.0_1726730094657.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_ner_akramhec_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_ner_akramhec_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ner_akramhec_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|431.4 MB| + +## References + +https://huggingface.co/AkramHec/roberta-base-NER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_ontonotes_pitangent_ds_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_ontonotes_pitangent_ds_en.md new file mode 100644 index 00000000000000..7d18fa92b08154 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_ontonotes_pitangent_ds_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_ontonotes_pitangent_ds RoBertaForTokenClassification from pitangent-ds +author: John Snow Labs +name: roberta_base_ontonotes_pitangent_ds +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_ontonotes_pitangent_ds` is a English model originally trained by pitangent-ds. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ontonotes_pitangent_ds_en_5.5.0_3.0_1726729650008.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ontonotes_pitangent_ds_en_5.5.0_3.0_1726729650008.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_ontonotes_pitangent_ds","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_ontonotes_pitangent_ds", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ontonotes_pitangent_ds| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|449.6 MB| + +## References + +https://huggingface.co/pitangent-ds/roberta-base-ontonotes \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_ontonotes_pitangent_ds_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_ontonotes_pitangent_ds_pipeline_en.md new file mode 100644 index 00000000000000..80d1e3f360d278 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_ontonotes_pitangent_ds_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_ontonotes_pitangent_ds_pipeline pipeline RoBertaForTokenClassification from pitangent-ds +author: John Snow Labs +name: roberta_base_ontonotes_pitangent_ds_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_ontonotes_pitangent_ds_pipeline` is a English model originally trained by pitangent-ds. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ontonotes_pitangent_ds_pipeline_en_5.5.0_3.0_1726729677840.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ontonotes_pitangent_ds_pipeline_en_5.5.0_3.0_1726729677840.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_ontonotes_pitangent_ds_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_ontonotes_pitangent_ds_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ontonotes_pitangent_ds_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|449.7 MB| + +## References + +https://huggingface.co/pitangent-ds/roberta-base-ontonotes + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_swedish_cased_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_swedish_cased_en.md new file mode 100644 index 00000000000000..2c43e864746f95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_swedish_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_swedish_cased RoBertaEmbeddings from KBLab +author: John Snow Labs +name: roberta_base_swedish_cased +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_swedish_cased` is a English model originally trained by KBLab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_swedish_cased_en_5.5.0_3.0_1726778014919.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_swedish_cased_en_5.5.0_3.0_1726778014919.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_swedish_cased","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_swedish_cased","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_swedish_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|470.2 MB| + +## References + +https://huggingface.co/KBLab/roberta-base-swedish-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_swedish_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_swedish_cased_pipeline_en.md new file mode 100644 index 00000000000000..3b0250d9841f9a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_swedish_cased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_swedish_cased_pipeline pipeline RoBertaEmbeddings from KBLab +author: John Snow Labs +name: roberta_base_swedish_cased_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_swedish_cased_pipeline` is a English model originally trained by KBLab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_swedish_cased_pipeline_en_5.5.0_3.0_1726778041795.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_swedish_cased_pipeline_en_5.5.0_3.0_1726778041795.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_swedish_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_swedish_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_swedish_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|470.2 MB| + +## References + +https://huggingface.co/KBLab/roberta-base-swedish-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_xshubhamx_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_xshubhamx_en.md new file mode 100644 index 00000000000000..ea8b300ac4b1d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_xshubhamx_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_xshubhamx RoBertaForSequenceClassification from xshubhamx +author: John Snow Labs +name: roberta_base_xshubhamx +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_xshubhamx` is a English model originally trained by xshubhamx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_xshubhamx_en_5.5.0_3.0_1726725747560.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_xshubhamx_en_5.5.0_3.0_1726725747560.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_xshubhamx","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_xshubhamx", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_xshubhamx| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|442.4 MB| + +## References + +https://huggingface.co/xshubhamx/roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_xshubhamx_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_xshubhamx_pipeline_en.md new file mode 100644 index 00000000000000..7298291b3bd299 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_xshubhamx_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_xshubhamx_pipeline pipeline RoBertaForSequenceClassification from xshubhamx +author: John Snow Labs +name: roberta_base_xshubhamx_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_xshubhamx_pipeline` is a English model originally trained by xshubhamx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_xshubhamx_pipeline_en_5.5.0_3.0_1726725779379.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_xshubhamx_pipeline_en_5.5.0_3.0_1726725779379.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_xshubhamx_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_xshubhamx_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_xshubhamx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|442.5 MB| + +## References + +https://huggingface.co/xshubhamx/roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_classifier_autonlp_fake_covid_news_36769078_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_classifier_autonlp_fake_covid_news_36769078_pipeline_en.md new file mode 100644 index 00000000000000..1cc85836d7b8e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_classifier_autonlp_fake_covid_news_36769078_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_classifier_autonlp_fake_covid_news_36769078_pipeline pipeline RoBertaForSequenceClassification from Qinghui +author: John Snow Labs +name: roberta_classifier_autonlp_fake_covid_news_36769078_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_classifier_autonlp_fake_covid_news_36769078_pipeline` is a English model originally trained by Qinghui. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_classifier_autonlp_fake_covid_news_36769078_pipeline_en_5.5.0_3.0_1726780097198.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_classifier_autonlp_fake_covid_news_36769078_pipeline_en_5.5.0_3.0_1726780097198.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_classifier_autonlp_fake_covid_news_36769078_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_classifier_autonlp_fake_covid_news_36769078_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_classifier_autonlp_fake_covid_news_36769078_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Qinghui/autonlp-fake-covid-news-36769078 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_clinical_wl_spanish_ner_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_clinical_wl_spanish_ner_en.md new file mode 100644 index 00000000000000..5686620f139ff0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_clinical_wl_spanish_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_clinical_wl_spanish_ner RoBertaForTokenClassification from manucos +author: John Snow Labs +name: roberta_clinical_wl_spanish_ner +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_clinical_wl_spanish_ner` is a English model originally trained by manucos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_clinical_wl_spanish_ner_en_5.5.0_3.0_1726729290278.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_clinical_wl_spanish_ner_en_5.5.0_3.0_1726729290278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_clinical_wl_spanish_ner","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_clinical_wl_spanish_ner", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_clinical_wl_spanish_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|469.7 MB| + +## References + +https://huggingface.co/manucos/roberta-clinical-wl-es-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_clinical_wl_spanish_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_clinical_wl_spanish_ner_pipeline_en.md new file mode 100644 index 00000000000000..a44e92a566e7c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_clinical_wl_spanish_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_clinical_wl_spanish_ner_pipeline pipeline RoBertaForTokenClassification from manucos +author: John Snow Labs +name: roberta_clinical_wl_spanish_ner_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_clinical_wl_spanish_ner_pipeline` is a English model originally trained by manucos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_clinical_wl_spanish_ner_pipeline_en_5.5.0_3.0_1726729313429.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_clinical_wl_spanish_ner_pipeline_en_5.5.0_3.0_1726729313429.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_clinical_wl_spanish_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_clinical_wl_spanish_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_clinical_wl_spanish_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|469.8 MB| + +## References + +https://huggingface.co/manucos/roberta-clinical-wl-es-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_combined_generated_epoch_6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_combined_generated_epoch_6_pipeline_en.md new file mode 100644 index 00000000000000..b7f6b693b58809 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_combined_generated_epoch_6_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_combined_generated_epoch_6_pipeline pipeline RoBertaForTokenClassification from ICT2214Team7 +author: John Snow Labs +name: roberta_combined_generated_epoch_6_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_combined_generated_epoch_6_pipeline` is a English model originally trained by ICT2214Team7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_combined_generated_epoch_6_pipeline_en_5.5.0_3.0_1726731061230.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_combined_generated_epoch_6_pipeline_en_5.5.0_3.0_1726731061230.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_combined_generated_epoch_6_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_combined_generated_epoch_6_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_combined_generated_epoch_6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/ICT2214Team7/RoBERTa_Combined_Generated_epoch_6 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_combined_generated_v1_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_combined_generated_v1_1_pipeline_en.md new file mode 100644 index 00000000000000..3a5b855c3564a1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_combined_generated_v1_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_combined_generated_v1_1_pipeline pipeline RoBertaForTokenClassification from ICT2214Team7 +author: John Snow Labs +name: roberta_combined_generated_v1_1_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_combined_generated_v1_1_pipeline` is a English model originally trained by ICT2214Team7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_combined_generated_v1_1_pipeline_en_5.5.0_3.0_1726722944610.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_combined_generated_v1_1_pipeline_en_5.5.0_3.0_1726722944610.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_combined_generated_v1_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_combined_generated_v1_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_combined_generated_v1_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/ICT2214Team7/RoBERTa_Combined_Generated_v1.1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_covid_sentimental_analysis_classifier_1_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_covid_sentimental_analysis_classifier_1_en.md new file mode 100644 index 00000000000000..1dd0ff4e1904d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_covid_sentimental_analysis_classifier_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_covid_sentimental_analysis_classifier_1 RoBertaForSequenceClassification from gyesibiney +author: John Snow Labs +name: roberta_covid_sentimental_analysis_classifier_1 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_covid_sentimental_analysis_classifier_1` is a English model originally trained by gyesibiney. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_covid_sentimental_analysis_classifier_1_en_5.5.0_3.0_1726750299660.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_covid_sentimental_analysis_classifier_1_en_5.5.0_3.0_1726750299660.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_covid_sentimental_analysis_classifier_1","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_covid_sentimental_analysis_classifier_1", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_covid_sentimental_analysis_classifier_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/gyesibiney/roberta-covid-sentimental-analysis-classifier-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_large_bne_ctebmsp_es.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_bne_ctebmsp_es.md new file mode 100644 index 00000000000000..cd9b735b7790ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_bne_ctebmsp_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish roberta_large_bne_ctebmsp RoBertaForTokenClassification from IIC +author: John Snow Labs +name: roberta_large_bne_ctebmsp +date: 2024-09-19 +tags: [es, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_bne_ctebmsp` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_bne_ctebmsp_es_5.5.0_3.0_1726729659749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_bne_ctebmsp_es_5.5.0_3.0_1726729659749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_bne_ctebmsp","es") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_bne_ctebmsp", "es")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
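+Note that this model card lists `es` as the language, so the English demo sentence above is only a placeholder; a variant with a Spanish input (the sentence and any extracted entities are purely illustrative) might look like:
+
+```python
+data = spark.createDataFrame([["El paciente recibió 500 mg de paracetamol cada 8 horas."]]).toDF("text")
+pipelineDF = pipeline.fit(data).transform(data)
+pipelineDF.select("token.result", "ner.result").show(truncate = False)
+```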
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_bne_ctebmsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|1.3 GB| + +## References + +https://huggingface.co/IIC/roberta-large-bne-ctebmsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_large_earnings21_non_normalized_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_earnings21_non_normalized_pipeline_en.md new file mode 100644 index 00000000000000..75b925fcd5235c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_earnings21_non_normalized_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_earnings21_non_normalized_pipeline pipeline RoBertaForTokenClassification from anonymoussubmissions +author: John Snow Labs +name: roberta_large_earnings21_non_normalized_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_earnings21_non_normalized_pipeline` is a English model originally trained by anonymoussubmissions. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_earnings21_non_normalized_pipeline_en_5.5.0_3.0_1726745541047.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_earnings21_non_normalized_pipeline_en_5.5.0_3.0_1726745541047.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_earnings21_non_normalized_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_earnings21_non_normalized_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_earnings21_non_normalized_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/anonymoussubmissions/roberta-large-earnings21-non-normalized + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_large_full_finetuned_ner_single_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_full_finetuned_ner_single_pipeline_en.md new file mode 100644 index 00000000000000..4d1f49311062ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_full_finetuned_ner_single_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_full_finetuned_ner_single_pipeline pipeline RoBertaForTokenClassification from DDDacc +author: John Snow Labs +name: roberta_large_full_finetuned_ner_single_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_full_finetuned_ner_single_pipeline` is a English model originally trained by DDDacc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_full_finetuned_ner_single_pipeline_en_5.5.0_3.0_1726745765065.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_full_finetuned_ner_single_pipeline_en_5.5.0_3.0_1726745765065.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_full_finetuned_ner_single_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_full_finetuned_ner_single_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_full_finetuned_ner_single_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/DDDacc/RoBERTa-Large-full-finetuned-ner-single + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_large_iterater_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_iterater_classifier_en.md new file mode 100644 index 00000000000000..deaaa4ba9dd52a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_iterater_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_iterater_classifier RoBertaForSequenceClassification from owanr +author: John Snow Labs +name: roberta_large_iterater_classifier +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_iterater_classifier` is a English model originally trained by owanr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_iterater_classifier_en_5.5.0_3.0_1726751278729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_iterater_classifier_en_5.5.0_3.0_1726751278729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_iterater_classifier","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_iterater_classifier", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_iterater_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|845.1 MB| + +## References + +https://huggingface.co/owanr/roberta_large_iterater_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_large_merged_subtaskb_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_merged_subtaskb_en.md new file mode 100644 index 00000000000000..49cf3ac5f268c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_merged_subtaskb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_merged_subtaskb RoBertaForSequenceClassification from Sansh2003 +author: John Snow Labs +name: roberta_large_merged_subtaskb +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_merged_subtaskb` is a English model originally trained by Sansh2003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_merged_subtaskb_en_5.5.0_3.0_1726726488149.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_merged_subtaskb_en_5.5.0_3.0_1726726488149.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_merged_subtaskb","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_merged_subtaskb", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
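
Besides the label itself, sequence-classification annotations usually carry per-label scores in their metadata. A follow-on sketch, continuing from the Python example above; treat the exact metadata keys as an assumption and inspect them for your model:

```python
from pyspark.sql import functions as F

# "metadata" is a map of label -> score attached to each classifier annotation.
pipelineDF.select(F.explode(F.col("class.metadata")).alias("label_scores")) \
    .show(truncate=False)
```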
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_merged_subtaskb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Sansh2003/roberta-large-merged-subtaskB \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_large_merged_subtaskb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_merged_subtaskb_pipeline_en.md new file mode 100644 index 00000000000000..d875220bb56513 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_merged_subtaskb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_merged_subtaskb_pipeline pipeline RoBertaForSequenceClassification from Sansh2003 +author: John Snow Labs +name: roberta_large_merged_subtaskb_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_merged_subtaskb_pipeline` is a English model originally trained by Sansh2003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_merged_subtaskb_pipeline_en_5.5.0_3.0_1726726549749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_merged_subtaskb_pipeline_en_5.5.0_3.0_1726726549749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_large_merged_subtaskb_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_merged_subtaskb_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
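
To see which annotators the downloaded pipeline actually contains, and therefore which output columns it adds, the underlying `PipelineModel` can be inspected. A minimal sketch; it assumes recent Spark NLP releases expose the loaded model as `pipeline.model`, which is worth verifying for your version:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_large_merged_subtaskb_pipeline", lang = "en")

# Print the stage class names, e.g. DocumentAssembler, TokenizerModel, ...
print([type(stage).__name__ for stage in pipeline.model.stages])
```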
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_merged_subtaskb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Sansh2003/roberta-large-merged-subtaskB + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_pipeline_en.md new file mode 100644 index 00000000000000..8ee3b7cbc016ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_pipeline pipeline RoBertaForTokenClassification from maple +author: John Snow Labs +name: roberta_large_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_pipeline` is a English model originally trained by maple. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_pipeline_en_5.5.0_3.0_1726730913039.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_pipeline_en_5.5.0_3.0_1726730913039.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_large_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
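
The output column names depend on how the pipeline's stages were configured, so it is safest to inspect the transformed DataFrame before selecting from it. A follow-on sketch, continuing from the Python example above; the `ner` column name is an assumption to verify against `printSchema()`:

```python
# Lists the annotation columns the pipeline appended to df.
annotations.printSchema()

# Assuming the token classifier stage writes its tags to a column named "ner":
annotations.selectExpr("explode(ner.result) as tag").show(truncate=False)
```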
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/maple/roberta-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_large_ppt_occitan_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_ppt_occitan_en.md new file mode 100644 index 00000000000000..2a2ec984678f2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_ppt_occitan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_ppt_occitan RoBertaEmbeddings from mehrshadk +author: John Snow Labs +name: roberta_large_ppt_occitan +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_ppt_occitan` is a English model originally trained by mehrshadk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_ppt_occitan_en_5.5.0_3.0_1726749573508.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_ppt_occitan_en_5.5.0_3.0_1726749573508.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("roberta_large_ppt_occitan","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("roberta_large_ppt_occitan","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
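
Downstream Spark ML stages usually expect plain vectors rather than Spark NLP annotation structs. A follow-on sketch, continuing from the Python example above, that converts the `embeddings` column with `EmbeddingsFinisher`:

```python
from sparknlp.base import EmbeddingsFinisher

finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

# One dense vector per token, ready for Spark ML estimators.
finisher.transform(pipelineDF) \
    .selectExpr("explode(finished_embeddings) as token_vector") \
    .show(truncate=True)
```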
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_ppt_occitan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/mehrshadk/roberta_Large_ppt_OC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_large_ppt_occitan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_ppt_occitan_pipeline_en.md new file mode 100644 index 00000000000000..9be4d9d6dbc308 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_ppt_occitan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_ppt_occitan_pipeline pipeline RoBertaEmbeddings from mehrshadk +author: John Snow Labs +name: roberta_large_ppt_occitan_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_ppt_occitan_pipeline` is a English model originally trained by mehrshadk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_ppt_occitan_pipeline_en_5.5.0_3.0_1726749638382.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_ppt_occitan_pipeline_en_5.5.0_3.0_1726749638382.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_large_ppt_occitan_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_ppt_occitan_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_ppt_occitan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/mehrshadk/roberta_Large_ppt_OC + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_large_sentiment_sst5_mapped_grouped_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_sentiment_sst5_mapped_grouped_0_pipeline_en.md new file mode 100644 index 00000000000000..4915ff473ba31f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_sentiment_sst5_mapped_grouped_0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_sentiment_sst5_mapped_grouped_0_pipeline pipeline RoBertaForSequenceClassification from kohankhaki +author: John Snow Labs +name: roberta_large_sentiment_sst5_mapped_grouped_0_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_sentiment_sst5_mapped_grouped_0_pipeline` is a English model originally trained by kohankhaki. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_sentiment_sst5_mapped_grouped_0_pipeline_en_5.5.0_3.0_1726733372453.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_sentiment_sst5_mapped_grouped_0_pipeline_en_5.5.0_3.0_1726733372453.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_large_sentiment_sst5_mapped_grouped_0_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_sentiment_sst5_mapped_grouped_0_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
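
When the full annotation objects are needed rather than just the final result, `fullAnnotate` keeps the metadata (such as class scores) attached. A minimal sketch assuming the pipeline downloads successfully; the example text is illustrative only:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_large_sentiment_sst5_mapped_grouped_0_pipeline", lang = "en")

# fullAnnotate returns one entry per input string, with complete Annotation objects.
result = pipeline.fullAnnotate("I really enjoyed this movie.")
print(result[0])
```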
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_sentiment_sst5_mapped_grouped_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/kohankhaki/roberta-large-sentiment-sst5-mapped-grouped-0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_persian_farsi_zwnj_base_v1_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_persian_farsi_zwnj_base_v1_en.md new file mode 100644 index 00000000000000..b4794d79636165 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_persian_farsi_zwnj_base_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_persian_farsi_zwnj_base_v1 RoBertaForSequenceClassification from soltaniali +author: John Snow Labs +name: roberta_persian_farsi_zwnj_base_v1 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_persian_farsi_zwnj_base_v1` is a English model originally trained by soltaniali. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_persian_farsi_zwnj_base_v1_en_5.5.0_3.0_1726733346800.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_persian_farsi_zwnj_base_v1_en_5.5.0_3.0_1726733346800.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_persian_farsi_zwnj_base_v1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_persian_farsi_zwnj_base_v1", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_persian_farsi_zwnj_base_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|444.3 MB| + +## References + +https://huggingface.co/soltaniali/roberta-fa-zwnj-base_v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_persian_farsi_zwnj_base_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_persian_farsi_zwnj_base_v1_pipeline_en.md new file mode 100644 index 00000000000000..90a21af755309e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_persian_farsi_zwnj_base_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_persian_farsi_zwnj_base_v1_pipeline pipeline RoBertaForSequenceClassification from soltaniali +author: John Snow Labs +name: roberta_persian_farsi_zwnj_base_v1_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_persian_farsi_zwnj_base_v1_pipeline` is a English model originally trained by soltaniali. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_persian_farsi_zwnj_base_v1_pipeline_en_5.5.0_3.0_1726733368861.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_persian_farsi_zwnj_base_v1_pipeline_en_5.5.0_3.0_1726733368861.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_persian_farsi_zwnj_base_v1_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_persian_farsi_zwnj_base_v1_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_persian_farsi_zwnj_base_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|444.3 MB| + +## References + +https://huggingface.co/soltaniali/roberta-fa-zwnj-base_v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_poetry_sadness_crpo_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_poetry_sadness_crpo_en.md new file mode 100644 index 00000000000000..3fed12a2ca7726 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_poetry_sadness_crpo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_poetry_sadness_crpo RoBertaEmbeddings from andreipb +author: John Snow Labs +name: roberta_poetry_sadness_crpo +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_poetry_sadness_crpo` is a English model originally trained by andreipb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_poetry_sadness_crpo_en_5.5.0_3.0_1726778319056.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_poetry_sadness_crpo_en_5.5.0_3.0_1726778319056.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("roberta_poetry_sadness_crpo","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("roberta_poetry_sadness_crpo","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
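
Token embeddings can be pooled into a single vector per document by appending a `SentenceEmbeddings` stage. A follow-on sketch, continuing from the Python example above (reusing `documentAssembler`, `tokenizer`, `embeddings` and `data`), with average pooling as an illustrative choice:

```python
from sparknlp.annotator import SentenceEmbeddings
from pyspark.ml import Pipeline

sentenceEmbeddings = SentenceEmbeddings() \
    .setInputCols(["document", "embeddings"]) \
    .setOutputCol("sentence_embeddings") \
    .setPoolingStrategy("AVERAGE")

# Re-run the pipeline with the extra pooling stage and show one vector per document.
pooled = Pipeline().setStages([documentAssembler, tokenizer, embeddings, sentenceEmbeddings]) \
    .fit(data).transform(data)
pooled.selectExpr("explode(sentence_embeddings.embeddings) as doc_vector").show(truncate=True)
```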
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_poetry_sadness_crpo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/andreipb/roberta-poetry-sadness-crpo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_poetry_sadness_crpo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_poetry_sadness_crpo_pipeline_en.md new file mode 100644 index 00000000000000..4801fdee514959 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_poetry_sadness_crpo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_poetry_sadness_crpo_pipeline pipeline RoBertaEmbeddings from andreipb +author: John Snow Labs +name: roberta_poetry_sadness_crpo_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_poetry_sadness_crpo_pipeline` is a English model originally trained by andreipb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_poetry_sadness_crpo_pipeline_en_5.5.0_3.0_1726778341641.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_poetry_sadness_crpo_pipeline_en_5.5.0_3.0_1726778341641.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_poetry_sadness_crpo_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_poetry_sadness_crpo_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_poetry_sadness_crpo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/andreipb/roberta-poetry-sadness-crpo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_sayula_popoluca_tagging_hosnahoseini_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_sayula_popoluca_tagging_hosnahoseini_pipeline_en.md new file mode 100644 index 00000000000000..06878647989adf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_sayula_popoluca_tagging_hosnahoseini_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_sayula_popoluca_tagging_hosnahoseini_pipeline pipeline RoBertaForTokenClassification from hosnahoseini +author: John Snow Labs +name: roberta_sayula_popoluca_tagging_hosnahoseini_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_sayula_popoluca_tagging_hosnahoseini_pipeline` is a English model originally trained by hosnahoseini. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_sayula_popoluca_tagging_hosnahoseini_pipeline_en_5.5.0_3.0_1726729485277.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_sayula_popoluca_tagging_hosnahoseini_pipeline_en_5.5.0_3.0_1726729485277.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_sayula_popoluca_tagging_hosnahoseini_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_sayula_popoluca_tagging_hosnahoseini_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_sayula_popoluca_tagging_hosnahoseini_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.5 MB| + +## References + +https://huggingface.co/hosnahoseini/roberta-pos-tagging + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_shopee_sentiment_gadgets_pipeline_tl.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_shopee_sentiment_gadgets_pipeline_tl.md new file mode 100644 index 00000000000000..ab869a61b9d294 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_shopee_sentiment_gadgets_pipeline_tl.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Tagalog roberta_shopee_sentiment_gadgets_pipeline pipeline RoBertaForSequenceClassification from magixxixx +author: John Snow Labs +name: roberta_shopee_sentiment_gadgets_pipeline +date: 2024-09-19 +tags: [tl, open_source, pipeline, onnx] +task: Text Classification +language: tl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_shopee_sentiment_gadgets_pipeline` is a Tagalog model originally trained by magixxixx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_shopee_sentiment_gadgets_pipeline_tl_5.5.0_3.0_1726779765567.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_shopee_sentiment_gadgets_pipeline_tl_5.5.0_3.0_1726779765567.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_shopee_sentiment_gadgets_pipeline", lang = "tl")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_shopee_sentiment_gadgets_pipeline", lang = "tl")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
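
For air-gapped clusters it can help to download the pipeline once and persist it, then reload it later without internet access. A minimal sketch; the save path is a placeholder, and exposing the underlying `PipelineModel` as `pipeline.model` is an assumption to verify for your Spark NLP version:

```python
from sparknlp.pretrained import PretrainedPipeline
from pyspark.ml import PipelineModel

pipeline = PretrainedPipeline("roberta_shopee_sentiment_gadgets_pipeline", lang = "tl")

# Persist the fitted PipelineModel to a path of your choice (placeholder below).
pipeline.model.write().overwrite().save("/tmp/roberta_shopee_sentiment_gadgets_pipeline")

# Later, reload it without re-downloading.
reloaded = PipelineModel.load("/tmp/roberta_shopee_sentiment_gadgets_pipeline")
```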
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_shopee_sentiment_gadgets_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tl| +|Size:|409.4 MB| + +## References + +https://huggingface.co/magixxixx/roberta-shopee-sentiment-gadgets + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_shopee_sentiment_gadgets_tl.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_shopee_sentiment_gadgets_tl.md new file mode 100644 index 00000000000000..8ba05eaf36f0ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_shopee_sentiment_gadgets_tl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Tagalog roberta_shopee_sentiment_gadgets RoBertaForSequenceClassification from magixxixx +author: John Snow Labs +name: roberta_shopee_sentiment_gadgets +date: 2024-09-19 +tags: [tl, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: tl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_shopee_sentiment_gadgets` is a Tagalog model originally trained by magixxixx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_shopee_sentiment_gadgets_tl_5.5.0_3.0_1726779745735.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_shopee_sentiment_gadgets_tl_5.5.0_3.0_1726779745735.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_shopee_sentiment_gadgets","tl") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_shopee_sentiment_gadgets", "tl")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_shopee_sentiment_gadgets| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|tl| +|Size:|409.3 MB| + +## References + +https://huggingface.co/magixxixx/roberta-shopee-sentiment-gadgets \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_untrained_1eps_seed995_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_untrained_1eps_seed995_en.md new file mode 100644 index 00000000000000..81ca01fb01b08d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_untrained_1eps_seed995_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_untrained_1eps_seed995 RoBertaForSequenceClassification from custeau +author: John Snow Labs +name: roberta_untrained_1eps_seed995 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_untrained_1eps_seed995` is a English model originally trained by custeau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_untrained_1eps_seed995_en_5.5.0_3.0_1726779857199.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_untrained_1eps_seed995_en_5.5.0_3.0_1726779857199.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_untrained_1eps_seed995","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_untrained_1eps_seed995", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
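
The same fitted pipeline scores many texts at once, and Spark parallelises the batch across the cluster. A follow-on sketch, continuing from the Python example above (reusing `pipelineModel`), with illustrative input texts:

```python
# A small batch of documents; in practice this would be read from storage.
texts = spark.createDataFrame(
    [["I love spark-nlp"], ["This library scales nicely"], ["Not what I expected"]]
).toDF("text")

# Reuse the already-fitted pipelineModel on the whole batch.
scored = pipelineModel.transform(texts)
scored.selectExpr("text", "class.result as predicted_label").show(truncate=False)
```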
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_untrained_1eps_seed995| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|447.8 MB| + +## References + +https://huggingface.co/custeau/roberta_untrained_1eps_seed995 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_untrained_1eps_seed995_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_untrained_1eps_seed995_pipeline_en.md new file mode 100644 index 00000000000000..ebd26f7e34a78c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_untrained_1eps_seed995_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_untrained_1eps_seed995_pipeline pipeline RoBertaForSequenceClassification from custeau +author: John Snow Labs +name: roberta_untrained_1eps_seed995_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_untrained_1eps_seed995_pipeline` is a English model originally trained by custeau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_untrained_1eps_seed995_pipeline_en_5.5.0_3.0_1726779888079.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_untrained_1eps_seed995_pipeline_en_5.5.0_3.0_1726779888079.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_untrained_1eps_seed995_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_untrained_1eps_seed995_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_untrained_1eps_seed995_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|447.9 MB| + +## References + +https://huggingface.co/custeau/roberta_untrained_1eps_seed995 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_untrained_2eps_seed408_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_untrained_2eps_seed408_pipeline_en.md new file mode 100644 index 00000000000000..d8f896fa99a0c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_untrained_2eps_seed408_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_untrained_2eps_seed408_pipeline pipeline RoBertaForSequenceClassification from custeau +author: John Snow Labs +name: roberta_untrained_2eps_seed408_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_untrained_2eps_seed408_pipeline` is a English model originally trained by custeau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_untrained_2eps_seed408_pipeline_en_5.5.0_3.0_1726725802135.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_untrained_2eps_seed408_pipeline_en_5.5.0_3.0_1726725802135.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_untrained_2eps_seed408_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_untrained_2eps_seed408_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_untrained_2eps_seed408_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|447.9 MB| + +## References + +https://huggingface.co/custeau/roberta_untrained_2eps_seed408 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-robertabase_subjectivity_1_actual_en.md b/docs/_posts/ahmedlone127/2024-09-19-robertabase_subjectivity_1_actual_en.md new file mode 100644 index 00000000000000..e909c92bd39e5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-robertabase_subjectivity_1_actual_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robertabase_subjectivity_1_actual RoBertaForSequenceClassification from Muffins987 +author: John Snow Labs +name: robertabase_subjectivity_1_actual +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertabase_subjectivity_1_actual` is a English model originally trained by Muffins987. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertabase_subjectivity_1_actual_en_5.5.0_3.0_1726780212339.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertabase_subjectivity_1_actual_en_5.5.0_3.0_1726780212339.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("robertabase_subjectivity_1_actual","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("robertabase_subjectivity_1_actual", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertabase_subjectivity_1_actual| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|466.5 MB| + +## References + +https://huggingface.co/Muffins987/robertabase-subjectivity-1-actual \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-robertabase_subjectivity_1_actual_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-robertabase_subjectivity_1_actual_pipeline_en.md new file mode 100644 index 00000000000000..6fa27bb3cfbb6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-robertabase_subjectivity_1_actual_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robertabase_subjectivity_1_actual_pipeline pipeline RoBertaForSequenceClassification from Muffins987 +author: John Snow Labs +name: robertabase_subjectivity_1_actual_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertabase_subjectivity_1_actual_pipeline` is a English model originally trained by Muffins987. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertabase_subjectivity_1_actual_pipeline_en_5.5.0_3.0_1726780236224.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertabase_subjectivity_1_actual_pipeline_en_5.5.0_3.0_1726780236224.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("robertabase_subjectivity_1_actual_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("robertabase_subjectivity_1_actual_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertabase_subjectivity_1_actual_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.5 MB| + +## References + +https://huggingface.co/Muffins987/robertabase-subjectivity-1-actual + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-robertaiqbal_en.md b/docs/_posts/ahmedlone127/2024-09-19-robertaiqbal_en.md new file mode 100644 index 00000000000000..21895aa0622dd1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-robertaiqbal_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robertaiqbal RoBertaEmbeddings from cxfajar197 +author: John Snow Labs +name: robertaiqbal +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertaiqbal` is a English model originally trained by cxfajar197. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertaiqbal_en_5.5.0_3.0_1726747401065.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertaiqbal_en_5.5.0_3.0_1726747401065.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("robertaiqbal","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("robertaiqbal","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
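
To line each token up with its embedding vector, the two annotation arrays can be zipped together. A follow-on sketch, continuing from the Python example above and assuming Spark 2.4+ for `arrays_zip`:

```python
# Pairs token.result (the token strings) with embeddings.embeddings (the vectors).
pipelineDF.selectExpr(
    "explode(arrays_zip(token.result, embeddings.embeddings)) as token_embedding"
).show(truncate=True)
```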
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertaiqbal| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|471.0 MB| + +## References + +https://huggingface.co/cxfajar197/robertaiqbal \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-robertaiqbal_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-robertaiqbal_pipeline_en.md new file mode 100644 index 00000000000000..a7739d1debde2d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-robertaiqbal_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robertaiqbal_pipeline pipeline RoBertaEmbeddings from cxfajar197 +author: John Snow Labs +name: robertaiqbal_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertaiqbal_pipeline` is a English model originally trained by cxfajar197. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertaiqbal_pipeline_en_5.5.0_3.0_1726747424973.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertaiqbal_pipeline_en_5.5.0_3.0_1726747424973.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("robertaiqbal_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("robertaiqbal_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertaiqbal_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|471.0 MB| + +## References + +https://huggingface.co/cxfajar197/robertaiqbal + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-robertvar5pc_en.md b/docs/_posts/ahmedlone127/2024-09-19-robertvar5pc_en.md new file mode 100644 index 00000000000000..d9a02c3687a301 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-robertvar5pc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robertvar5pc RoBertaEmbeddings from gnathoi +author: John Snow Labs +name: robertvar5pc +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertvar5pc` is a English model originally trained by gnathoi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertvar5pc_en_5.5.0_3.0_1726778081730.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertvar5pc_en_5.5.0_3.0_1726778081730.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("robertvar5pc","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("robertvar5pc","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertvar5pc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|298.2 MB| + +## References + +https://huggingface.co/gnathoi/roBERTvar5pc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-robertvar5pc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-robertvar5pc_pipeline_en.md new file mode 100644 index 00000000000000..ef02ad9afb4c8d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-robertvar5pc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robertvar5pc_pipeline pipeline RoBertaEmbeddings from gnathoi +author: John Snow Labs +name: robertvar5pc_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertvar5pc_pipeline` is a English model originally trained by gnathoi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertvar5pc_pipeline_en_5.5.0_3.0_1726778171184.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertvar5pc_pipeline_en_5.5.0_3.0_1726778171184.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robertvar5pc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robertvar5pc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
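
For quick experiments on plain strings, `PretrainedPipeline` also exposes `annotate()`, which skips building a DataFrame. A minimal sketch under the same assumptions as above; the exact output keys depend on the stages included in the pipeline:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("robertvar5pc_pipeline", lang="en")

# Returns a dict keyed by the output columns of the included stages,
# e.g. document, token and embeddings for this pipeline
annotations = pipeline.annotate("I love spark-nlp")
print(list(annotations.keys()))
```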
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertvar5pc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|298.2 MB| + +## References + +https://huggingface.co/gnathoi/roBERTvar5pc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-robertvar_20pc_en.md b/docs/_posts/ahmedlone127/2024-09-19-robertvar_20pc_en.md new file mode 100644 index 00000000000000..bab05e99fc2df2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-robertvar_20pc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robertvar_20pc RoBertaEmbeddings from gnathoi +author: John Snow Labs +name: robertvar_20pc +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertvar_20pc` is a English model originally trained by gnathoi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertvar_20pc_en_5.5.0_3.0_1726778400156.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertvar_20pc_en_5.5.0_3.0_1726778400156.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("robertvar_20pc","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("robertvar_20pc","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
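
If the vectors are needed as Spark ML features rather than Spark NLP annotations, an `EmbeddingsFinisher` stage can be appended to the pipeline. A hedged sketch, reusing the stages defined in the Python example above:

```python
from sparknlp.base import EmbeddingsFinisher
from pyspark.ml import Pipeline

# Converts annotation structs into plain Spark ML vectors
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, finisher])
```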
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertvar_20pc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|298.2 MB| + +## References + +https://huggingface.co/gnathoi/RoBERTvar_20pc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-robertvar_20pc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-robertvar_20pc_pipeline_en.md new file mode 100644 index 00000000000000..bebe5daa7e6d32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-robertvar_20pc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robertvar_20pc_pipeline pipeline RoBertaEmbeddings from gnathoi +author: John Snow Labs +name: robertvar_20pc_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertvar_20pc_pipeline` is a English model originally trained by gnathoi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertvar_20pc_pipeline_en_5.5.0_3.0_1726778490544.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertvar_20pc_pipeline_en_5.5.0_3.0_1726778490544.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robertvar_20pc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robertvar_20pc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
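
The downloaded pipeline wraps an ordinary Spark ML `PipelineModel`, so its stages (listed under Included Models below) can be inspected or reused individually. A minimal sketch:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("robertvar_20pc_pipeline", lang="en")

# The underlying PipelineModel exposes the individual annotators
for stage in pipeline.model.stages:
    print(type(stage).__name__)
```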
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertvar_20pc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|298.2 MB| + +## References + +https://huggingface.co/gnathoi/RoBERTvar_20pc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-rotten_tomatoes_roberta_base_seed_2_en.md b/docs/_posts/ahmedlone127/2024-09-19-rotten_tomatoes_roberta_base_seed_2_en.md new file mode 100644 index 00000000000000..45766e096787d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-rotten_tomatoes_roberta_base_seed_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English rotten_tomatoes_roberta_base_seed_2 RoBertaForSequenceClassification from utahnlp +author: John Snow Labs +name: rotten_tomatoes_roberta_base_seed_2 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rotten_tomatoes_roberta_base_seed_2` is a English model originally trained by utahnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rotten_tomatoes_roberta_base_seed_2_en_5.5.0_3.0_1726750668565.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rotten_tomatoes_roberta_base_seed_2_en_5.5.0_3.0_1726750668565.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("rotten_tomatoes_roberta_base_seed_2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("rotten_tomatoes_roberta_base_seed_2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
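
The predicted label ends up in the `class` column as an annotation whose `result` field holds the label string. A minimal sketch for reading it back, assuming `pipelineDF` from the example above:

```python
# One annotation per input document; result holds the predicted label,
# while the annotation metadata holds the per-label scores
pipelineDF.select("text", "class.result").show(truncate=False)
```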
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rotten_tomatoes_roberta_base_seed_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|435.9 MB| + +## References + +https://huggingface.co/utahnlp/rotten_tomatoes_roberta-base_seed-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-rubioroberta_neg_en.md b/docs/_posts/ahmedlone127/2024-09-19-rubioroberta_neg_en.md new file mode 100644 index 00000000000000..1ab30ef0dad4f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-rubioroberta_neg_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English rubioroberta_neg RoBertaForTokenClassification from DimasikKurd +author: John Snow Labs +name: rubioroberta_neg +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubioroberta_neg` is a English model originally trained by DimasikKurd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubioroberta_neg_en_5.5.0_3.0_1726731289370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubioroberta_neg_en_5.5.0_3.0_1726731289370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("rubioroberta_neg","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("rubioroberta_neg", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
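
The raw `ner` output assigns one IOB tag per token. To group the tags into entity chunks, a `NerConverter` stage is usually appended after the classifier. A hedged sketch, reusing the stages defined in the Python example above:

```python
from sparknlp.annotator import NerConverter
from pyspark.ml import Pipeline

# Merges consecutive B-/I- tagged tokens into single entity chunks
converter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, converter])
```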
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubioroberta_neg| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/DimasikKurd/RuBioRoBERTa_neg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-ruperta_base_es.md b/docs/_posts/ahmedlone127/2024-09-19-ruperta_base_es.md new file mode 100644 index 00000000000000..a14a2a6ca1dc84 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-ruperta_base_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish ruperta_base RoBertaEmbeddings from mrm8488 +author: John Snow Labs +name: ruperta_base +date: 2024-09-19 +tags: [es, open_source, onnx, embeddings, roberta] +task: Embeddings +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ruperta_base` is a Castilian, Spanish model originally trained by mrm8488. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ruperta_base_es_5.5.0_3.0_1726747270187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ruperta_base_es_5.5.0_3.0_1726747270187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("ruperta_base","es") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("ruperta_base","es") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ruperta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|es| +|Size:|470.0 MB| + +## References + +https://huggingface.co/mrm8488/RuPERTa-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-ruperta_base_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-19-ruperta_base_pipeline_es.md new file mode 100644 index 00000000000000..3ddae9ef938527 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-ruperta_base_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish ruperta_base_pipeline pipeline RoBertaEmbeddings from mrm8488 +author: John Snow Labs +name: ruperta_base_pipeline +date: 2024-09-19 +tags: [es, open_source, pipeline, onnx] +task: Embeddings +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ruperta_base_pipeline` is a Castilian, Spanish model originally trained by mrm8488. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ruperta_base_pipeline_es_5.5.0_3.0_1726747293480.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ruperta_base_pipeline_es_5.5.0_3.0_1726747293480.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ruperta_base_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ruperta_base_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
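
Since this is a Spanish pipeline, `annotate()` can be used to sanity-check tokenization on a Spanish sentence. A minimal sketch; the input text is only an illustrative example:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("ruperta_base_pipeline", lang="es")

annotations = pipeline.annotate("Me encanta Spark NLP")
print(annotations["token"])  # tokenized Spanish text
```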
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ruperta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|470.0 MB| + +## References + +https://huggingface.co/mrm8488/RuPERTa-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-ruroberta_large_mlm_tuned_en.md b/docs/_posts/ahmedlone127/2024-09-19-ruroberta_large_mlm_tuned_en.md new file mode 100644 index 00000000000000..2e02a0ae5b1ba0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-ruroberta_large_mlm_tuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ruroberta_large_mlm_tuned RoBertaEmbeddings from warleagle +author: John Snow Labs +name: ruroberta_large_mlm_tuned +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ruroberta_large_mlm_tuned` is a English model originally trained by warleagle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ruroberta_large_mlm_tuned_en_5.5.0_3.0_1726747150910.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ruroberta_large_mlm_tuned_en_5.5.0_3.0_1726747150910.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("ruroberta_large_mlm_tuned","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("ruroberta_large_mlm_tuned","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ruroberta_large_mlm_tuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/warleagle/ruRoberta-large-mlm_tuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-ruroberta_large_mlm_tuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-ruroberta_large_mlm_tuned_pipeline_en.md new file mode 100644 index 00000000000000..3d0b070b6bbcc2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-ruroberta_large_mlm_tuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ruroberta_large_mlm_tuned_pipeline pipeline RoBertaEmbeddings from warleagle +author: John Snow Labs +name: ruroberta_large_mlm_tuned_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ruroberta_large_mlm_tuned_pipeline` is a English model originally trained by warleagle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ruroberta_large_mlm_tuned_pipeline_en_5.5.0_3.0_1726747217624.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ruroberta_large_mlm_tuned_pipeline_en_5.5.0_3.0_1726747217624.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ruroberta_large_mlm_tuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ruroberta_large_mlm_tuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ruroberta_large_mlm_tuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/warleagle/ruRoberta-large-mlm_tuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sanskrit_saskta_distilbert_eng_en.md b/docs/_posts/ahmedlone127/2024-09-19-sanskrit_saskta_distilbert_eng_en.md new file mode 100644 index 00000000000000..3729961e3ae7d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sanskrit_saskta_distilbert_eng_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sanskrit_saskta_distilbert_eng DistilBertForSequenceClassification from keefezowie +author: John Snow Labs +name: sanskrit_saskta_distilbert_eng +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sanskrit_saskta_distilbert_eng` is a English model originally trained by keefezowie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sanskrit_saskta_distilbert_eng_en_5.5.0_3.0_1726740652930.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sanskrit_saskta_distilbert_eng_en_5.5.0_3.0_1726740652930.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("sanskrit_saskta_distilbert_eng","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sanskrit_saskta_distilbert_eng", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
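
For low-latency scoring of individual strings, the fitted pipeline can be wrapped in a `LightPipeline`, which avoids the DataFrame round trip. A hedged sketch, assuming `pipelineModel` from the example above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# annotate() runs all stages in memory and returns plain Python structures
print(light.annotate("I love spark-nlp")["class"])
```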
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sanskrit_saskta_distilbert_eng| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|250.8 MB| + +## References + +https://huggingface.co/keefezowie/sa_distilBERT_eng \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sbic_roberta_text_disagreement_predictor_en.md b/docs/_posts/ahmedlone127/2024-09-19-sbic_roberta_text_disagreement_predictor_en.md new file mode 100644 index 00000000000000..08a963adb0ee04 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sbic_roberta_text_disagreement_predictor_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sbic_roberta_text_disagreement_predictor RoBertaForSequenceClassification from RuyuanWan +author: John Snow Labs +name: sbic_roberta_text_disagreement_predictor +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sbic_roberta_text_disagreement_predictor` is a English model originally trained by RuyuanWan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sbic_roberta_text_disagreement_predictor_en_5.5.0_3.0_1726733058318.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sbic_roberta_text_disagreement_predictor_en_5.5.0_3.0_1726733058318.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("sbic_roberta_text_disagreement_predictor","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sbic_roberta_text_disagreement_predictor", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
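
Fitting the pipeline triggers the model download, so it can be worth persisting the fitted `PipelineModel` for later runs. A minimal sketch; the output path is only a placeholder:

```python
from pyspark.ml import PipelineModel

# Save once, then reload later without downloading the model again
pipelineModel.write().overwrite().save("/tmp/sbic_disagreement_pipeline")
reloaded = PipelineModel.load("/tmp/sbic_disagreement_pipeline")
```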
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sbic_roberta_text_disagreement_predictor| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|415.5 MB| + +## References + +https://huggingface.co/RuyuanWan/SBIC_RoBERTa_Text_Disagreement_Predictor \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sbic_roberta_text_disagreement_predictor_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-sbic_roberta_text_disagreement_predictor_pipeline_en.md new file mode 100644 index 00000000000000..888fe281eb147d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sbic_roberta_text_disagreement_predictor_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sbic_roberta_text_disagreement_predictor_pipeline pipeline RoBertaForSequenceClassification from RuyuanWan +author: John Snow Labs +name: sbic_roberta_text_disagreement_predictor_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sbic_roberta_text_disagreement_predictor_pipeline` is a English model originally trained by RuyuanWan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sbic_roberta_text_disagreement_predictor_pipeline_en_5.5.0_3.0_1726733101562.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sbic_roberta_text_disagreement_predictor_pipeline_en_5.5.0_3.0_1726733101562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sbic_roberta_text_disagreement_predictor_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sbic_roberta_text_disagreement_predictor_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sbic_roberta_text_disagreement_predictor_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|415.5 MB| + +## References + +https://huggingface.co/RuyuanWan/SBIC_RoBERTa_Text_Disagreement_Predictor + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-schem_roberta_text_disagreement_binary_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-19-schem_roberta_text_disagreement_binary_classifier_en.md new file mode 100644 index 00000000000000..545019295e699b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-schem_roberta_text_disagreement_binary_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English schem_roberta_text_disagreement_binary_classifier RoBertaForSequenceClassification from RuyuanWan +author: John Snow Labs +name: schem_roberta_text_disagreement_binary_classifier +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`schem_roberta_text_disagreement_binary_classifier` is a English model originally trained by RuyuanWan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/schem_roberta_text_disagreement_binary_classifier_en_5.5.0_3.0_1726733086788.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/schem_roberta_text_disagreement_binary_classifier_en_5.5.0_3.0_1726733086788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("schem_roberta_text_disagreement_binary_classifier","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("schem_roberta_text_disagreement_binary_classifier", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
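
Besides the predicted label, each `class` annotation carries per-label scores in its metadata map, which is useful for thresholding a binary classifier like this one. A minimal sketch, assuming `pipelineDF` from the example above:

```python
from pyspark.sql.functions import explode

# result is the predicted label, metadata maps each label to its score
pipelineDF.select(explode("class").alias("c")) \
    .select("c.result", "c.metadata") \
    .show(truncate=False)
```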
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|schem_roberta_text_disagreement_binary_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|422.8 MB| + +## References + +https://huggingface.co/RuyuanWan/SChem_RoBERTa_Text_Disagreement_Binary_Classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-schem_roberta_text_disagreement_binary_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-schem_roberta_text_disagreement_binary_classifier_pipeline_en.md new file mode 100644 index 00000000000000..825148a4f53a4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-schem_roberta_text_disagreement_binary_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English schem_roberta_text_disagreement_binary_classifier_pipeline pipeline RoBertaForSequenceClassification from RuyuanWan +author: John Snow Labs +name: schem_roberta_text_disagreement_binary_classifier_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`schem_roberta_text_disagreement_binary_classifier_pipeline` is a English model originally trained by RuyuanWan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/schem_roberta_text_disagreement_binary_classifier_pipeline_en_5.5.0_3.0_1726733127519.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/schem_roberta_text_disagreement_binary_classifier_pipeline_en_5.5.0_3.0_1726733127519.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("schem_roberta_text_disagreement_binary_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("schem_roberta_text_disagreement_binary_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|schem_roberta_text_disagreement_binary_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|422.9 MB| + +## References + +https://huggingface.co/RuyuanWan/SChem_RoBERTa_Text_Disagreement_Binary_Classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sent_aethiqs_gembert_bertje_50k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-sent_aethiqs_gembert_bertje_50k_pipeline_en.md new file mode 100644 index 00000000000000..9c2f6b13796840 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sent_aethiqs_gembert_bertje_50k_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_aethiqs_gembert_bertje_50k_pipeline pipeline BertSentenceEmbeddings from AethiQs-Max +author: John Snow Labs +name: sent_aethiqs_gembert_bertje_50k_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_aethiqs_gembert_bertje_50k_pipeline` is a English model originally trained by AethiQs-Max. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_aethiqs_gembert_bertje_50k_pipeline_en_5.5.0_3.0_1726782842624.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_aethiqs_gembert_bertje_50k_pipeline_en_5.5.0_3.0_1726782842624.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_aethiqs_gembert_bertje_50k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_aethiqs_gembert_bertje_50k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_aethiqs_gembert_bertje_50k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.3 MB| + +## References + +https://huggingface.co/AethiQs-Max/AethiQs_GemBERT_bertje_50k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sent_bert_base_uncased_2022_habana_test_6_en.md b/docs/_posts/ahmedlone127/2024-09-19-sent_bert_base_uncased_2022_habana_test_6_en.md new file mode 100644 index 00000000000000..93fef3ca873154 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sent_bert_base_uncased_2022_habana_test_6_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_2022_habana_test_6 BertSentenceEmbeddings from philschmid +author: John Snow Labs +name: sent_bert_base_uncased_2022_habana_test_6 +date: 2024-09-19 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_2022_habana_test_6` is a English model originally trained by philschmid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_2022_habana_test_6_en_5.5.0_3.0_1726782700228.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_2022_habana_test_6_en_5.5.0_3.0_1726782700228.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_2022_habana_test_6","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_2022_habana_test_6","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
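
Sentence-level vectors like these are a typical input for training a lightweight text classifier on top. A hedged sketch (not part of the original card), assuming the training DataFrame also contains a `label` column:

```python
from sparknlp.annotator import ClassifierDLApproach

# Trains a small classifier on top of the sentence embeddings produced above
classifier = ClassifierDLApproach() \
    .setInputCols(["embeddings"]) \
    .setOutputCol("category") \
    .setLabelColumn("label") \
    .setMaxEpochs(5)
```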
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_2022_habana_test_6| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|412.2 MB| + +## References + +https://huggingface.co/philschmid/bert-base-uncased-2022-habana-test-6 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sent_bert_base_uncased_2022_habana_test_6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-sent_bert_base_uncased_2022_habana_test_6_pipeline_en.md new file mode 100644 index 00000000000000..9936dc60d8a64d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sent_bert_base_uncased_2022_habana_test_6_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_2022_habana_test_6_pipeline pipeline BertSentenceEmbeddings from philschmid +author: John Snow Labs +name: sent_bert_base_uncased_2022_habana_test_6_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_2022_habana_test_6_pipeline` is a English model originally trained by philschmid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_2022_habana_test_6_pipeline_en_5.5.0_3.0_1726782719200.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_2022_habana_test_6_pipeline_en_5.5.0_3.0_1726782719200.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_2022_habana_test_6_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_2022_habana_test_6_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_2022_habana_test_6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.8 MB| + +## References + +https://huggingface.co/philschmid/bert-base-uncased-2022-habana-test-6 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sent_bert_base_uncased_finetuned_mol_mlm_0_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-sent_bert_base_uncased_finetuned_mol_mlm_0_3_pipeline_en.md new file mode 100644 index 00000000000000..483dc00d7b518b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sent_bert_base_uncased_finetuned_mol_mlm_0_3_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_mol_mlm_0_3_pipeline pipeline BertSentenceEmbeddings from matr1xx +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_mol_mlm_0_3_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_mol_mlm_0_3_pipeline` is a English model originally trained by matr1xx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_mol_mlm_0_3_pipeline_en_5.5.0_3.0_1726768591217.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_mol_mlm_0_3_pipeline_en_5.5.0_3.0_1726768591217.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_mol_mlm_0_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_mol_mlm_0_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_mol_mlm_0_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/matr1xx/bert-base-uncased-finetuned-mol-mlm-0.3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sent_bert_emoji_latvian_twitter_en.md b/docs/_posts/ahmedlone127/2024-09-19-sent_bert_emoji_latvian_twitter_en.md new file mode 100644 index 00000000000000..21ffd04da0e474 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sent_bert_emoji_latvian_twitter_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_emoji_latvian_twitter BertSentenceEmbeddings from FFZG-cleopatra +author: John Snow Labs +name: sent_bert_emoji_latvian_twitter +date: 2024-09-19 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_emoji_latvian_twitter` is a English model originally trained by FFZG-cleopatra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_emoji_latvian_twitter_en_5.5.0_3.0_1726760958679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_emoji_latvian_twitter_en_5.5.0_3.0_1726760958679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_emoji_latvian_twitter","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_emoji_latvian_twitter","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_emoji_latvian_twitter| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|664.2 MB| + +## References + +https://huggingface.co/FFZG-cleopatra/bert-emoji-latvian-twitter \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sent_bert_large_ct_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-sent_bert_large_ct_pipeline_en.md new file mode 100644 index 00000000000000..0a466c6b65d9a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sent_bert_large_ct_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_large_ct_pipeline pipeline BertSentenceEmbeddings from Contrastive-Tension +author: John Snow Labs +name: sent_bert_large_ct_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_ct_pipeline` is a English model originally trained by Contrastive-Tension. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_ct_pipeline_en_5.5.0_3.0_1726728805430.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_ct_pipeline_en_5.5.0_3.0_1726728805430.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_large_ct_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_large_ct_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_ct_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Contrastive-Tension/BERT-Large-CT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sent_hindi_bpe_bert_test_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-sent_hindi_bpe_bert_test_large_pipeline_en.md new file mode 100644 index 00000000000000..87e93924b20149 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sent_hindi_bpe_bert_test_large_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_hindi_bpe_bert_test_large_pipeline pipeline BertSentenceEmbeddings from rg1683 +author: John Snow Labs +name: sent_hindi_bpe_bert_test_large_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hindi_bpe_bert_test_large_pipeline` is a English model originally trained by rg1683. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hindi_bpe_bert_test_large_pipeline_en_5.5.0_3.0_1726760964357.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hindi_bpe_bert_test_large_pipeline_en_5.5.0_3.0_1726760964357.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_hindi_bpe_bert_test_large_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_hindi_bpe_bert_test_large_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hindi_bpe_bert_test_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|378.3 MB| + +## References + +https://huggingface.co/rg1683/hindi_bpe_bert_test_large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sent_javanese_bert_small_imdb_jv.md b/docs/_posts/ahmedlone127/2024-09-19-sent_javanese_bert_small_imdb_jv.md new file mode 100644 index 00000000000000..3c20ad8d29fcbd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sent_javanese_bert_small_imdb_jv.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Javanese sent_javanese_bert_small_imdb BertSentenceEmbeddings from w11wo +author: John Snow Labs +name: sent_javanese_bert_small_imdb +date: 2024-09-19 +tags: [jv, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: jv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_javanese_bert_small_imdb` is a Javanese model originally trained by w11wo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_javanese_bert_small_imdb_jv_5.5.0_3.0_1726782985552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_javanese_bert_small_imdb_jv_5.5.0_3.0_1726782985552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_javanese_bert_small_imdb","jv") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_javanese_bert_small_imdb","jv") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
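Once the pipeline above has run, the sentence vectors live inside the `embeddings` annotation column. A small sketch for pulling them out of `pipelineDF`, continuing from the Python example above:

```python
from pyspark.sql import functions as F

# Each annotation in the "embeddings" column carries its vector in the
# nested `embeddings` field; explode to get one row per detected sentence.
vectors = (
    pipelineDF
    .select(F.explode("embeddings").alias("ann"))
    .select(F.col("ann.result").alias("sentence"), F.col("ann.embeddings").alias("vector"))
)
vectors.show(truncate=60)
```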
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_javanese_bert_small_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|jv| +|Size:|407.3 MB| + +## References + +https://huggingface.co/w11wo/javanese-bert-small-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sent_jmedroberta_base_sentencepiece_vocab50000_ja.md b/docs/_posts/ahmedlone127/2024-09-19-sent_jmedroberta_base_sentencepiece_vocab50000_ja.md new file mode 100644 index 00000000000000..a86089b60c83aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sent_jmedroberta_base_sentencepiece_vocab50000_ja.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Japanese sent_jmedroberta_base_sentencepiece_vocab50000 BertSentenceEmbeddings from alabnii +author: John Snow Labs +name: sent_jmedroberta_base_sentencepiece_vocab50000 +date: 2024-09-19 +tags: [ja, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ja +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_jmedroberta_base_sentencepiece_vocab50000` is a Japanese model originally trained by alabnii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_jmedroberta_base_sentencepiece_vocab50000_ja_5.5.0_3.0_1726761322365.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_jmedroberta_base_sentencepiece_vocab50000_ja_5.5.0_3.0_1726761322365.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_jmedroberta_base_sentencepiece_vocab50000","ja") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_jmedroberta_base_sentencepiece_vocab50000","ja") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
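Since this model was pretrained on Japanese medical text, a Japanese sentence exercises it more realistically than the English placeholder above. A small variation on the Python example, reusing the same `pipeline` definition (the sample sentence is illustrative):

```python
# "The patient was prescribed antibiotics." — any Japanese text works here.
data = spark.createDataFrame([["患者には抗生物質が処方されました。"]]).toDF("text")

pipelineDF = pipeline.fit(data).transform(data)

# One embedding vector per detected sentence.
pipelineDF.select("embeddings.embeddings").show(truncate=60)
```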
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_jmedroberta_base_sentencepiece_vocab50000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ja| +|Size:|464.0 MB| + +## References + +https://huggingface.co/alabnii/jmedroberta-base-sentencepiece-vocab50000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sent_nepnewsbert_en.md b/docs/_posts/ahmedlone127/2024-09-19-sent_nepnewsbert_en.md new file mode 100644 index 00000000000000..7468f6d6761274 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sent_nepnewsbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_nepnewsbert BertSentenceEmbeddings from Shushant +author: John Snow Labs +name: sent_nepnewsbert +date: 2024-09-19 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_nepnewsbert` is a English model originally trained by Shushant. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_nepnewsbert_en_5.5.0_3.0_1726728652900.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_nepnewsbert_en_5.5.0_3.0_1726728652900.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_nepnewsbert","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_nepnewsbert","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
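A common downstream use of sentence embeddings is similarity scoring. A minimal sketch, assuming the fitted `pipelineModel` from the example above and NumPy available on the driver:

```python
import numpy as np

texts = spark.createDataFrame([["I love spark-nlp"], ["Spark NLP is great"]]).toDF("text")
rows = pipelineModel.transform(texts).select("embeddings.embeddings").collect()

# Each row holds one vector per detected sentence; take the first sentence of each input.
v1 = np.array(rows[0][0][0])
v2 = np.array(rows[1][0][0])
cosine = float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
print(f"cosine similarity: {cosine:.4f}")
```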
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_nepnewsbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|408.7 MB| + +## References + +https://huggingface.co/Shushant/NepNewsBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sent_nepnewsbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-sent_nepnewsbert_pipeline_en.md new file mode 100644 index 00000000000000..74e98dac022f7d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sent_nepnewsbert_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_nepnewsbert_pipeline pipeline BertSentenceEmbeddings from Shushant +author: John Snow Labs +name: sent_nepnewsbert_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_nepnewsbert_pipeline` is a English model originally trained by Shushant. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_nepnewsbert_pipeline_en_5.5.0_3.0_1726728672508.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_nepnewsbert_pipeline_en_5.5.0_3.0_1726728672508.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_nepnewsbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_nepnewsbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_nepnewsbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.2 MB| + +## References + +https://huggingface.co/Shushant/NepNewsBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sent_phs_bert_en.md b/docs/_posts/ahmedlone127/2024-09-19-sent_phs_bert_en.md new file mode 100644 index 00000000000000..89aece1b87571a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sent_phs_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_phs_bert BertSentenceEmbeddings from publichealthsurveillance +author: John Snow Labs +name: sent_phs_bert +date: 2024-09-19 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_phs_bert` is a English model originally trained by publichealthsurveillance. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_phs_bert_en_5.5.0_3.0_1726782711992.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_phs_bert_en_5.5.0_3.0_1726782711992.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_phs_bert","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_phs_bert","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_phs_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/publichealthsurveillance/PHS-BERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sent_phs_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-sent_phs_bert_pipeline_en.md new file mode 100644 index 00000000000000..1f9b9e7567e545 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sent_phs_bert_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_phs_bert_pipeline pipeline BertSentenceEmbeddings from publichealthsurveillance +author: John Snow Labs +name: sent_phs_bert_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_phs_bert_pipeline` is a English model originally trained by publichealthsurveillance. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_phs_bert_pipeline_en_5.5.0_3.0_1726782769989.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_phs_bert_pipeline_en_5.5.0_3.0_1726782769989.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_phs_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_phs_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_phs_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/publichealthsurveillance/PHS-BERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sent_youtube_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-sent_youtube_bert_pipeline_en.md new file mode 100644 index 00000000000000..6b9f530c96bf18 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sent_youtube_bert_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_youtube_bert_pipeline pipeline BertSentenceEmbeddings from flboehm +author: John Snow Labs +name: sent_youtube_bert_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_youtube_bert_pipeline` is a English model originally trained by flboehm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_youtube_bert_pipeline_en_5.5.0_3.0_1726728786334.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_youtube_bert_pipeline_en_5.5.0_3.0_1726728786334.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_youtube_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_youtube_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_youtube_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.1 MB| + +## References + +https://huggingface.co/flboehm/youtube-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sentiment_analysis_fb_roberta_fine_tune_hashtag_removed_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-sentiment_analysis_fb_roberta_fine_tune_hashtag_removed_pipeline_en.md new file mode 100644 index 00000000000000..58f711e8690cf0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sentiment_analysis_fb_roberta_fine_tune_hashtag_removed_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_fb_roberta_fine_tune_hashtag_removed_pipeline pipeline RoBertaForSequenceClassification from technocrat3128 +author: John Snow Labs +name: sentiment_analysis_fb_roberta_fine_tune_hashtag_removed_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_fb_roberta_fine_tune_hashtag_removed_pipeline` is a English model originally trained by technocrat3128. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_fb_roberta_fine_tune_hashtag_removed_pipeline_en_5.5.0_3.0_1726725918874.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_fb_roberta_fine_tune_hashtag_removed_pipeline_en_5.5.0_3.0_1726725918874.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_analysis_fb_roberta_fine_tune_hashtag_removed_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_analysis_fb_roberta_fine_tune_hashtag_removed_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_fb_roberta_fine_tune_hashtag_removed_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|441.8 MB| + +## References + +https://huggingface.co/technocrat3128/sentiment_analysis_FB_roberta_fine_tune_hashtag_removed + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sentiment_analysis_finetuning_en.md b/docs/_posts/ahmedlone127/2024-09-19-sentiment_analysis_finetuning_en.md new file mode 100644 index 00000000000000..b3d9457c704776 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sentiment_analysis_finetuning_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_finetuning RoBertaForSequenceClassification from Asif1997 +author: John Snow Labs +name: sentiment_analysis_finetuning +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_finetuning` is a English model originally trained by Asif1997. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_finetuning_en_5.5.0_3.0_1726751205326.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_finetuning_en_5.5.0_3.0_1726751205326.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_analysis_finetuning", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_analysis_finetuning", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
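The predicted label ends up in the `result` field of the `class` annotation (per-label scores, when the exported model provides them, sit in its metadata). A small sketch for reading predictions out of `pipelineDF`:

```python
from pyspark.sql import functions as F

pipelineDF.select(
    F.col("text"),
    F.col("class.result").alias("prediction"),
    F.col("class.metadata").alias("scores")
).show(truncate=False)
```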
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_finetuning| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|456.9 MB| + +## References + +https://huggingface.co/Asif1997/Sentiment-Analysis-Finetuning \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sentiment_analysis_finetuning_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-sentiment_analysis_finetuning_pipeline_en.md new file mode 100644 index 00000000000000..b682f2a64698a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sentiment_analysis_finetuning_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_finetuning_pipeline pipeline RoBertaForSequenceClassification from Asif1997 +author: John Snow Labs +name: sentiment_analysis_finetuning_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_finetuning_pipeline` is a English model originally trained by Asif1997. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_finetuning_pipeline_en_5.5.0_3.0_1726751232228.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_finetuning_pipeline_en_5.5.0_3.0_1726751232228.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_analysis_finetuning_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_analysis_finetuning_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
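For quick checks on single strings, `PretrainedPipeline.annotate` avoids building a DataFrame altogether; a minimal sketch, assuming a running Spark NLP session:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("sentiment_analysis_finetuning_pipeline", lang="en")

# annotate() returns a dict mapping each output column to its list of results.
print(pipeline.annotate("I love spark-nlp"))
```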
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_finetuning_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|457.0 MB| + +## References + +https://huggingface.co/Asif1997/Sentiment-Analysis-Finetuning + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sentiment_multilingual_distilbert_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-19-sentiment_multilingual_distilbert_pipeline_xx.md new file mode 100644 index 00000000000000..972b56cfa0a536 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sentiment_multilingual_distilbert_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual sentiment_multilingual_distilbert_pipeline pipeline DistilBertForSequenceClassification from Mukalingam0813 +author: John Snow Labs +name: sentiment_multilingual_distilbert_pipeline +date: 2024-09-19 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_multilingual_distilbert_pipeline` is a Multilingual model originally trained by Mukalingam0813. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_multilingual_distilbert_pipeline_xx_5.5.0_3.0_1726741391537.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_multilingual_distilbert_pipeline_xx_5.5.0_3.0_1726741391537.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_multilingual_distilbert_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_multilingual_distilbert_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_multilingual_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|507.4 MB| + +## References + +https://huggingface.co/Mukalingam0813/sentiment-multilingual-distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sexism_in_memes_en.md b/docs/_posts/ahmedlone127/2024-09-19-sexism_in_memes_en.md new file mode 100644 index 00000000000000..dd3197289622b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sexism_in_memes_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sexism_in_memes DistilBertForSequenceClassification from thranduil2 +author: John Snow Labs +name: sexism_in_memes +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sexism_in_memes` is a English model originally trained by thranduil2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sexism_in_memes_en_5.5.0_3.0_1726718982886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sexism_in_memes_en_5.5.0_3.0_1726718982886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("sexism_in_memes", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sexism_in_memes", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
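For low-latency, single-document scoring, the fitted pipeline from the example above can be wrapped in a `LightPipeline`, which runs on plain strings instead of DataFrames; a small sketch:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# Returns a dict keyed by output column; "class" holds the predicted label(s).
result = light.annotate("An example meme caption to score")
print(result["class"])
```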
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sexism_in_memes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thranduil2/sexism_in_memes \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sexism_in_memes_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-sexism_in_memes_pipeline_en.md new file mode 100644 index 00000000000000..56d3bfdc69f565 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sexism_in_memes_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sexism_in_memes_pipeline pipeline DistilBertForSequenceClassification from thranduil2 +author: John Snow Labs +name: sexism_in_memes_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sexism_in_memes_pipeline` is a English model originally trained by thranduil2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sexism_in_memes_pipeline_en_5.5.0_3.0_1726718999637.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sexism_in_memes_pipeline_en_5.5.0_3.0_1726718999637.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sexism_in_memes_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sexism_in_memes_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sexism_in_memes_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thranduil2/sexism_in_memes + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-small_llm_lingo_a_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-small_llm_lingo_a_pipeline_en.md new file mode 100644 index 00000000000000..592c8c484eaf1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-small_llm_lingo_a_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English small_llm_lingo_a_pipeline pipeline WhisperForCTC from Enagamirzayev +author: John Snow Labs +name: small_llm_lingo_a_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`small_llm_lingo_a_pipeline` is a English model originally trained by Enagamirzayev. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/small_llm_lingo_a_pipeline_en_5.5.0_3.0_1726788843944.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/small_llm_lingo_a_pipeline_en_5.5.0_3.0_1726788843944.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("small_llm_lingo_a_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("small_llm_lingo_a_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
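This speech-recognition pipeline expects a column of raw audio samples rather than text. A rough sketch of preparing such input, assuming `librosa` is installed, a local `sample.wav` exists, and the pipeline's audio input column is named `audio_content` (the actual column names can be confirmed from the schema):

```python
import librosa
from sparknlp.pretrained import PretrainedPipeline

# Load a mono waveform at 16 kHz, the sampling rate Whisper models expect.
waveform, _ = librosa.load("sample.wav", sr=16000)

df = spark.createDataFrame([[waveform.tolist()]]).toDF("audio_content")

pipeline = PretrainedPipeline("small_llm_lingo_a_pipeline", lang="en")
result = pipeline.transform(df)

# Locate the transcription column, then select its `result` field.
result.printSchema()
```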
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|small_llm_lingo_a_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Enagamirzayev/small-llm-lingo_a + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-socmed_comment_roberta_base_indonesian_smsa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-socmed_comment_roberta_base_indonesian_smsa_pipeline_en.md new file mode 100644 index 00000000000000..232bb3fdd6f731 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-socmed_comment_roberta_base_indonesian_smsa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English socmed_comment_roberta_base_indonesian_smsa_pipeline pipeline RoBertaForSequenceClassification from databoks-irfan +author: John Snow Labs +name: socmed_comment_roberta_base_indonesian_smsa_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`socmed_comment_roberta_base_indonesian_smsa_pipeline` is a English model originally trained by databoks-irfan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/socmed_comment_roberta_base_indonesian_smsa_pipeline_en_5.5.0_3.0_1726780023599.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/socmed_comment_roberta_base_indonesian_smsa_pipeline_en_5.5.0_3.0_1726780023599.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("socmed_comment_roberta_base_indonesian_smsa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("socmed_comment_roberta_base_indonesian_smsa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|socmed_comment_roberta_base_indonesian_smsa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|467.7 MB| + +## References + +https://huggingface.co/databoks-irfan/socmed-comment-roberta-base-indonesian-smsa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-stego_classifier_checkpoint_epoch_0_2024_07_26_13_30_02_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-stego_classifier_checkpoint_epoch_0_2024_07_26_13_30_02_pipeline_en.md new file mode 100644 index 00000000000000..bcbfd705cbc5fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-stego_classifier_checkpoint_epoch_0_2024_07_26_13_30_02_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_0_2024_07_26_13_30_02_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_0_2024_07_26_13_30_02_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_0_2024_07_26_13_30_02_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_0_2024_07_26_13_30_02_pipeline_en_5.5.0_3.0_1726741192145.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_0_2024_07_26_13_30_02_pipeline_en_5.5.0_3.0_1726741192145.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("stego_classifier_checkpoint_epoch_0_2024_07_26_13_30_02_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("stego_classifier_checkpoint_epoch_0_2024_07_26_13_30_02_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_0_2024_07_26_13_30_02_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-0-2024-07-26_13-30-02 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-stego_classifier_checkpoint_epoch_20_2024_07_26_16_19_31_en.md b/docs/_posts/ahmedlone127/2024-09-19-stego_classifier_checkpoint_epoch_20_2024_07_26_16_19_31_en.md new file mode 100644 index 00000000000000..bd62e84883ebdb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-stego_classifier_checkpoint_epoch_20_2024_07_26_16_19_31_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_20_2024_07_26_16_19_31 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_20_2024_07_26_16_19_31 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_20_2024_07_26_16_19_31` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_20_2024_07_26_16_19_31_en_5.5.0_3.0_1726763347671.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_20_2024_07_26_16_19_31_en_5.5.0_3.0_1726763347671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_20_2024_07_26_16_19_31", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_20_2024_07_26_16_19_31", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_20_2024_07_26_16_19_31| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-20-2024-07-26_16-19-31 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-stego_classifier_checkpoint_epoch_40_2024_07_26_12_23_45_en.md b/docs/_posts/ahmedlone127/2024-09-19-stego_classifier_checkpoint_epoch_40_2024_07_26_12_23_45_en.md new file mode 100644 index 00000000000000..48310816e51bfe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-stego_classifier_checkpoint_epoch_40_2024_07_26_12_23_45_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_40_2024_07_26_12_23_45 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_40_2024_07_26_12_23_45 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_40_2024_07_26_12_23_45` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_40_2024_07_26_12_23_45_en_5.5.0_3.0_1726743751122.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_40_2024_07_26_12_23_45_en_5.5.0_3.0_1726743751122.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_40_2024_07_26_12_23_45", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_40_2024_07_26_12_23_45", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_40_2024_07_26_12_23_45| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-40-2024-07-26_12-23-45 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-stresstweetrobertasentiment_en.md b/docs/_posts/ahmedlone127/2024-09-19-stresstweetrobertasentiment_en.md new file mode 100644 index 00000000000000..4f09e49946a0d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-stresstweetrobertasentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stresstweetrobertasentiment RoBertaForSequenceClassification from StephArn +author: John Snow Labs +name: stresstweetrobertasentiment +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stresstweetrobertasentiment` is a English model originally trained by StephArn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stresstweetrobertasentiment_en_5.5.0_3.0_1726779734363.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stresstweetrobertasentiment_en_5.5.0_3.0_1726779734363.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("stresstweetrobertasentiment", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("stresstweetrobertasentiment", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stresstweetrobertasentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/StephArn/StressTweetRobertaSentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-stresstweetrobertasentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-stresstweetrobertasentiment_pipeline_en.md new file mode 100644 index 00000000000000..26d78e69012c9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-stresstweetrobertasentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stresstweetrobertasentiment_pipeline pipeline RoBertaForSequenceClassification from StephArn +author: John Snow Labs +name: stresstweetrobertasentiment_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stresstweetrobertasentiment_pipeline` is a English model originally trained by StephArn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stresstweetrobertasentiment_pipeline_en_5.5.0_3.0_1726779757898.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stresstweetrobertasentiment_pipeline_en_5.5.0_3.0_1726779757898.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("stresstweetrobertasentiment_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("stresstweetrobertasentiment_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stresstweetrobertasentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/StephArn/StressTweetRobertaSentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-tag_clf_en.md b/docs/_posts/ahmedlone127/2024-09-19-tag_clf_en.md new file mode 100644 index 00000000000000..a67b29866e3f53 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-tag_clf_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tag_clf RoBertaForSequenceClassification from eyeonyou +author: John Snow Labs +name: tag_clf +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tag_clf` is a English model originally trained by eyeonyou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tag_clf_en_5.5.0_3.0_1726733572739.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tag_clf_en_5.5.0_3.0_1726733572739.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("tag_clf", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("tag_clf", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tag_clf| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.4 MB| + +## References + +https://huggingface.co/eyeonyou/tag_clf \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-taiwanese_whisper_splend1dchan_en.md b/docs/_posts/ahmedlone127/2024-09-19-taiwanese_whisper_splend1dchan_en.md new file mode 100644 index 00000000000000..03d323838152f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-taiwanese_whisper_splend1dchan_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English taiwanese_whisper_splend1dchan WhisperForCTC from Splend1dchan +author: John Snow Labs +name: taiwanese_whisper_splend1dchan +date: 2024-09-19 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`taiwanese_whisper_splend1dchan` is a English model originally trained by Splend1dchan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/taiwanese_whisper_splend1dchan_en_5.5.0_3.0_1726787745178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/taiwanese_whisper_splend1dchan_en_5.5.0_3.0_1726787745178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("taiwanese_whisper_splend1dchan","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("taiwanese_whisper_splend1dchan", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
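+
+The `data` DataFrame is not defined in the snippet above. A minimal sketch for building it, assuming a hypothetical local file `sample_audio.txt` that stores one float sample per line of mono audio:
+
+```python
+# Read raw audio samples into the "audio_content" column expected by AudioAssembler
+with open("sample_audio.txt") as f:
+    raw_floats = [float(line) for line in f]
+
+data = spark.createDataFrame([[raw_floats]]).toDF("audio_content")
+```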
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|taiwanese_whisper_splend1dchan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Splend1dchan/Taiwanese-Whisper \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-tatar_en.md b/docs/_posts/ahmedlone127/2024-09-19-tatar_en.md new file mode 100644 index 00000000000000..33437883c016a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-tatar_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tatar DistilBertForSequenceClassification from isom5240sp24 +author: John Snow Labs +name: tatar +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tatar` is a English model originally trained by isom5240sp24. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tatar_en_5.5.0_3.0_1726719184054.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tatar_en_5.5.0_3.0_1726719184054.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("tatar","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("tatar", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tatar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/isom5240sp24/tt \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-tatar_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-tatar_pipeline_en.md new file mode 100644 index 00000000000000..a2ed4fb023edc2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-tatar_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tatar_pipeline pipeline DistilBertForSequenceClassification from isom5240sp24 +author: John Snow Labs +name: tatar_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tatar_pipeline` is a English model originally trained by isom5240sp24. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tatar_pipeline_en_5.5.0_3.0_1726719196962.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tatar_pipeline_en_5.5.0_3.0_1726719196962.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tatar_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tatar_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
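+
+The snippet above assumes `PretrainedPipeline` has been imported and that `df` already exists. A minimal end-to-end sketch, assuming an active Spark session and that the bundled classifier writes its predictions to a `class` column:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("tatar_pipeline", lang = "en")
+
+# df only needs a "text" column holding the sentences to classify
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+annotations.select("class.result").show(truncate=False)
+```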
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tatar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/isom5240sp24/tt + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-team7_en.md b/docs/_posts/ahmedlone127/2024-09-19-team7_en.md new file mode 100644 index 00000000000000..23eac0855417e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-team7_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English team7 RoBertaForSequenceClassification from MLGuy2 +author: John Snow Labs +name: team7 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`team7` is a English model originally trained by MLGuy2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/team7_en_5.5.0_3.0_1726726104659.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/team7_en_5.5.0_3.0_1726726104659.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("team7","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("team7", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|team7| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.4 MB| + +## References + +https://huggingface.co/MLGuy2/Team7 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-tenaliai_fintech_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-tenaliai_fintech_v1_pipeline_en.md new file mode 100644 index 00000000000000..2ee9348f7631e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-tenaliai_fintech_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tenaliai_fintech_v1_pipeline pipeline BertForSequenceClassification from credentek +author: John Snow Labs +name: tenaliai_fintech_v1_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tenaliai_fintech_v1_pipeline` is a English model originally trained by credentek. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tenaliai_fintech_v1_pipeline_en_5.5.0_3.0_1726770521898.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tenaliai_fintech_v1_pipeline_en_5.5.0_3.0_1726770521898.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tenaliai_fintech_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tenaliai_fintech_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tenaliai_fintech_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.7 MB| + +## References + +https://huggingface.co/credentek/TenaliAI-FinTech-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-test7_balanced_and_sentence_en.md b/docs/_posts/ahmedlone127/2024-09-19-test7_balanced_and_sentence_en.md new file mode 100644 index 00000000000000..2a0a9f6ffc1cb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-test7_balanced_and_sentence_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test7_balanced_and_sentence RoBertaForSequenceClassification from adriansanz +author: John Snow Labs +name: test7_balanced_and_sentence +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test7_balanced_and_sentence` is a English model originally trained by adriansanz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test7_balanced_and_sentence_en_5.5.0_3.0_1726750646809.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test7_balanced_and_sentence_en_5.5.0_3.0_1726750646809.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("test7_balanced_and_sentence","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("test7_balanced_and_sentence", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test7_balanced_and_sentence| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|465.4 MB| + +## References + +https://huggingface.co/adriansanz/test7_balanced_and_sentence \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-test_camsaid_en.md b/docs/_posts/ahmedlone127/2024-09-19-test_camsaid_en.md new file mode 100644 index 00000000000000..1eb7e0f9b7355e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-test_camsaid_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_camsaid DistilBertForSequenceClassification from CamSaid +author: John Snow Labs +name: test_camsaid +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_camsaid` is a English model originally trained by CamSaid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_camsaid_en_5.5.0_3.0_1726741230172.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_camsaid_en_5.5.0_3.0_1726741230172.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_camsaid","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_camsaid", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_camsaid| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/CamSaid/Test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-test_finetuned__roberta_base_biomedical_clinical_spanish__59k_ultrasounds_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-test_finetuned__roberta_base_biomedical_clinical_spanish__59k_ultrasounds_ner_pipeline_en.md new file mode 100644 index 00000000000000..1aa757b15b480d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-test_finetuned__roberta_base_biomedical_clinical_spanish__59k_ultrasounds_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_finetuned__roberta_base_biomedical_clinical_spanish__59k_ultrasounds_ner_pipeline pipeline RoBertaForTokenClassification from manucos +author: John Snow Labs +name: test_finetuned__roberta_base_biomedical_clinical_spanish__59k_ultrasounds_ner_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_finetuned__roberta_base_biomedical_clinical_spanish__59k_ultrasounds_ner_pipeline` is a English model originally trained by manucos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_finetuned__roberta_base_biomedical_clinical_spanish__59k_ultrasounds_ner_pipeline_en_5.5.0_3.0_1726730670572.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_finetuned__roberta_base_biomedical_clinical_spanish__59k_ultrasounds_ner_pipeline_en_5.5.0_3.0_1726730670572.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_finetuned__roberta_base_biomedical_clinical_spanish__59k_ultrasounds_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_finetuned__roberta_base_biomedical_clinical_spanish__59k_ultrasounds_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_finetuned__roberta_base_biomedical_clinical_spanish__59k_ultrasounds_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|469.5 MB| + +## References + +https://huggingface.co/manucos/test-finetuned__roberta-base-biomedical-clinical-es__59k-ultrasounds-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-test_harmmie_en.md b/docs/_posts/ahmedlone127/2024-09-19-test_harmmie_en.md new file mode 100644 index 00000000000000..41f14335fd7f0f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-test_harmmie_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_harmmie DistilBertForSequenceClassification from Harmmie +author: John Snow Labs +name: test_harmmie +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_harmmie` is a English model originally trained by Harmmie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_harmmie_en_5.5.0_3.0_1726763938500.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_harmmie_en_5.5.0_3.0_1726763938500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_harmmie","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_harmmie", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_harmmie| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Harmmie/test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-test_harmmie_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-test_harmmie_pipeline_en.md new file mode 100644 index 00000000000000..54926cbe912e4e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-test_harmmie_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_harmmie_pipeline pipeline DistilBertForSequenceClassification from Harmmie +author: John Snow Labs +name: test_harmmie_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_harmmie_pipeline` is a English model originally trained by Harmmie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_harmmie_pipeline_en_5.5.0_3.0_1726763953091.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_harmmie_pipeline_en_5.5.0_3.0_1726763953091.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_harmmie_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_harmmie_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_harmmie_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Harmmie/test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-test_model1_en.md b/docs/_posts/ahmedlone127/2024-09-19-test_model1_en.md new file mode 100644 index 00000000000000..c5408022bf1a44 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-test_model1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_model1 DistilBertForSequenceClassification from imljls +author: John Snow Labs +name: test_model1 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_model1` is a English model originally trained by imljls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_model1_en_5.5.0_3.0_1726763938548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_model1_en_5.5.0_3.0_1726763938548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_model1","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_model1", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_model1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/imljls/test_model1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-test_model1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-test_model1_pipeline_en.md new file mode 100644 index 00000000000000..c9af7f37904d67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-test_model1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_model1_pipeline pipeline DistilBertForSequenceClassification from imljls +author: John Snow Labs +name: test_model1_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_model1_pipeline` is a English model originally trained by imljls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_model1_pipeline_en_5.5.0_3.0_1726763953193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_model1_pipeline_en_5.5.0_3.0_1726763953193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_model1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_model1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_model1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/imljls/test_model1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-test_whisper_tiny_thai_phakphum_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-test_whisper_tiny_thai_phakphum_pipeline_en.md new file mode 100644 index 00000000000000..b3de5788992888 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-test_whisper_tiny_thai_phakphum_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English test_whisper_tiny_thai_phakphum_pipeline pipeline WhisperForCTC from Phakphum +author: John Snow Labs +name: test_whisper_tiny_thai_phakphum_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_whisper_tiny_thai_phakphum_pipeline` is a English model originally trained by Phakphum. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_whisper_tiny_thai_phakphum_pipeline_en_5.5.0_3.0_1726714439379.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_whisper_tiny_thai_phakphum_pipeline_en_5.5.0_3.0_1726714439379.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_whisper_tiny_thai_phakphum_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_whisper_tiny_thai_phakphum_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_whisper_tiny_thai_phakphum_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.0 MB| + +## References + +https://huggingface.co/Phakphum/test-whisper-tiny-th + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-testing_model_katowtkkk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-testing_model_katowtkkk_pipeline_en.md new file mode 100644 index 00000000000000..a081c6a110ffc3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-testing_model_katowtkkk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English testing_model_katowtkkk_pipeline pipeline DistilBertForSequenceClassification from katowtkkk +author: John Snow Labs +name: testing_model_katowtkkk_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`testing_model_katowtkkk_pipeline` is a English model originally trained by katowtkkk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/testing_model_katowtkkk_pipeline_en_5.5.0_3.0_1726740667789.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/testing_model_katowtkkk_pipeline_en_5.5.0_3.0_1726740667789.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("testing_model_katowtkkk_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("testing_model_katowtkkk_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|testing_model_katowtkkk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/katowtkkk/testing_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-testing_nischalsingh_en.md b/docs/_posts/ahmedlone127/2024-09-19-testing_nischalsingh_en.md new file mode 100644 index 00000000000000..0917a5e9ba62b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-testing_nischalsingh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English testing_nischalsingh DistilBertForSequenceClassification from nischalsingh +author: John Snow Labs +name: testing_nischalsingh +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`testing_nischalsingh` is a English model originally trained by nischalsingh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/testing_nischalsingh_en_5.5.0_3.0_1726763974861.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/testing_nischalsingh_en_5.5.0_3.0_1726763974861.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("testing_nischalsingh","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("testing_nischalsingh", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|testing_nischalsingh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/nischalsingh/testing \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-testing_nischalsingh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-testing_nischalsingh_pipeline_en.md new file mode 100644 index 00000000000000..eb8f62a258823f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-testing_nischalsingh_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English testing_nischalsingh_pipeline pipeline DistilBertForSequenceClassification from nischalsingh +author: John Snow Labs +name: testing_nischalsingh_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`testing_nischalsingh_pipeline` is a English model originally trained by nischalsingh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/testing_nischalsingh_pipeline_en_5.5.0_3.0_1726763986941.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/testing_nischalsingh_pipeline_en_5.5.0_3.0_1726763986941.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("testing_nischalsingh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("testing_nischalsingh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|testing_nischalsingh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/nischalsingh/testing + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-text_classification_sms_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-text_classification_sms_model_pipeline_en.md new file mode 100644 index 00000000000000..4c1177466ad83f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-text_classification_sms_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English text_classification_sms_model_pipeline pipeline DistilBertForSequenceClassification from dstankovskii +author: John Snow Labs +name: text_classification_sms_model_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_classification_sms_model_pipeline` is a English model originally trained by dstankovskii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_classification_sms_model_pipeline_en_5.5.0_3.0_1726740715082.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_classification_sms_model_pipeline_en_5.5.0_3.0_1726740715082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("text_classification_sms_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("text_classification_sms_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_classification_sms_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dstankovskii/text_classification_sms_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-third_classification_model_en.md b/docs/_posts/ahmedlone127/2024-09-19-third_classification_model_en.md new file mode 100644 index 00000000000000..01b2f4cd69484c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-third_classification_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English third_classification_model DistilBertForSequenceClassification from Danny-Moldovan +author: John Snow Labs +name: third_classification_model +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`third_classification_model` is a English model originally trained by Danny-Moldovan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/third_classification_model_en_5.5.0_3.0_1726763346827.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/third_classification_model_en_5.5.0_3.0_1726763346827.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("third_classification_model","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("third_classification_model", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|third_classification_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Danny-Moldovan/third_classification_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-trained_dilibert_sentiment_analysis_amolinab_en.md b/docs/_posts/ahmedlone127/2024-09-19-trained_dilibert_sentiment_analysis_amolinab_en.md new file mode 100644 index 00000000000000..9b51c8876a32d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-trained_dilibert_sentiment_analysis_amolinab_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trained_dilibert_sentiment_analysis_amolinab DistilBertForSequenceClassification from amolinab +author: John Snow Labs +name: trained_dilibert_sentiment_analysis_amolinab +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trained_dilibert_sentiment_analysis_amolinab` is a English model originally trained by amolinab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trained_dilibert_sentiment_analysis_amolinab_en_5.5.0_3.0_1726740889921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trained_dilibert_sentiment_analysis_amolinab_en_5.5.0_3.0_1726740889921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("trained_dilibert_sentiment_analysis_amolinab","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("trained_dilibert_sentiment_analysis_amolinab", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trained_dilibert_sentiment_analysis_amolinab| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/amolinab/trained_dilibert_sentiment_analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-trainedlocation_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-trainedlocation_pipeline_en.md new file mode 100644 index 00000000000000..6bad08921b3577 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-trainedlocation_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trainedlocation_pipeline pipeline DistilBertForSequenceClassification from PathofthePeople +author: John Snow Labs +name: trainedlocation_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trainedlocation_pipeline` is a English model originally trained by PathofthePeople. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trainedlocation_pipeline_en_5.5.0_3.0_1726743874102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trainedlocation_pipeline_en_5.5.0_3.0_1726743874102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("trainedlocation_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("trainedlocation_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trainedlocation_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/PathofthePeople/TrainedLocation + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-trainer_chapter5_en.md b/docs/_posts/ahmedlone127/2024-09-19-trainer_chapter5_en.md new file mode 100644 index 00000000000000..c5e8cd9a4dac73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-trainer_chapter5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trainer_chapter5 DistilBertForSequenceClassification from AlanHou +author: John Snow Labs +name: trainer_chapter5 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trainer_chapter5` is a English model originally trained by AlanHou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trainer_chapter5_en_5.5.0_3.0_1726743424391.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trainer_chapter5_en_5.5.0_3.0_1726743424391.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("trainer_chapter5","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("trainer_chapter5", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trainer_chapter5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AlanHou/trainer-chapter5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-trainer_chapter5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-trainer_chapter5_pipeline_en.md new file mode 100644 index 00000000000000..90fbbc5a87f63a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-trainer_chapter5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trainer_chapter5_pipeline pipeline DistilBertForSequenceClassification from AlanHou +author: John Snow Labs +name: trainer_chapter5_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trainer_chapter5_pipeline` is a English model originally trained by AlanHou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trainer_chapter5_pipeline_en_5.5.0_3.0_1726743440550.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trainer_chapter5_pipeline_en_5.5.0_3.0_1726743440550.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("trainer_chapter5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("trainer_chapter5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trainer_chapter5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AlanHou/trainer-chapter5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-urdu_roberta_ner_en.md b/docs/_posts/ahmedlone127/2024-09-19-urdu_roberta_ner_en.md new file mode 100644 index 00000000000000..e66bb8e367397f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-urdu_roberta_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English urdu_roberta_ner XlmRoBertaForTokenClassification from mirfan899 +author: John Snow Labs +name: urdu_roberta_ner +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`urdu_roberta_ner` is a English model originally trained by mirfan899. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/urdu_roberta_ner_en_5.5.0_3.0_1726737668683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/urdu_roberta_ner_en_5.5.0_3.0_1726737668683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("urdu_roberta_ner","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("urdu_roberta_ner", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|urdu_roberta_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|803.5 MB| + +## References + +https://huggingface.co/mirfan899/urdu-roberta-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-urdu_roberta_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-urdu_roberta_ner_pipeline_en.md new file mode 100644 index 00000000000000..6c3dab3712047b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-urdu_roberta_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English urdu_roberta_ner_pipeline pipeline XlmRoBertaForTokenClassification from mirfan899 +author: John Snow Labs +name: urdu_roberta_ner_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`urdu_roberta_ner_pipeline` is a English model originally trained by mirfan899. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/urdu_roberta_ner_pipeline_en_5.5.0_3.0_1726737790151.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/urdu_roberta_ner_pipeline_en_5.5.0_3.0_1726737790151.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("urdu_roberta_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("urdu_roberta_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
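+
+The snippet above references a DataFrame `df` that is not defined; the following is a minimal, hedged sketch of preparing one and running the pipeline end to end. The `text` column name is an assumption based on the DocumentAssembler stage listed under "Included Models" below.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()  # Spark NLP-enabled SparkSession
+
+# Illustrative one-row input; the "text" column name is assumed.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("urdu_roberta_ner_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.show(truncate = False)
+```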
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|urdu_roberta_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|803.6 MB| + +## References + +https://huggingface.co/mirfan899/urdu-roberta-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-url_relevance_en.md b/docs/_posts/ahmedlone127/2024-09-19-url_relevance_en.md new file mode 100644 index 00000000000000..5d87354a44620f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-url_relevance_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English url_relevance DistilBertForSequenceClassification from PDAP +author: John Snow Labs +name: url_relevance +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`url_relevance` is a English model originally trained by PDAP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/url_relevance_en_5.5.0_3.0_1726763829864.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/url_relevance_en_5.5.0_3.0_1726763829864.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("url_relevance","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("url_relevance", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|url_relevance| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|253.9 MB| + +## References + +https://huggingface.co/PDAP/url-relevance \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_base_indonesian_rizka_id.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_base_indonesian_rizka_id.md new file mode 100644 index 00000000000000..61cb69cda8ebd8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_base_indonesian_rizka_id.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Indonesian whisper_base_indonesian_rizka WhisperForCTC from Rizka +author: John Snow Labs +name: whisper_base_indonesian_rizka +date: 2024-09-19 +tags: [id, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_indonesian_rizka` is a Indonesian model originally trained by Rizka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_indonesian_rizka_id_5.5.0_3.0_1726759797260.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_indonesian_rizka_id_5.5.0_3.0_1726759797260.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame with an "audio_content" column
+# holding each utterance as an array of floats.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_indonesian_rizka","id") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame with an "audio_content" column of float arrays.
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_indonesian_rizka", "id")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_indonesian_rizka| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|id| +|Size:|642.1 MB| + +## References + +https://huggingface.co/Rizka/whisper-base-id \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_base_indonesian_rizka_pipeline_id.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_base_indonesian_rizka_pipeline_id.md new file mode 100644 index 00000000000000..0a32fe682960f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_base_indonesian_rizka_pipeline_id.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Indonesian whisper_base_indonesian_rizka_pipeline pipeline WhisperForCTC from Rizka +author: John Snow Labs +name: whisper_base_indonesian_rizka_pipeline +date: 2024-09-19 +tags: [id, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_indonesian_rizka_pipeline` is a Indonesian model originally trained by Rizka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_indonesian_rizka_pipeline_id_5.5.0_3.0_1726759830534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_indonesian_rizka_pipeline_id_5.5.0_3.0_1726759830534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_indonesian_rizka_pipeline", lang = "id") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_indonesian_rizka_pipeline", lang = "id") +val annotations = pipeline.transform(df) + +``` +
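+
+The `df` referenced above is not defined in the snippet; below is a minimal, hedged sketch of preparing an input DataFrame and running the pipeline. The `audio_content` column name and the float-array format are assumptions based on the AudioAssembler stage listed under "Included Models" below.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()  # Spark NLP-enabled SparkSession
+
+# Illustrative input: one utterance as an array of floats (placeholder samples;
+# in practice pass the samples of a real 16 kHz mono recording).
+raw_audio = [0.0] * 16000
+df = spark.createDataFrame([[raw_audio]]).toDF("audio_content")
+
+pipeline = PretrainedPipeline("whisper_base_indonesian_rizka_pipeline", lang = "id")
+annotations = pipeline.transform(df)
+annotations.show(truncate = False)
+```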
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_indonesian_rizka_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|id| +|Size:|642.1 MB| + +## References + +https://huggingface.co/Rizka/whisper-base-id + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_big_kmon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_big_kmon_pipeline_en.md new file mode 100644 index 00000000000000..249947ac0941a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_big_kmon_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_big_kmon_pipeline pipeline WhisperForCTC from susmitabhatt +author: John Snow Labs +name: whisper_big_kmon_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_big_kmon_pipeline` is a English model originally trained by susmitabhatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_big_kmon_pipeline_en_5.5.0_3.0_1726757913058.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_big_kmon_pipeline_en_5.5.0_3.0_1726757913058.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_big_kmon_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_big_kmon_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
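+
+The `df` referenced above is not defined in the snippet; below is a minimal, hedged sketch of preparing an input DataFrame and running the pipeline. The `audio_content` column name and the float-array format are assumptions based on the AudioAssembler stage listed under "Included Models" below.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()  # Spark NLP-enabled SparkSession
+
+# Illustrative input: one utterance as an array of floats (placeholder samples).
+raw_audio = [0.0] * 16000
+df = spark.createDataFrame([[raw_audio]]).toDF("audio_content")
+
+pipeline = PretrainedPipeline("whisper_big_kmon_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.show(truncate = False)
+```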
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_big_kmon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/susmitabhatt/whisper-big-kmon + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_jrb_small_tamil_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_jrb_small_tamil_pipeline_en.md new file mode 100644 index 00000000000000..225cb584b105ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_jrb_small_tamil_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_jrb_small_tamil_pipeline pipeline WhisperForCTC from jbatista79 +author: John Snow Labs +name: whisper_jrb_small_tamil_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_jrb_small_tamil_pipeline` is a English model originally trained by jbatista79. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_jrb_small_tamil_pipeline_en_5.5.0_3.0_1726716270724.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_jrb_small_tamil_pipeline_en_5.5.0_3.0_1726716270724.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_jrb_small_tamil_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_jrb_small_tamil_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
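+
+The `df` referenced above is not defined in the snippet; below is a minimal, hedged sketch of preparing an input DataFrame and running the pipeline. The `audio_content` column name and the float-array format are assumptions based on the AudioAssembler stage listed under "Included Models" below.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()  # Spark NLP-enabled SparkSession
+
+# Illustrative input: one utterance as an array of floats (placeholder samples).
+raw_audio = [0.0] * 16000
+df = spark.createDataFrame([[raw_audio]]).toDF("audio_content")
+
+pipeline = PretrainedPipeline("whisper_jrb_small_tamil_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.show(truncate = False)
+```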
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_jrb_small_tamil_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/jbatista79/whisper-jrb-small-ta + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_nak_01_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_nak_01_en.md new file mode 100644 index 00000000000000..e25881e7a95754 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_nak_01_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_nak_01 WhisperForCTC from rizer0 +author: John Snow Labs +name: whisper_nak_01 +date: 2024-09-19 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_nak_01` is a English model originally trained by rizer0. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_nak_01_en_5.5.0_3.0_1726715198411.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_nak_01_en_5.5.0_3.0_1726715198411.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame with an "audio_content" column
+# holding each utterance as an array of floats.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_nak_01","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame with an "audio_content" column of float arrays.
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_nak_01", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_nak_01| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|639.9 MB| + +## References + +https://huggingface.co/rizer0/whisper_nak_01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_romanian_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_romanian_en.md new file mode 100644 index 00000000000000..856356281fb17b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_romanian_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_romanian WhisperForCTC from readerbench +author: John Snow Labs +name: whisper_romanian +date: 2024-09-19 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_romanian` is a English model originally trained by readerbench. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_romanian_en_5.5.0_3.0_1726758952893.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_romanian_en_5.5.0_3.0_1726758952893.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame with an "audio_content" column
+# holding each utterance as an array of floats.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_romanian","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame with an "audio_content" column of float arrays.
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_romanian", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_romanian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/readerbench/whisper-ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_romanian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_romanian_pipeline_en.md new file mode 100644 index 00000000000000..aa5f5ac9077778 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_romanian_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_romanian_pipeline pipeline WhisperForCTC from readerbench +author: John Snow Labs +name: whisper_romanian_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_romanian_pipeline` is a English model originally trained by readerbench. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_romanian_pipeline_en_5.5.0_3.0_1726759039372.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_romanian_pipeline_en_5.5.0_3.0_1726759039372.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_romanian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_romanian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
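+
+The `df` referenced above is not defined in the snippet; below is a minimal, hedged sketch of preparing an input DataFrame and running the pipeline. The `audio_content` column name and the float-array format are assumptions based on the AudioAssembler stage listed under "Included Models" below.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()  # Spark NLP-enabled SparkSession
+
+# Illustrative input: one utterance as an array of floats (placeholder samples).
+raw_audio = [0.0] * 16000
+df = spark.createDataFrame([[raw_audio]]).toDF("audio_content")
+
+pipeline = PretrainedPipeline("whisper_romanian_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.show(truncate = False)
+```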
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_romanian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/readerbench/whisper-ro + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_slovak_small_augmented_pipeline_sk.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_slovak_small_augmented_pipeline_sk.md new file mode 100644 index 00000000000000..1e9d7fefcde8d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_slovak_small_augmented_pipeline_sk.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Slovak whisper_slovak_small_augmented_pipeline pipeline WhisperForCTC from ALM +author: John Snow Labs +name: whisper_slovak_small_augmented_pipeline +date: 2024-09-19 +tags: [sk, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: sk +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_slovak_small_augmented_pipeline` is a Slovak model originally trained by ALM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_slovak_small_augmented_pipeline_sk_5.5.0_3.0_1726787879617.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_slovak_small_augmented_pipeline_sk_5.5.0_3.0_1726787879617.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_slovak_small_augmented_pipeline", lang = "sk") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_slovak_small_augmented_pipeline", lang = "sk") +val annotations = pipeline.transform(df) + +``` +
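+
+The `df` referenced above is not defined in the snippet; below is a minimal, hedged sketch of preparing an input DataFrame and running the pipeline. The `audio_content` column name and the float-array format are assumptions based on the AudioAssembler stage listed under "Included Models" below.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()  # Spark NLP-enabled SparkSession
+
+# Illustrative input: one utterance as an array of floats (placeholder samples).
+raw_audio = [0.0] * 16000
+df = spark.createDataFrame([[raw_audio]]).toDF("audio_content")
+
+pipeline = PretrainedPipeline("whisper_slovak_small_augmented_pipeline", lang = "sk")
+annotations = pipeline.transform(df)
+annotations.show(truncate = False)
+```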
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_slovak_small_augmented_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sk| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ALM/whisper-sk-small-augmented + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_arabic_mohammadjamalaldeen_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_arabic_mohammadjamalaldeen_pipeline_ar.md new file mode 100644 index 00000000000000..1d7ca906093f93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_arabic_mohammadjamalaldeen_pipeline_ar.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Arabic whisper_small_arabic_mohammadjamalaldeen_pipeline pipeline WhisperForCTC from MohammadJamalaldeen +author: John Snow Labs +name: whisper_small_arabic_mohammadjamalaldeen_pipeline +date: 2024-09-19 +tags: [ar, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_mohammadjamalaldeen_pipeline` is a Arabic model originally trained by MohammadJamalaldeen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_mohammadjamalaldeen_pipeline_ar_5.5.0_3.0_1726714695319.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_mohammadjamalaldeen_pipeline_ar_5.5.0_3.0_1726714695319.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_arabic_mohammadjamalaldeen_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_arabic_mohammadjamalaldeen_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
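+
+The `df` referenced above is not defined in the snippet; below is a minimal, hedged sketch of preparing an input DataFrame and running the pipeline. The `audio_content` column name and the float-array format are assumptions based on the AudioAssembler stage listed under "Included Models" below.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()  # Spark NLP-enabled SparkSession
+
+# Illustrative input: one utterance as an array of floats (placeholder samples).
+raw_audio = [0.0] * 16000
+df = spark.createDataFrame([[raw_audio]]).toDF("audio_content")
+
+pipeline = PretrainedPipeline("whisper_small_arabic_mohammadjamalaldeen_pipeline", lang = "ar")
+annotations = pipeline.transform(df)
+annotations.show(truncate = False)
+```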
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_mohammadjamalaldeen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/MohammadJamalaldeen/whisper-small-arabic + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_arabic_noise2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_arabic_noise2_pipeline_en.md new file mode 100644 index 00000000000000..e2b7caca3c1763 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_arabic_noise2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_arabic_noise2_pipeline pipeline WhisperForCTC from MohammedNasri +author: John Snow Labs +name: whisper_small_arabic_noise2_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_noise2_pipeline` is a English model originally trained by MohammedNasri. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_noise2_pipeline_en_5.5.0_3.0_1726788665983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_noise2_pipeline_en_5.5.0_3.0_1726788665983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_arabic_noise2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_arabic_noise2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
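+
+The `df` referenced above is not defined in the snippet; below is a minimal, hedged sketch of preparing an input DataFrame and running the pipeline. The `audio_content` column name and the float-array format are assumptions based on the AudioAssembler stage listed under "Included Models" below.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()  # Spark NLP-enabled SparkSession
+
+# Illustrative input: one utterance as an array of floats (placeholder samples).
+raw_audio = [0.0] * 16000
+df = spark.createDataFrame([[raw_audio]]).toDF("audio_content")
+
+pipeline = PretrainedPipeline("whisper_small_arabic_noise2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.show(truncate = False)
+```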
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_noise2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/MohammedNasri/whisper-small-ar-Noise2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_commonvoice_english_indacc_reduce_lr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_commonvoice_english_indacc_reduce_lr_pipeline_en.md new file mode 100644 index 00000000000000..10f25fb411fc63 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_commonvoice_english_indacc_reduce_lr_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_commonvoice_english_indacc_reduce_lr_pipeline pipeline WhisperForCTC from lauratomokiyo +author: John Snow Labs +name: whisper_small_commonvoice_english_indacc_reduce_lr_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_commonvoice_english_indacc_reduce_lr_pipeline` is a English model originally trained by lauratomokiyo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_commonvoice_english_indacc_reduce_lr_pipeline_en_5.5.0_3.0_1726713917753.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_commonvoice_english_indacc_reduce_lr_pipeline_en_5.5.0_3.0_1726713917753.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_commonvoice_english_indacc_reduce_lr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_commonvoice_english_indacc_reduce_lr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
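+
+The `df` referenced above is not defined in the snippet; below is a minimal, hedged sketch of preparing an input DataFrame and running the pipeline. The `audio_content` column name and the float-array format are assumptions based on the AudioAssembler stage listed under "Included Models" below.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()  # Spark NLP-enabled SparkSession
+
+# Illustrative input: one utterance as an array of floats (placeholder samples).
+raw_audio = [0.0] * 16000
+df = spark.createDataFrame([[raw_audio]]).toDF("audio_content")
+
+pipeline = PretrainedPipeline("whisper_small_commonvoice_english_indacc_reduce_lr_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.show(truncate = False)
+```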
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_commonvoice_english_indacc_reduce_lr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/lauratomokiyo/whisper-small-commonvoice-english-indacc-reduce_lr + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_cv11_french_fr.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_cv11_french_fr.md new file mode 100644 index 00000000000000..4dc52adc84c1b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_cv11_french_fr.md @@ -0,0 +1,84 @@ +--- +layout: model +title: French whisper_small_cv11_french WhisperForCTC from bofenghuang +author: John Snow Labs +name: whisper_small_cv11_french +date: 2024-09-19 +tags: [fr, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_cv11_french` is a French model originally trained by bofenghuang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_cv11_french_fr_5.5.0_3.0_1726757212069.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_cv11_french_fr_5.5.0_3.0_1726757212069.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame with an "audio_content" column
+# holding each utterance as an array of floats.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_cv11_french","fr") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame with an "audio_content" column of float arrays.
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_cv11_french", "fr")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_cv11_french| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|fr| +|Size:|1.1 GB| + +## References + +https://huggingface.co/bofenghuang/whisper-small-cv11-french \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_divehi_frorozcol_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_divehi_frorozcol_en.md new file mode 100644 index 00000000000000..3a7eadf41223b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_divehi_frorozcol_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_divehi_frorozcol WhisperForCTC from Frorozcol +author: John Snow Labs +name: whisper_small_divehi_frorozcol +date: 2024-09-19 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_frorozcol` is a English model originally trained by Frorozcol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_frorozcol_en_5.5.0_3.0_1726714515728.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_frorozcol_en_5.5.0_3.0_1726714515728.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame with an "audio_content" column
+# holding each utterance as an array of floats.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_divehi_frorozcol","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame with an "audio_content" column of float arrays.
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_divehi_frorozcol", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_frorozcol| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/Frorozcol/whisper-small-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_divehi_frorozcol_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_divehi_frorozcol_pipeline_en.md new file mode 100644 index 00000000000000..9d073894423f4a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_divehi_frorozcol_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_divehi_frorozcol_pipeline pipeline WhisperForCTC from Frorozcol +author: John Snow Labs +name: whisper_small_divehi_frorozcol_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_frorozcol_pipeline` is a English model originally trained by Frorozcol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_frorozcol_pipeline_en_5.5.0_3.0_1726714535609.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_frorozcol_pipeline_en_5.5.0_3.0_1726714535609.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_divehi_frorozcol_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_divehi_frorozcol_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
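+
+The `df` referenced above is not defined in the snippet; below is a minimal, hedged sketch of preparing an input DataFrame and running the pipeline. The `audio_content` column name and the float-array format are assumptions based on the AudioAssembler stage listed under "Included Models" below.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()  # Spark NLP-enabled SparkSession
+
+# Illustrative input: one utterance as an array of floats (placeholder samples).
+raw_audio = [0.0] * 16000
+df = spark.createDataFrame([[raw_audio]]).toDF("audio_content")
+
+pipeline = PretrainedPipeline("whisper_small_divehi_frorozcol_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.show(truncate = False)
+```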
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_frorozcol_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/Frorozcol/whisper-small-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_divehi_jackoyoungblood_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_divehi_jackoyoungblood_en.md new file mode 100644 index 00000000000000..07aa1c1b49817a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_divehi_jackoyoungblood_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_divehi_jackoyoungblood WhisperForCTC from jackoyoungblood +author: John Snow Labs +name: whisper_small_divehi_jackoyoungblood +date: 2024-09-19 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_jackoyoungblood` is a English model originally trained by jackoyoungblood. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_jackoyoungblood_en_5.5.0_3.0_1726759883402.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_jackoyoungblood_en_5.5.0_3.0_1726759883402.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame with an "audio_content" column
+# holding each utterance as an array of floats.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_divehi_jackoyoungblood","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame with an "audio_content" column of float arrays.
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_divehi_jackoyoungblood", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_jackoyoungblood| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/jackoyoungblood/whisper-small-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_divehi_jackoyoungblood_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_divehi_jackoyoungblood_pipeline_en.md new file mode 100644 index 00000000000000..378a49811382fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_divehi_jackoyoungblood_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_divehi_jackoyoungblood_pipeline pipeline WhisperForCTC from jackoyoungblood +author: John Snow Labs +name: whisper_small_divehi_jackoyoungblood_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_jackoyoungblood_pipeline` is a English model originally trained by jackoyoungblood. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_jackoyoungblood_pipeline_en_5.5.0_3.0_1726759966409.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_jackoyoungblood_pipeline_en_5.5.0_3.0_1726759966409.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_divehi_jackoyoungblood_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_divehi_jackoyoungblood_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
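+
+The `df` referenced above is not defined in the snippet; below is a minimal, hedged sketch of preparing an input DataFrame and running the pipeline. The `audio_content` column name and the float-array format are assumptions based on the AudioAssembler stage listed under "Included Models" below.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()  # Spark NLP-enabled SparkSession
+
+# Illustrative input: one utterance as an array of floats (placeholder samples).
+raw_audio = [0.0] * 16000
+df = spark.createDataFrame([[raw_audio]]).toDF("audio_content")
+
+pipeline = PretrainedPipeline("whisper_small_divehi_jackoyoungblood_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.show(truncate = False)
+```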
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_jackoyoungblood_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/jackoyoungblood/whisper-small-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_eg_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_eg_pipeline_ar.md new file mode 100644 index 00000000000000..3dbda784895203 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_eg_pipeline_ar.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Arabic whisper_small_eg_pipeline pipeline WhisperForCTC from abuelnasr +author: John Snow Labs +name: whisper_small_eg_pipeline +date: 2024-09-19 +tags: [ar, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_eg_pipeline` is a Arabic model originally trained by abuelnasr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_eg_pipeline_ar_5.5.0_3.0_1726788836844.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_eg_pipeline_ar_5.5.0_3.0_1726788836844.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_eg_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_eg_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
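+
+The `df` referenced above is not defined in the snippet; below is a minimal, hedged sketch of preparing an input DataFrame and running the pipeline. The `audio_content` column name and the float-array format are assumptions based on the AudioAssembler stage listed under "Included Models" below.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()  # Spark NLP-enabled SparkSession
+
+# Illustrative input: one utterance as an array of floats (placeholder samples).
+raw_audio = [0.0] * 16000
+df = spark.createDataFrame([[raw_audio]]).toDF("audio_content")
+
+pipeline = PretrainedPipeline("whisper_small_eg_pipeline", lang = "ar")
+annotations = pipeline.transform(df)
+annotations.show(truncate = False)
+```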
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_eg_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/abuelnasr/whisper-small-eg + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_hausa_seon25_pipeline_ha.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_hausa_seon25_pipeline_ha.md new file mode 100644 index 00000000000000..bbd2b4ee32f562 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_hausa_seon25_pipeline_ha.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hausa whisper_small_hausa_seon25_pipeline pipeline WhisperForCTC from Seon25 +author: John Snow Labs +name: whisper_small_hausa_seon25_pipeline +date: 2024-09-19 +tags: [ha, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ha +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hausa_seon25_pipeline` is a Hausa model originally trained by Seon25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hausa_seon25_pipeline_ha_5.5.0_3.0_1726714465701.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hausa_seon25_pipeline_ha_5.5.0_3.0_1726714465701.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_hausa_seon25_pipeline", lang = "ha") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_hausa_seon25_pipeline", lang = "ha") +val annotations = pipeline.transform(df) + +``` +
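+
+The `df` referenced above is not defined in the snippet; below is a minimal, hedged sketch of preparing an input DataFrame and running the pipeline. The `audio_content` column name and the float-array format are assumptions based on the AudioAssembler stage listed under "Included Models" below.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()  # Spark NLP-enabled SparkSession
+
+# Illustrative input: one utterance as an array of floats (placeholder samples).
+raw_audio = [0.0] * 16000
+df = spark.createDataFrame([[raw_audio]]).toDF("audio_content")
+
+pipeline = PretrainedPipeline("whisper_small_hausa_seon25_pipeline", lang = "ha")
+annotations = pipeline.transform(df)
+annotations.show(truncate = False)
+```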
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hausa_seon25_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ha| +|Size:|641.7 MB| + +## References + +https://huggingface.co/Seon25/whisper-small-ha + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_hindi_zemans_hi.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_hindi_zemans_hi.md new file mode 100644 index 00000000000000..f3ff6db81e5da0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_hindi_zemans_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_small_hindi_zemans WhisperForCTC from Zemans +author: John Snow Labs +name: whisper_small_hindi_zemans +date: 2024-09-19 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_zemans` is a Hindi model originally trained by Zemans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_zemans_hi_5.5.0_3.0_1726755414843.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_zemans_hi_5.5.0_3.0_1726755414843.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame with an "audio_content" column
+# holding each utterance as an array of floats.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_hindi_zemans","hi") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame with an "audio_content" column of float arrays.
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_zemans", "hi")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_zemans| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Zemans/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_icelandic_is.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_icelandic_is.md new file mode 100644 index 00000000000000..8071e50457b21f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_icelandic_is.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Icelandic whisper_small_icelandic WhisperForCTC from Valdimarb13 +author: John Snow Labs +name: whisper_small_icelandic +date: 2024-09-19 +tags: [is, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: is +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_icelandic` is a Icelandic model originally trained by Valdimarb13. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_icelandic_is_5.5.0_3.0_1726758164834.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_icelandic_is_5.5.0_3.0_1726758164834.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame with an "audio_content" column
+# holding each utterance as an array of floats.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_icelandic","is") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame with an "audio_content" column of float arrays.
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_icelandic", "is")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_icelandic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|is| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Valdimarb13/whisper-small-icelandic \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_icelandic_pipeline_is.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_icelandic_pipeline_is.md new file mode 100644 index 00000000000000..a10430f0b688db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_icelandic_pipeline_is.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Icelandic whisper_small_icelandic_pipeline pipeline WhisperForCTC from Valdimarb13 +author: John Snow Labs +name: whisper_small_icelandic_pipeline +date: 2024-09-19 +tags: [is, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: is +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_icelandic_pipeline` is a Icelandic model originally trained by Valdimarb13. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_icelandic_pipeline_is_5.5.0_3.0_1726758248424.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_icelandic_pipeline_is_5.5.0_3.0_1726758248424.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is a DataFrame of raw audio floats in an "audio_content" column (as expected by the included AudioAssembler)
+pipeline = PretrainedPipeline("whisper_small_icelandic_pipeline", lang = "is")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is a DataFrame of raw audio floats in an "audio_content" column (as expected by the included AudioAssembler)
+val pipeline = new PretrainedPipeline("whisper_small_icelandic_pipeline", lang = "is")
+val annotations = pipeline.transform(df)
+
+```
+
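+
+`df` is not constructed in the snippet above. A brief sketch, assuming the included AudioAssembler reads raw audio floats from an `audio_content` column and that the output column is named `text` as in the standalone model card:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("whisper_small_icelandic_pipeline", lang = "is")
+
+# Hypothetical one-row input: a placeholder second of silence at an assumed 16 kHz rate.
+raw_floats = [0.0] * 16000
+df = spark.createDataFrame([[raw_floats]]).toDF("audio_content")
+
+annotations = pipeline.transform(df)
+annotations.select("text.result").show(truncate=False)
+```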
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_icelandic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|is| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Valdimarb13/whisper-small-icelandic + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_indonesian_rizka_pipeline_id.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_indonesian_rizka_pipeline_id.md new file mode 100644 index 00000000000000..c8607bef7c6a09 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_indonesian_rizka_pipeline_id.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Indonesian whisper_small_indonesian_rizka_pipeline pipeline WhisperForCTC from Rizka +author: John Snow Labs +name: whisper_small_indonesian_rizka_pipeline +date: 2024-09-19 +tags: [id, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_indonesian_rizka_pipeline` is a Indonesian model originally trained by Rizka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_rizka_pipeline_id_5.5.0_3.0_1726756129702.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_rizka_pipeline_id_5.5.0_3.0_1726756129702.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is a DataFrame of raw audio floats in an "audio_content" column (as expected by the included AudioAssembler)
+pipeline = PretrainedPipeline("whisper_small_indonesian_rizka_pipeline", lang = "id")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is a DataFrame of raw audio floats in an "audio_content" column (as expected by the included AudioAssembler)
+val pipeline = new PretrainedPipeline("whisper_small_indonesian_rizka_pipeline", lang = "id")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_indonesian_rizka_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|id| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Rizka/whisper-small-id + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_llm_lingo_p_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_llm_lingo_p_pipeline_en.md new file mode 100644 index 00000000000000..b23c3dd10ea69c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_llm_lingo_p_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_llm_lingo_p_pipeline pipeline WhisperForCTC from Enagamirzayev +author: John Snow Labs +name: whisper_small_llm_lingo_p_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_llm_lingo_p_pipeline` is a English model originally trained by Enagamirzayev. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_llm_lingo_p_pipeline_en_5.5.0_3.0_1726716511302.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_llm_lingo_p_pipeline_en_5.5.0_3.0_1726716511302.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is a DataFrame of raw audio floats in an "audio_content" column (as expected by the included AudioAssembler)
+pipeline = PretrainedPipeline("whisper_small_llm_lingo_p_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is a DataFrame of raw audio floats in an "audio_content" column (as expected by the included AudioAssembler)
+val pipeline = new PretrainedPipeline("whisper_small_llm_lingo_p_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_llm_lingo_p_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Enagamirzayev/whisper-small-llm-lingo_p + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_nepal_bhasa_russian_polish_bulgarian_a_bg.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_nepal_bhasa_russian_polish_bulgarian_a_bg.md new file mode 100644 index 00000000000000..f4048cb8c990cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_nepal_bhasa_russian_polish_bulgarian_a_bg.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Bulgarian whisper_small_nepal_bhasa_russian_polish_bulgarian_a WhisperForCTC from Maks545curve +author: John Snow Labs +name: whisper_small_nepal_bhasa_russian_polish_bulgarian_a +date: 2024-09-19 +tags: [bg, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: bg +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_nepal_bhasa_russian_polish_bulgarian_a` is a Bulgarian model originally trained by Maks545curve. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_nepal_bhasa_russian_polish_bulgarian_a_bg_5.5.0_3.0_1726758283849.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_nepal_bhasa_russian_polish_bulgarian_a_bg_5.5.0_3.0_1726758283849.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is a DataFrame with the raw audio as an array of floats in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_nepal_bhasa_russian_polish_bulgarian_a","bg") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is a DataFrame with the raw audio as an array of floats in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_nepal_bhasa_russian_polish_bulgarian_a", "bg")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_nepal_bhasa_russian_polish_bulgarian_a| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|bg| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Maks545curve/whisper-small-new-ru-pl-bg-a \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_nepal_bhasa_russian_polish_bulgarian_a_pipeline_bg.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_nepal_bhasa_russian_polish_bulgarian_a_pipeline_bg.md new file mode 100644 index 00000000000000..d06dc7908ee65e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_nepal_bhasa_russian_polish_bulgarian_a_pipeline_bg.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Bulgarian whisper_small_nepal_bhasa_russian_polish_bulgarian_a_pipeline pipeline WhisperForCTC from Maks545curve +author: John Snow Labs +name: whisper_small_nepal_bhasa_russian_polish_bulgarian_a_pipeline +date: 2024-09-19 +tags: [bg, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: bg +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_nepal_bhasa_russian_polish_bulgarian_a_pipeline` is a Bulgarian model originally trained by Maks545curve. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_nepal_bhasa_russian_polish_bulgarian_a_pipeline_bg_5.5.0_3.0_1726758375721.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_nepal_bhasa_russian_polish_bulgarian_a_pipeline_bg_5.5.0_3.0_1726758375721.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is a DataFrame of raw audio floats in an "audio_content" column (as expected by the included AudioAssembler)
+pipeline = PretrainedPipeline("whisper_small_nepal_bhasa_russian_polish_bulgarian_a_pipeline", lang = "bg")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is a DataFrame of raw audio floats in an "audio_content" column (as expected by the included AudioAssembler)
+val pipeline = new PretrainedPipeline("whisper_small_nepal_bhasa_russian_polish_bulgarian_a_pipeline", lang = "bg")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_nepal_bhasa_russian_polish_bulgarian_a_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bg| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Maks545curve/whisper-small-new-ru-pl-bg-a + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_yoruba_07_17_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_yoruba_07_17_en.md new file mode 100644 index 00000000000000..d275f1998c4d3e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_yoruba_07_17_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_yoruba_07_17 WhisperForCTC from ccibeekeoc42 +author: John Snow Labs +name: whisper_small_yoruba_07_17 +date: 2024-09-19 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_yoruba_07_17` is a English model originally trained by ccibeekeoc42. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_yoruba_07_17_en_5.5.0_3.0_1726759978053.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_yoruba_07_17_en_5.5.0_3.0_1726759978053.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is a DataFrame with the raw audio as an array of floats in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_yoruba_07_17","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is a DataFrame with the raw audio as an array of floats in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_yoruba_07_17", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_yoruba_07_17| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ccibeekeoc42/whisper-small-yoruba-07-17 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_yoruba_07_17_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_yoruba_07_17_pipeline_en.md new file mode 100644 index 00000000000000..3e9ca824cdaf4a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_yoruba_07_17_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_yoruba_07_17_pipeline pipeline WhisperForCTC from ccibeekeoc42 +author: John Snow Labs +name: whisper_small_yoruba_07_17_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_yoruba_07_17_pipeline` is a English model originally trained by ccibeekeoc42. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_yoruba_07_17_pipeline_en_5.5.0_3.0_1726760064384.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_yoruba_07_17_pipeline_en_5.5.0_3.0_1726760064384.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is a DataFrame of raw audio floats in an "audio_content" column (as expected by the included AudioAssembler)
+pipeline = PretrainedPipeline("whisper_small_yoruba_07_17_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is a DataFrame of raw audio floats in an "audio_content" column (as expected by the included AudioAssembler)
+val pipeline = new PretrainedPipeline("whisper_small_yoruba_07_17_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_yoruba_07_17_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ccibeekeoc42/whisper-small-yoruba-07-17 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_english_atco2_asr_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_english_atco2_asr_en.md new file mode 100644 index 00000000000000..be9a8bccecdc46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_english_atco2_asr_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_english_atco2_asr WhisperForCTC from jlvdoorn +author: John Snow Labs +name: whisper_tiny_english_atco2_asr +date: 2024-09-19 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_atco2_asr` is a English model originally trained by jlvdoorn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_atco2_asr_en_5.5.0_3.0_1726788133454.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_atco2_asr_en_5.5.0_3.0_1726788133454.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is a DataFrame with the raw audio as an array of floats in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_tiny_english_atco2_asr","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is a DataFrame with the raw audio as an array of floats in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_tiny_english_atco2_asr", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_atco2_asr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|393.9 MB| + +## References + +https://huggingface.co/jlvdoorn/whisper-tiny.en-atco2-asr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_english_codingqueen13_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_english_codingqueen13_en.md new file mode 100644 index 00000000000000..d435243e054fd1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_english_codingqueen13_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_english_codingqueen13 WhisperForCTC from CodingQueen13 +author: John Snow Labs +name: whisper_tiny_english_codingqueen13 +date: 2024-09-19 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_codingqueen13` is a English model originally trained by CodingQueen13. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_codingqueen13_en_5.5.0_3.0_1726759756796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_codingqueen13_en_5.5.0_3.0_1726759756796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is a DataFrame with the raw audio as an array of floats in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_tiny_english_codingqueen13","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is a DataFrame with the raw audio as an array of floats in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_tiny_english_codingqueen13", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_codingqueen13| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/CodingQueen13/whisper-tiny-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_english_codingqueen13_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_english_codingqueen13_pipeline_en.md new file mode 100644 index 00000000000000..8c6d4972f3916e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_english_codingqueen13_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_english_codingqueen13_pipeline pipeline WhisperForCTC from CodingQueen13 +author: John Snow Labs +name: whisper_tiny_english_codingqueen13_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_codingqueen13_pipeline` is a English model originally trained by CodingQueen13. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_codingqueen13_pipeline_en_5.5.0_3.0_1726759777845.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_codingqueen13_pipeline_en_5.5.0_3.0_1726759777845.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is a DataFrame of raw audio floats in an "audio_content" column (as expected by the included AudioAssembler)
+pipeline = PretrainedPipeline("whisper_tiny_english_codingqueen13_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is a DataFrame of raw audio floats in an "audio_content" column (as expected by the included AudioAssembler)
+val pipeline = new PretrainedPipeline("whisper_tiny_english_codingqueen13_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_codingqueen13_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/CodingQueen13/whisper-tiny-en + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_english_minds14_magnustragardh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_english_minds14_magnustragardh_pipeline_en.md new file mode 100644 index 00000000000000..73008df602f64a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_english_minds14_magnustragardh_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_english_minds14_magnustragardh_pipeline pipeline WhisperForCTC from magnustragardh +author: John Snow Labs +name: whisper_tiny_english_minds14_magnustragardh_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_minds14_magnustragardh_pipeline` is a English model originally trained by magnustragardh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_minds14_magnustragardh_pipeline_en_5.5.0_3.0_1726789018902.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_minds14_magnustragardh_pipeline_en_5.5.0_3.0_1726789018902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is a DataFrame of raw audio floats in an "audio_content" column (as expected by the included AudioAssembler)
+pipeline = PretrainedPipeline("whisper_tiny_english_minds14_magnustragardh_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is a DataFrame of raw audio floats in an "audio_content" column (as expected by the included AudioAssembler)
+val pipeline = new PretrainedPipeline("whisper_tiny_english_minds14_magnustragardh_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_minds14_magnustragardh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/magnustragardh/whisper-tiny-en-minds14 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_finetune_hindi_fleurs_hi.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_finetune_hindi_fleurs_hi.md new file mode 100644 index 00000000000000..674b74d184be20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_finetune_hindi_fleurs_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_tiny_finetune_hindi_fleurs WhisperForCTC from Aryan-401 +author: John Snow Labs +name: whisper_tiny_finetune_hindi_fleurs +date: 2024-09-19 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_finetune_hindi_fleurs` is a Hindi model originally trained by Aryan-401. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetune_hindi_fleurs_hi_5.5.0_3.0_1726714588010.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetune_hindi_fleurs_hi_5.5.0_3.0_1726714588010.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is a DataFrame with the raw audio as an array of floats in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_tiny_finetune_hindi_fleurs","hi") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is a DataFrame with the raw audio as an array of floats in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_tiny_finetune_hindi_fleurs", "hi")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_finetune_hindi_fleurs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|390.0 MB| + +## References + +https://huggingface.co/Aryan-401/whisper-tiny-finetune-hindi-fleurs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_indonesian_cahya_id.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_indonesian_cahya_id.md new file mode 100644 index 00000000000000..674670158389b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_indonesian_cahya_id.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Indonesian whisper_tiny_indonesian_cahya WhisperForCTC from cahya +author: John Snow Labs +name: whisper_tiny_indonesian_cahya +date: 2024-09-19 +tags: [id, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_indonesian_cahya` is a Indonesian model originally trained by cahya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_indonesian_cahya_id_5.5.0_3.0_1726712665653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_indonesian_cahya_id_5.5.0_3.0_1726712665653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is a DataFrame with the raw audio as an array of floats in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_tiny_indonesian_cahya","id") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is a DataFrame with the raw audio as an array of floats in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_tiny_indonesian_cahya", "id")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_indonesian_cahya| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|id| +|Size:|389.8 MB| + +## References + +https://huggingface.co/cahya/whisper-tiny-id \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_italian_11_it.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_italian_11_it.md new file mode 100644 index 00000000000000..3e622a4201d4a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_italian_11_it.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Italian whisper_tiny_italian_11 WhisperForCTC from FCameCode +author: John Snow Labs +name: whisper_tiny_italian_11 +date: 2024-09-19 +tags: [it, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_italian_11` is a Italian model originally trained by FCameCode. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_italian_11_it_5.5.0_3.0_1726787413831.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_italian_11_it_5.5.0_3.0_1726787413831.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is a DataFrame with the raw audio as an array of floats in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_tiny_italian_11","it") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is a DataFrame with the raw audio as an array of floats in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_tiny_italian_11", "it")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_italian_11| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|it| +|Size:|390.9 MB| + +## References + +https://huggingface.co/FCameCode/whisper-tiny-it-11 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_italian_11_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_italian_11_pipeline_it.md new file mode 100644 index 00000000000000..0a73cbca4d8d64 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_italian_11_pipeline_it.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Italian whisper_tiny_italian_11_pipeline pipeline WhisperForCTC from FCameCode +author: John Snow Labs +name: whisper_tiny_italian_11_pipeline +date: 2024-09-19 +tags: [it, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_italian_11_pipeline` is a Italian model originally trained by FCameCode. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_italian_11_pipeline_it_5.5.0_3.0_1726787434948.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_italian_11_pipeline_it_5.5.0_3.0_1726787434948.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is a DataFrame of raw audio floats in an "audio_content" column (as expected by the included AudioAssembler)
+pipeline = PretrainedPipeline("whisper_tiny_italian_11_pipeline", lang = "it")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is a DataFrame of raw audio floats in an "audio_content" column (as expected by the included AudioAssembler)
+val pipeline = new PretrainedPipeline("whisper_tiny_italian_11_pipeline", lang = "it")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_italian_11_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|390.9 MB| + +## References + +https://huggingface.co/FCameCode/whisper-tiny-it-11 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_minds14_english_chugyouk_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_minds14_english_chugyouk_en.md new file mode 100644 index 00000000000000..39de0821c0d680 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_minds14_english_chugyouk_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_minds14_english_chugyouk WhisperForCTC from ChuGyouk +author: John Snow Labs +name: whisper_tiny_minds14_english_chugyouk +date: 2024-09-19 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_english_chugyouk` is a English model originally trained by ChuGyouk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_chugyouk_en_5.5.0_3.0_1726787453559.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_chugyouk_en_5.5.0_3.0_1726787453559.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is a DataFrame with the raw audio as an array of floats in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_english_chugyouk","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is a DataFrame with the raw audio as an array of floats in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_english_chugyouk", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_english_chugyouk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/ChuGyouk/whisper-tiny-minds14-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_minds14_english_chugyouk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_minds14_english_chugyouk_pipeline_en.md new file mode 100644 index 00000000000000..e32914a30699c8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_minds14_english_chugyouk_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_minds14_english_chugyouk_pipeline pipeline WhisperForCTC from ChuGyouk +author: John Snow Labs +name: whisper_tiny_minds14_english_chugyouk_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_english_chugyouk_pipeline` is a English model originally trained by ChuGyouk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_chugyouk_pipeline_en_5.5.0_3.0_1726787472501.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_chugyouk_pipeline_en_5.5.0_3.0_1726787472501.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is a DataFrame of raw audio floats in an "audio_content" column (as expected by the included AudioAssembler)
+pipeline = PretrainedPipeline("whisper_tiny_minds14_english_chugyouk_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is a DataFrame of raw audio floats in an "audio_content" column (as expected by the included AudioAssembler)
+val pipeline = new PretrainedPipeline("whisper_tiny_minds14_english_chugyouk_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_english_chugyouk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.0 MB| + +## References + +https://huggingface.co/ChuGyouk/whisper-tiny-minds14-en + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_serbian_combined_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_serbian_combined_pipeline_en.md new file mode 100644 index 00000000000000..e91097c9923938 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_serbian_combined_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_serbian_combined_pipeline pipeline WhisperForCTC from cminja +author: John Snow Labs +name: whisper_tiny_serbian_combined_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_serbian_combined_pipeline` is a English model originally trained by cminja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_serbian_combined_pipeline_en_5.5.0_3.0_1726788912059.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_serbian_combined_pipeline_en_5.5.0_3.0_1726788912059.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is a DataFrame of raw audio floats in an "audio_content" column (as expected by the included AudioAssembler)
+pipeline = PretrainedPipeline("whisper_tiny_serbian_combined_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is a DataFrame of raw audio floats in an "audio_content" column (as expected by the included AudioAssembler)
+val pipeline = new PretrainedPipeline("whisper_tiny_serbian_combined_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
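+
+To check that the downloaded pipeline actually contains the stages listed under "Included Models", one can inspect the underlying PipelineModel; this is a sketch, assuming `model` is the attribute PretrainedPipeline uses to expose it:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("whisper_tiny_serbian_combined_pipeline", lang = "en")
+
+# Print the class name of each stage in the downloaded PipelineModel.
+for stage in pipeline.model.stages:
+    print(type(stage).__name__)
+```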
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_serbian_combined_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.0 MB| + +## References + +https://huggingface.co/cminja/whisper-tiny-sr-combined + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-withinapps_ndd_ppma_test_content_tags_cwadj_en.md b/docs/_posts/ahmedlone127/2024-09-19-withinapps_ndd_ppma_test_content_tags_cwadj_en.md new file mode 100644 index 00000000000000..20757d5f8b948a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-withinapps_ndd_ppma_test_content_tags_cwadj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English withinapps_ndd_ppma_test_content_tags_cwadj DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_ppma_test_content_tags_cwadj +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_ppma_test_content_tags_cwadj` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_ppma_test_content_tags_cwadj_en_5.5.0_3.0_1726741023170.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_ppma_test_content_tags_cwadj_en_5.5.0_3.0_1726741023170.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_ppma_test_content_tags_cwadj","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_ppma_test_content_tags_cwadj", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
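+
+To read the prediction back from `pipelineDF` (a short sketch; the actual label strings depend on the fine-tuned model and are not listed in this card):
+
+```python
+# "class" holds the sequence-classification annotations; ".result" is the predicted label per row.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```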
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_ppma_test_content_tags_cwadj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-ppma_test-content_tags-CWAdj \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-withinapps_ndd_ppma_test_content_tags_cwadj_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-withinapps_ndd_ppma_test_content_tags_cwadj_pipeline_en.md new file mode 100644 index 00000000000000..7ef800d3dcdee3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-withinapps_ndd_ppma_test_content_tags_cwadj_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English withinapps_ndd_ppma_test_content_tags_cwadj_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_ppma_test_content_tags_cwadj_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_ppma_test_content_tags_cwadj_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_ppma_test_content_tags_cwadj_pipeline_en_5.5.0_3.0_1726741035737.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_ppma_test_content_tags_cwadj_pipeline_en_5.5.0_3.0_1726741035737.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is a DataFrame with the input text in a "text" column (as expected by the included DocumentAssembler)
+pipeline = PretrainedPipeline("withinapps_ndd_ppma_test_content_tags_cwadj_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is a DataFrame with the input text in a "text" column (as expected by the included DocumentAssembler)
+val pipeline = new PretrainedPipeline("withinapps_ndd_ppma_test_content_tags_cwadj_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
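+
+Unlike the audio pipelines above, this pipeline starts with a DocumentAssembler, so `df` needs the input text in a `text` column. A minimal sketch; the `class` output column name is assumed from the standalone model card:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("withinapps_ndd_ppma_test_content_tags_cwadj_pipeline", lang = "en")
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+annotations.select("text", "class.result").show(truncate=False)
+```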
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_ppma_test_content_tags_cwadj_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-ppma_test-content_tags-CWAdj + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_albiecofie_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_albiecofie_pipeline_en.md new file mode 100644 index 00000000000000..f03029beb13fed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_albiecofie_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_albiecofie_pipeline pipeline XlmRoBertaForSequenceClassification from AlbieCofie +author: John Snow Labs +name: xlm_roberta_base_albiecofie_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_albiecofie_pipeline` is a English model originally trained by AlbieCofie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_albiecofie_pipeline_en_5.5.0_3.0_1726752910422.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_albiecofie_pipeline_en_5.5.0_3.0_1726752910422.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is a DataFrame with the input text in a "text" column (as expected by the included DocumentAssembler)
+pipeline = PretrainedPipeline("xlm_roberta_base_albiecofie_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is a DataFrame with the input text in a "text" column (as expected by the included DocumentAssembler)
+val pipeline = new PretrainedPipeline("xlm_roberta_base_albiecofie_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_albiecofie_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/AlbieCofie/xlm_roberta_base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_all_youngbeauty_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_all_youngbeauty_pipeline_en.md new file mode 100644 index 00000000000000..a8ebfd41e6b720 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_all_youngbeauty_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_youngbeauty_pipeline pipeline XlmRoBertaForTokenClassification from YoungBeauty +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_youngbeauty_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_youngbeauty_pipeline` is a English model originally trained by YoungBeauty. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_youngbeauty_pipeline_en_5.5.0_3.0_1726738367745.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_youngbeauty_pipeline_en_5.5.0_3.0_1726738367745.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is a DataFrame with the input text in a "text" column (as expected by the included DocumentAssembler)
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_youngbeauty_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is a DataFrame with the input text in a "text" column (as expected by the included DocumentAssembler)
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_youngbeauty_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_youngbeauty_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/YoungBeauty/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_english_hhffxx_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_english_hhffxx_pipeline_en.md new file mode 100644 index 00000000000000..39dca5ddcf8102 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_english_hhffxx_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_hhffxx_pipeline pipeline XlmRoBertaForTokenClassification from hhffxx +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_hhffxx_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_hhffxx_pipeline` is a English model originally trained by hhffxx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_hhffxx_pipeline_en_5.5.0_3.0_1726708689902.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_hhffxx_pipeline_en_5.5.0_3.0_1726708689902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active SparkSession with Spark NLP started, e.g. spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_hhffxx_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_hhffxx_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_hhffxx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|837.2 MB| + +## References + +https://huggingface.co/hhffxx/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_english_munsu_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_english_munsu_en.md new file mode 100644 index 00000000000000..df119a9c896117 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_english_munsu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_munsu XlmRoBertaForTokenClassification from MunSu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_munsu +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_munsu` is a English model originally trained by MunSu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_munsu_en_5.5.0_3.0_1726708478300.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_munsu_en_5.5.0_3.0_1726708478300.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active SparkSession with Spark NLP started, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_munsu", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_munsu", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
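
After `transform`, each row of `pipelineDF` carries aligned `token` and `ner` annotation arrays, so tokens and their predicted PAN-X tags can be read back side by side. A minimal sketch, assuming the pipeline above ran as written:

```python
# token.result and ner.result are element-wise aligned arrays:
# one predicted tag per token.
pipelineDF.select("token.result", "ner.result").show(truncate = False)
```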
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_munsu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|859.4 MB| + +## References + +https://huggingface.co/MunSu/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_english_u00890358_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_english_u00890358_en.md new file mode 100644 index 00000000000000..4c12148ee328f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_english_u00890358_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_u00890358 XlmRoBertaForTokenClassification from u00890358 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_u00890358 +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_u00890358` is a English model originally trained by u00890358. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_u00890358_en_5.5.0_3.0_1726711378234.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_u00890358_en_5.5.0_3.0_1726711378234.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active SparkSession with Spark NLP started, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_u00890358", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_u00890358", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_u00890358| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/u00890358/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_french_bluetree99_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_french_bluetree99_en.md new file mode 100644 index 00000000000000..8acb2371b28d15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_french_bluetree99_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_bluetree99 XlmRoBertaForTokenClassification from bluetree99 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_bluetree99 +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_bluetree99` is a English model originally trained by bluetree99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_bluetree99_en_5.5.0_3.0_1726753623802.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_bluetree99_en_5.5.0_3.0_1726753623802.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active SparkSession with Spark NLP started, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_bluetree99", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_bluetree99", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_bluetree99| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/bluetree99/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_french_occupy1_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_french_occupy1_en.md new file mode 100644 index 00000000000000..e74977de7ee78f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_french_occupy1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_occupy1 XlmRoBertaForTokenClassification from occupy1 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_occupy1 +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_occupy1` is a English model originally trained by occupy1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_occupy1_en_5.5.0_3.0_1726754400059.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_occupy1_en_5.5.0_3.0_1726754400059.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active SparkSession with Spark NLP started, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_occupy1", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_occupy1", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_occupy1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/occupy1/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_azaidi_face_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_azaidi_face_en.md new file mode 100644 index 00000000000000..f648a10f960362 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_azaidi_face_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_azaidi_face XlmRoBertaForTokenClassification from azaidi-face +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_azaidi_face +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_azaidi_face` is a English model originally trained by azaidi-face. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_azaidi_face_en_5.5.0_3.0_1726753927288.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_azaidi_face_en_5.5.0_3.0_1726753927288.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active SparkSession with Spark NLP started, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_azaidi_face", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_azaidi_face", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_azaidi_face| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/azaidi-face/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_dinasalama_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_dinasalama_en.md new file mode 100644 index 00000000000000..23ac8b3a41679e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_dinasalama_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_dinasalama XlmRoBertaForTokenClassification from DinaSalama +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_dinasalama +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_dinasalama` is a English model originally trained by DinaSalama. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_dinasalama_en_5.5.0_3.0_1726738301118.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_dinasalama_en_5.5.0_3.0_1726738301118.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active SparkSession with Spark NLP started, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_dinasalama", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_dinasalama", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_dinasalama| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|851.7 MB| + +## References + +https://huggingface.co/DinaSalama/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_dinasalama_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_dinasalama_pipeline_en.md new file mode 100644 index 00000000000000..6c6e8548c83c5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_dinasalama_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_dinasalama_pipeline pipeline XlmRoBertaForTokenClassification from DinaSalama +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_dinasalama_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_dinasalama_pipeline` is a English model originally trained by DinaSalama. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_dinasalama_pipeline_en_5.5.0_3.0_1726738380246.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_dinasalama_pipeline_en_5.5.0_3.0_1726738380246.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active SparkSession with Spark NLP started, e.g. spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_dinasalama_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_dinasalama_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_dinasalama_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|851.8 MB| + +## References + +https://huggingface.co/DinaSalama/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_duykha0511_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_duykha0511_pipeline_en.md new file mode 100644 index 00000000000000..f2dcc1c2ee91f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_duykha0511_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_duykha0511_pipeline pipeline XlmRoBertaForTokenClassification from duykha0511 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_duykha0511_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_duykha0511_pipeline` is a English model originally trained by duykha0511. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_duykha0511_pipeline_en_5.5.0_3.0_1726711155638.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_duykha0511_pipeline_en_5.5.0_3.0_1726711155638.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active SparkSession with Spark NLP started, e.g. spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_duykha0511_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_duykha0511_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_duykha0511_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/duykha0511/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_french_chaoli_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_french_chaoli_en.md new file mode 100644 index 00000000000000..6f351c356a3b82 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_french_chaoli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_chaoli XlmRoBertaForTokenClassification from ChaoLi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_chaoli +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_chaoli` is a English model originally trained by ChaoLi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_chaoli_en_5.5.0_3.0_1726708981131.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_chaoli_en_5.5.0_3.0_1726708981131.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active SparkSession with Spark NLP started, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_chaoli", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_chaoli", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_chaoli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/ChaoLi/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_french_mcparty2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_french_mcparty2_pipeline_en.md new file mode 100644 index 00000000000000..fe69a4923b2867 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_french_mcparty2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_mcparty2_pipeline pipeline XlmRoBertaForTokenClassification from mcparty2 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_mcparty2_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_mcparty2_pipeline` is a English model originally trained by mcparty2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_mcparty2_pipeline_en_5.5.0_3.0_1726738208148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_mcparty2_pipeline_en_5.5.0_3.0_1726738208148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active SparkSession with Spark NLP started, e.g. spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_mcparty2_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_mcparty2_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_mcparty2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/mcparty2/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_french_munsu_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_french_munsu_en.md new file mode 100644 index 00000000000000..5dd530fbf8b3cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_french_munsu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_munsu XlmRoBertaForTokenClassification from MunSu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_munsu +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_munsu` is a English model originally trained by MunSu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_munsu_en_5.5.0_3.0_1726753675409.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_munsu_en_5.5.0_3.0_1726753675409.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active SparkSession with Spark NLP started, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_munsu", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_munsu", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_munsu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.3 MB| + +## References + +https://huggingface.co/MunSu/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_french_munsu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_french_munsu_pipeline_en.md new file mode 100644 index 00000000000000..f2621ece6ac77a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_french_munsu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_munsu_pipeline pipeline XlmRoBertaForTokenClassification from MunSu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_munsu_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_munsu_pipeline` is a English model originally trained by MunSu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_munsu_pipeline_en_5.5.0_3.0_1726753742044.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_munsu_pipeline_en_5.5.0_3.0_1726753742044.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active SparkSession with Spark NLP started, e.g. spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_munsu_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_munsu_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_munsu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.3 MB| + +## References + +https://huggingface.co/MunSu/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_french_pockypocky_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_french_pockypocky_pipeline_en.md new file mode 100644 index 00000000000000..65de7ceba47b78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_french_pockypocky_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_pockypocky_pipeline pipeline XlmRoBertaForTokenClassification from pockypocky +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_pockypocky_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_pockypocky_pipeline` is a English model originally trained by pockypocky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_pockypocky_pipeline_en_5.5.0_3.0_1726708900954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_pockypocky_pipeline_en_5.5.0_3.0_1726708900954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active SparkSession with Spark NLP started, e.g. spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_pockypocky_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_pockypocky_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_pockypocky_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/pockypocky/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_huggingbase_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_huggingbase_en.md new file mode 100644 index 00000000000000..d794377142f422 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_huggingbase_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_huggingbase XlmRoBertaForTokenClassification from huggingbase +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_huggingbase +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_huggingbase` is a English model originally trained by huggingbase. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_huggingbase_en_5.5.0_3.0_1726737312716.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_huggingbase_en_5.5.0_3.0_1726737312716.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active SparkSession with Spark NLP started, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_huggingbase", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_huggingbase", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_huggingbase| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/huggingbase/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_jaydipsen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_jaydipsen_pipeline_en.md new file mode 100644 index 00000000000000..de7ccc94be2b7a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_jaydipsen_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_jaydipsen_pipeline pipeline XlmRoBertaForTokenClassification from jaydipsen +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_jaydipsen_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_jaydipsen_pipeline` is a English model originally trained by jaydipsen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jaydipsen_pipeline_en_5.5.0_3.0_1726737672555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jaydipsen_pipeline_en_5.5.0_3.0_1726737672555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active SparkSession with Spark NLP started, e.g. spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_jaydipsen_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_jaydipsen_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_jaydipsen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/jaydipsen/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_michaelkim_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_michaelkim_pipeline_en.md new file mode 100644 index 00000000000000..cdd52296b0d625 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_michaelkim_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_michaelkim_pipeline pipeline XlmRoBertaForTokenClassification from MichaelKim +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_michaelkim_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_michaelkim_pipeline` is a English model originally trained by MichaelKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_michaelkim_pipeline_en_5.5.0_3.0_1726754277444.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_michaelkim_pipeline_en_5.5.0_3.0_1726754277444.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active SparkSession with Spark NLP started, e.g. spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_michaelkim_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_michaelkim_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_michaelkim_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/MichaelKim/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_princedl_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_princedl_pipeline_en.md new file mode 100644 index 00000000000000..d227d809a9cffa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_princedl_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_princedl_pipeline pipeline XlmRoBertaForTokenClassification from princedl +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_princedl_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_princedl_pipeline` is a English model originally trained by princedl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_princedl_pipeline_en_5.5.0_3.0_1726711488347.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_princedl_pipeline_en_5.5.0_3.0_1726711488347.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active SparkSession with Spark NLP started, e.g. spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_princedl_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_princedl_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_princedl_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|844.5 MB| + +## References + +https://huggingface.co/princedl/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_prudhvip21_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_prudhvip21_en.md new file mode 100644 index 00000000000000..18d70279d1b096 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_prudhvip21_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_prudhvip21 XlmRoBertaForTokenClassification from prudhvip21 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_prudhvip21 +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_prudhvip21` is a English model originally trained by prudhvip21. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_prudhvip21_en_5.5.0_3.0_1726753556245.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_prudhvip21_en_5.5.0_3.0_1726753556245.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active SparkSession with Spark NLP started, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_prudhvip21", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_prudhvip21", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_prudhvip21| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/prudhvip21/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_prudhvip21_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_prudhvip21_pipeline_en.md new file mode 100644 index 00000000000000..b322f6d42b5a9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_prudhvip21_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_prudhvip21_pipeline pipeline XlmRoBertaForTokenClassification from prudhvip21 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_prudhvip21_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_prudhvip21_pipeline` is a English model originally trained by prudhvip21. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_prudhvip21_pipeline_en_5.5.0_3.0_1726753627145.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_prudhvip21_pipeline_en_5.5.0_3.0_1726753627145.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_prudhvip21_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_prudhvip21_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
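
For quick checks on a single string, the same pretrained pipeline can also be driven through `annotate`. This is a minimal sketch: the example sentence is illustrative, and the exact output columns depend on the stages listed under "Included Models".

```python
# Sketch: one-off annotation with the pretrained pipeline loaded above
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_prudhvip21_pipeline", lang="en")
result = pipeline.annotate("John Snow Labs is a company based in Delaware.")
print(result.keys())      # available output columns, e.g. document, token, ner
print(result.get("ner"))  # token-level entity tags, if "ner" is among the outputs
```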
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_prudhvip21_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/prudhvip21/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_ruihui_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_ruihui_en.md new file mode 100644 index 00000000000000..5f9f35a99e383b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_ruihui_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_ruihui XlmRoBertaForTokenClassification from ruihui +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_ruihui +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_ruihui` is a English model originally trained by ruihui. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ruihui_en_5.5.0_3.0_1726754237282.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ruihui_en_5.5.0_3.0_1726754237282.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_ruihui","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_ruihui", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_ruihui| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|854.4 MB| + +## References + +https://huggingface.co/ruihui/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_italian_ashrielbrian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_italian_ashrielbrian_pipeline_en.md new file mode 100644 index 00000000000000..7c8c7baa94c3c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_italian_ashrielbrian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_ashrielbrian_pipeline pipeline XlmRoBertaForTokenClassification from ashrielbrian +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_ashrielbrian_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_ashrielbrian_pipeline` is a English model originally trained by ashrielbrian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_ashrielbrian_pipeline_en_5.5.0_3.0_1726708506081.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_ashrielbrian_pipeline_en_5.5.0_3.0_1726708506081.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_ashrielbrian_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_ashrielbrian_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_ashrielbrian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/ashrielbrian/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_italian_k4west_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_italian_k4west_en.md new file mode 100644 index 00000000000000..4fd35a726056f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_italian_k4west_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_k4west XlmRoBertaForTokenClassification from k4west +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_k4west +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_k4west` is a English model originally trained by k4west. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_k4west_en_5.5.0_3.0_1726753875820.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_k4west_en_5.5.0_3.0_1726753875820.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_k4west","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_k4west", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_k4west| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|816.7 MB| + +## References + +https://huggingface.co/k4west/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_italian_k4west_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_italian_k4west_pipeline_en.md new file mode 100644 index 00000000000000..21444ab177fad1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_italian_k4west_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_k4west_pipeline pipeline XlmRoBertaForTokenClassification from k4west +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_k4west_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_k4west_pipeline` is a English model originally trained by k4west. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_k4west_pipeline_en_5.5.0_3.0_1726753976609.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_k4west_pipeline_en_5.5.0_3.0_1726753976609.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_k4west_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_k4west_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_k4west_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|816.8 MB| + +## References + +https://huggingface.co/k4west/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_italian_xrchen11_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_italian_xrchen11_en.md new file mode 100644 index 00000000000000..a745a10fe613cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_italian_xrchen11_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_xrchen11 XlmRoBertaForTokenClassification from xrchen11 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_xrchen11 +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_xrchen11` is a English model originally trained by xrchen11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_xrchen11_en_5.5.0_3.0_1726753793660.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_xrchen11_en_5.5.0_3.0_1726753793660.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_xrchen11","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_xrchen11", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_xrchen11| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|816.7 MB| + +## References + +https://huggingface.co/xrchen11/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_language_detection_unklefedor_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_language_detection_unklefedor_en.md new file mode 100644 index 00000000000000..8c5d8dab637785 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_language_detection_unklefedor_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_language_detection_unklefedor XlmRoBertaForSequenceClassification from unklefedor +author: John Snow Labs +name: xlm_roberta_base_language_detection_unklefedor +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_language_detection_unklefedor` is a English model originally trained by unklefedor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_language_detection_unklefedor_en_5.5.0_3.0_1726752707238.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_language_detection_unklefedor_en_5.5.0_3.0_1726752707238.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_language_detection_unklefedor","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_language_detection_unklefedor", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
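
The predicted label lands in the `class` output column. A minimal sketch (assuming the Python pipeline above) for reading it back:

```python
# Sketch: show the input text next to the predicted class label(s) from the pipeline above
pipelineDF.select("text", "class.result").show(truncate=False)
```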
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_language_detection_unklefedor| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|911.9 MB| + +## References + +https://huggingface.co/unklefedor/xlm-roberta-base-language-detection \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_language_detection_unklefedor_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_language_detection_unklefedor_pipeline_en.md new file mode 100644 index 00000000000000..3e7187aee3e426 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_language_detection_unklefedor_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_language_detection_unklefedor_pipeline pipeline XlmRoBertaForSequenceClassification from unklefedor +author: John Snow Labs +name: xlm_roberta_base_language_detection_unklefedor_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_language_detection_unklefedor_pipeline` is a English model originally trained by unklefedor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_language_detection_unklefedor_pipeline_en_5.5.0_3.0_1726752799887.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_language_detection_unklefedor_pipeline_en_5.5.0_3.0_1726752799887.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_language_detection_unklefedor_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_language_detection_unklefedor_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_language_detection_unklefedor_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|911.9 MB| + +## References + +https://huggingface.co/unklefedor/xlm-roberta-base-language-detection + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_lr0_0001_seed42_basic_original_amh_esp_eng_train_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_lr0_0001_seed42_basic_original_amh_esp_eng_train_pipeline_en.md new file mode 100644 index 00000000000000..fb998ce4c03384 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_lr0_0001_seed42_basic_original_amh_esp_eng_train_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_lr0_0001_seed42_basic_original_amh_esp_eng_train_pipeline pipeline XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr0_0001_seed42_basic_original_amh_esp_eng_train_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr0_0001_seed42_basic_original_amh_esp_eng_train_pipeline` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_0001_seed42_basic_original_amh_esp_eng_train_pipeline_en_5.5.0_3.0_1726752815153.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_0001_seed42_basic_original_amh_esp_eng_train_pipeline_en_5.5.0_3.0_1726752815153.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_lr0_0001_seed42_basic_original_amh_esp_eng_train_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_lr0_0001_seed42_basic_original_amh_esp_eng_train_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr0_0001_seed42_basic_original_amh_esp_eng_train_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|821.8 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr0.0001_seed42_basic_original_amh-esp-eng_train + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_lr0_001_seed42_esp_kinyarwanda_eng_train_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_lr0_001_seed42_esp_kinyarwanda_eng_train_en.md new file mode 100644 index 00000000000000..19947b416a3b3d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_lr0_001_seed42_esp_kinyarwanda_eng_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_lr0_001_seed42_esp_kinyarwanda_eng_train XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr0_001_seed42_esp_kinyarwanda_eng_train +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr0_001_seed42_esp_kinyarwanda_eng_train` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_001_seed42_esp_kinyarwanda_eng_train_en_5.5.0_3.0_1726721134117.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_001_seed42_esp_kinyarwanda_eng_train_en_5.5.0_3.0_1726721134117.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr0_001_seed42_esp_kinyarwanda_eng_train","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr0_001_seed42_esp_kinyarwanda_eng_train", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr0_001_seed42_esp_kinyarwanda_eng_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|835.1 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr0.001_seed42_esp-kin-eng_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_finetuned_emojis_1_client_toxic_cen_2_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_finetuned_emojis_1_client_toxic_cen_2_en.md new file mode 100644 index 00000000000000..9b51d49e9a6a7e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_finetuned_emojis_1_client_toxic_cen_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_finetuned_emojis_1_client_toxic_cen_2 XlmRoBertaForSequenceClassification from Karim-Gamal +author: John Snow Labs +name: xlm_roberta_finetuned_emojis_1_client_toxic_cen_2 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_finetuned_emojis_1_client_toxic_cen_2` is a English model originally trained by Karim-Gamal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_1_client_toxic_cen_2_en_5.5.0_3.0_1726752183517.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_1_client_toxic_cen_2_en_5.5.0_3.0_1726752183517.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_finetuned_emojis_1_client_toxic_cen_2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_finetuned_emojis_1_client_toxic_cen_2", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_finetuned_emojis_1_client_toxic_cen_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Karim-Gamal/XLM-Roberta-finetuned-emojis-1-client-toxic-cen-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xmlroberta_gendata_double_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xmlroberta_gendata_double_pipeline_en.md new file mode 100644 index 00000000000000..be0283fbdb4523 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xmlroberta_gendata_double_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xmlroberta_gendata_double_pipeline pipeline XlmRoBertaForSequenceClassification from Constien +author: John Snow Labs +name: xmlroberta_gendata_double_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xmlroberta_gendata_double_pipeline` is a English model originally trained by Constien. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xmlroberta_gendata_double_pipeline_en_5.5.0_3.0_1726752224567.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xmlroberta_gendata_double_pipeline_en_5.5.0_3.0_1726752224567.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xmlroberta_gendata_double_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xmlroberta_gendata_double_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xmlroberta_gendata_double_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|802.5 MB| + +## References + +https://huggingface.co/Constien/xmlRoberta_GenData_Double + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-0_00005_0_999_a98zhang_en.md b/docs/_posts/ahmedlone127/2024-09-20-0_00005_0_999_a98zhang_en.md new file mode 100644 index 00000000000000..8cb1371d0fd34e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-0_00005_0_999_a98zhang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 0_00005_0_999_a98zhang RoBertaForSequenceClassification from a98zhang +author: John Snow Labs +name: 0_00005_0_999_a98zhang +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`0_00005_0_999_a98zhang` is a English model originally trained by a98zhang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/0_00005_0_999_a98zhang_en_5.5.0_3.0_1726852109223.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/0_00005_0_999_a98zhang_en_5.5.0_3.0_1726852109223.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("0_00005_0_999_a98zhang","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("0_00005_0_999_a98zhang", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|0_00005_0_999_a98zhang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/a98zhang/0.00005_0.999 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-2020_q4_25p_filtered_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-2020_q4_25p_filtered_pipeline_en.md new file mode 100644 index 00000000000000..40a5311a4fe505 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-2020_q4_25p_filtered_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2020_q4_25p_filtered_pipeline pipeline RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q4_25p_filtered_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q4_25p_filtered_pipeline` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q4_25p_filtered_pipeline_en_5.5.0_3.0_1726796745516.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q4_25p_filtered_pipeline_en_5.5.0_3.0_1726796745516.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("2020_q4_25p_filtered_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("2020_q4_25p_filtered_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q4_25p_filtered_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q4-25p-filtered + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-2020_q4_25p_filtered_random_en.md b/docs/_posts/ahmedlone127/2024-09-20-2020_q4_25p_filtered_random_en.md new file mode 100644 index 00000000000000..6e4f2938c8b530 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-2020_q4_25p_filtered_random_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2020_q4_25p_filtered_random RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q4_25p_filtered_random +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q4_25p_filtered_random` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q4_25p_filtered_random_en_5.5.0_3.0_1726815990613.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q4_25p_filtered_random_en_5.5.0_3.0_1726815990613.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("2020_q4_25p_filtered_random","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("2020_q4_25p_filtered_random","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
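
The `embeddings` column holds one vector per token. As a minimal sketch (assuming the Python pipeline above; `EmbeddingsFinisher` and the `finished_embeddings` name are illustrative choices, not part of this card), the annotations can be exposed as plain Spark ML vectors:

```python
# Sketch: convert token-level embedding annotations from the pipeline above into Spark ML vectors
from sparknlp.base import EmbeddingsFinisher

finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

finisher.transform(pipelineDF) \
    .selectExpr("explode(finished_embeddings) as token_vector") \
    .show(5, truncate=False)
```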
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q4_25p_filtered_random| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q4-25p-filtered-random \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-2020_q4_25p_filtered_random_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-2020_q4_25p_filtered_random_pipeline_en.md new file mode 100644 index 00000000000000..656333e52fb4a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-2020_q4_25p_filtered_random_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2020_q4_25p_filtered_random_pipeline pipeline RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q4_25p_filtered_random_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q4_25p_filtered_random_pipeline` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q4_25p_filtered_random_pipeline_en_5.5.0_3.0_1726816013879.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q4_25p_filtered_random_pipeline_en_5.5.0_3.0_1726816013879.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("2020_q4_25p_filtered_random_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("2020_q4_25p_filtered_random_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q4_25p_filtered_random_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q4-25p-filtered-random + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-2020_q4_50p_filtered_prog_from_q3_en.md b/docs/_posts/ahmedlone127/2024-09-20-2020_q4_50p_filtered_prog_from_q3_en.md new file mode 100644 index 00000000000000..b0a4745dd89dd7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-2020_q4_50p_filtered_prog_from_q3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2020_q4_50p_filtered_prog_from_q3 RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q4_50p_filtered_prog_from_q3 +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q4_50p_filtered_prog_from_q3` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q4_50p_filtered_prog_from_q3_en_5.5.0_3.0_1726857075526.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q4_50p_filtered_prog_from_q3_en_5.5.0_3.0_1726857075526.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("2020_q4_50p_filtered_prog_from_q3","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("2020_q4_50p_filtered_prog_from_q3","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q4_50p_filtered_prog_from_q3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q4-50p-filtered-prog_from_Q3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-2020_q4_50p_filtered_prog_from_q3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-2020_q4_50p_filtered_prog_from_q3_pipeline_en.md new file mode 100644 index 00000000000000..a294e323746d64 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-2020_q4_50p_filtered_prog_from_q3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2020_q4_50p_filtered_prog_from_q3_pipeline pipeline RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q4_50p_filtered_prog_from_q3_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q4_50p_filtered_prog_from_q3_pipeline` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q4_50p_filtered_prog_from_q3_pipeline_en_5.5.0_3.0_1726857106084.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q4_50p_filtered_prog_from_q3_pipeline_en_5.5.0_3.0_1726857106084.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("2020_q4_50p_filtered_prog_from_q3_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("2020_q4_50p_filtered_prog_from_q3_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q4_50p_filtered_prog_from_q3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q4-50p-filtered-prog_from_Q3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-20240304_01_hf_en.md b/docs/_posts/ahmedlone127/2024-09-20-20240304_01_hf_en.md new file mode 100644 index 00000000000000..b493083792b442 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-20240304_01_hf_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 20240304_01_hf DistilBertForSequenceClassification from linweichen +author: John Snow Labs +name: 20240304_01_hf +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`20240304_01_hf` is a English model originally trained by linweichen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/20240304_01_hf_en_5.5.0_3.0_1726841426416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/20240304_01_hf_en_5.5.0_3.0_1726841426416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("20240304_01_hf","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("20240304_01_hf", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|20240304_01_hf| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/linweichen/20240304_01_HF \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-20240304_01_hf_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-20240304_01_hf_pipeline_en.md new file mode 100644 index 00000000000000..db3f8a9b537796 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-20240304_01_hf_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 20240304_01_hf_pipeline pipeline DistilBertForSequenceClassification from linweichen +author: John Snow Labs +name: 20240304_01_hf_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`20240304_01_hf_pipeline` is a English model originally trained by linweichen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/20240304_01_hf_pipeline_en_5.5.0_3.0_1726841439437.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/20240304_01_hf_pipeline_en_5.5.0_3.0_1726841439437.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("20240304_01_hf_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("20240304_01_hf_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|20240304_01_hf_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/linweichen/20240304_01_HF + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-27th2024dash1_distilbert_base_uncased_2_en.md b/docs/_posts/ahmedlone127/2024-09-20-27th2024dash1_distilbert_base_uncased_2_en.md new file mode 100644 index 00000000000000..772ff020418dfb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-27th2024dash1_distilbert_base_uncased_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 27th2024dash1_distilbert_base_uncased_2 DistilBertForSequenceClassification from blockchain17171 +author: John Snow Labs +name: 27th2024dash1_distilbert_base_uncased_2 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`27th2024dash1_distilbert_base_uncased_2` is a English model originally trained by blockchain17171. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/27th2024dash1_distilbert_base_uncased_2_en_5.5.0_3.0_1726809303704.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/27th2024dash1_distilbert_base_uncased_2_en_5.5.0_3.0_1726809303704.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("27th2024dash1_distilbert_base_uncased_2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("27th2024dash1_distilbert_base_uncased_2", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
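Once the pipeline above has run, the predictions are stored in the `class` column as Spark NLP annotations. A minimal sketch for pulling out just the predicted label strings (column names follow the example above):

```python
# "class.result" holds the predicted label for each input row
pipelineDF.select("text", "class.result").show(truncate = False)
```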
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|27th2024dash1_distilbert_base_uncased_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/blockchain17171/27th2024DASH1-distilbert-base-uncased-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-27th2024dash1_distilbert_base_uncased_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-27th2024dash1_distilbert_base_uncased_2_pipeline_en.md new file mode 100644 index 00000000000000..7f261fd1e9b260 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-27th2024dash1_distilbert_base_uncased_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 27th2024dash1_distilbert_base_uncased_2_pipeline pipeline DistilBertForSequenceClassification from blockchain17171 +author: John Snow Labs +name: 27th2024dash1_distilbert_base_uncased_2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`27th2024dash1_distilbert_base_uncased_2_pipeline` is a English model originally trained by blockchain17171. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/27th2024dash1_distilbert_base_uncased_2_pipeline_en_5.5.0_3.0_1726809315447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/27th2024dash1_distilbert_base_uncased_2_pipeline_en_5.5.0_3.0_1726809315447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("27th2024dash1_distilbert_base_uncased_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("27th2024dash1_distilbert_base_uncased_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|27th2024dash1_distilbert_base_uncased_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/blockchain17171/27th2024DASH1-distilbert-base-uncased-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-4_datasets_fake_news_with_covid_balanced_others_norwegian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-4_datasets_fake_news_with_covid_balanced_others_norwegian_pipeline_en.md new file mode 100644 index 00000000000000..4218a40c44d8b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-4_datasets_fake_news_with_covid_balanced_others_norwegian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 4_datasets_fake_news_with_covid_balanced_others_norwegian_pipeline pipeline DistilBertForSequenceClassification from littlepinhorse +author: John Snow Labs +name: 4_datasets_fake_news_with_covid_balanced_others_norwegian_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`4_datasets_fake_news_with_covid_balanced_others_norwegian_pipeline` is a English model originally trained by littlepinhorse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/4_datasets_fake_news_with_covid_balanced_others_norwegian_pipeline_en_5.5.0_3.0_1726792035678.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/4_datasets_fake_news_with_covid_balanced_others_norwegian_pipeline_en_5.5.0_3.0_1726792035678.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("4_datasets_fake_news_with_covid_balanced_others_norwegian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("4_datasets_fake_news_with_covid_balanced_others_norwegian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|4_datasets_fake_news_with_covid_balanced_others_norwegian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/littlepinhorse/4_datasets_fake_news_with_covid_balanced_others_no + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-a_nepal_bhasa_repo_edurayan_en.md b/docs/_posts/ahmedlone127/2024-09-20-a_nepal_bhasa_repo_edurayan_en.md new file mode 100644 index 00000000000000..cfa404a3612852 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-a_nepal_bhasa_repo_edurayan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English a_nepal_bhasa_repo_edurayan DistilBertForSequenceClassification from EduRayan +author: John Snow Labs +name: a_nepal_bhasa_repo_edurayan +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`a_nepal_bhasa_repo_edurayan` is a English model originally trained by EduRayan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/a_nepal_bhasa_repo_edurayan_en_5.5.0_3.0_1726871734597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/a_nepal_bhasa_repo_edurayan_en_5.5.0_3.0_1726871734597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("a_nepal_bhasa_repo_edurayan","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("a_nepal_bhasa_repo_edurayan", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|a_nepal_bhasa_repo_edurayan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/EduRayan/A-new-repo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-a_nepal_bhasa_repo_edurayan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-a_nepal_bhasa_repo_edurayan_pipeline_en.md new file mode 100644 index 00000000000000..2c5055aafef32b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-a_nepal_bhasa_repo_edurayan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English a_nepal_bhasa_repo_edurayan_pipeline pipeline DistilBertForSequenceClassification from EduRayan +author: John Snow Labs +name: a_nepal_bhasa_repo_edurayan_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`a_nepal_bhasa_repo_edurayan_pipeline` is a English model originally trained by EduRayan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/a_nepal_bhasa_repo_edurayan_pipeline_en_5.5.0_3.0_1726871746407.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/a_nepal_bhasa_repo_edurayan_pipeline_en_5.5.0_3.0_1726871746407.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("a_nepal_bhasa_repo_edurayan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("a_nepal_bhasa_repo_edurayan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|a_nepal_bhasa_repo_edurayan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/EduRayan/A-new-repo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-absa_restaurant_froberta_base_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-absa_restaurant_froberta_base_v2_pipeline_en.md new file mode 100644 index 00000000000000..5c00cda11eb409 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-absa_restaurant_froberta_base_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English absa_restaurant_froberta_base_v2_pipeline pipeline RoBertaEmbeddings from AliAhmad001 +author: John Snow Labs +name: absa_restaurant_froberta_base_v2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`absa_restaurant_froberta_base_v2_pipeline` is a English model originally trained by AliAhmad001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/absa_restaurant_froberta_base_v2_pipeline_en_5.5.0_3.0_1726857809625.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/absa_restaurant_froberta_base_v2_pipeline_en_5.5.0_3.0_1726857809625.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("absa_restaurant_froberta_base_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("absa_restaurant_froberta_base_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|absa_restaurant_froberta_base_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/AliAhmad001/absa-restaurant-froberta-base-v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-affect_arroberta_v1_ar.md b/docs/_posts/ahmedlone127/2024-09-20-affect_arroberta_v1_ar.md new file mode 100644 index 00000000000000..e81459c4e6b3d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-affect_arroberta_v1_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic affect_arroberta_v1 RoBertaEmbeddings from NLP-EXP +author: John Snow Labs +name: affect_arroberta_v1 +date: 2024-09-20 +tags: [ar, open_source, onnx, embeddings, roberta] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`affect_arroberta_v1` is a Arabic model originally trained by NLP-EXP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/affect_arroberta_v1_ar_5.5.0_3.0_1726857545335.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/affect_arroberta_v1_ar_5.5.0_3.0_1726857545335.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("affect_arroberta_v1","ar") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("affect_arroberta_v1","ar") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
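The `embeddings` column produced above holds one annotation per token, with the vector stored in its `embeddings` field. A minimal sketch for unpacking the per-token vectors (the explode step is illustrative and follows the standard Spark NLP annotation schema):

```python
from pyspark.sql.functions import explode

# One output row per token: the token text and its embedding vector
pipelineDF.select(explode("embeddings").alias("emb")) \
    .select("emb.result", "emb.embeddings") \
    .show(truncate = False)
```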
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|affect_arroberta_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|ar| +|Size:|504.5 MB| + +## References + +https://huggingface.co/NLP-EXP/Affect-ArRoberta-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-affect_arroberta_v1_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-20-affect_arroberta_v1_pipeline_ar.md new file mode 100644 index 00000000000000..a5aa5dc12dd92b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-affect_arroberta_v1_pipeline_ar.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Arabic affect_arroberta_v1_pipeline pipeline RoBertaEmbeddings from NLP-EXP +author: John Snow Labs +name: affect_arroberta_v1_pipeline +date: 2024-09-20 +tags: [ar, open_source, pipeline, onnx] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`affect_arroberta_v1_pipeline` is a Arabic model originally trained by NLP-EXP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/affect_arroberta_v1_pipeline_ar_5.5.0_3.0_1726857570854.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/affect_arroberta_v1_pipeline_ar_5.5.0_3.0_1726857570854.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("affect_arroberta_v1_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("affect_arroberta_v1_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|affect_arroberta_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|504.5 MB| + +## References + +https://huggingface.co/NLP-EXP/Affect-ArRoberta-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-affilgood_ner_test_v4_en.md b/docs/_posts/ahmedlone127/2024-09-20-affilgood_ner_test_v4_en.md new file mode 100644 index 00000000000000..422fd8c96f68df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-affilgood_ner_test_v4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English affilgood_ner_test_v4 RoBertaForTokenClassification from nicolauduran45 +author: John Snow Labs +name: affilgood_ner_test_v4 +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`affilgood_ner_test_v4` is a English model originally trained by nicolauduran45. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/affilgood_ner_test_v4_en_5.5.0_3.0_1726846859587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/affilgood_ner_test_v4_en_5.5.0_3.0_1726846859587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("affilgood_ner_test_v4","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("affilgood_ner_test_v4", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|affilgood_ner_test_v4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/nicolauduran45/affilgood-ner-test-v4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-affilgood_ner_test_v4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-affilgood_ner_test_v4_pipeline_en.md new file mode 100644 index 00000000000000..e26842a6f5ab54 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-affilgood_ner_test_v4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English affilgood_ner_test_v4_pipeline pipeline RoBertaForTokenClassification from nicolauduran45 +author: John Snow Labs +name: affilgood_ner_test_v4_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`affilgood_ner_test_v4_pipeline` is a English model originally trained by nicolauduran45. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/affilgood_ner_test_v4_pipeline_en_5.5.0_3.0_1726846882261.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/affilgood_ner_test_v4_pipeline_en_5.5.0_3.0_1726846882261.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("affilgood_ner_test_v4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("affilgood_ner_test_v4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|affilgood_ner_test_v4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/nicolauduran45/affilgood-ner-test-v4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-agnews_padding10model_en.md b/docs/_posts/ahmedlone127/2024-09-20-agnews_padding10model_en.md new file mode 100644 index 00000000000000..56c1186213a74d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-agnews_padding10model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English agnews_padding10model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: agnews_padding10model +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`agnews_padding10model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/agnews_padding10model_en_5.5.0_3.0_1726840996697.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/agnews_padding10model_en_5.5.0_3.0_1726840996697.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("agnews_padding10model","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("agnews_padding10model", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|agnews_padding10model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/agnews_padding10model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-agnews_padding10model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-agnews_padding10model_pipeline_en.md new file mode 100644 index 00000000000000..aff5bfd6218605 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-agnews_padding10model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English agnews_padding10model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: agnews_padding10model_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`agnews_padding10model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/agnews_padding10model_pipeline_en_5.5.0_3.0_1726841013074.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/agnews_padding10model_pipeline_en_5.5.0_3.0_1726841013074.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("agnews_padding10model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("agnews_padding10model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|agnews_padding10model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/agnews_padding10model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-albert_base_qa_squad2_en.md b/docs/_posts/ahmedlone127/2024-09-20-albert_base_qa_squad2_en.md new file mode 100644 index 00000000000000..78a13b822634ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-albert_base_qa_squad2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English AlbertForQuestionAnswering model (from twmkn9) +author: John Snow Labs +name: albert_base_qa_squad2 +date: 2024-09-20 +tags: [question_answering, albert, openvino, en, open_source, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +“ +Pretrained Question Answering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. albert-base-v2-squad2 is a English model originally trained by twmkn9. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_base_qa_squad2_en_5.5.0_3.0_1726866588603.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_base_qa_squad2_en_5.5.0_3.0_1726866588603.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark session, e.g. spark = sparknlp.start()
documentAssembler = MultiDocumentAssembler() \
.setInputCols(["question", "context"]) \
.setOutputCols(["document_question", "document_context"])

spanClassifier = AlbertForQuestionAnswering.pretrained("albert_base_qa_squad2","en") \
.setInputCols(["document_question", "document_context"]) \
.setOutputCol("answer").setCaseSensitive(True)

pipeline = Pipeline(stages=[documentAssembler, spanClassifier])

data = spark.createDataFrame([["What is my name?", "My name is Clara and I live in Berkeley."]]).toDF("question", "context")

result = pipeline.fit(data).transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
.setInputCols(Array("question", "context"))
.setOutputCols(Array("document_question", "document_context"))

val spanClassifier = AlbertForQuestionAnswering.pretrained("albert_base_qa_squad2","en")
.setInputCols(Array("document_question", "document_context"))
.setOutputCol("answer")
.setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))

val data = Seq(("What is my name?", "My name is Clara and I live in Berkeley.")).toDF("question", "context")

val result = pipeline.fit(data).transform(data)
```
</div>
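The extracted answer span is returned in the `answer` column. A minimal sketch for reading it back alongside the question (column names follow the example above):

```python
# "answer.result" holds the predicted answer text for each question/context pair
result.select("question", "answer.result").show(truncate = False)
```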
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_base_qa_squad2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|42.0 MB| \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-albert_base_qa_squad2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-albert_base_qa_squad2_pipeline_en.md new file mode 100644 index 00000000000000..5fca055ba2af95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-albert_base_qa_squad2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English albert_base_qa_squad2_pipeline pipeline AlbertForQuestionAnswering from twmkn9 +author: John Snow Labs +name: albert_base_qa_squad2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_base_qa_squad2_pipeline` is a English model originally trained by twmkn9. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_base_qa_squad2_pipeline_en_5.5.0_3.0_1726866590969.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_base_qa_squad2_pipeline_en_5.5.0_3.0_1726866590969.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("albert_base_qa_squad2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("albert_base_qa_squad2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_base_qa_squad2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|42.0 MB| + +## References + +https://huggingface.co/twmkn9albert-base-v2-squad2 + +## Included Models + +- MultiDocumentAssembler +- AlbertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-albert_base_v2_squad2_twmkn9_en.md b/docs/_posts/ahmedlone127/2024-09-20-albert_base_v2_squad2_twmkn9_en.md new file mode 100644 index 00000000000000..3982100a157870 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-albert_base_v2_squad2_twmkn9_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English albert_base_v2_squad2_twmkn9 AlbertForQuestionAnswering from twmkn9 +author: John Snow Labs +name: albert_base_v2_squad2_twmkn9 +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, albert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_base_v2_squad2_twmkn9` is a English model originally trained by twmkn9. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_base_v2_squad2_twmkn9_en_5.5.0_3.0_1726866588390.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_base_v2_squad2_twmkn9_en_5.5.0_3.0_1726866588390.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark session, e.g. spark = sparknlp.start()
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = AlbertForQuestionAnswering.pretrained("albert_base_v2_squad2_twmkn9","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
     .setInputCols(Array("question", "context"))
     .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = AlbertForQuestionAnswering.pretrained("albert_base_v2_squad2_twmkn9", "en")
     .setInputCols(Array("document_question","document_context"))
     .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_base_v2_squad2_twmkn9| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|42.0 MB| + +## References + +https://huggingface.co/twmkn9/albert-base-v2-squad2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-albert_base_v2_squad2_twmkn9_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-albert_base_v2_squad2_twmkn9_pipeline_en.md new file mode 100644 index 00000000000000..68e5c36ded3b24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-albert_base_v2_squad2_twmkn9_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English albert_base_v2_squad2_twmkn9_pipeline pipeline AlbertForQuestionAnswering from twmkn9 +author: John Snow Labs +name: albert_base_v2_squad2_twmkn9_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_base_v2_squad2_twmkn9_pipeline` is a English model originally trained by twmkn9. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_base_v2_squad2_twmkn9_pipeline_en_5.5.0_3.0_1726866590531.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_base_v2_squad2_twmkn9_pipeline_en_5.5.0_3.0_1726866590531.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("albert_base_v2_squad2_twmkn9_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("albert_base_v2_squad2_twmkn9_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_base_v2_squad2_twmkn9_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|42.0 MB| + +## References + +https://huggingface.co/twmkn9/albert-base-v2-squad2 + +## Included Models + +- MultiDocumentAssembler +- AlbertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-alberta_base_mathissimo_en.md b/docs/_posts/ahmedlone127/2024-09-20-alberta_base_mathissimo_en.md new file mode 100644 index 00000000000000..1f42e44b32e4e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-alberta_base_mathissimo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English alberta_base_mathissimo RoBertaForSequenceClassification from Mathissimo +author: John Snow Labs +name: alberta_base_mathissimo +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`alberta_base_mathissimo` is a English model originally trained by Mathissimo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/alberta_base_mathissimo_en_5.5.0_3.0_1726850139055.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/alberta_base_mathissimo_en_5.5.0_3.0_1726850139055.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("alberta_base_mathissimo","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("alberta_base_mathissimo", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|alberta_base_mathissimo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|442.5 MB| + +## References + +https://huggingface.co/Mathissimo/alberta_base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-alberta_base_mathissimo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-alberta_base_mathissimo_pipeline_en.md new file mode 100644 index 00000000000000..ce3a0b69cfcbdf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-alberta_base_mathissimo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English alberta_base_mathissimo_pipeline pipeline RoBertaForSequenceClassification from Mathissimo +author: John Snow Labs +name: alberta_base_mathissimo_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`alberta_base_mathissimo_pipeline` is a English model originally trained by Mathissimo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/alberta_base_mathissimo_pipeline_en_5.5.0_3.0_1726850164600.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/alberta_base_mathissimo_pipeline_en_5.5.0_3.0_1726850164600.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("alberta_base_mathissimo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("alberta_base_mathissimo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|alberta_base_mathissimo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|442.6 MB| + +## References + +https://huggingface.co/Mathissimo/alberta_base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-all_distilroberta_v1_finetuned_dit_10_epochs_en.md b/docs/_posts/ahmedlone127/2024-09-20-all_distilroberta_v1_finetuned_dit_10_epochs_en.md new file mode 100644 index 00000000000000..696d701e5e4cc5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-all_distilroberta_v1_finetuned_dit_10_epochs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_distilroberta_v1_finetuned_dit_10_epochs RoBertaEmbeddings from veddm +author: John Snow Labs +name: all_distilroberta_v1_finetuned_dit_10_epochs +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_distilroberta_v1_finetuned_dit_10_epochs` is a English model originally trained by veddm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_distilroberta_v1_finetuned_dit_10_epochs_en_5.5.0_3.0_1726857216238.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_distilroberta_v1_finetuned_dit_10_epochs_en_5.5.0_3.0_1726857216238.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("all_distilroberta_v1_finetuned_dit_10_epochs","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("all_distilroberta_v1_finetuned_dit_10_epochs","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_distilroberta_v1_finetuned_dit_10_epochs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.4 MB| + +## References + +https://huggingface.co/veddm/all-distilroberta-v1-finetuned-DIT-10_epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-all_distilroberta_v1_finetuned_dit_10_epochs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-all_distilroberta_v1_finetuned_dit_10_epochs_pipeline_en.md new file mode 100644 index 00000000000000..ebaf280a8cd894 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-all_distilroberta_v1_finetuned_dit_10_epochs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English all_distilroberta_v1_finetuned_dit_10_epochs_pipeline pipeline RoBertaEmbeddings from veddm +author: John Snow Labs +name: all_distilroberta_v1_finetuned_dit_10_epochs_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_distilroberta_v1_finetuned_dit_10_epochs_pipeline` is a English model originally trained by veddm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_distilroberta_v1_finetuned_dit_10_epochs_pipeline_en_5.5.0_3.0_1726857230785.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_distilroberta_v1_finetuned_dit_10_epochs_pipeline_en_5.5.0_3.0_1726857230785.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("all_distilroberta_v1_finetuned_dit_10_epochs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("all_distilroberta_v1_finetuned_dit_10_epochs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_distilroberta_v1_finetuned_dit_10_epochs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/veddm/all-distilroberta-v1-finetuned-DIT-10_epochs + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-all_minilm_l6_v2_finetuned_squad_en.md b/docs/_posts/ahmedlone127/2024-09-20-all_minilm_l6_v2_finetuned_squad_en.md new file mode 100644 index 00000000000000..68831671e7948e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-all_minilm_l6_v2_finetuned_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English all_minilm_l6_v2_finetuned_squad BertForQuestionAnswering from Sybghat +author: John Snow Labs +name: all_minilm_l6_v2_finetuned_squad +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_minilm_l6_v2_finetuned_squad` is a English model originally trained by Sybghat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_minilm_l6_v2_finetuned_squad_en_5.5.0_3.0_1726834331584.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_minilm_l6_v2_finetuned_squad_en_5.5.0_3.0_1726834331584.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark session, e.g. spark = sparknlp.start()
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("all_minilm_l6_v2_finetuned_squad","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
     .setInputCols(Array("question", "context"))
     .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("all_minilm_l6_v2_finetuned_squad", "en")
     .setInputCols(Array("document_question","document_context"))
     .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_minilm_l6_v2_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|84.2 MB| + +## References + +https://huggingface.co/Sybghat/all-MiniLM-L6-v2-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-all_minilm_l6_v2_finetuned_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-all_minilm_l6_v2_finetuned_squad_pipeline_en.md new file mode 100644 index 00000000000000..f995d9c72bc4bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-all_minilm_l6_v2_finetuned_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English all_minilm_l6_v2_finetuned_squad_pipeline pipeline BertForQuestionAnswering from Sybghat +author: John Snow Labs +name: all_minilm_l6_v2_finetuned_squad_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_minilm_l6_v2_finetuned_squad_pipeline` is a English model originally trained by Sybghat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_minilm_l6_v2_finetuned_squad_pipeline_en_5.5.0_3.0_1726834335735.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_minilm_l6_v2_finetuned_squad_pipeline_en_5.5.0_3.0_1726834335735.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with the question and context columns
# expected by the pipeline's MultiDocumentAssembler.
pipeline = PretrainedPipeline("all_minilm_l6_v2_finetuned_squad_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with the question and context columns
// expected by the pipeline's MultiDocumentAssembler.
val pipeline = new PretrainedPipeline("all_minilm_l6_v2_finetuned_squad_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_minilm_l6_v2_finetuned_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|84.2 MB| + +## References + +https://huggingface.co/Sybghat/all-MiniLM-L6-v2-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_auto_and_commute_16_16_5_oos_en.md b/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_auto_and_commute_16_16_5_oos_en.md new file mode 100644 index 00000000000000..c77825cbadfb5e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_auto_and_commute_16_16_5_oos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_auto_and_commute_16_16_5_oos RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_auto_and_commute_16_16_5_oos +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_auto_and_commute_16_16_5_oos` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_auto_and_commute_16_16_5_oos_en_5.5.0_3.0_1726805469779.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_auto_and_commute_16_16_5_oos_en_5.5.0_3.0_1726805469779.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_auto_and_commute_16_16_5_oos","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_auto_and_commute_16_16_5_oos", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
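The predicted labels end up in the `class` output column defined above and can be inspected straight from the transformed DataFrame:

```python
# "class" is the classifier's output column; its `result` field holds the predicted label(s).
pipelineDF.select("text", "class.result").show(truncate=False)
```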
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_auto_and_commute_16_16_5_oos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-auto_and_commute-16-16-5-oos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_banking_1000_16_5_oos_en.md b/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_banking_1000_16_5_oos_en.md new file mode 100644 index 00000000000000..feae548d60f70c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_banking_1000_16_5_oos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_banking_1000_16_5_oos RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_banking_1000_16_5_oos +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_banking_1000_16_5_oos` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_banking_1000_16_5_oos_en_5.5.0_3.0_1726804609648.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_banking_1000_16_5_oos_en_5.5.0_3.0_1726804609648.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_banking_1000_16_5_oos","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_banking_1000_16_5_oos", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_banking_1000_16_5_oos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-banking-1000-16-5-oos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_credit_cards_8_16_5_en.md b/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_credit_cards_8_16_5_en.md new file mode 100644 index 00000000000000..3e5ba787df5d91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_credit_cards_8_16_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_credit_cards_8_16_5 RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_credit_cards_8_16_5 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_credit_cards_8_16_5` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_credit_cards_8_16_5_en_5.5.0_3.0_1726850049121.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_credit_cards_8_16_5_en_5.5.0_3.0_1726850049121.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_credit_cards_8_16_5","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_credit_cards_8_16_5", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_credit_cards_8_16_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-credit_cards-8-16-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_credit_cards_8_16_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_credit_cards_8_16_5_pipeline_en.md new file mode 100644 index 00000000000000..54b9539e8e9136 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_credit_cards_8_16_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English all_roberta_large_v1_credit_cards_8_16_5_pipeline pipeline RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_credit_cards_8_16_5_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_credit_cards_8_16_5_pipeline` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_credit_cards_8_16_5_pipeline_en_5.5.0_3.0_1726850112567.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_credit_cards_8_16_5_pipeline_en_5.5.0_3.0_1726850112567.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with the raw text in the input column
# expected by the pipeline's DocumentAssembler (typically "text").
pipeline = PretrainedPipeline("all_roberta_large_v1_credit_cards_8_16_5_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with the raw text in the input column
// expected by the pipeline's DocumentAssembler (typically "text").
val pipeline = new PretrainedPipeline("all_roberta_large_v1_credit_cards_8_16_5_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_credit_cards_8_16_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-credit_cards-8-16-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_travel_4_16_5_oos_en.md b/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_travel_4_16_5_oos_en.md new file mode 100644 index 00000000000000..0ddb69fbe649d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_travel_4_16_5_oos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_travel_4_16_5_oos RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_travel_4_16_5_oos +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_travel_4_16_5_oos` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_travel_4_16_5_oos_en_5.5.0_3.0_1726804833301.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_travel_4_16_5_oos_en_5.5.0_3.0_1726804833301.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_travel_4_16_5_oos","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_travel_4_16_5_oos", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_travel_4_16_5_oos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-travel-4-16-5-oos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_travel_4_16_5_oos_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_travel_4_16_5_oos_pipeline_en.md new file mode 100644 index 00000000000000..38e8e4b65cf6b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_travel_4_16_5_oos_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English all_roberta_large_v1_travel_4_16_5_oos_pipeline pipeline RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_travel_4_16_5_oos_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_travel_4_16_5_oos_pipeline` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_travel_4_16_5_oos_pipeline_en_5.5.0_3.0_1726804896730.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_travel_4_16_5_oos_pipeline_en_5.5.0_3.0_1726804896730.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with the raw text in the input column
# expected by the pipeline's DocumentAssembler (typically "text").
pipeline = PretrainedPipeline("all_roberta_large_v1_travel_4_16_5_oos_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with the raw text in the input column
// expected by the pipeline's DocumentAssembler (typically "text").
val pipeline = new PretrainedPipeline("all_roberta_large_v1_travel_4_16_5_oos_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_travel_4_16_5_oos_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-travel-4-16-5-oos + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-amazon_baby_distilbert_en.md b/docs/_posts/ahmedlone127/2024-09-20-amazon_baby_distilbert_en.md new file mode 100644 index 00000000000000..430ff9b018f58c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-amazon_baby_distilbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English amazon_baby_distilbert DistilBertForSequenceClassification from aleehpandita +author: John Snow Labs +name: amazon_baby_distilbert +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amazon_baby_distilbert` is a English model originally trained by aleehpandita. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amazon_baby_distilbert_en_5.5.0_3.0_1726861230089.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amazon_baby_distilbert_en_5.5.0_3.0_1726861230089.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("amazon_baby_distilbert","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("amazon_baby_distilbert", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amazon_baby_distilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aleehpandita/amazon-baby-distilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-amazon_baby_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-amazon_baby_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..e35e4efb9bb75a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-amazon_baby_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English amazon_baby_distilbert_pipeline pipeline DistilBertForSequenceClassification from aleehpandita +author: John Snow Labs +name: amazon_baby_distilbert_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amazon_baby_distilbert_pipeline` is a English model originally trained by aleehpandita. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amazon_baby_distilbert_pipeline_en_5.5.0_3.0_1726861241720.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amazon_baby_distilbert_pipeline_en_5.5.0_3.0_1726861241720.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with the raw text in the input column
# expected by the pipeline's DocumentAssembler (typically "text").
pipeline = PretrainedPipeline("amazon_baby_distilbert_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with the raw text in the input column
// expected by the pipeline's DocumentAssembler (typically "text").
val pipeline = new PretrainedPipeline("amazon_baby_distilbert_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amazon_baby_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aleehpandita/amazon-baby-distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-asrforcommonvoice_en.md b/docs/_posts/ahmedlone127/2024-09-20-asrforcommonvoice_en.md new file mode 100644 index 00000000000000..38d837bc7111b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-asrforcommonvoice_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English asrforcommonvoice WhisperForCTC from Wishwa98 +author: John Snow Labs +name: asrforcommonvoice +date: 2024-09-20 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asrforcommonvoice` is a English model originally trained by Wishwa98. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asrforcommonvoice_en_5.5.0_3.0_1726813752426.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asrforcommonvoice_en_5.5.0_3.0_1726813752426.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("asrforcommonvoice","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
# data is a DataFrame with an "audio_content" column holding the raw audio as an array of floats
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("asrforcommonvoice", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
// data is a DataFrame with an "audio_content" column holding the raw audio as an array of floats
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
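The example above assumes `data` already holds the raw audio. A minimal sketch of building that DataFrame, assuming the `soundfile` package is available and `sample.wav` is a 16 kHz mono recording (both are illustrative assumptions, not part of this card):

```python
import soundfile as sf

# Read PCM samples as 32-bit floats; Whisper models expect 16 kHz mono input.
rawFloats, sampleRate = sf.read("sample.wav", dtype="float32")
data = spark.createDataFrame([[rawFloats.tolist()]]).toDF("audio_content")
```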
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asrforcommonvoice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Wishwa98/ASRForCommonVoice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-asrforcommonvoice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-asrforcommonvoice_pipeline_en.md new file mode 100644 index 00000000000000..27a8f4dd7ddbdd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-asrforcommonvoice_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English asrforcommonvoice_pipeline pipeline WhisperForCTC from Wishwa98 +author: John Snow Labs +name: asrforcommonvoice_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asrforcommonvoice_pipeline` is a English model originally trained by Wishwa98. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asrforcommonvoice_pipeline_en_5.5.0_3.0_1726813836752.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asrforcommonvoice_pipeline_en_5.5.0_3.0_1726813836752.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a float-array audio column
# matching the input expected by the pipeline's AudioAssembler.
pipeline = PretrainedPipeline("asrforcommonvoice_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a float-array audio column
// matching the input expected by the pipeline's AudioAssembler.
val pipeline = new PretrainedPipeline("asrforcommonvoice_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asrforcommonvoice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Wishwa98/ASRForCommonVoice + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-augmented_model_fast_2_c_norwegian_copula_norwegian_time_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-augmented_model_fast_2_c_norwegian_copula_norwegian_time_pipeline_en.md new file mode 100644 index 00000000000000..2cc3ed484409b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-augmented_model_fast_2_c_norwegian_copula_norwegian_time_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English augmented_model_fast_2_c_norwegian_copula_norwegian_time_pipeline pipeline DistilBertForSequenceClassification from LeonardoFettucciari +author: John Snow Labs +name: augmented_model_fast_2_c_norwegian_copula_norwegian_time_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`augmented_model_fast_2_c_norwegian_copula_norwegian_time_pipeline` is a English model originally trained by LeonardoFettucciari. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/augmented_model_fast_2_c_norwegian_copula_norwegian_time_pipeline_en_5.5.0_3.0_1726871756113.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/augmented_model_fast_2_c_norwegian_copula_norwegian_time_pipeline_en_5.5.0_3.0_1726871756113.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with the raw text in the input column
# expected by the pipeline's DocumentAssembler (typically "text").
pipeline = PretrainedPipeline("augmented_model_fast_2_c_norwegian_copula_norwegian_time_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with the raw text in the input column
// expected by the pipeline's DocumentAssembler (typically "text").
val pipeline = new PretrainedPipeline("augmented_model_fast_2_c_norwegian_copula_norwegian_time_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|augmented_model_fast_2_c_norwegian_copula_norwegian_time_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LeonardoFettucciari/augmented_model_fast_2_c_NO_COPULA_NO_TIME + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-autotrain_63go1_k0lzp_en.md b/docs/_posts/ahmedlone127/2024-09-20-autotrain_63go1_k0lzp_en.md new file mode 100644 index 00000000000000..45303ba656dfb9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-autotrain_63go1_k0lzp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English autotrain_63go1_k0lzp DistilBertForSequenceClassification from ben-yu +author: John Snow Labs +name: autotrain_63go1_k0lzp +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_63go1_k0lzp` is a English model originally trained by ben-yu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_63go1_k0lzp_en_5.5.0_3.0_1726860726102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_63go1_k0lzp_en_5.5.0_3.0_1726860726102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("autotrain_63go1_k0lzp","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("autotrain_63go1_k0lzp", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_63go1_k0lzp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ben-yu/autotrain-63go1-k0lzp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-autotrain_63go1_k0lzp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-autotrain_63go1_k0lzp_pipeline_en.md new file mode 100644 index 00000000000000..c75b7a5d67eef7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-autotrain_63go1_k0lzp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autotrain_63go1_k0lzp_pipeline pipeline DistilBertForSequenceClassification from ben-yu +author: John Snow Labs +name: autotrain_63go1_k0lzp_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_63go1_k0lzp_pipeline` is a English model originally trained by ben-yu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_63go1_k0lzp_pipeline_en_5.5.0_3.0_1726860742399.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_63go1_k0lzp_pipeline_en_5.5.0_3.0_1726860742399.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with the raw text in the input column
# expected by the pipeline's DocumentAssembler (typically "text").
pipeline = PretrainedPipeline("autotrain_63go1_k0lzp_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with the raw text in the input column
// expected by the pipeline's DocumentAssembler (typically "text").
val pipeline = new PretrainedPipeline("autotrain_63go1_k0lzp_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_63go1_k0lzp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ben-yu/autotrain-63go1-k0lzp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-autotrain_v2v7o_9tu3d_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-autotrain_v2v7o_9tu3d_pipeline_en.md new file mode 100644 index 00000000000000..9a053ff3947261 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-autotrain_v2v7o_9tu3d_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autotrain_v2v7o_9tu3d_pipeline pipeline DistilBertForSequenceClassification from cuwfnguyen +author: John Snow Labs +name: autotrain_v2v7o_9tu3d_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_v2v7o_9tu3d_pipeline` is a English model originally trained by cuwfnguyen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_v2v7o_9tu3d_pipeline_en_5.5.0_3.0_1726792387685.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_v2v7o_9tu3d_pipeline_en_5.5.0_3.0_1726792387685.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with the raw text in the input column
# expected by the pipeline's DocumentAssembler (typically "text").
pipeline = PretrainedPipeline("autotrain_v2v7o_9tu3d_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with the raw text in the input column
// expected by the pipeline's DocumentAssembler (typically "text").
val pipeline = new PretrainedPipeline("autotrain_v2v7o_9tu3d_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_v2v7o_9tu3d_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/cuwfnguyen/autotrain-v2v7o-9tu3d + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-azbertacontextualizedwordembeddingsinazerbaijanilanguage_en.md b/docs/_posts/ahmedlone127/2024-09-20-azbertacontextualizedwordembeddingsinazerbaijanilanguage_en.md new file mode 100644 index 00000000000000..c46abec428b787 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-azbertacontextualizedwordembeddingsinazerbaijanilanguage_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English azbertacontextualizedwordembeddingsinazerbaijanilanguage RoBertaEmbeddings from turalizada +author: John Snow Labs +name: azbertacontextualizedwordembeddingsinazerbaijanilanguage +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`azbertacontextualizedwordembeddingsinazerbaijanilanguage` is a English model originally trained by turalizada. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/azbertacontextualizedwordembeddingsinazerbaijanilanguage_en_5.5.0_3.0_1726857736688.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/azbertacontextualizedwordembeddingsinazerbaijanilanguage_en_5.5.0_3.0_1726857736688.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("azbertacontextualizedwordembeddingsinazerbaijanilanguage","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("azbertacontextualizedwordembeddingsinazerbaijanilanguage", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
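Each token's vector lands in the `embeddings` annotation column. A minimal sketch of inspecting it, assuming the `pipelineDF` from the example above:

```python
from pyspark.sql.functions import explode

# One row per token, with its text and embedding vector.
pipelineDF.select(explode("embeddings").alias("emb")) \
    .selectExpr("emb.result AS token", "emb.embeddings AS vector") \
    .show(5, truncate=80)
```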
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|azbertacontextualizedwordembeddingsinazerbaijanilanguage| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/turalizada/AzBERTaContextualizedWordEmbeddingsinAzerbaijaniLanguage \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-azbertacontextualizedwordembeddingsinazerbaijanilanguage_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-azbertacontextualizedwordembeddingsinazerbaijanilanguage_pipeline_en.md new file mode 100644 index 00000000000000..ec530243fcb428 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-azbertacontextualizedwordembeddingsinazerbaijanilanguage_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English azbertacontextualizedwordembeddingsinazerbaijanilanguage_pipeline pipeline RoBertaEmbeddings from turalizada +author: John Snow Labs +name: azbertacontextualizedwordembeddingsinazerbaijanilanguage_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`azbertacontextualizedwordembeddingsinazerbaijanilanguage_pipeline` is a English model originally trained by turalizada. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/azbertacontextualizedwordembeddingsinazerbaijanilanguage_pipeline_en_5.5.0_3.0_1726857785539.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/azbertacontextualizedwordembeddingsinazerbaijanilanguage_pipeline_en_5.5.0_3.0_1726857785539.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with the raw text in the input column
# expected by the pipeline's DocumentAssembler (typically "text").
pipeline = PretrainedPipeline("azbertacontextualizedwordembeddingsinazerbaijanilanguage_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with the raw text in the input column
// expected by the pipeline's DocumentAssembler (typically "text").
val pipeline = new PretrainedPipeline("azbertacontextualizedwordembeddingsinazerbaijanilanguage_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|azbertacontextualizedwordembeddingsinazerbaijanilanguage_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/turalizada/AzBERTaContextualizedWordEmbeddingsinAzerbaijaniLanguage + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-b001_cleaned_en.md b/docs/_posts/ahmedlone127/2024-09-20-b001_cleaned_en.md new file mode 100644 index 00000000000000..af8a774d8f110e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-b001_cleaned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English b001_cleaned DistilBertForSequenceClassification from Theoreticallyhugo +author: John Snow Labs +name: b001_cleaned +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`b001_cleaned` is a English model originally trained by Theoreticallyhugo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/b001_cleaned_en_5.5.0_3.0_1726871524721.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/b001_cleaned_en_5.5.0_3.0_1726871524721.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("b001_cleaned","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("b001_cleaned", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|b001_cleaned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Theoreticallyhugo/B001_cleaned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-b001_cleaned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-b001_cleaned_pipeline_en.md new file mode 100644 index 00000000000000..406a207002e21e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-b001_cleaned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English b001_cleaned_pipeline pipeline DistilBertForSequenceClassification from Theoreticallyhugo +author: John Snow Labs +name: b001_cleaned_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`b001_cleaned_pipeline` is a English model originally trained by Theoreticallyhugo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/b001_cleaned_pipeline_en_5.5.0_3.0_1726871536420.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/b001_cleaned_pipeline_en_5.5.0_3.0_1726871536420.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with the raw text in the input column
# expected by the pipeline's DocumentAssembler (typically "text").
pipeline = PretrainedPipeline("b001_cleaned_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with the raw text in the input column
// expected by the pipeline's DocumentAssembler (typically "text").
val pipeline = new PretrainedPipeline("b001_cleaned_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|b001_cleaned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Theoreticallyhugo/B001_cleaned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-base_model_en.md b/docs/_posts/ahmedlone127/2024-09-20-base_model_en.md new file mode 100644 index 00000000000000..a0a4500cd22fd0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-base_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English base_model DistilBertForSequenceClassification from ghantaharsha +author: John Snow Labs +name: base_model +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_model` is a English model originally trained by ghantaharsha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_model_en_5.5.0_3.0_1726830311966.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_model_en_5.5.0_3.0_1726830311966.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("base_model","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("base_model", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ghantaharsha/base-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-base_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-base_model_pipeline_en.md new file mode 100644 index 00000000000000..2fec2287e13b4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-base_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English base_model_pipeline pipeline DistilBertForSequenceClassification from ghantaharsha +author: John Snow Labs +name: base_model_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_model_pipeline` is a English model originally trained by ghantaharsha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_model_pipeline_en_5.5.0_3.0_1726830324538.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_model_pipeline_en_5.5.0_3.0_1726830324538.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with the raw text in the input column
# expected by the pipeline's DocumentAssembler (typically "text").
pipeline = PretrainedPipeline("base_model_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with the raw text in the input column
// expected by the pipeline's DocumentAssembler (typically "text").
val pipeline = new PretrainedPipeline("base_model_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ghantaharsha/base-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bbc_en.md b/docs/_posts/ahmedlone127/2024-09-20-bbc_en.md new file mode 100644 index 00000000000000..130a5157b5e75d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bbc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bbc DistilBertForSequenceClassification from NawinCom +author: John Snow Labs +name: bbc +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bbc` is a English model originally trained by NawinCom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bbc_en_5.5.0_3.0_1726842175208.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bbc_en_5.5.0_3.0_1726842175208.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("bbc","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bbc", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
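
To read the predicted category out of `pipelineDF`, select the `result` field of the `class` annotation column defined above. The input sentence is just the placeholder from the example; the model card does not list the label set.

```python
# "class" is the output column set on the classifier; "result" holds the predicted label string.
pipelineDF.select("text", "class.result").show(truncate=False)
```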
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bbc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/NawinCom/BBC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bbc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bbc_pipeline_en.md new file mode 100644 index 00000000000000..e0e1d7b32a37c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bbc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bbc_pipeline pipeline DistilBertForSequenceClassification from NawinCom +author: John Snow Labs +name: bbc_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bbc_pipeline` is a English model originally trained by NawinCom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bbc_pipeline_en_5.5.0_3.0_1726842187802.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bbc_pipeline_en_5.5.0_3.0_1726842187802.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an input Spark DataFrame (typically with a "text" column) to annotate
pipeline = PretrainedPipeline("bbc_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an input Spark DataFrame (typically with a "text" column) to annotate
val pipeline = new PretrainedPipeline("bbc_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bbc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/NawinCom/BBC + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-berit_52000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-berit_52000_pipeline_en.md new file mode 100644 index 00000000000000..99368dfca7da1a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-berit_52000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English berit_52000_pipeline pipeline RoBertaEmbeddings from gngpostalsrvc +author: John Snow Labs +name: berit_52000_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`berit_52000_pipeline` is a English model originally trained by gngpostalsrvc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/berit_52000_pipeline_en_5.5.0_3.0_1726793310855.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/berit_52000_pipeline_en_5.5.0_3.0_1726793310855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an input Spark DataFrame (typically with a "text" column) to annotate
pipeline = PretrainedPipeline("berit_52000_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an input Spark DataFrame (typically with a "text" column) to annotate
val pipeline = new PretrainedPipeline("berit_52000_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|berit_52000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|469.9 MB| + +## References + +https://huggingface.co/gngpostalsrvc/BERiT_52000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_250_redo_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_250_redo_en.md new file mode 100644 index 00000000000000..ba4c60e2d8d758 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_250_redo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_250_redo DistilBertForSequenceClassification from intrinsic-disorder +author: John Snow Labs +name: bert_250_redo +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_250_redo` is a English model originally trained by intrinsic-disorder. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_250_redo_en_5.5.0_3.0_1726861199242.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_250_redo_en_5.5.0_3.0_1726861199242.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_250_redo","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_250_redo", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
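
If you also need the classifier's confidence, the annotation metadata usually carries per-label scores alongside the predicted label. A minimal sketch using the `class` column from the example above; the exact metadata keys depend on the exported model:

```python
from pyspark.sql import functions as F

# Explode the annotations so each prediction becomes a row; "metadata" is a map that
# typically includes the label scores for this kind of sequence classifier.
pipelineDF.select(F.explode("class").alias("pred")) \
    .select("pred.result", "pred.metadata") \
    .show(truncate=False)
```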
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_250_redo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/intrinsic-disorder/bert-250-redo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_250_redo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_250_redo_pipeline_en.md new file mode 100644 index 00000000000000..87cffc4a6b9908 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_250_redo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_250_redo_pipeline pipeline DistilBertForSequenceClassification from intrinsic-disorder +author: John Snow Labs +name: bert_250_redo_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_250_redo_pipeline` is a English model originally trained by intrinsic-disorder. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_250_redo_pipeline_en_5.5.0_3.0_1726861211299.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_250_redo_pipeline_en_5.5.0_3.0_1726861211299.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an input Spark DataFrame (typically with a "text" column) to annotate
pipeline = PretrainedPipeline("bert_250_redo_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an input Spark DataFrame (typically with a "text" column) to annotate
val pipeline = new PretrainedPipeline("bert_250_redo_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_250_redo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/intrinsic-disorder/bert-250-redo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_250k_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_250k_en.md new file mode 100644 index 00000000000000..cfe454502f1aed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_250k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_250k DistilBertForSequenceClassification from intrinsic-disorder +author: John Snow Labs +name: bert_250k +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_250k` is a English model originally trained by intrinsic-disorder. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_250k_en_5.5.0_3.0_1726842464779.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_250k_en_5.5.0_3.0_1726842464779.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_250k","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_250k", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_250k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/intrinsic-disorder/bert-250k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_250k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_250k_pipeline_en.md new file mode 100644 index 00000000000000..fb2bfd8319080d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_250k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_250k_pipeline pipeline DistilBertForSequenceClassification from intrinsic-disorder +author: John Snow Labs +name: bert_250k_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_250k_pipeline` is a English model originally trained by intrinsic-disorder. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_250k_pipeline_en_5.5.0_3.0_1726842476788.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_250k_pipeline_en_5.5.0_3.0_1726842476788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an input Spark DataFrame (typically with a "text" column) to annotate
pipeline = PretrainedPipeline("bert_250k_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an input Spark DataFrame (typically with a "text" column) to annotate
val pipeline = new PretrainedPipeline("bert_250k_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_250k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/intrinsic-disorder/bert-250k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_arabert_finetuned_mdeberta_tswana_v2_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_arabert_finetuned_mdeberta_tswana_v2_en.md new file mode 100644 index 00000000000000..bb256aa83a736c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_arabert_finetuned_mdeberta_tswana_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_arabert_finetuned_mdeberta_tswana_v2 BertEmbeddings from betteib +author: John Snow Labs +name: bert_base_arabert_finetuned_mdeberta_tswana_v2 +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_arabert_finetuned_mdeberta_tswana_v2` is a English model originally trained by betteib. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_arabert_finetuned_mdeberta_tswana_v2_en_5.5.0_3.0_1726806499691.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_arabert_finetuned_mdeberta_tswana_v2_en_5.5.0_3.0_1726806499691.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("bert_base_arabert_finetuned_mdeberta_tswana_v2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("bert_base_arabert_finetuned_mdeberta_tswana_v2","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
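
Each token receives one annotation in the `embeddings` output column, with the float vector stored in its `embeddings` field. The short sketch below shows how to check the tokens and the vector dimensionality:

```python
from pyspark.sql import functions as F

# One row per token: "result" is the token text, "embeddings" is its dense vector.
pipelineDF.select(F.explode("embeddings").alias("emb")) \
    .select("emb.result", F.size("emb.embeddings").alias("dims")) \
    .show(truncate=False)
```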
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_arabert_finetuned_mdeberta_tswana_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|504.6 MB| + +## References + +https://huggingface.co/betteib/bert-base-arabert-finetuned-mdeberta-tn-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline_en.md new file mode 100644 index 00000000000000..eb9873ff89def5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline pipeline BertEmbeddings from betteib +author: John Snow Labs +name: bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline` is a English model originally trained by betteib. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline_en_5.5.0_3.0_1726806524271.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline_en_5.5.0_3.0_1726806524271.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an input Spark DataFrame (typically with a "text" column) to annotate
pipeline = PretrainedPipeline("bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an input Spark DataFrame (typically with a "text" column) to annotate
val pipeline = new PretrainedPipeline("bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|504.6 MB| + +## References + +https://huggingface.co/betteib/bert-base-arabert-finetuned-mdeberta-tn-v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_klue_mrc_finetuned_jihoonkimharu_ko.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_klue_mrc_finetuned_jihoonkimharu_ko.md new file mode 100644 index 00000000000000..cf607f197b8618 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_klue_mrc_finetuned_jihoonkimharu_ko.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Korean bert_base_klue_mrc_finetuned_jihoonkimharu BertForQuestionAnswering from jihoonkimharu +author: John Snow Labs +name: bert_base_klue_mrc_finetuned_jihoonkimharu +date: 2024-09-20 +tags: [ko, open_source, onnx, question_answering, bert] +task: Question Answering +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_klue_mrc_finetuned_jihoonkimharu` is a Korean model originally trained by jihoonkimharu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_klue_mrc_finetuned_jihoonkimharu_ko_5.5.0_3.0_1726820644319.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_klue_mrc_finetuned_jihoonkimharu_ko_5.5.0_3.0_1726820644319.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_klue_mrc_finetuned_jihoonkimharu","ko") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_klue_mrc_finetuned_jihoonkimharu", "ko")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
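
Since the model was fine-tuned on Korean (KLUE MRC) data, a Korean question/context pair is a more representative probe than the English placeholder above. The sentences below are illustrative only and not taken from the model card:

```python
# Illustrative Korean question/context pair, reusing the fitted pipelineModel from above.
data_ko = spark.createDataFrame(
    [["Spark NLP는 어떤 언어를 지원하나요?", "Spark NLP는 Python, Scala, Java를 지원합니다."]]
).toDF("question", "context")

pipelineModel.transform(data_ko).select("answer.result").show(truncate=False)
```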
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_klue_mrc_finetuned_jihoonkimharu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|ko| +|Size:|412.4 MB| + +## References + +https://huggingface.co/jihoonkimharu/bert-base-klue-mrc-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_klue_mrc_finetuned_jihoonkimharu_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_klue_mrc_finetuned_jihoonkimharu_pipeline_ko.md new file mode 100644 index 00000000000000..43b8b1ba122caf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_klue_mrc_finetuned_jihoonkimharu_pipeline_ko.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Korean bert_base_klue_mrc_finetuned_jihoonkimharu_pipeline pipeline BertForQuestionAnswering from jihoonkimharu +author: John Snow Labs +name: bert_base_klue_mrc_finetuned_jihoonkimharu_pipeline +date: 2024-09-20 +tags: [ko, open_source, pipeline, onnx] +task: Question Answering +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_klue_mrc_finetuned_jihoonkimharu_pipeline` is a Korean model originally trained by jihoonkimharu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_klue_mrc_finetuned_jihoonkimharu_pipeline_ko_5.5.0_3.0_1726820662656.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_klue_mrc_finetuned_jihoonkimharu_pipeline_ko_5.5.0_3.0_1726820662656.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an input Spark DataFrame with the question/context columns the pipeline expects
pipeline = PretrainedPipeline("bert_base_klue_mrc_finetuned_jihoonkimharu_pipeline", lang = "ko")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an input Spark DataFrame with the question/context columns the pipeline expects
val pipeline = new PretrainedPipeline("bert_base_klue_mrc_finetuned_jihoonkimharu_pipeline", lang = "ko")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_klue_mrc_finetuned_jihoonkimharu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|412.4 MB| + +## References + +https://huggingface.co/jihoonkimharu/bert-base-klue-mrc-finetuned + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_multilingual_squad_v2_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_multilingual_squad_v2_pipeline_xx.md new file mode 100644 index 00000000000000..469dd6d0a5ad50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_multilingual_squad_v2_pipeline_xx.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_squad_v2_pipeline pipeline BertForQuestionAnswering from jedstrom +author: John Snow Labs +name: bert_base_multilingual_squad_v2_pipeline +date: 2024-09-20 +tags: [xx, open_source, pipeline, onnx] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_squad_v2_pipeline` is a Multilingual model originally trained by jedstrom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_squad_v2_pipeline_xx_5.5.0_3.0_1726833744728.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_squad_v2_pipeline_xx_5.5.0_3.0_1726833744728.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an input Spark DataFrame with the question/context columns the pipeline expects
pipeline = PretrainedPipeline("bert_base_multilingual_squad_v2_pipeline", lang = "xx")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an input Spark DataFrame with the question/context columns the pipeline expects
val pipeline = new PretrainedPipeline("bert_base_multilingual_squad_v2_pipeline", lang = "xx")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_squad_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|625.5 MB| + +## References + +https://huggingface.co/jedstrom/bert-base-multilingual-squad-v2 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_multilingual_squad_v2_xx.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_multilingual_squad_v2_xx.md new file mode 100644 index 00000000000000..c4801dd567e060 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_multilingual_squad_v2_xx.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_squad_v2 BertForQuestionAnswering from jedstrom +author: John Snow Labs +name: bert_base_multilingual_squad_v2 +date: 2024-09-20 +tags: [xx, open_source, onnx, question_answering, bert] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_squad_v2` is a Multilingual model originally trained by jedstrom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_squad_v2_xx_5.5.0_3.0_1726833712279.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_squad_v2_xx_5.5.0_3.0_1726833712279.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_multilingual_squad_v2","xx") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_multilingual_squad_v2", "xx")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
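
Because the underlying encoder is multilingual, the same fitted model can be queried in several languages at once. The sentences below are illustrative placeholders:

```python
# One English and one German question/context pair answered by the same model.
data_xx = spark.createDataFrame([
    ["What framework do I use?", "I use Spark NLP."],
    ["Welches Framework benutze ich?", "Ich benutze Spark NLP."],
]).toDF("question", "context")

pipelineModel.transform(data_xx).select("question", "answer.result").show(truncate=False)
```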
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_squad_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|xx| +|Size:|625.5 MB| + +## References + +https://huggingface.co/jedstrom/bert-base-multilingual-squad-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_portuguese_cased_tiagosanti_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_portuguese_cased_tiagosanti_en.md new file mode 100644 index 00000000000000..8bf312ffa39f43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_portuguese_cased_tiagosanti_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_portuguese_cased_tiagosanti BertForSequenceClassification from TiagoSanti +author: John Snow Labs +name: bert_base_portuguese_cased_tiagosanti +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_portuguese_cased_tiagosanti` is a English model originally trained by TiagoSanti. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_tiagosanti_en_5.5.0_3.0_1726859840548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_tiagosanti_en_5.5.0_3.0_1726859840548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_portuguese_cased_tiagosanti","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_portuguese_cased_tiagosanti", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
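
The model name points to a Portuguese-cased checkpoint even though the card is tagged `en`, so a Portuguese sentence may be a more realistic input; the text below is illustrative only:

```python
# Illustrative Portuguese input, reusing the fitted pipelineModel from the example above.
data_pt = spark.createDataFrame([["Eu adoro usar o spark-nlp."]]).toDF("text")
pipelineModel.transform(data_pt).select("class.result").show(truncate=False)
```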
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_portuguese_cased_tiagosanti| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/TiagoSanti/bert-base-portuguese-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_portuguese_cased_tiagosanti_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_portuguese_cased_tiagosanti_pipeline_en.md new file mode 100644 index 00000000000000..fb25cab68f16ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_portuguese_cased_tiagosanti_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_portuguese_cased_tiagosanti_pipeline pipeline BertForSequenceClassification from TiagoSanti +author: John Snow Labs +name: bert_base_portuguese_cased_tiagosanti_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_portuguese_cased_tiagosanti_pipeline` is a English model originally trained by TiagoSanti. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_tiagosanti_pipeline_en_5.5.0_3.0_1726859860251.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_tiagosanti_pipeline_en_5.5.0_3.0_1726859860251.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an input Spark DataFrame (typically with a "text" column) to annotate
pipeline = PretrainedPipeline("bert_base_portuguese_cased_tiagosanti_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an input Spark DataFrame (typically with a "text" column) to annotate
val pipeline = new PretrainedPipeline("bert_base_portuguese_cased_tiagosanti_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_portuguese_cased_tiagosanti_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/TiagoSanti/bert-base-portuguese-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_spanish_wwm_cased_finetuned_qa_sqac_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_spanish_wwm_cased_finetuned_qa_sqac_en.md new file mode 100644 index 00000000000000..778b99fbf4f399 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_spanish_wwm_cased_finetuned_qa_sqac_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_spanish_wwm_cased_finetuned_qa_sqac BertForQuestionAnswering from dccuchile +author: John Snow Labs +name: bert_base_spanish_wwm_cased_finetuned_qa_sqac +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_wwm_cased_finetuned_qa_sqac` is a English model originally trained by dccuchile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_finetuned_qa_sqac_en_5.5.0_3.0_1726833665686.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_finetuned_qa_sqac_en_5.5.0_3.0_1726833665686.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_spanish_wwm_cased_finetuned_qa_sqac","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_spanish_wwm_cased_finetuned_qa_sqac", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
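
The checkpoint name indicates fine-tuning on Spanish SQAC-style question answering, so a Spanish pair is the natural input; the example below is illustrative only:

```python
# Illustrative Spanish question/context pair, reusing the fitted pipelineModel from above.
data_es = spark.createDataFrame(
    [["¿Qué framework utilizo?", "Yo utilizo Spark NLP para procesar texto."]]
).toDF("question", "context")

pipelineModel.transform(data_es).select("answer.result").show(truncate=False)
```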
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_cased_finetuned_qa_sqac| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased-finetuned-qa-sqac \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_spanish_wwm_cased_finetuned_qa_sqac_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_spanish_wwm_cased_finetuned_qa_sqac_pipeline_en.md new file mode 100644 index 00000000000000..e2f1c501429aca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_spanish_wwm_cased_finetuned_qa_sqac_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_spanish_wwm_cased_finetuned_qa_sqac_pipeline pipeline BertForQuestionAnswering from dccuchile +author: John Snow Labs +name: bert_base_spanish_wwm_cased_finetuned_qa_sqac_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_wwm_cased_finetuned_qa_sqac_pipeline` is a English model originally trained by dccuchile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_finetuned_qa_sqac_pipeline_en_5.5.0_3.0_1726833687506.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_finetuned_qa_sqac_pipeline_en_5.5.0_3.0_1726833687506.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an input Spark DataFrame with the question/context columns the pipeline expects
pipeline = PretrainedPipeline("bert_base_spanish_wwm_cased_finetuned_qa_sqac_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an input Spark DataFrame with the question/context columns the pipeline expects
val pipeline = new PretrainedPipeline("bert_base_spanish_wwm_cased_finetuned_qa_sqac_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_cased_finetuned_qa_sqac_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased-finetuned-qa-sqac + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_spanish_wwm_uncased_finetuned_qa_sqac_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_spanish_wwm_uncased_finetuned_qa_sqac_en.md new file mode 100644 index 00000000000000..3c410ba48a4c8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_spanish_wwm_uncased_finetuned_qa_sqac_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_spanish_wwm_uncased_finetuned_qa_sqac BertForQuestionAnswering from dccuchile +author: John Snow Labs +name: bert_base_spanish_wwm_uncased_finetuned_qa_sqac +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_wwm_uncased_finetuned_qa_sqac` is a English model originally trained by dccuchile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_uncased_finetuned_qa_sqac_en_5.5.0_3.0_1726833950446.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_uncased_finetuned_qa_sqac_en_5.5.0_3.0_1726833950446.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_spanish_wwm_uncased_finetuned_qa_sqac","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_spanish_wwm_uncased_finetuned_qa_sqac", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_uncased_finetuned_qa_sqac| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|409.7 MB| + +## References + +https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased-finetuned-qa-sqac \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_spanish_wwm_uncased_finetuned_qa_sqac_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_spanish_wwm_uncased_finetuned_qa_sqac_pipeline_en.md new file mode 100644 index 00000000000000..a00febd32c77ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_spanish_wwm_uncased_finetuned_qa_sqac_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_spanish_wwm_uncased_finetuned_qa_sqac_pipeline pipeline BertForQuestionAnswering from dccuchile +author: John Snow Labs +name: bert_base_spanish_wwm_uncased_finetuned_qa_sqac_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_wwm_uncased_finetuned_qa_sqac_pipeline` is a English model originally trained by dccuchile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_uncased_finetuned_qa_sqac_pipeline_en_5.5.0_3.0_1726833971009.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_uncased_finetuned_qa_sqac_pipeline_en_5.5.0_3.0_1726833971009.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an input Spark DataFrame with the question/context columns the pipeline expects
pipeline = PretrainedPipeline("bert_base_spanish_wwm_uncased_finetuned_qa_sqac_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an input Spark DataFrame with the question/context columns the pipeline expects
val pipeline = new PretrainedPipeline("bert_base_spanish_wwm_uncased_finetuned_qa_sqac_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_uncased_finetuned_qa_sqac_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.7 MB| + +## References + +https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased-finetuned-qa-sqac + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_squad_theseus_bulgarian_bg.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_squad_theseus_bulgarian_bg.md new file mode 100644 index 00000000000000..6eda9f68377c8e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_squad_theseus_bulgarian_bg.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Bulgarian bert_base_squad_theseus_bulgarian BertForQuestionAnswering from rmihaylov +author: John Snow Labs +name: bert_base_squad_theseus_bulgarian +date: 2024-09-20 +tags: [bg, open_source, onnx, question_answering, bert] +task: Question Answering +language: bg +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_theseus_bulgarian` is a Bulgarian model originally trained by rmihaylov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_theseus_bulgarian_bg_5.5.0_3.0_1726834091590.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_theseus_bulgarian_bg_5.5.0_3.0_1726834091590.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_theseus_bulgarian","bg") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_theseus_bulgarian", "bg")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
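
For this Bulgarian SQuAD-style model, a Bulgarian question/context pair is more representative than the English placeholder; the sentences below are illustrative only:

```python
# Illustrative Bulgarian question/context pair, reusing the fitted pipelineModel from above.
data_bg = spark.createDataFrame(
    [["Каква библиотека използвам?", "Аз използвам Spark NLP за обработка на текст."]]
).toDF("question", "context")

pipelineModel.transform(data_bg).select("answer.result").show(truncate=False)
```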
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_theseus_bulgarian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|bg| +|Size:|505.6 MB| + +## References + +https://huggingface.co/rmihaylov/bert-base-squad-theseus-bg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_squad_theseus_bulgarian_pipeline_bg.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_squad_theseus_bulgarian_pipeline_bg.md new file mode 100644 index 00000000000000..92df6e70ea11e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_squad_theseus_bulgarian_pipeline_bg.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Bulgarian bert_base_squad_theseus_bulgarian_pipeline pipeline BertForQuestionAnswering from rmihaylov +author: John Snow Labs +name: bert_base_squad_theseus_bulgarian_pipeline +date: 2024-09-20 +tags: [bg, open_source, pipeline, onnx] +task: Question Answering +language: bg +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_theseus_bulgarian_pipeline` is a Bulgarian model originally trained by rmihaylov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_theseus_bulgarian_pipeline_bg_5.5.0_3.0_1726834116224.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_theseus_bulgarian_pipeline_bg_5.5.0_3.0_1726834116224.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an input Spark DataFrame with the question/context columns the pipeline expects
pipeline = PretrainedPipeline("bert_base_squad_theseus_bulgarian_pipeline", lang = "bg")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an input Spark DataFrame with the question/context columns the pipeline expects
val pipeline = new PretrainedPipeline("bert_base_squad_theseus_bulgarian_pipeline", lang = "bg")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_theseus_bulgarian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bg| +|Size:|505.6 MB| + +## References + +https://huggingface.co/rmihaylov/bert-base-squad-theseus-bg + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_squad_v1_1_portuguese_ibama_v0_220240731160529_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_squad_v1_1_portuguese_ibama_v0_220240731160529_en.md new file mode 100644 index 00000000000000..6be89c00aeac4c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_squad_v1_1_portuguese_ibama_v0_220240731160529_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_220240731160529 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_220240731160529 +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_220240731160529` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240731160529_en_5.5.0_3.0_1726834340163.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240731160529_en_5.5.0_3.0_1726834340163.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_220240731160529","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_220240731160529", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
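For low-latency experimentation on a handful of examples, the fitted `pipelineModel` above can be wrapped in a `LightPipeline`. This is a minimal sketch under the same assumptions as the example above; it assumes `fullAnnotate` accepts the question and the context as two separate strings.

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# Question and context are passed as two strings; the answer comes back
# under the "answer" output column configured above
annotations = light.fullAnnotate("What framework do I use?", "I use spark-nlp.")
print(annotations[0]["answer"][0].result)
```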
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_220240731160529| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.220240731160529 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_squad_v1_1_portuguese_ibama_v0_220240731160529_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_squad_v1_1_portuguese_ibama_v0_220240731160529_pipeline_en.md new file mode 100644 index 00000000000000..304be171200707 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_squad_v1_1_portuguese_ibama_v0_220240731160529_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_220240731160529_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_220240731160529_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_220240731160529_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240731160529_pipeline_en_5.5.0_3.0_1726834359056.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240731160529_pipeline_en_5.5.0_3.0_1726834359056.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that holds the questions and contexts to annotate
# (see the sketch below for one way to build it)
pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_220240731160529_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that holds the questions and contexts to annotate
val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_220240731160529_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
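If no input DataFrame exists yet, one can be created from a few question/context pairs. The sketch below is an assumption-laden illustration: the input column names `question` and `context`, and the `answer` output column, are guesses based on the included MultiDocumentAssembler and BertForQuestionAnswering stages and should be checked against the actual pipeline.

```python
from sparknlp.pretrained import PretrainedPipeline

# Hypothetical input columns; verify against the pipeline's MultiDocumentAssembler stage
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")

pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_220240731160529_pipeline", lang="en")
result = pipeline.transform(df)
result.select("answer.result").show(truncate=False)
```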
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_220240731160529_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.220240731160529 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_emotionsmodified_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_emotionsmodified_pipeline_en.md new file mode 100644 index 00000000000000..917b4e4ebc27d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_emotionsmodified_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_emotionsmodified_pipeline pipeline BertForSequenceClassification from zbnsl +author: John Snow Labs +name: bert_base_uncased_emotionsmodified_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_emotionsmodified_pipeline` is a English model originally trained by zbnsl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_emotionsmodified_pipeline_en_5.5.0_3.0_1726794983609.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_emotionsmodified_pipeline_en_5.5.0_3.0_1726794983609.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that holds the texts to classify
# (see the sketch below for one way to build it)
pipeline = PretrainedPipeline("bert_base_uncased_emotionsmodified_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that holds the texts to classify
val pipeline = new PretrainedPipeline("bert_base_uncased_emotionsmodified_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
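If no DataFrame is at hand, one can be built directly from a few strings. A minimal sketch, assuming the pipeline's DocumentAssembler stage reads a column named `text` and the classifier writes its predictions to a `class` column:

```python
from sparknlp.pretrained import PretrainedPipeline

# Column names "text" and "class" are assumptions based on the included stages
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_base_uncased_emotionsmodified_pipeline", lang="en")
result = pipeline.transform(df)
result.select("class.result").show(truncate=False)
```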
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_emotionsmodified_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/zbnsl/bert-base-uncased-emotionsModified + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_ep_1_12_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_ep_1_12_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md new file mode 100644 index 00000000000000..f799b5f6f4df0d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_ep_1_12_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_12_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_12_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_12_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_12_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1726833897722.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_12_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1726833897722.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_12_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_12_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_12_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.12-b-32-lr-4e-07-dp-0.5-ss-0-st-True-fh-False-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_ep_1_12_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_ep_1_12_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..c3b119c575b6d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_ep_1_12_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_12_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_12_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_12_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_12_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1726833920233.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_12_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1726833920233.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that holds the questions and contexts to annotate
pipeline = PretrainedPipeline("bert_base_uncased_ep_1_12_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that holds the questions and contexts to annotate
val pipeline = new PretrainedPipeline("bert_base_uncased_ep_1_12_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_12_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.12-b-32-lr-4e-07-dp-0.5-ss-0-st-True-fh-False-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_0004_swati_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_0004_swati_0_pipeline_en.md new file mode 100644 index 00000000000000..5ebef49f6c6a47 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_0004_swati_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_0004_swati_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_0004_swati_0_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_0004_swati_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_0004_swati_0_pipeline_en_5.5.0_3.0_1726833894766.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_0004_swati_0_pipeline_en_5.5.0_3.0_1726833894766.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that holds the questions and contexts to annotate
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_0004_swati_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that holds the questions and contexts to annotate
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_0004_swati_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_0004_swati_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-05-wd-0.001-dp-0.0004-ss-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_en.md new file mode 100644 index 00000000000000..d82c42268dad6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_en_5.5.0_3.0_1726833695106.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_en_5.5.0_3.0_1726833695106.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-06-wd-0.001-dp-0.5-ss-0-st-True-fh-True \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_pipeline_en.md new file mode 100644 index 00000000000000..0934768f835a3d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_pipeline_en_5.5.0_3.0_1726833715719.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_pipeline_en_5.5.0_3.0_1726833715719.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that holds the questions and contexts to annotate
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that holds the questions and contexts to annotate
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-06-wd-0.001-dp-0.5-ss-0-st-True-fh-True + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_en.md new file mode 100644 index 00000000000000..299cb8b96c4fa8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600 +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_en_5.5.0_3.0_1726834029212.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_en_5.5.0_3.0_1726834029212.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.25-lr-4e-07-wd-1e-05-dp-0.3-ss-0-st-False-fh-False-hs-600 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline_en.md new file mode 100644 index 00000000000000..27906bf6a53755 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline_en_5.5.0_3.0_1726834048789.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline_en_5.5.0_3.0_1726834048789.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that holds the questions and contexts to annotate
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that holds the questions and contexts to annotate
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.25-lr-4e-07-wd-1e-05-dp-0.3-ss-0-st-False-fh-False-hs-600 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetuned_set_3_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetuned_set_3_en.md new file mode 100644 index 00000000000000..d9cc8409103b00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetuned_set_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_set_3 BertForSequenceClassification from joetey +author: John Snow Labs +name: bert_base_uncased_finetuned_set_3 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_set_3` is a English model originally trained by joetey. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_set_3_en_5.5.0_3.0_1726797266587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_set_3_en_5.5.0_3.0_1726797266587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_set_3","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_set_3", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
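To inspect the predicted label for each row, select the `class` annotation column (set via `setOutputCol("class")` above) from the transformed DataFrame:

```python
# Each row carries its predicted label(s) in the "class" annotation column
pipelineDF.select("text", "class.result").show(truncate=False)
```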
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_set_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/joetey/bert-base-uncased-finetuned-set_3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetuned_set_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetuned_set_3_pipeline_en.md new file mode 100644 index 00000000000000..4f32a1df5f3902 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetuned_set_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_set_3_pipeline pipeline BertForSequenceClassification from joetey +author: John Snow Labs +name: bert_base_uncased_finetuned_set_3_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_set_3_pipeline` is a English model originally trained by joetey. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_set_3_pipeline_en_5.5.0_3.0_1726797285307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_set_3_pipeline_en_5.5.0_3.0_1726797285307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that holds the texts to classify
pipeline = PretrainedPipeline("bert_base_uncased_finetuned_set_3_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that holds the texts to classify
val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_set_3_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_set_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/joetey/bert-base-uncased-finetuned-set_3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_glue_sst2_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_glue_sst2_en.md new file mode 100644 index 00000000000000..bd0b7ccd6d90b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_glue_sst2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_glue_sst2 BertForSequenceClassification from pmthangk09 +author: John Snow Labs +name: bert_base_uncased_glue_sst2 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_glue_sst2` is a English model originally trained by pmthangk09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_glue_sst2_en_5.5.0_3.0_1726829390120.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_glue_sst2_en_5.5.0_3.0_1726829390120.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_glue_sst2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_glue_sst2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_glue_sst2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/pmthangk09/bert-base-uncased-glue-sst2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_glue_sst2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_glue_sst2_pipeline_en.md new file mode 100644 index 00000000000000..dbde31a426d09d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_glue_sst2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_glue_sst2_pipeline pipeline BertForSequenceClassification from pmthangk09 +author: John Snow Labs +name: bert_base_uncased_glue_sst2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_glue_sst2_pipeline` is a English model originally trained by pmthangk09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_glue_sst2_pipeline_en_5.5.0_3.0_1726829408860.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_glue_sst2_pipeline_en_5.5.0_3.0_1726829408860.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that holds the texts to classify
pipeline = PretrainedPipeline("bert_base_uncased_glue_sst2_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that holds the texts to classify
val pipeline = new PretrainedPipeline("bert_base_uncased_glue_sst2_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_glue_sst2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/pmthangk09/bert-base-uncased-glue-sst2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_qqp_modeltc_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_qqp_modeltc_en.md new file mode 100644 index 00000000000000..156b09749c908b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_qqp_modeltc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_qqp_modeltc BertForSequenceClassification from ModelTC +author: John Snow Labs +name: bert_base_uncased_qqp_modeltc +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_qqp_modeltc` is a English model originally trained by ModelTC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qqp_modeltc_en_5.5.0_3.0_1726828569797.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qqp_modeltc_en_5.5.0_3.0_1726828569797.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_qqp_modeltc","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_qqp_modeltc", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_qqp_modeltc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ModelTC/bert-base-uncased-qqp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_qqp_modeltc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_qqp_modeltc_pipeline_en.md new file mode 100644 index 00000000000000..633a6390e477d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_qqp_modeltc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_qqp_modeltc_pipeline pipeline BertForSequenceClassification from ModelTC +author: John Snow Labs +name: bert_base_uncased_qqp_modeltc_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_qqp_modeltc_pipeline` is a English model originally trained by ModelTC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qqp_modeltc_pipeline_en_5.5.0_3.0_1726828589206.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qqp_modeltc_pipeline_en_5.5.0_3.0_1726828589206.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that holds the texts to classify
pipeline = PretrainedPipeline("bert_base_uncased_qqp_modeltc_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that holds the texts to classify
val pipeline = new PretrainedPipeline("bert_base_uncased_qqp_modeltc_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_qqp_modeltc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ModelTC/bert-base-uncased-qqp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_retrained_squad_meghanaanil_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_retrained_squad_meghanaanil_en.md new file mode 100644 index 00000000000000..3e3c8c0d9e4e34 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_retrained_squad_meghanaanil_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_retrained_squad_meghanaanil BertForQuestionAnswering from meghanaanil +author: John Snow Labs +name: bert_base_uncased_retrained_squad_meghanaanil +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_retrained_squad_meghanaanil` is a English model originally trained by meghanaanil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_retrained_squad_meghanaanil_en_5.5.0_3.0_1726833906603.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_retrained_squad_meghanaanil_en_5.5.0_3.0_1726833906603.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_retrained_squad_meghanaanil","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_retrained_squad_meghanaanil", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_retrained_squad_meghanaanil| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/meghanaanil/bert-base-uncased-retrained-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_retrained_squad_meghanaanil_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_retrained_squad_meghanaanil_pipeline_en.md new file mode 100644 index 00000000000000..0d09960d4145a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_retrained_squad_meghanaanil_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_retrained_squad_meghanaanil_pipeline pipeline BertForQuestionAnswering from meghanaanil +author: John Snow Labs +name: bert_base_uncased_retrained_squad_meghanaanil_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_retrained_squad_meghanaanil_pipeline` is a English model originally trained by meghanaanil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_retrained_squad_meghanaanil_pipeline_en_5.5.0_3.0_1726833926732.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_retrained_squad_meghanaanil_pipeline_en_5.5.0_3.0_1726833926732.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that holds the questions and contexts to annotate
pipeline = PretrainedPipeline("bert_base_uncased_retrained_squad_meghanaanil_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that holds the questions and contexts to annotate
val pipeline = new PretrainedPipeline("bert_base_uncased_retrained_squad_meghanaanil_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_retrained_squad_meghanaanil_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/meghanaanil/bert-base-uncased-retrained-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_token_itr0_0_0001_all_01_03_2022_14_21_25_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_token_itr0_0_0001_all_01_03_2022_14_21_25_en.md new file mode 100644 index 00000000000000..3d2b81e3cbe905 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_token_itr0_0_0001_all_01_03_2022_14_21_25_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_token_itr0_0_0001_all_01_03_2022_14_21_25 BertForTokenClassification from ali2066 +author: John Snow Labs +name: bert_base_uncased_token_itr0_0_0001_all_01_03_2022_14_21_25 +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_token_itr0_0_0001_all_01_03_2022_14_21_25` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_token_itr0_0_0001_all_01_03_2022_14_21_25_en_5.5.0_3.0_1726840104716.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_token_itr0_0_0001_all_01_03_2022_14_21_25_en_5.5.0_3.0_1726840104716.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncased_token_itr0_0_0001_all_01_03_2022_14_21_25","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncased_token_itr0_0_0001_all_01_03_2022_14_21_25", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
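The `token` and `ner` columns produced above hold parallel arrays per row, so tokens can be lined up with their predicted tags. A small sketch (the second step collects to the driver, so it is only suitable for small samples):

```python
# Tokens and their predicted tags are parallel arrays in the output
pipelineDF.select("token.result", "ner.result").show(truncate=False)

# Pair tokens with tags for the first row
row = pipelineDF.select("token.result", "ner.result").first()
for tok, tag in zip(row[0], row[1]):
    print(tok, tag)
```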
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_token_itr0_0_0001_all_01_03_2022_14_21_25| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ali2066/bert-base-uncased_token_itr0_0.0001_all_01_03_2022-14_21_25 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_token_itr0_0_0001_all_01_03_2022_14_21_25_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_token_itr0_0_0001_all_01_03_2022_14_21_25_pipeline_en.md new file mode 100644 index 00000000000000..9ab27a6e644d48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_token_itr0_0_0001_all_01_03_2022_14_21_25_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_token_itr0_0_0001_all_01_03_2022_14_21_25_pipeline pipeline BertForTokenClassification from ali2066 +author: John Snow Labs +name: bert_base_uncased_token_itr0_0_0001_all_01_03_2022_14_21_25_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_token_itr0_0_0001_all_01_03_2022_14_21_25_pipeline` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_token_itr0_0_0001_all_01_03_2022_14_21_25_pipeline_en_5.5.0_3.0_1726840124720.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_token_itr0_0_0001_all_01_03_2022_14_21_25_pipeline_en_5.5.0_3.0_1726840124720.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("bert_base_uncased_token_itr0_0_0001_all_01_03_2022_14_21_25_pipeline", lang = "en")
+# assuming the pipeline's DocumentAssembler reads a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("bert_base_uncased_token_itr0_0_0001_all_01_03_2022_14_21_25_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
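+
+For quick checks on single strings, `PretrainedPipeline` also exposes `annotate`, which skips the DataFrame step. A minimal sketch, assuming the pipeline object created above:
+
+```python
+# Returns a dict keyed by the pipeline's output columns.
+print(pipeline.annotate("I love spark-nlp"))
+```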
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_token_itr0_0_0001_all_01_03_2022_14_21_25_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ali2066/bert-base-uncased_token_itr0_0.0001_all_01_03_2022-14_21_25 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_vitamin_c_fact_verification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_vitamin_c_fact_verification_pipeline_en.md new file mode 100644 index 00000000000000..770f06365c1289 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_vitamin_c_fact_verification_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_vitamin_c_fact_verification_pipeline pipeline BertForQuestionAnswering from DunnBC22 +author: John Snow Labs +name: bert_base_uncased_vitamin_c_fact_verification_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_vitamin_c_fact_verification_pipeline` is a English model originally trained by DunnBC22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_vitamin_c_fact_verification_pipeline_en_5.5.0_3.0_1726820777448.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_vitamin_c_fact_verification_pipeline_en_5.5.0_3.0_1726820777448.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# df must already hold the question and context columns expected by the pipeline's MultiDocumentAssembler
+pipeline = PretrainedPipeline("bert_base_uncased_vitamin_c_fact_verification_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df must already hold the question and context columns expected by the pipeline's MultiDocumentAssembler
+val pipeline = new PretrainedPipeline("bert_base_uncased_vitamin_c_fact_verification_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_vitamin_c_fact_verification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/DunnBC22/bert-base-uncased-Vitamin_C_Fact_Verification + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_classification_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_classification_en.md new file mode 100644 index 00000000000000..22daa376caf6de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_classification DistilBertForSequenceClassification from mdp0999 +author: John Snow Labs +name: bert_classification +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_classification` is a English model originally trained by mdp0999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_classification_en_5.5.0_3.0_1726860933086.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_classification_en_5.5.0_3.0_1726860933086.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_classification","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_classification", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
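+
+Once the example above has run, the predicted label for each input sits in the `class` column. A minimal sketch, assuming the Python variables above:
+
+```python
+# One predicted label per input row.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```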
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mdp0999/bert_classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_classification_pipeline_en.md new file mode 100644 index 00000000000000..16914ce02b7f8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_classification_pipeline pipeline DistilBertForSequenceClassification from mdp0999 +author: John Snow Labs +name: bert_classification_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_classification_pipeline` is a English model originally trained by mdp0999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_classification_pipeline_en_5.5.0_3.0_1726860945093.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_classification_pipeline_en_5.5.0_3.0_1726860945093.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("bert_classification_pipeline", lang = "en")
+# assuming the pipeline's DocumentAssembler reads a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("bert_classification_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mdp0999/bert_classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_distilled_multi_teacher_model_random_emotion_epoch7_alpha0_8_refined_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_distilled_multi_teacher_model_random_emotion_epoch7_alpha0_8_refined_en.md new file mode 100644 index 00000000000000..147120b05eec82 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_distilled_multi_teacher_model_random_emotion_epoch7_alpha0_8_refined_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_distilled_multi_teacher_model_random_emotion_epoch7_alpha0_8_refined DistilBertForSequenceClassification from ArafatBHossain +author: John Snow Labs +name: bert_distilled_multi_teacher_model_random_emotion_epoch7_alpha0_8_refined +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_distilled_multi_teacher_model_random_emotion_epoch7_alpha0_8_refined` is a English model originally trained by ArafatBHossain. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_distilled_multi_teacher_model_random_emotion_epoch7_alpha0_8_refined_en_5.5.0_3.0_1726840874698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_distilled_multi_teacher_model_random_emotion_epoch7_alpha0_8_refined_en_5.5.0_3.0_1726840874698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_distilled_multi_teacher_model_random_emotion_epoch7_alpha0_8_refined","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_distilled_multi_teacher_model_random_emotion_epoch7_alpha0_8_refined", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_distilled_multi_teacher_model_random_emotion_epoch7_alpha0_8_refined| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ArafatBHossain/bert-distilled-multi_teacher_model_random_emotion_epoch7_alpha0.8_refined \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_distilled_multi_teacher_model_random_emotion_epoch7_alpha0_8_refined_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_distilled_multi_teacher_model_random_emotion_epoch7_alpha0_8_refined_pipeline_en.md new file mode 100644 index 00000000000000..685939b03cce33 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_distilled_multi_teacher_model_random_emotion_epoch7_alpha0_8_refined_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_distilled_multi_teacher_model_random_emotion_epoch7_alpha0_8_refined_pipeline pipeline DistilBertForSequenceClassification from ArafatBHossain +author: John Snow Labs +name: bert_distilled_multi_teacher_model_random_emotion_epoch7_alpha0_8_refined_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_distilled_multi_teacher_model_random_emotion_epoch7_alpha0_8_refined_pipeline` is a English model originally trained by ArafatBHossain. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_distilled_multi_teacher_model_random_emotion_epoch7_alpha0_8_refined_pipeline_en_5.5.0_3.0_1726840889865.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_distilled_multi_teacher_model_random_emotion_epoch7_alpha0_8_refined_pipeline_en_5.5.0_3.0_1726840889865.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("bert_distilled_multi_teacher_model_random_emotion_epoch7_alpha0_8_refined_pipeline", lang = "en")
+# assuming the pipeline's DocumentAssembler reads a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("bert_distilled_multi_teacher_model_random_emotion_epoch7_alpha0_8_refined_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_distilled_multi_teacher_model_random_emotion_epoch7_alpha0_8_refined_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ArafatBHossain/bert-distilled-multi_teacher_model_random_emotion_epoch7_alpha0.8_refined + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_distilled_multi_teacher_model_sentiment_hp_optimized_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_distilled_multi_teacher_model_sentiment_hp_optimized_pipeline_en.md new file mode 100644 index 00000000000000..027bb04c53ab0d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_distilled_multi_teacher_model_sentiment_hp_optimized_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_distilled_multi_teacher_model_sentiment_hp_optimized_pipeline pipeline DistilBertForSequenceClassification from ArafatBHossain +author: John Snow Labs +name: bert_distilled_multi_teacher_model_sentiment_hp_optimized_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_distilled_multi_teacher_model_sentiment_hp_optimized_pipeline` is a English model originally trained by ArafatBHossain. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_distilled_multi_teacher_model_sentiment_hp_optimized_pipeline_en_5.5.0_3.0_1726791978374.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_distilled_multi_teacher_model_sentiment_hp_optimized_pipeline_en_5.5.0_3.0_1726791978374.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("bert_distilled_multi_teacher_model_sentiment_hp_optimized_pipeline", lang = "en")
+# assuming the pipeline's DocumentAssembler reads a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("bert_distilled_multi_teacher_model_sentiment_hp_optimized_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_distilled_multi_teacher_model_sentiment_hp_optimized_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ArafatBHossain/bert-distilled-multi_teacher_model_sentiment_hp_optimized + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_en.md new file mode 100644 index 00000000000000..2153422414e1f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert RoBertaEmbeddings from ai-ar +author: John Snow Labs +name: bert +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert` is a English model originally trained by ai-ar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_en_5.5.0_3.0_1726816418936.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_en_5.5.0_3.0_1726816418936.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("bert","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("bert","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
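+
+Each entry in the `embeddings` column is a token-level annotation whose vector sits in its `embeddings` field. A minimal sketch, assuming the Python example above has been run:
+
+```python
+# One row per token: the token text and its embedding vector.
+pipelineDF.selectExpr("explode(embeddings) as emb") \
+    .selectExpr("emb.result as token", "emb.embeddings as vector") \
+    .show()
+```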
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ai-ar/bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_fine_tuned_cola_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_fine_tuned_cola_en.md new file mode 100644 index 00000000000000..5b4fc9b8f23777 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_fine_tuned_cola_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_fine_tuned_cola DistilBertForSequenceClassification from erden00 +author: John Snow Labs +name: bert_fine_tuned_cola +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_fine_tuned_cola` is a English model originally trained by erden00. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_fine_tuned_cola_en_5.5.0_3.0_1726841550819.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_fine_tuned_cola_en_5.5.0_3.0_1726841550819.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_fine_tuned_cola","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_fine_tuned_cola", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_fine_tuned_cola| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/erden00/bert-fine-tuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_fine_tuned_cola_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_fine_tuned_cola_pipeline_en.md new file mode 100644 index 00000000000000..5095df5dbbce6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_fine_tuned_cola_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_fine_tuned_cola_pipeline pipeline DistilBertForSequenceClassification from erden00 +author: John Snow Labs +name: bert_fine_tuned_cola_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_fine_tuned_cola_pipeline` is a English model originally trained by erden00. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_fine_tuned_cola_pipeline_en_5.5.0_3.0_1726841564036.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_fine_tuned_cola_pipeline_en_5.5.0_3.0_1726841564036.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("bert_fine_tuned_cola_pipeline", lang = "en")
+# assuming the pipeline's DocumentAssembler reads a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("bert_fine_tuned_cola_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_fine_tuned_cola_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/erden00/bert-fine-tuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_finetuned_ner_mbalos_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_finetuned_ner_mbalos_en.md new file mode 100644 index 00000000000000..9ec80c810db6e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_finetuned_ner_mbalos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_ner_mbalos BertForTokenClassification from mbalos +author: John Snow Labs +name: bert_finetuned_ner_mbalos +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_mbalos` is a English model originally trained by mbalos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_mbalos_en_5.5.0_3.0_1726840237378.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_mbalos_en_5.5.0_3.0_1726840237378.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_mbalos","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_mbalos", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
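+
+If the model emits IOB-style tags, Spark NLP's `NerConverter` can group them into entity chunks. A rough sketch that extends the Python pipeline above (the `converter` stage and `entities` column are illustrative names):
+
+```python
+from sparknlp.annotator import NerConverter
+
+# Groups consecutive IOB tags from the "ner" column into entity chunks.
+converter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("entities")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, converter])
+result = pipeline.fit(data).transform(data)
+result.select("entities.result").show(truncate=False)
+```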
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_mbalos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/mbalos/bert-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_finetuned_ner_mbalos_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_finetuned_ner_mbalos_pipeline_en.md new file mode 100644 index 00000000000000..9df023f19e476c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_finetuned_ner_mbalos_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_ner_mbalos_pipeline pipeline BertForTokenClassification from mbalos +author: John Snow Labs +name: bert_finetuned_ner_mbalos_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_mbalos_pipeline` is a English model originally trained by mbalos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_mbalos_pipeline_en_5.5.0_3.0_1726840256916.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_mbalos_pipeline_en_5.5.0_3.0_1726840256916.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("bert_finetuned_ner_mbalos_pipeline", lang = "en")
+# assuming the pipeline's DocumentAssembler reads a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("bert_finetuned_ner_mbalos_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_mbalos_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/mbalos/bert-finetuned-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_finetuned_resume_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_finetuned_resume_en.md new file mode 100644 index 00000000000000..4c451640d62eb6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_finetuned_resume_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_resume DistilBertForSequenceClassification from bayesian4042 +author: John Snow Labs +name: bert_finetuned_resume +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_resume` is a English model originally trained by bayesian4042. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_resume_en_5.5.0_3.0_1726832390913.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_resume_en_5.5.0_3.0_1726832390913.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_finetuned_resume","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_finetuned_resume", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_resume| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bayesian4042/bert_finetuned_resume \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_finetuned_resume_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_finetuned_resume_pipeline_en.md new file mode 100644 index 00000000000000..d09c83b70a05f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_finetuned_resume_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_resume_pipeline pipeline DistilBertForSequenceClassification from bayesian4042 +author: John Snow Labs +name: bert_finetuned_resume_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_resume_pipeline` is a English model originally trained by bayesian4042. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_resume_pipeline_en_5.5.0_3.0_1726832403290.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_resume_pipeline_en_5.5.0_3.0_1726832403290.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("bert_finetuned_resume_pipeline", lang = "en")
+# assuming the pipeline's DocumentAssembler reads a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("bert_finetuned_resume_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_resume_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bayesian4042/bert_finetuned_resume + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_fromscratch_galician_xlarge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_fromscratch_galician_xlarge_pipeline_en.md new file mode 100644 index 00000000000000..78a73217728a0f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_fromscratch_galician_xlarge_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_fromscratch_galician_xlarge_pipeline pipeline RoBertaEmbeddings from fpuentes +author: John Snow Labs +name: bert_fromscratch_galician_xlarge_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_fromscratch_galician_xlarge_pipeline` is a English model originally trained by fpuentes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_fromscratch_galician_xlarge_pipeline_en_5.5.0_3.0_1726793740355.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_fromscratch_galician_xlarge_pipeline_en_5.5.0_3.0_1726793740355.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("bert_fromscratch_galician_xlarge_pipeline", lang = "en")
+# assuming the pipeline's DocumentAssembler reads a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("bert_fromscratch_galician_xlarge_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_fromscratch_galician_xlarge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fpuentes/bert-fromscratch-galician-xlarge + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_jigsaw_severetoxic_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_jigsaw_severetoxic_en.md new file mode 100644 index 00000000000000..1a1f4dfe6425d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_jigsaw_severetoxic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_jigsaw_severetoxic BertForSequenceClassification from Cameron +author: John Snow Labs +name: bert_jigsaw_severetoxic +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_jigsaw_severetoxic` is a English model originally trained by Cameron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_jigsaw_severetoxic_en_5.5.0_3.0_1726859936954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_jigsaw_severetoxic_en_5.5.0_3.0_1726859936954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_jigsaw_severetoxic","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_jigsaw_severetoxic", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
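+
+For low-latency scoring of individual strings, the fitted pipeline can be wrapped in a `LightPipeline`. A minimal sketch assuming the Python example above (the `light` variable is an illustrative name):
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+# annotate() runs the fitted stages on plain strings, no DataFrame required.
+print(light.annotate("I love spark-nlp")["class"])
+```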
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_jigsaw_severetoxic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Cameron/BERT-jigsaw-severetoxic \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_jigsaw_severetoxic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_jigsaw_severetoxic_pipeline_en.md new file mode 100644 index 00000000000000..4f56f09813da40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_jigsaw_severetoxic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_jigsaw_severetoxic_pipeline pipeline BertForSequenceClassification from Cameron +author: John Snow Labs +name: bert_jigsaw_severetoxic_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_jigsaw_severetoxic_pipeline` is a English model originally trained by Cameron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_jigsaw_severetoxic_pipeline_en_5.5.0_3.0_1726859956011.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_jigsaw_severetoxic_pipeline_en_5.5.0_3.0_1726859956011.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("bert_jigsaw_severetoxic_pipeline", lang = "en")
+# assuming the pipeline's DocumentAssembler reads a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("bert_jigsaw_severetoxic_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_jigsaw_severetoxic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Cameron/BERT-jigsaw-severetoxic + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_math_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_math_en.md new file mode 100644 index 00000000000000..71b6a4269be696 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_math_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_math DistilBertForSequenceClassification from CrissWang +author: John Snow Labs +name: bert_math +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_math` is a English model originally trained by CrissWang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_math_en_5.5.0_3.0_1726871615556.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_math_en_5.5.0_3.0_1726871615556.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_math","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_math", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_math| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/CrissWang/bert-math \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_math_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_math_pipeline_en.md new file mode 100644 index 00000000000000..14ca460deabc7a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_math_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_math_pipeline pipeline DistilBertForSequenceClassification from CrissWang +author: John Snow Labs +name: bert_math_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_math_pipeline` is a English model originally trained by CrissWang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_math_pipeline_en_5.5.0_3.0_1726871627491.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_math_pipeline_en_5.5.0_3.0_1726871627491.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("bert_math_pipeline", lang = "en")
+# assuming the pipeline's DocumentAssembler reads a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("bert_math_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_math_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/CrissWang/bert-math + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_multiclass_classification_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_multiclass_classification_en.md new file mode 100644 index 00000000000000..66f61127fc4c1d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_multiclass_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_multiclass_classification BertForSequenceClassification from Ronysalem +author: John Snow Labs +name: bert_multiclass_classification +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_multiclass_classification` is a English model originally trained by Ronysalem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_multiclass_classification_en_5.5.0_3.0_1726797160131.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_multiclass_classification_en_5.5.0_3.0_1726797160131.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_multiclass_classification","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_multiclass_classification", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
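+
+Beyond the winning label in `class.result`, each annotation's metadata usually carries the per-label scores, which can help when thresholding multiclass predictions. A minimal sketch assuming the Python example above:
+
+```python
+# result = predicted label, metadata = per-label scores (when provided by the annotator).
+pipelineDF.selectExpr("explode(class) as pred") \
+    .selectExpr("pred.result as label", "pred.metadata as scores") \
+    .show(truncate=False)
+```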
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_multiclass_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Ronysalem/Bert-Multiclass-Classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_multilingual_uncased_intelligence_headlines_xx.md b/docs/_posts/ahmedlone127/2024-09-20-bert_multilingual_uncased_intelligence_headlines_xx.md new file mode 100644 index 00000000000000..f98e545bb77a1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_multilingual_uncased_intelligence_headlines_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_multilingual_uncased_intelligence_headlines BertForSequenceClassification from nlpodyssey +author: John Snow Labs +name: bert_multilingual_uncased_intelligence_headlines +date: 2024-09-20 +tags: [xx, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_multilingual_uncased_intelligence_headlines` is a Multilingual model originally trained by nlpodyssey. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_multilingual_uncased_intelligence_headlines_xx_5.5.0_3.0_1726795413707.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_multilingual_uncased_intelligence_headlines_xx_5.5.0_3.0_1726795413707.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_multilingual_uncased_intelligence_headlines","xx") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_multilingual_uncased_intelligence_headlines", "xx")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
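
Besides the predicted label, the annotation metadata usually carries the per-class scores produced by the classifier. A small sketch of inspecting both, assuming the Python example above has been run (the exact metadata keys depend on this model's label set):

```python
from pyspark.sql.functions import col

# result holds the predicted label; metadata typically maps each label to its score
pipelineDF.select(
    col("class.result").alias("predicted_label"),
    col("class.metadata").alias("scores")
).show(truncate=False)
```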
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_multilingual_uncased_intelligence_headlines| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|628.1 MB| + +## References + +https://huggingface.co/nlpodyssey/bert-multilingual-uncased-intelligence-headlines \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_next_word_prediction_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_next_word_prediction_en.md new file mode 100644 index 00000000000000..89267551849de1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_next_word_prediction_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_next_word_prediction BertEmbeddings from MattNandavong +author: John Snow Labs +name: bert_next_word_prediction +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_next_word_prediction` is a English model originally trained by MattNandavong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_next_word_prediction_en_5.5.0_3.0_1726825703123.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_next_word_prediction_en_5.5.0_3.0_1726825703123.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_next_word_prediction","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_next_word_prediction","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
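
The embeddings column contains one annotation per token, each carrying the token text and its dense vector. A minimal sketch of inspecting the tokens and the vector width, assuming the Python example above has been run:

```python
from pyspark.sql.functions import col, explode, size

# One annotation per token; "embeddings" is that token's dense vector
pipelineDF.select(explode(col("embeddings")).alias("emb")) \
    .select(col("emb.result").alias("token"), size(col("emb.embeddings")).alias("dimensions")) \
    .show(truncate=False)
```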
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_next_word_prediction| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/MattNandavong/bert-next-word-prediction \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_next_word_prediction_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_next_word_prediction_pipeline_en.md new file mode 100644 index 00000000000000..2204d4826a3ec0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_next_word_prediction_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_next_word_prediction_pipeline pipeline BertEmbeddings from MattNandavong +author: John Snow Labs +name: bert_next_word_prediction_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_next_word_prediction_pipeline` is a English model originally trained by MattNandavong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_next_word_prediction_pipeline_en_5.5.0_3.0_1726825722673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_next_word_prediction_pipeline_en_5.5.0_3.0_1726825722673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_next_word_prediction_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_next_word_prediction_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
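
The snippet above assumes a DataFrame `df` already exists. A minimal sketch of preparing one, assuming the pipeline's DocumentAssembler reads a `text` column (the sample row is hypothetical):

```python
from sparknlp.pretrained import PretrainedPipeline  # import used by the snippet above

# Hypothetical input; the pipeline starts from a DocumentAssembler reading "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
annotations.printSchema()
```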
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_next_word_prediction_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/MattNandavong/bert-next-word-prediction + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_pipeline_en.md new file mode 100644 index 00000000000000..0af97ad2c7243f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_pipeline pipeline RoBertaEmbeddings from ai-ar +author: John Snow Labs +name: bert_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_pipeline` is a English model originally trained by ai-ar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_pipeline_en_5.5.0_3.0_1726816482305.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_pipeline_en_5.5.0_3.0_1726816482305.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ai-ar/bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_sbic_targetcategory_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_sbic_targetcategory_en.md new file mode 100644 index 00000000000000..09ecd87a983003 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_sbic_targetcategory_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_sbic_targetcategory BertForSequenceClassification from Cameron +author: John Snow Labs +name: bert_sbic_targetcategory +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_sbic_targetcategory` is a English model originally trained by Cameron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sbic_targetcategory_en_5.5.0_3.0_1726860450040.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sbic_targetcategory_en_5.5.0_3.0_1726860450040.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_sbic_targetcategory","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_sbic_targetcategory", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sbic_targetcategory| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Cameron/BERT-SBIC-targetcategory \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_sbic_targetcategory_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_sbic_targetcategory_pipeline_en.md new file mode 100644 index 00000000000000..eff2c0d4b44bb3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_sbic_targetcategory_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_sbic_targetcategory_pipeline pipeline BertForSequenceClassification from Cameron +author: John Snow Labs +name: bert_sbic_targetcategory_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_sbic_targetcategory_pipeline` is a English model originally trained by Cameron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sbic_targetcategory_pipeline_en_5.5.0_3.0_1726860469150.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sbic_targetcategory_pipeline_en_5.5.0_3.0_1726860469150.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_sbic_targetcategory_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_sbic_targetcategory_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sbic_targetcategory_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Cameron/BERT-SBIC-targetcategory + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_sentiment_persian_farsi_rasooli3003_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_sentiment_persian_farsi_rasooli3003_en.md new file mode 100644 index 00000000000000..e76f522a8a2bd7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_sentiment_persian_farsi_rasooli3003_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_sentiment_persian_farsi_rasooli3003 BertForSequenceClassification from Rasooli3003 +author: John Snow Labs +name: bert_sentiment_persian_farsi_rasooli3003 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_sentiment_persian_farsi_rasooli3003` is a English model originally trained by Rasooli3003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sentiment_persian_farsi_rasooli3003_en_5.5.0_3.0_1726829010909.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sentiment_persian_farsi_rasooli3003_en_5.5.0_3.0_1726829010909.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_sentiment_persian_farsi_rasooli3003","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_sentiment_persian_farsi_rasooli3003", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sentiment_persian_farsi_rasooli3003| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|608.7 MB| + +## References + +https://huggingface.co/Rasooli3003/Bert-Sentiment-Fa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_sentiment_persian_farsi_rasooli3003_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_sentiment_persian_farsi_rasooli3003_pipeline_en.md new file mode 100644 index 00000000000000..9bd819703b7475 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_sentiment_persian_farsi_rasooli3003_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_sentiment_persian_farsi_rasooli3003_pipeline pipeline BertForSequenceClassification from Rasooli3003 +author: John Snow Labs +name: bert_sentiment_persian_farsi_rasooli3003_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_sentiment_persian_farsi_rasooli3003_pipeline` is a English model originally trained by Rasooli3003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sentiment_persian_farsi_rasooli3003_pipeline_en_5.5.0_3.0_1726829039256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sentiment_persian_farsi_rasooli3003_pipeline_en_5.5.0_3.0_1726829039256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_sentiment_persian_farsi_rasooli3003_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_sentiment_persian_farsi_rasooli3003_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sentiment_persian_farsi_rasooli3003_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|608.8 MB| + +## References + +https://huggingface.co/Rasooli3003/Bert-Sentiment-Fa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_test_abethman_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_test_abethman_en.md new file mode 100644 index 00000000000000..4a7113f6fecd00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_test_abethman_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_test_abethman DistilBertForSequenceClassification from abethman +author: John Snow Labs +name: bert_test_abethman +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_test_abethman` is a English model originally trained by abethman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_test_abethman_en_5.5.0_3.0_1726860726205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_test_abethman_en_5.5.0_3.0_1726860726205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_test_abethman","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_test_abethman", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
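
For quick, ad-hoc checks it can be more convenient to wrap the fitted pipeline in a `LightPipeline`, which runs on plain Python strings instead of DataFrames. A small sketch, assuming `pipelineModel` from the Python example above (the sentence is illustrative):

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# Returns a dict keyed by the pipeline's output columns, e.g. the "class" annotations
print(light.annotate("I love spark-nlp")["class"])
```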
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_test_abethman| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abethman/bert_test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_test_abethman_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_test_abethman_pipeline_en.md new file mode 100644 index 00000000000000..cbe358a1b6dc8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_test_abethman_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_test_abethman_pipeline pipeline DistilBertForSequenceClassification from abethman +author: John Snow Labs +name: bert_test_abethman_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_test_abethman_pipeline` is a English model originally trained by abethman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_test_abethman_pipeline_en_5.5.0_3.0_1726860740027.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_test_abethman_pipeline_en_5.5.0_3.0_1726860740027.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_test_abethman_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_test_abethman_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_test_abethman_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abethman/bert_test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_tiny_squadv2_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_tiny_squadv2_en.md new file mode 100644 index 00000000000000..b031290b2e19a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_tiny_squadv2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_tiny_squadv2 BertForQuestionAnswering from VenkatManda +author: John Snow Labs +name: bert_tiny_squadv2 +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tiny_squadv2` is a English model originally trained by VenkatManda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tiny_squadv2_en_5.5.0_3.0_1726820437390.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tiny_squadv2_en_5.5.0_3.0_1726820437390.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import MultiDocumentAssembler
from sparknlp.annotator import BertForQuestionAnswering
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_tiny_squadv2","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_tiny_squadv2", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
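
The extracted answer span ends up in the `answer` annotation column. A minimal sketch of reading it back per question, assuming the Python example above has been run (the question/context pair is illustrative):

```python
from pyspark.sql.functions import col, explode

# "answer.result" contains the span extracted from the context for each question
pipelineDF.select("question", explode(col("answer.result")).alias("answer")) \
    .show(truncate=False)
```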
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tiny_squadv2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|16.7 MB| + +## References + +https://huggingface.co/VenkatManda/bert-tiny-squadV2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_tiny_squadv2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_tiny_squadv2_pipeline_en.md new file mode 100644 index 00000000000000..105858d36daadb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_tiny_squadv2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_tiny_squadv2_pipeline pipeline BertForQuestionAnswering from VenkatManda +author: John Snow Labs +name: bert_tiny_squadv2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tiny_squadv2_pipeline` is a English model originally trained by VenkatManda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tiny_squadv2_pipeline_en_5.5.0_3.0_1726820438568.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tiny_squadv2_pipeline_en_5.5.0_3.0_1726820438568.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_tiny_squadv2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_tiny_squadv2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tiny_squadv2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|16.7 MB| + +## References + +https://huggingface.co/VenkatManda/bert-tiny-squadV2 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bge_base_english_41_keys_phase_2_v1_en.md b/docs/_posts/ahmedlone127/2024-09-20-bge_base_english_41_keys_phase_2_v1_en.md new file mode 100644 index 00000000000000..aa15471d6507cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bge_base_english_41_keys_phase_2_v1_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_41_keys_phase_2_v1 BGEEmbeddings from RishuD7 +author: John Snow Labs +name: bge_base_english_41_keys_phase_2_v1 +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english_41_keys_phase_2_v1` is a English model originally trained by RishuD7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_41_keys_phase_2_v1_en_5.5.0_3.0_1726831490154.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_41_keys_phase_2_v1_en_5.5.0_3.0_1726831490154.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import BGEEmbeddings
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

embeddings = BGEEmbeddings.pretrained("bge_base_english_41_keys_phase_2_v1","en") \
    .setInputCols(["document"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val embeddings = BGEEmbeddings.pretrained("bge_base_english_41_keys_phase_2_v1","en")
    .setInputCols(Array("document"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
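
The sentence-level vector produced by the annotator can be inspected directly from the `embeddings` column. A minimal sketch, assuming the Python example above has been run:

```python
from pyspark.sql.functions import col, explode, size

# One sentence-level annotation per document; check the embedding width
pipelineDF.select(explode(col("embeddings")).alias("emb")) \
    .select(size(col("emb.embeddings")).alias("dimensions")) \
    .show()
```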
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_41_keys_phase_2_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|391.3 MB| + +## References + +https://huggingface.co/RishuD7/bge-base-en-41-keys-phase-2-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bge_base_english_41_keys_phase_2_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bge_base_english_41_keys_phase_2_v1_pipeline_en.md new file mode 100644 index 00000000000000..bc814de5412120 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bge_base_english_41_keys_phase_2_v1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_41_keys_phase_2_v1_pipeline pipeline BGEEmbeddings from RishuD7 +author: John Snow Labs +name: bge_base_english_41_keys_phase_2_v1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english_41_keys_phase_2_v1_pipeline` is a English model originally trained by RishuD7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_41_keys_phase_2_v1_pipeline_en_5.5.0_3.0_1726831515475.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_41_keys_phase_2_v1_pipeline_en_5.5.0_3.0_1726831515475.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_41_keys_phase_2_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_41_keys_phase_2_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_41_keys_phase_2_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|391.3 MB| + +## References + +https://huggingface.co/RishuD7/bge-base-en-41-keys-phase-2-v1 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bge_base_english_v1_5_41_keys_phase_2_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bge_base_english_v1_5_41_keys_phase_2_v1_pipeline_en.md new file mode 100644 index 00000000000000..317072d907a35b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bge_base_english_v1_5_41_keys_phase_2_v1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_v1_5_41_keys_phase_2_v1_pipeline pipeline BGEEmbeddings from RishuD7 +author: John Snow Labs +name: bge_base_english_v1_5_41_keys_phase_2_v1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english_v1_5_41_keys_phase_2_v1_pipeline` is a English model originally trained by RishuD7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_41_keys_phase_2_v1_pipeline_en_5.5.0_3.0_1726831516653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_41_keys_phase_2_v1_pipeline_en_5.5.0_3.0_1726831516653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_v1_5_41_keys_phase_2_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_v1_5_41_keys_phase_2_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_41_keys_phase_2_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/RishuD7/bge-base-en-v1.5-41-keys-phase-2-v1 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bge_base_english_v1_5_course_recommender_v1_en.md b/docs/_posts/ahmedlone127/2024-09-20-bge_base_english_v1_5_course_recommender_v1_en.md new file mode 100644 index 00000000000000..15f78c247f39f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bge_base_english_v1_5_course_recommender_v1_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_v1_5_course_recommender_v1 BGEEmbeddings from sachin19566 +author: John Snow Labs +name: bge_base_english_v1_5_course_recommender_v1 +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english_v1_5_course_recommender_v1` is a English model originally trained by sachin19566. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_course_recommender_v1_en_5.5.0_3.0_1726831503772.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_course_recommender_v1_en_5.5.0_3.0_1726831503772.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import BGEEmbeddings
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_course_recommender_v1","en") \
    .setInputCols(["document"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_course_recommender_v1","en")
    .setInputCols(Array("document"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
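
Since this model is an embedding model, a common downstream step is comparing two texts by cosine similarity of their vectors. A minimal sketch, assuming `pipelineModel` from the Python example above; the two course descriptions are hypothetical and NumPy is used only for the vector math:

```python
import numpy as np

# Hypothetical course descriptions embedded with the fitted pipeline
queries = spark.createDataFrame(
    [["Introduction to Python programming"], ["Beginner course on Python"]]
).toDF("text")
rows = pipelineModel.transform(queries).select("embeddings").collect()

vec_a = np.array(rows[0]["embeddings"][0]["embeddings"])
vec_b = np.array(rows[1]["embeddings"][0]["embeddings"])
print(float(vec_a @ vec_b / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b))))
```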
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_course_recommender_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|376.1 MB| + +## References + +https://huggingface.co/sachin19566/bge-base-en-v1.5-course-recommender-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bge_base_english_v1_5_course_recommender_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bge_base_english_v1_5_course_recommender_v1_pipeline_en.md new file mode 100644 index 00000000000000..edaabab8f89712 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bge_base_english_v1_5_course_recommender_v1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_v1_5_course_recommender_v1_pipeline pipeline BGEEmbeddings from sachin19566 +author: John Snow Labs +name: bge_base_english_v1_5_course_recommender_v1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english_v1_5_course_recommender_v1_pipeline` is a English model originally trained by sachin19566. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_course_recommender_v1_pipeline_en_5.5.0_3.0_1726831533061.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_course_recommender_v1_pipeline_en_5.5.0_3.0_1726831533061.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_v1_5_course_recommender_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_v1_5_course_recommender_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_course_recommender_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|376.1 MB| + +## References + +https://huggingface.co/sachin19566/bge-base-en-v1.5-course-recommender-v1 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bhavik_finetuning_sentiment_model_1_en.md b/docs/_posts/ahmedlone127/2024-09-20-bhavik_finetuning_sentiment_model_1_en.md new file mode 100644 index 00000000000000..918f10dbdc4a24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bhavik_finetuning_sentiment_model_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bhavik_finetuning_sentiment_model_1 DistilBertForSequenceClassification from bhavikardeshna +author: John Snow Labs +name: bhavik_finetuning_sentiment_model_1 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bhavik_finetuning_sentiment_model_1` is a English model originally trained by bhavikardeshna. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bhavik_finetuning_sentiment_model_1_en_5.5.0_3.0_1726861004087.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bhavik_finetuning_sentiment_model_1_en_5.5.0_3.0_1726861004087.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("bhavik_finetuning_sentiment_model_1","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bhavik_finetuning_sentiment_model_1", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
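
When scoring a larger DataFrame, it is often useful to look at the distribution of predicted sentiment labels rather than individual rows. A minimal sketch, assuming the Python example above has been run on your own input data:

```python
from pyspark.sql.functions import col, explode

# Count how many rows were assigned to each predicted label
pipelineDF.select(explode(col("class.result")).alias("predicted_label")) \
    .groupBy("predicted_label").count().show()
```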
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bhavik_finetuning_sentiment_model_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bhavikardeshna/bhavik-finetuning-sentiment-model-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bhavik_finetuning_sentiment_model_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bhavik_finetuning_sentiment_model_1_pipeline_en.md new file mode 100644 index 00000000000000..40e231209b8cb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bhavik_finetuning_sentiment_model_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bhavik_finetuning_sentiment_model_1_pipeline pipeline DistilBertForSequenceClassification from bhavikardeshna +author: John Snow Labs +name: bhavik_finetuning_sentiment_model_1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bhavik_finetuning_sentiment_model_1_pipeline` is a English model originally trained by bhavikardeshna. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bhavik_finetuning_sentiment_model_1_pipeline_en_5.5.0_3.0_1726861015962.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bhavik_finetuning_sentiment_model_1_pipeline_en_5.5.0_3.0_1726861015962.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bhavik_finetuning_sentiment_model_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bhavik_finetuning_sentiment_model_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bhavik_finetuning_sentiment_model_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bhavikardeshna/bhavik-finetuning-sentiment-model-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bias_model_1_en.md b/docs/_posts/ahmedlone127/2024-09-20-bias_model_1_en.md new file mode 100644 index 00000000000000..974e272e136f3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bias_model_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bias_model_1 DistilBertForSequenceClassification from najeebY +author: John Snow Labs +name: bias_model_1 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bias_model_1` is a English model originally trained by najeebY. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bias_model_1_en_5.5.0_3.0_1726830064659.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bias_model_1_en_5.5.0_3.0_1726830064659.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("bias_model_1","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bias_model_1", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
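
Because the fitted object is a standard Spark ML `PipelineModel`, it can be persisted and reloaded without refitting. A minimal sketch, assuming `pipelineModel` and `data` from the Python example above (the save path is hypothetical):

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline so it can be reused later without fitting again
pipelineModel.write().overwrite().save("/tmp/bias_model_1_spark_nlp")
reloaded = PipelineModel.load("/tmp/bias_model_1_spark_nlp")
reloaded.transform(data).select("class.result").show(truncate=False)
```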
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bias_model_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/najeebY/bias_model_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bias_model_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bias_model_1_pipeline_en.md new file mode 100644 index 00000000000000..d6780743acf48d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bias_model_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bias_model_1_pipeline pipeline DistilBertForSequenceClassification from najeebY +author: John Snow Labs +name: bias_model_1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bias_model_1_pipeline` is a English model originally trained by najeebY. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bias_model_1_pipeline_en_5.5.0_3.0_1726830076743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bias_model_1_pipeline_en_5.5.0_3.0_1726830076743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bias_model_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bias_model_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bias_model_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/najeebY/bias_model_1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bobo_emb_en.md b/docs/_posts/ahmedlone127/2024-09-20-bobo_emb_en.md new file mode 100644 index 00000000000000..aacfe6d7b2be8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bobo_emb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bobo_emb BertForSequenceClassification from Bobouo +author: John Snow Labs +name: bobo_emb +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bobo_emb` is a English model originally trained by Bobouo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bobo_emb_en_5.5.0_3.0_1726829231032.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bobo_emb_en_5.5.0_3.0_1726829231032.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bobo_emb","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bobo_emb", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bobo_emb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.9 MB| + +## References + +https://huggingface.co/Bobouo/Bobo_emb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bobo_emb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bobo_emb_pipeline_en.md new file mode 100644 index 00000000000000..99bad34e5a56de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bobo_emb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bobo_emb_pipeline pipeline BertForSequenceClassification from Bobouo +author: John Snow Labs +name: bobo_emb_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bobo_emb_pipeline` is a English model originally trained by Bobouo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bobo_emb_pipeline_en_5.5.0_3.0_1726829250443.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bobo_emb_pipeline_en_5.5.0_3.0_1726829250443.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# A small example DataFrame with a "text" column to annotate.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bobo_emb_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// A small example DataFrame with a "text" column to annotate.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bobo_emb_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bobo_emb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/Bobouo/Bobo_emb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bsc_bio_ehr_spanish_carmen_symptemist_es.md b/docs/_posts/ahmedlone127/2024-09-20-bsc_bio_ehr_spanish_carmen_symptemist_es.md new file mode 100644 index 00000000000000..a396a60d455108 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bsc_bio_ehr_spanish_carmen_symptemist_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_carmen_symptemist RoBertaForTokenClassification from BSC-NLP4BIA +author: John Snow Labs +name: bsc_bio_ehr_spanish_carmen_symptemist +date: 2024-09-20 +tags: [es, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_carmen_symptemist` is a Castilian, Spanish model originally trained by BSC-NLP4BIA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_carmen_symptemist_es_5.5.0_3.0_1726862795907.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_carmen_symptemist_es_5.5.0_3.0_1726862795907.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_carmen_symptemist","es") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_carmen_symptemist", "es")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
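
For token-level output, the `token` and `ner` columns produced above can be zipped together. A minimal Python sketch, assuming the pipeline was run exactly as shown:

```python
from pyspark.sql import functions as F

# Sketch: pair every token with the tag predicted for it.
pipelineDF.select(F.explode(F.arrays_zip(pipelineDF.token.result, pipelineDF.ner.result)).alias("cols")) \
    .select(F.expr("cols['0']").alias("token"), F.expr("cols['1']").alias("ner_label")) \
    .show(truncate=False)
```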
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_carmen_symptemist| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|449.4 MB| + +## References + +https://huggingface.co/BSC-NLP4BIA/bsc-bio-ehr-es-carmen-symptemist \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bsc_bio_ehr_spanish_combined_train_distemist_dev_word2vec_85_ner_en.md b/docs/_posts/ahmedlone127/2024-09-20-bsc_bio_ehr_spanish_combined_train_distemist_dev_word2vec_85_ner_en.md new file mode 100644 index 00000000000000..70a7ebbeb67dec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bsc_bio_ehr_spanish_combined_train_distemist_dev_word2vec_85_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bsc_bio_ehr_spanish_combined_train_distemist_dev_word2vec_85_ner RoBertaForTokenClassification from Rodrigo1771 +author: John Snow Labs +name: bsc_bio_ehr_spanish_combined_train_distemist_dev_word2vec_85_ner +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_combined_train_distemist_dev_word2vec_85_ner` is a English model originally trained by Rodrigo1771. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_combined_train_distemist_dev_word2vec_85_ner_en_5.5.0_3.0_1726847447745.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_combined_train_distemist_dev_word2vec_85_ner_en_5.5.0_3.0_1726847447745.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_combined_train_distemist_dev_word2vec_85_ner","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_combined_train_distemist_dev_word2vec_85_ner", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_combined_train_distemist_dev_word2vec_85_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|440.6 MB| + +## References + +https://huggingface.co/Rodrigo1771/bsc-bio-ehr-es-combined-train-distemist-dev-word2vec-85-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bsc_bio_ehr_spanish_symptemist_fasttext_75_ner_en.md b/docs/_posts/ahmedlone127/2024-09-20-bsc_bio_ehr_spanish_symptemist_fasttext_75_ner_en.md new file mode 100644 index 00000000000000..be43401cbac69f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bsc_bio_ehr_spanish_symptemist_fasttext_75_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bsc_bio_ehr_spanish_symptemist_fasttext_75_ner RoBertaForTokenClassification from Rodrigo1771 +author: John Snow Labs +name: bsc_bio_ehr_spanish_symptemist_fasttext_75_ner +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_symptemist_fasttext_75_ner` is a English model originally trained by Rodrigo1771. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_fasttext_75_ner_en_5.5.0_3.0_1726847469915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_fasttext_75_ner_en_5.5.0_3.0_1726847469915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_symptemist_fasttext_75_ner","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_symptemist_fasttext_75_ner", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
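
For ad-hoc experiments on single strings, the fitted pipeline can also be wrapped in a `LightPipeline`. A small Python sketch under the same assumptions as the snippet above; the input string is illustrative only:

```python
from sparknlp.base import LightPipeline

# Sketch: annotate one string directly, without building a DataFrame.
light_pipeline = LightPipeline(pipelineModel)
annotations = light_pipeline.annotate("I love spark-nlp")
print(annotations["ner"])
```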
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_symptemist_fasttext_75_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|435.7 MB| + +## References + +https://huggingface.co/Rodrigo1771/bsc-bio-ehr-es-symptemist-fasttext-75-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bsc_bio_ehr_spanish_symptemist_fasttext_75_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bsc_bio_ehr_spanish_symptemist_fasttext_75_ner_pipeline_en.md new file mode 100644 index 00000000000000..8cf4271737b34c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bsc_bio_ehr_spanish_symptemist_fasttext_75_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bsc_bio_ehr_spanish_symptemist_fasttext_75_ner_pipeline pipeline RoBertaForTokenClassification from Rodrigo1771 +author: John Snow Labs +name: bsc_bio_ehr_spanish_symptemist_fasttext_75_ner_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_symptemist_fasttext_75_ner_pipeline` is a English model originally trained by Rodrigo1771. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_fasttext_75_ner_pipeline_en_5.5.0_3.0_1726847503107.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_fasttext_75_ner_pipeline_en_5.5.0_3.0_1726847503107.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# A small example DataFrame with a "text" column to annotate.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_symptemist_fasttext_75_ner_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// A small example DataFrame with a "text" column to annotate.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_symptemist_fasttext_75_ner_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_symptemist_fasttext_75_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|435.7 MB| + +## References + +https://huggingface.co/Rodrigo1771/bsc-bio-ehr-es-symptemist-fasttext-75-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_eli5_mlm_model_aldaalmira_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_eli5_mlm_model_aldaalmira_en.md new file mode 100644 index 00000000000000..397306529ca6b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_eli5_mlm_model_aldaalmira_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_aldaalmira RoBertaEmbeddings from aldaalmira +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_aldaalmira +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_aldaalmira` is a English model originally trained by aldaalmira. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_aldaalmira_en_5.5.0_3.0_1726857403124.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_aldaalmira_en_5.5.0_3.0_1726857403124.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_aldaalmira","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_aldaalmira","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
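
Each token is mapped to a vector stored in the `embeddings` column set above. One way to check the vector dimensionality, as a Python sketch that assumes the pipeline ran as shown:

```python
from pyspark.sql import functions as F

# Sketch: explode the per-token vectors and report their size.
pipelineDF.select(F.explode("embeddings.embeddings").alias("vector")) \
    .select(F.size("vector").alias("dimensions")) \
    .show(5)
```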
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_aldaalmira| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/aldaalmira/my_awesome_eli5_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_eli5_mlm_model_aldaalmira_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_eli5_mlm_model_aldaalmira_pipeline_en.md new file mode 100644 index 00000000000000..f007111ae4b229 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_eli5_mlm_model_aldaalmira_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_aldaalmira_pipeline pipeline RoBertaEmbeddings from aldaalmira +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_aldaalmira_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_aldaalmira_pipeline` is a English model originally trained by aldaalmira. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_aldaalmira_pipeline_en_5.5.0_3.0_1726857417478.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_aldaalmira_pipeline_en_5.5.0_3.0_1726857417478.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# A small example DataFrame with a "text" column to annotate.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_aldaalmira_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// A small example DataFrame with a "text" column to annotate.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_aldaalmira_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_aldaalmira_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/aldaalmira/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_eli5_mlm_model_zdaniar_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_eli5_mlm_model_zdaniar_en.md new file mode 100644 index 00000000000000..7a680b58f8c7ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_eli5_mlm_model_zdaniar_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_zdaniar RoBertaEmbeddings from zdaniar +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_zdaniar +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_zdaniar` is a English model originally trained by zdaniar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_zdaniar_en_5.5.0_3.0_1726796423076.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_zdaniar_en_5.5.0_3.0_1726796423076.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_zdaniar","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_zdaniar","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_zdaniar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.3 MB| + +## References + +https://huggingface.co/zdaniar/my_awesome_eli5_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_eli5_mlm_model_zdaniar_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_eli5_mlm_model_zdaniar_pipeline_en.md new file mode 100644 index 00000000000000..fcfbd91aa53ffc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_eli5_mlm_model_zdaniar_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_zdaniar_pipeline pipeline RoBertaEmbeddings from zdaniar +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_zdaniar_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_zdaniar_pipeline` is a English model originally trained by zdaniar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_zdaniar_pipeline_en_5.5.0_3.0_1726796437314.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_zdaniar_pipeline_en_5.5.0_3.0_1726796437314.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# A small example DataFrame with a "text" column to annotate.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_zdaniar_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// A small example DataFrame with a "text" column to annotate.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_zdaniar_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_zdaniar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.3 MB| + +## References + +https://huggingface.co/zdaniar/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_akhil9514_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_akhil9514_en.md new file mode 100644 index 00000000000000..39471c45d5333d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_akhil9514_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_akhil9514 DistilBertForSequenceClassification from Akhil9514 +author: John Snow Labs +name: burmese_awesome_model_akhil9514 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_akhil9514` is a English model originally trained by Akhil9514. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_akhil9514_en_5.5.0_3.0_1726832737646.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_akhil9514_en_5.5.0_3.0_1726832737646.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_akhil9514","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_akhil9514", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
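
Beyond the label string, each prediction's annotation metadata typically carries the per-class confidence scores. A minimal Python sketch for surfacing both; only the `class` column name comes from the snippet above, the rest is assumed:

```python
# Sketch: predicted label plus the metadata map, which usually holds per-label scores.
pipelineDF.select("class.result", "class.metadata").show(truncate=False)
```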
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_akhil9514| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Akhil9514/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_akhil9514_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_akhil9514_pipeline_en.md new file mode 100644 index 00000000000000..8984ce9be8fcab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_akhil9514_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_akhil9514_pipeline pipeline DistilBertForSequenceClassification from Akhil9514 +author: John Snow Labs +name: burmese_awesome_model_akhil9514_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_akhil9514_pipeline` is a English model originally trained by Akhil9514. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_akhil9514_pipeline_en_5.5.0_3.0_1726832753670.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_akhil9514_pipeline_en_5.5.0_3.0_1726832753670.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# A small example DataFrame with a "text" column to annotate.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("burmese_awesome_model_akhil9514_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// A small example DataFrame with a "text" column to annotate.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("burmese_awesome_model_akhil9514_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_akhil9514_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Akhil9514/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_bartmachielsen_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_bartmachielsen_en.md new file mode 100644 index 00000000000000..2dceb67d609734 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_bartmachielsen_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_bartmachielsen DistilBertForSequenceClassification from bartmachielsen +author: John Snow Labs +name: burmese_awesome_model_bartmachielsen +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_bartmachielsen` is a English model originally trained by bartmachielsen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_bartmachielsen_en_5.5.0_3.0_1726809385590.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_bartmachielsen_en_5.5.0_3.0_1726809385590.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_bartmachielsen","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_bartmachielsen", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_bartmachielsen| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bartmachielsen/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_bartmachielsen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_bartmachielsen_pipeline_en.md new file mode 100644 index 00000000000000..ea81c18db272cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_bartmachielsen_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_bartmachielsen_pipeline pipeline DistilBertForSequenceClassification from bartmachielsen +author: John Snow Labs +name: burmese_awesome_model_bartmachielsen_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_bartmachielsen_pipeline` is a English model originally trained by bartmachielsen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_bartmachielsen_pipeline_en_5.5.0_3.0_1726809397411.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_bartmachielsen_pipeline_en_5.5.0_3.0_1726809397411.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# A small example DataFrame with a "text" column to annotate.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("burmese_awesome_model_bartmachielsen_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// A small example DataFrame with a "text" column to annotate.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("burmese_awesome_model_bartmachielsen_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_bartmachielsen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bartmachielsen/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_blitzapurva_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_blitzapurva_en.md new file mode 100644 index 00000000000000..2756749e6d2d0e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_blitzapurva_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_blitzapurva DistilBertForSequenceClassification from blitzapurva +author: John Snow Labs +name: burmese_awesome_model_blitzapurva +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_blitzapurva` is a English model originally trained by blitzapurva. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_blitzapurva_en_5.5.0_3.0_1726848732914.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_blitzapurva_en_5.5.0_3.0_1726848732914.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_blitzapurva","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_blitzapurva", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_blitzapurva| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/blitzapurva/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_blitzapurva_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_blitzapurva_pipeline_en.md new file mode 100644 index 00000000000000..ae467dcf656848 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_blitzapurva_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_blitzapurva_pipeline pipeline DistilBertForSequenceClassification from blitzapurva +author: John Snow Labs +name: burmese_awesome_model_blitzapurva_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_blitzapurva_pipeline` is a English model originally trained by blitzapurva. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_blitzapurva_pipeline_en_5.5.0_3.0_1726848744568.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_blitzapurva_pipeline_en_5.5.0_3.0_1726848744568.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# A small example DataFrame with a "text" column to annotate.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("burmese_awesome_model_blitzapurva_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// A small example DataFrame with a "text" column to annotate.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("burmese_awesome_model_blitzapurva_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_blitzapurva_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/blitzapurva/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_bsgreenb_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_bsgreenb_en.md new file mode 100644 index 00000000000000..beda77d20a5b3e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_bsgreenb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_bsgreenb DistilBertForSequenceClassification from bsgreenb +author: John Snow Labs +name: burmese_awesome_model_bsgreenb +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_bsgreenb` is a English model originally trained by bsgreenb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_bsgreenb_en_5.5.0_3.0_1726832954871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_bsgreenb_en_5.5.0_3.0_1726832954871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_bsgreenb","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_bsgreenb", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_bsgreenb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bsgreenb/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_bsgreenb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_bsgreenb_pipeline_en.md new file mode 100644 index 00000000000000..f358b353185e7f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_bsgreenb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_bsgreenb_pipeline pipeline DistilBertForSequenceClassification from bsgreenb +author: John Snow Labs +name: burmese_awesome_model_bsgreenb_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_bsgreenb_pipeline` is a English model originally trained by bsgreenb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_bsgreenb_pipeline_en_5.5.0_3.0_1726832968996.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_bsgreenb_pipeline_en_5.5.0_3.0_1726832968996.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# A small example DataFrame with a "text" column to annotate.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("burmese_awesome_model_bsgreenb_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// A small example DataFrame with a "text" column to annotate.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("burmese_awesome_model_bsgreenb_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_bsgreenb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bsgreenb/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_charlie82_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_charlie82_en.md new file mode 100644 index 00000000000000..4ed6b83195cc8d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_charlie82_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_charlie82 DistilBertForSequenceClassification from charlie82 +author: John Snow Labs +name: burmese_awesome_model_charlie82 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_charlie82` is a English model originally trained by charlie82. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_charlie82_en_5.5.0_3.0_1726830200215.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_charlie82_en_5.5.0_3.0_1726830200215.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_charlie82","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_charlie82", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_charlie82| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/charlie82/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_charlie82_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_charlie82_pipeline_en.md new file mode 100644 index 00000000000000..d999261a94d92b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_charlie82_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_charlie82_pipeline pipeline DistilBertForSequenceClassification from charlie82 +author: John Snow Labs +name: burmese_awesome_model_charlie82_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_charlie82_pipeline` is a English model originally trained by charlie82. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_charlie82_pipeline_en_5.5.0_3.0_1726830212915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_charlie82_pipeline_en_5.5.0_3.0_1726830212915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# A small example DataFrame with a "text" column to annotate.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("burmese_awesome_model_charlie82_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// A small example DataFrame with a "text" column to annotate.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("burmese_awesome_model_charlie82_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_charlie82_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/charlie82/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_diodiodada_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_diodiodada_en.md new file mode 100644 index 00000000000000..423826818ef45a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_diodiodada_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_diodiodada DistilBertForSequenceClassification from diodiodada +author: John Snow Labs +name: burmese_awesome_model_diodiodada +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_diodiodada` is a English model originally trained by diodiodada. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_diodiodada_en_5.5.0_3.0_1726809211954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_diodiodada_en_5.5.0_3.0_1726809211954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_diodiodada","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_diodiodada", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_diodiodada| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/diodiodada/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_feelwoo_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_feelwoo_en.md new file mode 100644 index 00000000000000..c6ef8f244e9ca5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_feelwoo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_feelwoo DistilBertForSequenceClassification from feelwoo +author: John Snow Labs +name: burmese_awesome_model_feelwoo +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_feelwoo` is a English model originally trained by feelwoo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_feelwoo_en_5.5.0_3.0_1726842266619.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_feelwoo_en_5.5.0_3.0_1726842266619.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# load the fine-tuned DistilBERT classifier from the Spark NLP model hub
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_feelwoo","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+// load the fine-tuned DistilBERT classifier from the Spark NLP model hub
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_feelwoo", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_feelwoo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/feelwoo/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_feelwoo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_feelwoo_pipeline_en.md new file mode 100644 index 00000000000000..9b19fa35998423 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_feelwoo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_feelwoo_pipeline pipeline DistilBertForSequenceClassification from feelwoo +author: John Snow Labs +name: burmese_awesome_model_feelwoo_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_feelwoo_pipeline` is a English model originally trained by feelwoo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_feelwoo_pipeline_en_5.5.0_3.0_1726842278980.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_feelwoo_pipeline_en_5.5.0_3.0_1726842278980.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# any DataFrame with a string "text" column can be annotated with the pipeline
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("burmese_awesome_model_feelwoo_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// any DataFrame with a string "text" column can be annotated with the pipeline
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("burmese_awesome_model_feelwoo_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_feelwoo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/feelwoo/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_gamdalf_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_gamdalf_en.md new file mode 100644 index 00000000000000..c9336817ce0a41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_gamdalf_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_gamdalf DistilBertForSequenceClassification from Gamdalf +author: John Snow Labs +name: burmese_awesome_model_gamdalf +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_gamdalf` is a English model originally trained by Gamdalf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_gamdalf_en_5.5.0_3.0_1726842369443.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_gamdalf_en_5.5.0_3.0_1726842369443.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# load the fine-tuned DistilBERT classifier from the Spark NLP model hub
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_gamdalf","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+// load the fine-tuned DistilBERT classifier from the Spark NLP model hub
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_gamdalf", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_gamdalf| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gamdalf/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_gamdalf_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_gamdalf_pipeline_en.md new file mode 100644 index 00000000000000..a73027e59d3af4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_gamdalf_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_gamdalf_pipeline pipeline DistilBertForSequenceClassification from Gamdalf +author: John Snow Labs +name: burmese_awesome_model_gamdalf_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_gamdalf_pipeline` is a English model originally trained by Gamdalf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_gamdalf_pipeline_en_5.5.0_3.0_1726842381442.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_gamdalf_pipeline_en_5.5.0_3.0_1726842381442.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# any DataFrame with a string "text" column can be annotated with the pipeline
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("burmese_awesome_model_gamdalf_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// any DataFrame with a string "text" column can be annotated with the pipeline
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("burmese_awesome_model_gamdalf_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_gamdalf_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gamdalf/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_gauravr12060102_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_gauravr12060102_en.md new file mode 100644 index 00000000000000..3c84e0c41a38d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_gauravr12060102_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_gauravr12060102 DistilBertForSequenceClassification from GauravR12060102 +author: John Snow Labs +name: burmese_awesome_model_gauravr12060102 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_gauravr12060102` is a English model originally trained by GauravR12060102. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_gauravr12060102_en_5.5.0_3.0_1726832394540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_gauravr12060102_en_5.5.0_3.0_1726832394540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# load the fine-tuned DistilBERT classifier from the Spark NLP model hub
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_gauravr12060102","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+// load the fine-tuned DistilBERT classifier from the Spark NLP model hub
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_gauravr12060102", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_gauravr12060102| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/GauravR12060102/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_gauravr12060102_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_gauravr12060102_pipeline_en.md new file mode 100644 index 00000000000000..afdb6bb4a0f432 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_gauravr12060102_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_gauravr12060102_pipeline pipeline DistilBertForSequenceClassification from GauravR12060102 +author: John Snow Labs +name: burmese_awesome_model_gauravr12060102_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_gauravr12060102_pipeline` is a English model originally trained by GauravR12060102. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_gauravr12060102_pipeline_en_5.5.0_3.0_1726832408770.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_gauravr12060102_pipeline_en_5.5.0_3.0_1726832408770.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# any DataFrame with a string "text" column can be annotated with the pipeline
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("burmese_awesome_model_gauravr12060102_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// any DataFrame with a string "text" column can be annotated with the pipeline
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("burmese_awesome_model_gauravr12060102_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_gauravr12060102_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/GauravR12060102/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_imdb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_imdb_pipeline_en.md new file mode 100644 index 00000000000000..59129ef7ab2fed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_imdb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_imdb_pipeline pipeline DistilBertForSequenceClassification from Sif10 +author: John Snow Labs +name: burmese_awesome_model_imdb_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_imdb_pipeline` is a English model originally trained by Sif10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_imdb_pipeline_en_5.5.0_3.0_1726809116712.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_imdb_pipeline_en_5.5.0_3.0_1726809116712.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# any DataFrame with a string "text" column can be annotated with the pipeline
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("burmese_awesome_model_imdb_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// any DataFrame with a string "text" column can be annotated with the pipeline
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("burmese_awesome_model_imdb_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_imdb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Sif10/my_awesome_model_imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_priority_3_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_priority_3_en.md new file mode 100644 index 00000000000000..8f3dfe47d356b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_priority_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_priority_3 DistilBertForSequenceClassification from lenate +author: John Snow Labs +name: burmese_awesome_model_priority_3 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_priority_3` is a English model originally trained by lenate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_priority_3_en_5.5.0_3.0_1726842284296.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_priority_3_en_5.5.0_3.0_1726842284296.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# load the fine-tuned DistilBERT classifier from the Spark NLP model hub
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_priority_3","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+// load the fine-tuned DistilBERT classifier from the Spark NLP model hub
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_priority_3", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_priority_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lenate/my_awesome_model_priority_3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_priority_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_priority_3_pipeline_en.md new file mode 100644 index 00000000000000..9c4b7c77bb86fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_priority_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_priority_3_pipeline pipeline DistilBertForSequenceClassification from lenate +author: John Snow Labs +name: burmese_awesome_model_priority_3_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_priority_3_pipeline` is a English model originally trained by lenate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_priority_3_pipeline_en_5.5.0_3.0_1726842296394.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_priority_3_pipeline_en_5.5.0_3.0_1726842296394.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# any DataFrame with a string "text" column can be annotated with the pipeline
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("burmese_awesome_model_priority_3_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// any DataFrame with a string "text" column can be annotated with the pipeline
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("burmese_awesome_model_priority_3_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_priority_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lenate/my_awesome_model_priority_3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_rk212_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_rk212_en.md new file mode 100644 index 00000000000000..e8d8edc6c7e85a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_rk212_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_rk212 DistilBertForSequenceClassification from rk212 +author: John Snow Labs +name: burmese_awesome_model_rk212 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_rk212` is a English model originally trained by rk212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_rk212_en_5.5.0_3.0_1726833038495.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_rk212_en_5.5.0_3.0_1726833038495.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# load the fine-tuned DistilBERT classifier from the Spark NLP model hub
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_rk212","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+// load the fine-tuned DistilBERT classifier from the Spark NLP model hub
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_rk212", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_rk212| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rk212/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_rk212_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_rk212_pipeline_en.md new file mode 100644 index 00000000000000..767a7b76e4b30e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_rk212_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_rk212_pipeline pipeline DistilBertForSequenceClassification from rk212 +author: John Snow Labs +name: burmese_awesome_model_rk212_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_rk212_pipeline` is a English model originally trained by rk212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_rk212_pipeline_en_5.5.0_3.0_1726833050390.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_rk212_pipeline_en_5.5.0_3.0_1726833050390.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# any DataFrame with a string "text" column can be annotated with the pipeline
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("burmese_awesome_model_rk212_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// any DataFrame with a string "text" column can be annotated with the pipeline
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("burmese_awesome_model_rk212_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_rk212_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rk212/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_robinsh2023_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_robinsh2023_en.md new file mode 100644 index 00000000000000..5ef4dcca6f7be8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_robinsh2023_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_robinsh2023 DistilBertForSequenceClassification from Robinsh2023 +author: John Snow Labs +name: burmese_awesome_model_robinsh2023 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_robinsh2023` is a English model originally trained by Robinsh2023. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_robinsh2023_en_5.5.0_3.0_1726808984439.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_robinsh2023_en_5.5.0_3.0_1726808984439.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# load the fine-tuned DistilBERT classifier from the Spark NLP model hub
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_robinsh2023","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+// load the fine-tuned DistilBERT classifier from the Spark NLP model hub
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_robinsh2023", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_robinsh2023| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Robinsh2023/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_ruhullah1_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_ruhullah1_en.md new file mode 100644 index 00000000000000..16cbf503fe2a49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_ruhullah1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_ruhullah1 DistilBertForSequenceClassification from ruhullah1 +author: John Snow Labs +name: burmese_awesome_model_ruhullah1 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_ruhullah1` is a English model originally trained by ruhullah1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ruhullah1_en_5.5.0_3.0_1726841464927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ruhullah1_en_5.5.0_3.0_1726841464927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# load the fine-tuned DistilBERT classifier from the Spark NLP model hub
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_ruhullah1","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+// load the fine-tuned DistilBERT classifier from the Spark NLP model hub
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_ruhullah1", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_ruhullah1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ruhullah1/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_ruhullah1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_ruhullah1_pipeline_en.md new file mode 100644 index 00000000000000..554c459b510404 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_ruhullah1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_ruhullah1_pipeline pipeline DistilBertForSequenceClassification from ruhullah1 +author: John Snow Labs +name: burmese_awesome_model_ruhullah1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_ruhullah1_pipeline` is a English model originally trained by ruhullah1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ruhullah1_pipeline_en_5.5.0_3.0_1726841476969.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ruhullah1_pipeline_en_5.5.0_3.0_1726841476969.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# any DataFrame with a string "text" column can be annotated with the pipeline
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("burmese_awesome_model_ruhullah1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// any DataFrame with a string "text" column can be annotated with the pipeline
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("burmese_awesome_model_ruhullah1_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_ruhullah1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ruhullah1/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_s_kinoshita_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_s_kinoshita_en.md new file mode 100644 index 00000000000000..85405dbb2e5a40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_s_kinoshita_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_s_kinoshita DistilBertForSequenceClassification from s-kinoshita +author: John Snow Labs +name: burmese_awesome_model_s_kinoshita +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_s_kinoshita` is a English model originally trained by s-kinoshita. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_s_kinoshita_en_5.5.0_3.0_1726809207565.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_s_kinoshita_en_5.5.0_3.0_1726809207565.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# load the fine-tuned DistilBERT classifier from the Spark NLP model hub
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_s_kinoshita","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+// load the fine-tuned DistilBERT classifier from the Spark NLP model hub
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_s_kinoshita", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_s_kinoshita| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/s-kinoshita/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_sibumi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_sibumi_pipeline_en.md new file mode 100644 index 00000000000000..15e24d76fb31ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_sibumi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_sibumi_pipeline pipeline DistilBertForSequenceClassification from sibumi +author: John Snow Labs +name: burmese_awesome_model_sibumi_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_sibumi_pipeline` is a English model originally trained by sibumi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_sibumi_pipeline_en_5.5.0_3.0_1726791888082.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_sibumi_pipeline_en_5.5.0_3.0_1726791888082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# any DataFrame with a string "text" column can be annotated with the pipeline
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("burmese_awesome_model_sibumi_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// any DataFrame with a string "text" column can be annotated with the pipeline
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("burmese_awesome_model_sibumi_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_sibumi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sibumi/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_soosookentelmanis_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_soosookentelmanis_en.md new file mode 100644 index 00000000000000..521e5a43b572af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_soosookentelmanis_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_soosookentelmanis DistilBertForSequenceClassification from soosookentelmanis +author: John Snow Labs +name: burmese_awesome_model_soosookentelmanis +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_soosookentelmanis` is a English model originally trained by soosookentelmanis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_soosookentelmanis_en_5.5.0_3.0_1726809203883.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_soosookentelmanis_en_5.5.0_3.0_1726809203883.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# load the fine-tuned DistilBERT classifier from the Spark NLP model hub
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_soosookentelmanis","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+// load the fine-tuned DistilBERT classifier from the Spark NLP model hub
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_soosookentelmanis", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_soosookentelmanis| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/soosookentelmanis/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_souh333_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_souh333_en.md new file mode 100644 index 00000000000000..ca12d2b4342054 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_souh333_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_souh333 DistilBertForSequenceClassification from Souh333 +author: John Snow Labs +name: burmese_awesome_model_souh333 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_souh333` is a English model originally trained by Souh333. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_souh333_en_5.5.0_3.0_1726832827508.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_souh333_en_5.5.0_3.0_1726832827508.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# load the fine-tuned DistilBERT classifier from the Spark NLP model hub
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_souh333","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+// load the fine-tuned DistilBERT classifier from the Spark NLP model hub
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_souh333", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_souh333| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Souh333/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_souh333_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_souh333_pipeline_en.md new file mode 100644 index 00000000000000..bf5e01285ffbc9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_souh333_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_souh333_pipeline pipeline DistilBertForSequenceClassification from Souh333 +author: John Snow Labs +name: burmese_awesome_model_souh333_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_souh333_pipeline` is a English model originally trained by Souh333. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_souh333_pipeline_en_5.5.0_3.0_1726832840123.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_souh333_pipeline_en_5.5.0_3.0_1726832840123.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# any DataFrame with a string "text" column can be annotated with the pipeline
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("burmese_awesome_model_souh333_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// any DataFrame with a string "text" column can be annotated with the pipeline
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("burmese_awesome_model_souh333_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_souh333_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Souh333/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_thepixel42_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_thepixel42_en.md new file mode 100644 index 00000000000000..5407dd9f6712ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_thepixel42_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_thepixel42 DistilBertForSequenceClassification from thePixel42 +author: John Snow Labs +name: burmese_awesome_model_thepixel42 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_thepixel42` is a English model originally trained by thePixel42. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_thepixel42_en_5.5.0_3.0_1726809082914.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_thepixel42_en_5.5.0_3.0_1726809082914.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# load the fine-tuned DistilBERT classifier from the Spark NLP model hub
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_thepixel42","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+// load the fine-tuned DistilBERT classifier from the Spark NLP model hub
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_thepixel42", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_thepixel42| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thePixel42/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_thepixel42_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_thepixel42_pipeline_en.md new file mode 100644 index 00000000000000..216876f466fd90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_thepixel42_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_thepixel42_pipeline pipeline DistilBertForSequenceClassification from thePixel42 +author: John Snow Labs +name: burmese_awesome_model_thepixel42_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_thepixel42_pipeline` is a English model originally trained by thePixel42. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_thepixel42_pipeline_en_5.5.0_3.0_1726809096587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_thepixel42_pipeline_en_5.5.0_3.0_1726809096587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an input DataFrame with a string column named "text"
pipeline = PretrainedPipeline("burmese_awesome_model_thepixel42_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an input DataFrame with a string column named "text"
val pipeline = new PretrainedPipeline("burmese_awesome_model_thepixel42_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_thepixel42_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thePixel42/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_zera09_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_zera09_en.md new file mode 100644 index 00000000000000..74c42fccada25c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_zera09_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_zera09 DistilBertForSequenceClassification from zera09 +author: John Snow Labs +name: burmese_awesome_model_zera09 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_zera09` is a English model originally trained by zera09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_zera09_en_5.5.0_3.0_1726832738505.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_zera09_en_5.5.0_3.0_1726832738505.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_zera09","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_zera09", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_zera09| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/zera09/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_zera09_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_zera09_pipeline_en.md new file mode 100644 index 00000000000000..b95ce953debe10 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_zera09_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_zera09_pipeline pipeline DistilBertForSequenceClassification from zera09 +author: John Snow Labs +name: burmese_awesome_model_zera09_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_zera09_pipeline` is a English model originally trained by zera09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_zera09_pipeline_en_5.5.0_3.0_1726832754007.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_zera09_pipeline_en_5.5.0_3.0_1726832754007.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an input DataFrame with a string column named "text"
pipeline = PretrainedPipeline("burmese_awesome_model_zera09_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an input DataFrame with a string column named "text"
val pipeline = new PretrainedPipeline("burmese_awesome_model_zera09_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_zera09_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/zera09/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_qa_model_anamgarcia_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_qa_model_anamgarcia_en.md new file mode 100644 index 00000000000000..71d719cdf7bb4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_qa_model_anamgarcia_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_anamgarcia DistilBertForQuestionAnswering from anamgarcia +author: John Snow Labs +name: burmese_awesome_qa_model_anamgarcia +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_anamgarcia` is a English model originally trained by anamgarcia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_anamgarcia_en_5.5.0_3.0_1726851164219.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_anamgarcia_en_5.5.0_3.0_1726851164219.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_anamgarcia","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_anamgarcia", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
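
The extracted answer span ends up in the `answer` column. A minimal sketch for reading it back, assuming the column names from the example above:

```python
# Each row holds the answer span predicted for its question/context pair
pipelineDF.select("question", "answer.result").show(truncate=False)
```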
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_anamgarcia| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/anamgarcia/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_qa_model_anamgarcia_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_qa_model_anamgarcia_pipeline_en.md new file mode 100644 index 00000000000000..e45a98563f89cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_qa_model_anamgarcia_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_anamgarcia_pipeline pipeline DistilBertForQuestionAnswering from anamgarcia +author: John Snow Labs +name: burmese_awesome_qa_model_anamgarcia_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_anamgarcia_pipeline` is a English model originally trained by anamgarcia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_anamgarcia_pipeline_en_5.5.0_3.0_1726851177809.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_anamgarcia_pipeline_en_5.5.0_3.0_1726851177809.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an input DataFrame holding the question and context text columns this pipeline expects
pipeline = PretrainedPipeline("burmese_awesome_qa_model_anamgarcia_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an input DataFrame holding the question and context text columns this pipeline expects
val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_anamgarcia_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_anamgarcia_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/anamgarcia/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_wnut_model_kanansharmaa_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_wnut_model_kanansharmaa_en.md new file mode 100644 index 00000000000000..57bfe1e1c3b0b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_wnut_model_kanansharmaa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_kanansharmaa RoBertaForTokenClassification from kanansharmaa +author: John Snow Labs +name: burmese_awesome_wnut_model_kanansharmaa +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_kanansharmaa` is a English model originally trained by kanansharmaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_kanansharmaa_en_5.5.0_3.0_1726847270855.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_kanansharmaa_en_5.5.0_3.0_1726847270855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("burmese_awesome_wnut_model_kanansharmaa","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("burmese_awesome_wnut_model_kanansharmaa", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
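
To see which tag was assigned to each token, the `token` and `ner` output columns can be zipped together. A minimal sketch using standard Spark SQL functions and the column names from the example above:

```python
from pyspark.sql import functions as F

# Pair every token with its predicted NER tag
pipelineDF.select(F.explode(F.arrays_zip(pipelineDF.token.result, pipelineDF.ner.result)).alias("cols")) \
    .select(F.expr("cols['0']").alias("token"), F.expr("cols['1']").alias("ner_tag")) \
    .show(truncate=False)
```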
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_kanansharmaa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|434.2 MB| + +## References + +https://huggingface.co/kanansharmaa/my_awesome_wnut_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_wnut_model_kanansharmaa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_wnut_model_kanansharmaa_pipeline_en.md new file mode 100644 index 00000000000000..d3bd09afe357cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_wnut_model_kanansharmaa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_kanansharmaa_pipeline pipeline RoBertaForTokenClassification from kanansharmaa +author: John Snow Labs +name: burmese_awesome_wnut_model_kanansharmaa_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_kanansharmaa_pipeline` is a English model originally trained by kanansharmaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_kanansharmaa_pipeline_en_5.5.0_3.0_1726847303753.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_kanansharmaa_pipeline_en_5.5.0_3.0_1726847303753.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an input DataFrame with a string column named "text"
pipeline = PretrainedPipeline("burmese_awesome_wnut_model_kanansharmaa_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an input DataFrame with a string column named "text"
val pipeline = new PretrainedPipeline("burmese_awesome_wnut_model_kanansharmaa_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_kanansharmaa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|434.2 MB| + +## References + +https://huggingface.co/kanansharmaa/my_awesome_wnut_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_finetuned_banking77_distilbert_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_finetuned_banking77_distilbert_en.md new file mode 100644 index 00000000000000..22dcf78f41422b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_finetuned_banking77_distilbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_finetuned_banking77_distilbert DistilBertForSequenceClassification from Ghareeb-M +author: John Snow Labs +name: burmese_finetuned_banking77_distilbert +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_finetuned_banking77_distilbert` is a English model originally trained by Ghareeb-M. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_finetuned_banking77_distilbert_en_5.5.0_3.0_1726840991542.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_finetuned_banking77_distilbert_en_5.5.0_3.0_1726840991542.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_finetuned_banking77_distilbert","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_finetuned_banking77_distilbert", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_finetuned_banking77_distilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/Ghareeb-M/my-finetuned-banking77-distilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_finetuned_banking77_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_finetuned_banking77_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..aaeb055d6959a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_finetuned_banking77_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_finetuned_banking77_distilbert_pipeline pipeline DistilBertForSequenceClassification from Ghareeb-M +author: John Snow Labs +name: burmese_finetuned_banking77_distilbert_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_finetuned_banking77_distilbert_pipeline` is a English model originally trained by Ghareeb-M. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_finetuned_banking77_distilbert_pipeline_en_5.5.0_3.0_1726841003393.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_finetuned_banking77_distilbert_pipeline_en_5.5.0_3.0_1726841003393.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an input DataFrame with a string column named "text"
pipeline = PretrainedPipeline("burmese_finetuned_banking77_distilbert_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an input DataFrame with a string column named "text"
val pipeline = new PretrainedPipeline("burmese_finetuned_banking77_distilbert_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_finetuned_banking77_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/Ghareeb-M/my-finetuned-banking77-distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_finetuned_emotion_distilbert_ghareeb_m_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_finetuned_emotion_distilbert_ghareeb_m_en.md new file mode 100644 index 00000000000000..c459e28f7b0139 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_finetuned_emotion_distilbert_ghareeb_m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_finetuned_emotion_distilbert_ghareeb_m DistilBertForSequenceClassification from Ghareeb-M +author: John Snow Labs +name: burmese_finetuned_emotion_distilbert_ghareeb_m +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_finetuned_emotion_distilbert_ghareeb_m` is a English model originally trained by Ghareeb-M. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_finetuned_emotion_distilbert_ghareeb_m_en_5.5.0_3.0_1726871373697.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_finetuned_emotion_distilbert_ghareeb_m_en_5.5.0_3.0_1726871373697.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_finetuned_emotion_distilbert_ghareeb_m","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_finetuned_emotion_distilbert_ghareeb_m", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_finetuned_emotion_distilbert_ghareeb_m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/Ghareeb-M/my-finetuned-emotion-distilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_finetuned_emotion_distilbert_ghareeb_m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_finetuned_emotion_distilbert_ghareeb_m_pipeline_en.md new file mode 100644 index 00000000000000..a78d242c7c9707 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_finetuned_emotion_distilbert_ghareeb_m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_finetuned_emotion_distilbert_ghareeb_m_pipeline pipeline DistilBertForSequenceClassification from Ghareeb-M +author: John Snow Labs +name: burmese_finetuned_emotion_distilbert_ghareeb_m_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_finetuned_emotion_distilbert_ghareeb_m_pipeline` is a English model originally trained by Ghareeb-M. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_finetuned_emotion_distilbert_ghareeb_m_pipeline_en_5.5.0_3.0_1726871385438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_finetuned_emotion_distilbert_ghareeb_m_pipeline_en_5.5.0_3.0_1726871385438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an input DataFrame with a string column named "text"
pipeline = PretrainedPipeline("burmese_finetuned_emotion_distilbert_ghareeb_m_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an input DataFrame with a string column named "text"
val pipeline = new PretrainedPipeline("burmese_finetuned_emotion_distilbert_ghareeb_m_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_finetuned_emotion_distilbert_ghareeb_m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/Ghareeb-M/my-finetuned-emotion-distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_first_test_model_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_first_test_model_en.md new file mode 100644 index 00000000000000..7ac4c8aea2c45f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_first_test_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_first_test_model DistilBertForSequenceClassification from Neroism8422 +author: John Snow Labs +name: burmese_first_test_model +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_first_test_model` is a English model originally trained by Neroism8422. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_first_test_model_en_5.5.0_3.0_1726842454840.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_first_test_model_en_5.5.0_3.0_1726842454840.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_first_test_model","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_first_test_model", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_first_test_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Neroism8422/my_first_test_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_first_test_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_first_test_model_pipeline_en.md new file mode 100644 index 00000000000000..60dafd9c910157 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_first_test_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_first_test_model_pipeline pipeline DistilBertForSequenceClassification from Neroism8422 +author: John Snow Labs +name: burmese_first_test_model_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_first_test_model_pipeline` is a English model originally trained by Neroism8422. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_first_test_model_pipeline_en_5.5.0_3.0_1726842466727.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_first_test_model_pipeline_en_5.5.0_3.0_1726842466727.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an input DataFrame with a string column named "text"
pipeline = PretrainedPipeline("burmese_first_test_model_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an input DataFrame with a string column named "text"
val pipeline = new PretrainedPipeline("burmese_first_test_model_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_first_test_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Neroism8422/my_first_test_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_idea_classification_model_trial_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_idea_classification_model_trial_1_pipeline_en.md new file mode 100644 index 00000000000000..87d33813a5d4d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_idea_classification_model_trial_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_idea_classification_model_trial_1_pipeline pipeline DistilBertForSequenceClassification from manimaranpa07 +author: John Snow Labs +name: burmese_idea_classification_model_trial_1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_idea_classification_model_trial_1_pipeline` is a English model originally trained by manimaranpa07. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_idea_classification_model_trial_1_pipeline_en_5.5.0_3.0_1726809114485.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_idea_classification_model_trial_1_pipeline_en_5.5.0_3.0_1726809114485.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an input DataFrame with a string column named "text"
pipeline = PretrainedPipeline("burmese_idea_classification_model_trial_1_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an input DataFrame with a string column named "text"
val pipeline = new PretrainedPipeline("burmese_idea_classification_model_trial_1_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_idea_classification_model_trial_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/manimaranpa07/my_idea_classification_model_trial_1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_insurance_mlm_model_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_insurance_mlm_model_en.md new file mode 100644 index 00000000000000..521c668736c70e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_insurance_mlm_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_insurance_mlm_model RoBertaEmbeddings from michaelfong2017 +author: John Snow Labs +name: burmese_insurance_mlm_model +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_insurance_mlm_model` is a English model originally trained by michaelfong2017. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_insurance_mlm_model_en_5.5.0_3.0_1726793771854.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_insurance_mlm_model_en_5.5.0_3.0_1726793771854.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("burmese_insurance_mlm_model","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("burmese_insurance_mlm_model","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
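
Each token's vector is stored in the `embeddings` field of the output annotations. A minimal sketch for pulling the vectors out, assuming the column names from the example above (the vector size depends on the underlying model, typically 768 for a RoBERTa base model):

```python
from pyspark.sql import functions as F

# One row per token vector
pipelineDF.select(F.explode("embeddings.embeddings").alias("token_embedding")) \
    .show(truncate=80)
```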
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_insurance_mlm_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/michaelfong2017/my_insurance_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_model1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_model1_pipeline_en.md new file mode 100644 index 00000000000000..a112a77a943a81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_model1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_model1_pipeline pipeline DistilBertForSequenceClassification from Asadbek1 +author: John Snow Labs +name: burmese_model1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_model1_pipeline` is a English model originally trained by Asadbek1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_model1_pipeline_en_5.5.0_3.0_1726842236201.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_model1_pipeline_en_5.5.0_3.0_1726842236201.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an input DataFrame with a string column named "text"
pipeline = PretrainedPipeline("burmese_model1_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an input DataFrame with a string column named "text"
val pipeline = new PretrainedPipeline("burmese_model1_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_model1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Asadbek1/my_model1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_model_jiangwf_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_model_jiangwf_en.md new file mode 100644 index 00000000000000..99385211ac246e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_model_jiangwf_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_model_jiangwf DistilBertForSequenceClassification from jiangwf +author: John Snow Labs +name: burmese_model_jiangwf +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_model_jiangwf` is a English model originally trained by jiangwf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_model_jiangwf_en_5.5.0_3.0_1726842079040.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_model_jiangwf_en_5.5.0_3.0_1726842079040.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_model_jiangwf","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_model_jiangwf", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_model_jiangwf| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jiangwf/my_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_model_jiangwf_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_model_jiangwf_pipeline_en.md new file mode 100644 index 00000000000000..516ddf8809d81d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_model_jiangwf_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_model_jiangwf_pipeline pipeline DistilBertForSequenceClassification from jiangwf +author: John Snow Labs +name: burmese_model_jiangwf_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_model_jiangwf_pipeline` is a English model originally trained by jiangwf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_model_jiangwf_pipeline_en_5.5.0_3.0_1726842093245.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_model_jiangwf_pipeline_en_5.5.0_3.0_1726842093245.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an input DataFrame with a string column named "text"
pipeline = PretrainedPipeline("burmese_model_jiangwf_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an input DataFrame with a string column named "text"
val pipeline = new PretrainedPipeline("burmese_model_jiangwf_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_model_jiangwf_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jiangwf/my_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_model_mlituma_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_model_mlituma_en.md new file mode 100644 index 00000000000000..4c99404c50fda9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_model_mlituma_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_model_mlituma DistilBertForSequenceClassification from mlituma +author: John Snow Labs +name: burmese_model_mlituma +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_model_mlituma` is a English model originally trained by mlituma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_model_mlituma_en_5.5.0_3.0_1726841322048.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_model_mlituma_en_5.5.0_3.0_1726841322048.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_model_mlituma","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_model_mlituma", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_model_mlituma| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mlituma/my_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_model_mlituma_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_model_mlituma_pipeline_en.md new file mode 100644 index 00000000000000..d3c2165c83af7a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_model_mlituma_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_model_mlituma_pipeline pipeline DistilBertForSequenceClassification from mlituma +author: John Snow Labs +name: burmese_model_mlituma_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_model_mlituma_pipeline` is a English model originally trained by mlituma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_model_mlituma_pipeline_en_5.5.0_3.0_1726841335082.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_model_mlituma_pipeline_en_5.5.0_3.0_1726841335082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an input DataFrame with a string column named "text"
pipeline = PretrainedPipeline("burmese_model_mlituma_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an input DataFrame with a string column named "text"
val pipeline = new PretrainedPipeline("burmese_model_mlituma_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_model_mlituma_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mlituma/my_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-case_analysis_distilbert_base_cased_en.md b/docs/_posts/ahmedlone127/2024-09-20-case_analysis_distilbert_base_cased_en.md new file mode 100644 index 00000000000000..4b373f00e651f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-case_analysis_distilbert_base_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English case_analysis_distilbert_base_cased DistilBertForSequenceClassification from cite-text-analysis +author: John Snow Labs +name: case_analysis_distilbert_base_cased +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`case_analysis_distilbert_base_cased` is a English model originally trained by cite-text-analysis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/case_analysis_distilbert_base_cased_en_5.5.0_3.0_1726840872217.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/case_analysis_distilbert_base_cased_en_5.5.0_3.0_1726840872217.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# The classifier consumes the "document" column created by the DocumentAssembler
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("case_analysis_distilbert_base_cased","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("case_analysis_distilbert_base_cased", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
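+
+Since the card does not show sample output, the following is a minimal, illustrative sketch (not an official example) of how the predicted label could be read back from the `class` column created by the Python snippet above; it assumes the `pipelineDF` DataFrame and Spark NLP's standard annotation schema.
+
+```python
+# Assumes `pipelineDF` from the example above; `class.result` holds the label strings.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```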
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|case_analysis_distilbert_base_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/cite-text-analysis/case-analysis-distilbert-base-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-case_analysis_distilbert_base_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-case_analysis_distilbert_base_cased_pipeline_en.md new file mode 100644 index 00000000000000..04d3cc12f13db4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-case_analysis_distilbert_base_cased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English case_analysis_distilbert_base_cased_pipeline pipeline DistilBertForSequenceClassification from cite-text-analysis +author: John Snow Labs +name: case_analysis_distilbert_base_cased_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`case_analysis_distilbert_base_cased_pipeline` is a English model originally trained by cite-text-analysis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/case_analysis_distilbert_base_cased_pipeline_en_5.5.0_3.0_1726840890356.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/case_analysis_distilbert_base_cased_pipeline_en_5.5.0_3.0_1726840890356.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Any DataFrame with a "text" column can be annotated by the pretrained pipeline
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("case_analysis_distilbert_base_cased_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("case_analysis_distilbert_base_cased_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|case_analysis_distilbert_base_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/cite-text-analysis/case-analysis-distilbert-base-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-cat_1_html_distilbert_base_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-20-cat_1_html_distilbert_base_uncased_en.md new file mode 100644 index 00000000000000..4ae3a8990af64d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-cat_1_html_distilbert_base_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cat_1_html_distilbert_base_uncased DistilBertForSequenceClassification from chuuhtetnaing +author: John Snow Labs +name: cat_1_html_distilbert_base_uncased +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_1_html_distilbert_base_uncased` is a English model originally trained by chuuhtetnaing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_1_html_distilbert_base_uncased_en_5.5.0_3.0_1726832500256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_1_html_distilbert_base_uncased_en_5.5.0_3.0_1726832500256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# The classifier consumes the "document" column created by the DocumentAssembler
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("cat_1_html_distilbert_base_uncased","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("cat_1_html_distilbert_base_uncased", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
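+
+The card does not include a results section, so here is a minimal, illustrative sketch (not an official example) of how the predicted class could be inspected from the `class` column produced by the Python example above; it assumes the `pipelineDF` DataFrame and Spark NLP's standard annotation schema.
+
+```python
+# Assumes `pipelineDF` from the example above; `class.result` holds the label strings.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```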
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_1_html_distilbert_base_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chuuhtetnaing/cat-1-html-distilbert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-cat_1_html_distilbert_base_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-cat_1_html_distilbert_base_uncased_pipeline_en.md new file mode 100644 index 00000000000000..c33c7680e22689 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-cat_1_html_distilbert_base_uncased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cat_1_html_distilbert_base_uncased_pipeline pipeline DistilBertForSequenceClassification from chuuhtetnaing +author: John Snow Labs +name: cat_1_html_distilbert_base_uncased_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_1_html_distilbert_base_uncased_pipeline` is a English model originally trained by chuuhtetnaing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_1_html_distilbert_base_uncased_pipeline_en_5.5.0_3.0_1726832512257.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_1_html_distilbert_base_uncased_pipeline_en_5.5.0_3.0_1726832512257.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Any DataFrame with a "text" column can be annotated by the pretrained pipeline
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("cat_1_html_distilbert_base_uncased_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("cat_1_html_distilbert_base_uncased_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_1_html_distilbert_base_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chuuhtetnaing/cat-1-html-distilbert-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-cat_ner_spanish_5_en.md b/docs/_posts/ahmedlone127/2024-09-20-cat_ner_spanish_5_en.md new file mode 100644 index 00000000000000..ccf2f06b50b118 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-cat_ner_spanish_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cat_ner_spanish_5 RoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_ner_spanish_5 +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_ner_spanish_5` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_ner_spanish_5_en_5.5.0_3.0_1726847568889.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_ner_spanish_5_en_5.5.0_3.0_1726847568889.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# The token classifier consumes the "document" column created by the DocumentAssembler
+tokenClassifier = RoBertaForTokenClassification.pretrained("cat_ner_spanish_5","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("cat_ner_spanish_5", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
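+
+As an illustrative check only (not part of the original card), the sketch below pairs each token with its predicted tag from the `ner` column produced by the Python example above, assuming the `pipelineDF` DataFrame and the standard annotation schema.
+
+```python
+# Assumes `pipelineDF` from the example above.
+# `token.result` lists the tokens and `ner.result` the tag predicted for each one.
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```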
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_ner_spanish_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|462.3 MB| + +## References + +https://huggingface.co/homersimpson/cat-ner-es-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-cat_ner_spanish_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-cat_ner_spanish_5_pipeline_en.md new file mode 100644 index 00000000000000..7f0505cd0d3743 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-cat_ner_spanish_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cat_ner_spanish_5_pipeline pipeline RoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_ner_spanish_5_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_ner_spanish_5_pipeline` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_ner_spanish_5_pipeline_en_5.5.0_3.0_1726847592576.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_ner_spanish_5_pipeline_en_5.5.0_3.0_1726847592576.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Any DataFrame with a "text" column can be annotated by the pretrained pipeline
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("cat_ner_spanish_5_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("cat_ner_spanish_5_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_ner_spanish_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|462.3 MB| + +## References + +https://huggingface.co/homersimpson/cat-ner-es-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-cat_sayula_popoluca_xlmr_3_en.md b/docs/_posts/ahmedlone127/2024-09-20-cat_sayula_popoluca_xlmr_3_en.md new file mode 100644 index 00000000000000..fa9a8dc5ae047b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-cat_sayula_popoluca_xlmr_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cat_sayula_popoluca_xlmr_3 XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_sayula_popoluca_xlmr_3 +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_sayula_popoluca_xlmr_3` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_sayula_popoluca_xlmr_3_en_5.5.0_3.0_1726843766638.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_sayula_popoluca_xlmr_3_en_5.5.0_3.0_1726843766638.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# The token classifier consumes the "document" column created by the DocumentAssembler
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("cat_sayula_popoluca_xlmr_3","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("cat_sayula_popoluca_xlmr_3", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
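+
+As an illustrative check only (not part of the original card), the sketch below lines up tokens with their predicted tags from the `ner` column produced by the Python example above, assuming the `pipelineDF` DataFrame and the standard annotation schema.
+
+```python
+# Assumes `pipelineDF` from the example above.
+# `token.result` lists the tokens and `ner.result` the tag predicted for each one.
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```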
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_sayula_popoluca_xlmr_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|815.8 MB| + +## References + +https://huggingface.co/homersimpson/cat-pos-xlmr-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-cat_sayula_popoluca_xlmr_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-cat_sayula_popoluca_xlmr_3_pipeline_en.md new file mode 100644 index 00000000000000..314c385a8a67d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-cat_sayula_popoluca_xlmr_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cat_sayula_popoluca_xlmr_3_pipeline pipeline XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_sayula_popoluca_xlmr_3_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_sayula_popoluca_xlmr_3_pipeline` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_sayula_popoluca_xlmr_3_pipeline_en_5.5.0_3.0_1726843879501.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_sayula_popoluca_xlmr_3_pipeline_en_5.5.0_3.0_1726843879501.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Any DataFrame with a "text" column can be annotated by the pretrained pipeline
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("cat_sayula_popoluca_xlmr_3_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("cat_sayula_popoluca_xlmr_3_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_sayula_popoluca_xlmr_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|815.8 MB| + +## References + +https://huggingface.co/homersimpson/cat-pos-xlmr-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-cat_sayula_popoluca_xlmr_4_en.md b/docs/_posts/ahmedlone127/2024-09-20-cat_sayula_popoluca_xlmr_4_en.md new file mode 100644 index 00000000000000..ac09e2501f8465 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-cat_sayula_popoluca_xlmr_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cat_sayula_popoluca_xlmr_4 XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_sayula_popoluca_xlmr_4 +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_sayula_popoluca_xlmr_4` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_sayula_popoluca_xlmr_4_en_5.5.0_3.0_1726843289305.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_sayula_popoluca_xlmr_4_en_5.5.0_3.0_1726843289305.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# The token classifier consumes the "document" column created by the DocumentAssembler
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("cat_sayula_popoluca_xlmr_4","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("cat_sayula_popoluca_xlmr_4", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
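+
+As an illustrative check only (not part of the original card), the sketch below pairs tokens with their predicted tags from the `ner` column produced by the Python example above, assuming the `pipelineDF` DataFrame and the standard annotation schema.
+
+```python
+# Assumes `pipelineDF` from the example above.
+# `token.result` lists the tokens and `ner.result` the tag predicted for each one.
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```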
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_sayula_popoluca_xlmr_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|815.8 MB| + +## References + +https://huggingface.co/homersimpson/cat-pos-xlmr-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-cat_sayula_popoluca_xlmr_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-cat_sayula_popoluca_xlmr_4_pipeline_en.md new file mode 100644 index 00000000000000..c645f9815a57c8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-cat_sayula_popoluca_xlmr_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cat_sayula_popoluca_xlmr_4_pipeline pipeline XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_sayula_popoluca_xlmr_4_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_sayula_popoluca_xlmr_4_pipeline` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_sayula_popoluca_xlmr_4_pipeline_en_5.5.0_3.0_1726843406955.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_sayula_popoluca_xlmr_4_pipeline_en_5.5.0_3.0_1726843406955.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Any DataFrame with a "text" column can be annotated by the pretrained pipeline
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("cat_sayula_popoluca_xlmr_4_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("cat_sayula_popoluca_xlmr_4_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_sayula_popoluca_xlmr_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|815.8 MB| + +## References + +https://huggingface.co/homersimpson/cat-pos-xlmr-4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-chungliao_mbert_base_cased_en.md b/docs/_posts/ahmedlone127/2024-09-20-chungliao_mbert_base_cased_en.md new file mode 100644 index 00000000000000..43d8963018ed84 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-chungliao_mbert_base_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English chungliao_mbert_base_cased BertEmbeddings from N1ch0 +author: John Snow Labs +name: chungliao_mbert_base_cased +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chungliao_mbert_base_cased` is a English model originally trained by N1ch0. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chungliao_mbert_base_cased_en_5.5.0_3.0_1726806199029.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chungliao_mbert_base_cased_en_5.5.0_3.0_1726806199029.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = BertEmbeddings.pretrained("chungliao_mbert_base_cased","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = BertEmbeddings.pretrained("chungliao_mbert_base_cased","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
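+
+The example stops at `transform`, so here is a minimal sketch (illustrative only, not from the original card) of how the token vectors could be pulled out of the `embeddings` column it produces, assuming the `pipelineDF` DataFrame and the standard annotation schema.
+
+```python
+from pyspark.sql.functions import explode
+
+# Assumes `pipelineDF` from the example above.
+# Each annotation's `embeddings` field holds the vector for one token.
+pipelineDF.select(explode("embeddings.embeddings").alias("token_vector")).show(5, truncate=80)
+```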
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chungliao_mbert_base_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|665.0 MB| + +## References + +https://huggingface.co/N1ch0/chungliao-mbert-base-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-classif_mmate_1_5_original_cont_3_sent_en.md b/docs/_posts/ahmedlone127/2024-09-20-classif_mmate_1_5_original_cont_3_sent_en.md new file mode 100644 index 00000000000000..58b8a3a05f9ef1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-classif_mmate_1_5_original_cont_3_sent_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English classif_mmate_1_5_original_cont_3_sent BertForSequenceClassification from spneshaei +author: John Snow Labs +name: classif_mmate_1_5_original_cont_3_sent +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classif_mmate_1_5_original_cont_3_sent` is a English model originally trained by spneshaei. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classif_mmate_1_5_original_cont_3_sent_en_5.5.0_3.0_1726860328721.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classif_mmate_1_5_original_cont_3_sent_en_5.5.0_3.0_1726860328721.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# The classifier consumes the "document" column created by the DocumentAssembler
+sequenceClassifier = BertForSequenceClassification.pretrained("classif_mmate_1_5_original_cont_3_sent","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("classif_mmate_1_5_original_cont_3_sent", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
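+
+Since the card does not show sample output, the following is a minimal, illustrative sketch (not an official example) of how the predicted label could be read back from the `class` column created by the Python snippet above; it assumes the `pipelineDF` DataFrame and Spark NLP's standard annotation schema.
+
+```python
+# Assumes `pipelineDF` from the example above; `class.result` holds the label strings.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```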
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classif_mmate_1_5_original_cont_3_sent| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.1 MB| + +## References + +https://huggingface.co/spneshaei/classif_mmate_1_5_original_cont_3_sent \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-classif_mmate_1_5_original_cont_3_sent_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-classif_mmate_1_5_original_cont_3_sent_pipeline_en.md new file mode 100644 index 00000000000000..66d9435a66f1c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-classif_mmate_1_5_original_cont_3_sent_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English classif_mmate_1_5_original_cont_3_sent_pipeline pipeline BertForSequenceClassification from spneshaei +author: John Snow Labs +name: classif_mmate_1_5_original_cont_3_sent_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classif_mmate_1_5_original_cont_3_sent_pipeline` is a English model originally trained by spneshaei. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classif_mmate_1_5_original_cont_3_sent_pipeline_en_5.5.0_3.0_1726860348492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classif_mmate_1_5_original_cont_3_sent_pipeline_en_5.5.0_3.0_1726860348492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Any DataFrame with a "text" column can be annotated by the pretrained pipeline
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("classif_mmate_1_5_original_cont_3_sent_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("classif_mmate_1_5_original_cont_3_sent_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classif_mmate_1_5_original_cont_3_sent_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.1 MB| + +## References + +https://huggingface.co/spneshaei/classif_mmate_1_5_original_cont_3_sent + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-coha1810to1850_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-coha1810to1850_pipeline_en.md new file mode 100644 index 00000000000000..b84ab90e456a53 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-coha1810to1850_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English coha1810to1850_pipeline pipeline RoBertaEmbeddings from simonmun +author: John Snow Labs +name: coha1810to1850_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`coha1810to1850_pipeline` is a English model originally trained by simonmun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/coha1810to1850_pipeline_en_5.5.0_3.0_1726796625924.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/coha1810to1850_pipeline_en_5.5.0_3.0_1726796625924.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Any DataFrame with a "text" column can be annotated by the pretrained pipeline
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("coha1810to1850_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("coha1810to1850_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|coha1810to1850_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.7 MB| + +## References + +https://huggingface.co/simonmun/COHA1810to1850 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-commit_message_quality_codebert_en.md b/docs/_posts/ahmedlone127/2024-09-20-commit_message_quality_codebert_en.md new file mode 100644 index 00000000000000..3bf69ec6a4442f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-commit_message_quality_codebert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English commit_message_quality_codebert RoBertaForSequenceClassification from saridormi +author: John Snow Labs +name: commit_message_quality_codebert +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`commit_message_quality_codebert` is a English model originally trained by saridormi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/commit_message_quality_codebert_en_5.5.0_3.0_1726852451894.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/commit_message_quality_codebert_en_5.5.0_3.0_1726852451894.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# The classifier consumes the "document" column created by the DocumentAssembler
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("commit_message_quality_codebert","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("commit_message_quality_codebert", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
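+
+The card does not include a results section, so here is a minimal, illustrative sketch (not an official example) of how the predicted quality label could be inspected from the `class` column produced by the Python example above; it assumes the `pipelineDF` DataFrame and Spark NLP's standard annotation schema.
+
+```python
+# Assumes `pipelineDF` from the example above; `class.result` holds the label strings.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```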
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|commit_message_quality_codebert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/saridormi/commit-message-quality-codebert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-commit_message_quality_codebert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-commit_message_quality_codebert_pipeline_en.md new file mode 100644 index 00000000000000..cabd1663734a45 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-commit_message_quality_codebert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English commit_message_quality_codebert_pipeline pipeline RoBertaForSequenceClassification from saridormi +author: John Snow Labs +name: commit_message_quality_codebert_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`commit_message_quality_codebert_pipeline` is a English model originally trained by saridormi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/commit_message_quality_codebert_pipeline_en_5.5.0_3.0_1726852473650.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/commit_message_quality_codebert_pipeline_en_5.5.0_3.0_1726852473650.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Any DataFrame with a "text" column can be annotated by the pretrained pipeline
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("commit_message_quality_codebert_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("commit_message_quality_codebert_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|commit_message_quality_codebert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.4 MB| + +## References + +https://huggingface.co/saridormi/commit-message-quality-codebert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-covid_roberta_60_masked_en.md b/docs/_posts/ahmedlone127/2024-09-20-covid_roberta_60_masked_en.md new file mode 100644 index 00000000000000..ab84fab5a36c5c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-covid_roberta_60_masked_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English covid_roberta_60_masked RoBertaEmbeddings from timoneda +author: John Snow Labs +name: covid_roberta_60_masked +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`covid_roberta_60_masked` is a English model originally trained by timoneda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/covid_roberta_60_masked_en_5.5.0_3.0_1726796725166.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/covid_roberta_60_masked_en_5.5.0_3.0_1726796725166.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("covid_roberta_60_masked","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("covid_roberta_60_masked","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
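+
+The example stops at `transform`, so the following is a minimal sketch (illustrative only, not from the original card) of how the token vectors could be read from the `embeddings` column it produces, assuming the `pipelineDF` DataFrame and the standard annotation schema.
+
+```python
+from pyspark.sql.functions import explode
+
+# Assumes `pipelineDF` from the example above.
+# Each annotation's `embeddings` field holds the vector for one token.
+pipelineDF.select(explode("embeddings.embeddings").alias("token_vector")).show(5, truncate=80)
+```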
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|covid_roberta_60_masked| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/timoneda/covid_roberta_60_masked \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-covid_roberta_60_masked_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-covid_roberta_60_masked_pipeline_en.md new file mode 100644 index 00000000000000..da2883e1504b58 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-covid_roberta_60_masked_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English covid_roberta_60_masked_pipeline pipeline RoBertaEmbeddings from timoneda +author: John Snow Labs +name: covid_roberta_60_masked_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`covid_roberta_60_masked_pipeline` is a English model originally trained by timoneda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/covid_roberta_60_masked_pipeline_en_5.5.0_3.0_1726796785205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/covid_roberta_60_masked_pipeline_en_5.5.0_3.0_1726796785205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Any DataFrame with a "text" column can be annotated by the pretrained pipeline
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("covid_roberta_60_masked_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("covid_roberta_60_masked_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|covid_roberta_60_masked_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/timoneda/covid_roberta_60_masked + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-criminal_case_classifier1_en.md b/docs/_posts/ahmedlone127/2024-09-20-criminal_case_classifier1_en.md new file mode 100644 index 00000000000000..5f88c0401d5739 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-criminal_case_classifier1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English criminal_case_classifier1 DistilBertForSequenceClassification from LahiruProjects +author: John Snow Labs +name: criminal_case_classifier1 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`criminal_case_classifier1` is a English model originally trained by LahiruProjects. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/criminal_case_classifier1_en_5.5.0_3.0_1726840993335.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/criminal_case_classifier1_en_5.5.0_3.0_1726840993335.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# The classifier consumes the "document" column created by the DocumentAssembler
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("criminal_case_classifier1","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("criminal_case_classifier1", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
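+
+Since the card does not show sample output, the following is a minimal, illustrative sketch (not an official example) of how the predicted label could be read back from the `class` column created by the Python snippet above; it assumes the `pipelineDF` DataFrame and Spark NLP's standard annotation schema.
+
+```python
+# Assumes `pipelineDF` from the example above; `class.result` holds the label strings.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```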
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|criminal_case_classifier1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LahiruProjects/criminal-case-classifier1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-criminal_case_classifier1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-criminal_case_classifier1_pipeline_en.md new file mode 100644 index 00000000000000..5ab2387530b537 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-criminal_case_classifier1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English criminal_case_classifier1_pipeline pipeline DistilBertForSequenceClassification from LahiruProjects +author: John Snow Labs +name: criminal_case_classifier1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`criminal_case_classifier1_pipeline` is a English model originally trained by LahiruProjects. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/criminal_case_classifier1_pipeline_en_5.5.0_3.0_1726841006254.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/criminal_case_classifier1_pipeline_en_5.5.0_3.0_1726841006254.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Any DataFrame with a "text" column can be annotated by the pretrained pipeline
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("criminal_case_classifier1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("criminal_case_classifier1_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|criminal_case_classifier1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LahiruProjects/criminal-case-classifier1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-custommodel_v2c_k_en.md b/docs/_posts/ahmedlone127/2024-09-20-custommodel_v2c_k_en.md new file mode 100644 index 00000000000000..138076364b80b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-custommodel_v2c_k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English custommodel_v2c_k DistilBertForSequenceClassification from katowtkkk +author: John Snow Labs +name: custommodel_v2c_k +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`custommodel_v2c_k` is a English model originally trained by katowtkkk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/custommodel_v2c_k_en_5.5.0_3.0_1726832619234.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/custommodel_v2c_k_en_5.5.0_3.0_1726832619234.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# The classifier consumes the "document" column created by the DocumentAssembler
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("custommodel_v2c_k","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("custommodel_v2c_k", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
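+
+The card does not include a results section, so here is a minimal, illustrative sketch (not an official example) of how the predicted class could be inspected from the `class` column produced by the Python example above; it assumes the `pipelineDF` DataFrame and Spark NLP's standard annotation schema.
+
+```python
+# Assumes `pipelineDF` from the example above; `class.result` holds the label strings.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```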
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|custommodel_v2c_k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|233.9 KB| + +## References + +https://huggingface.co/katowtkkk/CustomModel_v2c_k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-custommodel_v2c_k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-custommodel_v2c_k_pipeline_en.md new file mode 100644 index 00000000000000..03607054eccbc9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-custommodel_v2c_k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English custommodel_v2c_k_pipeline pipeline DistilBertForSequenceClassification from katowtkkk +author: John Snow Labs +name: custommodel_v2c_k_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`custommodel_v2c_k_pipeline` is a English model originally trained by katowtkkk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/custommodel_v2c_k_pipeline_en_5.5.0_3.0_1726832619598.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/custommodel_v2c_k_pipeline_en_5.5.0_3.0_1726832619598.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("custommodel_v2c_k_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("custommodel_v2c_k_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div><div class="h3-box" markdown="1">
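For quick experiments without building a DataFrame, a pretrained pipeline can also be applied to a single string with `annotate()`. A minimal sketch, assuming the same pipeline name as above; the keys of the returned dictionary correspond to the pipeline's output columns:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("custommodel_v2c_k_pipeline", lang = "en")
# annotate() returns a dict mapping each output column to its results
result = pipeline.annotate("I love spark-nlp")
print(result)
```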
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|custommodel_v2c_k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|257.2 KB| + +## References + +https://huggingface.co/katowtkkk/CustomModel_v2c_k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-cyberta_en.md b/docs/_posts/ahmedlone127/2024-09-20-cyberta_en.md new file mode 100644 index 00000000000000..2a8bf6f83b41fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-cyberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cyberta RoBertaEmbeddings from mstaron +author: John Snow Labs +name: cyberta +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cyberta` is a English model originally trained by mstaron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cyberta_en_5.5.0_3.0_1726816179846.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cyberta_en_5.5.0_3.0_1726816179846.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("cyberta","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("cyberta","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div><div class="h3-box" markdown="1">
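Each row of `pipelineDF` now carries one RoBERTa vector per token in the `embeddings` column. A small illustrative sketch for inspecting them, assuming the pipeline defined above:

```python
from pyspark.sql.functions import explode

# One output row per token: the token text and its embedding vector
pipelineDF.select(explode("embeddings").alias("emb")) \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(truncate=False)
```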
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cyberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|310.9 MB| + +## References + +https://huggingface.co/mstaron/CyBERTa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-cyberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-cyberta_pipeline_en.md new file mode 100644 index 00000000000000..248fac1e8b7fe8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-cyberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cyberta_pipeline pipeline RoBertaEmbeddings from mstaron +author: John Snow Labs +name: cyberta_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cyberta_pipeline` is a English model originally trained by mstaron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cyberta_pipeline_en_5.5.0_3.0_1726816194478.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cyberta_pipeline_en_5.5.0_3.0_1726816194478.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("cyberta_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("cyberta_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cyberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|310.9 MB| + +## References + +https://huggingface.co/mstaron/CyBERTa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-db_mc2_4_2_en.md b/docs/_posts/ahmedlone127/2024-09-20-db_mc2_4_2_en.md new file mode 100644 index 00000000000000..c11b9243a1a701 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-db_mc2_4_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English db_mc2_4_2 DistilBertForSequenceClassification from exala +author: John Snow Labs +name: db_mc2_4_2 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`db_mc2_4_2` is a English model originally trained by exala. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/db_mc2_4_2_en_5.5.0_3.0_1726830359753.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/db_mc2_4_2_en_5.5.0_3.0_1726830359753.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("db_mc2_4_2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("db_mc2_4_2", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|db_mc2_4_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/exala/db_mc2_4.2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-db_mc2_4_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-db_mc2_4_2_pipeline_en.md new file mode 100644 index 00000000000000..53ef8d3b783d50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-db_mc2_4_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English db_mc2_4_2_pipeline pipeline DistilBertForSequenceClassification from exala +author: John Snow Labs +name: db_mc2_4_2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`db_mc2_4_2_pipeline` is a English model originally trained by exala. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/db_mc2_4_2_pipeline_en_5.5.0_3.0_1726830375315.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/db_mc2_4_2_pipeline_en_5.5.0_3.0_1726830375315.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("db_mc2_4_2_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("db_mc2_4_2_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|db_mc2_4_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/exala/db_mc2_4.2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-db_mc_2_0_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-db_mc_2_0_1_pipeline_en.md new file mode 100644 index 00000000000000..dcb4a9a56cf11e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-db_mc_2_0_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English db_mc_2_0_1_pipeline pipeline DistilBertForSequenceClassification from exala +author: John Snow Labs +name: db_mc_2_0_1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`db_mc_2_0_1_pipeline` is a English model originally trained by exala. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/db_mc_2_0_1_pipeline_en_5.5.0_3.0_1726792141759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/db_mc_2_0_1_pipeline_en_5.5.0_3.0_1726792141759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("db_mc_2_0_1_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("db_mc_2_0_1_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|db_mc_2_0_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/exala/db_mc_2.0.1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-descr_class_two_cm_en.md b/docs/_posts/ahmedlone127/2024-09-20-descr_class_two_cm_en.md new file mode 100644 index 00000000000000..7031f5771d7cfc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-descr_class_two_cm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English descr_class_two_cm DistilBertForSequenceClassification from BanananaMax +author: John Snow Labs +name: descr_class_two_cm +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`descr_class_two_cm` is a English model originally trained by BanananaMax. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/descr_class_two_cm_en_5.5.0_3.0_1726849038165.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/descr_class_two_cm_en_5.5.0_3.0_1726849038165.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("descr_class_two_cm","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("descr_class_two_cm", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|descr_class_two_cm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/BanananaMax/descr_class_two_cm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-descr_class_two_cm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-descr_class_two_cm_pipeline_en.md new file mode 100644 index 00000000000000..817d554780a329 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-descr_class_two_cm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English descr_class_two_cm_pipeline pipeline DistilBertForSequenceClassification from BanananaMax +author: John Snow Labs +name: descr_class_two_cm_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`descr_class_two_cm_pipeline` is a English model originally trained by BanananaMax. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/descr_class_two_cm_pipeline_en_5.5.0_3.0_1726849050270.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/descr_class_two_cm_pipeline_en_5.5.0_3.0_1726849050270.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("descr_class_two_cm_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("descr_class_two_cm_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|descr_class_two_cm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/BanananaMax/descr_class_two_cm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-dialogue_overfit_check_fold_4_en.md b/docs/_posts/ahmedlone127/2024-09-20-dialogue_overfit_check_fold_4_en.md new file mode 100644 index 00000000000000..d3e6f6554512b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-dialogue_overfit_check_fold_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dialogue_overfit_check_fold_4 DistilBertForSequenceClassification from SharonTudi +author: John Snow Labs +name: dialogue_overfit_check_fold_4 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dialogue_overfit_check_fold_4` is a English model originally trained by SharonTudi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dialogue_overfit_check_fold_4_en_5.5.0_3.0_1726848807936.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dialogue_overfit_check_fold_4_en_5.5.0_3.0_1726848807936.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("dialogue_overfit_check_fold_4","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("dialogue_overfit_check_fold_4", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dialogue_overfit_check_fold_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/SharonTudi/DIALOGUE_overfit_check_fold_4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-dialogue_overfit_check_fold_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-dialogue_overfit_check_fold_4_pipeline_en.md new file mode 100644 index 00000000000000..27f22e8ca27b72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-dialogue_overfit_check_fold_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dialogue_overfit_check_fold_4_pipeline pipeline DistilBertForSequenceClassification from SharonTudi +author: John Snow Labs +name: dialogue_overfit_check_fold_4_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dialogue_overfit_check_fold_4_pipeline` is a English model originally trained by SharonTudi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dialogue_overfit_check_fold_4_pipeline_en_5.5.0_3.0_1726848824190.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dialogue_overfit_check_fold_4_pipeline_en_5.5.0_3.0_1726848824190.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("dialogue_overfit_check_fold_4_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("dialogue_overfit_check_fold_4_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dialogue_overfit_check_fold_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/SharonTudi/DIALOGUE_overfit_check_fold_4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert5_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert5_en.md new file mode 100644 index 00000000000000..f8f715269defd9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert5 DistilBertForSequenceClassification from deptage +author: John Snow Labs +name: distilbert5 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert5` is a English model originally trained by deptage. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert5_en_5.5.0_3.0_1726848919402.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert5_en_5.5.0_3.0_1726848919402.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert5","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert5", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/deptage/distilbert5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert5_pipeline_en.md new file mode 100644 index 00000000000000..eb72546d055685 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert5_pipeline pipeline DistilBertForSequenceClassification from deptage +author: John Snow Labs +name: distilbert5_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert5_pipeline` is a English model originally trained by deptage. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert5_pipeline_en_5.5.0_3.0_1726848931460.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert5_pipeline_en_5.5.0_3.0_1726848931460.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert5_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert5_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/deptage/distilbert5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_airlines_news_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_airlines_news_en.md new file mode 100644 index 00000000000000..3368b96682707e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_airlines_news_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_airlines_news DistilBertForSequenceClassification from dahe827 +author: John Snow Labs +name: distilbert_base_airlines_news +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_airlines_news` is a English model originally trained by dahe827. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_airlines_news_en_5.5.0_3.0_1726830511483.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_airlines_news_en_5.5.0_3.0_1726830511483.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_airlines_news","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_airlines_news", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_airlines_news| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dahe827/DistilBERT-base-airlines-news \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_airlines_news_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_airlines_news_pipeline_en.md new file mode 100644 index 00000000000000..0f70cf4a1cb296 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_airlines_news_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_airlines_news_pipeline pipeline DistilBertForSequenceClassification from dahe827 +author: John Snow Labs +name: distilbert_base_airlines_news_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_airlines_news_pipeline` is a English model originally trained by dahe827. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_airlines_news_pipeline_en_5.5.0_3.0_1726830523617.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_airlines_news_pipeline_en_5.5.0_3.0_1726830523617.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_airlines_news_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_airlines_news_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_airlines_news_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dahe827/DistilBERT-base-airlines-news + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_cased_airlines_news_multi_label_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_cased_airlines_news_multi_label_en.md new file mode 100644 index 00000000000000..a111c93c7adb55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_cased_airlines_news_multi_label_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_cased_airlines_news_multi_label DistilBertForSequenceClassification from dahe827 +author: John Snow Labs +name: distilbert_base_cased_airlines_news_multi_label +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_airlines_news_multi_label` is a English model originally trained by dahe827. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_airlines_news_multi_label_en_5.5.0_3.0_1726792379823.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_airlines_news_multi_label_en_5.5.0_3.0_1726792379823.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_cased_airlines_news_multi_label","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_cased_airlines_news_multi_label", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_airlines_news_multi_label| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/dahe827/distilbert-base-cased-airlines-news-multi-label \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_cased_airlines_news_multi_label_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_cased_airlines_news_multi_label_pipeline_en.md new file mode 100644 index 00000000000000..ab210bcf3d2b0e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_cased_airlines_news_multi_label_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_cased_airlines_news_multi_label_pipeline pipeline DistilBertForSequenceClassification from dahe827 +author: John Snow Labs +name: distilbert_base_cased_airlines_news_multi_label_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_airlines_news_multi_label_pipeline` is a English model originally trained by dahe827. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_airlines_news_multi_label_pipeline_en_5.5.0_3.0_1726792392806.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_airlines_news_multi_label_pipeline_en_5.5.0_3.0_1726792392806.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_cased_airlines_news_multi_label_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_cased_airlines_news_multi_label_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_airlines_news_multi_label_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/dahe827/distilbert-base-cased-airlines-news-multi-label + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_multilingual_cased_resumesclasssifierv1_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_multilingual_cased_resumesclasssifierv1_pipeline_xx.md new file mode 100644 index 00000000000000..a34ae05a4ba19e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_multilingual_cased_resumesclasssifierv1_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual distilbert_base_multilingual_cased_resumesclasssifierv1_pipeline pipeline DistilBertForSequenceClassification from youssefkhalil320 +author: John Snow Labs +name: distilbert_base_multilingual_cased_resumesclasssifierv1_pipeline +date: 2024-09-20 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_multilingual_cased_resumesclasssifierv1_pipeline` is a Multilingual model originally trained by youssefkhalil320. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_resumesclasssifierv1_pipeline_xx_5.5.0_3.0_1726809697652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_resumesclasssifierv1_pipeline_xx_5.5.0_3.0_1726809697652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_multilingual_cased_resumesclasssifierv1_pipeline", lang = "xx")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_multilingual_cased_resumesclasssifierv1_pipeline", lang = "xx")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_multilingual_cased_resumesclasssifierv1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|507.8 MB| + +## References + +https://huggingface.co/youssefkhalil320/distilbert-base-multilingual-cased-resumesClasssifierV1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_multilingual_cased_resumesclasssifierv1_xx.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_multilingual_cased_resumesclasssifierv1_xx.md new file mode 100644 index 00000000000000..81fe064351556b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_multilingual_cased_resumesclasssifierv1_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual distilbert_base_multilingual_cased_resumesclasssifierv1 DistilBertForSequenceClassification from youssefkhalil320 +author: John Snow Labs +name: distilbert_base_multilingual_cased_resumesclasssifierv1 +date: 2024-09-20 +tags: [xx, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_multilingual_cased_resumesclasssifierv1` is a Multilingual model originally trained by youssefkhalil320. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_resumesclasssifierv1_xx_5.5.0_3.0_1726809673299.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_resumesclasssifierv1_xx_5.5.0_3.0_1726809673299.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_multilingual_cased_resumesclasssifierv1","xx") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_multilingual_cased_resumesclasssifierv1", "xx")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_multilingual_cased_resumesclasssifierv1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|507.7 MB| + +## References + +https://huggingface.co/youssefkhalil320/distilbert-base-multilingual-cased-resumesClasssifierV1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_denyszakharkevych_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_denyszakharkevych_en.md new file mode 100644 index 00000000000000..23df0594bae4e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_denyszakharkevych_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_denyszakharkevych DistilBertForSequenceClassification from DenysZakharkevych +author: John Snow Labs +name: distilbert_base_uncased_denyszakharkevych +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_denyszakharkevych` is a English model originally trained by DenysZakharkevych. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_denyszakharkevych_en_5.5.0_3.0_1726832394084.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_denyszakharkevych_en_5.5.0_3.0_1726832394084.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_denyszakharkevych","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_denyszakharkevych", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_denyszakharkevych| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DenysZakharkevych/distilbert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_denyszakharkevych_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_denyszakharkevych_pipeline_en.md new file mode 100644 index 00000000000000..94d814e3fdae8d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_denyszakharkevych_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_denyszakharkevych_pipeline pipeline DistilBertForSequenceClassification from DenysZakharkevych +author: John Snow Labs +name: distilbert_base_uncased_denyszakharkevych_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_denyszakharkevych_pipeline` is a English model originally trained by DenysZakharkevych. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_denyszakharkevych_pipeline_en_5.5.0_3.0_1726832408290.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_denyszakharkevych_pipeline_en_5.5.0_3.0_1726832408290.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_denyszakharkevych_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_denyszakharkevych_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_denyszakharkevych_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DenysZakharkevych/distilbert-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_akashjoy_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_akashjoy_en.md new file mode 100644 index 00000000000000..d1b2f4396bfd51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_akashjoy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_akashjoy DistilBertForSequenceClassification from akashjoy +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_akashjoy +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_akashjoy` is a English model originally trained by akashjoy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_akashjoy_en_5.5.0_3.0_1726842466550.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_akashjoy_en_5.5.0_3.0_1726842466550.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_akashjoy","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_akashjoy", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_akashjoy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/akashjoy/distilbert-base-uncased-distilled-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_akashjoy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_akashjoy_pipeline_en.md new file mode 100644 index 00000000000000..2e077b0e987779 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_akashjoy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_akashjoy_pipeline pipeline DistilBertForSequenceClassification from akashjoy +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_akashjoy_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_akashjoy_pipeline` is a English model originally trained by akashjoy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_akashjoy_pipeline_en_5.5.0_3.0_1726842482842.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_akashjoy_pipeline_en_5.5.0_3.0_1726842482842.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column (assumed example data)
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_distilled_clinc_akashjoy_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column (assumed example data)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_distilled_clinc_akashjoy_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_akashjoy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/akashjoy/distilbert-base-uncased-distilled-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_ehottl_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_ehottl_en.md new file mode 100644 index 00000000000000..d812c25b14136a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_ehottl_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_ehottl DistilBertForSequenceClassification from ehottl +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_ehottl +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_ehottl` is a English model originally trained by ehottl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_ehottl_en_5.5.0_3.0_1726842553073.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_ehottl_en_5.5.0_3.0_1726842553073.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_ehottl","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_ehottl", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
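+
+To inspect the predictions, you can read the `class` annotation column from the transformed DataFrame (see the Output Labels below). A minimal sketch, assuming the Python pipeline above has just been run:
+
+```python
+# Show the predicted label(s) next to the input text; "class" holds the classifier output.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```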
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_ehottl| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/ehottl/distilbert-base-uncased-distilled-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_ehottl_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_ehottl_pipeline_en.md new file mode 100644 index 00000000000000..a40b58af234ee8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_ehottl_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_ehottl_pipeline pipeline DistilBertForSequenceClassification from ehottl +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_ehottl_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_ehottl_pipeline` is a English model originally trained by ehottl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_ehottl_pipeline_en_5.5.0_3.0_1726842564848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_ehottl_pipeline_en_5.5.0_3.0_1726842564848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column (assumed example data)
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_distilled_clinc_ehottl_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column (assumed example data)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_distilled_clinc_ehottl_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_ehottl_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/ehottl/distilbert-base-uncased-distilled-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_kwkwkwkwpark_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_kwkwkwkwpark_en.md new file mode 100644 index 00000000000000..2c7b79c06d4721 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_kwkwkwkwpark_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_kwkwkwkwpark DistilBertForSequenceClassification from kwkwkwkwpark +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_kwkwkwkwpark +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_kwkwkwkwpark` is a English model originally trained by kwkwkwkwpark. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_kwkwkwkwpark_en_5.5.0_3.0_1726830368048.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_kwkwkwkwpark_en_5.5.0_3.0_1726830368048.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_kwkwkwkwpark","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_kwkwkwkwpark", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
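+
+To inspect the predictions, you can read the `class` annotation column from the transformed DataFrame (see the Output Labels below). A minimal sketch, assuming the Python pipeline above has just been run:
+
+```python
+# Show the predicted label(s) next to the input text; "class" holds the classifier output.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```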
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_kwkwkwkwpark| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/kwkwkwkwpark/distilbert-base-uncased-distilled-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_kwkwkwkwpark_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_kwkwkwkwpark_pipeline_en.md new file mode 100644 index 00000000000000..c5fd923566995f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_kwkwkwkwpark_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_kwkwkwkwpark_pipeline pipeline DistilBertForSequenceClassification from kwkwkwkwpark +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_kwkwkwkwpark_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_kwkwkwkwpark_pipeline` is a English model originally trained by kwkwkwkwpark. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_kwkwkwkwpark_pipeline_en_5.5.0_3.0_1726830380529.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_kwkwkwkwpark_pipeline_en_5.5.0_3.0_1726830380529.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column (assumed example data)
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_distilled_clinc_kwkwkwkwpark_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column (assumed example data)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_distilled_clinc_kwkwkwkwpark_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_kwkwkwkwpark_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/kwkwkwkwpark/distilbert-base-uncased-distilled-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_maarten1953_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_maarten1953_en.md new file mode 100644 index 00000000000000..cbbb3513f9bb4a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_maarten1953_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_maarten1953 DistilBertForSequenceClassification from Maarten1953 +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_maarten1953 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_maarten1953` is a English model originally trained by Maarten1953. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_maarten1953_en_5.5.0_3.0_1726871658428.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_maarten1953_en_5.5.0_3.0_1726871658428.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_maarten1953","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_maarten1953", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
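+
+To inspect the predictions, you can read the `class` annotation column from the transformed DataFrame (see the Output Labels below). A minimal sketch, assuming the Python pipeline above has just been run:
+
+```python
+# Show the predicted label(s) next to the input text; "class" holds the classifier output.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```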
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_maarten1953| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/Maarten1953/distilbert-base-uncased-distilled-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_maarten1953_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_maarten1953_pipeline_en.md new file mode 100644 index 00000000000000..28ab397373e9b7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_maarten1953_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_maarten1953_pipeline pipeline DistilBertForSequenceClassification from Maarten1953 +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_maarten1953_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_maarten1953_pipeline` is a English model originally trained by Maarten1953. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_maarten1953_pipeline_en_5.5.0_3.0_1726871670481.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_maarten1953_pipeline_en_5.5.0_3.0_1726871670481.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column (assumed example data)
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_distilled_clinc_maarten1953_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column (assumed example data)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_distilled_clinc_maarten1953_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_maarten1953_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/Maarten1953/distilbert-base-uncased-distilled-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_emotion_ft_0520_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_emotion_ft_0520_en.md new file mode 100644 index 00000000000000..63221b8b154bb3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_emotion_ft_0520_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_emotion_ft_0520 DistilBertForSequenceClassification from TangXiaoMing123 +author: John Snow Labs +name: distilbert_base_uncased_emotion_ft_0520 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_emotion_ft_0520` is a English model originally trained by TangXiaoMing123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_ft_0520_en_5.5.0_3.0_1726823697570.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_ft_0520_en_5.5.0_3.0_1726823697570.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_emotion_ft_0520","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_emotion_ft_0520", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
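+
+To inspect the predictions, you can read the `class` annotation column from the transformed DataFrame (see the Output Labels below). A minimal sketch, assuming the Python pipeline above has just been run:
+
+```python
+# Show the predicted label(s) next to the input text; "class" holds the classifier output.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```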
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_emotion_ft_0520| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/TangXiaoMing123/distilbert-base-uncased_emotion_ft_0520 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_emotion_ft_0520_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_emotion_ft_0520_pipeline_en.md new file mode 100644 index 00000000000000..474d7f2837b81b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_emotion_ft_0520_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_emotion_ft_0520_pipeline pipeline DistilBertForSequenceClassification from TangXiaoMing123 +author: John Snow Labs +name: distilbert_base_uncased_emotion_ft_0520_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_emotion_ft_0520_pipeline` is a English model originally trained by TangXiaoMing123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_ft_0520_pipeline_en_5.5.0_3.0_1726823710214.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_ft_0520_pipeline_en_5.5.0_3.0_1726823710214.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column (assumed example data)
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_emotion_ft_0520_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column (assumed example data)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_emotion_ft_0520_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_emotion_ft_0520_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/TangXiaoMing123/distilbert-base-uncased_emotion_ft_0520 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cate_classfication_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cate_classfication_en.md new file mode 100644 index 00000000000000..54c7dac2f850dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cate_classfication_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cate_classfication DistilBertForSequenceClassification from shnguo +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cate_classfication +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cate_classfication` is a English model originally trained by shnguo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cate_classfication_en_5.5.0_3.0_1726823565030.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cate_classfication_en_5.5.0_3.0_1726823565030.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cate_classfication","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cate_classfication", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
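+
+To inspect the predictions, you can read the `class` annotation column from the transformed DataFrame (see the Output Labels below). A minimal sketch, assuming the Python pipeline above has just been run:
+
+```python
+# Show the predicted label(s) next to the input text; "class" holds the classifier output.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```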
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cate_classfication| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|253.6 MB| + +## References + +https://huggingface.co/shnguo/distilbert-base-uncased-finetuned-cate-classfication \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_0xd1rac_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_0xd1rac_en.md new file mode 100644 index 00000000000000..8aa4fb01ae3030 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_0xd1rac_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_0xd1rac DistilBertForSequenceClassification from 0xd1rac +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_0xd1rac +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_0xd1rac` is a English model originally trained by 0xd1rac. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_0xd1rac_en_5.5.0_3.0_1726832704345.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_0xd1rac_en_5.5.0_3.0_1726832704345.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_0xd1rac","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_0xd1rac", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
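+
+To inspect the predictions, you can read the `class` annotation column from the transformed DataFrame (see the Output Labels below). A minimal sketch, assuming the Python pipeline above has just been run:
+
+```python
+# Show the predicted label(s) next to the input text; "class" holds the classifier output.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```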
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_0xd1rac| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/0xd1rac/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_0xd1rac_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_0xd1rac_pipeline_en.md new file mode 100644 index 00000000000000..067c09dddfdebc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_0xd1rac_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_0xd1rac_pipeline pipeline DistilBertForSequenceClassification from 0xd1rac +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_0xd1rac_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_0xd1rac_pipeline` is a English model originally trained by 0xd1rac. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_0xd1rac_pipeline_en_5.5.0_3.0_1726832717102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_0xd1rac_pipeline_en_5.5.0_3.0_1726832717102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column (assumed example data)
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_0xd1rac_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column (assumed example data)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_0xd1rac_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_0xd1rac_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/0xd1rac/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_fibleep_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_fibleep_en.md new file mode 100644 index 00000000000000..4c8c1dcc43083a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_fibleep_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_fibleep DistilBertForSequenceClassification from fibleep +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_fibleep +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_fibleep` is a English model originally trained by fibleep. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_fibleep_en_5.5.0_3.0_1726861127477.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_fibleep_en_5.5.0_3.0_1726861127477.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_fibleep","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_fibleep", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
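+
+To inspect the predictions, you can read the `class` annotation column from the transformed DataFrame (see the Output Labels below). A minimal sketch, assuming the Python pipeline above has just been run:
+
+```python
+# Show the predicted label(s) next to the input text; "class" holds the classifier output.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```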
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_fibleep| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/fibleep/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_fibleep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_fibleep_pipeline_en.md new file mode 100644 index 00000000000000..e5407aab499f46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_fibleep_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_fibleep_pipeline pipeline DistilBertForSequenceClassification from fibleep +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_fibleep_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_fibleep_pipeline` is a English model originally trained by fibleep. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_fibleep_pipeline_en_5.5.0_3.0_1726861140678.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_fibleep_pipeline_en_5.5.0_3.0_1726861140678.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column (assumed example data)
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_fibleep_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column (assumed example data)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_fibleep_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_fibleep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/fibleep/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_khalidr_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_khalidr_en.md new file mode 100644 index 00000000000000..54f721a20c4e87 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_khalidr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_khalidr DistilBertForSequenceClassification from khalidr +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_khalidr +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_khalidr` is a English model originally trained by khalidr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_khalidr_en_5.5.0_3.0_1726841555661.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_khalidr_en_5.5.0_3.0_1726841555661.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_khalidr","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_khalidr", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
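+
+To inspect the predictions, you can read the `class` annotation column from the transformed DataFrame (see the Output Labels below). A minimal sketch, assuming the Python pipeline above has just been run:
+
+```python
+# Show the predicted label(s) next to the input text; "class" holds the classifier output.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```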
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_khalidr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/khalidr/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_khalidr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_khalidr_pipeline_en.md new file mode 100644 index 00000000000000..a1c5cc264b3afd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_khalidr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_khalidr_pipeline pipeline DistilBertForSequenceClassification from khalidr +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_khalidr_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_khalidr_pipeline` is a English model originally trained by khalidr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_khalidr_pipeline_en_5.5.0_3.0_1726841568600.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_khalidr_pipeline_en_5.5.0_3.0_1726841568600.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column (assumed example data)
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_khalidr_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column (assumed example data)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_khalidr_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_khalidr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/khalidr/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_mealduct_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_mealduct_en.md new file mode 100644 index 00000000000000..dc267b6f49fb7f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_mealduct_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_mealduct DistilBertForSequenceClassification from MealDuct +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_mealduct +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_mealduct` is a English model originally trained by MealDuct. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_mealduct_en_5.5.0_3.0_1726792283067.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_mealduct_en_5.5.0_3.0_1726792283067.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_mealduct","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_mealduct", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
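+
+To inspect the predictions, you can read the `class` annotation column from the transformed DataFrame (see the Output Labels below). A minimal sketch, assuming the Python pipeline above has just been run:
+
+```python
+# Show the predicted label(s) next to the input text; "class" holds the classifier output.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```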
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_mealduct| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/MealDuct/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_faresg42_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_faresg42_en.md new file mode 100644 index 00000000000000..dc70cb9c2913cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_faresg42_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_faresg42 DistilBertForSequenceClassification from FaresG42 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_faresg42 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_faresg42` is a English model originally trained by FaresG42. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_faresg42_en_5.5.0_3.0_1726848524016.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_faresg42_en_5.5.0_3.0_1726848524016.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_faresg42","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_faresg42", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
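+
+To inspect the predictions, you can read the `class` annotation column from the transformed DataFrame (see the Output Labels below). A minimal sketch, assuming the Python pipeline above has just been run:
+
+```python
+# Show the predicted label(s) next to the input text; "class" holds the classifier output.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```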
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_faresg42| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/FaresG42/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_faresg42_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_faresg42_pipeline_en.md new file mode 100644 index 00000000000000..0474448eed5dc6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_faresg42_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_faresg42_pipeline pipeline DistilBertForSequenceClassification from FaresG42 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_faresg42_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_faresg42_pipeline` is a English model originally trained by FaresG42. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_faresg42_pipeline_en_5.5.0_3.0_1726848540930.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_faresg42_pipeline_en_5.5.0_3.0_1726848540930.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column (assumed example data)
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_faresg42_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column (assumed example data)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_faresg42_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_faresg42_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/FaresG42/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_k_kiron_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_k_kiron_en.md new file mode 100644 index 00000000000000..51de99a2f404c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_k_kiron_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_k_kiron DistilBertForSequenceClassification from K-kiron +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_k_kiron +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_k_kiron` is a English model originally trained by K-kiron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_k_kiron_en_5.5.0_3.0_1726823935284.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_k_kiron_en_5.5.0_3.0_1726823935284.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_k_kiron","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_k_kiron", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
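The prediction is stored as an annotation in the `class` column of the transformed DataFrame. A minimal sketch for inspecting it, assuming the `pipelineDF` DataFrame from the example above (the actual label strings depend on this fine-tuned model and are not listed here):

```python
# Pull the predicted label out of the `class` annotation column.
pipelineDF.select("text", "class.result").show(truncate=False)
```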
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_k_kiron| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/K-kiron/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_k_kiron_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_k_kiron_pipeline_en.md new file mode 100644 index 00000000000000..ac4fed97273ad2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_k_kiron_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_k_kiron_pipeline pipeline DistilBertForSequenceClassification from K-kiron +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_k_kiron_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_k_kiron_pipeline` is a English model originally trained by K-kiron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_k_kiron_pipeline_en_5.5.0_3.0_1726823946955.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_k_kiron_pipeline_en_5.5.0_3.0_1726823946955.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_k_kiron_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_k_kiron_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_k_kiron_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/K-kiron/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_majid097_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_majid097_en.md new file mode 100644 index 00000000000000..a3f70a393e0a83 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_majid097_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_majid097 DistilBertForSequenceClassification from Majid097 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_majid097 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_majid097` is a English model originally trained by Majid097. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_majid097_en_5.5.0_3.0_1726842123211.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_majid097_en_5.5.0_3.0_1726842123211.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_majid097","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_majid097", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_majid097| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Majid097/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_majid097_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_majid097_pipeline_en.md new file mode 100644 index 00000000000000..88822832226a6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_majid097_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_majid097_pipeline pipeline DistilBertForSequenceClassification from Majid097 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_majid097_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_majid097_pipeline` is a English model originally trained by Majid097. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_majid097_pipeline_en_5.5.0_3.0_1726842135296.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_majid097_pipeline_en_5.5.0_3.0_1726842135296.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_majid097_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_majid097_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_majid097_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Majid097/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_rubensmau_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_rubensmau_pipeline_en.md new file mode 100644 index 00000000000000..ba72dab0e93297 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_rubensmau_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_rubensmau_pipeline pipeline DistilBertForSequenceClassification from rubensmau +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_rubensmau_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_rubensmau_pipeline` is a English model originally trained by rubensmau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_rubensmau_pipeline_en_5.5.0_3.0_1726809776774.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_rubensmau_pipeline_en_5.5.0_3.0_1726809776774.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_rubensmau_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_rubensmau_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_rubensmau_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rubensmau/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_shivamklr_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_shivamklr_en.md new file mode 100644 index 00000000000000..2af57bf8187555 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_shivamklr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_shivamklr DistilBertForSequenceClassification from shivamklr +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_shivamklr +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_shivamklr` is a English model originally trained by shivamklr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_shivamklr_en_5.5.0_3.0_1726792593747.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_shivamklr_en_5.5.0_3.0_1726792593747.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_shivamklr","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_shivamklr", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_shivamklr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/shivamklr/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_tristandewildt_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_tristandewildt_en.md new file mode 100644 index 00000000000000..01387d6c8102c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_tristandewildt_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_tristandewildt DistilBertForSequenceClassification from TristandeWildt +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_tristandewildt +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_tristandewildt` is a English model originally trained by TristandeWildt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_tristandewildt_en_5.5.0_3.0_1726823857983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_tristandewildt_en_5.5.0_3.0_1726823857983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_tristandewildt","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_tristandewildt", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_tristandewildt| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/TristandeWildt/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_tristandewildt_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_tristandewildt_pipeline_en.md new file mode 100644 index 00000000000000..9227568f877507 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_tristandewildt_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_tristandewildt_pipeline pipeline DistilBertForSequenceClassification from TristandeWildt +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_tristandewildt_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_tristandewildt_pipeline` is a English model originally trained by TristandeWildt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_tristandewildt_pipeline_en_5.5.0_3.0_1726823869757.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_tristandewildt_pipeline_en_5.5.0_3.0_1726823869757.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_tristandewildt_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_tristandewildt_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_tristandewildt_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/TristandeWildt/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_0605_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_0605_en.md new file mode 100644 index 00000000000000..be0123076fa867 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_0605_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_0605 DistilBertForSequenceClassification from kwkwkwkwpark +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_0605 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_0605` is a English model originally trained by kwkwkwkwpark. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_0605_en_5.5.0_3.0_1726840987655.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_0605_en_5.5.0_3.0_1726840987655.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_0605","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_0605", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
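The prediction is stored as an annotation in the `class` column of the transformed DataFrame. A minimal sketch for inspecting it, assuming the `pipelineDF` DataFrame from the example above (the actual emotion labels depend on this fine-tuned model and are not listed here):

```python
# Pull the predicted label out of the `class` annotation column.
pipelineDF.select("text", "class.result").show(truncate=False)
```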
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_0605| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kwkwkwkwpark/distilbert-base-uncased-finetuned-emotion-0605 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_0605_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_0605_pipeline_en.md new file mode 100644 index 00000000000000..cb024e757d8e4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_0605_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_0605_pipeline pipeline DistilBertForSequenceClassification from kwkwkwkwpark +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_0605_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_0605_pipeline` is a English model originally trained by kwkwkwkwpark. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_0605_pipeline_en_5.5.0_3.0_1726840999685.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_0605_pipeline_en_5.5.0_3.0_1726840999685.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_0605_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_0605_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_0605_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kwkwkwkwpark/distilbert-base-uncased-finetuned-emotion-0605 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_andmog77_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_andmog77_en.md new file mode 100644 index 00000000000000..f59d1b4669c961 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_andmog77_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_andmog77 DistilBertForSequenceClassification from andmog77 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_andmog77 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_andmog77` is a English model originally trained by andmog77. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_andmog77_en_5.5.0_3.0_1726829920271.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_andmog77_en_5.5.0_3.0_1726829920271.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_andmog77","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_andmog77", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_andmog77| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/andmog77/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_andmog77_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_andmog77_pipeline_en.md new file mode 100644 index 00000000000000..ee7dca4ccd00ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_andmog77_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_andmog77_pipeline pipeline DistilBertForSequenceClassification from andmog77 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_andmog77_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_andmog77_pipeline` is a English model originally trained by andmog77. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_andmog77_pipeline_en_5.5.0_3.0_1726829933595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_andmog77_pipeline_en_5.5.0_3.0_1726829933595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_andmog77_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_andmog77_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_andmog77_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/andmog77/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_awayes_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_awayes_pipeline_en.md new file mode 100644 index 00000000000000..c7d06e6ab2eab2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_awayes_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_awayes_pipeline pipeline DistilBertForSequenceClassification from Awayes +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_awayes_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_awayes_pipeline` is a English model originally trained by Awayes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_awayes_pipeline_en_5.5.0_3.0_1726860916099.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_awayes_pipeline_en_5.5.0_3.0_1726860916099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_awayes_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_awayes_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
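For a quick check on a single string, the same object can also be used through `annotate`, which returns plain Python structures instead of a DataFrame. A minimal sketch, assuming the `pipeline` loaded in the snippet above:

```python
# Run the pretrained pipeline on one string and print the raw annotation map.
result = pipeline.annotate("I love spark-nlp")
print(result)
```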
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_awayes_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Awayes/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_bosezhang_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_bosezhang_en.md new file mode 100644 index 00000000000000..ded4815babec65 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_bosezhang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_bosezhang DistilBertForSequenceClassification from bosezhang +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_bosezhang +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_bosezhang` is a English model originally trained by bosezhang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_bosezhang_en_5.5.0_3.0_1726841970169.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_bosezhang_en_5.5.0_3.0_1726841970169.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_bosezhang","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_bosezhang", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_bosezhang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bosezhang/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_btown2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_btown2_pipeline_en.md new file mode 100644 index 00000000000000..2ca2674ab61f71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_btown2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_btown2_pipeline pipeline DistilBertForSequenceClassification from btown2 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_btown2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_btown2_pipeline` is a English model originally trained by btown2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_btown2_pipeline_en_5.5.0_3.0_1726809411677.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_btown2_pipeline_en_5.5.0_3.0_1726809411677.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_btown2_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_btown2_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_btown2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/btown2/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_carver63_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_carver63_en.md new file mode 100644 index 00000000000000..a1476d0c456a6a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_carver63_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_carver63 DistilBertForSequenceClassification from carver63 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_carver63 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_carver63` is a English model originally trained by carver63. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_carver63_en_5.5.0_3.0_1726860756615.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_carver63_en_5.5.0_3.0_1726860756615.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_carver63","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_carver63", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_carver63| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/carver63/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_carver63_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_carver63_pipeline_en.md new file mode 100644 index 00000000000000..49dee9e030ee8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_carver63_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_carver63_pipeline pipeline DistilBertForSequenceClassification from carver63 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_carver63_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_carver63_pipeline` is a English model originally trained by carver63. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_carver63_pipeline_en_5.5.0_3.0_1726860768346.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_carver63_pipeline_en_5.5.0_3.0_1726860768346.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_carver63_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_carver63_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_carver63_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/carver63/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_chhabi_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_chhabi_en.md new file mode 100644 index 00000000000000..e1ec9e5d0c51a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_chhabi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_chhabi DistilBertForSequenceClassification from Chhabi +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_chhabi +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_chhabi` is a English model originally trained by Chhabi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_chhabi_en_5.5.0_3.0_1726823530293.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_chhabi_en_5.5.0_3.0_1726823530293.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_chhabi","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_chhabi", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
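The prediction is stored as an annotation in the `class` column of the transformed DataFrame. A minimal sketch for inspecting it, assuming the `pipelineDF` DataFrame from the example above (the actual emotion labels depend on this fine-tuned model and are not listed here):

```python
# Pull the predicted label out of the `class` annotation column.
pipelineDF.select("text", "class.result").show(truncate=False)
```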
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_chhabi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Chhabi/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_chhabi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_chhabi_pipeline_en.md new file mode 100644 index 00000000000000..acbeba3328cfdf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_chhabi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_chhabi_pipeline pipeline DistilBertForSequenceClassification from Chhabi +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_chhabi_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_chhabi_pipeline` is a English model originally trained by Chhabi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_chhabi_pipeline_en_5.5.0_3.0_1726823542054.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_chhabi_pipeline_en_5.5.0_3.0_1726823542054.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_chhabi_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_chhabi_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_chhabi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Chhabi/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_devborbot_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_devborbot_en.md new file mode 100644 index 00000000000000..a1049e4386601c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_devborbot_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_devborbot DistilBertForSequenceClassification from devBorbot +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_devborbot +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_devborbot` is a English model originally trained by devBorbot. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_devborbot_en_5.5.0_3.0_1726861011931.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_devborbot_en_5.5.0_3.0_1726861011931.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained emotion classifier; it consumes document and token annotations.
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_devborbot","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_devborbot", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
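
Once the pipeline has been applied, the predictions are stored as Spark NLP annotations in the `class` column. A minimal sketch for inspecting them, reusing `pipelineDF` from the example above:

```python
# "class.result" unpacks the result field of the annotation struct,
# i.e. the predicted emotion label for each input row.
pipelineDF.select("text", "class.result").show(truncate=False)
```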
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_devborbot| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/devBorbot/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_devborbot_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_devborbot_pipeline_en.md new file mode 100644 index 00000000000000..41f332709b6fe3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_devborbot_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_devborbot_pipeline pipeline DistilBertForSequenceClassification from devBorbot +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_devborbot_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_devborbot_pipeline` is a English model originally trained by devBorbot. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_devborbot_pipeline_en_5.5.0_3.0_1726861025491.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_devborbot_pipeline_en_5.5.0_3.0_1726861025491.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_devborbot_pipeline", lang = "en")

# Example input; replace with your own DataFrame that has a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_devborbot_pipeline", lang = "en")

// Example input; replace with your own DataFrame that has a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_devborbot_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/devBorbot/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_dorol_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_dorol_en.md new file mode 100644 index 00000000000000..ec94591165282e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_dorol_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_dorol DistilBertForSequenceClassification from doroL +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_dorol +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_dorol` is a English model originally trained by doroL. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_dorol_en_5.5.0_3.0_1726829992445.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_dorol_en_5.5.0_3.0_1726829992445.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained emotion classifier; it consumes document and token annotations.
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_dorol","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_dorol", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_dorol| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/doroL/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_dorol_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_dorol_pipeline_en.md new file mode 100644 index 00000000000000..9f1543da413c65 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_dorol_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_dorol_pipeline pipeline DistilBertForSequenceClassification from doroL +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_dorol_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_dorol_pipeline` is a English model originally trained by doroL. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_dorol_pipeline_en_5.5.0_3.0_1726830005062.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_dorol_pipeline_en_5.5.0_3.0_1726830005062.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_dorol_pipeline", lang = "en")

# Example input; replace with your own DataFrame that has a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_dorol_pipeline", lang = "en")

// Example input; replace with your own DataFrame that has a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_dorol_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/doroL/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_duke123456_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_duke123456_en.md new file mode 100644 index 00000000000000..eae9c0fd82dee8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_duke123456_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_duke123456 DistilBertForSequenceClassification from duke123456 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_duke123456 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_duke123456` is a English model originally trained by duke123456. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_duke123456_en_5.5.0_3.0_1726861099990.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_duke123456_en_5.5.0_3.0_1726861099990.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained emotion classifier; it consumes document and token annotations.
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_duke123456","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_duke123456", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_duke123456| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/duke123456/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_duke123456_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_duke123456_pipeline_en.md new file mode 100644 index 00000000000000..7982177f4e3899 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_duke123456_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_duke123456_pipeline pipeline DistilBertForSequenceClassification from duke123456 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_duke123456_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_duke123456_pipeline` is a English model originally trained by duke123456. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_duke123456_pipeline_en_5.5.0_3.0_1726861112883.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_duke123456_pipeline_en_5.5.0_3.0_1726861112883.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_duke123456_pipeline", lang = "en")

# Example input; replace with your own DataFrame that has a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_duke123456_pipeline", lang = "en")

// Example input; replace with your own DataFrame that has a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_duke123456_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/duke123456/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_duxinghua_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_duxinghua_en.md new file mode 100644 index 00000000000000..55532c915a3dab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_duxinghua_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_duxinghua DistilBertForSequenceClassification from duxinghua +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_duxinghua +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_duxinghua` is a English model originally trained by duxinghua. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_duxinghua_en_5.5.0_3.0_1726830452699.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_duxinghua_en_5.5.0_3.0_1726830452699.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained emotion classifier; it consumes document and token annotations.
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_duxinghua","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_duxinghua", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_duxinghua| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/duxinghua/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_duxinghua_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_duxinghua_pipeline_en.md new file mode 100644 index 00000000000000..1cbc4488838ba7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_duxinghua_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_duxinghua_pipeline pipeline DistilBertForSequenceClassification from duxinghua +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_duxinghua_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_duxinghua_pipeline` is a English model originally trained by duxinghua. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_duxinghua_pipeline_en_5.5.0_3.0_1726830464829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_duxinghua_pipeline_en_5.5.0_3.0_1726830464829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_duxinghua_pipeline", lang = "en")

# Example input; replace with your own DataFrame that has a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_duxinghua_pipeline", lang = "en")

// Example input; replace with your own DataFrame that has a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_duxinghua_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/duxinghua/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_edarmartinez_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_edarmartinez_en.md new file mode 100644 index 00000000000000..2f915292eee438 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_edarmartinez_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_edarmartinez DistilBertForSequenceClassification from edarmartinez +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_edarmartinez +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_edarmartinez` is a English model originally trained by edarmartinez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_edarmartinez_en_5.5.0_3.0_1726830224241.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_edarmartinez_en_5.5.0_3.0_1726830224241.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained emotion classifier; it consumes document and token annotations.
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_edarmartinez","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_edarmartinez", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_edarmartinez| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/edarmartinez/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_edarmartinez_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_edarmartinez_pipeline_en.md new file mode 100644 index 00000000000000..6e83b74800dd34 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_edarmartinez_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_edarmartinez_pipeline pipeline DistilBertForSequenceClassification from edarmartinez +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_edarmartinez_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_edarmartinez_pipeline` is a English model originally trained by edarmartinez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_edarmartinez_pipeline_en_5.5.0_3.0_1726830236335.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_edarmartinez_pipeline_en_5.5.0_3.0_1726830236335.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_edarmartinez_pipeline", lang = "en")

# Example input; replace with your own DataFrame that has a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_edarmartinez_pipeline", lang = "en")

// Example input; replace with your own DataFrame that has a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_edarmartinez_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/edarmartinez/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_farzanmrz_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_farzanmrz_en.md new file mode 100644 index 00000000000000..efe5f5fff5f64d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_farzanmrz_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_farzanmrz DistilBertForSequenceClassification from farzanmrz +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_farzanmrz +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_farzanmrz` is a English model originally trained by farzanmrz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_farzanmrz_en_5.5.0_3.0_1726791988989.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_farzanmrz_en_5.5.0_3.0_1726791988989.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained emotion classifier; it consumes document and token annotations.
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_farzanmrz","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_farzanmrz", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_farzanmrz| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/farzanmrz/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_henrikho_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_henrikho_en.md new file mode 100644 index 00000000000000..2fce4d9d354214 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_henrikho_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_henrikho DistilBertForSequenceClassification from henrikho +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_henrikho +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_henrikho` is a English model originally trained by henrikho. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_henrikho_en_5.5.0_3.0_1726809426464.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_henrikho_en_5.5.0_3.0_1726809426464.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained emotion classifier; it consumes document and token annotations.
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_henrikho","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_henrikho", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
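
For ad-hoc scoring of a few strings, the fitted pipeline can be wrapped in a `LightPipeline`, which avoids building a DataFrame for every request. A minimal sketch, reusing `pipelineModel` from the example above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# annotate() accepts a single string (or a list of strings) and returns a dict
# keyed by output column name; the prediction is expected under "class".
print(light.annotate("I love spark-nlp"))
```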
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_henrikho| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/henrikho/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_henrikho_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_henrikho_pipeline_en.md new file mode 100644 index 00000000000000..1a0791689ad4ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_henrikho_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_henrikho_pipeline pipeline DistilBertForSequenceClassification from henrikho +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_henrikho_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_henrikho_pipeline` is a English model originally trained by henrikho. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_henrikho_pipeline_en_5.5.0_3.0_1726809439723.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_henrikho_pipeline_en_5.5.0_3.0_1726809439723.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_henrikho_pipeline", lang = "en")

# Example input; replace with your own DataFrame that has a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_henrikho_pipeline", lang = "en")

// Example input; replace with your own DataFrame that has a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_henrikho_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/henrikho/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_jhagege_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_jhagege_en.md new file mode 100644 index 00000000000000..a8eb1592a63916 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_jhagege_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jhagege DistilBertForSequenceClassification from jhagege +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jhagege +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jhagege` is a English model originally trained by jhagege. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jhagege_en_5.5.0_3.0_1726842163474.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jhagege_en_5.5.0_3.0_1726842163474.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained emotion classifier; it consumes document and token annotations.
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jhagege","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jhagege", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jhagege| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jhagege/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_jhagege_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_jhagege_pipeline_en.md new file mode 100644 index 00000000000000..7ac7155c95bcad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_jhagege_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jhagege_pipeline pipeline DistilBertForSequenceClassification from jhagege +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jhagege_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jhagege_pipeline` is a English model originally trained by jhagege. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jhagege_pipeline_en_5.5.0_3.0_1726842176211.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jhagege_pipeline_en_5.5.0_3.0_1726842176211.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jhagege_pipeline", lang = "en")

# Example input; replace with your own DataFrame that has a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jhagege_pipeline", lang = "en")

// Example input; replace with your own DataFrame that has a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jhagege_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jhagege/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_karol9kk_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_karol9kk_en.md new file mode 100644 index 00000000000000..2eaf75de1e73f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_karol9kk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_karol9kk DistilBertForSequenceClassification from karol9kk +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_karol9kk +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_karol9kk` is a English model originally trained by karol9kk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_karol9kk_en_5.5.0_3.0_1726832953958.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_karol9kk_en_5.5.0_3.0_1726832953958.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained emotion classifier; it consumes document and token annotations.
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_karol9kk","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_karol9kk", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_karol9kk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/karol9kk/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_karol9kk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_karol9kk_pipeline_en.md new file mode 100644 index 00000000000000..0e9e23d42ee11c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_karol9kk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_karol9kk_pipeline pipeline DistilBertForSequenceClassification from karol9kk +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_karol9kk_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_karol9kk_pipeline` is a English model originally trained by karol9kk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_karol9kk_pipeline_en_5.5.0_3.0_1726832966602.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_karol9kk_pipeline_en_5.5.0_3.0_1726832966602.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_karol9kk_pipeline", lang = "en")

# Example input; replace with your own DataFrame that has a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_karol9kk_pipeline", lang = "en")

// Example input; replace with your own DataFrame that has a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_karol9kk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/karol9kk/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_khalidr_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_khalidr_en.md new file mode 100644 index 00000000000000..16f2f360de76bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_khalidr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_khalidr DistilBertForSequenceClassification from khalidr +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_khalidr +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_khalidr` is a English model originally trained by khalidr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_khalidr_en_5.5.0_3.0_1726809552537.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_khalidr_en_5.5.0_3.0_1726809552537.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained emotion classifier; it consumes document and token annotations.
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_khalidr","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_khalidr", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_khalidr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/khalidr/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_khalidr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_khalidr_pipeline_en.md new file mode 100644 index 00000000000000..cd58de15dbad81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_khalidr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_khalidr_pipeline pipeline DistilBertForSequenceClassification from khalidr +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_khalidr_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_khalidr_pipeline` is a English model originally trained by khalidr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_khalidr_pipeline_en_5.5.0_3.0_1726809563870.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_khalidr_pipeline_en_5.5.0_3.0_1726809563870.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_khalidr_pipeline", lang = "en")

# Example input; replace with your own DataFrame that has a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_khalidr_pipeline", lang = "en")

// Example input; replace with your own DataFrame that has a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_khalidr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/khalidr/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_lostsartre_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_lostsartre_en.md new file mode 100644 index 00000000000000..2a4457072f4fb2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_lostsartre_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_lostsartre DistilBertForSequenceClassification from lostsartre +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_lostsartre +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_lostsartre` is a English model originally trained by lostsartre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_lostsartre_en_5.5.0_3.0_1726841442221.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_lostsartre_en_5.5.0_3.0_1726841442221.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_lostsartre", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

# Build the pipeline and run it over a small example DataFrame
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_lostsartre", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_lostsartre| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lostsartre/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_lostsartre_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_lostsartre_pipeline_en.md new file mode 100644 index 00000000000000..552c1df1592b25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_lostsartre_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_lostsartre_pipeline pipeline DistilBertForSequenceClassification from lostsartre +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_lostsartre_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_lostsartre_pipeline` is a English model originally trained by lostsartre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_lostsartre_pipeline_en_5.5.0_3.0_1726841455028.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_lostsartre_pipeline_en_5.5.0_3.0_1726841455028.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_lostsartre_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_lostsartre_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
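
The example above assumes an existing DataFrame `df` with a `text` column. As a minimal, self-contained sketch (an assumption-based illustration rather than part of the original card: it presumes a Spark session started via `sparknlp.start()` and that the included DistilBertForSequenceClassification stage writes its predictions to a `class` column), the pipeline could be run end to end like this:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start (or reuse) a Spark session with Spark NLP on the classpath
spark = sparknlp.start()

# Any DataFrame with a "text" column can serve as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

# Download the pretrained pipeline and annotate the DataFrame
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_lostsartre_pipeline", lang="en")
annotations = pipeline.transform(df)

# Inspect the predicted labels (output column name assumed from the classifier stage)
annotations.select("class.result").show(truncate=False)
```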
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_lostsartre_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lostsartre/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_maichle_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_maichle_en.md new file mode 100644 index 00000000000000..c0ba168ca6b8f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_maichle_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_maichle DistilBertForSequenceClassification from Maichle +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_maichle +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_maichle` is a English model originally trained by Maichle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_maichle_en_5.5.0_3.0_1726841219379.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_maichle_en_5.5.0_3.0_1726841219379.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_maichle", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

# Build the pipeline and run it over a small example DataFrame
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_maichle", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_maichle| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Maichle/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_maichle_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_maichle_pipeline_en.md new file mode 100644 index 00000000000000..e9737dc2ec75ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_maichle_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_maichle_pipeline pipeline DistilBertForSequenceClassification from Maichle +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_maichle_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_maichle_pipeline` is a English model originally trained by Maichle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_maichle_pipeline_en_5.5.0_3.0_1726841234251.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_maichle_pipeline_en_5.5.0_3.0_1726841234251.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_maichle_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_maichle_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
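
The example above assumes an existing DataFrame `df` with a `text` column. As a minimal, self-contained sketch (an assumption-based illustration rather than part of the original card: it presumes a Spark session started via `sparknlp.start()` and that the included DistilBertForSequenceClassification stage writes its predictions to a `class` column), the pipeline could be run end to end like this:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start (or reuse) a Spark session with Spark NLP on the classpath
spark = sparknlp.start()

# Any DataFrame with a "text" column can serve as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

# Download the pretrained pipeline and annotate the DataFrame
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_maichle_pipeline", lang="en")
annotations = pipeline.transform(df)

# Inspect the predicted labels (output column name assumed from the classifier stage)
annotations.select("class.result").show(truncate=False)
```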
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_maichle_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Maichle/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_minseok0109_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_minseok0109_en.md new file mode 100644 index 00000000000000..f7805d92df5f0f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_minseok0109_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_minseok0109 DistilBertForSequenceClassification from minseok0109 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_minseok0109 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_minseok0109` is a English model originally trained by minseok0109. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_minseok0109_en_5.5.0_3.0_1726823585900.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_minseok0109_en_5.5.0_3.0_1726823585900.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_minseok0109", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

# Build the pipeline and run it over a small example DataFrame
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_minseok0109", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_minseok0109| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/minseok0109/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_minsu_chae_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_minsu_chae_pipeline_en.md new file mode 100644 index 00000000000000..d31ba0f70546c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_minsu_chae_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_minsu_chae_pipeline pipeline DistilBertForSequenceClassification from Minsu-Chae +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_minsu_chae_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_minsu_chae_pipeline` is a English model originally trained by Minsu-Chae. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_minsu_chae_pipeline_en_5.5.0_3.0_1726809347022.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_minsu_chae_pipeline_en_5.5.0_3.0_1726809347022.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_minsu_chae_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_minsu_chae_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
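
The example above assumes an existing DataFrame `df` with a `text` column. As a minimal, self-contained sketch (an assumption-based illustration rather than part of the original card: it presumes a Spark session started via `sparknlp.start()` and that the included DistilBertForSequenceClassification stage writes its predictions to a `class` column), the pipeline could be run end to end like this:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start (or reuse) a Spark session with Spark NLP on the classpath
spark = sparknlp.start()

# Any DataFrame with a "text" column can serve as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

# Download the pretrained pipeline and annotate the DataFrame
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_minsu_chae_pipeline", lang="en")
annotations = pipeline.transform(df)

# Inspect the predicted labels (output column name assumed from the classifier stage)
annotations.select("class.result").show(truncate=False)
```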
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_minsu_chae_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Minsu-Chae/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_mive_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_mive_en.md new file mode 100644 index 00000000000000..d2d08a8f7c9e09 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_mive_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_mive DistilBertForSequenceClassification from mive +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_mive +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_mive` is a English model originally trained by mive. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_mive_en_5.5.0_3.0_1726823941398.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_mive_en_5.5.0_3.0_1726823941398.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_mive", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

# Build the pipeline and run it over a small example DataFrame
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_mive", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_mive| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mive/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_mive_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_mive_pipeline_en.md new file mode 100644 index 00000000000000..c0be26915d9219 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_mive_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_mive_pipeline pipeline DistilBertForSequenceClassification from mive +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_mive_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_mive_pipeline` is a English model originally trained by mive. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_mive_pipeline_en_5.5.0_3.0_1726823955471.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_mive_pipeline_en_5.5.0_3.0_1726823955471.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_mive_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_mive_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
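
The example above assumes an existing DataFrame `df` with a `text` column. As a minimal, self-contained sketch (an assumption-based illustration rather than part of the original card: it presumes a Spark session started via `sparknlp.start()` and that the included DistilBertForSequenceClassification stage writes its predictions to a `class` column), the pipeline could be run end to end like this:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start (or reuse) a Spark session with Spark NLP on the classpath
spark = sparknlp.start()

# Any DataFrame with a "text" column can serve as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

# Download the pretrained pipeline and annotate the DataFrame
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_mive_pipeline", lang="en")
annotations = pipeline.transform(df)

# Inspect the predicted labels (output column name assumed from the classifier stage)
annotations.select("class.result").show(truncate=False)
```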
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_mive_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mive/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_nkkbr_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_nkkbr_en.md new file mode 100644 index 00000000000000..3b7aa2625397a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_nkkbr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_nkkbr DistilBertForSequenceClassification from nkkbr +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_nkkbr +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_nkkbr` is a English model originally trained by nkkbr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_nkkbr_en_5.5.0_3.0_1726830524561.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_nkkbr_en_5.5.0_3.0_1726830524561.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_nkkbr", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

# Build the pipeline and run it over a small example DataFrame
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_nkkbr", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_nkkbr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/nkkbr/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_nkkbr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_nkkbr_pipeline_en.md new file mode 100644 index 00000000000000..d1d86b9e7bb293 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_nkkbr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_nkkbr_pipeline pipeline DistilBertForSequenceClassification from nkkbr +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_nkkbr_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_nkkbr_pipeline` is a English model originally trained by nkkbr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_nkkbr_pipeline_en_5.5.0_3.0_1726830538102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_nkkbr_pipeline_en_5.5.0_3.0_1726830538102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_nkkbr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_nkkbr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
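
The example above assumes an existing DataFrame `df` with a `text` column. As a minimal, self-contained sketch (an assumption-based illustration rather than part of the original card: it presumes a Spark session started via `sparknlp.start()` and that the included DistilBertForSequenceClassification stage writes its predictions to a `class` column), the pipeline could be run end to end like this:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start (or reuse) a Spark session with Spark NLP on the classpath
spark = sparknlp.start()

# Any DataFrame with a "text" column can serve as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

# Download the pretrained pipeline and annotate the DataFrame
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_nkkbr_pipeline", lang="en")
annotations = pipeline.transform(df)

# Inspect the predicted labels (output column name assumed from the classifier stage)
annotations.select("class.result").show(truncate=False)
```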
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_nkkbr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/nkkbr/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_nurik0210_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_nurik0210_en.md new file mode 100644 index 00000000000000..a9f065063ad963 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_nurik0210_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_nurik0210 DistilBertForSequenceClassification from nurik0210 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_nurik0210 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_nurik0210` is a English model originally trained by nurik0210. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_nurik0210_en_5.5.0_3.0_1726871397657.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_nurik0210_en_5.5.0_3.0_1726871397657.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_nurik0210", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

# Build the pipeline and run it over a small example DataFrame
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_nurik0210", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_nurik0210| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/nurik0210/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_nurik0210_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_nurik0210_pipeline_en.md new file mode 100644 index 00000000000000..c6981ff4da5911 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_nurik0210_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_nurik0210_pipeline pipeline DistilBertForSequenceClassification from nurik0210 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_nurik0210_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_nurik0210_pipeline` is a English model originally trained by nurik0210. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_nurik0210_pipeline_en_5.5.0_3.0_1726871409801.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_nurik0210_pipeline_en_5.5.0_3.0_1726871409801.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_nurik0210_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_nurik0210_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
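
The example above assumes an existing DataFrame `df` with a `text` column. As a minimal, self-contained sketch (an assumption-based illustration rather than part of the original card: it presumes a Spark session started via `sparknlp.start()` and that the included DistilBertForSequenceClassification stage writes its predictions to a `class` column), the pipeline could be run end to end like this:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start (or reuse) a Spark session with Spark NLP on the classpath
spark = sparknlp.start()

# Any DataFrame with a "text" column can serve as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

# Download the pretrained pipeline and annotate the DataFrame
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_nurik0210_pipeline", lang="en")
annotations = pipeline.transform(df)

# Inspect the predicted labels (output column name assumed from the classifier stage)
annotations.select("class.result").show(truncate=False)
```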
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_nurik0210_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/nurik0210/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_pbruna_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_pbruna_en.md new file mode 100644 index 00000000000000..73951a738e15f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_pbruna_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_pbruna DistilBertForSequenceClassification from pbruna +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_pbruna +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_pbruna` is a English model originally trained by pbruna. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_pbruna_en_5.5.0_3.0_1726842259077.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_pbruna_en_5.5.0_3.0_1726842259077.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_pbruna", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

# Build the pipeline and run it over a small example DataFrame
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_pbruna", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_pbruna| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/pbruna/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_pbruna_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_pbruna_pipeline_en.md new file mode 100644 index 00000000000000..b517835fd1eb27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_pbruna_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_pbruna_pipeline pipeline DistilBertForSequenceClassification from pbruna +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_pbruna_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_pbruna_pipeline` is a English model originally trained by pbruna. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_pbruna_pipeline_en_5.5.0_3.0_1726842271579.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_pbruna_pipeline_en_5.5.0_3.0_1726842271579.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_pbruna_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_pbruna_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
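
The example above assumes an existing DataFrame `df` with a `text` column. As a minimal, self-contained sketch (an assumption-based illustration rather than part of the original card: it presumes a Spark session started via `sparknlp.start()` and that the included DistilBertForSequenceClassification stage writes its predictions to a `class` column), the pipeline could be run end to end like this:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start (or reuse) a Spark session with Spark NLP on the classpath
spark = sparknlp.start()

# Any DataFrame with a "text" column can serve as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

# Download the pretrained pipeline and annotate the DataFrame
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_pbruna_pipeline", lang="en")
annotations = pipeline.transform(df)

# Inspect the predicted labels (output column name assumed from the classifier stage)
annotations.select("class.result").show(truncate=False)
```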
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_pbruna_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/pbruna/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_raota_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_raota_en.md new file mode 100644 index 00000000000000..3f832cdf294944 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_raota_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_raota DistilBertForSequenceClassification from raota +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_raota +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_raota` is a English model originally trained by raota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_raota_en_5.5.0_3.0_1726792219323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_raota_en_5.5.0_3.0_1726792219323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_raota", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

# Build the pipeline and run it over a small example DataFrame
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_raota", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_raota| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/raota/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_raota_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_raota_pipeline_en.md new file mode 100644 index 00000000000000..e8864b145c11a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_raota_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_raota_pipeline pipeline DistilBertForSequenceClassification from raota +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_raota_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_raota_pipeline` is a English model originally trained by raota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_raota_pipeline_en_5.5.0_3.0_1726792233818.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_raota_pipeline_en_5.5.0_3.0_1726792233818.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_raota_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_raota_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
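
The example above assumes an existing DataFrame `df` with a `text` column. As a minimal, self-contained sketch (an assumption-based illustration rather than part of the original card: it presumes a Spark session started via `sparknlp.start()` and that the included DistilBertForSequenceClassification stage writes its predictions to a `class` column), the pipeline could be run end to end like this:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start (or reuse) a Spark session with Spark NLP on the classpath
spark = sparknlp.start()

# Any DataFrame with a "text" column can serve as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

# Download the pretrained pipeline and annotate the DataFrame
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_raota_pipeline", lang="en")
annotations = pipeline.transform(df)

# Inspect the predicted labels (output column name assumed from the classifier stage)
annotations.select("class.result").show(truncate=False)
```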
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_raota_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/raota/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_richwiss_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_richwiss_en.md new file mode 100644 index 00000000000000..da0acf142c5928 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_richwiss_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_richwiss DistilBertForSequenceClassification from richwiss +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_richwiss +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_richwiss` is a English model originally trained by richwiss. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_richwiss_en_5.5.0_3.0_1726830181521.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_richwiss_en_5.5.0_3.0_1726830181521.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_richwiss", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

# Build the pipeline and run it over a small example DataFrame
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_richwiss", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_richwiss| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/richwiss/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_richwiss_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_richwiss_pipeline_en.md new file mode 100644 index 00000000000000..db25e25c9493ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_richwiss_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_richwiss_pipeline pipeline DistilBertForSequenceClassification from richwiss +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_richwiss_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_richwiss_pipeline` is a English model originally trained by richwiss. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_richwiss_pipeline_en_5.5.0_3.0_1726830194022.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_richwiss_pipeline_en_5.5.0_3.0_1726830194022.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_richwiss_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_richwiss_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
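
The example above assumes an existing DataFrame `df` with a `text` column. As a minimal, self-contained sketch (an assumption-based illustration rather than part of the original card: it presumes a Spark session started via `sparknlp.start()` and that the included DistilBertForSequenceClassification stage writes its predictions to a `class` column), the pipeline could be run end to end like this:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start (or reuse) a Spark session with Spark NLP on the classpath
spark = sparknlp.start()

# Any DataFrame with a "text" column can serve as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

# Download the pretrained pipeline and annotate the DataFrame
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_richwiss_pipeline", lang="en")
annotations = pipeline.transform(df)

# Inspect the predicted labels (output column name assumed from the classifier stage)
annotations.select("class.result").show(truncate=False)
```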
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_richwiss_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/richwiss/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_sergiomer_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_sergiomer_en.md new file mode 100644 index 00000000000000..8924bf815c069b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_sergiomer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_sergiomer DistilBertForSequenceClassification from SergioMer +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_sergiomer +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_sergiomer` is a English model originally trained by SergioMer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_sergiomer_en_5.5.0_3.0_1726861041197.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_sergiomer_en_5.5.0_3.0_1726861041197.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_sergiomer", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

# Build the pipeline and run it over a small example DataFrame
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_sergiomer", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_sergiomer| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/SergioMer/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_sergiomer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_sergiomer_pipeline_en.md new file mode 100644 index 00000000000000..69b523e2eb11d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_sergiomer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_sergiomer_pipeline pipeline DistilBertForSequenceClassification from SergioMer +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_sergiomer_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_sergiomer_pipeline` is a English model originally trained by SergioMer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_sergiomer_pipeline_en_5.5.0_3.0_1726861052972.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_sergiomer_pipeline_en_5.5.0_3.0_1726861052972.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_sergiomer_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_sergiomer_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
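
For a quick single-sentence test without building a DataFrame, the same pretrained pipeline can also be used through `annotate` (a minimal sketch; the exact output keys depend on the stages listed under Included Models):

```python
from sparknlp.pretrained import PretrainedPipeline

# Lightweight check on one string; returns a dict of annotator outputs.
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_sergiomer_pipeline", lang="en")
print(pipeline.annotate("I love spark-nlp"))
```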
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_sergiomer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/SergioMer/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_shng2025_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_shng2025_pipeline_en.md new file mode 100644 index 00000000000000..84c50b2bb3f790 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_shng2025_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_shng2025_pipeline pipeline DistilBertForSequenceClassification from shng2025 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_shng2025_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_shng2025_pipeline` is a English model originally trained by shng2025. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_shng2025_pipeline_en_5.5.0_3.0_1726871516697.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_shng2025_pipeline_en_5.5.0_3.0_1726871516697.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_shng2025_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_shng2025_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_shng2025_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/shng2025/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_shotaro30678_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_shotaro30678_en.md new file mode 100644 index 00000000000000..344b947c44a901 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_shotaro30678_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_shotaro30678 DistilBertForSequenceClassification from Shotaro30678 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_shotaro30678 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_shotaro30678` is a English model originally trained by Shotaro30678. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_shotaro30678_en_5.5.0_3.0_1726840872448.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_shotaro30678_en_5.5.0_3.0_1726840872448.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

# Assemble raw text into documents and tokens, then classify with DistilBERT
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_shotaro30678","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_shotaro30678", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_shotaro30678| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Shotaro30678/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_shotaro30678_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_shotaro30678_pipeline_en.md new file mode 100644 index 00000000000000..380211aa4d881e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_shotaro30678_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_shotaro30678_pipeline pipeline DistilBertForSequenceClassification from Shotaro30678 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_shotaro30678_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_shotaro30678_pipeline` is a English model originally trained by Shotaro30678. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_shotaro30678_pipeline_en_5.5.0_3.0_1726840890742.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_shotaro30678_pipeline_en_5.5.0_3.0_1726840890742.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_shotaro30678_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_shotaro30678_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_shotaro30678_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Shotaro30678/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_sk1709_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_sk1709_en.md new file mode 100644 index 00000000000000..658d39eeaa5e1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_sk1709_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_sk1709 DistilBertForSequenceClassification from sk1709 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_sk1709 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_sk1709` is a English model originally trained by sk1709. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_sk1709_en_5.5.0_3.0_1726861109627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_sk1709_en_5.5.0_3.0_1726861109627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

# Assemble raw text into documents and tokens, then classify with DistilBERT
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_sk1709","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_sk1709", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_sk1709| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sk1709/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_sk1709_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_sk1709_pipeline_en.md new file mode 100644 index 00000000000000..a74c74f730e6aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_sk1709_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_sk1709_pipeline pipeline DistilBertForSequenceClassification from sk1709 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_sk1709_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_sk1709_pipeline` is a English model originally trained by sk1709. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_sk1709_pipeline_en_5.5.0_3.0_1726861121889.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_sk1709_pipeline_en_5.5.0_3.0_1726861121889.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_sk1709_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_sk1709_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_sk1709_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sk1709/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_tcurran4589_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_tcurran4589_en.md new file mode 100644 index 00000000000000..923f1047a5eac6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_tcurran4589_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_tcurran4589 DistilBertForSequenceClassification from TCurran4589 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_tcurran4589 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_tcurran4589` is a English model originally trained by TCurran4589. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_tcurran4589_en_5.5.0_3.0_1726809413623.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_tcurran4589_en_5.5.0_3.0_1726809413623.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

# Assemble raw text into documents and tokens, then classify with DistilBERT
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_tcurran4589","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_tcurran4589", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_tcurran4589| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/TCurran4589/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_tcurran4589_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_tcurran4589_pipeline_en.md new file mode 100644 index 00000000000000..6843e14f830e92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_tcurran4589_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_tcurran4589_pipeline pipeline DistilBertForSequenceClassification from TCurran4589 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_tcurran4589_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_tcurran4589_pipeline` is a English model originally trained by TCurran4589. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_tcurran4589_pipeline_en_5.5.0_3.0_1726809425752.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_tcurran4589_pipeline_en_5.5.0_3.0_1726809425752.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_tcurran4589_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_tcurran4589_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_tcurran4589_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/TCurran4589/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_thinsu_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_thinsu_en.md new file mode 100644 index 00000000000000..6891f34c551c35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_thinsu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_thinsu DistilBertForSequenceClassification from ThinSu +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_thinsu +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_thinsu` is a English model originally trained by ThinSu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_thinsu_en_5.5.0_3.0_1726871611796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_thinsu_en_5.5.0_3.0_1726871611796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

# Assemble raw text into documents and tokens, then classify with DistilBERT
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_thinsu","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_thinsu", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_thinsu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ThinSu/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_thinsu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_thinsu_pipeline_en.md new file mode 100644 index 00000000000000..11f5184a4c1126 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_thinsu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_thinsu_pipeline pipeline DistilBertForSequenceClassification from ThinSu +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_thinsu_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_thinsu_pipeline` is a English model originally trained by ThinSu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_thinsu_pipeline_en_5.5.0_3.0_1726871623442.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_thinsu_pipeline_en_5.5.0_3.0_1726871623442.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_thinsu_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_thinsu_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_thinsu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ThinSu/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_uaevuon_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_uaevuon_en.md new file mode 100644 index 00000000000000..7056d83025c50c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_uaevuon_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_uaevuon DistilBertForSequenceClassification from uaevuon +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_uaevuon +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_uaevuon` is a English model originally trained by uaevuon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_uaevuon_en_5.5.0_3.0_1726860824412.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_uaevuon_en_5.5.0_3.0_1726860824412.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

# Assemble raw text into documents and tokens, then classify with DistilBERT
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_uaevuon","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_uaevuon", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_uaevuon| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/uaevuon/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_uaevuon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_uaevuon_pipeline_en.md new file mode 100644 index 00000000000000..85b9fa37322a88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_uaevuon_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_uaevuon_pipeline pipeline DistilBertForSequenceClassification from uaevuon +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_uaevuon_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_uaevuon_pipeline` is a English model originally trained by uaevuon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_uaevuon_pipeline_en_5.5.0_3.0_1726860836611.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_uaevuon_pipeline_en_5.5.0_3.0_1726860836611.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_uaevuon_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_uaevuon_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_uaevuon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/uaevuon/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_vivienluna_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_vivienluna_en.md new file mode 100644 index 00000000000000..41508437ba5388 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_vivienluna_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_vivienluna DistilBertForSequenceClassification from VivienLuna +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_vivienluna +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_vivienluna` is a English model originally trained by VivienLuna. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_vivienluna_en_5.5.0_3.0_1726830262727.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_vivienluna_en_5.5.0_3.0_1726830262727.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

# Assemble raw text into documents and tokens, then classify with DistilBERT
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_vivienluna","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_vivienluna", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_vivienluna| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/VivienLuna/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_vivienluna_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_vivienluna_pipeline_en.md new file mode 100644 index 00000000000000..314226f42aef2c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_vivienluna_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_vivienluna_pipeline pipeline DistilBertForSequenceClassification from VivienLuna +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_vivienluna_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_vivienluna_pipeline` is a English model originally trained by VivienLuna. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_vivienluna_pipeline_en_5.5.0_3.0_1726830275244.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_vivienluna_pipeline_en_5.5.0_3.0_1726830275244.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_vivienluna_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_vivienluna_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_vivienluna_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/VivienLuna/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_global_intent_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_global_intent_en.md new file mode 100644 index 00000000000000..9b6ef471e8ae83 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_global_intent_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_global_intent DistilBertForSequenceClassification from alibidaran +author: John Snow Labs +name: distilbert_base_uncased_finetuned_global_intent +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_global_intent` is a English model originally trained by alibidaran. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_global_intent_en_5.5.0_3.0_1726792477906.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_global_intent_en_5.5.0_3.0_1726792477906.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

# Assemble raw text into documents and tokens, then classify with DistilBERT
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_global_intent","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_global_intent", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_global_intent| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/alibidaran/distilbert-base-uncased-finetuned-Global_Intent \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_global_intent_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_global_intent_pipeline_en.md new file mode 100644 index 00000000000000..32ba0738bf430f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_global_intent_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_global_intent_pipeline pipeline DistilBertForSequenceClassification from alibidaran +author: John Snow Labs +name: distilbert_base_uncased_finetuned_global_intent_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_global_intent_pipeline` is a English model originally trained by alibidaran. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_global_intent_pipeline_en_5.5.0_3.0_1726792490204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_global_intent_pipeline_en_5.5.0_3.0_1726792490204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_global_intent_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_global_intent_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_global_intent_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/alibidaran/distilbert-base-uncased-finetuned-Global_Intent + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_intent_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_intent_pipeline_en.md new file mode 100644 index 00000000000000..df083eb9ff8a95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_intent_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_intent_pipeline pipeline DistilBertForSequenceClassification from avivnat13 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_intent_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_intent_pipeline` is a English model originally trained by avivnat13. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_intent_pipeline_en_5.5.0_3.0_1726842387548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_intent_pipeline_en_5.5.0_3.0_1726842387548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_intent_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_intent_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_intent_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/avivnat13/distilbert-base-uncased-finetuned-intent + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_m_express_emo_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_m_express_emo_en.md new file mode 100644 index 00000000000000..b2dda9a4054cb7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_m_express_emo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_m_express_emo DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_m_express_emo +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_m_express_emo` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_m_express_emo_en_5.5.0_3.0_1726823947366.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_m_express_emo_en_5.5.0_3.0_1726823947366.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

# Assemble raw text into documents and tokens, then classify with DistilBERT
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_m_express_emo","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_m_express_emo", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_m_express_emo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-m_express_emo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_m_express_emo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_m_express_emo_pipeline_en.md new file mode 100644 index 00000000000000..50889919152682 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_m_express_emo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_m_express_emo_pipeline pipeline DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_m_express_emo_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_m_express_emo_pipeline` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_m_express_emo_pipeline_en_5.5.0_3.0_1726823960076.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_m_express_emo_pipeline_en_5.5.0_3.0_1726823960076.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_m_express_emo_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_m_express_emo_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_m_express_emo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-m_express_emo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_nlp_course_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_nlp_course_en.md new file mode 100644 index 00000000000000..cd3d6e5aa50a5e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_nlp_course_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_nlp_course DistilBertForQuestionAnswering from hanspeterlyngsoeraaschoujensen +author: John Snow Labs +name: distilbert_base_uncased_finetuned_nlp_course +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_nlp_course` is a English model originally trained by hanspeterlyngsoeraaschoujensen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_nlp_course_en_5.5.0_3.0_1726851164231.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_nlp_course_en_5.5.0_3.0_1726851164231.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import MultiDocumentAssembler
from sparknlp.annotator import DistilBertForQuestionAnswering
from pyspark.ml import Pipeline

# Assemble the question/context pair, then extract the answer span with DistilBERT
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_nlp_course","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_nlp_course", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
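
To see the extracted answer span (assuming the `pipelineDF` DataFrame from the Python example above), the `answer` output column can be selected directly:

```python
# "answer" is the span classifier's output column defined in the example above.
pipelineDF.select("answer.result").show(truncate=False)
```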
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_nlp_course| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/hanspeterlyngsoeraaschoujensen/distilbert-base-uncased-finetuned-nlp-course \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_nlp_course_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_nlp_course_pipeline_en.md new file mode 100644 index 00000000000000..035ae80c38e4db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_nlp_course_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_nlp_course_pipeline pipeline DistilBertForQuestionAnswering from hanspeterlyngsoeraaschoujensen +author: John Snow Labs +name: distilbert_base_uncased_finetuned_nlp_course_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_nlp_course_pipeline` is a English model originally trained by hanspeterlyngsoeraaschoujensen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_nlp_course_pipeline_en_5.5.0_3.0_1726851177323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_nlp_course_pipeline_en_5.5.0_3.0_1726851177323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input; the pipeline's MultiDocumentAssembler is assumed to read "question" and "context" columns
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_nlp_course_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input; the pipeline's MultiDocumentAssembler is assumed to read "question" and "context" columns
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_nlp_course_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
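
The resulting `annotations` DataFrame carries one annotation column per stage of the pipeline. A minimal sketch for inspecting the predictions, assuming the question-answering stage writes to an `answer` column as in the standalone model card:

```python
# Show the predicted answers produced by the pretrained pipeline
annotations.select("answer.result").show(truncate=False)
```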
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_nlp_course_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/hanspeterlyngsoeraaschoujensen/distilbert-base-uncased-finetuned-nlp-course + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_nlp_letters_s1_s2_all_class_weighted_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_nlp_letters_s1_s2_all_class_weighted_pipeline_en.md new file mode 100644 index 00000000000000..5dcf1ef33f4cb2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_nlp_letters_s1_s2_all_class_weighted_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_nlp_letters_s1_s2_all_class_weighted_pipeline pipeline DistilBertForSequenceClassification from ben-yu +author: John Snow Labs +name: distilbert_base_uncased_finetuned_nlp_letters_s1_s2_all_class_weighted_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_nlp_letters_s1_s2_all_class_weighted_pipeline` is a English model originally trained by ben-yu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_nlp_letters_s1_s2_all_class_weighted_pipeline_en_5.5.0_3.0_1726809651934.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_nlp_letters_s1_s2_all_class_weighted_pipeline_en_5.5.0_3.0_1726809651934.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input; the pipeline is assumed to read raw text from a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_nlp_letters_s1_s2_all_class_weighted_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input; the pipeline is assumed to read raw text from a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_nlp_letters_s1_s2_all_class_weighted_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
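
For quick experiments on single strings, the same pretrained pipeline can be used without building a DataFrame. A minimal sketch; `annotate` returns a plain Python dict keyed by the pipeline's output columns, and `"class"` is the assumed name of the classifier output:

```python
# Lightweight, single-sentence inference with the pretrained pipeline
result = pipeline.annotate("I love spark-nlp")
print(result["class"])  # predicted label(s); "class" is the assumed output column name
```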
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_nlp_letters_s1_s2_all_class_weighted_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ben-yu/distilbert-base-uncased-finetuned-nlp-letters-s1_s2-all-class-weighted + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_en.md new file mode 100644 index 00000000000000..75113b408f2e85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered DistilBertForSequenceClassification from ben-yu +author: John Snow Labs +name: distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered` is a English model originally trained by ben-yu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_en_5.5.0_3.0_1726832595472.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_en_5.5.0_3.0_1726832595472.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
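
Once the pipeline has run, the predicted label for each input row sits in the `class` column as an annotation array. A minimal check, using the column names from the snippet above:

```python
# Show the input text next to its predicted label
pipelineDF.select("text", "class.result").show(truncate=False)
```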
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ben-yu/distilbert-base-uncased-finetuned-nlp-letters-s1-s2-degendered \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_pipeline_en.md new file mode 100644 index 00000000000000..9a23979c75af3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_pipeline pipeline DistilBertForSequenceClassification from ben-yu +author: John Snow Labs +name: distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_pipeline` is a English model originally trained by ben-yu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_pipeline_en_5.5.0_3.0_1726832608007.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_pipeline_en_5.5.0_3.0_1726832608007.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input; the pipeline is assumed to read raw text from a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input; the pipeline is assumed to read raw text from a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ben-yu/distilbert-base-uncased-finetuned-nlp-letters-s1-s2-degendered + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_orutra11_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_orutra11_en.md new file mode 100644 index 00000000000000..3f4b0da486d2b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_orutra11_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_orutra11 DistilBertForSequenceClassification from orutra11 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_orutra11 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_orutra11` is a English model originally trained by orutra11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_orutra11_en_5.5.0_3.0_1726832840164.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_orutra11_en_5.5.0_3.0_1726832840164.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_orutra11","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_orutra11", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
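
For low-latency inference on individual sentences, the fitted pipeline can be wrapped in a `LightPipeline`. A sketch, reusing `pipelineModel` from the example above:

```python
from sparknlp.base import LightPipeline

# Annotate a single string without going through a DataFrame
light = LightPipeline(pipelineModel)
print(light.annotate("I love spark-nlp")["class"])
```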
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_orutra11| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/orutra11/distilbert-base-uncased-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_orutra11_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_orutra11_pipeline_en.md new file mode 100644 index 00000000000000..2da7f7ac2d8819 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_orutra11_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_orutra11_pipeline pipeline DistilBertForSequenceClassification from orutra11 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_orutra11_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_orutra11_pipeline` is a English model originally trained by orutra11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_orutra11_pipeline_en_5.5.0_3.0_1726832853816.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_orutra11_pipeline_en_5.5.0_3.0_1726832853816.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input; the pipeline is assumed to read raw text from a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_orutra11_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input; the pipeline is assumed to read raw text from a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_orutra11_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_orutra11_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/orutra11/distilbert-base-uncased-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_pad_mult_clf_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_pad_mult_clf_v2_pipeline_en.md new file mode 100644 index 00000000000000..0582ca66a420ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_pad_mult_clf_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_pad_mult_clf_v2_pipeline pipeline DistilBertForSequenceClassification from netoferraz +author: John Snow Labs +name: distilbert_base_uncased_finetuned_pad_mult_clf_v2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_pad_mult_clf_v2_pipeline` is a English model originally trained by netoferraz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_pad_mult_clf_v2_pipeline_en_5.5.0_3.0_1726809538902.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_pad_mult_clf_v2_pipeline_en_5.5.0_3.0_1726809538902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input; the pipeline is assumed to read raw text from a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_pad_mult_clf_v2_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input; the pipeline is assumed to read raw text from a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_pad_mult_clf_v2_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_pad_mult_clf_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/netoferraz/distilbert-base-uncased-finetuned-pad-mult-clf-v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_reviews_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_reviews_en.md new file mode 100644 index 00000000000000..12b2a171d1271f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_reviews_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_reviews DistilBertForSequenceClassification from cetini18 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_reviews +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_reviews` is a English model originally trained by cetini18. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_reviews_en_5.5.0_3.0_1726830207256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_reviews_en_5.5.0_3.0_1726830207256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_reviews","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_reviews", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
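
If you downloaded the model archive from the buttons above instead of calling `pretrained()`, the unzipped folder can be loaded directly. A sketch; the local path below is hypothetical:

```python
# Load the model from a local, unzipped copy of the S3 archive
sequenceClassifier = DistilBertForSequenceClassification.load("/path/to/distilbert_base_uncased_finetuned_reviews_en_5.5.0_3.0") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")
```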
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_reviews| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cetini18/distilbert-base-uncased-finetuned-reviews \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_reviews_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_reviews_pipeline_en.md new file mode 100644 index 00000000000000..371f861e2c474c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_reviews_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_reviews_pipeline pipeline DistilBertForSequenceClassification from cetini18 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_reviews_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_reviews_pipeline` is a English model originally trained by cetini18. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_reviews_pipeline_en_5.5.0_3.0_1726830220286.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_reviews_pipeline_en_5.5.0_3.0_1726830220286.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input; the pipeline is assumed to read raw text from a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_reviews_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input; the pipeline is assumed to read raw text from a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_reviews_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_reviews_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cetini18/distilbert-base-uncased-finetuned-reviews + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_rottentomatoes_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_rottentomatoes_en.md new file mode 100644 index 00000000000000..413470ed049ba7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_rottentomatoes_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_rottentomatoes DistilBertForSequenceClassification from mohamedsaeed823 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_rottentomatoes +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_rottentomatoes` is a English model originally trained by mohamedsaeed823. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_rottentomatoes_en_5.5.0_3.0_1726871306848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_rottentomatoes_en_5.5.0_3.0_1726871306848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_rottentomatoes","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_rottentomatoes", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
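
The fitted `pipelineModel` can be persisted once and reloaded later, so the fit step does not have to be repeated. A sketch; the output path below is hypothetical:

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline and load it back for scoring
pipelineModel.write().overwrite().save("/tmp/rottentomatoes_sentiment_pipeline")
restored = PipelineModel.load("/tmp/rottentomatoes_sentiment_pipeline")
restored.transform(data).select("class.result").show(truncate=False)
```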
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_rottentomatoes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mohamedsaeed823/distilbert-base-uncased-finetuned-RottenTomatoes \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_rottentomatoes_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_rottentomatoes_pipeline_en.md new file mode 100644 index 00000000000000..e6e8de2c4c7bd6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_rottentomatoes_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_rottentomatoes_pipeline pipeline DistilBertForSequenceClassification from mohamedsaeed823 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_rottentomatoes_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_rottentomatoes_pipeline` is a English model originally trained by mohamedsaeed823. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_rottentomatoes_pipeline_en_5.5.0_3.0_1726871320892.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_rottentomatoes_pipeline_en_5.5.0_3.0_1726871320892.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input; the pipeline is assumed to read raw text from a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_rottentomatoes_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input; the pipeline is assumed to read raw text from a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_rottentomatoes_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_rottentomatoes_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mohamedsaeed823/distilbert-base-uncased-finetuned-RottenTomatoes + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_scam_classification_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_scam_classification_en.md new file mode 100644 index 00000000000000..f401ab2e76cc15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_scam_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_scam_classification DistilBertForSequenceClassification from jaranohaal +author: John Snow Labs +name: distilbert_base_uncased_finetuned_scam_classification +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_scam_classification` is a English model originally trained by jaranohaal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_scam_classification_en_5.5.0_3.0_1726809524754.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_scam_classification_en_5.5.0_3.0_1726809524754.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_scam_classification","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_scam_classification", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_scam_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jaranohaal/distilbert-base-uncased-finetuned-scam-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_scam_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_scam_classification_pipeline_en.md new file mode 100644 index 00000000000000..a6a30702bbbcab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_scam_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_scam_classification_pipeline pipeline DistilBertForSequenceClassification from jaranohaal +author: John Snow Labs +name: distilbert_base_uncased_finetuned_scam_classification_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_scam_classification_pipeline` is a English model originally trained by jaranohaal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_scam_classification_pipeline_en_5.5.0_3.0_1726809539259.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_scam_classification_pipeline_en_5.5.0_3.0_1726809539259.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input; the pipeline is assumed to read raw text from a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_scam_classification_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input; the pipeline is assumed to read raw text from a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_scam_classification_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_scam_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jaranohaal/distilbert-base-uncased-finetuned-scam-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_sst2_kietb_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_sst2_kietb_en.md new file mode 100644 index 00000000000000..dafd917245ac25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_sst2_kietb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_sst2_kietb DistilBertForSequenceClassification from KietB +author: John Snow Labs +name: distilbert_base_uncased_finetuned_sst2_kietb +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_sst2_kietb` is a English model originally trained by KietB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst2_kietb_en_5.5.0_3.0_1726842065870.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst2_kietb_en_5.5.0_3.0_1726842065870.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_sst2_kietb","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_sst2_kietb", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
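
Beyond the predicted label, Spark NLP sequence classifiers typically expose per-class scores in the annotation metadata. A minimal sketch for pulling them out of the `class` column defined above:

```python
from pyspark.sql import functions as F

# Each annotation's metadata map usually holds one confidence entry per label
pipelineDF.select(F.explode("class").alias("ann")) \
    .select("ann.result", "ann.metadata") \
    .show(truncate=False)
```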
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_sst2_kietb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KietB/distilbert-base-uncased-finetuned-sst2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_sst2_kietb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_sst2_kietb_pipeline_en.md new file mode 100644 index 00000000000000..63f82d5b47420e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_sst2_kietb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_sst2_kietb_pipeline pipeline DistilBertForSequenceClassification from KietB +author: John Snow Labs +name: distilbert_base_uncased_finetuned_sst2_kietb_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_sst2_kietb_pipeline` is a English model originally trained by KietB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst2_kietb_pipeline_en_5.5.0_3.0_1726842079116.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst2_kietb_pipeline_en_5.5.0_3.0_1726842079116.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input; the pipeline is assumed to read raw text from a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_sst2_kietb_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input; the pipeline is assumed to read raw text from a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_sst2_kietb_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_sst2_kietb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KietB/distilbert-base-uncased-finetuned-sst2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_stationary_update_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_stationary_update_en.md new file mode 100644 index 00000000000000..2d6074604d17cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_stationary_update_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_stationary_update DistilBertForSequenceClassification from MKS3099 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_stationary_update +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_stationary_update` is a English model originally trained by MKS3099. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_stationary_update_en_5.5.0_3.0_1726792206965.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_stationary_update_en_5.5.0_3.0_1726792206965.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_stationary_update","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_stationary_update", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_stationary_update| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MKS3099/distilbert-base-uncased-finetuned-stationary-update \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_stationary_update_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_stationary_update_pipeline_en.md new file mode 100644 index 00000000000000..834fd29c95cd50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_stationary_update_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_stationary_update_pipeline pipeline DistilBertForSequenceClassification from MKS3099 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_stationary_update_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_stationary_update_pipeline` is a English model originally trained by MKS3099. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_stationary_update_pipeline_en_5.5.0_3.0_1726792219458.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_stationary_update_pipeline_en_5.5.0_3.0_1726792219458.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input; the pipeline is assumed to read raw text from a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_stationary_update_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input; the pipeline is assumed to read raw text from a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_stationary_update_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_stationary_update_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MKS3099/distilbert-base-uncased-finetuned-stationary-update + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_ft_2_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_ft_2_en.md new file mode 100644 index 00000000000000..e4725b09c62a16 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_ft_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_ft_2 DistilBertForSequenceClassification from keylazy +author: John Snow Labs +name: distilbert_base_uncased_ft_2 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_ft_2` is a English model originally trained by keylazy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_ft_2_en_5.5.0_3.0_1726871482400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_ft_2_en_5.5.0_3.0_1726871482400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_ft_2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_ft_2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_ft_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/keylazy/distilbert-base-uncased-ft-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_ft_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_ft_2_pipeline_en.md new file mode 100644 index 00000000000000..1c7a88a4866e06 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_ft_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_ft_2_pipeline pipeline DistilBertForSequenceClassification from keylazy +author: John Snow Labs +name: distilbert_base_uncased_ft_2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_ft_2_pipeline` is a English model originally trained by keylazy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_ft_2_pipeline_en_5.5.0_3.0_1726871494588.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_ft_2_pipeline_en_5.5.0_3.0_1726871494588.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input; the pipeline is assumed to read raw text from a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_ft_2_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input; the pipeline is assumed to read raw text from a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_ft_2_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_ft_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/keylazy/distilbert-base-uncased-ft-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_home_zphr_0st72_ut12ut1_plain_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_home_zphr_0st72_ut12ut1_plain_simsp_en.md new file mode 100644 index 00000000000000..0404fd94f36131 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_home_zphr_0st72_ut12ut1_plain_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_home_zphr_0st72_ut12ut1_plain_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_home_zphr_0st72_ut12ut1_plain_simsp +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_home_zphr_0st72_ut12ut1_plain_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_home_zphr_0st72_ut12ut1_plain_simsp_en_5.5.0_3.0_1726871572523.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_home_zphr_0st72_ut12ut1_plain_simsp_en_5.5.0_3.0_1726871572523.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_home_zphr_0st72_ut12ut1_plain_simsp","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_home_zphr_0st72_ut12ut1_plain_simsp", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
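+
+A minimal sketch of reading the prediction back out of `pipelineDF` from the snippet above (the `class` column is the output column set on the classifier; nothing here is specific to this particular checkpoint):
+
+```python
+from pyspark.sql.functions import explode
+
+# Each row of "class" holds one annotation per document; explode it to get the label strings.
+pipelineDF.select(explode("class.result").alias("predicted_label")).show(truncate=False)
+```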
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_home_zphr_0st72_ut12ut1_plain_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_home_zphr_0st72_ut12ut1_plain_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_home_zphr_0st72_ut12ut1_plain_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_home_zphr_0st72_ut12ut1_plain_simsp_pipeline_en.md new file mode 100644 index 00000000000000..05d552b48d6ecf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_home_zphr_0st72_ut12ut1_plain_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_home_zphr_0st72_ut12ut1_plain_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_home_zphr_0st72_ut12ut1_plain_simsp_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_home_zphr_0st72_ut12ut1_plain_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_home_zphr_0st72_ut12ut1_plain_simsp_pipeline_en_5.5.0_3.0_1726871584506.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_home_zphr_0st72_ut12ut1_plain_simsp_pipeline_en_5.5.0_3.0_1726871584506.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# DataFrame with a "text" column holding the documents to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_home_zphr_0st72_ut12ut1_plain_simsp_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_home_zphr_0st72_ut12ut1_plain_simsp_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_home_zphr_0st72_ut12ut1_plain_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_home_zphr_0st72_ut12ut1_plain_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_imdb_edg3_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_imdb_edg3_en.md new file mode 100644 index 00000000000000..bcd47f73c4c9a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_imdb_edg3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_imdb_edg3 DistilBertForSequenceClassification from edg3 +author: John Snow Labs +name: distilbert_base_uncased_imdb_edg3 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_imdb_edg3` is a English model originally trained by edg3. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_imdb_edg3_en_5.5.0_3.0_1726832412910.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_imdb_edg3_en_5.5.0_3.0_1726832412910.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_imdb_edg3","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_imdb_edg3", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
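+
+For quick, single-document inference without building a DataFrame, Spark NLP's LightPipeline can wrap the fitted model from the Python snippet above (a sketch; the fitted `pipelineModel` is assumed to be in scope):
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+# Returns a dict keyed by output column, e.g. "document", "token" and "class"
+print(light.annotate("I love spark-nlp"))
+```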
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_imdb_edg3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/edg3/distilbert-base-uncased-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_imdb_edg3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_imdb_edg3_pipeline_en.md new file mode 100644 index 00000000000000..2e50a60508225c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_imdb_edg3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_imdb_edg3_pipeline pipeline DistilBertForSequenceClassification from edg3 +author: John Snow Labs +name: distilbert_base_uncased_imdb_edg3_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_imdb_edg3_pipeline` is a English model originally trained by edg3. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_imdb_edg3_pipeline_en_5.5.0_3.0_1726832424893.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_imdb_edg3_pipeline_en_5.5.0_3.0_1726832424893.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# DataFrame with a "text" column holding the documents to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_imdb_edg3_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_imdb_edg3_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_imdb_edg3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/edg3/distilbert-base-uncased-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_en.md new file mode 100644 index 00000000000000..498cce1a9b4734 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_en_5.5.0_3.0_1726842004793.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_en_5.5.0_3.0_1726842004793.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st102sd_random_ut72ut1_PLPrefix0stlarge_simsp100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline_en.md new file mode 100644 index 00000000000000..b20e93d7288fba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline_en_5.5.0_3.0_1726842017140.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline_en_5.5.0_3.0_1726842017140.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# DataFrame with a "text" column holding the documents to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st102sd_random_ut72ut1_PLPrefix0stlarge_simsp100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_en.md new file mode 100644 index 00000000000000..b7e27da00c4cd4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_en_5.5.0_3.0_1726861019723.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_en_5.5.0_3.0_1726861019723.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st11sd_ut72ut1_PLPrefix0stlarge_simsp300_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en.md new file mode 100644 index 00000000000000..03fb0b53ba4928 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en_5.5.0_3.0_1726861032032.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en_5.5.0_3.0_1726861032032.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# DataFrame with a "text" column holding the documents to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st11sd_ut72ut1_PLPrefix0stlarge_simsp300_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_en.md new file mode 100644 index 00000000000000..311d0a0feea73a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_en_5.5.0_3.0_1726841347831.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_en_5.5.0_3.0_1726841347831.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st11sd_ut72ut1large11PfxNf_simsp400_clean300 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_pipeline_en.md new file mode 100644 index 00000000000000..97f4d20e1b30cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_pipeline_en_5.5.0_3.0_1726841359815.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_pipeline_en_5.5.0_3.0_1726841359815.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# DataFrame with a "text" column holding the documents to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st11sd_ut72ut1large11PfxNf_simsp400_clean300 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en.md new file mode 100644 index 00000000000000..5b7c2ffc0058d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en_5.5.0_3.0_1726791888534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en_5.5.0_3.0_1726791888534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# DataFrame with a "text" column holding the documents to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st13sd_ut72ut1_PLPrefix0stlarge_simsp300_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean100_pipeline_en.md new file mode 100644 index 00000000000000..4b66421fddd03f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean100_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean100_pipeline_en_5.5.0_3.0_1726808999960.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean100_pipeline_en_5.5.0_3.0_1726808999960.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# DataFrame with a "text" column holding the documents to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean100_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean100_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st14sd_ut72ut1large14PfxNf_simsp400_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200_en.md new file mode 100644 index 00000000000000..1a8b4dc8e37871 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200_en_5.5.0_3.0_1726841211476.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200_en_5.5.0_3.0_1726841211476.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st14sd_ut72ut5_PLPrefix0stlarge14_simsp100_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200_pipeline_en.md new file mode 100644 index 00000000000000..52eec50e939b89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200_pipeline_en_5.5.0_3.0_1726841223487.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200_pipeline_en_5.5.0_3.0_1726841223487.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# DataFrame with a "text" column holding the documents to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st14sd_ut72ut5_PLPrefix0stlarge14_simsp100_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100_en.md new file mode 100644 index 00000000000000..1b7e621235df04 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100_en_5.5.0_3.0_1726848629986.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100_en_5.5.0_3.0_1726848629986.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
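+
+Once fitted, the pipeline can be persisted and reloaded like any Spark ML PipelineModel. A short sketch, assuming the `pipelineModel` from the Python snippet above and an example path of your choosing:
+
+```python
+from pyspark.ml import PipelineModel
+
+# Hypothetical local path; point this at any writable location (HDFS, S3, local disk).
+pipelineModel.write().overwrite().save("/tmp/distilbert_clf_pipeline")
+restored = PipelineModel.load("/tmp/distilbert_clf_pipeline")
+```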
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st15sd_ut72ut1large15PfxNf_simsp400_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100_pipeline_en.md new file mode 100644 index 00000000000000..0a2097e576b3e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100_pipeline_en_5.5.0_3.0_1726848645620.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100_pipeline_en_5.5.0_3.0_1726848645620.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# DataFrame with a "text" column holding the documents to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st15sd_ut72ut1large15PfxNf_simsp400_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100_en.md new file mode 100644 index 00000000000000..dec7292f7ba5a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100_en_5.5.0_3.0_1726830021037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100_en_5.5.0_3.0_1726830021037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st17sd_ut72ut3_PLPrefix0stlarge_simsp100_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline_en.md new file mode 100644 index 00000000000000..7fc3dbbaa96d49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline_en_5.5.0_3.0_1726830034280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline_en_5.5.0_3.0_1726830034280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# DataFrame with a "text" column holding the documents to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st17sd_ut72ut3_PLPrefix0stlarge_simsp100_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean300_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean300_en.md new file mode 100644 index 00000000000000..acd7624ded4bda --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean300_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean300 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean300 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean300` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean300_en_5.5.0_3.0_1726841549081.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean300_en_5.5.0_3.0_1726841549081.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+ .setInputCol('text') \
+ .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+ .setInputCols(['document']) \
+ .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean300","en") \
+ .setInputCols(["document","token"]) \
+ .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+ .setInputCol("text")
+ .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+ .setInputCols(Array("document"))
+ .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean300", "en")
+ .setInputCols(Array("document","token"))
+ .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean300| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st17sd_ut72ut5_PLPrefix0stlarge17_simsp100_clean300 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean300_pipeline_en.md new file mode 100644 index 00000000000000..2551ef45125ab3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean300_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean300_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean300_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean300_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean300_pipeline_en_5.5.0_3.0_1726841561721.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean300_pipeline_en_5.5.0_3.0_1726841561721.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean300_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean300_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st17sd_ut72ut5_PLPrefix0stlarge17_simsp100_clean300 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_en.md new file mode 100644 index 00000000000000..5361c54b219f22 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_en_5.5.0_3.0_1726832851138.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_en_5.5.0_3.0_1726832851138.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+ .setInputCol('text') \
+ .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+ .setInputCols(['document']) \
+ .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp","en") \
+ .setInputCols(["document","token"]) \
+ .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+ .setInputCol("text")
+ .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+ .setInputCols(Array("document"))
+ .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp", "en")
+ .setInputCols(Array("document","token"))
+ .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
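+
+For ad-hoc predictions on single strings, the fitted `pipelineModel` can also be wrapped in a `LightPipeline`, which avoids building a DataFrame for each request. A minimal sketch, assuming the `pipelineModel` from the example above:
+
+```python
+from sparknlp.base import LightPipeline
+
+# LightPipeline runs the same stages on plain Python strings, outside of a DataFrame
+light = LightPipeline(pipelineModel)
+print(light.annotate("I love spark-nlp")["class"])
+```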
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st19sd_ut72ut1_PLPrefix0stlarge19_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_pipeline_en.md new file mode 100644 index 00000000000000..c8ef4779424ee6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_pipeline_en_5.5.0_3.0_1726832863519.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_pipeline_en_5.5.0_3.0_1726832863519.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st19sd_ut72ut1_PLPrefix0stlarge19_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st20sd_ut72ut5_plprefix0stlarge20_simsp100_clean300_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st20sd_ut72ut5_plprefix0stlarge20_simsp100_clean300_en.md new file mode 100644 index 00000000000000..45a025ec1980aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st20sd_ut72ut5_plprefix0stlarge20_simsp100_clean300_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st20sd_ut72ut5_plprefix0stlarge20_simsp100_clean300 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st20sd_ut72ut5_plprefix0stlarge20_simsp100_clean300 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st20sd_ut72ut5_plprefix0stlarge20_simsp100_clean300` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st20sd_ut72ut5_plprefix0stlarge20_simsp100_clean300_en_5.5.0_3.0_1726808983992.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st20sd_ut72ut5_plprefix0stlarge20_simsp100_clean300_en_5.5.0_3.0_1726808983992.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+ .setInputCol('text') \
+ .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+ .setInputCols(['document']) \
+ .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st20sd_ut72ut5_plprefix0stlarge20_simsp100_clean300","en") \
+ .setInputCols(["document","token"]) \
+ .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+ .setInputCol("text")
+ .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+ .setInputCols(Array("document"))
+ .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st20sd_ut72ut5_plprefix0stlarge20_simsp100_clean300", "en")
+ .setInputCols(Array("document","token"))
+ .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st20sd_ut72ut5_plprefix0stlarge20_simsp100_clean300| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st20sd_ut72ut5_PLPrefix0stlarge20_simsp100_clean300 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st20sd_ut72ut5_plprefix0stlarge20_simsp100_clean300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st20sd_ut72ut5_plprefix0stlarge20_simsp100_clean300_pipeline_en.md new file mode 100644 index 00000000000000..db99837f6aaeca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st20sd_ut72ut5_plprefix0stlarge20_simsp100_clean300_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st20sd_ut72ut5_plprefix0stlarge20_simsp100_clean300_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st20sd_ut72ut5_plprefix0stlarge20_simsp100_clean300_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st20sd_ut72ut5_plprefix0stlarge20_simsp100_clean300_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st20sd_ut72ut5_plprefix0stlarge20_simsp100_clean300_pipeline_en_5.5.0_3.0_1726809000163.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st20sd_ut72ut5_plprefix0stlarge20_simsp100_clean300_pipeline_en_5.5.0_3.0_1726809000163.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st20sd_ut72ut5_plprefix0stlarge20_simsp100_clean300_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st20sd_ut72ut5_plprefix0stlarge20_simsp100_clean300_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st20sd_ut72ut5_plprefix0stlarge20_simsp100_clean300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st20sd_ut72ut5_PLPrefix0stlarge20_simsp100_clean300 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_en.md new file mode 100644 index 00000000000000..4520fc86b40a44 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_en_5.5.0_3.0_1726848789875.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_en_5.5.0_3.0_1726848789875.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+ .setInputCol('text') \
+ .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+ .setInputCols(['document']) \
+ .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200","en") \
+ .setInputCols(["document","token"]) \
+ .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+ .setInputCol("text")
+ .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+ .setInputCols(Array("document"))
+ .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200", "en")
+ .setInputCols(Array("document","token"))
+ .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st2sd_ut72ut1large2PfxNf_simsp400_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_pipeline_en.md new file mode 100644 index 00000000000000..101d87c5450b0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_pipeline_en_5.5.0_3.0_1726848802411.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_pipeline_en_5.5.0_3.0_1726848802411.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st2sd_ut72ut1large2PfxNf_simsp400_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_en.md new file mode 100644 index 00000000000000..0b090b71f00188 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_en_5.5.0_3.0_1726823755633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_en_5.5.0_3.0_1726823755633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+ .setInputCol('text') \
+ .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+ .setInputCols(['document']) \
+ .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp","en") \
+ .setInputCols(["document","token"]) \
+ .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+ .setInputCol("text")
+ .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+ .setInputCols(Array("document"))
+ .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp", "en")
+ .setInputCols(Array("document","token"))
+ .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1_PLPrefix0stlarge80_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_pipeline_en.md new file mode 100644 index 00000000000000..2e1c4eb634e204 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_pipeline_en_5.5.0_3.0_1726823767070.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_pipeline_en_5.5.0_3.0_1726823767070.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1_PLPrefix0stlarge80_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_en.md new file mode 100644 index 00000000000000..3501615981b128 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_en_5.5.0_3.0_1726841218228.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_en_5.5.0_3.0_1726841218228.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+ .setInputCol('text') \
+ .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+ .setInputCols(['document']) \
+ .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp","en") \
+ .setInputCols(["document","token"]) \
+ .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+ .setInputCol("text")
+ .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+ .setInputCols(Array("document"))
+ .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp", "en")
+ .setInputCols(Array("document","token"))
+ .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1large40PfxNf_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_pipeline_en.md new file mode 100644 index 00000000000000..137b21a3950c22 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_pipeline_en_5.5.0_3.0_1726841234253.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_pipeline_en_5.5.0_3.0_1726841234253.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1large40PfxNf_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_en.md new file mode 100644 index 00000000000000..111bd1992a91e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_en_5.5.0_3.0_1726848930136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_en_5.5.0_3.0_1726848930136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+ .setInputCol('text') \
+ .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+ .setInputCols(['document']) \
+ .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp","en") \
+ .setInputCols(["document","token"]) \
+ .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+ .setInputCol("text")
+ .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+ .setInputCols(Array("document"))
+ .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp", "en")
+ .setInputCols(Array("document","token"))
+ .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1large90PfxNf_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_pipeline_en.md new file mode 100644 index 00000000000000..986f960458f999 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_pipeline_en_5.5.0_3.0_1726848942016.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_pipeline_en_5.5.0_3.0_1726848942016.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1large90PfxNf_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large91pfxnf_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large91pfxnf_simsp_en.md new file mode 100644 index 00000000000000..4fe22fe6406044 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large91pfxnf_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large91pfxnf_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large91pfxnf_simsp +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large91pfxnf_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large91pfxnf_simsp_en_5.5.0_3.0_1726841445240.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large91pfxnf_simsp_en_5.5.0_3.0_1726841445240.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+ .setInputCol('text') \
+ .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+ .setInputCols(['document']) \
+ .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large91pfxnf_simsp","en") \
+ .setInputCols(["document","token"]) \
+ .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+ .setInputCol("text")
+ .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+ .setInputCols(Array("document"))
+ .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large91pfxnf_simsp", "en")
+ .setInputCols(Array("document","token"))
+ .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large91pfxnf_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1large91PfxNf_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large91pfxnf_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large91pfxnf_simsp_pipeline_en.md new file mode 100644 index 00000000000000..d3561caad5aec8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large91pfxnf_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large91pfxnf_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large91pfxnf_simsp_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large91pfxnf_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large91pfxnf_simsp_pipeline_en_5.5.0_3.0_1726841458120.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large91pfxnf_simsp_pipeline_en_5.5.0_3.0_1726841458120.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large91pfxnf_simsp_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large91pfxnf_simsp_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large91pfxnf_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1large91PfxNf_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_en.md new file mode 100644 index 00000000000000..c944690cba0645 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_en_5.5.0_3.0_1726849030957.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_en_5.5.0_3.0_1726849030957.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+ .setInputCol('text') \
+ .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+ .setInputCols(['document']) \
+ .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100","en") \
+ .setInputCols(["document","token"]) \
+ .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+ .setInputCol("text")
+ .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+ .setInputCols(Array("document"))
+ .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100", "en")
+ .setInputCols(Array("document","token"))
+ .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st5sd_ut72ut1_PLPrefix0stlarge5_simsp100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_pipeline_en.md new file mode 100644 index 00000000000000..79def0c8663602 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_pipeline_en_5.5.0_3.0_1726849043401.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_pipeline_en_5.5.0_3.0_1726849043401.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
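
For quick experimentation on a single string, a `PretrainedPipeline` can also be called through `annotate`, which returns a dictionary keyed by the pipeline's output columns. A minimal sketch, assuming the `pipeline` object created above and that the classifier writes to a column named `class`:

```python
# Annotate one text without building a DataFrame; the "class" key is assumed here
result = pipeline.annotate("I love spark-nlp")
print(result.get("class"))
```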
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st5sd_ut72ut1_PLPrefix0stlarge5_simsp100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200_en.md new file mode 100644 index 00000000000000..ad23a6b39b51be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200_en_5.5.0_3.0_1726848524010.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200_en_5.5.0_3.0_1726848524010.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session available as `spark`
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained DistilBERT classifier and wire it to the document and token columns
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st5sd_ut72ut1_PLPrefix0stlarge5_simsp400_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200_pipeline_en.md new file mode 100644 index 00000000000000..ee3a10c9320a36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200_pipeline_en_5.5.0_3.0_1726848536713.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200_pipeline_en_5.5.0_3.0_1726848536713.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st5sd_ut72ut1_PLPrefix0stlarge5_simsp400_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_en.md new file mode 100644 index 00000000000000..24a179da35ed7f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_en_5.5.0_3.0_1726829860099.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_en_5.5.0_3.0_1726829860099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session available as `spark`
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained DistilBERT classifier and wire it to the document and token columns
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st5sd_ut72ut5_PLPrefix0stlarge5_simsp100_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_pipeline_en.md new file mode 100644 index 00000000000000..6aa882640b6788 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_pipeline_en_5.5.0_3.0_1726829873885.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_pipeline_en_5.5.0_3.0_1726829873885.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st5sd_ut72ut5_PLPrefix0stlarge5_simsp100_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1_plprefix0stlarge_simsp300_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1_plprefix0stlarge_simsp300_en.md new file mode 100644 index 00000000000000..049898f066d095 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1_plprefix0stlarge_simsp300_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1_plprefix0stlarge_simsp300 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1_plprefix0stlarge_simsp300 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1_plprefix0stlarge_simsp300` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1_plprefix0stlarge_simsp300_en_5.5.0_3.0_1726823941541.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1_plprefix0stlarge_simsp300_en_5.5.0_3.0_1726823941541.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session available as `spark`
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained DistilBERT classifier and wire it to the document and token columns
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1_plprefix0stlarge_simsp300","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1_plprefix0stlarge_simsp300", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
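
Once fitted, the resulting Spark ML `PipelineModel` can be persisted and reloaded with standard Spark ML persistence. A minimal sketch, assuming the `pipelineModel` and `data` objects from the Python snippet above; the path is a placeholder:

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline and load it back (placeholder path)
pipelineModel.write().overwrite().save("/tmp/distilbert_classifier_pipeline_model")
restored = PipelineModel.load("/tmp/distilbert_classifier_pipeline_model")
restored.transform(data).select("class.result").show(truncate=False)
```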
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1_plprefix0stlarge_simsp300| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st7sd_ut72ut1_PLPrefix0stlarge_simsp300 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1_plprefix0stlarge_simsp300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1_plprefix0stlarge_simsp300_pipeline_en.md new file mode 100644 index 00000000000000..0aeb87eae0f427 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1_plprefix0stlarge_simsp300_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1_plprefix0stlarge_simsp300_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1_plprefix0stlarge_simsp300_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1_plprefix0stlarge_simsp300_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1_plprefix0stlarge_simsp300_pipeline_en_5.5.0_3.0_1726823955581.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1_plprefix0stlarge_simsp300_pipeline_en_5.5.0_3.0_1726823955581.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1_plprefix0stlarge_simsp300_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1_plprefix0stlarge_simsp300_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1_plprefix0stlarge_simsp300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st7sd_ut72ut1_PLPrefix0stlarge_simsp300 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_en.md new file mode 100644 index 00000000000000..f99e935d8f5ef5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_en_5.5.0_3.0_1726832493195.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_en_5.5.0_3.0_1726832493195.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session available as `spark`
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained DistilBERT classifier and wire it to the document and token columns
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st7sd_ut72ut1largePfxNf_simsp300_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md new file mode 100644 index 00000000000000..8d33df5112a0c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en_5.5.0_3.0_1726832505338.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en_5.5.0_3.0_1726832505338.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st7sd_ut72ut1largePfxNf_simsp300_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st9sd_ut72ut1large9pfxnf_simsp400_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st9sd_ut72ut1large9pfxnf_simsp400_clean100_en.md new file mode 100644 index 00000000000000..f54372b65de3fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st9sd_ut72ut1large9pfxnf_simsp400_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st9sd_ut72ut1large9pfxnf_simsp400_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st9sd_ut72ut1large9pfxnf_simsp400_clean100 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st9sd_ut72ut1large9pfxnf_simsp400_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st9sd_ut72ut1large9pfxnf_simsp400_clean100_en_5.5.0_3.0_1726832638940.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st9sd_ut72ut1large9pfxnf_simsp400_clean100_en_5.5.0_3.0_1726832638940.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session available as `spark`
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained DistilBERT classifier and wire it to the document and token columns
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st9sd_ut72ut1large9pfxnf_simsp400_clean100","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st9sd_ut72ut1large9pfxnf_simsp400_clean100", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st9sd_ut72ut1large9pfxnf_simsp400_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st9sd_ut72ut1large9PfxNf_simsp400_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline_en.md new file mode 100644 index 00000000000000..a67a5f62fe9ae9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline_en_5.5.0_3.0_1726809352362.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline_en_5.5.0_3.0_1726809352362.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_PLPrefix0stlarge_simsp100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_plprefix0stlarge_simsp100_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_plprefix0stlarge_simsp100_en.md new file mode 100644 index 00000000000000..f32e17fc4616b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_plprefix0stlarge_simsp100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_plprefix0stlarge_simsp100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_plprefix0stlarge_simsp100 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_plprefix0stlarge_simsp100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_plprefix0stlarge_simsp100_en_5.5.0_3.0_1726849134296.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_plprefix0stlarge_simsp100_en_5.5.0_3.0_1726849134296.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session available as `spark`
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained DistilBERT classifier and wire it to the document and token columns
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_plprefix0stlarge_simsp100","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_plprefix0stlarge_simsp100", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_plprefix0stlarge_simsp100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_PLPrefix0stlarge_simsp100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_plprefix0stlarge_simsp100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_plprefix0stlarge_simsp100_pipeline_en.md new file mode 100644 index 00000000000000..3a8295f36b5888 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_plprefix0stlarge_simsp100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_plprefix0stlarge_simsp100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_plprefix0stlarge_simsp100_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_plprefix0stlarge_simsp100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_plprefix0stlarge_simsp100_pipeline_en_5.5.0_3.0_1726849146204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_plprefix0stlarge_simsp100_pipeline_en_5.5.0_3.0_1726849146204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_plprefix0stlarge_simsp100_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_plprefix0stlarge_simsp100_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_plprefix0stlarge_simsp100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_odm_zphr_0st102sd_score_ut72ut5_PLPrefix0stlarge_simsp100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_small_talk_zphr_0st_ut102ut1_plain_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_small_talk_zphr_0st_ut102ut1_plain_simsp_en.md new file mode 100644 index 00000000000000..944f01ace01b91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_small_talk_zphr_0st_ut102ut1_plain_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_small_talk_zphr_0st_ut102ut1_plain_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_small_talk_zphr_0st_ut102ut1_plain_simsp +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_small_talk_zphr_0st_ut102ut1_plain_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_small_talk_zphr_0st_ut102ut1_plain_simsp_en_5.5.0_3.0_1726823960502.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_small_talk_zphr_0st_ut102ut1_plain_simsp_en_5.5.0_3.0_1726823960502.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session available as `spark`
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained DistilBERT classifier and wire it to the document and token columns
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_small_talk_zphr_0st_ut102ut1_plain_simsp","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_small_talk_zphr_0st_ut102ut1_plain_simsp", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_small_talk_zphr_0st_ut102ut1_plain_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_small_talk_zphr_0st_ut102ut1_plain_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_small_talk_zphr_0st_ut102ut1_plain_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_small_talk_zphr_0st_ut102ut1_plain_simsp_pipeline_en.md new file mode 100644 index 00000000000000..87d27aff0a9e1d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_small_talk_zphr_0st_ut102ut1_plain_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_small_talk_zphr_0st_ut102ut1_plain_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_small_talk_zphr_0st_ut102ut1_plain_simsp_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_small_talk_zphr_0st_ut102ut1_plain_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_small_talk_zphr_0st_ut102ut1_plain_simsp_pipeline_en_5.5.0_3.0_1726823973171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_small_talk_zphr_0st_ut102ut1_plain_simsp_pipeline_en_5.5.0_3.0_1726823973171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_small_talk_zphr_0st_ut102ut1_plain_simsp_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_small_talk_zphr_0st_ut102ut1_plain_simsp_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_small_talk_zphr_0st_ut102ut1_plain_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_small_talk_zphr_0st_ut102ut1_plain_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_en.md new file mode 100644 index 00000000000000..1173ebf0a8f780 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2 DistilBertForSequenceClassification from Mou11209203 +author: John Snow Labs +name: distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2` is a English model originally trained by Mou11209203. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_en_5.5.0_3.0_1726848935060.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_en_5.5.0_3.0_1726848935060.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session available as `spark`
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained DistilBERT classifier and wire it to the document and token columns
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Mou11209203/distilbert-base-uncased_stock_classification_finetuned_dcard_epoch2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_pipeline_en.md new file mode 100644 index 00000000000000..4665dc5d10a931 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_pipeline pipeline DistilBertForSequenceClassification from Mou11209203 +author: John Snow Labs +name: distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_pipeline` is a English model originally trained by Mou11209203. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_pipeline_en_5.5.0_3.0_1726848947954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_pipeline_en_5.5.0_3.0_1726848947954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
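
The transformed DataFrame exposes the classifier output through the pipeline's final annotation column. A minimal sketch, assuming the `annotations` DataFrame produced by the Python snippet above and that the output column is named `class`:

```python
# Inspect the predicted label(s) for each input row ("class" column name is assumed)
annotations.select("class.result").show(truncate=False)
```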
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Mou11209203/distilbert-base-uncased_stock_classification_finetuned_dcard_epoch2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_en.md new file mode 100644 index 00000000000000..82f4bcf1bd913b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_en_5.5.0_3.0_1726809625202.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_en_5.5.0_3.0_1726809625202.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut12ut1_plain_sp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_pipeline_en.md new file mode 100644 index 00000000000000..e6ff8293e1b924 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_pipeline_en_5.5.0_3.0_1726809637823.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_pipeline_en_5.5.0_3.0_1726809637823.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut12ut1_plain_sp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_simsp_en.md new file mode 100644 index 00000000000000..15905397a298eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_simsp +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_simsp_en_5.5.0_3.0_1726823659626.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_simsp_en_5.5.0_3.0_1726823659626.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_simsp","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_simsp", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
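+
+Once fitted, the pipeline is a plain Spark ML `PipelineModel`, so it can be persisted and reloaded without refitting. A minimal sketch, assuming a writable local path such as `/tmp/distilbert_clf_pipeline` (placeholder):
+
+```python
+from pyspark.ml import PipelineModel
+
+# Save the fitted pipeline (DocumentAssembler + Tokenizer + classifier) to disk
+pipelineModel.write().overwrite().save("/tmp/distilbert_clf_pipeline")
+
+# Reload it later and reuse it on new data without fitting again
+restored = PipelineModel.load("/tmp/distilbert_clf_pipeline")
+restored.transform(data).select("class.result").show(truncate=False)
+```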
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut52ut1_ad7_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_simsp_pipeline_en.md new file mode 100644 index 00000000000000..7e1f6f7f1493a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_simsp_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_simsp_pipeline_en_5.5.0_3.0_1726823673172.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_simsp_pipeline_en_5.5.0_3.0_1726823673172.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_simsp_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_simsp_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut52ut1_ad7_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_tuvalu_zephyr_1shot_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_tuvalu_zephyr_1shot_en.md new file mode 100644 index 00000000000000..7e546894297c6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_tuvalu_zephyr_1shot_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_tuvalu_zephyr_1shot DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_tuvalu_zephyr_1shot +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_tuvalu_zephyr_1shot` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_tuvalu_zephyr_1shot_en_5.5.0_3.0_1726848809553.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_tuvalu_zephyr_1shot_en_5.5.0_3.0_1726848809553.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_tuvalu_zephyr_1shot","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_tuvalu_zephyr_1shot", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_tuvalu_zephyr_1shot| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_tvl_zephyr_1shot \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_tuvalu_zephyr_1shot_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_tuvalu_zephyr_1shot_pipeline_en.md new file mode 100644 index 00000000000000..4bbd82484ccce7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_tuvalu_zephyr_1shot_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_tuvalu_zephyr_1shot_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_tuvalu_zephyr_1shot_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_tuvalu_zephyr_1shot_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_tuvalu_zephyr_1shot_pipeline_en_5.5.0_3.0_1726848827728.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_tuvalu_zephyr_1shot_pipeline_en_5.5.0_3.0_1726848827728.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_tuvalu_zephyr_1shot_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_tuvalu_zephyr_1shot_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_tuvalu_zephyr_1shot_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_tvl_zephyr_1shot + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_tuvalu_zephyr_3shot_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_tuvalu_zephyr_3shot_en.md new file mode 100644 index 00000000000000..36ef3512b70439 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_tuvalu_zephyr_3shot_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_tuvalu_zephyr_3shot DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_tuvalu_zephyr_3shot +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_tuvalu_zephyr_3shot` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_tuvalu_zephyr_3shot_en_5.5.0_3.0_1726832865573.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_tuvalu_zephyr_3shot_en_5.5.0_3.0_1726832865573.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_tuvalu_zephyr_3shot","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_tuvalu_zephyr_3shot", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
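+
+Beyond the top label, Spark NLP sequence classifiers typically expose per-class scores in the annotation metadata. The snippet below is a sketch under that assumption (the exact metadata keys depend on the model's label names):
+
+```python
+from pyspark.sql.functions import col, explode
+
+# Explode the `class` annotations and inspect the label plus metadata scores
+pipelineDF.select(explode(col("class")).alias("prediction")) \
+    .select("prediction.result", "prediction.metadata") \
+    .show(truncate=False)
+```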
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_tuvalu_zephyr_3shot| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_tvl_zephyr_3shot \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_tuvalu_zephyr_3shot_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_tuvalu_zephyr_3shot_pipeline_en.md new file mode 100644 index 00000000000000..756ba3b9a41acf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_tuvalu_zephyr_3shot_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_tuvalu_zephyr_3shot_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_tuvalu_zephyr_3shot_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_tuvalu_zephyr_3shot_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_tuvalu_zephyr_3shot_pipeline_en_5.5.0_3.0_1726832878533.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_tuvalu_zephyr_3shot_pipeline_en_5.5.0_3.0_1726832878533.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_tuvalu_zephyr_3shot_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_tuvalu_zephyr_3shot_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_tuvalu_zephyr_3shot_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_tvl_zephyr_3shot + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_sp_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_sp_en.md new file mode 100644 index 00000000000000..ed5809bef75573 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_sp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_sp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_sp +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_sp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_sp_en_5.5.0_3.0_1726830015848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_sp_en_5.5.0_3.0_1726830015848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_sp","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_sp", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_sp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_utility_zphr_0st_ut52ut1_plain_sp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_sp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_sp_pipeline_en.md new file mode 100644 index 00000000000000..c8accd631da8c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_sp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_sp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_sp_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_sp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_sp_pipeline_en_5.5.0_3.0_1726830027554.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_sp_pipeline_en_5.5.0_3.0_1726830027554.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_sp_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_sp_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_utility_zphr_0st_ut52ut1_plain_sp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_utility_zphr_0st_ut52ut1_plain_sp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_en.md new file mode 100644 index 00000000000000..3088de542d7584 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_en_5.5.0_3.0_1726823471280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_en_5.5.0_3.0_1726823471280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_pipeline_en.md new file mode 100644 index 00000000000000..b0d6246ffb017a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_pipeline_en_5.5.0_3.0_1726823484191.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_pipeline_en_5.5.0_3.0_1726823484191.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_classification_task1_post_title_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_classification_task1_post_title_en.md new file mode 100644 index 00000000000000..f3f69443299220 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_classification_task1_post_title_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_classification_task1_post_title DistilBertForSequenceClassification from abdulmanaam +author: John Snow Labs +name: distilbert_classification_task1_post_title +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_classification_task1_post_title` is a English model originally trained by abdulmanaam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_classification_task1_post_title_en_5.5.0_3.0_1726842214507.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_classification_task1_post_title_en_5.5.0_3.0_1726842214507.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_classification_task1_post_title","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_classification_task1_post_title", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_classification_task1_post_title| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abdulmanaam/distilbert_classification_task1_post_title \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_classification_task1_post_title_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_classification_task1_post_title_pipeline_en.md new file mode 100644 index 00000000000000..2450e8c6f1e293 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_classification_task1_post_title_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_classification_task1_post_title_pipeline pipeline DistilBertForSequenceClassification from abdulmanaam +author: John Snow Labs +name: distilbert_classification_task1_post_title_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_classification_task1_post_title_pipeline` is a English model originally trained by abdulmanaam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_classification_task1_post_title_pipeline_en_5.5.0_3.0_1726842227193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_classification_task1_post_title_pipeline_en_5.5.0_3.0_1726842227193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_classification_task1_post_title_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_classification_task1_post_title_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_classification_task1_post_title_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abdulmanaam/distilbert_classification_task1_post_title + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_coping_reaction_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_coping_reaction_en.md new file mode 100644 index 00000000000000..608c233607495a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_coping_reaction_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_coping_reaction DistilBertForSequenceClassification from coping-appraisal +author: John Snow Labs +name: distilbert_coping_reaction +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_coping_reaction` is a English model originally trained by coping-appraisal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_coping_reaction_en_5.5.0_3.0_1726841105009.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_coping_reaction_en_5.5.0_3.0_1726841105009.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_coping_reaction","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_coping_reaction", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_coping_reaction| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/coping-appraisal/distilbert-coping-reaction \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_coping_reaction_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_coping_reaction_pipeline_en.md new file mode 100644 index 00000000000000..fb2038be3d1fc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_coping_reaction_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_coping_reaction_pipeline pipeline DistilBertForSequenceClassification from coping-appraisal +author: John Snow Labs +name: distilbert_coping_reaction_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_coping_reaction_pipeline` is a English model originally trained by coping-appraisal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_coping_reaction_pipeline_en_5.5.0_3.0_1726841119316.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_coping_reaction_pipeline_en_5.5.0_3.0_1726841119316.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_coping_reaction_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_coping_reaction_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_coping_reaction_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/coping-appraisal/distilbert-coping-reaction + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_emotion_albeirog1681_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_emotion_albeirog1681_en.md new file mode 100644 index 00000000000000..3000b51ceba30e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_emotion_albeirog1681_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotion_albeirog1681 DistilBertForSequenceClassification from albeirog1681 +author: John Snow Labs +name: distilbert_emotion_albeirog1681 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_albeirog1681` is a English model originally trained by albeirog1681. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_albeirog1681_en_5.5.0_3.0_1726832724411.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_albeirog1681_en_5.5.0_3.0_1726832724411.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_albeirog1681","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_albeirog1681", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_albeirog1681| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/albeirog1681/distilbert-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_emotion_albeirog1681_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_emotion_albeirog1681_pipeline_en.md new file mode 100644 index 00000000000000..d98ec90cae04a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_emotion_albeirog1681_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_emotion_albeirog1681_pipeline pipeline DistilBertForSequenceClassification from albeirog1681 +author: John Snow Labs +name: distilbert_emotion_albeirog1681_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_albeirog1681_pipeline` is a English model originally trained by albeirog1681. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_albeirog1681_pipeline_en_5.5.0_3.0_1726832737105.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_albeirog1681_pipeline_en_5.5.0_3.0_1726832737105.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_emotion_albeirog1681_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_emotion_albeirog1681_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_albeirog1681_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/albeirog1681/distilbert-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_emotion_karlaroco_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_emotion_karlaroco_en.md new file mode 100644 index 00000000000000..ae795a41b8a86d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_emotion_karlaroco_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotion_karlaroco DistilBertForSequenceClassification from karlaroco +author: John Snow Labs +name: distilbert_emotion_karlaroco +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_karlaroco` is a English model originally trained by karlaroco. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_karlaroco_en_5.5.0_3.0_1726809720436.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_karlaroco_en_5.5.0_3.0_1726809720436.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_karlaroco","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_karlaroco", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_karlaroco| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/karlaroco/distilbert-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_emotion_karlaroco_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_emotion_karlaroco_pipeline_en.md new file mode 100644 index 00000000000000..bc4e07f0a4b121 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_emotion_karlaroco_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_emotion_karlaroco_pipeline pipeline DistilBertForSequenceClassification from karlaroco +author: John Snow Labs +name: distilbert_emotion_karlaroco_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_karlaroco_pipeline` is a English model originally trained by karlaroco. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_karlaroco_pipeline_en_5.5.0_3.0_1726809732551.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_karlaroco_pipeline_en_5.5.0_3.0_1726809732551.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+pipeline = PretrainedPipeline("distilbert_emotion_karlaroco_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_emotion_karlaroco_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_karlaroco_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/karlaroco/distilbert-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_finetuned_imdb_sinensia_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_finetuned_imdb_sinensia_en.md new file mode 100644 index 00000000000000..d5be51451cfc53 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_finetuned_imdb_sinensia_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_finetuned_imdb_sinensia DistilBertForSequenceClassification from sinensia +author: John Snow Labs +name: distilbert_finetuned_imdb_sinensia +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_imdb_sinensia` is a English model originally trained by sinensia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_sinensia_en_5.5.0_3.0_1726848524004.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_sinensia_en_5.5.0_3.0_1726848524004.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_finetuned_imdb_sinensia","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_finetuned_imdb_sinensia", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_imdb_sinensia| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sinensia/distilbert-finetuned-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_finetuned_imdb_sinensia_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_finetuned_imdb_sinensia_pipeline_en.md new file mode 100644 index 00000000000000..c3c1aec4d184a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_finetuned_imdb_sinensia_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_finetuned_imdb_sinensia_pipeline pipeline DistilBertForSequenceClassification from sinensia +author: John Snow Labs +name: distilbert_finetuned_imdb_sinensia_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_imdb_sinensia_pipeline` is a English model originally trained by sinensia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_sinensia_pipeline_en_5.5.0_3.0_1726848540849.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_sinensia_pipeline_en_5.5.0_3.0_1726848540849.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+pipeline = PretrainedPipeline("distilbert_finetuned_imdb_sinensia_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_finetuned_imdb_sinensia_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_imdb_sinensia_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sinensia/distilbert-finetuned-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_finetuned_vicentenedor_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_finetuned_vicentenedor_pipeline_en.md new file mode 100644 index 00000000000000..0b6a45d1e89188 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_finetuned_vicentenedor_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_finetuned_vicentenedor_pipeline pipeline DistilBertForSequenceClassification from vicentenedor +author: John Snow Labs +name: distilbert_finetuned_vicentenedor_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_vicentenedor_pipeline` is a English model originally trained by vicentenedor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_vicentenedor_pipeline_en_5.5.0_3.0_1726841231438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_vicentenedor_pipeline_en_5.5.0_3.0_1726841231438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+pipeline = PretrainedPipeline("distilbert_finetuned_vicentenedor_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_finetuned_vicentenedor_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_vicentenedor_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/vicentenedor/distilbert-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_finetune_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_finetune_en.md new file mode 100644 index 00000000000000..a1d3bae72e38f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_finetune_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_imdb_finetune DistilBertForSequenceClassification from edogarci +author: John Snow Labs +name: distilbert_imdb_finetune +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_finetune` is a English model originally trained by edogarci. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_finetune_en_5.5.0_3.0_1726860841862.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_finetune_en_5.5.0_3.0_1726860841862.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_finetune","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_finetune", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_finetune| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/edogarci/distilbert-imdb-finetune \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_finetune_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_finetune_pipeline_en.md new file mode 100644 index 00000000000000..f75f5c119857da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_finetune_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_imdb_finetune_pipeline pipeline DistilBertForSequenceClassification from edogarci +author: John Snow Labs +name: distilbert_imdb_finetune_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_finetune_pipeline` is a English model originally trained by edogarci. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_finetune_pipeline_en_5.5.0_3.0_1726860854146.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_finetune_pipeline_en_5.5.0_3.0_1726860854146.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+pipeline = PretrainedPipeline("distilbert_imdb_finetune_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_imdb_finetune_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_finetune_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/edogarci/distilbert-imdb-finetune + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_huggingface_ysphang_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_huggingface_ysphang_en.md new file mode 100644 index 00000000000000..0eb734f6655c09 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_huggingface_ysphang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_imdb_huggingface_ysphang DistilBertForSequenceClassification from ysphang +author: John Snow Labs +name: distilbert_imdb_huggingface_ysphang +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_huggingface_ysphang` is a English model originally trained by ysphang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_huggingface_ysphang_en_5.5.0_3.0_1726860726167.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_huggingface_ysphang_en_5.5.0_3.0_1726860726167.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_huggingface_ysphang","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_huggingface_ysphang", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_huggingface_ysphang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ysphang/DISTILBERT-IMDB-HUGGINGFACE \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_huggingface_ysphang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_huggingface_ysphang_pipeline_en.md new file mode 100644 index 00000000000000..8e903fc646a172 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_huggingface_ysphang_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_imdb_huggingface_ysphang_pipeline pipeline DistilBertForSequenceClassification from ysphang +author: John Snow Labs +name: distilbert_imdb_huggingface_ysphang_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_huggingface_ysphang_pipeline` is a English model originally trained by ysphang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_huggingface_ysphang_pipeline_en_5.5.0_3.0_1726860742430.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_huggingface_ysphang_pipeline_en_5.5.0_3.0_1726860742430.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+pipeline = PretrainedPipeline("distilbert_imdb_huggingface_ysphang_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_imdb_huggingface_ysphang_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_huggingface_ysphang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ysphang/DISTILBERT-IMDB-HUGGINGFACE + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_padding70model_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_padding70model_en.md new file mode 100644 index 00000000000000..59429875626677 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_padding70model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_imdb_padding70model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_imdb_padding70model +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_padding70model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_padding70model_en_5.5.0_3.0_1726823689037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_padding70model_en_5.5.0_3.0_1726823689037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_padding70model","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_padding70model", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
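+
+Throughput and memory use can be tuned on the classifier stage itself. A sketch only; the values below are illustrative assumptions, not settings taken from the original model:
+
+```python
+# optional tuning: larger batches help GPU throughput, max sentence length bounds memory
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_padding70model", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class") \
+    .setBatchSize(8) \
+    .setMaxSentenceLength(512)
+```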
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_padding70model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_imdb_padding70model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_padding70model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_padding70model_pipeline_en.md new file mode 100644 index 00000000000000..488677c2b64331 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_padding70model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_imdb_padding70model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_imdb_padding70model_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_padding70model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_padding70model_pipeline_en_5.5.0_3.0_1726823700720.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_padding70model_pipeline_en_5.5.0_3.0_1726823700720.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+pipeline = PretrainedPipeline("distilbert_imdb_padding70model_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_imdb_padding70model_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_padding70model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_imdb_padding70model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_intent_tanmoyeeroy_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_intent_tanmoyeeroy_en.md new file mode 100644 index 00000000000000..88cb9c046417dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_intent_tanmoyeeroy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_intent_tanmoyeeroy DistilBertForSequenceClassification from TanmoyeeRoy +author: John Snow Labs +name: distilbert_intent_tanmoyeeroy +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_intent_tanmoyeeroy` is a English model originally trained by TanmoyeeRoy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_intent_tanmoyeeroy_en_5.5.0_3.0_1726829858647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_intent_tanmoyeeroy_en_5.5.0_3.0_1726829858647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_intent_tanmoyeeroy","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_intent_tanmoyeeroy", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_intent_tanmoyeeroy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/TanmoyeeRoy/distilbert_intent \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_intent_tanmoyeeroy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_intent_tanmoyeeroy_pipeline_en.md new file mode 100644 index 00000000000000..418dbbb068563e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_intent_tanmoyeeroy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_intent_tanmoyeeroy_pipeline pipeline DistilBertForSequenceClassification from TanmoyeeRoy +author: John Snow Labs +name: distilbert_intent_tanmoyeeroy_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_intent_tanmoyeeroy_pipeline` is a English model originally trained by TanmoyeeRoy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_intent_tanmoyeeroy_pipeline_en_5.5.0_3.0_1726829870358.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_intent_tanmoyeeroy_pipeline_en_5.5.0_3.0_1726829870358.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+pipeline = PretrainedPipeline("distilbert_intent_tanmoyeeroy_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_intent_tanmoyeeroy_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_intent_tanmoyeeroy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/TanmoyeeRoy/distilbert_intent + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_product_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_product_classifier_en.md new file mode 100644 index 00000000000000..5db08f8f7804d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_product_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_product_classifier BertForSequenceClassification from SavvySpender +author: John Snow Labs +name: distilbert_product_classifier +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_product_classifier` is a English model originally trained by SavvySpender. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_product_classifier_en_5.5.0_3.0_1726803593326.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_product_classifier_en_5.5.0_3.0_1726803593326.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("distilbert_product_classifier","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("distilbert_product_classifier", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
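+
+Because this model weighs in at roughly 1.3 GB, it is worth persisting the fitted pipeline so later runs load it from disk instead of downloading it again. A sketch using standard Spark ML persistence; the path is an arbitrary example and the variable names come from the snippet above:
+
+```python
+from pyspark.ml import PipelineModel
+
+# save the fitted pipeline, including the downloaded model, to local or distributed storage
+pipelineModel.write().overwrite().save("/tmp/distilbert_product_classifier_pipeline")
+
+# reload and reuse it without hitting the model hub again
+restored = PipelineModel.load("/tmp/distilbert_product_classifier_pipeline")
+restored.transform(data).select("class.result").show(truncate=False)
+```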
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_product_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/SavvySpender/distilbert-product-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_product_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_product_classifier_pipeline_en.md new file mode 100644 index 00000000000000..233f038cf1bf93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_product_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_product_classifier_pipeline pipeline BertForSequenceClassification from SavvySpender +author: John Snow Labs +name: distilbert_product_classifier_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_product_classifier_pipeline` is a English model originally trained by SavvySpender. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_product_classifier_pipeline_en_5.5.0_3.0_1726803653229.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_product_classifier_pipeline_en_5.5.0_3.0_1726803653229.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+pipeline = PretrainedPipeline("distilbert_product_classifier_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_product_classifier_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_product_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/SavvySpender/distilbert-product-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_reviews_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_reviews_en.md new file mode 100644 index 00000000000000..39c92db51ec67c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_reviews_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_reviews DistilBertForSequenceClassification from AGnatkiv +author: John Snow Labs +name: distilbert_reviews +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_reviews` is a English model originally trained by AGnatkiv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_reviews_en_5.5.0_3.0_1726841103867.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_reviews_en_5.5.0_3.0_1726841103867.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_reviews","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_reviews", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
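+
+Scoring several review texts at once only requires more rows in the input DataFrame; the fitted pipeline above is reused as-is. A sketch with made-up example sentences:
+
+```python
+reviews = spark.createDataFrame(
+    [["The product arrived on time and works great"],
+     ["Terrible quality, it broke after two days"]]
+).toDF("text")
+
+pipelineModel.transform(reviews).select("text", "class.result").show(truncate=False)
+```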
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_reviews| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AGnatkiv/distilbert-reviews \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_reviews_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_reviews_pipeline_en.md new file mode 100644 index 00000000000000..6af8e599a1f7ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_reviews_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_reviews_pipeline pipeline DistilBertForSequenceClassification from AGnatkiv +author: John Snow Labs +name: distilbert_reviews_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_reviews_pipeline` is a English model originally trained by AGnatkiv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_reviews_pipeline_en_5.5.0_3.0_1726841120912.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_reviews_pipeline_en_5.5.0_3.0_1726841120912.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+pipeline = PretrainedPipeline("distilbert_reviews_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_reviews_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_reviews_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AGnatkiv/distilbert-reviews + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_cola_384_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_cola_384_en.md new file mode 100644 index 00000000000000..74299fd5ba9b89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_cola_384_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_cola_384 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_cola_384 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_cola_384` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_cola_384_en_5.5.0_3.0_1726842085285.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_cola_384_en_5.5.0_3.0_1726842085285.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_cola_384","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_cola_384", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_cola_384| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|111.8 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_cola_384 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_cola_384_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_cola_384_pipeline_en.md new file mode 100644 index 00000000000000..1f9f43b2f78f43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_cola_384_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_cola_384_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_cola_384_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_cola_384_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_cola_384_pipeline_en_5.5.0_3.0_1726842090763.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_cola_384_pipeline_en_5.5.0_3.0_1726842090763.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_cola_384_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_cola_384_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_cola_384_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|111.8 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_cola_384 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_cola_96_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_cola_96_en.md new file mode 100644 index 00000000000000..05431ac042f308 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_cola_96_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_cola_96 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_cola_96 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_cola_96` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_cola_96_en_5.5.0_3.0_1726849132877.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_cola_96_en_5.5.0_3.0_1726849132877.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_cola_96","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_cola_96", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
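+
+At roughly 26 MB this distilled variant is small enough for interactive use. A sketch with `LightPipeline`, which runs the fitted stages on plain Python strings without launching Spark jobs; the names reuse the example above, where the classifier's output column is `class`:
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+
+# single string in, dict of annotations out
+print(light.annotate("I love spark-nlp")["class"])
+
+# a list of strings returns one dict per input
+print(light.annotate(["I love spark-nlp", "This sentence no verb"]))
+```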
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_cola_96| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|25.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_cola_96 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_data_aug_qqp_192_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_data_aug_qqp_192_en.md new file mode 100644 index 00000000000000..55d351d8981aaa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_data_aug_qqp_192_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_data_aug_qqp_192 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_data_aug_qqp_192 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_data_aug_qqp_192` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_qqp_192_en_5.5.0_3.0_1726832601703.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_qqp_192_en_5.5.0_3.0_1726832601703.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_data_aug_qqp_192","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_data_aug_qqp_192", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_data_aug_qqp_192| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|52.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_data_aug_qqp_192 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_data_aug_qqp_192_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_data_aug_qqp_192_pipeline_en.md new file mode 100644 index 00000000000000..7d203fff22217c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_data_aug_qqp_192_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_data_aug_qqp_192_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_data_aug_qqp_192_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_data_aug_qqp_192_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_qqp_192_pipeline_en_5.5.0_3.0_1726832604590.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_qqp_192_pipeline_en_5.5.0_3.0_1726832604590.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with a "text" column, e.g.
# df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_data_aug_qqp_192_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with a "text" column, e.g.
// val df = Seq("I love spark-nlp").toDF("text") (with spark.implicits._ in scope)
val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_data_aug_qqp_192_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_data_aug_qqp_192_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|52.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_data_aug_qqp_192 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_96_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_96_en.md new file mode 100644 index 00000000000000..ca06a5717c53be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_96_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_96 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_96 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_96` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_96_en_5.5.0_3.0_1726823448373.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_96_en_5.5.0_3.0_1726823448373.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_96","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_96", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
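The classifier writes its prediction to the `class` annotation column. A quick way to inspect it, assuming the Python example above was executed, is:

```python
# Show each input text next to its predicted label from the `class` column
pipelineDF.select("text", "class.result").show(truncate=False)
```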
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_96| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|25.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_data_aug_qnli_96 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_96_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_96_pipeline_en.md new file mode 100644 index 00000000000000..a36e92425206e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_96_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_96_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_96_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_96_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_96_pipeline_en_5.5.0_3.0_1726823450158.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_96_pipeline_en_5.5.0_3.0_1726823450158.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with a "text" column, e.g.
# df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_96_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with a "text" column, e.g.
// val df = Seq("I love spark-nlp").toDF("text") (with spark.implicits._ in scope)
val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_96_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qnli_96_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|25.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_data_aug_qnli_96 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_en.md new file mode 100644 index 00000000000000..0f76c018c2e2d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_en_5.5.0_3.0_1726840896213.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_en_5.5.0_3.0_1726840896213.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
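Predictions land in the `class` annotation column. Assuming the Python pipeline above has been run, they can be viewed with:

```python
# Show each input text next to its predicted label from the `class` column
pipelineDF.select("text", "class.result").show(truncate=False)
```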
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|251.2 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_data_aug_qqp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_pipeline_en.md new file mode 100644 index 00000000000000..ad666099497c17 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_pipeline_en_5.5.0_3.0_1726840909601.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_pipeline_en_5.5.0_3.0_1726840909601.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with a "text" column, e.g.
# df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with a "text" column, e.g.
// val df = Seq("I love spark-nlp").toDF("text") (with spark.implicits._ in scope)
val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|251.2 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_data_aug_qqp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_wnli_256_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_wnli_256_en.md new file mode 100644 index 00000000000000..6fe1ece2aab009 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_wnli_256_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_wnli_256 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_wnli_256 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_wnli_256` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_wnli_256_en_5.5.0_3.0_1726830099761.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_wnli_256_en_5.5.0_3.0_1726830099761.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_wnli_256","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_wnli_256", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
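The predicted label is returned in the `class` annotation column. As a minimal sanity check, assuming the Python pipeline above has been run, it can be inspected with:

```python
# Show each input text next to its predicted label from the `class` column
pipelineDF.select("text", "class.result").show(truncate=False)
```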
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_wnli_256| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|71.6 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_data_aug_wnli_256 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_wnli_256_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_wnli_256_pipeline_en.md new file mode 100644 index 00000000000000..db7cd23fd1cc2d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_wnli_256_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_wnli_256_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_wnli_256_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_wnli_256_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_wnli_256_pipeline_en_5.5.0_3.0_1726830103679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_wnli_256_pipeline_en_5.5.0_3.0_1726830103679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with a "text" column, e.g.
# df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_wnli_256_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with a "text" column, e.g.
// val df = Seq("I love spark-nlp").toDF("text") (with spark.implicits._ in scope)
val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_wnli_256_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_wnli_256_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|71.6 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_data_aug_wnli_256 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_sst2_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_sst2_en.md new file mode 100644 index 00000000000000..8ad2c7686c12b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_sst2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_sst2 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_sst2 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_sst2` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_sst2_en_5.5.0_3.0_1726832625532.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_sst2_en_5.5.0_3.0_1726832625532.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_sst2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_sst2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
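The output of interest is the `class` annotation column. Assuming the Python example above was run, a quick check looks like this:

```python
# Show each input text next to its predicted label from the `class` column
pipelineDF.select("text", "class.result").show(truncate=False)
```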
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_sst2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_pretrain_sst2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_sst2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_sst2_pipeline_en.md new file mode 100644 index 00000000000000..6cd4f218453896 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_sst2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_sst2_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_sst2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_sst2_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_sst2_pipeline_en_5.5.0_3.0_1726832638034.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_sst2_pipeline_en_5.5.0_3.0_1726832638034.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with a "text" column, e.g.
# df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_sst2_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with a "text" column, e.g.
// val df = Seq("I love spark-nlp").toDF("text") (with spark.implicits._ in scope)
val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_sst2_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_sst2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_pretrain_sst2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_qqp_192_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_qqp_192_en.md new file mode 100644 index 00000000000000..fed3b4f865374e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_qqp_192_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_qqp_192 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_qqp_192 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_qqp_192` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_qqp_192_en_5.5.0_3.0_1726829840449.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_qqp_192_en_5.5.0_3.0_1726829840449.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_qqp_192","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_qqp_192", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
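The predicted label for each row ends up in the `class` annotation column. Assuming the Python pipeline above has been run, it can be displayed with:

```python
# Show each input text next to its predicted label from the `class` column
pipelineDF.select("text", "class.result").show(truncate=False)
```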
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_qqp_192| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|52.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_qqp_192 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_qqp_192_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_qqp_192_pipeline_en.md new file mode 100644 index 00000000000000..e49c8e59770a20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_qqp_192_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_qqp_192_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_qqp_192_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_qqp_192_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_qqp_192_pipeline_en_5.5.0_3.0_1726829843936.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_qqp_192_pipeline_en_5.5.0_3.0_1726829843936.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with a "text" column, e.g.
# df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_qqp_192_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with a "text" column, e.g.
// val df = Seq("I love spark-nlp").toDF("text") (with spark.implicits._ in scope)
val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_qqp_192_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_qqp_192_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|52.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_qqp_192 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_scam_classifier_v1_1_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_scam_classifier_v1_1_en.md new file mode 100644 index 00000000000000..38d4168ba3abe5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_scam_classifier_v1_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_scam_classifier_v1_1 DistilBertForSequenceClassification from BothBosu +author: John Snow Labs +name: distilbert_scam_classifier_v1_1 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_scam_classifier_v1_1` is a English model originally trained by BothBosu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_scam_classifier_v1_1_en_5.5.0_3.0_1726841091839.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_scam_classifier_v1_1_en_5.5.0_3.0_1726841091839.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_scam_classifier_v1_1","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_scam_classifier_v1_1", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
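The classifier's prediction is stored in the `class` annotation column. Assuming the Python pipeline above has been run, it can be inspected with:

```python
# Show each input text next to its predicted label from the `class` column
pipelineDF.select("text", "class.result").show(truncate=False)
```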
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_scam_classifier_v1_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BothBosu/distilbert-scam-classifier-v1.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_scam_classifier_v1_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_scam_classifier_v1_1_pipeline_en.md new file mode 100644 index 00000000000000..424feb91b4770f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_scam_classifier_v1_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_scam_classifier_v1_1_pipeline pipeline DistilBertForSequenceClassification from BothBosu +author: John Snow Labs +name: distilbert_scam_classifier_v1_1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_scam_classifier_v1_1_pipeline` is a English model originally trained by BothBosu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_scam_classifier_v1_1_pipeline_en_5.5.0_3.0_1726841104197.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_scam_classifier_v1_1_pipeline_en_5.5.0_3.0_1726841104197.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with a "text" column, e.g.
# df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_scam_classifier_v1_1_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with a "text" column, e.g.
// val df = Seq("I love spark-nlp").toDF("text") (with spark.implicits._ in scope)
val pipeline = new PretrainedPipeline("distilbert_scam_classifier_v1_1_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_scam_classifier_v1_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BothBosu/distilbert-scam-classifier-v1.1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sql_timeout_classifier_with_features_4096_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sql_timeout_classifier_with_features_4096_en.md new file mode 100644 index 00000000000000..dda2c342f11484 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sql_timeout_classifier_with_features_4096_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sql_timeout_classifier_with_features_4096 DistilBertForSequenceClassification from Lifehouse +author: John Snow Labs +name: distilbert_sql_timeout_classifier_with_features_4096 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sql_timeout_classifier_with_features_4096` is a English model originally trained by Lifehouse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sql_timeout_classifier_with_features_4096_en_5.5.0_3.0_1726823659753.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sql_timeout_classifier_with_features_4096_en_5.5.0_3.0_1726823659753.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sql_timeout_classifier_with_features_4096","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sql_timeout_classifier_with_features_4096", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
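As with the other sequence classifiers on this page, the prediction is written to the `class` annotation column. Assuming the Python pipeline above has been run, it can be viewed with:

```python
# Show each input text next to its predicted label from the `class` column
pipelineDF.select("text", "class.result").show(truncate=False)
```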
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sql_timeout_classifier_with_features_4096| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|259.8 MB| + +## References + +https://huggingface.co/Lifehouse/distilbert-sql-timeout-classifier-with-features-4096 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sql_timeout_classifier_with_features_4096_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sql_timeout_classifier_with_features_4096_pipeline_en.md new file mode 100644 index 00000000000000..e8fffab08f7e88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sql_timeout_classifier_with_features_4096_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sql_timeout_classifier_with_features_4096_pipeline pipeline DistilBertForSequenceClassification from Lifehouse +author: John Snow Labs +name: distilbert_sql_timeout_classifier_with_features_4096_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sql_timeout_classifier_with_features_4096_pipeline` is a English model originally trained by Lifehouse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sql_timeout_classifier_with_features_4096_pipeline_en_5.5.0_3.0_1726823673687.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sql_timeout_classifier_with_features_4096_pipeline_en_5.5.0_3.0_1726823673687.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with a "text" column, e.g.
# df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_sql_timeout_classifier_with_features_4096_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with a "text" column, e.g.
// val df = Seq("I love spark-nlp").toDF("text") (with spark.implicits._ in scope)
val pipeline = new PretrainedPipeline("distilbert_sql_timeout_classifier_with_features_4096_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sql_timeout_classifier_with_features_4096_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|259.8 MB| + +## References + +https://huggingface.co/Lifehouse/distilbert-sql-timeout-classifier-with-features-4096 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sst2_padding40model_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sst2_padding40model_en.md new file mode 100644 index 00000000000000..29a9d66e5075e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sst2_padding40model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sst2_padding40model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_sst2_padding40model +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sst2_padding40model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sst2_padding40model_en_5.5.0_3.0_1726842183841.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sst2_padding40model_en_5.5.0_3.0_1726842183841.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sst2_padding40model","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sst2_padding40model", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
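The sentiment prediction is written to the `class` annotation column. Assuming the Python pipeline above has been run, it can be displayed with:

```python
# Show each input text next to its predicted label from the `class` column
pipelineDF.select("text", "class.result").show(truncate=False)
```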
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sst2_padding40model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_sst2_padding40model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sst2_padding40model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sst2_padding40model_pipeline_en.md new file mode 100644 index 00000000000000..e06a3aa502ad9f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sst2_padding40model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sst2_padding40model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_sst2_padding40model_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sst2_padding40model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sst2_padding40model_pipeline_en_5.5.0_3.0_1726842195824.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sst2_padding40model_pipeline_en_5.5.0_3.0_1726842195824.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with a "text" column, e.g.
# df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_sst2_padding40model_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with a "text" column, e.g.
// val df = Seq("I love spark-nlp").toDF("text") (with spark.implicits._ in scope)
val pipeline = new PretrainedPipeline("distilbert_sst2_padding40model_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sst2_padding40model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_sst2_padding40model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_stackoverflow_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_stackoverflow_pipeline_en.md new file mode 100644 index 00000000000000..52707ff4b252ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_stackoverflow_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_stackoverflow_pipeline pipeline DistilBertForSequenceClassification from liambyrne +author: John Snow Labs +name: distilbert_stackoverflow_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_stackoverflow_pipeline` is a English model originally trained by liambyrne. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_stackoverflow_pipeline_en_5.5.0_3.0_1726809489144.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_stackoverflow_pipeline_en_5.5.0_3.0_1726809489144.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with a "text" column, e.g.
# df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_stackoverflow_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with a "text" column, e.g.
// val df = Seq("I love spark-nlp").toDF("text") (with spark.implicits._ in scope)
val pipeline = new PretrainedPipeline("distilbert_stackoverflow_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_stackoverflow_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/liambyrne/distilbert-stackoverflow + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_twitterfin_padding30model_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_twitterfin_padding30model_en.md new file mode 100644 index 00000000000000..035f9afa70516f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_twitterfin_padding30model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_twitterfin_padding30model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_twitterfin_padding30model +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_twitterfin_padding30model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding30model_en_5.5.0_3.0_1726842359839.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding30model_en_5.5.0_3.0_1726842359839.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_twitterfin_padding30model","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_twitterfin_padding30model", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
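The predicted label is placed in the `class` annotation column. Assuming the Python pipeline above has been run, it can be checked with:

```python
# Show each input text next to its predicted label from the `class` column
pipelineDF.select("text", "class.result").show(truncate=False)
```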
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_twitterfin_padding30model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_twitterfin_padding30model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_twitterfin_padding30model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_twitterfin_padding30model_pipeline_en.md new file mode 100644 index 00000000000000..715305ca7a901c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_twitterfin_padding30model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_twitterfin_padding30model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_twitterfin_padding30model_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_twitterfin_padding30model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding30model_pipeline_en_5.5.0_3.0_1726842372352.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding30model_pipeline_en_5.5.0_3.0_1726842372352.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_twitterfin_padding30model_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_twitterfin_padding30model_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_twitterfin_padding30model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_twitterfin_padding30model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distillbert_edu_en.md b/docs/_posts/ahmedlone127/2024-09-20-distillbert_edu_en.md new file mode 100644 index 00000000000000..678973aaf6575a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distillbert_edu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distillbert_edu RoBertaForSequenceClassification from debajyotidatta +author: John Snow Labs +name: distillbert_edu +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distillbert_edu` is a English model originally trained by debajyotidatta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distillbert_edu_en_5.5.0_3.0_1726849784814.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distillbert_edu_en_5.5.0_3.0_1726849784814.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("distillbert_edu","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("distillbert_edu", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distillbert_edu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|426.5 MB| + +## References + +https://huggingface.co/debajyotidatta/distillbert_edu \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distillbert_edu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distillbert_edu_pipeline_en.md new file mode 100644 index 00000000000000..925da56919b780 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distillbert_edu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distillbert_edu_pipeline pipeline RoBertaForSequenceClassification from debajyotidatta +author: John Snow Labs +name: distillbert_edu_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distillbert_edu_pipeline` is a English model originally trained by debajyotidatta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distillbert_edu_pipeline_en_5.5.0_3.0_1726849818289.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distillbert_edu_pipeline_en_5.5.0_3.0_1726849818289.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("distillbert_edu_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distillbert_edu_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distillbert_edu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|426.5 MB| + +## References + +https://huggingface.co/debajyotidatta/distillbert_edu + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distillbert_qsc_en.md b/docs/_posts/ahmedlone127/2024-09-20-distillbert_qsc_en.md new file mode 100644 index 00000000000000..7c875b265b08e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distillbert_qsc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distillbert_qsc DistilBertForSequenceClassification from thehyperpineapple +author: John Snow Labs +name: distillbert_qsc +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distillbert_qsc` is a English model originally trained by thehyperpineapple. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distillbert_qsc_en_5.5.0_3.0_1726824079221.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distillbert_qsc_en_5.5.0_3.0_1726824079221.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distillbert_qsc","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distillbert_qsc", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distillbert_qsc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thehyperpineapple/DistillBERT-QSC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distillbert_qsc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distillbert_qsc_pipeline_en.md new file mode 100644 index 00000000000000..d1c4e5a837e552 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distillbert_qsc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distillbert_qsc_pipeline pipeline DistilBertForSequenceClassification from thehyperpineapple +author: John Snow Labs +name: distillbert_qsc_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distillbert_qsc_pipeline` is a English model originally trained by thehyperpineapple. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distillbert_qsc_pipeline_en_5.5.0_3.0_1726824090864.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distillbert_qsc_pipeline_en_5.5.0_3.0_1726824090864.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("distillbert_qsc_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distillbert_qsc_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distillbert_qsc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thehyperpineapple/DistillBERT-QSC + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilroberta_base_ep20_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilroberta_base_ep20_en.md new file mode 100644 index 00000000000000..a3c11790d09e39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilroberta_base_ep20_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_ep20 RoBertaEmbeddings from judy93536 +author: John Snow Labs +name: distilroberta_base_ep20 +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ep20` is a English model originally trained by judy93536. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ep20_en_5.5.0_3.0_1726816249060.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ep20_en_5.5.0_3.0_1726816249060.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ep20","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ep20","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
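
The output `embeddings` column holds one annotation per token, each carrying its own float vector. A minimal sketch for inspecting tokens alongside their vectors, assuming the pipeline above has already been run:

```python
from pyspark.sql.functions import explode

# Explode to one row per token; "result" is the token text, "embeddings" its vector
pipelineDF.select(explode("embeddings").alias("emb")) \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(truncate=False)
```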
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ep20| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.4 MB| + +## References + +https://huggingface.co/judy93536/distilroberta-base-ep20 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilroberta_base_ft_askwomen_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilroberta_base_ft_askwomen_en.md new file mode 100644 index 00000000000000..4bcf951de6e063 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilroberta_base_ft_askwomen_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_ft_askwomen RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_askwomen +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_askwomen` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_askwomen_en_5.5.0_3.0_1726857411524.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_askwomen_en_5.5.0_3.0_1726857411524.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_askwomen","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_askwomen","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_askwomen| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-AskWomen \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilroberta_base_ft_askwomen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilroberta_base_ft_askwomen_pipeline_en.md new file mode 100644 index 00000000000000..943bdfaea82e5a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilroberta_base_ft_askwomen_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_ft_askwomen_pipeline pipeline RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_askwomen_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_askwomen_pipeline` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_askwomen_pipeline_en_5.5.0_3.0_1726857426438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_askwomen_pipeline_en_5.5.0_3.0_1726857426438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("distilroberta_base_ft_askwomen_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilroberta_base_ft_askwomen_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_askwomen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-AskWomen + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilroberta_base_ft_wow_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilroberta_base_ft_wow_en.md new file mode 100644 index 00000000000000..2c003fecc991e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilroberta_base_ft_wow_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_ft_wow RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_wow +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_wow` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_wow_en_5.5.0_3.0_1726815880227.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_wow_en_5.5.0_3.0_1726815880227.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_wow","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_wow","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_wow| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-wow \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilroberta_base_ft_wow_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilroberta_base_ft_wow_pipeline_en.md new file mode 100644 index 00000000000000..a031ac7e4d7a46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilroberta_base_ft_wow_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_ft_wow_pipeline pipeline RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_wow_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_wow_pipeline` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_wow_pipeline_en_5.5.0_3.0_1726815894958.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_wow_pipeline_en_5.5.0_3.0_1726815894958.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("distilroberta_base_ft_wow_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilroberta_base_ft_wow_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_wow_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-wow + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilrubert_tiny_2nd_finetune_epru_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilrubert_tiny_2nd_finetune_epru_en.md new file mode 100644 index 00000000000000..9aca7377fd2454 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilrubert_tiny_2nd_finetune_epru_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilrubert_tiny_2nd_finetune_epru DistilBertForSequenceClassification from mmillet +author: John Snow Labs +name: distilrubert_tiny_2nd_finetune_epru +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilrubert_tiny_2nd_finetune_epru` is a English model originally trained by mmillet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilrubert_tiny_2nd_finetune_epru_en_5.5.0_3.0_1726848739170.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilrubert_tiny_2nd_finetune_epru_en_5.5.0_3.0_1726848739170.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilrubert_tiny_2nd_finetune_epru","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilrubert_tiny_2nd_finetune_epru", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilrubert_tiny_2nd_finetune_epru| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|39.2 MB| + +## References + +https://huggingface.co/mmillet/distilrubert-tiny-2nd-finetune-epru \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilrubert_tiny_2nd_finetune_epru_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilrubert_tiny_2nd_finetune_epru_pipeline_en.md new file mode 100644 index 00000000000000..b745136675ee08 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilrubert_tiny_2nd_finetune_epru_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilrubert_tiny_2nd_finetune_epru_pipeline pipeline DistilBertForSequenceClassification from mmillet +author: John Snow Labs +name: distilrubert_tiny_2nd_finetune_epru_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilrubert_tiny_2nd_finetune_epru_pipeline` is a English model originally trained by mmillet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilrubert_tiny_2nd_finetune_epru_pipeline_en_5.5.0_3.0_1726848741278.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilrubert_tiny_2nd_finetune_epru_pipeline_en_5.5.0_3.0_1726848741278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("distilrubert_tiny_2nd_finetune_epru_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilrubert_tiny_2nd_finetune_epru_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilrubert_tiny_2nd_finetune_epru_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|39.3 MB| + +## References + +https://huggingface.co/mmillet/distilrubert-tiny-2nd-finetune-epru + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-email_answer_extraction_en.md b/docs/_posts/ahmedlone127/2024-09-20-email_answer_extraction_en.md new file mode 100644 index 00000000000000..6cc2e672df9894 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-email_answer_extraction_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English email_answer_extraction RoBertaForTokenClassification from arya555 +author: John Snow Labs +name: email_answer_extraction +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`email_answer_extraction` is a English model originally trained by arya555. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/email_answer_extraction_en_5.5.0_3.0_1726853165085.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/email_answer_extraction_en_5.5.0_3.0_1726853165085.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("email_answer_extraction","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("email_answer_extraction", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
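
For token classification the `ner` column carries one tag per token, aligned with the `token` column. A short sketch for viewing tokens next to their predicted tags (the tag set itself comes from the fine-tuned model and is not documented in this card):

```python
# Pair tokens with their predicted tags
pipelineDF.selectExpr("token.result as tokens", "ner.result as tags").show(truncate=False)
```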
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|email_answer_extraction| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|422.0 MB| + +## References + +https://huggingface.co/arya555/email_answer_extraction \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-email_answer_extraction_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-email_answer_extraction_pipeline_en.md new file mode 100644 index 00000000000000..99592426ee612d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-email_answer_extraction_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English email_answer_extraction_pipeline pipeline RoBertaForTokenClassification from arya555 +author: John Snow Labs +name: email_answer_extraction_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`email_answer_extraction_pipeline` is a English model originally trained by arya555. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/email_answer_extraction_pipeline_en_5.5.0_3.0_1726853200290.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/email_answer_extraction_pipeline_en_5.5.0_3.0_1726853200290.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("email_answer_extraction_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("email_answer_extraction_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|email_answer_extraction_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|422.0 MB| + +## References + +https://huggingface.co/arya555/email_answer_extraction + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed0_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed0_bernice_en.md new file mode 100644 index 00000000000000..693b59958fade9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed0_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emoji_emoji_random0_seed0_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random0_seed0_bernice +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random0_seed0_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed0_bernice_en_5.5.0_3.0_1726872128950.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed0_bernice_en_5.5.0_3.0_1726872128950.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("emoji_emoji_random0_seed0_bernice","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("emoji_emoji_random0_seed0_bernice", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random0_seed0_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|825.3 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random0_seed0-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed0_bernice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed0_bernice_pipeline_en.md new file mode 100644 index 00000000000000..8529ba9df28f8e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed0_bernice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emoji_emoji_random0_seed0_bernice_pipeline pipeline XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random0_seed0_bernice_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random0_seed0_bernice_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed0_bernice_pipeline_en_5.5.0_3.0_1726872257778.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed0_bernice_pipeline_en_5.5.0_3.0_1726872257778.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("emoji_emoji_random0_seed0_bernice_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("emoji_emoji_random0_seed0_bernice_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random0_seed0_bernice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|825.4 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random0_seed0-bernice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed1_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed1_bernice_en.md new file mode 100644 index 00000000000000..db8e4dce97a937 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed1_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emoji_emoji_random0_seed1_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random0_seed1_bernice +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random0_seed1_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed1_bernice_en_5.5.0_3.0_1726866019659.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed1_bernice_en_5.5.0_3.0_1726866019659.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("emoji_emoji_random0_seed1_bernice","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("emoji_emoji_random0_seed1_bernice", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random0_seed1_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|825.3 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random0_seed1-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed1_bernice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed1_bernice_pipeline_en.md new file mode 100644 index 00000000000000..585cc377a98f50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed1_bernice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emoji_emoji_random0_seed1_bernice_pipeline pipeline XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random0_seed1_bernice_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random0_seed1_bernice_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed1_bernice_pipeline_en_5.5.0_3.0_1726866146519.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed1_bernice_pipeline_en_5.5.0_3.0_1726866146519.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("emoji_emoji_random0_seed1_bernice_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("emoji_emoji_random0_seed1_bernice_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random0_seed1_bernice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|825.4 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random0_seed1-bernice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_en.md b/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_en.md new file mode 100644 index 00000000000000..2e7c5aa5561fe3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1726852356671.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1726852356671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.4 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random0_seed1-twitter-roberta-base-2022-154m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_pipeline_en.md new file mode 100644 index 00000000000000..ba5fa169e058ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1726852379610.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1726852379610.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.4 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random0_seed1-twitter-roberta-base-2022-154m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random2_seed0_twitter_roberta_base_2019_90m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random2_seed0_twitter_roberta_base_2019_90m_pipeline_en.md new file mode 100644 index 00000000000000..cee7395f4b4fe0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random2_seed0_twitter_roberta_base_2019_90m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emoji_emoji_random2_seed0_twitter_roberta_base_2019_90m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random2_seed0_twitter_roberta_base_2019_90m_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random2_seed0_twitter_roberta_base_2019_90m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random2_seed0_twitter_roberta_base_2019_90m_pipeline_en_5.5.0_3.0_1726805353986.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random2_seed0_twitter_roberta_base_2019_90m_pipeline_en_5.5.0_3.0_1726805353986.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame with the raw input (typically a "text" column)
+pipeline = PretrainedPipeline("emoji_emoji_random2_seed0_twitter_roberta_base_2019_90m_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame with the raw input (typically a "text" column)
+val pipeline = new PretrainedPipeline("emoji_emoji_random2_seed0_twitter_roberta_base_2019_90m_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random2_seed0_twitter_roberta_base_2019_90m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.6 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random2_seed0-twitter-roberta-base-2019-90m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-emotion_detector_en.md b/docs/_posts/ahmedlone127/2024-09-20-emotion_detector_en.md new file mode 100644 index 00000000000000..d6c47a5581fbdc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-emotion_detector_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emotion_detector DistilBertForSequenceClassification from Foulbubble +author: John Snow Labs +name: emotion_detector +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_detector` is a English model originally trained by Foulbubble. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_detector_en_5.5.0_3.0_1726809311460.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_detector_en_5.5.0_3.0_1726809311460.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("emotion_detector","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("emotion_detector", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
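+
+The predictions land in the `class` output column of the transformed DataFrame. As a minimal sketch (column names follow the example above; the per-label scores are typically kept in the annotation metadata), the labels can be read back out like this:
+
+```python
+# Assumes `pipelineDF` from the snippet above
+pipelineDF.select("text", "class.result").show(truncate=False)
+
+# Full annotations, including the score metadata attached to each prediction
+pipelineDF.selectExpr("explode(class) as prediction") \
+    .select("prediction.result", "prediction.metadata") \
+    .show(truncate=False)
+```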
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_detector| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Foulbubble/Emotion-Detector \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-emotion_detector_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-emotion_detector_pipeline_en.md new file mode 100644 index 00000000000000..78792d62d3ebc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-emotion_detector_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emotion_detector_pipeline pipeline DistilBertForSequenceClassification from Foulbubble +author: John Snow Labs +name: emotion_detector_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_detector_pipeline` is a English model originally trained by Foulbubble. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_detector_pipeline_en_5.5.0_3.0_1726809323317.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_detector_pipeline_en_5.5.0_3.0_1726809323317.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame with the raw input (typically a "text" column)
+pipeline = PretrainedPipeline("emotion_detector_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame with the raw input (typically a "text" column)
+val pipeline = new PretrainedPipeline("emotion_detector_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_detector_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Foulbubble/Emotion-Detector + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-entitylinker_en.md b/docs/_posts/ahmedlone127/2024-09-20-entitylinker_en.md new file mode 100644 index 00000000000000..9b987ff8c13cd7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-entitylinker_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English entitylinker DistilBertForSequenceClassification from hadifar +author: John Snow Labs +name: entitylinker +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`entitylinker` is a English model originally trained by hadifar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/entitylinker_en_5.5.0_3.0_1726840872379.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/entitylinker_en_5.5.0_3.0_1726840872379.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("entitylinker","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("entitylinker", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|entitylinker| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hadifar/entitylinker \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-entitylinker_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-entitylinker_pipeline_en.md new file mode 100644 index 00000000000000..bc7848796a8501 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-entitylinker_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English entitylinker_pipeline pipeline DistilBertForSequenceClassification from hadifar +author: John Snow Labs +name: entitylinker_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`entitylinker_pipeline` is a English model originally trained by hadifar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/entitylinker_pipeline_en_5.5.0_3.0_1726840885806.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/entitylinker_pipeline_en_5.5.0_3.0_1726840885806.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame with the raw input (typically a "text" column)
+pipeline = PretrainedPipeline("entitylinker_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame with the raw input (typically a "text" column)
+val pipeline = new PretrainedPipeline("entitylinker_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|entitylinker_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hadifar/entitylinker + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ep16_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-ep16_pipeline_en.md new file mode 100644 index 00000000000000..31a40e5306bbc5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ep16_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English ep16_pipeline pipeline WhisperForCTC from JoeTan +author: John Snow Labs +name: ep16_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ep16_pipeline` is a English model originally trained by JoeTan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ep16_pipeline_en_5.5.0_3.0_1726791219258.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ep16_pipeline_en_5.5.0_3.0_1726791219258.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame with an audio column of raw float samples, as expected by the pipeline's AudioAssembler
+pipeline = PretrainedPipeline("ep16_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame with an audio column of raw float samples, as expected by the pipeline's AudioAssembler
+val pipeline = new PretrainedPipeline("ep16_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
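+
+`df` here has to carry raw audio rather than text, since the pipeline starts with an AudioAssembler. The following is a hypothetical sketch of preparing such a DataFrame; the `audio_content` column name, the 16 kHz mono sampling, and the use of librosa for decoding are assumptions, not details taken from this card:
+
+```python
+import librosa  # assumed helper for decoding the audio file
+
+# Decode a local file into mono float samples (16 kHz assumed)
+samples, _ = librosa.load("sample_audio.wav", sr=16000, mono=True)
+
+# One row per clip; "audio_content" is the column an AudioAssembler typically reads
+df = spark.createDataFrame([[samples.tolist()]]).toDF("audio_content")
+```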
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ep16_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/JoeTan/Ep16 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-esrf_doc_categorizer_model_en.md b/docs/_posts/ahmedlone127/2024-09-20-esrf_doc_categorizer_model_en.md new file mode 100644 index 00000000000000..86fe4adff76bc5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-esrf_doc_categorizer_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English esrf_doc_categorizer_model DistilBertForSequenceClassification from rgoswami31 +author: John Snow Labs +name: esrf_doc_categorizer_model +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`esrf_doc_categorizer_model` is a English model originally trained by rgoswami31. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/esrf_doc_categorizer_model_en_5.5.0_3.0_1726830413482.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/esrf_doc_categorizer_model_en_5.5.0_3.0_1726830413482.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("esrf_doc_categorizer_model","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("esrf_doc_categorizer_model", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|esrf_doc_categorizer_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rgoswami31/esrf_doc_categorizer_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-esrf_doc_categorizer_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-esrf_doc_categorizer_model_pipeline_en.md new file mode 100644 index 00000000000000..b2eb461838e733 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-esrf_doc_categorizer_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English esrf_doc_categorizer_model_pipeline pipeline DistilBertForSequenceClassification from rgoswami31 +author: John Snow Labs +name: esrf_doc_categorizer_model_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`esrf_doc_categorizer_model_pipeline` is a English model originally trained by rgoswami31. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/esrf_doc_categorizer_model_pipeline_en_5.5.0_3.0_1726830426019.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/esrf_doc_categorizer_model_pipeline_en_5.5.0_3.0_1726830426019.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame with the raw input (typically a "text" column)
+pipeline = PretrainedPipeline("esrf_doc_categorizer_model_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame with the raw input (typically a "text" column)
+val pipeline = new PretrainedPipeline("esrf_doc_categorizer_model_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|esrf_doc_categorizer_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rgoswami31/esrf_doc_categorizer_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-feedback_finetuned_sentiment_model_en.md b/docs/_posts/ahmedlone127/2024-09-20-feedback_finetuned_sentiment_model_en.md new file mode 100644 index 00000000000000..7a9ca4e0f8da74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-feedback_finetuned_sentiment_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English feedback_finetuned_sentiment_model DistilBertForSequenceClassification from divy1810 +author: John Snow Labs +name: feedback_finetuned_sentiment_model +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`feedback_finetuned_sentiment_model` is a English model originally trained by divy1810. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/feedback_finetuned_sentiment_model_en_5.5.0_3.0_1726792509401.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/feedback_finetuned_sentiment_model_en_5.5.0_3.0_1726792509401.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("feedback_finetuned_sentiment_model","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("feedback_finetuned_sentiment_model", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|feedback_finetuned_sentiment_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/divy1810/feedback-finetuned-sentiment-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-feedback_finetuned_sentiment_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-feedback_finetuned_sentiment_model_pipeline_en.md new file mode 100644 index 00000000000000..b1f72d0ab25413 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-feedback_finetuned_sentiment_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English feedback_finetuned_sentiment_model_pipeline pipeline DistilBertForSequenceClassification from divy1810 +author: John Snow Labs +name: feedback_finetuned_sentiment_model_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`feedback_finetuned_sentiment_model_pipeline` is a English model originally trained by divy1810. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/feedback_finetuned_sentiment_model_pipeline_en_5.5.0_3.0_1726792522515.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/feedback_finetuned_sentiment_model_pipeline_en_5.5.0_3.0_1726792522515.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame with the raw input (typically a "text" column)
+pipeline = PretrainedPipeline("feedback_finetuned_sentiment_model_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame with the raw input (typically a "text" column)
+val pipeline = new PretrainedPipeline("feedback_finetuned_sentiment_model_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|feedback_finetuned_sentiment_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/divy1810/feedback-finetuned-sentiment-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-film20000roberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-20-film20000roberta_base_en.md new file mode 100644 index 00000000000000..c136b09880e3ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-film20000roberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English film20000roberta_base RoBertaEmbeddings from AmaiaSolaun +author: John Snow Labs +name: film20000roberta_base +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`film20000roberta_base` is a English model originally trained by AmaiaSolaun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/film20000roberta_base_en_5.5.0_3.0_1726857076171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/film20000roberta_base_en_5.5.0_3.0_1726857076171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("film20000roberta_base","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("film20000roberta_base","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
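+
+Each token's vector is stored in the `embeddings` field of the annotations held in the `embeddings` output column. A small sketch for pulling the raw vectors out of the transformed DataFrame (names follow the example above):
+
+```python
+# Assumes `pipelineDF` from the snippet above; one row per token after the explode
+pipelineDF.selectExpr("explode(embeddings) as embedding") \
+    .selectExpr("embedding.result as token", "embedding.embeddings as vector") \
+    .show(truncate=False)
+```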
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|film20000roberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/AmaiaSolaun/film20000roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-film20000roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-film20000roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..b2c4300f7701a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-film20000roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English film20000roberta_base_pipeline pipeline RoBertaEmbeddings from AmaiaSolaun +author: John Snow Labs +name: film20000roberta_base_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`film20000roberta_base_pipeline` is a English model originally trained by AmaiaSolaun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/film20000roberta_base_pipeline_en_5.5.0_3.0_1726857106040.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/film20000roberta_base_pipeline_en_5.5.0_3.0_1726857106040.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame with the raw input (typically a "text" column)
+pipeline = PretrainedPipeline("film20000roberta_base_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame with the raw input (typically a "text" column)
+val pipeline = new PretrainedPipeline("film20000roberta_base_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|film20000roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/AmaiaSolaun/film20000roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-final_nlp_question1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-final_nlp_question1_pipeline_en.md new file mode 100644 index 00000000000000..14fd848b7cc9cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-final_nlp_question1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English final_nlp_question1_pipeline pipeline DistilBertForSequenceClassification from chamdentimem +author: John Snow Labs +name: final_nlp_question1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_nlp_question1_pipeline` is a English model originally trained by chamdentimem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_nlp_question1_pipeline_en_5.5.0_3.0_1726792281359.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_nlp_question1_pipeline_en_5.5.0_3.0_1726792281359.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame with the raw input (typically a "text" column)
+pipeline = PretrainedPipeline("final_nlp_question1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame with the raw input (typically a "text" column)
+val pipeline = new PretrainedPipeline("final_nlp_question1_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_nlp_question1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chamdentimem/final_nlp_question1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-final_reviews_finetuned_model_epoch_05_en.md b/docs/_posts/ahmedlone127/2024-09-20-final_reviews_finetuned_model_epoch_05_en.md new file mode 100644 index 00000000000000..21820307ee173e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-final_reviews_finetuned_model_epoch_05_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English final_reviews_finetuned_model_epoch_05 DistilBertForSequenceClassification from rahulgaikwad007 +author: John Snow Labs +name: final_reviews_finetuned_model_epoch_05 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_reviews_finetuned_model_epoch_05` is a English model originally trained by rahulgaikwad007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_reviews_finetuned_model_epoch_05_en_5.5.0_3.0_1726849045204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_reviews_finetuned_model_epoch_05_en_5.5.0_3.0_1726849045204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("final_reviews_finetuned_model_epoch_05","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("final_reviews_finetuned_model_epoch_05", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_reviews_finetuned_model_epoch_05| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rahulgaikwad007/Final-reviews-Finetuned-model-Epoch-05 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-final_reviews_finetuned_model_epoch_05_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-final_reviews_finetuned_model_epoch_05_pipeline_en.md new file mode 100644 index 00000000000000..52617793095fde --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-final_reviews_finetuned_model_epoch_05_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English final_reviews_finetuned_model_epoch_05_pipeline pipeline DistilBertForSequenceClassification from rahulgaikwad007 +author: John Snow Labs +name: final_reviews_finetuned_model_epoch_05_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_reviews_finetuned_model_epoch_05_pipeline` is a English model originally trained by rahulgaikwad007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_reviews_finetuned_model_epoch_05_pipeline_en_5.5.0_3.0_1726849057443.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_reviews_finetuned_model_epoch_05_pipeline_en_5.5.0_3.0_1726849057443.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame with the raw input (typically a "text" column)
+pipeline = PretrainedPipeline("final_reviews_finetuned_model_epoch_05_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame with the raw input (typically a "text" column)
+val pipeline = new PretrainedPipeline("final_reviews_finetuned_model_epoch_05_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_reviews_finetuned_model_epoch_05_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rahulgaikwad007/Final-reviews-Finetuned-model-Epoch-05 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_en.md b/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_en.md new file mode 100644 index 00000000000000..cd7a954315a1cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001 BertForQuestionAnswering from muhammadravi251001 +author: John Snow Labs +name: fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001 +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001` is a English model originally trained by muhammadravi251001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_en_5.5.0_3.0_1726820457250.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_en_5.5.0_3.0_1726820457250.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.MultiDocumentAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
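+
+The predicted answer span ends up in the `answer` output column. A minimal sketch for viewing it (names follow the example above):
+
+```python
+# Assumes `pipelineDF` from the snippet above
+pipelineDF.select("answer.result").show(truncate=False)
+```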
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|411.7 MB| + +## References + +https://huggingface.co/muhammadravi251001/fine-tuned-DatasetQAS-IDK-MRC-with-indobert-base-uncased-with-ITTL-without-freeze-LR-1e-05 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline_en.md new file mode 100644 index 00000000000000..13ba902000ce30 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline pipeline BertForQuestionAnswering from muhammadravi251001 +author: John Snow Labs +name: fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline` is a English model originally trained by muhammadravi251001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline_en_5.5.0_3.0_1726820475729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline_en_5.5.0_3.0_1726820475729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame with the question and context columns expected by the pipeline's MultiDocumentAssembler
+pipeline = PretrainedPipeline("fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame with the question and context columns expected by the pipeline's MultiDocumentAssembler
+val pipeline = new PretrainedPipeline("fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.7 MB| + +## References + +https://huggingface.co/muhammadravi251001/fine-tuned-DatasetQAS-IDK-MRC-with-indobert-base-uncased-with-ITTL-without-freeze-LR-1e-05 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_distilbert_for_amazon_book_review_absa_en.md b/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_distilbert_for_amazon_book_review_absa_en.md new file mode 100644 index 00000000000000..f11660234ef227 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_distilbert_for_amazon_book_review_absa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fine_tuned_distilbert_for_amazon_book_review_absa DistilBertForSequenceClassification from ferdynandchan +author: John Snow Labs +name: fine_tuned_distilbert_for_amazon_book_review_absa +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_distilbert_for_amazon_book_review_absa` is a English model originally trained by ferdynandchan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_distilbert_for_amazon_book_review_absa_en_5.5.0_3.0_1726832949048.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_distilbert_for_amazon_book_review_absa_en_5.5.0_3.0_1726832949048.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("fine_tuned_distilbert_for_amazon_book_review_absa","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("fine_tuned_distilbert_for_amazon_book_review_absa", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_distilbert_for_amazon_book_review_absa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ferdynandchan/Fine-Tuned-DistilBERT-for-Amazon-Book-Review-ABSA \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_distilbert_for_amazon_book_review_absa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_distilbert_for_amazon_book_review_absa_pipeline_en.md new file mode 100644 index 00000000000000..127bf2247414f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_distilbert_for_amazon_book_review_absa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fine_tuned_distilbert_for_amazon_book_review_absa_pipeline pipeline DistilBertForSequenceClassification from ferdynandchan +author: John Snow Labs +name: fine_tuned_distilbert_for_amazon_book_review_absa_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_distilbert_for_amazon_book_review_absa_pipeline` is a English model originally trained by ferdynandchan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_distilbert_for_amazon_book_review_absa_pipeline_en_5.5.0_3.0_1726832961447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_distilbert_for_amazon_book_review_absa_pipeline_en_5.5.0_3.0_1726832961447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame with the raw input (typically a "text" column)
+pipeline = PretrainedPipeline("fine_tuned_distilbert_for_amazon_book_review_absa_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame with the raw input (typically a "text" column)
+val pipeline = new PretrainedPipeline("fine_tuned_distilbert_for_amazon_book_review_absa_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_distilbert_for_amazon_book_review_absa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ferdynandchan/Fine-Tuned-DistilBERT-for-Amazon-Book-Review-ABSA + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_model_v1_en.md b/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_model_v1_en.md new file mode 100644 index 00000000000000..9df8a707b723e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_model_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fine_tuned_model_v1 DistilBertForSequenceClassification from rb757 +author: John Snow Labs +name: fine_tuned_model_v1 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_model_v1` is a English model originally trained by rb757. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_model_v1_en_5.5.0_3.0_1726792130422.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_model_v1_en_5.5.0_3.0_1726792130422.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("fine_tuned_model_v1","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("fine_tuned_model_v1", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_model_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rb757/fine_tuned_model_v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_model_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_model_v1_pipeline_en.md new file mode 100644 index 00000000000000..9d2713a3ccf9b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_model_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fine_tuned_model_v1_pipeline pipeline DistilBertForSequenceClassification from rb757 +author: John Snow Labs +name: fine_tuned_model_v1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_model_v1_pipeline` is a English model originally trained by rb757. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_model_v1_pipeline_en_5.5.0_3.0_1726792142880.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_model_v1_pipeline_en_5.5.0_3.0_1726792142880.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame with the raw input (typically a "text" column)
+pipeline = PretrainedPipeline("fine_tuned_model_v1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame with the raw input (typically a "text" column)
+val pipeline = new PretrainedPipeline("fine_tuned_model_v1_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_model_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rb757/fine_tuned_model_v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-fine_tunning_en.md b/docs/_posts/ahmedlone127/2024-09-20-fine_tunning_en.md new file mode 100644 index 00000000000000..42ff66f89b6d1d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-fine_tunning_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fine_tunning DistilBertForSequenceClassification from teagiraldo +author: John Snow Labs +name: fine_tunning +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tunning` is a English model originally trained by teagiraldo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tunning_en_5.5.0_3.0_1726842466689.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tunning_en_5.5.0_3.0_1726842466689.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("fine_tunning","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("fine_tunning", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
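
The fitted pipeline stores its predictions in the `class` column as Spark NLP annotations. A minimal sketch for reading them back, assuming the `pipelineDF` DataFrame produced by the example above:

```python
# Show the predicted label(s) next to the input text.
# Assumes `pipelineDF` from the example pipeline above, with the classifier writing to "class".
pipelineDF.select("text", "class.result").show(truncate=False)
```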
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tunning| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/teagiraldo/fine_tunning \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-fine_tunning_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-fine_tunning_pipeline_en.md new file mode 100644 index 00000000000000..fd22c504355f67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-fine_tunning_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fine_tunning_pipeline pipeline DistilBertForSequenceClassification from teagiraldo +author: John Snow Labs +name: fine_tunning_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tunning_pipeline` is a English model originally trained by teagiraldo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tunning_pipeline_en_5.5.0_3.0_1726842482848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tunning_pipeline_en_5.5.0_3.0_1726842482848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("fine_tunning_pipeline", lang = "en")
# Example input; replace with your own DataFrame containing a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("fine_tunning_pipeline", lang = "en")
// Example input; replace with your own DataFrame containing a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
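
For a quick check on a single string, the loaded pipeline can also be used outside a DataFrame. A minimal sketch, assuming the `pipeline` object from the example above and that the final classifier stage writes to the `class` output column:

```python
# Annotate one string and print the predicted class label(s).
result = pipeline.annotate("I love spark-nlp")
print(result["class"])
```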
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tunning_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/teagiraldo/fine_tunning + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuned_beliefs_sentiment_classifier_experiment2_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuned_beliefs_sentiment_classifier_experiment2_en.md new file mode 100644 index 00000000000000..31490e3fb7ca54 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuned_beliefs_sentiment_classifier_experiment2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_beliefs_sentiment_classifier_experiment2 RoBertaForSequenceClassification from hriaz +author: John Snow Labs +name: finetuned_beliefs_sentiment_classifier_experiment2 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_beliefs_sentiment_classifier_experiment2` is a English model originally trained by hriaz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_beliefs_sentiment_classifier_experiment2_en_5.5.0_3.0_1726804993689.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_beliefs_sentiment_classifier_experiment2_en_5.5.0_3.0_1726804993689.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("finetuned_beliefs_sentiment_classifier_experiment2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("finetuned_beliefs_sentiment_classifier_experiment2", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
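
The fitted pipeline stores its predictions in the `class` column as Spark NLP annotations. A minimal sketch for reading them back, assuming the `pipelineDF` DataFrame produced by the example above:

```python
# Show the predicted label(s) next to the input text.
# Assumes `pipelineDF` from the example pipeline above, with the classifier writing to "class".
pipelineDF.select("text", "class.result").show(truncate=False)
```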
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_beliefs_sentiment_classifier_experiment2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/hriaz/finetuned_beliefs_sentiment_classifier_experiment2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_amazon_sample100000_text_robertamodel_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_amazon_sample100000_text_robertamodel_en.md new file mode 100644 index 00000000000000..7d8582fd976a95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_amazon_sample100000_text_robertamodel_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_amazon_sample100000_text_robertamodel RoBertaForSequenceClassification from hsiuping +author: John Snow Labs +name: finetuning_amazon_sample100000_text_robertamodel +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_amazon_sample100000_text_robertamodel` is a English model originally trained by hsiuping. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_amazon_sample100000_text_robertamodel_en_5.5.0_3.0_1726851763956.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_amazon_sample100000_text_robertamodel_en_5.5.0_3.0_1726851763956.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("finetuning_amazon_sample100000_text_robertamodel","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("finetuning_amazon_sample100000_text_robertamodel", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
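
The fitted pipeline stores its predictions in the `class` column as Spark NLP annotations. A minimal sketch for reading them back, assuming the `pipelineDF` DataFrame produced by the example above:

```python
# Show the predicted label(s) next to the input text.
# Assumes `pipelineDF` from the example pipeline above, with the classifier writing to "class".
pipelineDF.select("text", "class.result").show(truncate=False)
```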
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_amazon_sample100000_text_robertamodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/hsiuping/finetuning-amazon-sample100000-text-RoBERTamodel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_amazon_sample100000_text_robertamodel_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_amazon_sample100000_text_robertamodel_pipeline_en.md new file mode 100644 index 00000000000000..6226be74ef4b6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_amazon_sample100000_text_robertamodel_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_amazon_sample100000_text_robertamodel_pipeline pipeline RoBertaForSequenceClassification from hsiuping +author: John Snow Labs +name: finetuning_amazon_sample100000_text_robertamodel_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_amazon_sample100000_text_robertamodel_pipeline` is a English model originally trained by hsiuping. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_amazon_sample100000_text_robertamodel_pipeline_en_5.5.0_3.0_1726851787746.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_amazon_sample100000_text_robertamodel_pipeline_en_5.5.0_3.0_1726851787746.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("finetuning_amazon_sample100000_text_robertamodel_pipeline", lang = "en")
# Example input; replace with your own DataFrame containing a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_amazon_sample100000_text_robertamodel_pipeline", lang = "en")
// Example input; replace with your own DataFrame containing a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
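
For a quick check on a single string, the loaded pipeline can also be used outside a DataFrame. A minimal sketch, assuming the `pipeline` object from the example above and that the final classifier stage writes to the `class` output column:

```python
# Annotate one string and print the predicted class label(s).
result = pipeline.annotate("I love spark-nlp")
print(result["class"])
```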
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_amazon_sample100000_text_robertamodel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/hsiuping/finetuning-amazon-sample100000-text-RoBERTamodel + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_imdb_sentiment_model_3000_samples_k_ray_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_imdb_sentiment_model_3000_samples_k_ray_en.md new file mode 100644 index 00000000000000..e49baeda9ddc69 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_imdb_sentiment_model_3000_samples_k_ray_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_imdb_sentiment_model_3000_samples_k_ray DistilBertForSequenceClassification from K-Ray +author: John Snow Labs +name: finetuning_imdb_sentiment_model_3000_samples_k_ray +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_imdb_sentiment_model_3000_samples_k_ray` is a English model originally trained by K-Ray. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_imdb_sentiment_model_3000_samples_k_ray_en_5.5.0_3.0_1726792467845.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_imdb_sentiment_model_3000_samples_k_ray_en_5.5.0_3.0_1726792467845.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_imdb_sentiment_model_3000_samples_k_ray","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_imdb_sentiment_model_3000_samples_k_ray", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
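
The fitted pipeline stores its predictions in the `class` column as Spark NLP annotations. A minimal sketch for reading them back, assuming the `pipelineDF` DataFrame produced by the example above:

```python
# Show the predicted label(s) next to the input text.
# Assumes `pipelineDF` from the example pipeline above, with the classifier writing to "class".
pipelineDF.select("text", "class.result").show(truncate=False)
```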
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_imdb_sentiment_model_3000_samples_k_ray| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/K-Ray/finetuning-imdb-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_imdb_sentiment_model_3000_samples_k_ray_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_imdb_sentiment_model_3000_samples_k_ray_pipeline_en.md new file mode 100644 index 00000000000000..cea6fb14915677 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_imdb_sentiment_model_3000_samples_k_ray_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_imdb_sentiment_model_3000_samples_k_ray_pipeline pipeline DistilBertForSequenceClassification from K-Ray +author: John Snow Labs +name: finetuning_imdb_sentiment_model_3000_samples_k_ray_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_imdb_sentiment_model_3000_samples_k_ray_pipeline` is a English model originally trained by K-Ray. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_imdb_sentiment_model_3000_samples_k_ray_pipeline_en_5.5.0_3.0_1726792480378.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_imdb_sentiment_model_3000_samples_k_ray_pipeline_en_5.5.0_3.0_1726792480378.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("finetuning_imdb_sentiment_model_3000_samples_k_ray_pipeline", lang = "en")
# Example input; replace with your own DataFrame containing a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_imdb_sentiment_model_3000_samples_k_ray_pipeline", lang = "en")
// Example input; replace with your own DataFrame containing a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
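
For a quick check on a single string, the loaded pipeline can also be used outside a DataFrame. A minimal sketch, assuming the `pipeline` object from the example above and that the final classifier stage writes to the `class` output column:

```python
# Annotate one string and print the predicted class label(s).
result = pipeline.annotate("I love spark-nlp")
print(result["class"])
```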
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_imdb_sentiment_model_3000_samples_k_ray_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/K-Ray/finetuning-imdb-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_imdb_sentiment_model_3000_samples_rahulgaikwad007_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_imdb_sentiment_model_3000_samples_rahulgaikwad007_pipeline_en.md new file mode 100644 index 00000000000000..189589fa7143a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_imdb_sentiment_model_3000_samples_rahulgaikwad007_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_imdb_sentiment_model_3000_samples_rahulgaikwad007_pipeline pipeline DistilBertForSequenceClassification from rahulgaikwad007 +author: John Snow Labs +name: finetuning_imdb_sentiment_model_3000_samples_rahulgaikwad007_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_imdb_sentiment_model_3000_samples_rahulgaikwad007_pipeline` is a English model originally trained by rahulgaikwad007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_imdb_sentiment_model_3000_samples_rahulgaikwad007_pipeline_en_5.5.0_3.0_1726842575421.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_imdb_sentiment_model_3000_samples_rahulgaikwad007_pipeline_en_5.5.0_3.0_1726842575421.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("finetuning_imdb_sentiment_model_3000_samples_rahulgaikwad007_pipeline", lang = "en")
# Example input; replace with your own DataFrame containing a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_imdb_sentiment_model_3000_samples_rahulgaikwad007_pipeline", lang = "en")
// Example input; replace with your own DataFrame containing a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
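
For a quick check on a single string, the loaded pipeline can also be used outside a DataFrame. A minimal sketch, assuming the `pipeline` object from the example above and that the final classifier stage writes to the `class` output column:

```python
# Annotate one string and print the predicted class label(s).
result = pipeline.annotate("I love spark-nlp")
print(result["class"])
```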
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_imdb_sentiment_model_3000_samples_rahulgaikwad007_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rahulgaikwad007/finetuning-imdb-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_3_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_3_en.md new file mode 100644 index 00000000000000..a9ca2e10265937 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_3 DistilBertForSequenceClassification from mamledes +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_3 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_3` is a English model originally trained by mamledes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_3_en_5.5.0_3.0_1726841969968.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_3_en_5.5.0_3.0_1726841969968.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_3","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_3", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
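
The fitted pipeline stores its predictions in the `class` column as Spark NLP annotations. A minimal sketch for reading them back, assuming the `pipelineDF` DataFrame produced by the example above:

```python
# Show the predicted label(s) next to the input text.
# Assumes `pipelineDF` from the example pipeline above, with the classifier writing to "class".
pipelineDF.select("text", "class.result").show(truncate=False)
```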
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mamledes/finetuning-sentiment-model-3000-samples_3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_3_pipeline_en.md new file mode 100644 index 00000000000000..4da8a516e79c7e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_3_pipeline pipeline DistilBertForSequenceClassification from mamledes +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_3_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_3_pipeline` is a English model originally trained by mamledes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_3_pipeline_en_5.5.0_3.0_1726841986100.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_3_pipeline_en_5.5.0_3.0_1726841986100.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_3_pipeline", lang = "en")
# Example input; replace with your own DataFrame containing a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_3_pipeline", lang = "en")
// Example input; replace with your own DataFrame containing a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
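
For a quick check on a single string, the loaded pipeline can also be used outside a DataFrame. A minimal sketch, assuming the `pipeline` object from the example above and that the final classifier stage writes to the `class` output column:

```python
# Annotate one string and print the predicted class label(s).
result = pipeline.annotate("I love spark-nlp")
print(result["class"])
```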
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mamledes/finetuning-sentiment-model-3000-samples_3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_aaaaaiden_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_aaaaaiden_en.md new file mode 100644 index 00000000000000..45061513efbe26 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_aaaaaiden_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_aaaaaiden DistilBertForSequenceClassification from AAAAAiden +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_aaaaaiden +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_aaaaaiden` is a English model originally trained by AAAAAiden. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_aaaaaiden_en_5.5.0_3.0_1726832945070.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_aaaaaiden_en_5.5.0_3.0_1726832945070.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_aaaaaiden","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_aaaaaiden", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
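
The fitted pipeline stores its predictions in the `class` column as Spark NLP annotations. A minimal sketch for reading them back, assuming the `pipelineDF` DataFrame produced by the example above:

```python
# Show the predicted label(s) next to the input text.
# Assumes `pipelineDF` from the example pipeline above, with the classifier writing to "class".
pipelineDF.select("text", "class.result").show(truncate=False)
```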
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_aaaaaiden| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AAAAAiden/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_aaaaaiden_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_aaaaaiden_pipeline_en.md new file mode 100644 index 00000000000000..e64602e4cb0264 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_aaaaaiden_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_aaaaaiden_pipeline pipeline DistilBertForSequenceClassification from AAAAAiden +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_aaaaaiden_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_aaaaaiden_pipeline` is a English model originally trained by AAAAAiden. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_aaaaaiden_pipeline_en_5.5.0_3.0_1726832957501.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_aaaaaiden_pipeline_en_5.5.0_3.0_1726832957501.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_aaaaaiden_pipeline", lang = "en")
# Example input; replace with your own DataFrame containing a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_aaaaaiden_pipeline", lang = "en")
// Example input; replace with your own DataFrame containing a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
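
For a quick check on a single string, the loaded pipeline can also be used outside a DataFrame. A minimal sketch, assuming the `pipeline` object from the example above and that the final classifier stage writes to the `class` output column:

```python
# Annotate one string and print the predicted class label(s).
result = pipeline.annotate("I love spark-nlp")
print(result["class"])
```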
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_aaaaaiden_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/AAAAAiden/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_bijupv_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_bijupv_en.md new file mode 100644 index 00000000000000..0e100e999f868e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_bijupv_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_bijupv DistilBertForSequenceClassification from BijuPV +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_bijupv +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_bijupv` is a English model originally trained by BijuPV. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_bijupv_en_5.5.0_3.0_1726792302809.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_bijupv_en_5.5.0_3.0_1726792302809.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_bijupv","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_bijupv", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
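
The fitted pipeline stores its predictions in the `class` column as Spark NLP annotations. A minimal sketch for reading them back, assuming the `pipelineDF` DataFrame produced by the example above:

```python
# Show the predicted label(s) next to the input text.
# Assumes `pipelineDF` from the example pipeline above, with the classifier writing to "class".
pipelineDF.select("text", "class.result").show(truncate=False)
```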
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_bijupv| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BijuPV/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_bijupv_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_bijupv_pipeline_en.md new file mode 100644 index 00000000000000..ed9ed2a0f6699a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_bijupv_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_bijupv_pipeline pipeline DistilBertForSequenceClassification from BijuPV +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_bijupv_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_bijupv_pipeline` is a English model originally trained by BijuPV. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_bijupv_pipeline_en_5.5.0_3.0_1726792315100.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_bijupv_pipeline_en_5.5.0_3.0_1726792315100.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_bijupv_pipeline", lang = "en")
# Example input; replace with your own DataFrame containing a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_bijupv_pipeline", lang = "en")
// Example input; replace with your own DataFrame containing a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
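
For a quick check on a single string, the loaded pipeline can also be used outside a DataFrame. A minimal sketch, assuming the `pipeline` object from the example above and that the final classifier stage writes to the `class` output column:

```python
# Annotate one string and print the predicted class label(s).
result = pipeline.annotate("I love spark-nlp")
print(result["class"])
```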
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_bijupv_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BijuPV/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_david1987bb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_david1987bb_pipeline_en.md new file mode 100644 index 00000000000000..046d821fa58e68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_david1987bb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_david1987bb_pipeline pipeline DistilBertForSequenceClassification from David1987BB +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_david1987bb_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_david1987bb_pipeline` is a English model originally trained by David1987BB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_david1987bb_pipeline_en_5.5.0_3.0_1726848706284.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_david1987bb_pipeline_en_5.5.0_3.0_1726848706284.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_david1987bb_pipeline", lang = "en")
# Example input; replace with your own DataFrame containing a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_david1987bb_pipeline", lang = "en")
// Example input; replace with your own DataFrame containing a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
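
For a quick check on a single string, the loaded pipeline can also be used outside a DataFrame. A minimal sketch, assuming the `pipeline` object from the example above and that the final classifier stage writes to the `class` output column:

```python
# Annotate one string and print the predicted class label(s).
result = pipeline.annotate("I love spark-nlp")
print(result["class"])
```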
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_david1987bb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/David1987BB/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_dscoder25_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_dscoder25_en.md new file mode 100644 index 00000000000000..7bf20eacac8edc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_dscoder25_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_dscoder25 DistilBertForSequenceClassification from dscoder25 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_dscoder25 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_dscoder25` is a English model originally trained by dscoder25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_dscoder25_en_5.5.0_3.0_1726841110633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_dscoder25_en_5.5.0_3.0_1726841110633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_dscoder25","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_dscoder25", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
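
The fitted pipeline stores its predictions in the `class` column as Spark NLP annotations. A minimal sketch for reading them back, assuming the `pipelineDF` DataFrame produced by the example above:

```python
# Show the predicted label(s) next to the input text.
# Assumes `pipelineDF` from the example pipeline above, with the classifier writing to "class".
pipelineDF.select("text", "class.result").show(truncate=False)
```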
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_dscoder25| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dscoder25/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_dscoder25_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_dscoder25_pipeline_en.md new file mode 100644 index 00000000000000..0b783b5260c698 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_dscoder25_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_dscoder25_pipeline pipeline DistilBertForSequenceClassification from dscoder25 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_dscoder25_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_dscoder25_pipeline` is a English model originally trained by dscoder25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_dscoder25_pipeline_en_5.5.0_3.0_1726841124653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_dscoder25_pipeline_en_5.5.0_3.0_1726841124653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_dscoder25_pipeline", lang = "en")
# Example input; replace with your own DataFrame containing a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_dscoder25_pipeline", lang = "en")
// Example input; replace with your own DataFrame containing a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
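
For a quick check on a single string, the loaded pipeline can also be used outside a DataFrame. A minimal sketch, assuming the `pipeline` object from the example above and that the final classifier stage writes to the `class` output column:

```python
# Annotate one string and print the predicted class label(s).
result = pipeline.annotate("I love spark-nlp")
print(result["class"])
```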
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_dscoder25_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dscoder25/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_manikanta0002_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_manikanta0002_en.md new file mode 100644 index 00000000000000..5a59f760079af6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_manikanta0002_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_manikanta0002 DistilBertForSequenceClassification from manikanta0002 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_manikanta0002 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_manikanta0002` is a English model originally trained by manikanta0002. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_manikanta0002_en_5.5.0_3.0_1726871717954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_manikanta0002_en_5.5.0_3.0_1726871717954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_manikanta0002","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_manikanta0002", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
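
The fitted pipeline stores its predictions in the `class` column as Spark NLP annotations. A minimal sketch for reading them back, assuming the `pipelineDF` DataFrame produced by the example above:

```python
# Show the predicted label(s) next to the input text.
# Assumes `pipelineDF` from the example pipeline above, with the classifier writing to "class".
pipelineDF.select("text", "class.result").show(truncate=False)
```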
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_manikanta0002| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/manikanta0002/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_manikanta0002_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_manikanta0002_pipeline_en.md new file mode 100644 index 00000000000000..9f5a8049ceac9c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_manikanta0002_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_manikanta0002_pipeline pipeline DistilBertForSequenceClassification from manikanta0002 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_manikanta0002_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_manikanta0002_pipeline` is a English model originally trained by manikanta0002. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_manikanta0002_pipeline_en_5.5.0_3.0_1726871729813.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_manikanta0002_pipeline_en_5.5.0_3.0_1726871729813.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_manikanta0002_pipeline", lang = "en")
# Example input; replace with your own DataFrame containing a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_manikanta0002_pipeline", lang = "en")
// Example input; replace with your own DataFrame containing a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
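
For a quick check on a single string, the loaded pipeline can also be used outside a DataFrame. A minimal sketch, assuming the `pipeline` object from the example above and that the final classifier stage writes to the `class` output column:

```python
# Annotate one string and print the predicted class label(s).
result = pipeline.annotate("I love spark-nlp")
print(result["class"])
```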
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_manikanta0002_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/manikanta0002/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_mekteck_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_mekteck_en.md new file mode 100644 index 00000000000000..ea2c22425df0e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_mekteck_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_mekteck DistilBertForSequenceClassification from Mekteck +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_mekteck +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_mekteck` is a English model originally trained by Mekteck. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_mekteck_en_5.5.0_3.0_1726792105962.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_mekteck_en_5.5.0_3.0_1726792105962.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_mekteck","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_mekteck", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
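
The fitted pipeline stores its predictions in the `class` column as Spark NLP annotations. A minimal sketch for reading them back, assuming the `pipelineDF` DataFrame produced by the example above:

```python
# Show the predicted label(s) next to the input text.
# Assumes `pipelineDF` from the example pipeline above, with the classifier writing to "class".
pipelineDF.select("text", "class.result").show(truncate=False)
```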
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_mekteck| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Mekteck/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_mekteck_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_mekteck_pipeline_en.md new file mode 100644 index 00000000000000..b681c5e3eb1c2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_mekteck_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_mekteck_pipeline pipeline DistilBertForSequenceClassification from Mekteck +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_mekteck_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_mekteck_pipeline` is a English model originally trained by Mekteck. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_mekteck_pipeline_en_5.5.0_3.0_1726792119456.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_mekteck_pipeline_en_5.5.0_3.0_1726792119456.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Load the pretrained pipeline and run it over a DataFrame with a "text" column
+pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_mekteck_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Load the pretrained pipeline and run it over a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_mekteck_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_mekteck_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Mekteck/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_mk_20_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_mk_20_en.md new file mode 100644 index 00000000000000..c3782613cf302b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_mk_20_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_mk_20 DistilBertForSequenceClassification from mk-20 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_mk_20 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_mk_20` is a English model originally trained by mk-20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_mk_20_en_5.5.0_3.0_1726809105407.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_mk_20_en_5.5.0_3.0_1726809105407.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_mk_20","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+# Assemble the stages and score a sample DataFrame with a "text" column
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_mk_20", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+// Assemble the stages and score a sample DataFrame with a "text" column
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_mk_20| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mk-20/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_mk_20_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_mk_20_pipeline_en.md new file mode 100644 index 00000000000000..fd8d5913f3d4be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_mk_20_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_mk_20_pipeline pipeline DistilBertForSequenceClassification from mk-20 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_mk_20_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_mk_20_pipeline` is a English model originally trained by mk-20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_mk_20_pipeline_en_5.5.0_3.0_1726809117914.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_mk_20_pipeline_en_5.5.0_3.0_1726809117914.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Load the pretrained pipeline and run it over a DataFrame with a "text" column
+pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_mk_20_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Load the pretrained pipeline and run it over a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_mk_20_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_mk_20_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mk-20/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_ml4algotrading_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_ml4algotrading_en.md new file mode 100644 index 00000000000000..5d7b5ceb00946d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_ml4algotrading_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ml4algotrading DistilBertForSequenceClassification from ml4algotrading +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ml4algotrading +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ml4algotrading` is a English model originally trained by ml4algotrading. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ml4algotrading_en_5.5.0_3.0_1726860918683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ml4algotrading_en_5.5.0_3.0_1726860918683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ml4algotrading","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+# Assemble the stages and score a sample DataFrame with a "text" column
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ml4algotrading", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+// Assemble the stages and score a sample DataFrame with a "text" column
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
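+
+For low-latency inference on a handful of texts, the fitted model can be wrapped in a `LightPipeline`, which avoids Spark job overhead. A sketch assuming the `pipelineModel` fitted above:
+
+```python
+from sparknlp.base import LightPipeline
+
+# Annotate a single string in-memory instead of through a DataFrame
+light = LightPipeline(pipelineModel)
+print(light.annotate("I love spark-nlp"))
+```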
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ml4algotrading| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ml4algotrading/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_ml4algotrading_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_ml4algotrading_pipeline_en.md new file mode 100644 index 00000000000000..c22cbc810b60cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_ml4algotrading_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ml4algotrading_pipeline pipeline DistilBertForSequenceClassification from ml4algotrading +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ml4algotrading_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ml4algotrading_pipeline` is a English model originally trained by ml4algotrading. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ml4algotrading_pipeline_en_5.5.0_3.0_1726860930585.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ml4algotrading_pipeline_en_5.5.0_3.0_1726860930585.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Load the pretrained pipeline and run it over a DataFrame with a "text" column
+pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_ml4algotrading_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Load the pretrained pipeline and run it over a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_ml4algotrading_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ml4algotrading_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ml4algotrading/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_naomaru_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_naomaru_pipeline_en.md new file mode 100644 index 00000000000000..1d69e3e0fdd3c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_naomaru_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_naomaru_pipeline pipeline DistilBertForSequenceClassification from naomaru +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_naomaru_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_naomaru_pipeline` is a English model originally trained by naomaru. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_naomaru_pipeline_en_5.5.0_3.0_1726809266024.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_naomaru_pipeline_en_5.5.0_3.0_1726809266024.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Load the pretrained pipeline and run it over a DataFrame with a "text" column
+pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_naomaru_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Load the pretrained pipeline and run it over a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_naomaru_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_naomaru_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/naomaru/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_nypriya_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_nypriya_en.md new file mode 100644 index 00000000000000..80bca5685f0cd8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_nypriya_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_nypriya DistilBertForSequenceClassification from nypriya +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_nypriya +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_nypriya` is a English model originally trained by nypriya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_nypriya_en_5.5.0_3.0_1726809493059.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_nypriya_en_5.5.0_3.0_1726809493059.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_nypriya","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+# Assemble the stages and score a sample DataFrame with a "text" column
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_nypriya", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+// Assemble the stages and score a sample DataFrame with a "text" column
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
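+
+The fitted pipeline can be persisted and reloaded like any Spark ML model. A sketch with an example path (adjust to your storage); it assumes the `pipelineModel` and `data` objects created above:
+
+```python
+from pyspark.ml import PipelineModel
+
+# Example path only; any accessible filesystem or object-store URI works
+pipelineModel.write().overwrite().save("/tmp/sentiment_distilbert_pipeline")
+restored = PipelineModel.load("/tmp/sentiment_distilbert_pipeline")
+restored.transform(data).select("class.result").show(truncate=False)
+```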
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_nypriya| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/nypriya/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_nypriya_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_nypriya_pipeline_en.md new file mode 100644 index 00000000000000..0fa96e6e4b97e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_nypriya_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_nypriya_pipeline pipeline DistilBertForSequenceClassification from nypriya +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_nypriya_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_nypriya_pipeline` is a English model originally trained by nypriya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_nypriya_pipeline_en_5.5.0_3.0_1726809505231.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_nypriya_pipeline_en_5.5.0_3.0_1726809505231.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Load the pretrained pipeline and run it over a DataFrame with a "text" column
+pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_nypriya_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Load the pretrained pipeline and run it over a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_nypriya_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_nypriya_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/nypriya/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_renatadbc_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_renatadbc_en.md new file mode 100644 index 00000000000000..824a4ce3b288e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_renatadbc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_renatadbc DistilBertForSequenceClassification from Renatadbc +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_renatadbc +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_renatadbc` is a English model originally trained by Renatadbc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_renatadbc_en_5.5.0_3.0_1726830164800.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_renatadbc_en_5.5.0_3.0_1726830164800.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_renatadbc","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+# Assemble the stages and score a sample DataFrame with a "text" column
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_renatadbc", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+// Assemble the stages and score a sample DataFrame with a "text" column
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_renatadbc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Renatadbc/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_renatadbc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_renatadbc_pipeline_en.md new file mode 100644 index 00000000000000..dc86cb3fb0b0d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_renatadbc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_renatadbc_pipeline pipeline DistilBertForSequenceClassification from Renatadbc +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_renatadbc_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_renatadbc_pipeline` is a English model originally trained by Renatadbc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_renatadbc_pipeline_en_5.5.0_3.0_1726830176559.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_renatadbc_pipeline_en_5.5.0_3.0_1726830176559.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Load the pretrained pipeline and run it over a DataFrame with a "text" column
+pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_renatadbc_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Load the pretrained pipeline and run it over a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_renatadbc_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_renatadbc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Renatadbc/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_renhook_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_renhook_en.md new file mode 100644 index 00000000000000..9e78d99c844db7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_renhook_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_renhook DistilBertForSequenceClassification from RenHook +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_renhook +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_renhook` is a English model originally trained by RenHook. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_renhook_en_5.5.0_3.0_1726841320079.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_renhook_en_5.5.0_3.0_1726841320079.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_renhook","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+# Assemble the stages and score a sample DataFrame with a "text" column
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_renhook", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+// Assemble the stages and score a sample DataFrame with a "text" column
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
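+
+Per-label scores are exposed in each `class` annotation's metadata map. A sketch assuming the `pipelineDF` produced above; the exact metadata keys depend on the model's label names:
+
+```python
+# Explode the annotations to inspect the predicted label and its score metadata
+pipelineDF.selectExpr("explode(class) as prediction") \
+    .selectExpr("prediction.result", "prediction.metadata") \
+    .show(truncate=False)
+```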
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_renhook| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/RenHook/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_renhook_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_renhook_pipeline_en.md new file mode 100644 index 00000000000000..c4897dbbb62580 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_renhook_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_renhook_pipeline pipeline DistilBertForSequenceClassification from RenHook +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_renhook_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_renhook_pipeline` is a English model originally trained by RenHook. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_renhook_pipeline_en_5.5.0_3.0_1726841332592.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_renhook_pipeline_en_5.5.0_3.0_1726841332592.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Load the pretrained pipeline and run it over a DataFrame with a "text" column
+pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_renhook_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Load the pretrained pipeline and run it over a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_renhook_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_renhook_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/RenHook/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_stephfortiz_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_stephfortiz_en.md new file mode 100644 index 00000000000000..938d37fef69246 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_stephfortiz_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_stephfortiz DistilBertForSequenceClassification from StephFortiz +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_stephfortiz +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_stephfortiz` is a English model originally trained by StephFortiz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_stephfortiz_en_5.5.0_3.0_1726861112800.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_stephfortiz_en_5.5.0_3.0_1726861112800.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_stephfortiz","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+# Assemble the stages and score a sample DataFrame with a "text" column
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_stephfortiz", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+// Assemble the stages and score a sample DataFrame with a "text" column
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_stephfortiz| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/StephFortiz/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_stephfortiz_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_stephfortiz_pipeline_en.md new file mode 100644 index 00000000000000..6106a35f1692ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_stephfortiz_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_stephfortiz_pipeline pipeline DistilBertForSequenceClassification from StephFortiz +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_stephfortiz_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_stephfortiz_pipeline` is a English model originally trained by StephFortiz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_stephfortiz_pipeline_en_5.5.0_3.0_1726861126932.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_stephfortiz_pipeline_en_5.5.0_3.0_1726861126932.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Load the pretrained pipeline and run it over a DataFrame with a "text" column
+pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_stephfortiz_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Load the pretrained pipeline and run it over a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_stephfortiz_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_stephfortiz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/StephFortiz/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_tanushgolwala_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_tanushgolwala_en.md new file mode 100644 index 00000000000000..2f3d829f5df5e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_tanushgolwala_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_tanushgolwala DistilBertForSequenceClassification from tanushgolwala +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_tanushgolwala +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_tanushgolwala` is a English model originally trained by tanushgolwala. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_tanushgolwala_en_5.5.0_3.0_1726824072566.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_tanushgolwala_en_5.5.0_3.0_1726824072566.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_tanushgolwala","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+# Assemble the stages and score a sample DataFrame with a "text" column
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_tanushgolwala", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+// Assemble the stages and score a sample DataFrame with a "text" column
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
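+
+The same fitted model scores any DataFrame that exposes a `text` column, so batch inference is just another `transform`. A sketch assuming the `pipelineModel` above; the second example sentence is only illustrative:
+
+```python
+# Multi-row input; the column name must match the DocumentAssembler input column
+more_data = spark.createDataFrame(
+    [["I love spark-nlp"], ["This was a disappointing experience"]]
+).toDF("text")
+pipelineModel.transform(more_data).select("text", "class.result").show(truncate=False)
+```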
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_tanushgolwala| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tanushgolwala/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_tanushgolwala_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_tanushgolwala_pipeline_en.md new file mode 100644 index 00000000000000..8880403b339f59 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_tanushgolwala_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_tanushgolwala_pipeline pipeline DistilBertForSequenceClassification from tanushgolwala +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_tanushgolwala_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_tanushgolwala_pipeline` is a English model originally trained by tanushgolwala. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_tanushgolwala_pipeline_en_5.5.0_3.0_1726824084091.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_tanushgolwala_pipeline_en_5.5.0_3.0_1726824084091.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Load the pretrained pipeline and run it over a DataFrame with a "text" column
+pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_tanushgolwala_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Load the pretrained pipeline and run it over a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_tanushgolwala_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_tanushgolwala_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tanushgolwala/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_tiziperata_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_tiziperata_en.md new file mode 100644 index 00000000000000..3371784628787a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_tiziperata_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_tiziperata DistilBertForSequenceClassification from tiziperata +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_tiziperata +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_tiziperata` is a English model originally trained by tiziperata. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_tiziperata_en_5.5.0_3.0_1726830107516.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_tiziperata_en_5.5.0_3.0_1726830107516.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_tiziperata","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+# Assemble the stages and score a sample DataFrame with a "text" column
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_tiziperata", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+// Assemble the stages and score a sample DataFrame with a "text" column
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_tiziperata| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tiziperata/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_tiziperata_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_tiziperata_pipeline_en.md new file mode 100644 index 00000000000000..4e38bace8345a1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_tiziperata_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_tiziperata_pipeline pipeline DistilBertForSequenceClassification from tiziperata +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_tiziperata_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_tiziperata_pipeline` is a English model originally trained by tiziperata. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_tiziperata_pipeline_en_5.5.0_3.0_1726830119406.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_tiziperata_pipeline_en_5.5.0_3.0_1726830119406.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Load the pretrained pipeline and run it over a DataFrame with a "text" column
+pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_tiziperata_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Load the pretrained pipeline and run it over a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_tiziperata_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_tiziperata_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tiziperata/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_williamtbarker_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_williamtbarker_en.md new file mode 100644 index 00000000000000..2fd9a6f3422a7e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_williamtbarker_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_williamtbarker DistilBertForSequenceClassification from williamtbarker +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_williamtbarker +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_williamtbarker` is a English model originally trained by williamtbarker. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_williamtbarker_en_5.5.0_3.0_1726841576553.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_williamtbarker_en_5.5.0_3.0_1726841576553.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_williamtbarker","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+# Assemble the stages and score a sample DataFrame with a "text" column
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_williamtbarker", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+// Assemble the stages and score a sample DataFrame with a "text" column
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_williamtbarker| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/williamtbarker/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_williamtbarker_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_williamtbarker_pipeline_en.md new file mode 100644 index 00000000000000..328a6446b49cdc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_williamtbarker_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_williamtbarker_pipeline pipeline DistilBertForSequenceClassification from williamtbarker +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_williamtbarker_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_williamtbarker_pipeline` is a English model originally trained by williamtbarker. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_williamtbarker_pipeline_en_5.5.0_3.0_1726841588466.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_williamtbarker_pipeline_en_5.5.0_3.0_1726841588466.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# download the pretrained pipeline and run it on a small example DataFrame
pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_williamtbarker_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_williamtbarker_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
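
For quick checks on a handful of strings, the pretrained pipeline can also be used without assembling a DataFrame first. A small sketch, assuming the `pipeline` object from the snippet above (the `class` key mirrors the pipeline's output column):

```python
# annotate() returns a dict keyed by the pipeline's output column names
result = pipeline.annotate("I love spark-nlp")
print(result["class"])
```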
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_williamtbarker_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/williamtbarker/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_xinyiding_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_xinyiding_en.md new file mode 100644 index 00000000000000..b932976a5ca880 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_xinyiding_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_xinyiding DistilBertForSequenceClassification from xinyiding +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_xinyiding +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_xinyiding` is a English model originally trained by xinyiding. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_xinyiding_en_5.5.0_3.0_1726830346519.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_xinyiding_en_5.5.0_3.0_1726830346519.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# pretrained DistilBERT sequence classifier; input columns must match the stages above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_xinyiding","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_xinyiding", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_xinyiding| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/xinyiding/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_xinyiding_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_xinyiding_pipeline_en.md new file mode 100644 index 00000000000000..31799f8c9b92c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_xinyiding_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_xinyiding_pipeline pipeline DistilBertForSequenceClassification from xinyiding +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_xinyiding_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_xinyiding_pipeline` is a English model originally trained by xinyiding. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_xinyiding_pipeline_en_5.5.0_3.0_1726830359508.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_xinyiding_pipeline_en_5.5.0_3.0_1726830359508.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# download the pretrained pipeline and run it on a small example DataFrame
pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_xinyiding_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_xinyiding_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_xinyiding_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/xinyiding/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_yeabinml_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_yeabinml_en.md new file mode 100644 index 00000000000000..6d61e69beee447 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_yeabinml_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_yeabinml DistilBertForSequenceClassification from Yeabinml +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_yeabinml +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_yeabinml` is a English model originally trained by Yeabinml. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_yeabinml_en_5.5.0_3.0_1726871349335.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_yeabinml_en_5.5.0_3.0_1726871349335.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# pretrained DistilBERT sequence classifier; input columns must match the stages above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_yeabinml","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_yeabinml", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_yeabinml| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Yeabinml/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3004_samples_valliyammai_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3004_samples_valliyammai_en.md new file mode 100644 index 00000000000000..f842c0cda51a88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3004_samples_valliyammai_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3004_samples_valliyammai DistilBertForSequenceClassification from Valliyammai +author: John Snow Labs +name: finetuning_sentiment_model_3004_samples_valliyammai +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3004_samples_valliyammai` is a English model originally trained by Valliyammai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3004_samples_valliyammai_en_5.5.0_3.0_1726841255887.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3004_samples_valliyammai_en_5.5.0_3.0_1726841255887.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# pretrained DistilBERT sequence classifier; input columns must match the stages above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3004_samples_valliyammai","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3004_samples_valliyammai", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3004_samples_valliyammai| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Valliyammai/finetuning-sentiment-model-3004-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3004_samples_valliyammai_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3004_samples_valliyammai_pipeline_en.md new file mode 100644 index 00000000000000..e63c54a1ad5f40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3004_samples_valliyammai_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3004_samples_valliyammai_pipeline pipeline DistilBertForSequenceClassification from Valliyammai +author: John Snow Labs +name: finetuning_sentiment_model_3004_samples_valliyammai_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3004_samples_valliyammai_pipeline` is a English model originally trained by Valliyammai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3004_samples_valliyammai_pipeline_en_5.5.0_3.0_1726841268457.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3004_samples_valliyammai_pipeline_en_5.5.0_3.0_1726841268457.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# download the pretrained pipeline and run it on a small example DataFrame
pipeline = PretrainedPipeline("finetuning_sentiment_model_3004_samples_valliyammai_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3004_samples_valliyammai_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3004_samples_valliyammai_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Valliyammai/finetuning-sentiment-model-3004-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_300_samples_kanzabatool_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_300_samples_kanzabatool_en.md new file mode 100644 index 00000000000000..9e63c7939438fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_300_samples_kanzabatool_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_300_samples_kanzabatool DistilBertForSequenceClassification from KanzaBatool +author: John Snow Labs +name: finetuning_sentiment_model_300_samples_kanzabatool +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_300_samples_kanzabatool` is a English model originally trained by KanzaBatool. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_300_samples_kanzabatool_en_5.5.0_3.0_1726824083459.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_300_samples_kanzabatool_en_5.5.0_3.0_1726824083459.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# pretrained DistilBERT sequence classifier; input columns must match the stages above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_300_samples_kanzabatool","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_300_samples_kanzabatool", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_300_samples_kanzabatool| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KanzaBatool/finetuning-sentiment-model-300-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3500_samples_train_yvillamil_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3500_samples_train_yvillamil_pipeline_en.md new file mode 100644 index 00000000000000..9d98864019925e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3500_samples_train_yvillamil_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3500_samples_train_yvillamil_pipeline pipeline DistilBertForSequenceClassification from yvillamil +author: John Snow Labs +name: finetuning_sentiment_model_3500_samples_train_yvillamil_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3500_samples_train_yvillamil_pipeline` is a English model originally trained by yvillamil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3500_samples_train_yvillamil_pipeline_en_5.5.0_3.0_1726823485063.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3500_samples_train_yvillamil_pipeline_en_5.5.0_3.0_1726823485063.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# download the pretrained pipeline and run it on a small example DataFrame
pipeline = PretrainedPipeline("finetuning_sentiment_model_3500_samples_train_yvillamil_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3500_samples_train_yvillamil_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3500_samples_train_yvillamil_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yvillamil/finetuning-sentiment-model-3500-samples-train + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_5000_amazon_samples_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_5000_amazon_samples_en.md new file mode 100644 index 00000000000000..b5bf93ebfbe913 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_5000_amazon_samples_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_5000_amazon_samples DistilBertForSequenceClassification from maurocastill +author: John Snow Labs +name: finetuning_sentiment_model_5000_amazon_samples +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_5000_amazon_samples` is a English model originally trained by maurocastill. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazon_samples_en_5.5.0_3.0_1726830440123.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazon_samples_en_5.5.0_3.0_1726830440123.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# pretrained DistilBERT sequence classifier; input columns must match the stages above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_5000_amazon_samples","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_5000_amazon_samples", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_5000_amazon_samples| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/maurocastill/finetuning-sentiment-model-5000-amazon-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_5000_amazon_samples_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_5000_amazon_samples_pipeline_en.md new file mode 100644 index 00000000000000..9ca1ad56ca62ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_5000_amazon_samples_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_5000_amazon_samples_pipeline pipeline DistilBertForSequenceClassification from maurocastill +author: John Snow Labs +name: finetuning_sentiment_model_5000_amazon_samples_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_5000_amazon_samples_pipeline` is a English model originally trained by maurocastill. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazon_samples_pipeline_en_5.5.0_3.0_1726830453954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazon_samples_pipeline_en_5.5.0_3.0_1726830453954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# download the pretrained pipeline and run it on a small example DataFrame
pipeline = PretrainedPipeline("finetuning_sentiment_model_5000_amazon_samples_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_5000_amazon_samples_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_5000_amazon_samples_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/maurocastill/finetuning-sentiment-model-5000-amazon-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_5000_amazonbaby_samples_a01793005_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_5000_amazonbaby_samples_a01793005_en.md new file mode 100644 index 00000000000000..74866791bd6f31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_5000_amazonbaby_samples_a01793005_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_5000_amazonbaby_samples_a01793005 DistilBertForSequenceClassification from A01793005 +author: John Snow Labs +name: finetuning_sentiment_model_5000_amazonbaby_samples_a01793005 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_5000_amazonbaby_samples_a01793005` is a English model originally trained by A01793005. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazonbaby_samples_a01793005_en_5.5.0_3.0_1726848933026.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazonbaby_samples_a01793005_en_5.5.0_3.0_1726848933026.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# pretrained DistilBERT sequence classifier; input columns must match the stages above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_5000_amazonbaby_samples_a01793005","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_5000_amazonbaby_samples_a01793005", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_5000_amazonbaby_samples_a01793005| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/A01793005/finetuning-sentiment-model-5000-amazonbaby-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_5000_amazonbaby_samples_a01793005_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_5000_amazonbaby_samples_a01793005_pipeline_en.md new file mode 100644 index 00000000000000..3a3761ea60c9c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_5000_amazonbaby_samples_a01793005_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_5000_amazonbaby_samples_a01793005_pipeline pipeline DistilBertForSequenceClassification from A01793005 +author: John Snow Labs +name: finetuning_sentiment_model_5000_amazonbaby_samples_a01793005_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_5000_amazonbaby_samples_a01793005_pipeline` is a English model originally trained by A01793005. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazonbaby_samples_a01793005_pipeline_en_5.5.0_3.0_1726848947952.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazonbaby_samples_a01793005_pipeline_en_5.5.0_3.0_1726848947952.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# download the pretrained pipeline and run it on a small example DataFrame
pipeline = PretrainedPipeline("finetuning_sentiment_model_5000_amazonbaby_samples_a01793005_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_5000_amazonbaby_samples_a01793005_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_5000_amazonbaby_samples_a01793005_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/A01793005/finetuning-sentiment-model-5000-amazonbaby-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_bensonzhang_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_bensonzhang_en.md new file mode 100644 index 00000000000000..d73a9ff09f7644 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_bensonzhang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_bensonzhang DistilBertForSequenceClassification from BensonZhang +author: John Snow Labs +name: finetuning_sentiment_model_bensonzhang +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_bensonzhang` is a English model originally trained by BensonZhang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_bensonzhang_en_5.5.0_3.0_1726842560434.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_bensonzhang_en_5.5.0_3.0_1726842560434.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# pretrained DistilBERT sequence classifier; input columns must match the stages above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_bensonzhang","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_bensonzhang", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
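
When only a few texts need to be scored, converting the fitted `pipelineModel` from the snippet above into a LightPipeline avoids launching a distributed job. A short sketch under that assumption:

```python
from sparknlp.base import LightPipeline

# run the same stages locally on plain Python strings
light = LightPipeline(pipelineModel)
print(light.annotate("I love spark-nlp"))
```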
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_bensonzhang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BensonZhang/finetuning-sentiment-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_bensonzhang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_bensonzhang_pipeline_en.md new file mode 100644 index 00000000000000..4fd814ff224674 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_bensonzhang_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_bensonzhang_pipeline pipeline DistilBertForSequenceClassification from BensonZhang +author: John Snow Labs +name: finetuning_sentiment_model_bensonzhang_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_bensonzhang_pipeline` is a English model originally trained by BensonZhang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_bensonzhang_pipeline_en_5.5.0_3.0_1726842575365.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_bensonzhang_pipeline_en_5.5.0_3.0_1726842575365.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# download the pretrained pipeline and run it on a small example DataFrame
pipeline = PretrainedPipeline("finetuning_sentiment_model_bensonzhang_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_bensonzhang_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_bensonzhang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BensonZhang/finetuning-sentiment-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_bugabooo30_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_bugabooo30_en.md new file mode 100644 index 00000000000000..364ae57ca71501 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_bugabooo30_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_bugabooo30 DistilBertForSequenceClassification from Bugabooo30 +author: John Snow Labs +name: finetuning_sentiment_model_bugabooo30 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_bugabooo30` is a English model originally trained by Bugabooo30. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_bugabooo30_en_5.5.0_3.0_1726841969962.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_bugabooo30_en_5.5.0_3.0_1726841969962.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# pretrained DistilBERT sequence classifier; input columns must match the stages above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_bugabooo30","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_bugabooo30", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_bugabooo30| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Bugabooo30/finetuning-sentiment-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_bugabooo30_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_bugabooo30_pipeline_en.md new file mode 100644 index 00000000000000..7a28487aa83cc1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_bugabooo30_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_bugabooo30_pipeline pipeline DistilBertForSequenceClassification from Bugabooo30 +author: John Snow Labs +name: finetuning_sentiment_model_bugabooo30_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_bugabooo30_pipeline` is a English model originally trained by Bugabooo30. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_bugabooo30_pipeline_en_5.5.0_3.0_1726841982328.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_bugabooo30_pipeline_en_5.5.0_3.0_1726841982328.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# download the pretrained pipeline and run it on a small example DataFrame
pipeline = PretrainedPipeline("finetuning_sentiment_model_bugabooo30_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_bugabooo30_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_bugabooo30_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Bugabooo30/finetuning-sentiment-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_enriquer_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_enriquer_en.md new file mode 100644 index 00000000000000..8a3ec2b1288763 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_enriquer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_enriquer DistilBertForSequenceClassification from EnriqueR +author: John Snow Labs +name: finetuning_sentiment_model_enriquer +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_enriquer` is a English model originally trained by EnriqueR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_enriquer_en_5.5.0_3.0_1726792530274.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_enriquer_en_5.5.0_3.0_1726792530274.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# pretrained DistilBERT sequence classifier; input columns must match the stages above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_enriquer","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_enriquer", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_enriquer| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/EnriqueR/finetuning-sentiment-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_enriquer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_enriquer_pipeline_en.md new file mode 100644 index 00000000000000..689e2f6339c6d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_enriquer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_enriquer_pipeline pipeline DistilBertForSequenceClassification from EnriqueR +author: John Snow Labs +name: finetuning_sentiment_model_enriquer_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_enriquer_pipeline` is a English model originally trained by EnriqueR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_enriquer_pipeline_en_5.5.0_3.0_1726792542831.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_enriquer_pipeline_en_5.5.0_3.0_1726792542831.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# download the pretrained pipeline and run it on a small example DataFrame
pipeline = PretrainedPipeline("finetuning_sentiment_model_enriquer_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_enriquer_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_enriquer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/EnriqueR/finetuning-sentiment-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_samples_prince12f_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_samples_prince12f_en.md new file mode 100644 index 00000000000000..0784f45a59664b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_samples_prince12f_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_samples_prince12f DistilBertForSequenceClassification from Prince12f +author: John Snow Labs +name: finetuning_sentiment_model_samples_prince12f +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_samples_prince12f` is a English model originally trained by Prince12f. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_samples_prince12f_en_5.5.0_3.0_1726861315053.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_samples_prince12f_en_5.5.0_3.0_1726861315053.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# pretrained DistilBERT sequence classifier; input columns must match the stages above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_samples_prince12f","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_samples_prince12f", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_samples_prince12f| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Prince12f/finetuning-sentiment-model-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_samples_prince12f_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_samples_prince12f_pipeline_en.md new file mode 100644 index 00000000000000..d37e335e437962 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_samples_prince12f_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_samples_prince12f_pipeline pipeline DistilBertForSequenceClassification from Prince12f +author: John Snow Labs +name: finetuning_sentiment_model_samples_prince12f_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_samples_prince12f_pipeline` is a English model originally trained by Prince12f. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_samples_prince12f_pipeline_en_5.5.0_3.0_1726861326498.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_samples_prince12f_pipeline_en_5.5.0_3.0_1726861326498.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that holds the raw text to annotate
pipeline = PretrainedPipeline("finetuning_sentiment_model_samples_prince12f_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that holds the raw text to annotate
val pipeline = new PretrainedPipeline("finetuning_sentiment_model_samples_prince12f_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_samples_prince12f_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Prince12f/finetuning-sentiment-model-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_roberta_base_model_10000_samples_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_roberta_base_model_10000_samples_pipeline_en.md new file mode 100644 index 00000000000000..89412f3c0633ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_roberta_base_model_10000_samples_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_roberta_base_model_10000_samples_pipeline pipeline RoBertaForSequenceClassification from pryshlyak +author: John Snow Labs +name: finetuning_sentiment_roberta_base_model_10000_samples_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_roberta_base_model_10000_samples_pipeline` is a English model originally trained by pryshlyak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_roberta_base_model_10000_samples_pipeline_en_5.5.0_3.0_1726799069085.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_roberta_base_model_10000_samples_pipeline_en_5.5.0_3.0_1726799069085.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that holds the raw text to annotate
pipeline = PretrainedPipeline("finetuning_sentiment_roberta_base_model_10000_samples_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that holds the raw text to annotate
val pipeline = new PretrainedPipeline("finetuning_sentiment_roberta_base_model_10000_samples_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_roberta_base_model_10000_samples_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|459.9 MB| + +## References + +https://huggingface.co/pryshlyak/finetuning-sentiment-roberta-base-model-10000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-frpile_mlm_basel_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-20-frpile_mlm_basel_roberta_en.md new file mode 100644 index 00000000000000..d2771affdd95ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-frpile_mlm_basel_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English frpile_mlm_basel_roberta RoBertaEmbeddings from DragosGorduza +author: John Snow Labs +name: frpile_mlm_basel_roberta +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`frpile_mlm_basel_roberta` is a English model originally trained by DragosGorduza. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frpile_mlm_basel_roberta_en_5.5.0_3.0_1726857261256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frpile_mlm_basel_roberta_en_5.5.0_3.0_1726857261256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("frpile_mlm_basel_roberta","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("frpile_mlm_basel_roberta","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
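After the pipeline above runs, the `embeddings` column holds one annotation per token, with the vector stored in its `embeddings` field. A minimal sketch for pulling the vectors out:

```python
from pyspark.sql.functions import explode

# One row per token: the token text and its embedding vector
pipelineDF.select(explode("embeddings").alias("emb")) \
    .select("emb.result", "emb.embeddings") \
    .show(truncate=80)
```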
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frpile_mlm_basel_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/DragosGorduza/FRPile_MLM_Basel_Roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-frpile_mlm_basel_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-frpile_mlm_basel_roberta_pipeline_en.md new file mode 100644 index 00000000000000..a7c0debff0f8aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-frpile_mlm_basel_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English frpile_mlm_basel_roberta_pipeline pipeline RoBertaEmbeddings from DragosGorduza +author: John Snow Labs +name: frpile_mlm_basel_roberta_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`frpile_mlm_basel_roberta_pipeline` is a English model originally trained by DragosGorduza. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frpile_mlm_basel_roberta_pipeline_en_5.5.0_3.0_1726857283324.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frpile_mlm_basel_roberta_pipeline_en_5.5.0_3.0_1726857283324.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that holds the raw text to annotate
pipeline = PretrainedPipeline("frpile_mlm_basel_roberta_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that holds the raw text to annotate
val pipeline = new PretrainedPipeline("frpile_mlm_basel_roberta_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
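When the full annotation objects (with begin/end offsets and metadata) are needed instead of plain strings, `fullAnnotate()` can be used in place of `transform()`. A minimal sketch, reusing the `pipeline` created above:

```python
# fullAnnotate() keeps the Annotation objects, including offsets and metadata
annotations = pipeline.fullAnnotate("I love spark-nlp")
print(annotations[0].keys())
```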
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frpile_mlm_basel_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/DragosGorduza/FRPile_MLM_Basel_Roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ft_distb_toxicity_en.md b/docs/_posts/ahmedlone127/2024-09-20-ft_distb_toxicity_en.md new file mode 100644 index 00000000000000..c70219ba5b5e4a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ft_distb_toxicity_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ft_distb_toxicity DistilBertForSequenceClassification from Yash907 +author: John Snow Labs +name: ft_distb_toxicity +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ft_distb_toxicity` is a English model originally trained by Yash907. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ft_distb_toxicity_en_5.5.0_3.0_1726830118159.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ft_distb_toxicity_en_5.5.0_3.0_1726830118159.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("ft_distb_toxicity","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("ft_distb_toxicity", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
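For low-latency scoring of individual documents, the fitted pipeline can be wrapped in a `LightPipeline`, which avoids launching a full Spark job per request. A minimal sketch, assuming `pipelineModel` from the snippet above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# Returns a plain Python dict with the token and class outputs for one string
print(light.annotate("I love spark-nlp"))
```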
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ft_distb_toxicity| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Yash907/ft-distb-toxicity \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ft_distb_toxicity_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-ft_distb_toxicity_pipeline_en.md new file mode 100644 index 00000000000000..a3760ebe322bb6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ft_distb_toxicity_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ft_distb_toxicity_pipeline pipeline DistilBertForSequenceClassification from Yash907 +author: John Snow Labs +name: ft_distb_toxicity_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ft_distb_toxicity_pipeline` is a English model originally trained by Yash907. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ft_distb_toxicity_pipeline_en_5.5.0_3.0_1726830129695.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ft_distb_toxicity_pipeline_en_5.5.0_3.0_1726830129695.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that holds the raw text to annotate
pipeline = PretrainedPipeline("ft_distb_toxicity_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that holds the raw text to annotate
val pipeline = new PretrainedPipeline("ft_distb_toxicity_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ft_distb_toxicity_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Yash907/ft-distb-toxicity + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ft_distilroberta_base_with_askscience_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-ft_distilroberta_base_with_askscience_pipeline_en.md new file mode 100644 index 00000000000000..d2d66f124bb82c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ft_distilroberta_base_with_askscience_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ft_distilroberta_base_with_askscience_pipeline pipeline RoBertaEmbeddings from aisuko +author: John Snow Labs +name: ft_distilroberta_base_with_askscience_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ft_distilroberta_base_with_askscience_pipeline` is a English model originally trained by aisuko. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ft_distilroberta_base_with_askscience_pipeline_en_5.5.0_3.0_1726796320099.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ft_distilroberta_base_with_askscience_pipeline_en_5.5.0_3.0_1726796320099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that holds the raw text to annotate
pipeline = PretrainedPipeline("ft_distilroberta_base_with_askscience_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that holds the raw text to annotate
val pipeline = new PretrainedPipeline("ft_distilroberta_base_with_askscience_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ft_distilroberta_base_with_askscience_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/aisuko/ft-distilroberta-base-with-askscience + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-furina_seed42_eng_kinyarwanda_hau_cross_0_0001_en.md b/docs/_posts/ahmedlone127/2024-09-20-furina_seed42_eng_kinyarwanda_hau_cross_0_0001_en.md new file mode 100644 index 00000000000000..1e16cd9937e1ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-furina_seed42_eng_kinyarwanda_hau_cross_0_0001_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English furina_seed42_eng_kinyarwanda_hau_cross_0_0001 XlmRoBertaForSequenceClassification from Shijia +author: John Snow Labs +name: furina_seed42_eng_kinyarwanda_hau_cross_0_0001 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`furina_seed42_eng_kinyarwanda_hau_cross_0_0001` is a English model originally trained by Shijia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/furina_seed42_eng_kinyarwanda_hau_cross_0_0001_en_5.5.0_3.0_1726846200936.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/furina_seed42_eng_kinyarwanda_hau_cross_0_0001_en_5.5.0_3.0_1726846200936.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("furina_seed42_eng_kinyarwanda_hau_cross_0_0001","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("furina_seed42_eng_kinyarwanda_hau_cross_0_0001", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
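Beyond the label itself, each annotation in the `class` column carries a metadata map that typically includes the per-class confidence scores. A minimal sketch for surfacing it:

```python
from pyspark.sql.functions import explode

# Show the predicted label together with its metadata (scores, sentence index, ...)
pipelineDF.select(explode("class").alias("pred")) \
    .select("pred.result", "pred.metadata") \
    .show(truncate=False)
```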
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|furina_seed42_eng_kinyarwanda_hau_cross_0_0001| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/Shijia/furina_seed42_eng_kin_hau_cross_0.0001 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-furina_seed42_eng_kinyarwanda_hau_cross_0_0001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-furina_seed42_eng_kinyarwanda_hau_cross_0_0001_pipeline_en.md new file mode 100644 index 00000000000000..d732c84a802f13 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-furina_seed42_eng_kinyarwanda_hau_cross_0_0001_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English furina_seed42_eng_kinyarwanda_hau_cross_0_0001_pipeline pipeline XlmRoBertaForSequenceClassification from Shijia +author: John Snow Labs +name: furina_seed42_eng_kinyarwanda_hau_cross_0_0001_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`furina_seed42_eng_kinyarwanda_hau_cross_0_0001_pipeline` is a English model originally trained by Shijia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/furina_seed42_eng_kinyarwanda_hau_cross_0_0001_pipeline_en_5.5.0_3.0_1726846271254.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/furina_seed42_eng_kinyarwanda_hau_cross_0_0001_pipeline_en_5.5.0_3.0_1726846271254.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that holds the raw text to annotate
pipeline = PretrainedPipeline("furina_seed42_eng_kinyarwanda_hau_cross_0_0001_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that holds the raw text to annotate
val pipeline = new PretrainedPipeline("furina_seed42_eng_kinyarwanda_hau_cross_0_0001_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|furina_seed42_eng_kinyarwanda_hau_cross_0_0001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/Shijia/furina_seed42_eng_kin_hau_cross_0.0001 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-furina_seed42_eng_kinyarwanda_hau_cross_2e_05_en.md b/docs/_posts/ahmedlone127/2024-09-20-furina_seed42_eng_kinyarwanda_hau_cross_2e_05_en.md new file mode 100644 index 00000000000000..44a54ae468fdb3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-furina_seed42_eng_kinyarwanda_hau_cross_2e_05_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English furina_seed42_eng_kinyarwanda_hau_cross_2e_05 XlmRoBertaForSequenceClassification from Shijia +author: John Snow Labs +name: furina_seed42_eng_kinyarwanda_hau_cross_2e_05 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`furina_seed42_eng_kinyarwanda_hau_cross_2e_05` is a English model originally trained by Shijia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/furina_seed42_eng_kinyarwanda_hau_cross_2e_05_en_5.5.0_3.0_1726865516748.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/furina_seed42_eng_kinyarwanda_hau_cross_2e_05_en_5.5.0_3.0_1726865516748.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("furina_seed42_eng_kinyarwanda_hau_cross_2e_05","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("furina_seed42_eng_kinyarwanda_hau_cross_2e_05", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|furina_seed42_eng_kinyarwanda_hau_cross_2e_05| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/Shijia/furina_seed42_eng_kin_hau_cross_2e-05 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-furina_seed42_eng_kinyarwanda_hau_cross_2e_05_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-furina_seed42_eng_kinyarwanda_hau_cross_2e_05_pipeline_en.md new file mode 100644 index 00000000000000..8d93a0b96963fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-furina_seed42_eng_kinyarwanda_hau_cross_2e_05_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English furina_seed42_eng_kinyarwanda_hau_cross_2e_05_pipeline pipeline XlmRoBertaForSequenceClassification from Shijia +author: John Snow Labs +name: furina_seed42_eng_kinyarwanda_hau_cross_2e_05_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`furina_seed42_eng_kinyarwanda_hau_cross_2e_05_pipeline` is a English model originally trained by Shijia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/furina_seed42_eng_kinyarwanda_hau_cross_2e_05_pipeline_en_5.5.0_3.0_1726865591572.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/furina_seed42_eng_kinyarwanda_hau_cross_2e_05_pipeline_en_5.5.0_3.0_1726865591572.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that holds the raw text to annotate
pipeline = PretrainedPipeline("furina_seed42_eng_kinyarwanda_hau_cross_2e_05_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that holds the raw text to annotate
val pipeline = new PretrainedPipeline("furina_seed42_eng_kinyarwanda_hau_cross_2e_05_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|furina_seed42_eng_kinyarwanda_hau_cross_2e_05_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/Shijia/furina_seed42_eng_kin_hau_cross_2e-05 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-gal_ner_iw_catalan_galician_2_en.md b/docs/_posts/ahmedlone127/2024-09-20-gal_ner_iw_catalan_galician_2_en.md new file mode 100644 index 00000000000000..98899ba673ebbb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-gal_ner_iw_catalan_galician_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English gal_ner_iw_catalan_galician_2 XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: gal_ner_iw_catalan_galician_2 +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gal_ner_iw_catalan_galician_2` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_ner_iw_catalan_galician_2_en_5.5.0_3.0_1726842999907.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_ner_iw_catalan_galician_2_en_5.5.0_3.0_1726842999907.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_ner_iw_catalan_galician_2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_ner_iw_catalan_galician_2", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
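The `ner` column produced above holds one IOB tag per token. A minimal sketch that lines the tag stream up with the token stream:

```python
# Show the tokens and their predicted NER tags side by side
pipelineDF.selectExpr("token.result as tokens", "ner.result as ner_tags").show(truncate=False)
```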
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_ner_iw_catalan_galician_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|426.5 MB| + +## References + +https://huggingface.co/homersimpson/gal-ner-iw-ca-gl-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-gal_ner_iw_catalan_galician_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-gal_ner_iw_catalan_galician_2_pipeline_en.md new file mode 100644 index 00000000000000..5ad7755b472b13 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-gal_ner_iw_catalan_galician_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English gal_ner_iw_catalan_galician_2_pipeline pipeline XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: gal_ner_iw_catalan_galician_2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gal_ner_iw_catalan_galician_2_pipeline` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_ner_iw_catalan_galician_2_pipeline_en_5.5.0_3.0_1726843033314.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_ner_iw_catalan_galician_2_pipeline_en_5.5.0_3.0_1726843033314.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that holds the raw text to annotate
pipeline = PretrainedPipeline("gal_ner_iw_catalan_galician_2_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that holds the raw text to annotate
val pipeline = new PretrainedPipeline("gal_ner_iw_catalan_galician_2_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_ner_iw_catalan_galician_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|426.5 MB| + +## References + +https://huggingface.co/homersimpson/gal-ner-iw-ca-gl-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-harm_detection_en.md b/docs/_posts/ahmedlone127/2024-09-20-harm_detection_en.md new file mode 100644 index 00000000000000..e3a6a70080ac39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-harm_detection_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English harm_detection DistilBertForSequenceClassification from DaJulster +author: John Snow Labs +name: harm_detection +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`harm_detection` is a English model originally trained by DaJulster. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/harm_detection_en_5.5.0_3.0_1726792402290.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/harm_detection_en_5.5.0_3.0_1726792402290.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("harm_detection","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("harm_detection", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
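Once fitted, the whole pipeline can be persisted and reloaded later without re-downloading the model. A minimal sketch, assuming a writable path of your choice (the path below is illustrative):

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline and load it back for reuse
pipelineModel.write().overwrite().save("/tmp/harm_detection_pipeline")
restored = PipelineModel.load("/tmp/harm_detection_pipeline")
restored.transform(data).select("class.result").show(truncate=False)
```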
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|harm_detection| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DaJulster/Harm_detection \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-harm_detection_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-harm_detection_pipeline_en.md new file mode 100644 index 00000000000000..be4810a3f5216b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-harm_detection_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English harm_detection_pipeline pipeline DistilBertForSequenceClassification from DaJulster +author: John Snow Labs +name: harm_detection_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`harm_detection_pipeline` is a English model originally trained by DaJulster. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/harm_detection_pipeline_en_5.5.0_3.0_1726792414853.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/harm_detection_pipeline_en_5.5.0_3.0_1726792414853.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that holds the raw text to annotate
pipeline = PretrainedPipeline("harm_detection_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that holds the raw text to annotate
val pipeline = new PretrainedPipeline("harm_detection_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|harm_detection_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DaJulster/Harm_detection + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-hate_hate_random0_seed0_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-20-hate_hate_random0_seed0_bernice_en.md new file mode 100644 index 00000000000000..3d58fa75ca3efc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-hate_hate_random0_seed0_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_random0_seed0_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_random0_seed0_bernice +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_random0_seed0_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_random0_seed0_bernice_en_5.5.0_3.0_1726865766972.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_random0_seed0_bernice_en_5.5.0_3.0_1726865766972.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hate_hate_random0_seed0_bernice","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hate_hate_random0_seed0_bernice", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_random0_seed0_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|783.4 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_random0_seed0-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-hate_hate_random0_seed0_bernice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-hate_hate_random0_seed0_bernice_pipeline_en.md new file mode 100644 index 00000000000000..24672cc67e0c34 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-hate_hate_random0_seed0_bernice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_random0_seed0_bernice_pipeline pipeline XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_random0_seed0_bernice_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_random0_seed0_bernice_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_random0_seed0_bernice_pipeline_en_5.5.0_3.0_1726865910419.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_random0_seed0_bernice_pipeline_en_5.5.0_3.0_1726865910419.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that holds the raw text to annotate
pipeline = PretrainedPipeline("hate_hate_random0_seed0_bernice_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that holds the raw text to annotate
val pipeline = new PretrainedPipeline("hate_hate_random0_seed0_bernice_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_random0_seed0_bernice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|783.5 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_random0_seed0-bernice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-hate_hate_random2_seed0_twitter_roberta_base_dec2020_en.md b/docs/_posts/ahmedlone127/2024-09-20-hate_hate_random2_seed0_twitter_roberta_base_dec2020_en.md new file mode 100644 index 00000000000000..bd354bbdbf5789 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-hate_hate_random2_seed0_twitter_roberta_base_dec2020_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_random2_seed0_twitter_roberta_base_dec2020 RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_random2_seed0_twitter_roberta_base_dec2020 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_random2_seed0_twitter_roberta_base_dec2020` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_random2_seed0_twitter_roberta_base_dec2020_en_5.5.0_3.0_1726851589258.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_random2_seed0_twitter_roberta_base_dec2020_en_5.5.0_3.0_1726851589258.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_random2_seed0_twitter_roberta_base_dec2020","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_random2_seed0_twitter_roberta_base_dec2020", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
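Throughput on larger datasets can often be tuned through the annotator's batch size. A minimal sketch that only changes this one setting relative to the snippet above (the value chosen here is illustrative):

```python
# Larger batches usually improve hardware utilisation at the cost of memory
sequenceClassifier = RoBertaForSequenceClassification.pretrained(
    "hate_hate_random2_seed0_twitter_roberta_base_dec2020", "en"
) \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class") \
    .setBatchSize(16)
```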
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_random2_seed0_twitter_roberta_base_dec2020| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_random2_seed0-twitter-roberta-base-dec2020 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-hate_hate_random2_seed0_twitter_roberta_base_dec2020_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-hate_hate_random2_seed0_twitter_roberta_base_dec2020_pipeline_en.md new file mode 100644 index 00000000000000..42a411d3c721e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-hate_hate_random2_seed0_twitter_roberta_base_dec2020_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_random2_seed0_twitter_roberta_base_dec2020_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_random2_seed0_twitter_roberta_base_dec2020_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_random2_seed0_twitter_roberta_base_dec2020_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_random2_seed0_twitter_roberta_base_dec2020_pipeline_en_5.5.0_3.0_1726851612285.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_random2_seed0_twitter_roberta_base_dec2020_pipeline_en_5.5.0_3.0_1726851612285.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that holds the raw text to annotate
pipeline = PretrainedPipeline("hate_hate_random2_seed0_twitter_roberta_base_dec2020_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that holds the raw text to annotate
val pipeline = new PretrainedPipeline("hate_hate_random2_seed0_twitter_roberta_base_dec2020_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_random2_seed0_twitter_roberta_base_dec2020_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_random2_seed0-twitter-roberta-base-dec2020 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-hate_hate_random3_seed1_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-hate_hate_random3_seed1_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..da7ef4e8fac96c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-hate_hate_random3_seed1_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_random3_seed1_roberta_base_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_random3_seed1_roberta_base_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_random3_seed1_roberta_base_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_random3_seed1_roberta_base_pipeline_en_5.5.0_3.0_1726804464538.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_random3_seed1_roberta_base_pipeline_en_5.5.0_3.0_1726804464538.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that holds the raw text to annotate
pipeline = PretrainedPipeline("hate_hate_random3_seed1_roberta_base_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that holds the raw text to annotate
val pipeline = new PretrainedPipeline("hate_hate_random3_seed1_roberta_base_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_random3_seed1_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|427.6 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_random3_seed1-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-hf_qa_bert_base_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-hf_qa_bert_base_uncased_pipeline_en.md new file mode 100644 index 00000000000000..eb8f480158fa96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-hf_qa_bert_base_uncased_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English hf_qa_bert_base_uncased_pipeline pipeline BertForQuestionAnswering from rinogrego +author: John Snow Labs +name: hf_qa_bert_base_uncased_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hf_qa_bert_base_uncased_pipeline` is a English model originally trained by rinogrego. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hf_qa_bert_base_uncased_pipeline_en_5.5.0_3.0_1726808128012.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hf_qa_bert_base_uncased_pipeline_en_5.5.0_3.0_1726808128012.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with the question and context columns expected by the
# pipeline's MultiDocumentAssembler stage (see Included Models below)
pipeline = PretrainedPipeline("hf_qa_bert_base_uncased_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with the question and context columns expected by the
// pipeline's MultiDocumentAssembler stage (see Included Models below)
val pipeline = new PretrainedPipeline("hf_qa_bert_base_uncased_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hf_qa_bert_base_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/rinogrego/HF-QA-bert-base-uncased + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-hw1_1_question_answering_bert_base_chinese_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-20-hw1_1_question_answering_bert_base_chinese_finetuned_en.md new file mode 100644 index 00000000000000..870761916f9b8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-hw1_1_question_answering_bert_base_chinese_finetuned_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English hw1_1_question_answering_bert_base_chinese_finetuned BertForQuestionAnswering from b10401015 +author: John Snow Labs +name: hw1_1_question_answering_bert_base_chinese_finetuned +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hw1_1_question_answering_bert_base_chinese_finetuned` is a English model originally trained by b10401015. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hw1_1_question_answering_bert_base_chinese_finetuned_en_5.5.0_3.0_1726833829131.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hw1_1_question_answering_bert_base_chinese_finetuned_en_5.5.0_3.0_1726833829131.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Spark NLP pipeline: assemble the question/context pair, then extract the answer span
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("hw1_1_question_answering_bert_base_chinese_finetuned","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("hw1_1_question_answering_bert_base_chinese_finetuned", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hw1_1_question_answering_bert_base_chinese_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|381.1 MB| + +## References + +https://huggingface.co/b10401015/hw1-1-question_answering-bert-base-chinese-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-hw1_1_question_answering_bert_base_chinese_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-hw1_1_question_answering_bert_base_chinese_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..1d7671c884e707 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-hw1_1_question_answering_bert_base_chinese_finetuned_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English hw1_1_question_answering_bert_base_chinese_finetuned_pipeline pipeline BertForQuestionAnswering from b10401015 +author: John Snow Labs +name: hw1_1_question_answering_bert_base_chinese_finetuned_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hw1_1_question_answering_bert_base_chinese_finetuned_pipeline` is a English model originally trained by b10401015. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hw1_1_question_answering_bert_base_chinese_finetuned_pipeline_en_5.5.0_3.0_1726833847120.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hw1_1_question_answering_bert_base_chinese_finetuned_pipeline_en_5.5.0_3.0_1726833847120.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with the question and context columns expected by the
# pipeline's MultiDocumentAssembler stage (see Included Models below)
pipeline = PretrainedPipeline("hw1_1_question_answering_bert_base_chinese_finetuned_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with the question and context columns expected by the
// pipeline's MultiDocumentAssembler stage (see Included Models below)
val pipeline = new PretrainedPipeline("hw1_1_question_answering_bert_base_chinese_finetuned_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hw1_1_question_answering_bert_base_chinese_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|381.1 MB| + +## References + +https://huggingface.co/b10401015/hw1-1-question_answering-bert-base-chinese-finetuned + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-hw_1_jerry81920_en.md b/docs/_posts/ahmedlone127/2024-09-20-hw_1_jerry81920_en.md new file mode 100644 index 00000000000000..5f33c3b6c79277 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-hw_1_jerry81920_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hw_1_jerry81920 DistilBertForSequenceClassification from Jerry81920 +author: John Snow Labs +name: hw_1_jerry81920 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hw_1_jerry81920` is a English model originally trained by Jerry81920. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hw_1_jerry81920_en_5.5.0_3.0_1726829860054.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hw_1_jerry81920_en_5.5.0_3.0_1726829860054.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Spark NLP pipeline: document assembly -> tokenization -> sequence classification
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("hw_1_jerry81920","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("hw_1_jerry81920", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hw_1_jerry81920| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Jerry81920/hw-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ibert_roberta_base_finetuned_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-20-ibert_roberta_base_finetuned_imdb_en.md new file mode 100644 index 00000000000000..28c36946bdd58a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ibert_roberta_base_finetuned_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ibert_roberta_base_finetuned_imdb RoBertaEmbeddings from elayat +author: John Snow Labs +name: ibert_roberta_base_finetuned_imdb +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ibert_roberta_base_finetuned_imdb` is a English model originally trained by elayat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ibert_roberta_base_finetuned_imdb_en_5.5.0_3.0_1726815707272.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ibert_roberta_base_finetuned_imdb_en_5.5.0_3.0_1726815707272.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Spark NLP pipeline: document assembly -> tokenization -> RoBERTa token embeddings
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("ibert_roberta_base_finetuned_imdb","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("ibert_roberta_base_finetuned_imdb","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ibert_roberta_base_finetuned_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/elayat/ibert-roberta-base-finetuned-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ibert_roberta_base_finetuned_imdb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-ibert_roberta_base_finetuned_imdb_pipeline_en.md new file mode 100644 index 00000000000000..639a4ca48dc7cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ibert_roberta_base_finetuned_imdb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ibert_roberta_base_finetuned_imdb_pipeline pipeline RoBertaEmbeddings from elayat +author: John Snow Labs +name: ibert_roberta_base_finetuned_imdb_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ibert_roberta_base_finetuned_imdb_pipeline` is a English model originally trained by elayat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ibert_roberta_base_finetuned_imdb_pipeline_en_5.5.0_3.0_1726815729901.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ibert_roberta_base_finetuned_imdb_pipeline_en_5.5.0_3.0_1726815729901.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ibert_roberta_base_finetuned_imdb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ibert_roberta_base_finetuned_imdb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
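
As in the other pipeline cards, `df` is left undefined in the generated snippet. A minimal sketch of building it and inspecting the token embeddings follows, assuming the bundled DocumentAssembler reads a `text` column (an assumption, not stated in this card).

```python
# Minimal usage sketch; the "text" input column name is an assumption.
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("ibert_roberta_base_finetuned_imdb_pipeline", lang = "en")

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

# The embedding vectors live inside the annotation structs added by the
# RoBertaEmbeddings stage; print the schema to see the exact column names.
annotations.printSchema()
annotations.show(truncate = 60)
```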
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ibert_roberta_base_finetuned_imdb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/elayat/ibert-roberta-base-finetuned-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-icebert_igc_is.md b/docs/_posts/ahmedlone127/2024-09-20-icebert_igc_is.md new file mode 100644 index 00000000000000..2a8c3b8e752112 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-icebert_igc_is.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Icelandic icebert_igc RoBertaEmbeddings from mideind +author: John Snow Labs +name: icebert_igc +date: 2024-09-20 +tags: [is, open_source, onnx, embeddings, roberta] +task: Embeddings +language: is +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`icebert_igc` is a Icelandic model originally trained by mideind. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/icebert_igc_is_5.5.0_3.0_1726816456111.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/icebert_igc_is_5.5.0_3.0_1726816456111.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Spark NLP pipeline: document assembly -> tokenization -> RoBERTa token embeddings
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("icebert_igc","is") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("icebert_igc","is")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|icebert_igc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|is| +|Size:|295.9 MB| + +## References + +https://huggingface.co/mideind/IceBERT-igc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ieq_bert_en.md b/docs/_posts/ahmedlone127/2024-09-20-ieq_bert_en.md new file mode 100644 index 00000000000000..a8e641bac0555f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ieq_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ieq_bert BertForSequenceClassification from ieq +author: John Snow Labs +name: ieq_bert +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ieq_bert` is a English model originally trained by ieq. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ieq_bert_en_5.5.0_3.0_1726828900783.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ieq_bert_en_5.5.0_3.0_1726828900783.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Spark NLP pipeline: document assembly -> tokenization -> sequence classification
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("ieq_bert","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("ieq_bert", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ieq_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ieq/IEQ-BERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ieq_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-ieq_bert_pipeline_en.md new file mode 100644 index 00000000000000..78dc47f27a096a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ieq_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ieq_bert_pipeline pipeline BertForSequenceClassification from ieq +author: John Snow Labs +name: ieq_bert_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ieq_bert_pipeline` is a English model originally trained by ieq. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ieq_bert_pipeline_en_5.5.0_3.0_1726828919987.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ieq_bert_pipeline_en_5.5.0_3.0_1726828919987.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("ieq_bert_pipeline", lang = "en")

# df is assumed to be a Spark DataFrame with a "text" column, the input the
# pipeline's DocumentAssembler stage typically reads
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("ieq_bert_pipeline", lang = "en")

// df is assumed to be a DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ieq_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ieq/IEQ-BERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-imdbreviews_classification_distilbert_v02_maherh_en.md b/docs/_posts/ahmedlone127/2024-09-20-imdbreviews_classification_distilbert_v02_maherh_en.md new file mode 100644 index 00000000000000..de7577bd475d99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-imdbreviews_classification_distilbert_v02_maherh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English imdbreviews_classification_distilbert_v02_maherh BertForSequenceClassification from maherh +author: John Snow Labs +name: imdbreviews_classification_distilbert_v02_maherh +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdbreviews_classification_distilbert_v02_maherh` is a English model originally trained by maherh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_distilbert_v02_maherh_en_5.5.0_3.0_1726870100675.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_distilbert_v02_maherh_en_5.5.0_3.0_1726870100675.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Spark NLP pipeline: document assembly -> tokenization -> sequence classification
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("imdbreviews_classification_distilbert_v02_maherh","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("imdbreviews_classification_distilbert_v02_maherh", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdbreviews_classification_distilbert_v02_maherh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|44.1 MB| + +## References + +https://huggingface.co/maherh/imdbreviews_classification_distilbert_v02 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-imdbreviews_classification_distilbert_v02_maherh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-imdbreviews_classification_distilbert_v02_maherh_pipeline_en.md new file mode 100644 index 00000000000000..834ae13edbfeec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-imdbreviews_classification_distilbert_v02_maherh_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English imdbreviews_classification_distilbert_v02_maherh_pipeline pipeline BertForSequenceClassification from maherh +author: John Snow Labs +name: imdbreviews_classification_distilbert_v02_maherh_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdbreviews_classification_distilbert_v02_maherh_pipeline` is a English model originally trained by maherh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_distilbert_v02_maherh_pipeline_en_5.5.0_3.0_1726870103025.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_distilbert_v02_maherh_pipeline_en_5.5.0_3.0_1726870103025.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("imdbreviews_classification_distilbert_v02_maherh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("imdbreviews_classification_distilbert_v02_maherh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
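
For quick experiments, `PretrainedPipeline` also exposes `annotate`, which takes a plain string instead of a DataFrame. A small sketch follows; the keys of the returned dictionary depend on the output column names of the bundled stages, so they are not listed here.

```python
# Minimal single-string sketch using annotate() instead of transform()
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("imdbreviews_classification_distilbert_v02_maherh_pipeline", lang = "en")

# annotate() runs the pipeline on one string and returns a dict of annotator outputs
result = pipeline.annotate("I love spark-nlp")
print(result.keys())
print(result)
```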
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdbreviews_classification_distilbert_v02_maherh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/maherh/imdbreviews_classification_distilbert_v02 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-infoxlm_base_on_custom_kural_500_en.md b/docs/_posts/ahmedlone127/2024-09-20-infoxlm_base_on_custom_kural_500_en.md new file mode 100644 index 00000000000000..8cfcf5d48fc0d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-infoxlm_base_on_custom_kural_500_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English infoxlm_base_on_custom_kural_500 XlmRoBertaForSequenceClassification from bikram22pi7 +author: John Snow Labs +name: infoxlm_base_on_custom_kural_500 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`infoxlm_base_on_custom_kural_500` is a English model originally trained by bikram22pi7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/infoxlm_base_on_custom_kural_500_en_5.5.0_3.0_1726846313309.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/infoxlm_base_on_custom_kural_500_en_5.5.0_3.0_1726846313309.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Spark NLP pipeline: document assembly -> tokenization -> sequence classification
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("infoxlm_base_on_custom_kural_500","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("infoxlm_base_on_custom_kural_500", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|infoxlm_base_on_custom_kural_500| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|777.7 MB| + +## References + +https://huggingface.co/bikram22pi7/infoxlm-base-on-custom-kural-500 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-infoxlm_base_on_custom_kural_500_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-infoxlm_base_on_custom_kural_500_pipeline_en.md new file mode 100644 index 00000000000000..fcc2ec397c9067 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-infoxlm_base_on_custom_kural_500_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English infoxlm_base_on_custom_kural_500_pipeline pipeline XlmRoBertaForSequenceClassification from bikram22pi7 +author: John Snow Labs +name: infoxlm_base_on_custom_kural_500_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`infoxlm_base_on_custom_kural_500_pipeline` is a English model originally trained by bikram22pi7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/infoxlm_base_on_custom_kural_500_pipeline_en_5.5.0_3.0_1726846462142.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/infoxlm_base_on_custom_kural_500_pipeline_en_5.5.0_3.0_1726846462142.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("infoxlm_base_on_custom_kural_500_pipeline", lang = "en")

# df is assumed to be a Spark DataFrame with a "text" column, the input the
# pipeline's DocumentAssembler stage typically reads
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("infoxlm_base_on_custom_kural_500_pipeline", lang = "en")

// df is assumed to be a DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|infoxlm_base_on_custom_kural_500_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|777.7 MB| + +## References + +https://huggingface.co/bikram22pi7/infoxlm-base-on-custom-kural-500 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-inisw08_robert_mlm_adagrad_en.md b/docs/_posts/ahmedlone127/2024-09-20-inisw08_robert_mlm_adagrad_en.md new file mode 100644 index 00000000000000..08fb226f8d4205 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-inisw08_robert_mlm_adagrad_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English inisw08_robert_mlm_adagrad RoBertaEmbeddings from ugiugi +author: John Snow Labs +name: inisw08_robert_mlm_adagrad +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`inisw08_robert_mlm_adagrad` is a English model originally trained by ugiugi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/inisw08_robert_mlm_adagrad_en_5.5.0_3.0_1726857605416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/inisw08_robert_mlm_adagrad_en_5.5.0_3.0_1726857605416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Spark NLP pipeline: document assembly -> tokenization -> RoBERTa token embeddings
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("inisw08_robert_mlm_adagrad","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("inisw08_robert_mlm_adagrad","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|inisw08_robert_mlm_adagrad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/ugiugi/inisw08-RoBERT-mlm-adagrad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-inisw08_robert_mlm_adagrad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-inisw08_robert_mlm_adagrad_pipeline_en.md new file mode 100644 index 00000000000000..973a251264a2e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-inisw08_robert_mlm_adagrad_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English inisw08_robert_mlm_adagrad_pipeline pipeline RoBertaEmbeddings from ugiugi +author: John Snow Labs +name: inisw08_robert_mlm_adagrad_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`inisw08_robert_mlm_adagrad_pipeline` is a English model originally trained by ugiugi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/inisw08_robert_mlm_adagrad_pipeline_en_5.5.0_3.0_1726857633819.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/inisw08_robert_mlm_adagrad_pipeline_en_5.5.0_3.0_1726857633819.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("inisw08_robert_mlm_adagrad_pipeline", lang = "en")

# df is assumed to be a Spark DataFrame with a "text" column, the input the
# pipeline's DocumentAssembler stage typically reads
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("inisw08_robert_mlm_adagrad_pipeline", lang = "en")

// df is assumed to be a DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|inisw08_robert_mlm_adagrad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/ugiugi/inisw08-RoBERT-mlm-adagrad + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-intent_classification_distilbert_hoaan2003_en.md b/docs/_posts/ahmedlone127/2024-09-20-intent_classification_distilbert_hoaan2003_en.md new file mode 100644 index 00000000000000..69293ee4df61ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-intent_classification_distilbert_hoaan2003_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English intent_classification_distilbert_hoaan2003 DistilBertForSequenceClassification from HoaAn2003 +author: John Snow Labs +name: intent_classification_distilbert_hoaan2003 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`intent_classification_distilbert_hoaan2003` is a English model originally trained by HoaAn2003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/intent_classification_distilbert_hoaan2003_en_5.5.0_3.0_1726848629901.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/intent_classification_distilbert_hoaan2003_en_5.5.0_3.0_1726848629901.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Spark NLP pipeline: document assembly -> tokenization -> sequence classification
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("intent_classification_distilbert_hoaan2003","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("intent_classification_distilbert_hoaan2003", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|intent_classification_distilbert_hoaan2003| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/HoaAn2003/intent_classification_distilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-intent_classification_distilbert_hoaan2003_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-intent_classification_distilbert_hoaan2003_pipeline_en.md new file mode 100644 index 00000000000000..fd7e049266ce4e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-intent_classification_distilbert_hoaan2003_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English intent_classification_distilbert_hoaan2003_pipeline pipeline DistilBertForSequenceClassification from HoaAn2003 +author: John Snow Labs +name: intent_classification_distilbert_hoaan2003_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`intent_classification_distilbert_hoaan2003_pipeline` is a English model originally trained by HoaAn2003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/intent_classification_distilbert_hoaan2003_pipeline_en_5.5.0_3.0_1726848645337.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/intent_classification_distilbert_hoaan2003_pipeline_en_5.5.0_3.0_1726848645337.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("intent_classification_distilbert_hoaan2003_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("intent_classification_distilbert_hoaan2003_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
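
A minimal sketch of batching several utterances through this pipeline, under the same assumption used in the other cards that the input column is named `text` (the example sentences are illustrative only).

```python
# Minimal batch-scoring sketch; the "text" input column name is an assumption.
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("intent_classification_distilbert_hoaan2003_pipeline", lang = "en")

# Several rows can be scored in a single transform call
df = spark.createDataFrame(
    [["Book me a flight to Paris"], ["What is the weather today?"]]
).toDF("text")

annotations = pipeline.transform(df)
annotations.show(truncate = 60)
```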
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|intent_classification_distilbert_hoaan2003_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/HoaAn2003/intent_classification_distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-job_listing_filtering_model_en.md b/docs/_posts/ahmedlone127/2024-09-20-job_listing_filtering_model_en.md new file mode 100644 index 00000000000000..eb36a0cd9ce8be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-job_listing_filtering_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English job_listing_filtering_model XlmRoBertaForSequenceClassification from saattrupdan +author: John Snow Labs +name: job_listing_filtering_model +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`job_listing_filtering_model` is a English model originally trained by saattrupdan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/job_listing_filtering_model_en_5.5.0_3.0_1726845998495.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/job_listing_filtering_model_en_5.5.0_3.0_1726845998495.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Spark NLP pipeline: document assembly -> tokenization -> sequence classification
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("job_listing_filtering_model","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("job_listing_filtering_model", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|job_listing_filtering_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|776.2 MB| + +## References + +https://huggingface.co/saattrupdan/job-listing-filtering-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-job_listing_filtering_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-job_listing_filtering_model_pipeline_en.md new file mode 100644 index 00000000000000..d7a7b64712f333 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-job_listing_filtering_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English job_listing_filtering_model_pipeline pipeline XlmRoBertaForSequenceClassification from saattrupdan +author: John Snow Labs +name: job_listing_filtering_model_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`job_listing_filtering_model_pipeline` is a English model originally trained by saattrupdan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/job_listing_filtering_model_pipeline_en_5.5.0_3.0_1726846136591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/job_listing_filtering_model_pipeline_en_5.5.0_3.0_1726846136591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("job_listing_filtering_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("job_listing_filtering_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
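
The same pattern applies here; a short, hedged sketch assuming a `text` input column and an illustrative job-listing sentence.

```python
# Minimal usage sketch; the "text" input column name is an assumption.
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("job_listing_filtering_model_pipeline", lang = "en")

df = spark.createDataFrame([["Senior data engineer wanted for a fintech startup"]]).toDF("text")
annotations = pipeline.transform(df)
annotations.show(truncate = 60)
```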
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|job_listing_filtering_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|776.3 MB| + +## References + +https://huggingface.co/saattrupdan/job-listing-filtering-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-kannadabert_en.md b/docs/_posts/ahmedlone127/2024-09-20-kannadabert_en.md new file mode 100644 index 00000000000000..f3350dd57df569 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-kannadabert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English kannadabert RoBertaEmbeddings from Chakita +author: John Snow Labs +name: kannadabert +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kannadabert` is a English model originally trained by Chakita. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kannadabert_en_5.5.0_3.0_1726857819869.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kannadabert_en_5.5.0_3.0_1726857819869.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Spark NLP pipeline: document assembly -> tokenization -> RoBERTa token embeddings
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("kannadabert","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("kannadabert","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kannadabert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|450.6 MB| + +## References + +https://huggingface.co/Chakita/KannadaBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-kannadabert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-kannadabert_pipeline_en.md new file mode 100644 index 00000000000000..affdff991e8167 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-kannadabert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English kannadabert_pipeline pipeline RoBertaEmbeddings from Chakita +author: John Snow Labs +name: kannadabert_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kannadabert_pipeline` is a English model originally trained by Chakita. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kannadabert_pipeline_en_5.5.0_3.0_1726857841170.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kannadabert_pipeline_en_5.5.0_3.0_1726857841170.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("kannadabert_pipeline", lang = "en")

# df is assumed to be a Spark DataFrame with a "text" column, the input the
# pipeline's DocumentAssembler stage typically reads
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("kannadabert_pipeline", lang = "en")

// df is assumed to be a DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kannadabert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|450.7 MB| + +## References + +https://huggingface.co/Chakita/KannadaBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent2_en.md b/docs/_posts/ahmedlone127/2024-09-20-kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent2_en.md new file mode 100644 index 00000000000000..7db16ae4a38c91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent2 RoBertaForSequenceClassification from RogerB +author: John Snow Labs +name: kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent2 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent2` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent2_en_5.5.0_3.0_1726804592386.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent2_en_5.5.0_3.0_1726804592386.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
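The classifier writes its predictions into the `class` annotation column. As a minimal sketch, reusing the `pipelineDF` produced by the snippet above, the predicted label for each input row can be read back like this:

```python
# "class" is an array of annotations; its "result" field holds the predicted label
pipelineDF.select("text", "class.result").show(truncate=False)
```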
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.3 MB| + +## References + +https://huggingface.co/RogerB/kinyaRoberta-large-kinte-finetuned-kin-tweet-finetuned-kin-sent2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ks8_en.md b/docs/_posts/ahmedlone127/2024-09-20-ks8_en.md new file mode 100644 index 00000000000000..0c734b07f46d36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ks8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ks8 RoBertaForSequenceClassification from aloxatel +author: John Snow Labs +name: ks8 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ks8` is a English model originally trained by aloxatel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ks8_en_5.5.0_3.0_1726849776466.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ks8_en_5.5.0_3.0_1726849776466.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("ks8","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("ks8", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ks8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/aloxatel/KS8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ks8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-ks8_pipeline_en.md new file mode 100644 index 00000000000000..6a1808d8ea568f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ks8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ks8_pipeline pipeline RoBertaForSequenceClassification from aloxatel +author: John Snow Labs +name: ks8_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ks8_pipeline` is a English model originally trained by aloxatel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ks8_pipeline_en_5.5.0_3.0_1726849861184.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ks8_pipeline_en_5.5.0_3.0_1726849861184.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ks8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ks8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
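For quick experiments it can be easier to skip the DataFrame and call the pipeline's `annotate` method directly on a string. A minimal sketch, assuming the same pipeline name as above:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("ks8_pipeline", lang = "en")
# annotate() returns a plain Python dict keyed by the pipeline's output column names
result = pipeline.annotate("I love spark-nlp")
print(result)
```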
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ks8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/aloxatel/KS8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-lenate_model_8_en.md b/docs/_posts/ahmedlone127/2024-09-20-lenate_model_8_en.md new file mode 100644 index 00000000000000..acfe6138367667 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-lenate_model_8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English lenate_model_8 DistilBertForSequenceClassification from lenate +author: John Snow Labs +name: lenate_model_8 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lenate_model_8` is a English model originally trained by lenate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lenate_model_8_en_5.5.0_3.0_1726832516504.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lenate_model_8_en_5.5.0_3.0_1726832516504.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier consumes the "document" and "token" columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("lenate_model_8","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// The classifier consumes the "document" and "token" columns produced above
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("lenate_model_8", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lenate_model_8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lenate/lenate_model_8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-lenate_model_8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-lenate_model_8_pipeline_en.md new file mode 100644 index 00000000000000..cbec513de707b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-lenate_model_8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lenate_model_8_pipeline pipeline DistilBertForSequenceClassification from lenate +author: John Snow Labs +name: lenate_model_8_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lenate_model_8_pipeline` is a English model originally trained by lenate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lenate_model_8_pipeline_en_5.5.0_3.0_1726832529038.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lenate_model_8_pipeline_en_5.5.0_3.0_1726832529038.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lenate_model_8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lenate_model_8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lenate_model_8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lenate/lenate_model_8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-llm_b_hw1_en.md b/docs/_posts/ahmedlone127/2024-09-20-llm_b_hw1_en.md new file mode 100644 index 00000000000000..22ccb9858a8e6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-llm_b_hw1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English llm_b_hw1 DistilBertForSequenceClassification from VincentYH +author: John Snow Labs +name: llm_b_hw1 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llm_b_hw1` is a English model originally trained by VincentYH. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llm_b_hw1_en_5.5.0_3.0_1726841448495.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llm_b_hw1_en_5.5.0_3.0_1726841448495.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("llm_b_hw1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("llm_b_hw1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
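Besides the bare label, each prediction annotation carries a `metadata` map that typically includes the per-class scores. A minimal sketch, assuming the `pipelineDF` from the snippet above:

```python
from pyspark.sql.functions import explode

# One row per prediction, showing the label and the raw score metadata
pipelineDF.select(explode("class").alias("prediction")) \
    .select("prediction.result", "prediction.metadata") \
    .show(truncate=False)
```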
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llm_b_hw1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/VincentYH/LLM_B_HW1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-llm_b_hw1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-llm_b_hw1_pipeline_en.md new file mode 100644 index 00000000000000..ef06e8a0e8784f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-llm_b_hw1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English llm_b_hw1_pipeline pipeline DistilBertForSequenceClassification from VincentYH +author: John Snow Labs +name: llm_b_hw1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llm_b_hw1_pipeline` is a English model originally trained by VincentYH. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llm_b_hw1_pipeline_en_5.5.0_3.0_1726841460739.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llm_b_hw1_pipeline_en_5.5.0_3.0_1726841460739.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("llm_b_hw1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("llm_b_hw1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llm_b_hw1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/VincentYH/LLM_B_HW1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-llm_hw1_en.md b/docs/_posts/ahmedlone127/2024-09-20-llm_hw1_en.md new file mode 100644 index 00000000000000..800609e6616855 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-llm_hw1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English llm_hw1 DistilBertForSequenceClassification from Chenbirdy +author: John Snow Labs +name: llm_hw1 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llm_hw1` is a English model originally trained by Chenbirdy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llm_hw1_en_5.5.0_3.0_1726809438132.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llm_hw1_en_5.5.0_3.0_1726809438132.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("llm_hw1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("llm_hw1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llm_hw1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Chenbirdy/LLM-HW1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-llm_hw1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-llm_hw1_pipeline_en.md new file mode 100644 index 00000000000000..9f9cccb1a99b42 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-llm_hw1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English llm_hw1_pipeline pipeline DistilBertForSequenceClassification from Chenbirdy +author: John Snow Labs +name: llm_hw1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llm_hw1_pipeline` is a English model originally trained by Chenbirdy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llm_hw1_pipeline_en_5.5.0_3.0_1726809449307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llm_hw1_pipeline_en_5.5.0_3.0_1726809449307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("llm_hw1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("llm_hw1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
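When character offsets or prediction metadata are needed rather than just the result strings, `fullAnnotate` returns the complete annotation objects. A minimal sketch, assuming the same pipeline name as above:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("llm_hw1_pipeline", lang = "en")
# fullAnnotate() keeps begin/end offsets and the metadata of every annotation
full_result = pipeline.fullAnnotate("I love spark-nlp")
print(full_result[0])
```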
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llm_hw1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Chenbirdy/LLM-HW1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-llmclasswork1_en.md b/docs/_posts/ahmedlone127/2024-09-20-llmclasswork1_en.md new file mode 100644 index 00000000000000..5fdbfb4fd3f4c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-llmclasswork1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English llmclasswork1 DistilBertForSequenceClassification from halu1003 +author: John Snow Labs +name: llmclasswork1 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llmclasswork1` is a English model originally trained by halu1003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llmclasswork1_en_5.5.0_3.0_1726842086882.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llmclasswork1_en_5.5.0_3.0_1726842086882.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("llmclasswork1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("llmclasswork1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llmclasswork1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/halu1003/LLMClassWork1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-llmclasswork1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-llmclasswork1_pipeline_en.md new file mode 100644 index 00000000000000..38efb95530c05b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-llmclasswork1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English llmclasswork1_pipeline pipeline DistilBertForSequenceClassification from halu1003 +author: John Snow Labs +name: llmclasswork1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llmclasswork1_pipeline` is a English model originally trained by halu1003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llmclasswork1_pipeline_en_5.5.0_3.0_1726842099151.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llmclasswork1_pipeline_en_5.5.0_3.0_1726842099151.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("llmclasswork1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("llmclasswork1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llmclasswork1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/halu1003/LLMClassWork1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-llmhw01_wsyar_en.md b/docs/_posts/ahmedlone127/2024-09-20-llmhw01_wsyar_en.md new file mode 100644 index 00000000000000..c7317fcd17d203 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-llmhw01_wsyar_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English llmhw01_wsyar DistilBertForSequenceClassification from wsyar +author: John Snow Labs +name: llmhw01_wsyar +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llmhw01_wsyar` is a English model originally trained by wsyar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llmhw01_wsyar_en_5.5.0_3.0_1726849045703.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llmhw01_wsyar_en_5.5.0_3.0_1726849045703.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("llmhw01_wsyar","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("llmhw01_wsyar", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llmhw01_wsyar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/wsyar/llmhw01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-locale_detector_en.md b/docs/_posts/ahmedlone127/2024-09-20-locale_detector_en.md new file mode 100644 index 00000000000000..d6b06cd81d3b90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-locale_detector_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English locale_detector XlmRoBertaForSequenceClassification from yo +author: John Snow Labs +name: locale_detector +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`locale_detector` is a English model originally trained by yo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/locale_detector_en_5.5.0_3.0_1726866256880.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/locale_detector_en_5.5.0_3.0_1726866256880.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier consumes the "document" and "token" columns produced above
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("locale_detector","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// The classifier consumes the "document" and "token" columns produced above
val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("locale_detector", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|locale_detector| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|844.0 MB| + +## References + +https://huggingface.co/yo/locale-detector \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-locale_detector_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-locale_detector_pipeline_en.md new file mode 100644 index 00000000000000..bf896d2872e2ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-locale_detector_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English locale_detector_pipeline pipeline XlmRoBertaForSequenceClassification from yo +author: John Snow Labs +name: locale_detector_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`locale_detector_pipeline` is a English model originally trained by yo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/locale_detector_pipeline_en_5.5.0_3.0_1726866368419.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/locale_detector_pipeline_en_5.5.0_3.0_1726866368419.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("locale_detector_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("locale_detector_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|locale_detector_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|844.0 MB| + +## References + +https://huggingface.co/yo/locale-detector + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-log_pretrained_bert_en.md b/docs/_posts/ahmedlone127/2024-09-20-log_pretrained_bert_en.md new file mode 100644 index 00000000000000..c794ea53318c9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-log_pretrained_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English log_pretrained_bert BertEmbeddings from eun-woo +author: John Snow Labs +name: log_pretrained_bert +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`log_pretrained_bert` is a English model originally trained by eun-woo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/log_pretrained_bert_en_5.5.0_3.0_1726825597017.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/log_pretrained_bert_en_5.5.0_3.0_1726825597017.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("log_pretrained_bert","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("log_pretrained_bert","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
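The `embeddings` output column holds one annotation per token, with the vector stored in the annotation's `embeddings` field. A minimal sketch, assuming the `pipelineDF` from the snippet above, that flattens the output into one row per token:

```python
from pyspark.sql.functions import explode

# One row per token: the token text and its BERT vector
pipelineDF.select(explode("embeddings").alias("emb")) \
    .select("emb.result", "emb.embeddings") \
    .show(truncate=False)
```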
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|log_pretrained_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.6 MB| + +## References + +https://huggingface.co/eun-woo/log_pretrained-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-log_pretrained_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-log_pretrained_bert_pipeline_en.md new file mode 100644 index 00000000000000..8db329077bf86e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-log_pretrained_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English log_pretrained_bert_pipeline pipeline BertEmbeddings from eun-woo +author: John Snow Labs +name: log_pretrained_bert_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`log_pretrained_bert_pipeline` is a English model originally trained by eun-woo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/log_pretrained_bert_pipeline_en_5.5.0_3.0_1726825616703.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/log_pretrained_bert_pipeline_en_5.5.0_3.0_1726825616703.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("log_pretrained_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("log_pretrained_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|log_pretrained_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.6 MB| + +## References + +https://huggingface.co/eun-woo/log_pretrained-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ltp_roberta_large_defaultltp_roberta_large_default_char_ins_en.md b/docs/_posts/ahmedlone127/2024-09-20-ltp_roberta_large_defaultltp_roberta_large_default_char_ins_en.md new file mode 100644 index 00000000000000..72e24aa317680d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ltp_roberta_large_defaultltp_roberta_large_default_char_ins_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ltp_roberta_large_defaultltp_roberta_large_default_char_ins RoBertaForSequenceClassification from sara-nabhani +author: John Snow Labs +name: ltp_roberta_large_defaultltp_roberta_large_default_char_ins +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ltp_roberta_large_defaultltp_roberta_large_default_char_ins` is a English model originally trained by sara-nabhani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ltp_roberta_large_defaultltp_roberta_large_default_char_ins_en_5.5.0_3.0_1726804467034.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ltp_roberta_large_defaultltp_roberta_large_default_char_ins_en_5.5.0_3.0_1726804467034.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("ltp_roberta_large_defaultltp_roberta_large_default_char_ins","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("ltp_roberta_large_defaultltp_roberta_large_default_char_ins", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ltp_roberta_large_defaultltp_roberta_large_default_char_ins| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/sara-nabhani/ltp-roberta-large-defaultltp-roberta-large-default-char_ins \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ltp_roberta_large_defaultltp_roberta_large_default_char_ins_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-ltp_roberta_large_defaultltp_roberta_large_default_char_ins_pipeline_en.md new file mode 100644 index 00000000000000..99331266a04810 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ltp_roberta_large_defaultltp_roberta_large_default_char_ins_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ltp_roberta_large_defaultltp_roberta_large_default_char_ins_pipeline pipeline RoBertaForSequenceClassification from sara-nabhani +author: John Snow Labs +name: ltp_roberta_large_defaultltp_roberta_large_default_char_ins_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ltp_roberta_large_defaultltp_roberta_large_default_char_ins_pipeline` is a English model originally trained by sara-nabhani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ltp_roberta_large_defaultltp_roberta_large_default_char_ins_pipeline_en_5.5.0_3.0_1726804540890.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ltp_roberta_large_defaultltp_roberta_large_default_char_ins_pipeline_en_5.5.0_3.0_1726804540890.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ltp_roberta_large_defaultltp_roberta_large_default_char_ins_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ltp_roberta_large_defaultltp_roberta_large_default_char_ins_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ltp_roberta_large_defaultltp_roberta_large_default_char_ins_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/sara-nabhani/ltp-roberta-large-defaultltp-roberta-large-default-char_ins + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-marbertv2_finetuned_egyptian_hate_speech_detection_ar.md b/docs/_posts/ahmedlone127/2024-09-20-marbertv2_finetuned_egyptian_hate_speech_detection_ar.md new file mode 100644 index 00000000000000..fb2be14ca04a9f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-marbertv2_finetuned_egyptian_hate_speech_detection_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic marbertv2_finetuned_egyptian_hate_speech_detection BertForSequenceClassification from IbrahimAmin +author: John Snow Labs +name: marbertv2_finetuned_egyptian_hate_speech_detection +date: 2024-09-20 +tags: [ar, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marbertv2_finetuned_egyptian_hate_speech_detection` is a Arabic model originally trained by IbrahimAmin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marbertv2_finetuned_egyptian_hate_speech_detection_ar_5.5.0_3.0_1726860432597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marbertv2_finetuned_egyptian_hate_speech_detection_ar_5.5.0_3.0_1726860432597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("marbertv2_finetuned_egyptian_hate_speech_detection","ar") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("marbertv2_finetuned_egyptian_hate_speech_detection", "ar") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
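The snippet above uses an English placeholder sentence; since this model was trained on Arabic (Egyptian dialect) text, real inputs should be Arabic. A minimal sketch with a hypothetical Arabic example, assuming the fitted `pipelineModel` from above:

```python
# Hypothetical Arabic input ("I love natural language processing")
arabic_data = spark.createDataFrame([["أنا أحب معالجة اللغات الطبيعية"]]).toDF("text")
pipelineModel.transform(arabic_data).select("text", "class.result").show(truncate=False)
```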
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marbertv2_finetuned_egyptian_hate_speech_detection| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|ar| +|Size:|608.8 MB| + +## References + +https://huggingface.co/IbrahimAmin/marbertv2-finetuned-egyptian-hate-speech-detection \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-marbertv2_finetuned_egyptian_hate_speech_detection_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-20-marbertv2_finetuned_egyptian_hate_speech_detection_pipeline_ar.md new file mode 100644 index 00000000000000..2ee16ac4621941 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-marbertv2_finetuned_egyptian_hate_speech_detection_pipeline_ar.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Arabic marbertv2_finetuned_egyptian_hate_speech_detection_pipeline pipeline BertForSequenceClassification from IbrahimAmin +author: John Snow Labs +name: marbertv2_finetuned_egyptian_hate_speech_detection_pipeline +date: 2024-09-20 +tags: [ar, open_source, pipeline, onnx] +task: Text Classification +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marbertv2_finetuned_egyptian_hate_speech_detection_pipeline` is a Arabic model originally trained by IbrahimAmin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marbertv2_finetuned_egyptian_hate_speech_detection_pipeline_ar_5.5.0_3.0_1726860461568.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marbertv2_finetuned_egyptian_hate_speech_detection_pipeline_ar_5.5.0_3.0_1726860461568.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marbertv2_finetuned_egyptian_hate_speech_detection_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marbertv2_finetuned_egyptian_hate_speech_detection_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marbertv2_finetuned_egyptian_hate_speech_detection_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|608.8 MB| + +## References + +https://huggingface.co/IbrahimAmin/marbertv2-finetuned-egyptian-hate-speech-detection + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-max_pruned_90_model_en.md b/docs/_posts/ahmedlone127/2024-09-20-max_pruned_90_model_en.md new file mode 100644 index 00000000000000..b2684a99e08730 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-max_pruned_90_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English max_pruned_90_model DistilBertForSequenceClassification from andygoh5 +author: John Snow Labs +name: max_pruned_90_model +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`max_pruned_90_model` is a English model originally trained by andygoh5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/max_pruned_90_model_en_5.5.0_3.0_1726792488415.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/max_pruned_90_model_en_5.5.0_3.0_1726792488415.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("max_pruned_90_model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("max_pruned_90_model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
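For low-latency inference on a handful of strings, the fitted pipeline can be wrapped in a `LightPipeline` instead of going through a DataFrame. A minimal sketch, assuming the `pipelineModel` from the snippet above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# annotate() accepts a single string or a list of strings and returns plain Python dicts
print(light.annotate("I love spark-nlp"))
```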
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|max_pruned_90_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/andygoh5/max-pruned-90-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-max_pruned_90_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-max_pruned_90_model_pipeline_en.md new file mode 100644 index 00000000000000..8f8f7de80c2d01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-max_pruned_90_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English max_pruned_90_model_pipeline pipeline DistilBertForSequenceClassification from andygoh5 +author: John Snow Labs +name: max_pruned_90_model_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`max_pruned_90_model_pipeline` is a English model originally trained by andygoh5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/max_pruned_90_model_pipeline_en_5.5.0_3.0_1726792501048.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/max_pruned_90_model_pipeline_en_5.5.0_3.0_1726792501048.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("max_pruned_90_model_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("max_pruned_90_model_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div><div class="h3-box" markdown="1">
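For quick, single-document inference the loaded pipeline can also be called directly on a string with `annotate`, which returns a plain Python dict keyed by the pipeline's output columns (here typically `document`, `token` and `class`). A minimal sketch, assuming the `pipeline` object loaded above:

```python
# Returns a dict of lists, one entry per output column
prediction = pipeline.annotate("I love spark-nlp")
print(prediction["class"])
```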
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|max_pruned_90_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/andygoh5/max-pruned-90-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-mental_classification_en.md b/docs/_posts/ahmedlone127/2024-09-20-mental_classification_en.md new file mode 100644 index 00000000000000..2a3e8dc5cc62c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-mental_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mental_classification RoBertaForSequenceClassification from Amalq +author: John Snow Labs +name: mental_classification +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mental_classification` is a English model originally trained by Amalq. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mental_classification_en_5.5.0_3.0_1726850338897.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mental_classification_en_5.5.0_3.0_1726850338897.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("mental_classification","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("mental_classification", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div><div class="h3-box" markdown="1">
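When scoring a handful of texts at low latency, the fitted model can be wrapped in a `LightPipeline`, which runs locally instead of launching a distributed job. A minimal sketch, assuming the `pipelineModel` fitted in the Python example above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# annotate() accepts a single string or a list of strings
print(light.annotate("I love spark-nlp"))
```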
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mental_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Amalq/mental_classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-mental_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-mental_classification_pipeline_en.md new file mode 100644 index 00000000000000..0f8223ff5c9e3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-mental_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mental_classification_pipeline pipeline RoBertaForSequenceClassification from Amalq +author: John Snow Labs +name: mental_classification_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mental_classification_pipeline` is a English model originally trained by Amalq. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mental_classification_pipeline_en_5.5.0_3.0_1726850403279.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mental_classification_pipeline_en_5.5.0_3.0_1726850403279.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("mental_classification_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("mental_classification_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mental_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Amalq/mental_classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-mixologydb_en.md b/docs/_posts/ahmedlone127/2024-09-20-mixologydb_en.md new file mode 100644 index 00000000000000..dc25419a4d1b3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-mixologydb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mixologydb DistilBertForSequenceClassification from mclemcrew +author: John Snow Labs +name: mixologydb +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mixologydb` is a English model originally trained by mclemcrew. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mixologydb_en_5.5.0_3.0_1726848544500.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mixologydb_en_5.5.0_3.0_1726848544500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("mixologydb","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("mixologydb", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div><div class="h3-box" markdown="1">
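Beyond the predicted label, the annotation metadata generally carries the per-label scores, which can be inspected by exploding the `class` column. A small sketch, continuing from the Python example above:

```python
from pyspark.sql.functions import explode

# One row per prediction, with the label and its metadata map (typically label -> score)
pipelineDF.select(explode("class").alias("prediction")) \
    .select("prediction.result", "prediction.metadata") \
    .show(truncate=False)
```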
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mixologydb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mclemcrew/MixologyDB \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-mixologydb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-mixologydb_pipeline_en.md new file mode 100644 index 00000000000000..8033216eb61433 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-mixologydb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mixologydb_pipeline pipeline DistilBertForSequenceClassification from mclemcrew +author: John Snow Labs +name: mixologydb_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mixologydb_pipeline` is a English model originally trained by mclemcrew. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mixologydb_pipeline_en_5.5.0_3.0_1726848558035.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mixologydb_pipeline_en_5.5.0_3.0_1726848558035.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("mixologydb_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("mixologydb_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mixologydb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mclemcrew/MixologyDB + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-mlm_nomsgadded_en.md b/docs/_posts/ahmedlone127/2024-09-20-mlm_nomsgadded_en.md new file mode 100644 index 00000000000000..05f06363335796 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-mlm_nomsgadded_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mlm_nomsgadded RoBertaEmbeddings from nomsgadded +author: John Snow Labs +name: mlm_nomsgadded +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mlm_nomsgadded` is a English model originally trained by nomsgadded. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mlm_nomsgadded_en_5.5.0_3.0_1726857649901.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mlm_nomsgadded_en_5.5.0_3.0_1726857649901.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("mlm_nomsgadded","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("mlm_nomsgadded","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div><div class="h3-box" markdown="1">
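If downstream stages expect plain Spark vectors rather than annotation structs, an `EmbeddingsFinisher` can be appended to the pipeline. A minimal sketch that reuses the stages defined in the Python example above:

```python
from sparknlp.base import EmbeddingsFinisher

finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

# One dense vector per token after exploding the finished column
pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, finisher])
result = pipeline.fit(data).transform(data)
result.selectExpr("explode(finished_embeddings) as vector").show(truncate=False)
```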
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mlm_nomsgadded| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/nomsgadded/mlm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-mlm_nomsgadded_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-mlm_nomsgadded_pipeline_en.md new file mode 100644 index 00000000000000..54b059e0aa1c73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-mlm_nomsgadded_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mlm_nomsgadded_pipeline pipeline RoBertaEmbeddings from nomsgadded +author: John Snow Labs +name: mlm_nomsgadded_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mlm_nomsgadded_pipeline` is a English model originally trained by nomsgadded. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mlm_nomsgadded_pipeline_en_5.5.0_3.0_1726857671742.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mlm_nomsgadded_pipeline_en_5.5.0_3.0_1726857671742.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("mlm_nomsgadded_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("mlm_nomsgadded_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mlm_nomsgadded_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/nomsgadded/mlm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-mlm_pretrain_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-mlm_pretrain_model_pipeline_en.md new file mode 100644 index 00000000000000..23804bfdc6dfe5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-mlm_pretrain_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mlm_pretrain_model_pipeline pipeline RoBertaEmbeddings from pavi156 +author: John Snow Labs +name: mlm_pretrain_model_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mlm_pretrain_model_pipeline` is a English model originally trained by pavi156. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mlm_pretrain_model_pipeline_en_5.5.0_3.0_1726816157274.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mlm_pretrain_model_pipeline_en_5.5.0_3.0_1726816157274.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("mlm_pretrain_model_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("mlm_pretrain_model_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div><div class="h3-box" markdown="1">
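For batch scoring, the annotated DataFrame can simply be persisted for downstream jobs like any other Spark output. A small sketch, assuming `annotations` from the Python example above; the output path is only illustrative:

```python
# Example path; point this at your own storage location
annotations.write.mode("overwrite").parquet("/tmp/mlm_pretrain_model_annotations.parquet")
```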
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mlm_pretrain_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.4 MB| + +## References + +https://huggingface.co/pavi156/mlm_pretrain_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-mlops_bd_0509_en.md b/docs/_posts/ahmedlone127/2024-09-20-mlops_bd_0509_en.md new file mode 100644 index 00000000000000..3125ba189162f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-mlops_bd_0509_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mlops_bd_0509 DistilBertForSequenceClassification from AliMokh +author: John Snow Labs +name: mlops_bd_0509 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mlops_bd_0509` is a English model originally trained by AliMokh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mlops_bd_0509_en_5.5.0_3.0_1726830007417.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mlops_bd_0509_en_5.5.0_3.0_1726830007417.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("mlops_bd_0509","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("mlops_bd_0509", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mlops_bd_0509| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AliMokh/MLOps_BD_0509 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-mlops_bd_0509_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-mlops_bd_0509_pipeline_en.md new file mode 100644 index 00000000000000..5b337d7935dd28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-mlops_bd_0509_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mlops_bd_0509_pipeline pipeline DistilBertForSequenceClassification from AliMokh +author: John Snow Labs +name: mlops_bd_0509_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mlops_bd_0509_pipeline` is a English model originally trained by AliMokh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mlops_bd_0509_pipeline_en_5.5.0_3.0_1726830020384.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mlops_bd_0509_pipeline_en_5.5.0_3.0_1726830020384.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("mlops_bd_0509_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("mlops_bd_0509_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mlops_bd_0509_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AliMokh/MLOps_BD_0509 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-model_3_en.md b/docs/_posts/ahmedlone127/2024-09-20-model_3_en.md new file mode 100644 index 00000000000000..86bcd63bda82a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-model_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_3 BertForSequenceClassification from cannotbolt +author: John Snow Labs +name: model_3 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_3` is a English model originally trained by cannotbolt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_3_en_5.5.0_3.0_1726870087492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_3_en_5.5.0_3.0_1726870087492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("model_3","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("model_3", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div><div class="h3-box" markdown="1">
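The label set this checkpoint predicts can be listed from the loaded annotator, which helps when mapping predictions back to application categories. A minimal sketch, assuming the `sequenceClassifier` loaded in the Python example above and that `getClasses()` is available, as in recent Spark NLP releases:

```python
# Lists the labels the classifier was fine-tuned with
print(sequenceClassifier.getClasses())
```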
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/cannotbolt/model_3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-model_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-model_3_pipeline_en.md new file mode 100644 index 00000000000000..019c3fd69462dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-model_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_3_pipeline pipeline BertForSequenceClassification from cannotbolt +author: John Snow Labs +name: model_3_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_3_pipeline` is a English model originally trained by cannotbolt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_3_pipeline_en_5.5.0_3.0_1726870106996.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_3_pipeline_en_5.5.0_3.0_1726870106996.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("model_3_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("model_3_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/cannotbolt/model_3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-model_jbinek_en.md b/docs/_posts/ahmedlone127/2024-09-20-model_jbinek_en.md new file mode 100644 index 00000000000000..f2b9ee3c15f21f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-model_jbinek_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_jbinek RoBertaForSequenceClassification from jbinek +author: John Snow Labs +name: model_jbinek +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_jbinek` is a English model originally trained by jbinek. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_jbinek_en_5.5.0_3.0_1726851940822.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_jbinek_en_5.5.0_3.0_1726851940822.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("model_jbinek","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("model_jbinek", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div><div class="h3-box" markdown="1">
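Once fitted, the whole pipeline can be saved and reloaded like any Spark ML model, so the pretrained download only happens once. A minimal sketch with an illustrative path, continuing from the Python example above:

```python
from pyspark.ml import PipelineModel

# Example path; use your own persistent storage location
pipelineModel.write().overwrite().save("/tmp/model_jbinek_pipeline")
restored = PipelineModel.load("/tmp/model_jbinek_pipeline")
restored.transform(data).select("class.result").show(truncate=False)
```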
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_jbinek| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|416.4 MB| + +## References + +https://huggingface.co/jbinek/model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-model_name_emma0123_en.md b/docs/_posts/ahmedlone127/2024-09-20-model_name_emma0123_en.md new file mode 100644 index 00000000000000..75dae81efd36f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-model_name_emma0123_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_name_emma0123 DistilBertForSequenceClassification from Emma0123 +author: John Snow Labs +name: model_name_emma0123 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_name_emma0123` is a English model originally trained by Emma0123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_name_emma0123_en_5.5.0_3.0_1726830087551.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_name_emma0123_en_5.5.0_3.0_1726830087551.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("model_name_emma0123","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("model_name_emma0123", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_name_emma0123| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Emma0123/model_name \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-model_name_emma0123_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-model_name_emma0123_pipeline_en.md new file mode 100644 index 00000000000000..f6ce6d8a31c913 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-model_name_emma0123_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_name_emma0123_pipeline pipeline DistilBertForSequenceClassification from Emma0123 +author: John Snow Labs +name: model_name_emma0123_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_name_emma0123_pipeline` is a English model originally trained by Emma0123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_name_emma0123_pipeline_en_5.5.0_3.0_1726830100796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_name_emma0123_pipeline_en_5.5.0_3.0_1726830100796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("model_name_emma0123_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("model_name_emma0123_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_name_emma0123_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Emma0123/model_name + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-modello_finetunato_en.md b/docs/_posts/ahmedlone127/2024-09-20-modello_finetunato_en.md new file mode 100644 index 00000000000000..b9dd1122acb701 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-modello_finetunato_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English modello_finetunato DistilBertForSequenceClassification from soniarocca31 +author: John Snow Labs +name: modello_finetunato +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`modello_finetunato` is a English model originally trained by soniarocca31. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/modello_finetunato_en_5.5.0_3.0_1726848805090.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/modello_finetunato_en_5.5.0_3.0_1726848805090.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("modello_finetunato","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("modello_finetunato", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|modello_finetunato| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/soniarocca31/modello_finetunato \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-modello_finetunato_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-modello_finetunato_pipeline_en.md new file mode 100644 index 00000000000000..3d4326fe76493e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-modello_finetunato_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English modello_finetunato_pipeline pipeline DistilBertForSequenceClassification from soniarocca31 +author: John Snow Labs +name: modello_finetunato_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`modello_finetunato_pipeline` is a English model originally trained by soniarocca31. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/modello_finetunato_pipeline_en_5.5.0_3.0_1726848836693.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/modello_finetunato_pipeline_en_5.5.0_3.0_1726848836693.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("modello_finetunato_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("modello_finetunato_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|modello_finetunato_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/soniarocca31/modello_finetunato + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-mongolian_xlm_roberta_base_ner_hrl_mn.md b/docs/_posts/ahmedlone127/2024-09-20-mongolian_xlm_roberta_base_ner_hrl_mn.md new file mode 100644 index 00000000000000..cc17f6184ef295 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-mongolian_xlm_roberta_base_ner_hrl_mn.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Mongolian mongolian_xlm_roberta_base_ner_hrl XlmRoBertaForTokenClassification from srglnjmb +author: John Snow Labs +name: mongolian_xlm_roberta_base_ner_hrl +date: 2024-09-20 +tags: [mn, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mongolian_xlm_roberta_base_ner_hrl` is a Mongolian model originally trained by srglnjmb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mongolian_xlm_roberta_base_ner_hrl_mn_5.5.0_3.0_1726843479456.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mongolian_xlm_roberta_base_ner_hrl_mn_5.5.0_3.0_1726843479456.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("mongolian_xlm_roberta_base_ner_hrl","mn") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("mongolian_xlm_roberta_base_ner_hrl", "mn")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div><div class="h3-box" markdown="1">
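To turn the token-level `ner` tags into full entity chunks (for example multi-token person or location names), a `NerConverter` can be appended to the pipeline. A minimal sketch that reuses the stages from the Python example above:

```python
from sparknlp.annotator import NerConverter

converter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("entities")

# Grouped entity chunks appear in the "entities" column
pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, converter])
result = pipeline.fit(data).transform(data)
result.select("entities.result").show(truncate=False)
```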
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mongolian_xlm_roberta_base_ner_hrl| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|mn| +|Size:|911.8 MB| + +## References + +https://huggingface.co/srglnjmb/Mongolian-xlm-roberta-base-ner-hrl \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-mongolian_xlm_roberta_base_ner_hrl_pipeline_mn.md b/docs/_posts/ahmedlone127/2024-09-20-mongolian_xlm_roberta_base_ner_hrl_pipeline_mn.md new file mode 100644 index 00000000000000..9508234fc8960b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-mongolian_xlm_roberta_base_ner_hrl_pipeline_mn.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Mongolian mongolian_xlm_roberta_base_ner_hrl_pipeline pipeline XlmRoBertaForTokenClassification from srglnjmb +author: John Snow Labs +name: mongolian_xlm_roberta_base_ner_hrl_pipeline +date: 2024-09-20 +tags: [mn, open_source, pipeline, onnx] +task: Named Entity Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mongolian_xlm_roberta_base_ner_hrl_pipeline` is a Mongolian model originally trained by srglnjmb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mongolian_xlm_roberta_base_ner_hrl_pipeline_mn_5.5.0_3.0_1726843541770.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mongolian_xlm_roberta_base_ner_hrl_pipeline_mn_5.5.0_3.0_1726843541770.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("mongolian_xlm_roberta_base_ner_hrl_pipeline", lang = "mn")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("mongolian_xlm_roberta_base_ner_hrl_pipeline", lang = "mn")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mongolian_xlm_roberta_base_ner_hrl_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mn| +|Size:|911.8 MB| + +## References + +https://huggingface.co/srglnjmb/Mongolian-xlm-roberta-base-ner-hrl + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-movie_genre_multi_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-movie_genre_multi_classification_pipeline_en.md new file mode 100644 index 00000000000000..b1204ebdae3b41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-movie_genre_multi_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English movie_genre_multi_classification_pipeline pipeline DistilBertForSequenceClassification from handler-bird +author: John Snow Labs +name: movie_genre_multi_classification_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`movie_genre_multi_classification_pipeline` is a English model originally trained by handler-bird. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/movie_genre_multi_classification_pipeline_en_5.5.0_3.0_1726809633901.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/movie_genre_multi_classification_pipeline_en_5.5.0_3.0_1726809633901.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("movie_genre_multi_classification_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("movie_genre_multi_classification_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|movie_genre_multi_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/handler-bird/movie_genre_multi_classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-mtwitter_roberta_base_model_reviewingcls_r1_en.md b/docs/_posts/ahmedlone127/2024-09-20-mtwitter_roberta_base_model_reviewingcls_r1_en.md new file mode 100644 index 00000000000000..4e34005a7082b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-mtwitter_roberta_base_model_reviewingcls_r1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mtwitter_roberta_base_model_reviewingcls_r1 RoBertaForSequenceClassification from Yeerchiu +author: John Snow Labs +name: mtwitter_roberta_base_model_reviewingcls_r1 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mtwitter_roberta_base_model_reviewingcls_r1` is a English model originally trained by Yeerchiu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mtwitter_roberta_base_model_reviewingcls_r1_en_5.5.0_3.0_1726852163081.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mtwitter_roberta_base_model_reviewingcls_r1_en_5.5.0_3.0_1726852163081.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("mtwitter_roberta_base_model_reviewingcls_r1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("mtwitter_roberta_base_model_reviewingcls_r1", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mtwitter_roberta_base_model_reviewingcls_r1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/Yeerchiu/mtwitter-roberta-base-model-reviewingcls-r1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-mtwitter_roberta_base_model_reviewingcls_r1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-mtwitter_roberta_base_model_reviewingcls_r1_pipeline_en.md new file mode 100644 index 00000000000000..294ee2e5b7512c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-mtwitter_roberta_base_model_reviewingcls_r1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mtwitter_roberta_base_model_reviewingcls_r1_pipeline pipeline RoBertaForSequenceClassification from Yeerchiu +author: John Snow Labs +name: mtwitter_roberta_base_model_reviewingcls_r1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mtwitter_roberta_base_model_reviewingcls_r1_pipeline` is a English model originally trained by Yeerchiu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mtwitter_roberta_base_model_reviewingcls_r1_pipeline_en_5.5.0_3.0_1726852186095.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mtwitter_roberta_base_model_reviewingcls_r1_pipeline_en_5.5.0_3.0_1726852186095.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("mtwitter_roberta_base_model_reviewingcls_r1_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("mtwitter_roberta_base_model_reviewingcls_r1_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div><div class="h3-box" markdown="1">
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mtwitter_roberta_base_model_reviewingcls_r1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/Yeerchiu/mtwitter-roberta-base-model-reviewingcls-r1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-n_distilbert_twitterfin_padding10model_wyzhw_en.md b/docs/_posts/ahmedlone127/2024-09-20-n_distilbert_twitterfin_padding10model_wyzhw_en.md new file mode 100644 index 00000000000000..43d2716b8496c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-n_distilbert_twitterfin_padding10model_wyzhw_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_distilbert_twitterfin_padding10model_wyzhw DistilBertForSequenceClassification from wyzhw +author: John Snow Labs +name: n_distilbert_twitterfin_padding10model_wyzhw +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_twitterfin_padding10model_wyzhw` is a English model originally trained by wyzhw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding10model_wyzhw_en_5.5.0_3.0_1726860761682.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding10model_wyzhw_en_5.5.0_3.0_1726860761682.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_twitterfin_padding10model_wyzhw", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_twitterfin_padding10model_wyzhw", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
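Once the snippet above has run, the predictions live in the `class` column of `pipelineDF` as Spark NLP annotations. A small sketch for reading them back out, assuming the column names used above:

```python
# Predicted label(s) next to the input text
pipelineDF.select("text", "class.result").show(truncate=False)

# The annotation metadata carries the per-class confidence scores
pipelineDF.selectExpr("explode(class) as prediction") \
    .selectExpr("prediction.result as label", "prediction.metadata as scores") \
    .show(truncate=False)
```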
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_twitterfin_padding10model_wyzhw| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/wyzhw/N_distilbert_twitterfin_padding10model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-n_distilbert_twitterfin_padding10model_wyzhw_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-n_distilbert_twitterfin_padding10model_wyzhw_pipeline_en.md new file mode 100644 index 00000000000000..d1c47d9aa9e4cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-n_distilbert_twitterfin_padding10model_wyzhw_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_distilbert_twitterfin_padding10model_wyzhw_pipeline pipeline DistilBertForSequenceClassification from wyzhw +author: John Snow Labs +name: n_distilbert_twitterfin_padding10model_wyzhw_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_twitterfin_padding10model_wyzhw_pipeline` is a English model originally trained by wyzhw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding10model_wyzhw_pipeline_en_5.5.0_3.0_1726860773633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding10model_wyzhw_pipeline_en_5.5.0_3.0_1726860773633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("n_distilbert_twitterfin_padding10model_wyzhw_pipeline", lang = "en")

# any Spark DataFrame with a "text" column can be scored
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("n_distilbert_twitterfin_padding10model_wyzhw_pipeline", lang = "en")

// any DataFrame with a "text" column can be scored
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
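When the confidence scores are needed as well, `fullAnnotate` keeps the complete annotation objects. A sketch under the assumption that this pipeline's classifier writes to a `class` column:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("n_distilbert_twitterfin_padding10model_wyzhw_pipeline", lang = "en")

# fullAnnotate() returns Annotation objects, so metadata (e.g. class scores) is preserved
annotation = pipeline.fullAnnotate("Stocks slipped after the earnings call")[0]
for prediction in annotation["class"]:
    print(prediction.result, prediction.metadata)
```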
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_twitterfin_padding10model_wyzhw_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/wyzhw/N_distilbert_twitterfin_padding10model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ndd_claroline_test_content_tags_en.md b/docs/_posts/ahmedlone127/2024-09-20-ndd_claroline_test_content_tags_en.md new file mode 100644 index 00000000000000..c69dada0c9df41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ndd_claroline_test_content_tags_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ndd_claroline_test_content_tags DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: ndd_claroline_test_content_tags +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ndd_claroline_test_content_tags` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ndd_claroline_test_content_tags_en_5.5.0_3.0_1726861207724.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ndd_claroline_test_content_tags_en_5.5.0_3.0_1726861207724.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("ndd_claroline_test_content_tags", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("ndd_claroline_test_content_tags", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
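The card does not list the label set, but it can be read from the loaded annotator itself. A minimal sketch using the standard `getClasses()` accessor:

```python
from sparknlp.annotator import DistilBertForSequenceClassification

classifier = DistilBertForSequenceClassification.pretrained("ndd_claroline_test_content_tags", "en")
# Labels the checkpoint was exported with
print(classifier.getClasses())
```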
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ndd_claroline_test_content_tags| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/NDD-claroline_test-content_tags \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ndd_claroline_test_content_tags_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-ndd_claroline_test_content_tags_pipeline_en.md new file mode 100644 index 00000000000000..706b27f53ff22f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ndd_claroline_test_content_tags_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ndd_claroline_test_content_tags_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: ndd_claroline_test_content_tags_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ndd_claroline_test_content_tags_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ndd_claroline_test_content_tags_pipeline_en_5.5.0_3.0_1726861219982.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ndd_claroline_test_content_tags_pipeline_en_5.5.0_3.0_1726861219982.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("ndd_claroline_test_content_tags_pipeline", lang = "en")

# any Spark DataFrame with a "text" column can be scored
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("ndd_claroline_test_content_tags_pipeline", lang = "en")

// any DataFrame with a "text" column can be scored
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ndd_claroline_test_content_tags_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/NDD-claroline_test-content_tags + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ndd_mantisbt_test_tags_en.md b/docs/_posts/ahmedlone127/2024-09-20-ndd_mantisbt_test_tags_en.md new file mode 100644 index 00000000000000..1ba58b844cc9da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ndd_mantisbt_test_tags_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ndd_mantisbt_test_tags DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: ndd_mantisbt_test_tags +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ndd_mantisbt_test_tags` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ndd_mantisbt_test_tags_en_5.5.0_3.0_1726871306752.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ndd_mantisbt_test_tags_en_5.5.0_3.0_1726871306752.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("ndd_mantisbt_test_tags", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("ndd_mantisbt_test_tags", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
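The fitted `pipelineModel` from the snippet above is a regular Spark ML `PipelineModel`, so it can be saved once and reloaded by downstream scoring jobs. The path below is only a placeholder:

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline and reload it without re-fitting
pipelineModel.write().overwrite().save("/tmp/ndd_mantisbt_test_tags_model")
reloaded = PipelineModel.load("/tmp/ndd_mantisbt_test_tags_model")
reloaded.transform(data).select("class.result").show(truncate=False)
```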
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ndd_mantisbt_test_tags| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/NDD-mantisbt_test-tags \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ndd_mantisbt_test_tags_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-ndd_mantisbt_test_tags_pipeline_en.md new file mode 100644 index 00000000000000..15336e07cc8829 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ndd_mantisbt_test_tags_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ndd_mantisbt_test_tags_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: ndd_mantisbt_test_tags_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ndd_mantisbt_test_tags_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ndd_mantisbt_test_tags_pipeline_en_5.5.0_3.0_1726871320100.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ndd_mantisbt_test_tags_pipeline_en_5.5.0_3.0_1726871320100.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("ndd_mantisbt_test_tags_pipeline", lang = "en")

# any Spark DataFrame with a "text" column can be scored
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("ndd_mantisbt_test_tags_pipeline", lang = "en")

// any DataFrame with a "text" column can be scored
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ndd_mantisbt_test_tags_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/NDD-mantisbt_test-tags + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ndd_petclinic_test_content_tags_en.md b/docs/_posts/ahmedlone127/2024-09-20-ndd_petclinic_test_content_tags_en.md new file mode 100644 index 00000000000000..96d133a0df72ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ndd_petclinic_test_content_tags_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ndd_petclinic_test_content_tags DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: ndd_petclinic_test_content_tags +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ndd_petclinic_test_content_tags` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ndd_petclinic_test_content_tags_en_5.5.0_3.0_1726829907645.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ndd_petclinic_test_content_tags_en_5.5.0_3.0_1726829907645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("ndd_petclinic_test_content_tags", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("ndd_petclinic_test_content_tags", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
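Scoring a batch only requires more rows in the input DataFrame; the fitted `pipelineModel` from above is reused as-is (the sample sentences below are made up for illustration):

```python
batch = spark.createDataFrame([
    ["The vets page renders as expected"],
    ["Submitting the owner form raises an error"],
]).toDF("text")

pipelineModel.transform(batch).select("text", "class.result").show(truncate=False)
```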
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ndd_petclinic_test_content_tags| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/NDD-petclinic_test-content_tags \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ndd_petclinic_test_content_tags_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-ndd_petclinic_test_content_tags_pipeline_en.md new file mode 100644 index 00000000000000..5d69dd3e4c3cf4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ndd_petclinic_test_content_tags_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ndd_petclinic_test_content_tags_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: ndd_petclinic_test_content_tags_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ndd_petclinic_test_content_tags_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ndd_petclinic_test_content_tags_pipeline_en_5.5.0_3.0_1726829919462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ndd_petclinic_test_content_tags_pipeline_en_5.5.0_3.0_1726829919462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("ndd_petclinic_test_content_tags_pipeline", lang = "en")

# any Spark DataFrame with a "text" column can be scored
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("ndd_petclinic_test_content_tags_pipeline", lang = "en")

// any DataFrame with a "text" column can be scored
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ndd_petclinic_test_content_tags_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/NDD-petclinic_test-content_tags + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-nepal_bhasa_dummy_model_thewitcher_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-nepal_bhasa_dummy_model_thewitcher_pipeline_en.md new file mode 100644 index 00000000000000..efca09755930a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-nepal_bhasa_dummy_model_thewitcher_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nepal_bhasa_dummy_model_thewitcher_pipeline pipeline DistilBertForSequenceClassification from theWitcher +author: John Snow Labs +name: nepal_bhasa_dummy_model_thewitcher_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_dummy_model_thewitcher_pipeline` is a English model originally trained by theWitcher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_dummy_model_thewitcher_pipeline_en_5.5.0_3.0_1726809197972.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_dummy_model_thewitcher_pipeline_en_5.5.0_3.0_1726809197972.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("nepal_bhasa_dummy_model_thewitcher_pipeline", lang = "en")

# any Spark DataFrame with a "text" column can be scored
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("nepal_bhasa_dummy_model_thewitcher_pipeline", lang = "en")

// any DataFrame with a "text" column can be scored
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
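The `annotations` DataFrame produced above holds one array of annotations per output column. Assuming this pipeline's classifier writes to a `class` column (not stated on the card), the labels can be read back like this:

```python
annotations.select("text", "class.result").show(truncate=False)
```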
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_dummy_model_thewitcher_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/theWitcher/new-dummy-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ner_ner_random0_seed2_roberta_large_en.md b/docs/_posts/ahmedlone127/2024-09-20-ner_ner_random0_seed2_roberta_large_en.md new file mode 100644 index 00000000000000..25d336c7aaabf8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ner_ner_random0_seed2_roberta_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_ner_random0_seed2_roberta_large RoBertaForTokenClassification from tweettemposhift +author: John Snow Labs +name: ner_ner_random0_seed2_roberta_large +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_ner_random0_seed2_roberta_large` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_ner_random0_seed2_roberta_large_en_5.5.0_3.0_1726847326962.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_ner_random0_seed2_roberta_large_en_5.5.0_3.0_1726847326962.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("ner_ner_random0_seed2_roberta_large", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("ner_ner_random0_seed2_roberta_large", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
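The token classifier emits one tag per token in the `ner` column. If the tags follow the usual IOB/IOB2 scheme (an assumption, since the card does not list them), a `NerConverter` can be appended to the stages above to group tokens into entity chunks:

```python
from sparknlp.annotator import NerConverter

converter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

# Reuse documentAssembler, tokenizer, tokenClassifier and data from the snippet above
chunk_pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, converter])
chunk_pipeline.fit(data).transform(data).select("ner_chunk.result").show(truncate=False)
```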
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_ner_random0_seed2_roberta_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/ner-ner_random0_seed2-roberta-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ner_ner_random0_seed2_roberta_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-ner_ner_random0_seed2_roberta_large_pipeline_en.md new file mode 100644 index 00000000000000..539e551b1e7e4d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ner_ner_random0_seed2_roberta_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ner_ner_random0_seed2_roberta_large_pipeline pipeline RoBertaForTokenClassification from tweettemposhift +author: John Snow Labs +name: ner_ner_random0_seed2_roberta_large_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_ner_random0_seed2_roberta_large_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_ner_random0_seed2_roberta_large_pipeline_en_5.5.0_3.0_1726847404591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_ner_random0_seed2_roberta_large_pipeline_en_5.5.0_3.0_1726847404591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("ner_ner_random0_seed2_roberta_large_pipeline", lang = "en")

# any Spark DataFrame with a "text" column can be scored
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("ner_ner_random0_seed2_roberta_large_pipeline", lang = "en")

// any DataFrame with a "text" column can be scored
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_ner_random0_seed2_roberta_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/ner-ner_random0_seed2-roberta-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ner_ner_random2_seed1_twitter_roberta_large_2022_154m_en.md b/docs/_posts/ahmedlone127/2024-09-20-ner_ner_random2_seed1_twitter_roberta_large_2022_154m_en.md new file mode 100644 index 00000000000000..106207976c5f4a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ner_ner_random2_seed1_twitter_roberta_large_2022_154m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_ner_random2_seed1_twitter_roberta_large_2022_154m RoBertaForTokenClassification from tweettemposhift +author: John Snow Labs +name: ner_ner_random2_seed1_twitter_roberta_large_2022_154m +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_ner_random2_seed1_twitter_roberta_large_2022_154m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_ner_random2_seed1_twitter_roberta_large_2022_154m_en_5.5.0_3.0_1726853704141.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_ner_random2_seed1_twitter_roberta_large_2022_154m_en_5.5.0_3.0_1726853704141.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("ner_ner_random2_seed1_twitter_roberta_large_2022_154m", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("ner_ner_random2_seed1_twitter_roberta_large_2022_154m", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
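To eyeball the output of the snippet above, the tokens and their predicted tags can be shown side by side (column names follow the pipeline defined above):

```python
from pyspark.sql import functions as F

pipelineDF.select(
    F.col("token.result").alias("tokens"),
    F.col("ner.result").alias("ner_tags")
).show(truncate=False)
```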
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_ner_random2_seed1_twitter_roberta_large_2022_154m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/ner-ner_random2_seed1-twitter-roberta-large-2022-154m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ner_ner_random2_seed1_twitter_roberta_large_2022_154m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-ner_ner_random2_seed1_twitter_roberta_large_2022_154m_pipeline_en.md new file mode 100644 index 00000000000000..78d04bb0842c28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ner_ner_random2_seed1_twitter_roberta_large_2022_154m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ner_ner_random2_seed1_twitter_roberta_large_2022_154m_pipeline pipeline RoBertaForTokenClassification from tweettemposhift +author: John Snow Labs +name: ner_ner_random2_seed1_twitter_roberta_large_2022_154m_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_ner_random2_seed1_twitter_roberta_large_2022_154m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_ner_random2_seed1_twitter_roberta_large_2022_154m_pipeline_en_5.5.0_3.0_1726853767398.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_ner_random2_seed1_twitter_roberta_large_2022_154m_pipeline_en_5.5.0_3.0_1726853767398.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("ner_ner_random2_seed1_twitter_roberta_large_2022_154m_pipeline", lang = "en")

# any Spark DataFrame with a "text" column can be scored
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("ner_ner_random2_seed1_twitter_roberta_large_2022_154m_pipeline", lang = "en")

// any DataFrame with a "text" column can be scored
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_ner_random2_seed1_twitter_roberta_large_2022_154m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/ner-ner_random2_seed1-twitter-roberta-large-2022-154m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-nerd_nerd_random1_seed0_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-20-nerd_nerd_random1_seed0_bernice_en.md new file mode 100644 index 00000000000000..f8333d7bc319d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-nerd_nerd_random1_seed0_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nerd_nerd_random1_seed0_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: nerd_nerd_random1_seed0_bernice +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerd_nerd_random1_seed0_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerd_nerd_random1_seed0_bernice_en_5.5.0_3.0_1726846145859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerd_nerd_random1_seed0_bernice_en_5.5.0_3.0_1726846145859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("nerd_nerd_random1_seed0_bernice", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("nerd_nerd_random1_seed0_bernice", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
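For one-off strings, wrapping the fitted model from above in a `LightPipeline` avoids the overhead of creating a DataFrame and running a Spark job:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# annotate() returns a dict keyed by output column name
print(light.annotate("I love spark-nlp")["class"])
```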
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerd_nerd_random1_seed0_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|831.7 MB| + +## References + +https://huggingface.co/tweettemposhift/nerd-nerd_random1_seed0-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-nerd_nerd_random1_seed0_bernice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-nerd_nerd_random1_seed0_bernice_pipeline_en.md new file mode 100644 index 00000000000000..29aaaa3f7fae50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-nerd_nerd_random1_seed0_bernice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nerd_nerd_random1_seed0_bernice_pipeline pipeline XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: nerd_nerd_random1_seed0_bernice_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerd_nerd_random1_seed0_bernice_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerd_nerd_random1_seed0_bernice_pipeline_en_5.5.0_3.0_1726846271018.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerd_nerd_random1_seed0_bernice_pipeline_en_5.5.0_3.0_1726846271018.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("nerd_nerd_random1_seed0_bernice_pipeline", lang = "en")

# any Spark DataFrame with a "text" column can be scored
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("nerd_nerd_random1_seed0_bernice_pipeline", lang = "en")

// any DataFrame with a "text" column can be scored
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerd_nerd_random1_seed0_bernice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|831.7 MB| + +## References + +https://huggingface.co/tweettemposhift/nerd-nerd_random1_seed0-bernice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-netuid1_classification_en.md b/docs/_posts/ahmedlone127/2024-09-20-netuid1_classification_en.md new file mode 100644 index 00000000000000..d26fc616882e1a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-netuid1_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English netuid1_classification DistilBertForSequenceClassification from 0x9 +author: John Snow Labs +name: netuid1_classification +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`netuid1_classification` is a English model originally trained by 0x9. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/netuid1_classification_en_5.5.0_3.0_1726848550707.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/netuid1_classification_en_5.5.0_3.0_1726848550707.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("netuid1_classification", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("netuid1_classification", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|netuid1_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/0x9/netuid1-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-netuid1_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-netuid1_classification_pipeline_en.md new file mode 100644 index 00000000000000..02e0a475b0b205 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-netuid1_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English netuid1_classification_pipeline pipeline DistilBertForSequenceClassification from 0x9 +author: John Snow Labs +name: netuid1_classification_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`netuid1_classification_pipeline` is a English model originally trained by 0x9. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/netuid1_classification_pipeline_en_5.5.0_3.0_1726848562650.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/netuid1_classification_pipeline_en_5.5.0_3.0_1726848562650.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("netuid1_classification_pipeline", lang = "en")

# any Spark DataFrame with a "text" column can be scored
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("netuid1_classification_pipeline", lang = "en")

// any DataFrame with a "text" column can be scored
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|netuid1_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/0x9/netuid1-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-news_classifier_pipeline_ru.md b/docs/_posts/ahmedlone127/2024-09-20-news_classifier_pipeline_ru.md new file mode 100644 index 00000000000000..073ced5e129f0f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-news_classifier_pipeline_ru.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Russian news_classifier_pipeline pipeline BertForSequenceClassification from MikhailRepkin +author: John Snow Labs +name: news_classifier_pipeline +date: 2024-09-20 +tags: [ru, open_source, pipeline, onnx] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`news_classifier_pipeline` is a Russian model originally trained by MikhailRepkin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/news_classifier_pipeline_ru_5.5.0_3.0_1726870045582.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/news_classifier_pipeline_ru_5.5.0_3.0_1726870045582.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("news_classifier_pipeline", lang = "ru")

# any Spark DataFrame with a "text" column can be scored
df = spark.createDataFrame([["Это тестовый текст"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("news_classifier_pipeline", lang = "ru")

// any DataFrame with a "text" column can be scored
val df = Seq("Это тестовый текст").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|news_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ru| +|Size:|666.6 MB| + +## References + +https://huggingface.co/MikhailRepkin/news_classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-news_classifier_ru.md b/docs/_posts/ahmedlone127/2024-09-20-news_classifier_ru.md new file mode 100644 index 00000000000000..34e9f4a477390a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-news_classifier_ru.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Russian news_classifier BertForSequenceClassification from MikhailRepkin +author: John Snow Labs +name: news_classifier +date: 2024-09-20 +tags: [ru, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`news_classifier` is a Russian model originally trained by MikhailRepkin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/news_classifier_ru_5.5.0_3.0_1726870014135.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/news_classifier_ru_5.5.0_3.0_1726870014135.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("news_classifier", "ru") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("news_classifier", "ru")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
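Since the card lists Russian as the model language, realistic inputs should be Russian text. A sketch reusing the fitted `pipelineModel` from above (the sample sentence is made up):

```python
ru_data = spark.createDataFrame([["Сборная выиграла матч со счётом 2:1"]]).toDF("text")
pipelineModel.transform(ru_data).select("text", "class.result").show(truncate=False)
```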
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|news_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|ru| +|Size:|666.5 MB| + +## References + +https://huggingface.co/MikhailRepkin/news_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-nlp2_base_1e_4_en.md b/docs/_posts/ahmedlone127/2024-09-20-nlp2_base_1e_4_en.md new file mode 100644 index 00000000000000..82ddd84053ecb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-nlp2_base_1e_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nlp2_base_1e_4 DistilBertForSequenceClassification from NathanJLee +author: John Snow Labs +name: nlp2_base_1e_4 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp2_base_1e_4` is a English model originally trained by NathanJLee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp2_base_1e_4_en_5.5.0_3.0_1726840996249.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp2_base_1e_4_en_5.5.0_3.0_1726840996249.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp2_base_1e_4", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp2_base_1e_4", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp2_base_1e_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/NathanJLee/NLP2_Base_1e-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-nlp2_base_1e_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-nlp2_base_1e_4_pipeline_en.md new file mode 100644 index 00000000000000..4230cfce3a2939 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-nlp2_base_1e_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nlp2_base_1e_4_pipeline pipeline DistilBertForSequenceClassification from NathanJLee +author: John Snow Labs +name: nlp2_base_1e_4_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp2_base_1e_4_pipeline` is a English model originally trained by NathanJLee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp2_base_1e_4_pipeline_en_5.5.0_3.0_1726841012862.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp2_base_1e_4_pipeline_en_5.5.0_3.0_1726841012862.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("nlp2_base_1e_4_pipeline", lang = "en")

# any Spark DataFrame with a "text" column can be scored
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("nlp2_base_1e_4_pipeline", lang = "en")

// any DataFrame with a "text" column can be scored
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp2_base_1e_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/NathanJLee/NLP2_Base_1e-4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-nlp2_base_3e_4_nathanjlee_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-nlp2_base_3e_4_nathanjlee_pipeline_en.md new file mode 100644 index 00000000000000..9a2ecfd378ac1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-nlp2_base_3e_4_nathanjlee_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nlp2_base_3e_4_nathanjlee_pipeline pipeline DistilBertForSequenceClassification from NathanJLee +author: John Snow Labs +name: nlp2_base_3e_4_nathanjlee_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp2_base_3e_4_nathanjlee_pipeline` is a English model originally trained by NathanJLee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp2_base_3e_4_nathanjlee_pipeline_en_5.5.0_3.0_1726849212811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp2_base_3e_4_nathanjlee_pipeline_en_5.5.0_3.0_1726849212811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column to classify.
pipeline = PretrainedPipeline("nlp2_base_3e_4_nathanjlee_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column to classify.
val pipeline = new PretrainedPipeline("nlp2_base_3e_4_nathanjlee_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp2_base_3e_4_nathanjlee_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/NathanJLee/NLP2_Base_3e-4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-nlp2_base_5e_5_en.md b/docs/_posts/ahmedlone127/2024-09-20-nlp2_base_5e_5_en.md new file mode 100644 index 00000000000000..9d33383ecf674c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-nlp2_base_5e_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nlp2_base_5e_5 DistilBertForSequenceClassification from VRT-2428211 +author: John Snow Labs +name: nlp2_base_5e_5 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp2_base_5e_5` is a English model originally trained by VRT-2428211. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp2_base_5e_5_en_5.5.0_3.0_1726830326773.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp2_base_5e_5_en_5.5.0_3.0_1726830326773.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text, tokenize, then classify with the pretrained model.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp2_base_5e_5","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp2_base_5e_5", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
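
For quick, driver-local checks it can be convenient to wrap the fitted `pipelineModel` in a `LightPipeline` instead of building a DataFrame for every input. A minimal sketch, with an arbitrary example sentence:

```python
from sparknlp.base import LightPipeline

# Annotate a single string without creating a Spark DataFrame.
light = LightPipeline(pipelineModel)
print(light.annotate("I love spark-nlp")["class"])
```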
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp2_base_5e_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/VRT-2428211/NLP2_Base_5e-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-nlp2_base_5e_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-nlp2_base_5e_5_pipeline_en.md new file mode 100644 index 00000000000000..c42c4c0ddc25bf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-nlp2_base_5e_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nlp2_base_5e_5_pipeline pipeline DistilBertForSequenceClassification from VRT-2428211 +author: John Snow Labs +name: nlp2_base_5e_5_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp2_base_5e_5_pipeline` is a English model originally trained by VRT-2428211. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp2_base_5e_5_pipeline_en_5.5.0_3.0_1726830339462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp2_base_5e_5_pipeline_en_5.5.0_3.0_1726830339462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column to classify.
pipeline = PretrainedPipeline("nlp2_base_5e_5_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column to classify.
val pipeline = new PretrainedPipeline("nlp2_base_5e_5_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp2_base_5e_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/VRT-2428211/NLP2_Base_5e-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-nlp_cw_en.md b/docs/_posts/ahmedlone127/2024-09-20-nlp_cw_en.md new file mode 100644 index 00000000000000..10e1fdb3146578 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-nlp_cw_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nlp_cw RoBertaForTokenClassification from venkateshtata +author: John Snow Labs +name: nlp_cw +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_cw` is a English model originally trained by venkateshtata. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_cw_en_5.5.0_3.0_1726847615408.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_cw_en_5.5.0_3.0_1726847615408.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text, tokenize, then tag each token with the pretrained model.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("nlp_cw","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("nlp_cw", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
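
The token classifier emits one tag per token, so the `token` and `ner` arrays in `pipelineDF` are aligned element for element. A small sketch for inspecting them side by side:

```python
# Tokens and their predicted tags; both arrays have the same length and order.
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```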
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_cw| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|425.7 MB| + +## References + +https://huggingface.co/venkateshtata/nlp_cw \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-nlp_cw_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-nlp_cw_pipeline_en.md new file mode 100644 index 00000000000000..fb82cba494b5c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-nlp_cw_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nlp_cw_pipeline pipeline RoBertaForTokenClassification from venkateshtata +author: John Snow Labs +name: nlp_cw_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_cw_pipeline` is a English model originally trained by venkateshtata. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_cw_pipeline_en_5.5.0_3.0_1726847654380.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_cw_pipeline_en_5.5.0_3.0_1726847654380.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column to tag.
pipeline = PretrainedPipeline("nlp_cw_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column to tag.
val pipeline = new PretrainedPipeline("nlp_cw_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_cw_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|425.8 MB| + +## References + +https://huggingface.co/venkateshtata/nlp_cw + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-nlp_hf_workshop_farzanrahmani_en.md b/docs/_posts/ahmedlone127/2024-09-20-nlp_hf_workshop_farzanrahmani_en.md new file mode 100644 index 00000000000000..6f198ccc7da225 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-nlp_hf_workshop_farzanrahmani_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nlp_hf_workshop_farzanrahmani DistilBertForSequenceClassification from farzanrahmani +author: John Snow Labs +name: nlp_hf_workshop_farzanrahmani +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_hf_workshop_farzanrahmani` is a English model originally trained by farzanrahmani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop_farzanrahmani_en_5.5.0_3.0_1726861121852.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop_farzanrahmani_en_5.5.0_3.0_1726861121852.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text, tokenize, then classify with the pretrained model.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp_hf_workshop_farzanrahmani","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp_hf_workshop_farzanrahmani", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_hf_workshop_farzanrahmani| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/farzanrahmani/NLP_HF_Workshop \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-nlp_hf_workshop_farzanrahmani_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-nlp_hf_workshop_farzanrahmani_pipeline_en.md new file mode 100644 index 00000000000000..2ac9123891ae5c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-nlp_hf_workshop_farzanrahmani_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nlp_hf_workshop_farzanrahmani_pipeline pipeline DistilBertForSequenceClassification from farzanrahmani +author: John Snow Labs +name: nlp_hf_workshop_farzanrahmani_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_hf_workshop_farzanrahmani_pipeline` is a English model originally trained by farzanrahmani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop_farzanrahmani_pipeline_en_5.5.0_3.0_1726861134679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop_farzanrahmani_pipeline_en_5.5.0_3.0_1726861134679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column to classify.
pipeline = PretrainedPipeline("nlp_hf_workshop_farzanrahmani_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column to classify.
val pipeline = new PretrainedPipeline("nlp_hf_workshop_farzanrahmani_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_hf_workshop_farzanrahmani_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/farzanrahmani/NLP_HF_Workshop + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-nlp_model_en.md b/docs/_posts/ahmedlone127/2024-09-20-nlp_model_en.md new file mode 100644 index 00000000000000..20ce2a9006949e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-nlp_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nlp_model RoBertaForSequenceClassification from juniorencode +author: John Snow Labs +name: nlp_model +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_model` is a English model originally trained by juniorencode. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_model_en_5.5.0_3.0_1726850124287.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_model_en_5.5.0_3.0_1726850124287.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text, tokenize, then classify with the pretrained model.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("nlp_model","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("nlp_model", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
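
Besides the predicted label, each classification annotation carries per-label scores in its `metadata` map. A small sketch, assuming `pipelineDF` from the example above (the exact metadata keys depend on the model's label names):

```python
# "result" holds the winning label; "metadata" holds the per-label confidence scores.
pipelineDF.select("class.result", "class.metadata").show(truncate=False)
```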
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/juniorencode/nlp_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-nlp_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-nlp_model_pipeline_en.md new file mode 100644 index 00000000000000..27cc5e7cb9e4ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-nlp_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nlp_model_pipeline pipeline RoBertaForSequenceClassification from juniorencode +author: John Snow Labs +name: nlp_model_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_model_pipeline` is a English model originally trained by juniorencode. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_model_pipeline_en_5.5.0_3.0_1726850143400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_model_pipeline_en_5.5.0_3.0_1726850143400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column to classify.
pipeline = PretrainedPipeline("nlp_model_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column to classify.
val pipeline = new PretrainedPipeline("nlp_model_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/juniorencode/nlp_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-nlp_project_ajchang6_en.md b/docs/_posts/ahmedlone127/2024-09-20-nlp_project_ajchang6_en.md new file mode 100644 index 00000000000000..d8e884005178c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-nlp_project_ajchang6_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nlp_project_ajchang6 DistilBertForSequenceClassification from ajchang6 +author: John Snow Labs +name: nlp_project_ajchang6 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_project_ajchang6` is a English model originally trained by ajchang6. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_project_ajchang6_en_5.5.0_3.0_1726832782547.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_project_ajchang6_en_5.5.0_3.0_1726832782547.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text, tokenize, then classify with the pretrained model.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp_project_ajchang6","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp_project_ajchang6", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_project_ajchang6| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ajchang6/nlp_project \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-nlp_project_ajchang6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-nlp_project_ajchang6_pipeline_en.md new file mode 100644 index 00000000000000..4c969eb378eb6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-nlp_project_ajchang6_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nlp_project_ajchang6_pipeline pipeline DistilBertForSequenceClassification from ajchang6 +author: John Snow Labs +name: nlp_project_ajchang6_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_project_ajchang6_pipeline` is a English model originally trained by ajchang6. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_project_ajchang6_pipeline_en_5.5.0_3.0_1726832795119.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_project_ajchang6_pipeline_en_5.5.0_3.0_1726832795119.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column to classify.
pipeline = PretrainedPipeline("nlp_project_ajchang6_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column to classify.
val pipeline = new PretrainedPipeline("nlp_project_ajchang6_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_project_ajchang6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ajchang6/nlp_project + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-notibertrecensioni_en.md b/docs/_posts/ahmedlone127/2024-09-20-notibertrecensioni_en.md new file mode 100644 index 00000000000000..8867c88f7b9944 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-notibertrecensioni_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English notibertrecensioni RoBertaForSequenceClassification from GioReg +author: John Snow Labs +name: notibertrecensioni +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`notibertrecensioni` is a English model originally trained by GioReg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/notibertrecensioni_en_5.5.0_3.0_1726849766920.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/notibertrecensioni_en_5.5.0_3.0_1726849766920.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text, tokenize, then classify with the pretrained model.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("notibertrecensioni","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("notibertrecensioni", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|notibertrecensioni| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|313.3 MB| + +## References + +https://huggingface.co/GioReg/notiBERTrecensioni \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-notibertrecensioni_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-notibertrecensioni_pipeline_en.md new file mode 100644 index 00000000000000..663f4eb944ef19 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-notibertrecensioni_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English notibertrecensioni_pipeline pipeline RoBertaForSequenceClassification from GioReg +author: John Snow Labs +name: notibertrecensioni_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`notibertrecensioni_pipeline` is a English model originally trained by GioReg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/notibertrecensioni_pipeline_en_5.5.0_3.0_1726849782639.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/notibertrecensioni_pipeline_en_5.5.0_3.0_1726849782639.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column to classify.
pipeline = PretrainedPipeline("notibertrecensioni_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column to classify.
val pipeline = new PretrainedPipeline("notibertrecensioni_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|notibertrecensioni_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|313.3 MB| + +## References + +https://huggingface.co/GioReg/notiBERTrecensioni + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ntuadlhw1_question_answering_en.md b/docs/_posts/ahmedlone127/2024-09-20-ntuadlhw1_question_answering_en.md new file mode 100644 index 00000000000000..a4e28ec5b42787 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ntuadlhw1_question_answering_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English ntuadlhw1_question_answering BertForQuestionAnswering from weitung8 +author: John Snow Labs +name: ntuadlhw1_question_answering +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ntuadlhw1_question_answering` is a English model originally trained by weitung8. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ntuadlhw1_question_answering_en_5.5.0_3.0_1726834371107.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ntuadlhw1_question_answering_en_5.5.0_3.0_1726834371107.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the question/context pair, then extract the answer span.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("ntuadlhw1_question_answering","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols("question", "context")
    .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("ntuadlhw1_question_answering", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
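
The extracted answer for each question/context pair lands in the `answer` column. A minimal follow-up to the snippet above:

```python
# Show each question next to the answer span the model extracted from its context.
pipelineDF.select("document_question.result", "answer.result").show(truncate=False)
```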
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ntuadlhw1_question_answering| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/weitung8/ntuadlhw1-question-answering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ntuadlhw1_question_answering_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-ntuadlhw1_question_answering_pipeline_en.md new file mode 100644 index 00000000000000..045f8792ee111c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ntuadlhw1_question_answering_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English ntuadlhw1_question_answering_pipeline pipeline BertForQuestionAnswering from weitung8 +author: John Snow Labs +name: ntuadlhw1_question_answering_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ntuadlhw1_question_answering_pipeline` is a English model originally trained by weitung8. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ntuadlhw1_question_answering_pipeline_en_5.5.0_3.0_1726834427974.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ntuadlhw1_question_answering_pipeline_en_5.5.0_3.0_1726834427974.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to carry the question and context text columns expected by the pipeline.
pipeline = PretrainedPipeline("ntuadlhw1_question_answering_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to carry the question and context text columns expected by the pipeline.
val pipeline = new PretrainedPipeline("ntuadlhw1_question_answering_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ntuadlhw1_question_answering_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/weitung8/ntuadlhw1-question-answering + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-openai_whisper_tiny_spanish_ecu911_pasobajo_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-20-openai_whisper_tiny_spanish_ecu911_pasobajo_pipeline_es.md new file mode 100644 index 00000000000000..63a164c8cf24ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-openai_whisper_tiny_spanish_ecu911_pasobajo_pipeline_es.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Castilian, Spanish openai_whisper_tiny_spanish_ecu911_pasobajo_pipeline pipeline WhisperForCTC from DanielMarquez +author: John Snow Labs +name: openai_whisper_tiny_spanish_ecu911_pasobajo_pipeline +date: 2024-09-20 +tags: [es, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`openai_whisper_tiny_spanish_ecu911_pasobajo_pipeline` is a Castilian, Spanish model originally trained by DanielMarquez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/openai_whisper_tiny_spanish_ecu911_pasobajo_pipeline_es_5.5.0_3.0_1726814121844.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/openai_whisper_tiny_spanish_ecu911_pasobajo_pipeline_es_5.5.0_3.0_1726814121844.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to hold raw audio content: this pipeline starts with an
# AudioAssembler stage, so it reads audio samples rather than text.
pipeline = PretrainedPipeline("openai_whisper_tiny_spanish_ecu911_pasobajo_pipeline", lang = "es")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to hold raw audio content: this pipeline starts with an
// AudioAssembler stage, so it reads audio samples rather than text.
val pipeline = new PretrainedPipeline("openai_whisper_tiny_spanish_ecu911_pasobajo_pipeline", lang = "es")
val annotations = pipeline.transform(df)

```
</div>
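
Because this pipeline starts with an `AudioAssembler`, the input DataFrame has to carry raw audio rather than text. The sketch below is only an illustration: the `audio_content` column name, the 16 kHz mono assumption, and the use of librosa for decoding are assumptions, not part of this model card:

```python
# Hypothetical input preparation: decode a local WAV file to a list of floats
# with librosa (any audio decoder would do) and hand it to the pipeline.
import librosa

waveform, _ = librosa.load("recording.wav", sr=16000, mono=True)
df = spark.createDataFrame([[waveform.tolist()]]).toDF("audio_content")

annotations = pipeline.transform(df)
annotations.show(truncate=False)
```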
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|openai_whisper_tiny_spanish_ecu911_pasobajo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|379.7 MB| + +## References + +https://huggingface.co/DanielMarquez/openai-whisper-tiny-es_ecu911-PasoBajo + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-patient_doctor_text_classifier_eng_0523_en.md b/docs/_posts/ahmedlone127/2024-09-20-patient_doctor_text_classifier_eng_0523_en.md new file mode 100644 index 00000000000000..65d72b093a2597 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-patient_doctor_text_classifier_eng_0523_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English patient_doctor_text_classifier_eng_0523 DistilBertForSequenceClassification from LukeGPT88 +author: John Snow Labs +name: patient_doctor_text_classifier_eng_0523 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`patient_doctor_text_classifier_eng_0523` is a English model originally trained by LukeGPT88. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/patient_doctor_text_classifier_eng_0523_en_5.5.0_3.0_1726861300171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/patient_doctor_text_classifier_eng_0523_en_5.5.0_3.0_1726861300171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text, tokenize, then classify with the pretrained model.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("patient_doctor_text_classifier_eng_0523","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("patient_doctor_text_classifier_eng_0523", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|patient_doctor_text_classifier_eng_0523| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LukeGPT88/patient-doctor-text-classifier-eng-0523 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-patient_doctor_text_classifier_eng_0523_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-patient_doctor_text_classifier_eng_0523_pipeline_en.md new file mode 100644 index 00000000000000..7bd717cd0b2e41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-patient_doctor_text_classifier_eng_0523_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English patient_doctor_text_classifier_eng_0523_pipeline pipeline DistilBertForSequenceClassification from LukeGPT88 +author: John Snow Labs +name: patient_doctor_text_classifier_eng_0523_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`patient_doctor_text_classifier_eng_0523_pipeline` is a English model originally trained by LukeGPT88. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/patient_doctor_text_classifier_eng_0523_pipeline_en_5.5.0_3.0_1726861311873.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/patient_doctor_text_classifier_eng_0523_pipeline_en_5.5.0_3.0_1726861311873.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column to classify.
pipeline = PretrainedPipeline("patient_doctor_text_classifier_eng_0523_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column to classify.
val pipeline = new PretrainedPipeline("patient_doctor_text_classifier_eng_0523_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|patient_doctor_text_classifier_eng_0523_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LukeGPT88/patient-doctor-text-classifier-eng-0523 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-personality_lm_en.md b/docs/_posts/ahmedlone127/2024-09-20-personality_lm_en.md new file mode 100644 index 00000000000000..4917f063abb0c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-personality_lm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English personality_lm RoBertaForSequenceClassification from rong4ivy +author: John Snow Labs +name: personality_lm +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`personality_lm` is a English model originally trained by rong4ivy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/personality_lm_en_5.5.0_3.0_1726852347846.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/personality_lm_en_5.5.0_3.0_1726852347846.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text, tokenize, then classify with the pretrained model.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("personality_lm","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("personality_lm", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
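
Once fitted, the pipeline can be persisted with the standard Spark ML writer and reloaded later without rebuilding it. A small sketch with a placeholder path:

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline to a placeholder location and load it back.
pipelineModel.write().overwrite().save("/tmp/personality_lm_pipeline_model")
restored = PipelineModel.load("/tmp/personality_lm_pipeline_model")
restored.transform(data).select("class.result").show(truncate=False)
```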
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|personality_lm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/rong4ivy/personality_LM \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-personality_lm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-personality_lm_pipeline_en.md new file mode 100644 index 00000000000000..934691c246ff73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-personality_lm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English personality_lm_pipeline pipeline RoBertaForSequenceClassification from rong4ivy +author: John Snow Labs +name: personality_lm_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`personality_lm_pipeline` is a English model originally trained by rong4ivy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/personality_lm_pipeline_en_5.5.0_3.0_1726852370067.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/personality_lm_pipeline_en_5.5.0_3.0_1726852370067.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column to classify.
pipeline = PretrainedPipeline("personality_lm_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column to classify.
val pipeline = new PretrainedPipeline("personality_lm_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|personality_lm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/rong4ivy/personality_LM + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-predict_perception_xlmr_cause_concept_en.md b/docs/_posts/ahmedlone127/2024-09-20-predict_perception_xlmr_cause_concept_en.md new file mode 100644 index 00000000000000..1823b68e60bcc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-predict_perception_xlmr_cause_concept_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English predict_perception_xlmr_cause_concept XlmRoBertaForSequenceClassification from responsibility-framing +author: John Snow Labs +name: predict_perception_xlmr_cause_concept +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`predict_perception_xlmr_cause_concept` is a English model originally trained by responsibility-framing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/predict_perception_xlmr_cause_concept_en_5.5.0_3.0_1726865929932.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/predict_perception_xlmr_cause_concept_en_5.5.0_3.0_1726865929932.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text, tokenize, then classify with the pretrained model.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("predict_perception_xlmr_cause_concept","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("predict_perception_xlmr_cause_concept", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|predict_perception_xlmr_cause_concept| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|837.6 MB| + +## References + +https://huggingface.co/responsibility-framing/predict-perception-xlmr-cause-concept \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-predict_perception_xlmr_cause_concept_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-predict_perception_xlmr_cause_concept_pipeline_en.md new file mode 100644 index 00000000000000..4f2728e5256cef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-predict_perception_xlmr_cause_concept_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English predict_perception_xlmr_cause_concept_pipeline pipeline XlmRoBertaForSequenceClassification from responsibility-framing +author: John Snow Labs +name: predict_perception_xlmr_cause_concept_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`predict_perception_xlmr_cause_concept_pipeline` is a English model originally trained by responsibility-framing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/predict_perception_xlmr_cause_concept_pipeline_en_5.5.0_3.0_1726865994382.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/predict_perception_xlmr_cause_concept_pipeline_en_5.5.0_3.0_1726865994382.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column to classify.
pipeline = PretrainedPipeline("predict_perception_xlmr_cause_concept_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column to classify.
val pipeline = new PretrainedPipeline("predict_perception_xlmr_cause_concept_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|predict_perception_xlmr_cause_concept_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|837.6 MB| + +## References + +https://huggingface.co/responsibility-framing/predict-perception-xlmr-cause-concept + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-pretrained_mario_bert_448_paths_ctx_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-pretrained_mario_bert_448_paths_ctx_pipeline_en.md new file mode 100644 index 00000000000000..f165e7682a536f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-pretrained_mario_bert_448_paths_ctx_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English pretrained_mario_bert_448_paths_ctx_pipeline pipeline RoBertaEmbeddings from shyamsn97 +author: John Snow Labs +name: pretrained_mario_bert_448_paths_ctx_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pretrained_mario_bert_448_paths_ctx_pipeline` is a English model originally trained by shyamsn97. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pretrained_mario_bert_448_paths_ctx_pipeline_en_5.5.0_3.0_1726796424906.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pretrained_mario_bert_448_paths_ctx_pipeline_en_5.5.0_3.0_1726796424906.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("pretrained_mario_bert_448_paths_ctx_pipeline", lang = "en")

# the pipeline expects the input text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("pretrained_mario_bert_448_paths_ctx_pipeline", lang = "en")

// the pipeline expects the input text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pretrained_mario_bert_448_paths_ctx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/shyamsn97/pretrained-mario-bert-448-paths-ctx + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-promptclassifcation_en.md b/docs/_posts/ahmedlone127/2024-09-20-promptclassifcation_en.md new file mode 100644 index 00000000000000..0574e8c500390d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-promptclassifcation_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English promptclassifcation RoBertaForSequenceClassification from rishika0704 +author: John Snow Labs +name: promptclassifcation +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`promptclassifcation` is a English model originally trained by rishika0704. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/promptclassifcation_en_5.5.0_3.0_1726799028256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/promptclassifcation_en_5.5.0_3.0_1726799028256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier consumes the document and token columns produced above
sequenceClassifier = RoBertaForSequenceClassification.pretrained("promptclassifcation","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("promptclassifcation", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
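To see what the model predicts for each row, the `class` column can be unpacked directly; this is a small sketch assuming the `pipelineDF` from the example above.

```python
# show the input text next to the predicted label
pipelineDF.select("text", "class.result").show(truncate=False)
```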
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|promptclassifcation| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|426.5 MB| + +## References + +https://huggingface.co/rishika0704/promptClassifcation \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-prueba_en.md b/docs/_posts/ahmedlone127/2024-09-20-prueba_en.md new file mode 100644 index 00000000000000..18fe7f8156b59d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-prueba_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English prueba DistilBertForSequenceClassification from rayosoftware +author: John Snow Labs +name: prueba +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`prueba` is a English model originally trained by rayosoftware. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/prueba_en_5.5.0_3.0_1726832556209.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/prueba_en_5.5.0_3.0_1726832556209.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier consumes the document and token columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("prueba","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("prueba", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
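A quick way to check the output, assuming the `pipelineDF` created above, is to pull the label out of the `class` annotation column.

```python
# each row's prediction is stored in the `result` field of the class annotation
pipelineDF.select("text", "class.result").show(truncate=False)
```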
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|prueba| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rayosoftware/prueba \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-prueba_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-prueba_pipeline_en.md new file mode 100644 index 00000000000000..b3a78113f70625 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-prueba_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English prueba_pipeline pipeline DistilBertForSequenceClassification from rayosoftware +author: John Snow Labs +name: prueba_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`prueba_pipeline` is a English model originally trained by rayosoftware. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/prueba_pipeline_en_5.5.0_3.0_1726832568833.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/prueba_pipeline_en_5.5.0_3.0_1726832568833.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("prueba_pipeline", lang = "en")

# the pipeline expects the input text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("prueba_pipeline", lang = "en")

// the pipeline expects the input text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|prueba_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rayosoftware/prueba + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ptcrawl_plus_legal_base_v1_7__checkpoint_last_en.md b/docs/_posts/ahmedlone127/2024-09-20-ptcrawl_plus_legal_base_v1_7__checkpoint_last_en.md new file mode 100644 index 00000000000000..d20c9df58a99bf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ptcrawl_plus_legal_base_v1_7__checkpoint_last_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ptcrawl_plus_legal_base_v1_7__checkpoint_last RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: ptcrawl_plus_legal_base_v1_7__checkpoint_last +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ptcrawl_plus_legal_base_v1_7__checkpoint_last` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ptcrawl_plus_legal_base_v1_7__checkpoint_last_en_5.5.0_3.0_1726857200850.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ptcrawl_plus_legal_base_v1_7__checkpoint_last_en_5.5.0_3.0_1726857200850.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("ptcrawl_plus_legal_base_v1_7__checkpoint_last","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("ptcrawl_plus_legal_base_v1_7__checkpoint_last","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
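To sanity-check the output, the sketch below (assuming the `pipelineDF` from the example above) explodes the `embeddings` annotation column so each token and its vector can be inspected.

```python
from pyspark.sql import functions as F

# one row per token: the token text in `result`, its vector in `embeddings`
pipelineDF.select(F.explode("embeddings").alias("emb")) \
    .select("emb.result", "emb.embeddings") \
    .show(truncate=80)
```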
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ptcrawl_plus_legal_base_v1_7__checkpoint_last| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|296.3 MB| + +## References + +https://huggingface.co/eduagarcia-temp/ptcrawl_plus_legal_base_v1_7__checkpoint_last \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ptcrawl_plus_legal_base_v1_7__checkpoint_last_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-ptcrawl_plus_legal_base_v1_7__checkpoint_last_pipeline_en.md new file mode 100644 index 00000000000000..93ff206fb42650 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ptcrawl_plus_legal_base_v1_7__checkpoint_last_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ptcrawl_plus_legal_base_v1_7__checkpoint_last_pipeline pipeline RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: ptcrawl_plus_legal_base_v1_7__checkpoint_last_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ptcrawl_plus_legal_base_v1_7__checkpoint_last_pipeline` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ptcrawl_plus_legal_base_v1_7__checkpoint_last_pipeline_en_5.5.0_3.0_1726857285774.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ptcrawl_plus_legal_base_v1_7__checkpoint_last_pipeline_en_5.5.0_3.0_1726857285774.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("ptcrawl_plus_legal_base_v1_7__checkpoint_last_pipeline", lang = "en")

# the pipeline expects the input text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("ptcrawl_plus_legal_base_v1_7__checkpoint_last_pipeline", lang = "en")

// the pipeline expects the input text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ptcrawl_plus_legal_base_v1_7__checkpoint_last_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|296.4 MB| + +## References + +https://huggingface.co/eduagarcia-temp/ptcrawl_plus_legal_base_v1_7__checkpoint_last + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ptcrawl_plus_legal_large_v1_7__checkpoint_last_en.md b/docs/_posts/ahmedlone127/2024-09-20-ptcrawl_plus_legal_large_v1_7__checkpoint_last_en.md new file mode 100644 index 00000000000000..ca7f020adb1d43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ptcrawl_plus_legal_large_v1_7__checkpoint_last_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ptcrawl_plus_legal_large_v1_7__checkpoint_last RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: ptcrawl_plus_legal_large_v1_7__checkpoint_last +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ptcrawl_plus_legal_large_v1_7__checkpoint_last` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ptcrawl_plus_legal_large_v1_7__checkpoint_last_en_5.5.0_3.0_1726858053742.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ptcrawl_plus_legal_large_v1_7__checkpoint_last_en_5.5.0_3.0_1726858053742.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("ptcrawl_plus_legal_large_v1_7__checkpoint_last","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("ptcrawl_plus_legal_large_v1_7__checkpoint_last","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
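As a minimal check that the embeddings were produced, the hypothetical snippet below looks at the first token's vector and prints its length; it assumes the `pipelineDF` from the example above.

```python
from pyspark.sql import functions as F

# inspect the dimensionality of the first token vector (assumes pipelineDF from above)
first = pipelineDF.select(F.explode("embeddings").alias("emb")).first()
print(len(first["emb"]["embeddings"]))
```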
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ptcrawl_plus_legal_large_v1_7__checkpoint_last| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|842.9 MB| + +## References + +https://huggingface.co/eduagarcia-temp/ptcrawl_plus_legal_large_v1_7__checkpoint_last \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ptcrawl_plus_legal_large_v1_7__checkpoint_last_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-ptcrawl_plus_legal_large_v1_7__checkpoint_last_pipeline_en.md new file mode 100644 index 00000000000000..107f30525648cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ptcrawl_plus_legal_large_v1_7__checkpoint_last_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ptcrawl_plus_legal_large_v1_7__checkpoint_last_pipeline pipeline RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: ptcrawl_plus_legal_large_v1_7__checkpoint_last_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ptcrawl_plus_legal_large_v1_7__checkpoint_last_pipeline` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ptcrawl_plus_legal_large_v1_7__checkpoint_last_pipeline_en_5.5.0_3.0_1726858287309.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ptcrawl_plus_legal_large_v1_7__checkpoint_last_pipeline_en_5.5.0_3.0_1726858287309.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("ptcrawl_plus_legal_large_v1_7__checkpoint_last_pipeline", lang = "en")

# the pipeline expects the input text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("ptcrawl_plus_legal_large_v1_7__checkpoint_last_pipeline", lang = "en")

// the pipeline expects the input text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ptcrawl_plus_legal_large_v1_7__checkpoint_last_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|842.9 MB| + +## References + +https://huggingface.co/eduagarcia-temp/ptcrawl_plus_legal_large_v1_7__checkpoint_last + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-qa_persian_bert_persian_farsi_zwnj_base_en.md b/docs/_posts/ahmedlone127/2024-09-20-qa_persian_bert_persian_farsi_zwnj_base_en.md new file mode 100644 index 00000000000000..948a8e31fffabc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-qa_persian_bert_persian_farsi_zwnj_base_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English qa_persian_bert_persian_farsi_zwnj_base BertForQuestionAnswering from makhataei +author: John Snow Labs +name: qa_persian_bert_persian_farsi_zwnj_base +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_persian_bert_persian_farsi_zwnj_base` is a English model originally trained by makhataei. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_persian_bert_persian_farsi_zwnj_base_en_5.5.0_3.0_1726820527184.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_persian_bert_persian_farsi_zwnj_base_en_5.5.0_3.0_1726820527184.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("qa_persian_bert_persian_farsi_zwnj_base","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("qa_persian_bert_persian_farsi_zwnj_base", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
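The extracted answer span ends up in the `answer` annotation column; a minimal follow-up, assuming the `pipelineDF` built above, is to print it alongside the question.

```python
# `answer.result` contains the span selected from the context (assumes pipelineDF from above)
pipelineDF.select("question", "answer.result").show(truncate=False)
```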
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_persian_bert_persian_farsi_zwnj_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|441.7 MB| + +## References + +https://huggingface.co/makhataei/qa-persian-bert-fa-zwnj-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-qa_persian_bert_persian_farsi_zwnj_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-qa_persian_bert_persian_farsi_zwnj_base_pipeline_en.md new file mode 100644 index 00000000000000..18c2eef4763fde --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-qa_persian_bert_persian_farsi_zwnj_base_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English qa_persian_bert_persian_farsi_zwnj_base_pipeline pipeline BertForQuestionAnswering from makhataei +author: John Snow Labs +name: qa_persian_bert_persian_farsi_zwnj_base_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_persian_bert_persian_farsi_zwnj_base_pipeline` is a English model originally trained by makhataei. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_persian_bert_persian_farsi_zwnj_base_pipeline_en_5.5.0_3.0_1726820547412.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_persian_bert_persian_farsi_zwnj_base_pipeline_en_5.5.0_3.0_1726820547412.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("qa_persian_bert_persian_farsi_zwnj_base_pipeline", lang = "en")

# df: a DataFrame holding the question and context columns expected by the pipeline
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("qa_persian_bert_persian_farsi_zwnj_base_pipeline", lang = "en")

// df: a DataFrame holding the question and context columns expected by the pipeline
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_persian_bert_persian_farsi_zwnj_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|441.7 MB| + +## References + +https://huggingface.co/makhataei/qa-persian-bert-fa-zwnj-base + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-qatentbert_cpc_en.md b/docs/_posts/ahmedlone127/2024-09-20-qatentbert_cpc_en.md new file mode 100644 index 00000000000000..d9938c07f37b5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-qatentbert_cpc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English qatentbert_cpc BertForSequenceClassification from ZoeYou +author: John Snow Labs +name: qatentbert_cpc +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qatentbert_cpc` is a English model originally trained by ZoeYou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qatentbert_cpc_en_5.5.0_3.0_1726797174844.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qatentbert_cpc_en_5.5.0_3.0_1726797174844.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier consumes the document and token columns produced above
sequenceClassifier = BertForSequenceClassification.pretrained("qatentbert_cpc","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("qatentbert_cpc", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
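Once fitted and transformed, the predicted class can be read straight from the `class` column; the sketch below assumes the `pipelineDF` created above.

```python
# show the input text next to the predicted label
pipelineDF.select("text", "class.result").show(truncate=False)
```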
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qatentbert_cpc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ZoeYou/qatentBert-cpc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-question_classification_abhibeats95_en.md b/docs/_posts/ahmedlone127/2024-09-20-question_classification_abhibeats95_en.md new file mode 100644 index 00000000000000..c1d2a48d9ab8f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-question_classification_abhibeats95_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English question_classification_abhibeats95 DistilBertForSequenceClassification from Abhibeats95 +author: John Snow Labs +name: question_classification_abhibeats95 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`question_classification_abhibeats95` is a English model originally trained by Abhibeats95. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/question_classification_abhibeats95_en_5.5.0_3.0_1726830426105.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/question_classification_abhibeats95_en_5.5.0_3.0_1726830426105.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier consumes the document and token columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("question_classification_abhibeats95","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("question_classification_abhibeats95", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
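Besides the label itself, classification annotations typically carry per-class scores in their metadata. The hedged sketch below, assuming the `pipelineDF` from the example above, shows both.

```python
# `class.metadata` usually holds the raw scores for each candidate label
pipelineDF.selectExpr("text", "class.result", "class.metadata").show(truncate=False)
```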
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|question_classification_abhibeats95| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Abhibeats95/question_classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-question_classification_abhibeats95_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-question_classification_abhibeats95_pipeline_en.md new file mode 100644 index 00000000000000..a192af18500dbb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-question_classification_abhibeats95_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English question_classification_abhibeats95_pipeline pipeline DistilBertForSequenceClassification from Abhibeats95 +author: John Snow Labs +name: question_classification_abhibeats95_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`question_classification_abhibeats95_pipeline` is a English model originally trained by Abhibeats95. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/question_classification_abhibeats95_pipeline_en_5.5.0_3.0_1726830439859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/question_classification_abhibeats95_pipeline_en_5.5.0_3.0_1726830439859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("question_classification_abhibeats95_pipeline", lang = "en")

# the pipeline expects the input text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("question_classification_abhibeats95_pipeline", lang = "en")

// the pipeline expects the input text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|question_classification_abhibeats95_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Abhibeats95/question_classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-question_classification_minervabotteam_en.md b/docs/_posts/ahmedlone127/2024-09-20-question_classification_minervabotteam_en.md new file mode 100644 index 00000000000000..dff5e978946752 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-question_classification_minervabotteam_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English question_classification_minervabotteam DistilBertForSequenceClassification from MinervaBotTeam +author: John Snow Labs +name: question_classification_minervabotteam +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`question_classification_minervabotteam` is a English model originally trained by MinervaBotTeam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/question_classification_minervabotteam_en_5.5.0_3.0_1726849123627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/question_classification_minervabotteam_en_5.5.0_3.0_1726849123627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier consumes the document and token columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("question_classification_minervabotteam","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("question_classification_minervabotteam", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
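A minimal way to check the prediction for each question, assuming the `pipelineDF` produced above, is to project the `class` annotation column.

```python
# the predicted question category is in `class.result`
pipelineDF.select("text", "class.result").show(truncate=False)
```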
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|question_classification_minervabotteam| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MinervaBotTeam/Question_Classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-question_classification_minervabotteam_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-question_classification_minervabotteam_pipeline_en.md new file mode 100644 index 00000000000000..1bd9e174166876 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-question_classification_minervabotteam_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English question_classification_minervabotteam_pipeline pipeline DistilBertForSequenceClassification from MinervaBotTeam +author: John Snow Labs +name: question_classification_minervabotteam_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`question_classification_minervabotteam_pipeline` is a English model originally trained by MinervaBotTeam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/question_classification_minervabotteam_pipeline_en_5.5.0_3.0_1726849135896.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/question_classification_minervabotteam_pipeline_en_5.5.0_3.0_1726849135896.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("question_classification_minervabotteam_pipeline", lang = "en")

# the pipeline expects the input text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("question_classification_minervabotteam_pipeline", lang = "en")

// the pipeline expects the input text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|question_classification_minervabotteam_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MinervaBotTeam/Question_Classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_en.md b/docs/_posts/ahmedlone127/2024-09-20-re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_en.md new file mode 100644 index 00000000000000..0c4aed133c7a91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned RoBertaForTokenClassification from ajtamayoh +author: John Snow Labs +name: re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned` is a English model originally trained by ajtamayoh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_en_5.5.0_3.0_1726853301092.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_en_5.5.0_3.0_1726853301092.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the token classifier assigns a tag to every token produced above
tokenClassifier = RoBertaForTokenClassification.pretrained("re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
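Because this is a token-level model, every token receives its own tag in the `ner` column. The sketch below, assuming the `pipelineDF` from the example above, pairs each token with its predicted tag.

```python
from pyspark.sql import functions as F

# zip token text with its predicted tag (assumes pipelineDF from above)
pipelineDF.select(F.explode(F.arrays_zip("token.result", "ner.result")).alias("cols")) \
    .show(truncate=False)
```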
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|437.6 MB| + +## References + +https://huggingface.co/ajtamayoh/RE_NegREF_NSD_Nubes_Training_Test_dataset_RoBERTa_base_bne_fine_tuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_pipeline_en.md new file mode 100644 index 00000000000000..785c6155609ebe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_pipeline pipeline RoBertaForTokenClassification from ajtamayoh +author: John Snow Labs +name: re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_pipeline` is a English model originally trained by ajtamayoh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_pipeline_en_5.5.0_3.0_1726853324323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_pipeline_en_5.5.0_3.0_1726853324323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_pipeline", lang = "en")

# the pipeline expects the input text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_pipeline", lang = "en")

// the pipeline expects the input text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|437.6 MB| + +## References + +https://huggingface.co/ajtamayoh/RE_NegREF_NSD_Nubes_Training_Test_dataset_RoBERTa_base_bne_fine_tuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-regression_roberta_2_en.md b/docs/_posts/ahmedlone127/2024-09-20-regression_roberta_2_en.md new file mode 100644 index 00000000000000..6730542ddc4989 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-regression_roberta_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English regression_roberta_2 RoBertaForSequenceClassification from Svetlana0303 +author: John Snow Labs +name: regression_roberta_2 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`regression_roberta_2` is a English model originally trained by Svetlana0303. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/regression_roberta_2_en_5.5.0_3.0_1726804385247.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/regression_roberta_2_en_5.5.0_3.0_1726804385247.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier consumes the document and token columns produced above
sequenceClassifier = RoBertaForSequenceClassification.pretrained("regression_roberta_2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("regression_roberta_2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
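The model's output lands in the `class` annotation column like the other sequence classifiers; a minimal inspection sketch, assuming the `pipelineDF` built above, follows.

```python
# unpack the prediction stored in the `class` annotation column
pipelineDF.select("text", "class.result").show(truncate=False)
```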
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|regression_roberta_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|420.3 MB| + +## References + +https://huggingface.co/Svetlana0303/Regression_roberta_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-regression_roberta_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-regression_roberta_2_pipeline_en.md new file mode 100644 index 00000000000000..26f7b0fee69598 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-regression_roberta_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English regression_roberta_2_pipeline pipeline RoBertaForSequenceClassification from Svetlana0303 +author: John Snow Labs +name: regression_roberta_2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`regression_roberta_2_pipeline` is a English model originally trained by Svetlana0303. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/regression_roberta_2_pipeline_en_5.5.0_3.0_1726804423351.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/regression_roberta_2_pipeline_en_5.5.0_3.0_1726804423351.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("regression_roberta_2_pipeline", lang = "en")

# the pipeline expects the input text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("regression_roberta_2_pipeline", lang = "en")

// the pipeline expects the input text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|regression_roberta_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|420.3 MB| + +## References + +https://huggingface.co/Svetlana0303/Regression_roberta_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-results_cyh002_en.md b/docs/_posts/ahmedlone127/2024-09-20-results_cyh002_en.md new file mode 100644 index 00000000000000..e0d66336f685a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-results_cyh002_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English results_cyh002 DistilBertForSequenceClassification from cyh002 +author: John Snow Labs +name: results_cyh002 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`results_cyh002` is a English model originally trained by cyh002. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/results_cyh002_en_5.5.0_3.0_1726860851107.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/results_cyh002_en_5.5.0_3.0_1726860851107.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# the classifier consumes the document and token columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("results_cyh002","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("results_cyh002", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
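To verify the pipeline end to end, the predicted label can be printed next to the input; this assumes the `pipelineDF` from the example above.

```python
# the predicted label for each row is in `class.result`
pipelineDF.select("text", "class.result").show(truncate=False)
```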
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|results_cyh002| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cyh002/results \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-results_cyh002_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-results_cyh002_pipeline_en.md new file mode 100644 index 00000000000000..fd6984f1fa03d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-results_cyh002_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English results_cyh002_pipeline pipeline DistilBertForSequenceClassification from cyh002 +author: John Snow Labs +name: results_cyh002_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`results_cyh002_pipeline` is a English model originally trained by cyh002. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/results_cyh002_pipeline_en_5.5.0_3.0_1726860862863.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/results_cyh002_pipeline_en_5.5.0_3.0_1726860862863.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("results_cyh002_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("results_cyh002_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|results_cyh002_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cyh002/results + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-rinna_roberta_qa_ar101_en.md b/docs/_posts/ahmedlone127/2024-09-20-rinna_roberta_qa_ar101_en.md new file mode 100644 index 00000000000000..160d9d1cb9e7b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-rinna_roberta_qa_ar101_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English rinna_roberta_qa_ar101 BertForQuestionAnswering from Echiguerkh +author: John Snow Labs +name: rinna_roberta_qa_ar101 +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rinna_roberta_qa_ar101` is a English model originally trained by Echiguerkh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rinna_roberta_qa_ar101_en_5.5.0_3.0_1726808549466.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rinna_roberta_qa_ar101_en_5.5.0_3.0_1726808549466.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Assemble question and context, then extract the answer span with the pretrained model
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("rinna_roberta_qa_ar101","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// Assemble question and context, then extract the answer span with the pretrained model
val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("rinna_roberta_qa_ar101", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?","I use spark-nlp.")).toDS.toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
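
A quick way to inspect the extracted answer from the example above (a minimal sketch, not part of the original card; it assumes the `pipelineDF` and the `answer` output column defined there):

```python
# "answer.result" contains the predicted answer span for each question/context pair
pipelineDF.select("answer.result").show(truncate=False)
```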
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rinna_roberta_qa_ar101| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|504.3 MB| + +## References + +https://huggingface.co/Echiguerkh/rinna-roberta-qa-ar101 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_agnews_padding20model_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_agnews_padding20model_en.md new file mode 100644 index 00000000000000..26067fc25087bf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_agnews_padding20model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_agnews_padding20model RoBertaForSequenceClassification from Realgon +author: John Snow Labs +name: roberta_agnews_padding20model +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_agnews_padding20model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_agnews_padding20model_en_5.5.0_3.0_1726851897212.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_agnews_padding20model_en_5.5.0_3.0_1726851897212.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Minimal pipeline: assemble documents, tokenize, then classify with the pretrained model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_agnews_padding20model","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// Minimal pipeline: assemble documents, tokenize, then classify with the pretrained model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_agnews_padding20model", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
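
A quick way to inspect the predictions from the example above (a minimal sketch, not part of the original card; it assumes the `pipelineDF` and the `class` output column defined there):

```python
# "class.result" holds the predicted label(s) for each input row
pipelineDF.select("text", "class.result").show(truncate=False)
```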
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_agnews_padding20model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|463.9 MB| + +## References + +https://huggingface.co/Realgon/roberta_agnews_padding20model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_agnews_padding20model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_agnews_padding20model_pipeline_en.md new file mode 100644 index 00000000000000..a36ff83da75722 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_agnews_padding20model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_agnews_padding20model_pipeline pipeline RoBertaForSequenceClassification from Realgon +author: John Snow Labs +name: roberta_agnews_padding20model_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_agnews_padding20model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_agnews_padding20model_pipeline_en_5.5.0_3.0_1726851921186.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_agnews_padding20model_pipeline_en_5.5.0_3.0_1726851921186.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_agnews_padding20model_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("roberta_agnews_padding20model_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_agnews_padding20model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.9 MB| + +## References + +https://huggingface.co/Realgon/roberta_agnews_padding20model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_babe_1epoch_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_babe_1epoch_en.md new file mode 100644 index 00000000000000..ac0a8cc08bbfd1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_babe_1epoch_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_babe_1epoch RoBertaForSequenceClassification from jordankrishnayah +author: John Snow Labs +name: roberta_babe_1epoch +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_babe_1epoch` is a English model originally trained by jordankrishnayah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_babe_1epoch_en_5.5.0_3.0_1726804838853.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_babe_1epoch_en_5.5.0_3.0_1726804838853.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Minimal pipeline: assemble documents, tokenize, then classify with the pretrained model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_babe_1epoch","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// Minimal pipeline: assemble documents, tokenize, then classify with the pretrained model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_babe_1epoch", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
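
A quick way to inspect the predictions from the example above (a minimal sketch, not part of the original card; it assumes the `pipelineDF` and the `class` output column defined there):

```python
# "class.result" holds the predicted label(s) for each input row
pipelineDF.select("text", "class.result").show(truncate=False)
```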
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_babe_1epoch| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|438.5 MB| + +## References + +https://huggingface.co/jordankrishnayah/ROBERTA-BABE-1epoch \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_babe_1epoch_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_babe_1epoch_pipeline_en.md new file mode 100644 index 00000000000000..8f40de7de1dd5a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_babe_1epoch_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_babe_1epoch_pipeline pipeline RoBertaForSequenceClassification from jordankrishnayah +author: John Snow Labs +name: roberta_babe_1epoch_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_babe_1epoch_pipeline` is a English model originally trained by jordankrishnayah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_babe_1epoch_pipeline_en_5.5.0_3.0_1726804876574.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_babe_1epoch_pipeline_en_5.5.0_3.0_1726804876574.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_babe_1epoch_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("roberta_babe_1epoch_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_babe_1epoch_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.5 MB| + +## References + +https://huggingface.co/jordankrishnayah/ROBERTA-BABE-1epoch + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ag_news_aktsvigun_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ag_news_aktsvigun_en.md new file mode 100644 index 00000000000000..cec1bd43db9376 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ag_news_aktsvigun_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_ag_news_aktsvigun RoBertaForSequenceClassification from Aktsvigun +author: John Snow Labs +name: roberta_base_ag_news_aktsvigun +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_ag_news_aktsvigun` is a English model originally trained by Aktsvigun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ag_news_aktsvigun_en_5.5.0_3.0_1726805129339.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ag_news_aktsvigun_en_5.5.0_3.0_1726805129339.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Minimal pipeline: assemble documents, tokenize, then classify with the pretrained model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_ag_news_aktsvigun","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// Minimal pipeline: assemble documents, tokenize, then classify with the pretrained model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_ag_news_aktsvigun", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
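
A quick way to inspect the predictions from the example above (a minimal sketch, not part of the original card; it assumes the `pipelineDF` and the `class` output column defined there):

```python
# "class.result" holds the predicted label(s) for each input row
pipelineDF.select("text", "class.result").show(truncate=False)
```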
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ag_news_aktsvigun| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.4 MB| + +## References + +https://huggingface.co/Aktsvigun/roberta-base-ag_news \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bc2gm_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bc2gm_en.md new file mode 100644 index 00000000000000..84958c28f7439a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bc2gm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_bc2gm RoBertaForTokenClassification from CheccoCando +author: John Snow Labs +name: roberta_base_bc2gm +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bc2gm` is a English model originally trained by CheccoCando. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bc2gm_en_5.5.0_3.0_1726862345492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bc2gm_en_5.5.0_3.0_1726862345492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Minimal pipeline: assemble documents, tokenize, then tag tokens with the pretrained NER model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_bc2gm","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// Minimal pipeline: assemble documents, tokenize, then tag tokens with the pretrained NER model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_bc2gm", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
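
A quick way to inspect the predictions from the example above (a minimal sketch, not part of the original card; it assumes the `pipelineDF`, `token`, and `ner` columns defined there):

```python
# Pair each token with its predicted tag; both fields are arrays of strings per row
pipelineDF.selectExpr("explode(arrays_zip(token.result, ner.result)) as prediction").show(truncate=False)
```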
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bc2gm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|442.0 MB| + +## References + +https://huggingface.co/CheccoCando/roberta-base_bc2gm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bc2gm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bc2gm_pipeline_en.md new file mode 100644 index 00000000000000..36f5245bdb0139 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bc2gm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bc2gm_pipeline pipeline RoBertaForTokenClassification from CheccoCando +author: John Snow Labs +name: roberta_base_bc2gm_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bc2gm_pipeline` is a English model originally trained by CheccoCando. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bc2gm_pipeline_en_5.5.0_3.0_1726862370596.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bc2gm_pipeline_en_5.5.0_3.0_1726862370596.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_base_bc2gm_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("roberta_base_bc2gm_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bc2gm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|442.0 MB| + +## References + +https://huggingface.co/CheccoCando/roberta-base_bc2gm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_biomedical_clinical_spanish_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_biomedical_clinical_spanish_en.md new file mode 100644 index 00000000000000..dc2ffbf92d3931 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_biomedical_clinical_spanish_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_biomedical_clinical_spanish RoBertaForTokenClassification from manucos +author: John Snow Labs +name: roberta_base_biomedical_clinical_spanish +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_biomedical_clinical_spanish` is a English model originally trained by manucos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_biomedical_clinical_spanish_en_5.5.0_3.0_1726853139866.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_biomedical_clinical_spanish_en_5.5.0_3.0_1726853139866.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Minimal pipeline: assemble documents, tokenize, then tag tokens with the pretrained NER model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_biomedical_clinical_spanish","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// Minimal pipeline: assemble documents, tokenize, then tag tokens with the pretrained NER model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_biomedical_clinical_spanish", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
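
A quick way to inspect the predictions from the example above (a minimal sketch, not part of the original card; it assumes the `pipelineDF`, `token`, and `ner` columns defined there):

```python
# Pair each token with its predicted tag; both fields are arrays of strings per row
pipelineDF.selectExpr("explode(arrays_zip(token.result, ner.result)) as prediction").show(truncate=False)
```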
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_biomedical_clinical_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|422.8 MB| + +## References + +https://huggingface.co/manucos/roberta-base-biomedical-clinical-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_biomedical_clinical_spanish_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_biomedical_clinical_spanish_pipeline_en.md new file mode 100644 index 00000000000000..3b8981a2b7e89b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_biomedical_clinical_spanish_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_biomedical_clinical_spanish_pipeline pipeline RoBertaForTokenClassification from manucos +author: John Snow Labs +name: roberta_base_biomedical_clinical_spanish_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_biomedical_clinical_spanish_pipeline` is a English model originally trained by manucos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_biomedical_clinical_spanish_pipeline_en_5.5.0_3.0_1726853174943.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_biomedical_clinical_spanish_pipeline_en_5.5.0_3.0_1726853174943.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_base_biomedical_clinical_spanish_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("roberta_base_biomedical_clinical_spanish_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_biomedical_clinical_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|422.8 MB| + +## References + +https://huggingface.co/manucos/roberta-base-biomedical-clinical-es + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bne_finetuned_detests_wandb24_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bne_finetuned_detests_wandb24_en.md new file mode 100644 index 00000000000000..a1f05f16b980ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bne_finetuned_detests_wandb24_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_detests_wandb24 RoBertaForSequenceClassification from Pablo94 +author: John Snow Labs +name: roberta_base_bne_finetuned_detests_wandb24 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_detests_wandb24` is a English model originally trained by Pablo94. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_detests_wandb24_en_5.5.0_3.0_1726850479165.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_detests_wandb24_en_5.5.0_3.0_1726850479165.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Minimal pipeline: assemble documents, tokenize, then classify with the pretrained model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_finetuned_detests_wandb24","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// Minimal pipeline: assemble documents, tokenize, then classify with the pretrained model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_finetuned_detests_wandb24", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
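
A quick way to inspect the predictions from the example above (a minimal sketch, not part of the original card; it assumes the `pipelineDF` and the `class` output column defined there):

```python
# "class.result" holds the predicted label(s) for each input row
pipelineDF.select("text", "class.result").show(truncate=False)
```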
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_detests_wandb24| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|431.3 MB| + +## References + +https://huggingface.co/Pablo94/roberta-base-bne-finetuned-detests-wandb24 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bne_finetuned_detests_wandb24_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bne_finetuned_detests_wandb24_pipeline_en.md new file mode 100644 index 00000000000000..dd1ecf31b5088c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bne_finetuned_detests_wandb24_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_detests_wandb24_pipeline pipeline RoBertaForSequenceClassification from Pablo94 +author: John Snow Labs +name: roberta_base_bne_finetuned_detests_wandb24_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_detests_wandb24_pipeline` is a English model originally trained by Pablo94. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_detests_wandb24_pipeline_en_5.5.0_3.0_1726850508499.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_detests_wandb24_pipeline_en_5.5.0_3.0_1726850508499.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_base_bne_finetuned_detests_wandb24_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("roberta_base_bne_finetuned_detests_wandb24_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_detests_wandb24_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|431.3 MB| + +## References + +https://huggingface.co/Pablo94/roberta-base-bne-finetuned-detests-wandb24 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bne_linear_ner_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bne_linear_ner_en.md new file mode 100644 index 00000000000000..65ac2f0c4d16dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bne_linear_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_bne_linear_ner RoBertaForTokenClassification from hlhdatscience +author: John Snow Labs +name: roberta_base_bne_linear_ner +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_linear_ner` is a English model originally trained by hlhdatscience. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_linear_ner_en_5.5.0_3.0_1726853337194.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_linear_ner_en_5.5.0_3.0_1726853337194.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Minimal pipeline: assemble documents, tokenize, then tag tokens with the pretrained NER model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_bne_linear_ner","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// Minimal pipeline: assemble documents, tokenize, then tag tokens with the pretrained NER model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_bne_linear_ner", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
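
A quick way to inspect the predictions from the example above (a minimal sketch, not part of the original card; it assumes the `pipelineDF`, `token`, and `ner` columns defined there):

```python
# Pair each token with its predicted tag; both fields are arrays of strings per row
pipelineDF.selectExpr("explode(arrays_zip(token.result, ner.result)) as prediction").show(truncate=False)
```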
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_linear_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|458.9 MB| + +## References + +https://huggingface.co/hlhdatscience/roberta-base-bne-Linear-NER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bne_linear_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bne_linear_ner_pipeline_en.md new file mode 100644 index 00000000000000..50ed82baf14c71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bne_linear_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bne_linear_ner_pipeline pipeline RoBertaForTokenClassification from hlhdatscience +author: John Snow Labs +name: roberta_base_bne_linear_ner_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_linear_ner_pipeline` is a English model originally trained by hlhdatscience. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_linear_ner_pipeline_en_5.5.0_3.0_1726853360393.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_linear_ner_pipeline_en_5.5.0_3.0_1726853360393.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_base_bne_linear_ner_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("roberta_base_bne_linear_ner_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_linear_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|458.9 MB| + +## References + +https://huggingface.co/hlhdatscience/roberta-base-bne-Linear-NER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_disaster_tweets_downpour_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_disaster_tweets_downpour_en.md new file mode 100644 index 00000000000000..6977988ebf78b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_disaster_tweets_downpour_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_disaster_tweets_downpour RoBertaForSequenceClassification from maxschlake +author: John Snow Labs +name: roberta_base_disaster_tweets_downpour +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_disaster_tweets_downpour` is a English model originally trained by maxschlake. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_disaster_tweets_downpour_en_5.5.0_3.0_1726851603878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_disaster_tweets_downpour_en_5.5.0_3.0_1726851603878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Minimal pipeline: assemble documents, tokenize, then classify with the pretrained model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_disaster_tweets_downpour","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// Minimal pipeline: assemble documents, tokenize, then classify with the pretrained model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_disaster_tweets_downpour", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
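
A quick way to inspect the predictions from the example above (a minimal sketch, not part of the original card; it assumes the `pipelineDF` and the `class` output column defined there):

```python
# "class.result" holds the predicted label(s) for each input row
pipelineDF.select("text", "class.result").show(truncate=False)
```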
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_disaster_tweets_downpour| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|444.9 MB| + +## References + +https://huggingface.co/maxschlake/roberta-base_disaster_tweets_downpour \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_emotion_galactic0205_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_emotion_galactic0205_en.md new file mode 100644 index 00000000000000..10117d9ba35983 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_emotion_galactic0205_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_emotion_galactic0205 RoBertaForSequenceClassification from galactic0205 +author: John Snow Labs +name: roberta_base_emotion_galactic0205 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_emotion_galactic0205` is a English model originally trained by galactic0205. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_emotion_galactic0205_en_5.5.0_3.0_1726850521579.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_emotion_galactic0205_en_5.5.0_3.0_1726850521579.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Minimal pipeline: assemble documents, tokenize, then classify with the pretrained model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_emotion_galactic0205","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// Minimal pipeline: assemble documents, tokenize, then classify with the pretrained model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_emotion_galactic0205", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
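
A quick way to inspect the predictions from the example above (a minimal sketch, not part of the original card; it assumes the `pipelineDF` and the `class` output column defined there):

```python
# "class.result" holds the predicted label(s) for each input row
pipelineDF.select("text", "class.result").show(truncate=False)
```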
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_emotion_galactic0205| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|439.0 MB| + +## References + +https://huggingface.co/galactic0205/roberta-base-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_emotion_galactic0205_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_emotion_galactic0205_pipeline_en.md new file mode 100644 index 00000000000000..d9cd4086217cab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_emotion_galactic0205_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_emotion_galactic0205_pipeline pipeline RoBertaForSequenceClassification from galactic0205 +author: John Snow Labs +name: roberta_base_emotion_galactic0205_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_emotion_galactic0205_pipeline` is a English model originally trained by galactic0205. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_emotion_galactic0205_pipeline_en_5.5.0_3.0_1726850547616.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_emotion_galactic0205_pipeline_en_5.5.0_3.0_1726850547616.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_base_emotion_galactic0205_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("roberta_base_emotion_galactic0205_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_emotion_galactic0205_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|439.0 MB| + +## References + +https://huggingface.co/galactic0205/roberta-base-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_englishlawai_roberta_base_version4_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_englishlawai_roberta_base_version4_en.md new file mode 100644 index 00000000000000..80c06541777f21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_englishlawai_roberta_base_version4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_englishlawai_roberta_base_version4 RoBertaEmbeddings from Makabaka +author: John Snow Labs +name: roberta_base_englishlawai_roberta_base_version4 +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_englishlawai_roberta_base_version4` is a English model originally trained by Makabaka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_englishlawai_roberta_base_version4_en_5.5.0_3.0_1726793515075.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_englishlawai_roberta_base_version4_en_5.5.0_3.0_1726793515075.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_englishlawai_roberta_base_version4","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_englishlawai_roberta_base_version4","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
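
A quick way to inspect the token embeddings from the example above (a minimal sketch, not part of the original card; it assumes the `pipelineDF` and the `embeddings` output column defined there):

```python
# Each annotation carries its token text in "result" and its vector in "embeddings"
pipelineDF.selectExpr("explode(embeddings) as emb") \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(5, truncate=80)
```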
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_englishlawai_roberta_base_version4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.6 MB| + +## References + +https://huggingface.co/Makabaka/roberta-base-EnglishLawAI_roberta_base_version4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_32_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_32_en.md new file mode 100644 index 00000000000000..a462e2fce17190 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_32_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_32 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_32 +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_32` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_32_en_5.5.0_3.0_1726857396913.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_32_en_5.5.0_3.0_1726857396913.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_32","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_32","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
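
A quick way to inspect the token embeddings from the example above (a minimal sketch, not part of the original card; it assumes the `pipelineDF` and the `embeddings` output column defined there):

```python
# Each annotation carries its token text in "result" and its vector in "embeddings"
pipelineDF.selectExpr("explode(embeddings) as emb") \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(5, truncate=80)
```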
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_32| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_32 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_32_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_32_pipeline_en.md new file mode 100644 index 00000000000000..b87cd865cf7df8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_32_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_32_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_32_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_32_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_32_pipeline_en_5.5.0_3.0_1726857481531.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_32_pipeline_en_5.5.0_3.0_1726857481531.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_base_epoch_32_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df is assumed to be a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("roberta_base_epoch_32_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
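+
+For quick experiments, the same pretrained pipeline can also be applied to a single string with `annotate()` instead of a DataFrame (a minimal sketch; the exact output keys depend on the stages listed under Included Models):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("roberta_base_epoch_32_pipeline", lang="en")
+# Returns a dict of annotator outputs, e.g. "document", "token", "embeddings"
+result = pipeline.annotate("I love spark-nlp")
+print(result["token"])
+```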
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_32_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_32 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_56_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_56_pipeline_en.md new file mode 100644 index 00000000000000..7019816a30d869 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_56_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_56_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_56_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_56_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_56_pipeline_en_5.5.0_3.0_1726793784768.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_56_pipeline_en_5.5.0_3.0_1726793784768.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_base_epoch_56_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df is assumed to be a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("roberta_base_epoch_56_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_56_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_56 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_69_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_69_en.md new file mode 100644 index 00000000000000..74c8fdfa3d626c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_69_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_69 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_69 +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_69` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_69_en_5.5.0_3.0_1726793770331.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_69_en_5.5.0_3.0_1726793770331.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_69","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_69","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_69| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_69 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_69_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_69_pipeline_en.md new file mode 100644 index 00000000000000..bd2d85db33a01b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_69_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_69_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_69_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_69_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_69_pipeline_en_5.5.0_3.0_1726793853622.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_69_pipeline_en_5.5.0_3.0_1726793853622.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_base_epoch_69_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df is assumed to be a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("roberta_base_epoch_69_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_69_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_69 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_71_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_71_pipeline_en.md new file mode 100644 index 00000000000000..dec16746f7b44a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_71_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_71_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_71_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_71_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_71_pipeline_en_5.5.0_3.0_1726796612564.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_71_pipeline_en_5.5.0_3.0_1726796612564.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_base_epoch_71_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df is assumed to be a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("roberta_base_epoch_71_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_71_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_71 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_81_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_81_en.md new file mode 100644 index 00000000000000..f2cf20e41f2ef6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_81_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_81 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_81 +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_81` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_81_en_5.5.0_3.0_1726857151195.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_81_en_5.5.0_3.0_1726857151195.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_81","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_81","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_81| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_81 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_81_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_81_pipeline_en.md new file mode 100644 index 00000000000000..6895f846701797 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_81_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_81_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_81_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_81_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_81_pipeline_en_5.5.0_3.0_1726857236139.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_81_pipeline_en_5.5.0_3.0_1726857236139.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_base_epoch_81_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df is assumed to be a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("roberta_base_epoch_81_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_81_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_81 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_finetuned_wallisian_manual_8ep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_finetuned_wallisian_manual_8ep_pipeline_en.md new file mode 100644 index 00000000000000..6e84028391d5df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_finetuned_wallisian_manual_8ep_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_manual_8ep_pipeline pipeline RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_manual_8ep_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_manual_8ep_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_8ep_pipeline_en_5.5.0_3.0_1726793313586.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_8ep_pipeline_en_5.5.0_3.0_1726793313586.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_base_finetuned_wallisian_manual_8ep_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df is assumed to be a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("roberta_base_finetuned_wallisian_manual_8ep_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_manual_8ep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.1 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-manual-8ep + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_finetuned_wallisian_whisper_7ep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_finetuned_wallisian_whisper_7ep_pipeline_en.md new file mode 100644 index 00000000000000..300e9f895ba6b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_finetuned_wallisian_whisper_7ep_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_whisper_7ep_pipeline pipeline RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_whisper_7ep_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_whisper_7ep_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_whisper_7ep_pipeline_en_5.5.0_3.0_1726796431050.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_whisper_7ep_pipeline_en_5.5.0_3.0_1726796431050.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_base_finetuned_wallisian_whisper_7ep_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df is assumed to be a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("roberta_base_finetuned_wallisian_whisper_7ep_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_whisper_7ep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-whisper-7ep + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_hb_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_hb_classifier_en.md new file mode 100644 index 00000000000000..2ff212d4b4dc36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_hb_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_hb_classifier RoBertaForSequenceClassification from Irsik +author: John Snow Labs +name: roberta_base_hb_classifier +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_hb_classifier` is a English model originally trained by Irsik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_hb_classifier_en_5.5.0_3.0_1726850586771.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_hb_classifier_en_5.5.0_3.0_1726850586771.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_hb_classifier","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_hb_classifier", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
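+
+Once the pipeline above has been run, the predicted label for each row can be read from the `class` column (a short sketch reusing `pipelineDF` from the Python example; each annotation's `result` field holds the label):
+
+```python
+pipelineDF.select("text", "class.result").show(truncate=False)
+```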
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_hb_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|447.5 MB| + +## References + +https://huggingface.co/Irsik/roberta-base-hb-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_hb_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_hb_classifier_pipeline_en.md new file mode 100644 index 00000000000000..57dc3676ea0338 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_hb_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_hb_classifier_pipeline pipeline RoBertaForSequenceClassification from Irsik +author: John Snow Labs +name: roberta_base_hb_classifier_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_hb_classifier_pipeline` is a English model originally trained by Irsik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_hb_classifier_pipeline_en_5.5.0_3.0_1726850614389.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_hb_classifier_pipeline_en_5.5.0_3.0_1726850614389.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_base_hb_classifier_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df is assumed to be a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("roberta_base_hb_classifier_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_hb_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|447.5 MB| + +## References + +https://huggingface.co/Irsik/roberta-base-hb-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_legal_indian_courts_downstream_build_rr_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_legal_indian_courts_downstream_build_rr_en.md new file mode 100644 index 00000000000000..364e28bfa9ab3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_legal_indian_courts_downstream_build_rr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_legal_indian_courts_downstream_build_rr RoBertaForTokenClassification from MHGanainy +author: John Snow Labs +name: roberta_base_legal_indian_courts_downstream_build_rr +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_legal_indian_courts_downstream_build_rr` is a English model originally trained by MHGanainy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_legal_indian_courts_downstream_build_rr_en_5.5.0_3.0_1726862684883.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_legal_indian_courts_downstream_build_rr_en_5.5.0_3.0_1726862684883.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_legal_indian_courts_downstream_build_rr","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_legal_indian_courts_downstream_build_rr", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
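+
+The `ner` column holds one tag per token. If whole entity chunks are needed, a `NerConverter` stage can be appended to the same pipeline (a sketch that reuses the objects from the Python example and assumes the model emits IOB-style tags):
+
+```python
+from sparknlp.annotator import NerConverter
+
+converter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, converter])
+pipeline.fit(data).transform(data).select("ner_chunk.result").show(truncate=False)
+```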
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_legal_indian_courts_downstream_build_rr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|464.2 MB| + +## References + +https://huggingface.co/MHGanainy/roberta-base-legal-indian-courts-downstream-build_rr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_legal_indian_courts_downstream_build_rr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_legal_indian_courts_downstream_build_rr_pipeline_en.md new file mode 100644 index 00000000000000..d0828a5900afd6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_legal_indian_courts_downstream_build_rr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_legal_indian_courts_downstream_build_rr_pipeline pipeline RoBertaForTokenClassification from MHGanainy +author: John Snow Labs +name: roberta_base_legal_indian_courts_downstream_build_rr_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_legal_indian_courts_downstream_build_rr_pipeline` is a English model originally trained by MHGanainy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_legal_indian_courts_downstream_build_rr_pipeline_en_5.5.0_3.0_1726862707099.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_legal_indian_courts_downstream_build_rr_pipeline_en_5.5.0_3.0_1726862707099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_base_legal_indian_courts_downstream_build_rr_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df is assumed to be a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("roberta_base_legal_indian_courts_downstream_build_rr_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_legal_indian_courts_downstream_build_rr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.2 MB| + +## References + +https://huggingface.co/MHGanainy/roberta-base-legal-indian-courts-downstream-build_rr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_demo_dolgorsureng_mn.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_demo_dolgorsureng_mn.md new file mode 100644 index 00000000000000..7f8e4898474d23 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_demo_dolgorsureng_mn.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Mongolian roberta_base_ner_demo_dolgorsureng RoBertaForTokenClassification from Dolgorsureng +author: John Snow Labs +name: roberta_base_ner_demo_dolgorsureng +date: 2024-09-20 +tags: [mn, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_ner_demo_dolgorsureng` is a Mongolian model originally trained by Dolgorsureng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ner_demo_dolgorsureng_mn_5.5.0_3.0_1726847260202.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ner_demo_dolgorsureng_mn_5.5.0_3.0_1726847260202.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_ner_demo_dolgorsureng","mn") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_ner_demo_dolgorsureng", "mn")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ner_demo_dolgorsureng| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|mn| +|Size:|465.7 MB| + +## References + +https://huggingface.co/Dolgorsureng/roberta-base-ner-demo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_demo_dolgorsureng_pipeline_mn.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_demo_dolgorsureng_pipeline_mn.md new file mode 100644 index 00000000000000..963d11d7dff91f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_demo_dolgorsureng_pipeline_mn.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Mongolian roberta_base_ner_demo_dolgorsureng_pipeline pipeline RoBertaForTokenClassification from Dolgorsureng +author: John Snow Labs +name: roberta_base_ner_demo_dolgorsureng_pipeline +date: 2024-09-20 +tags: [mn, open_source, pipeline, onnx] +task: Named Entity Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_ner_demo_dolgorsureng_pipeline` is a Mongolian model originally trained by Dolgorsureng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ner_demo_dolgorsureng_pipeline_mn_5.5.0_3.0_1726847282840.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ner_demo_dolgorsureng_pipeline_mn_5.5.0_3.0_1726847282840.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_base_ner_demo_dolgorsureng_pipeline", lang = "mn")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df is assumed to be a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("roberta_base_ner_demo_dolgorsureng_pipeline", lang = "mn")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ner_demo_dolgorsureng_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mn| +|Size:|465.7 MB| + +## References + +https://huggingface.co/Dolgorsureng/roberta-base-ner-demo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_demo_enhjino_mn.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_demo_enhjino_mn.md new file mode 100644 index 00000000000000..007e74bedc36a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_demo_enhjino_mn.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Mongolian roberta_base_ner_demo_enhjino RoBertaForTokenClassification from enhjino +author: John Snow Labs +name: roberta_base_ner_demo_enhjino +date: 2024-09-20 +tags: [mn, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_ner_demo_enhjino` is a Mongolian model originally trained by enhjino. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ner_demo_enhjino_mn_5.5.0_3.0_1726853248382.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ner_demo_enhjino_mn_5.5.0_3.0_1726853248382.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_ner_demo_enhjino","mn") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_ner_demo_enhjino", "mn")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ner_demo_enhjino| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|mn| +|Size:|465.7 MB| + +## References + +https://huggingface.co/enhjino/roberta-base-ner-demo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_demo_enhjino_pipeline_mn.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_demo_enhjino_pipeline_mn.md new file mode 100644 index 00000000000000..ad78ca0ead3e1a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_demo_enhjino_pipeline_mn.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Mongolian roberta_base_ner_demo_enhjino_pipeline pipeline RoBertaForTokenClassification from enhjino +author: John Snow Labs +name: roberta_base_ner_demo_enhjino_pipeline +date: 2024-09-20 +tags: [mn, open_source, pipeline, onnx] +task: Named Entity Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_ner_demo_enhjino_pipeline` is a Mongolian model originally trained by enhjino. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ner_demo_enhjino_pipeline_mn_5.5.0_3.0_1726853272438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ner_demo_enhjino_pipeline_mn_5.5.0_3.0_1726853272438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_base_ner_demo_enhjino_pipeline", lang = "mn")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df is assumed to be a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("roberta_base_ner_demo_enhjino_pipeline", lang = "mn")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ner_demo_enhjino_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mn| +|Size:|465.7 MB| + +## References + +https://huggingface.co/enhjino/roberta-base-ner-demo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_test_mn.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_test_mn.md new file mode 100644 index 00000000000000..45e7ea9c4fe868 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_test_mn.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Mongolian roberta_base_ner_test RoBertaForTokenClassification from Dondog +author: John Snow Labs +name: roberta_base_ner_test +date: 2024-09-20 +tags: [mn, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_ner_test` is a Mongolian model originally trained by Dondog. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ner_test_mn_5.5.0_3.0_1726853635725.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ner_test_mn_5.5.0_3.0_1726853635725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_ner_test","mn") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_ner_test", "mn")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ner_test| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|mn| +|Size:|465.7 MB| + +## References + +https://huggingface.co/Dondog/roberta-base-ner-test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_test_pipeline_mn.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_test_pipeline_mn.md new file mode 100644 index 00000000000000..cd38586ae9cec0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_test_pipeline_mn.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Mongolian roberta_base_ner_test_pipeline pipeline RoBertaForTokenClassification from Dondog +author: John Snow Labs +name: roberta_base_ner_test_pipeline +date: 2024-09-20 +tags: [mn, open_source, pipeline, onnx] +task: Named Entity Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_ner_test_pipeline` is a Mongolian model originally trained by Dondog. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ner_test_pipeline_mn_5.5.0_3.0_1726853660360.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ner_test_pipeline_mn_5.5.0_3.0_1726853660360.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_base_ner_test_pipeline", lang = "mn")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df is assumed to be a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("roberta_base_ner_test_pipeline", lang = "mn")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ner_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mn| +|Size:|465.7 MB| + +## References + +https://huggingface.co/Dondog/roberta-base-ner-test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_polyglotner_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_polyglotner_en.md new file mode 100644 index 00000000000000..c3df9a8b168ab3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_polyglotner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_polyglotner RoBertaForTokenClassification from CheccoCando +author: John Snow Labs +name: roberta_base_polyglotner +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_polyglotner` is a English model originally trained by CheccoCando. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_polyglotner_en_5.5.0_3.0_1726862634645.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_polyglotner_en_5.5.0_3.0_1726862634645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_polyglotner","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_polyglotner", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_polyglotner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|449.1 MB| + +## References + +https://huggingface.co/CheccoCando/roberta-base_PolyglotNER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_polyglotner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_polyglotner_pipeline_en.md new file mode 100644 index 00000000000000..4a4b5fce722a22 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_polyglotner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_polyglotner_pipeline pipeline RoBertaForTokenClassification from CheccoCando +author: John Snow Labs +name: roberta_base_polyglotner_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_polyglotner_pipeline` is a English model originally trained by CheccoCando. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_polyglotner_pipeline_en_5.5.0_3.0_1726862658860.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_polyglotner_pipeline_en_5.5.0_3.0_1726862658860.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_base_polyglotner_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df is assumed to be a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("roberta_base_polyglotner_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_polyglotner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|449.2 MB| + +## References + +https://huggingface.co/CheccoCando/roberta-base_PolyglotNER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_reduced_upper_pattern_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_reduced_upper_pattern_pipeline_en.md new file mode 100644 index 00000000000000..cfcf04247a9853 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_reduced_upper_pattern_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_reduced_upper_pattern_pipeline pipeline RoBertaForSequenceClassification from Cournane +author: John Snow Labs +name: roberta_base_reduced_upper_pattern_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_reduced_upper_pattern_pipeline` is a English model originally trained by Cournane. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_reduced_upper_pattern_pipeline_en_5.5.0_3.0_1726805204737.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_reduced_upper_pattern_pipeline_en_5.5.0_3.0_1726805204737.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_base_reduced_upper_pattern_pipeline", lang = "en")

# any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_base_reduced_upper_pattern_pipeline", lang = "en")

// any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_reduced_upper_pattern_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|427.2 MB| + +## References + +https://huggingface.co/Cournane/roberta-base-reduced-Upper_pattern + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_sst2_modeltc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_sst2_modeltc_pipeline_en.md new file mode 100644 index 00000000000000..86930cdd4eebc1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_sst2_modeltc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_sst2_modeltc_pipeline pipeline RoBertaForSequenceClassification from ModelTC +author: John Snow Labs +name: roberta_base_sst2_modeltc_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_sst2_modeltc_pipeline` is a English model originally trained by ModelTC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_sst2_modeltc_pipeline_en_5.5.0_3.0_1726851746124.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_sst2_modeltc_pipeline_en_5.5.0_3.0_1726851746124.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_base_sst2_modeltc_pipeline", lang = "en")

# any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_base_sst2_modeltc_pipeline", lang = "en")

// any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_sst2_modeltc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|445.6 MB| + +## References + +https://huggingface.co/ModelTC/roberta-base-sst2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_baseline_finetuned_atis_3pct_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_baseline_finetuned_atis_3pct_v2_pipeline_en.md new file mode 100644 index 00000000000000..03601a9a181a67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_baseline_finetuned_atis_3pct_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_baseline_finetuned_atis_3pct_v2_pipeline pipeline RoBertaForSequenceClassification from benayas +author: John Snow Labs +name: roberta_baseline_finetuned_atis_3pct_v2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_baseline_finetuned_atis_3pct_v2_pipeline` is a English model originally trained by benayas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_baseline_finetuned_atis_3pct_v2_pipeline_en_5.5.0_3.0_1726851858107.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_baseline_finetuned_atis_3pct_v2_pipeline_en_5.5.0_3.0_1726851858107.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_baseline_finetuned_atis_3pct_v2_pipeline", lang = "en")

# any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_baseline_finetuned_atis_3pct_v2_pipeline", lang = "en")

// any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_baseline_finetuned_atis_3pct_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|416.8 MB| + +## References + +https://huggingface.co/benayas/roberta-baseline-finetuned-atis_3pct_v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_bert_10_good_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_bert_10_good_en.md new file mode 100644 index 00000000000000..da1e7eaa7f0856 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_bert_10_good_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_bert_10_good RoBertaEmbeddings from ubaskota +author: John Snow Labs +name: roberta_bert_10_good +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_bert_10_good` is a English model originally trained by ubaskota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_bert_10_good_en_5.5.0_3.0_1726857076155.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_bert_10_good_en_5.5.0_3.0_1726857076155.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("roberta_bert_10_good","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("roberta_bert_10_good","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
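
If the token embeddings need to feed Spark ML stages downstream, an `EmbeddingsFinisher` can flatten the annotations into plain vectors. This is a minimal sketch, assuming `pipelineDF` and the `embeddings` output column from the example above:

```python
from sparknlp.base import EmbeddingsFinisher

# convert Spark NLP embedding annotations into Spark ML vectors, one per token
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

finisher.transform(pipelineDF) \
    .selectExpr("explode(finished_embeddings) as embedding") \
    .show(truncate=False)
```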
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_bert_10_good| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/ubaskota/roberta_BERT_10_good \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_bert_10_good_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_bert_10_good_pipeline_en.md new file mode 100644 index 00000000000000..ebb70f00faabd3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_bert_10_good_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_bert_10_good_pipeline pipeline RoBertaEmbeddings from ubaskota +author: John Snow Labs +name: roberta_bert_10_good_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_bert_10_good_pipeline` is a English model originally trained by ubaskota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_bert_10_good_pipeline_en_5.5.0_3.0_1726857105973.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_bert_10_good_pipeline_en_5.5.0_3.0_1726857105973.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_bert_10_good_pipeline", lang = "en")

# any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_bert_10_good_pipeline", lang = "en")

// any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_bert_10_good_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/ubaskota/roberta_BERT_10_good + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_codesearchnet_nepal_bhasa_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_codesearchnet_nepal_bhasa_en.md new file mode 100644 index 00000000000000..ee819e0eb7022f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_codesearchnet_nepal_bhasa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_codesearchnet_nepal_bhasa RoBertaEmbeddings from shradha01 +author: John Snow Labs +name: roberta_codesearchnet_nepal_bhasa +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_codesearchnet_nepal_bhasa` is a English model originally trained by shradha01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_codesearchnet_nepal_bhasa_en_5.5.0_3.0_1726816311900.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_codesearchnet_nepal_bhasa_en_5.5.0_3.0_1726816311900.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("roberta_codesearchnet_nepal_bhasa","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("roberta_codesearchnet_nepal_bhasa","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_codesearchnet_nepal_bhasa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/shradha01/roberta_codesearchnet_new \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_combined_generated_epoch_4_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_combined_generated_epoch_4_en.md new file mode 100644 index 00000000000000..dfd6d63a0be28c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_combined_generated_epoch_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_combined_generated_epoch_4 RoBertaForTokenClassification from ICT2214Team7 +author: John Snow Labs +name: roberta_combined_generated_epoch_4 +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_combined_generated_epoch_4` is a English model originally trained by ICT2214Team7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_combined_generated_epoch_4_en_5.5.0_3.0_1726853119606.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_combined_generated_epoch_4_en_5.5.0_3.0_1726853119606.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier consumes the document and token annotations produced above
tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_combined_generated_epoch_4","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_combined_generated_epoch_4", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
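
To inspect the predictions, the token texts and their predicted tags can be read straight from the output columns. A minimal sketch, assuming `pipelineDF` from the example above:

```python
# one array of tokens and one array of NER tags per input row
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```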
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_combined_generated_epoch_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/ICT2214Team7/RoBERTa_Combined_Generated_epoch_4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_combined_generated_epoch_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_combined_generated_epoch_4_pipeline_en.md new file mode 100644 index 00000000000000..a74f2dd816f160 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_combined_generated_epoch_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_combined_generated_epoch_4_pipeline pipeline RoBertaForTokenClassification from ICT2214Team7 +author: John Snow Labs +name: roberta_combined_generated_epoch_4_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_combined_generated_epoch_4_pipeline` is a English model originally trained by ICT2214Team7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_combined_generated_epoch_4_pipeline_en_5.5.0_3.0_1726853134794.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_combined_generated_epoch_4_pipeline_en_5.5.0_3.0_1726853134794.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_combined_generated_epoch_4_pipeline", lang = "en")

# any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_combined_generated_epoch_4_pipeline", lang = "en")

// any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_combined_generated_epoch_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/ICT2214Team7/RoBERTa_Combined_Generated_epoch_4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_empai_finetuned_def_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_empai_finetuned_def_en.md new file mode 100644 index 00000000000000..5c5107518a9065 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_empai_finetuned_def_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_empai_finetuned_def RoBertaEmbeddings from LuangMV97 +author: John Snow Labs +name: roberta_empai_finetuned_def +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_empai_finetuned_def` is a English model originally trained by LuangMV97. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_empai_finetuned_def_en_5.5.0_3.0_1726796193487.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_empai_finetuned_def_en_5.5.0_3.0_1726796193487.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("roberta_empai_finetuned_def","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("roberta_empai_finetuned_def","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_empai_finetuned_def| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/LuangMV97/RoBERTa_EmpAI_FineTuned_def \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_ic_pborchert_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_ic_pborchert_en.md new file mode 100644 index 00000000000000..e2fd3af123d57d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_ic_pborchert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_ic_pborchert RoBertaEmbeddings from pborchert +author: John Snow Labs +name: roberta_ic_pborchert +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_ic_pborchert` is a English model originally trained by pborchert. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_ic_pborchert_en_5.5.0_3.0_1726857453043.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_ic_pborchert_en_5.5.0_3.0_1726857453043.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("roberta_ic_pborchert","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("roberta_ic_pborchert","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
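
Once fitted, the pipeline can be persisted and reloaded like any Spark ML model, which avoids re-downloading the embeddings on the next run. A sketch using a hypothetical local path:

```python
from pyspark.ml import PipelineModel

# the path below is only an example location
pipelineModel.write().overwrite().save("/tmp/roberta_ic_pborchert_pipeline")
reloaded = PipelineModel.load("/tmp/roberta_ic_pborchert_pipeline")
reloaded.transform(data).select("embeddings.embeddings").show(1)
```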
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_ic_pborchert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/pborchert/roberta-ic \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_ic_pborchert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_ic_pborchert_pipeline_en.md new file mode 100644 index 00000000000000..4b79f3e70ad7b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_ic_pborchert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_ic_pborchert_pipeline pipeline RoBertaEmbeddings from pborchert +author: John Snow Labs +name: roberta_ic_pborchert_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_ic_pborchert_pipeline` is a English model originally trained by pborchert. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_ic_pborchert_pipeline_en_5.5.0_3.0_1726857475407.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_ic_pborchert_pipeline_en_5.5.0_3.0_1726857475407.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_ic_pborchert_pipeline", lang = "en")

# any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_ic_pborchert_pipeline", lang = "en")

// any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_ic_pborchert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/pborchert/roberta-ic + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_bne_capitel_ner_spanish_es.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_bne_capitel_ner_spanish_es.md new file mode 100644 index 00000000000000..f27e855987556d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_bne_capitel_ner_spanish_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish roberta_large_bne_capitel_ner_spanish RoBertaForTokenClassification from Dulfary +author: John Snow Labs +name: roberta_large_bne_capitel_ner_spanish +date: 2024-09-20 +tags: [es, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_bne_capitel_ner_spanish` is a Castilian, Spanish model originally trained by Dulfary. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_bne_capitel_ner_spanish_es_5.5.0_3.0_1726862445897.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_bne_capitel_ner_spanish_es_5.5.0_3.0_1726862445897.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier consumes the document and token annotations produced above
tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_bne_capitel_ner_spanish","es") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_bne_capitel_ner_spanish", "es")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_bne_capitel_ner_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Dulfary/roberta-large-bne-capitel-ner_spanish \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_bne_capitel_ner_spanish_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_bne_capitel_ner_spanish_pipeline_es.md new file mode 100644 index 00000000000000..4ce2ac7adb6115 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_bne_capitel_ner_spanish_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish roberta_large_bne_capitel_ner_spanish_pipeline pipeline RoBertaForTokenClassification from Dulfary +author: John Snow Labs +name: roberta_large_bne_capitel_ner_spanish_pipeline +date: 2024-09-20 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_bne_capitel_ner_spanish_pipeline` is a Castilian, Spanish model originally trained by Dulfary. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_bne_capitel_ner_spanish_pipeline_es_5.5.0_3.0_1726862508529.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_bne_capitel_ner_spanish_pipeline_es_5.5.0_3.0_1726862508529.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_large_bne_capitel_ner_spanish_pipeline", lang = "es")

# any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["Me encanta Spark NLP"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_bne_capitel_ner_spanish_pipeline", lang = "es")

// any DataFrame with a "text" column can be used as input
val df = Seq("Me encanta Spark NLP").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_bne_capitel_ner_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Dulfary/roberta-large-bne-capitel-ner_spanish + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_detect_dep_v3_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_detect_dep_v3_en.md new file mode 100644 index 00000000000000..51aadbe9446eb2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_detect_dep_v3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_detect_dep_v3 RoBertaForSequenceClassification from Trong-Nghia +author: John Snow Labs +name: roberta_large_detect_dep_v3 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_detect_dep_v3` is a English model originally trained by Trong-Nghia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_detect_dep_v3_en_5.5.0_3.0_1726851948037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_detect_dep_v3_en_5.5.0_3.0_1726851948037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier consumes the document and token annotations produced above
sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_detect_dep_v3","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_detect_dep_v3", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
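
The predicted label for each input row is available in the `class` output column. A minimal sketch, assuming `pipelineDF` from the example above:

```python
# one predicted label per input text
pipelineDF.select("text", "class.result").show(truncate=False)
```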
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_detect_dep_v3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Trong-Nghia/roberta-large-detect-dep-v3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_detect_dep_v3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_detect_dep_v3_pipeline_en.md new file mode 100644 index 00000000000000..9a4dc293170d32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_detect_dep_v3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_detect_dep_v3_pipeline pipeline RoBertaForSequenceClassification from Trong-Nghia +author: John Snow Labs +name: roberta_large_detect_dep_v3_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_detect_dep_v3_pipeline` is a English model originally trained by Trong-Nghia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_detect_dep_v3_pipeline_en_5.5.0_3.0_1726852034037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_detect_dep_v3_pipeline_en_5.5.0_3.0_1726852034037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_large_detect_dep_v3_pipeline", lang = "en")

# any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_detect_dep_v3_pipeline", lang = "en")

// any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_detect_dep_v3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Trong-Nghia/roberta-large-detect-dep-v3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_finetuned_abbr_epoch18_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_finetuned_abbr_epoch18_en.md new file mode 100644 index 00000000000000..a0ef6e224530f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_finetuned_abbr_epoch18_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_finetuned_abbr_epoch18 RoBertaForTokenClassification from karsimkh +author: John Snow Labs +name: roberta_large_finetuned_abbr_epoch18 +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_abbr_epoch18` is a English model originally trained by karsimkh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_abbr_epoch18_en_5.5.0_3.0_1726847423233.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_abbr_epoch18_en_5.5.0_3.0_1726847423233.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier consumes the document and token annotations produced above
tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_finetuned_abbr_epoch18","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_finetuned_abbr_epoch18", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_abbr_epoch18| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/karsimkh/roberta-large-finetuned-abbr-Epoch18 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_finetuned_abbr_epoch18_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_finetuned_abbr_epoch18_pipeline_en.md new file mode 100644 index 00000000000000..a8729febdaa89c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_finetuned_abbr_epoch18_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_finetuned_abbr_epoch18_pipeline pipeline RoBertaForTokenClassification from karsimkh +author: John Snow Labs +name: roberta_large_finetuned_abbr_epoch18_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_abbr_epoch18_pipeline` is a English model originally trained by karsimkh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_abbr_epoch18_pipeline_en_5.5.0_3.0_1726847490570.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_abbr_epoch18_pipeline_en_5.5.0_3.0_1726847490570.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_large_finetuned_abbr_epoch18_pipeline", lang = "en")

# any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_finetuned_abbr_epoch18_pipeline", lang = "en")

// any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_abbr_epoch18_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/karsimkh/roberta-large-finetuned-abbr-Epoch18 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_finetuned_abbr_weightdecay0_1_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_finetuned_abbr_weightdecay0_1_en.md new file mode 100644 index 00000000000000..8b75dea75a4e9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_finetuned_abbr_weightdecay0_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_finetuned_abbr_weightdecay0_1 RoBertaForTokenClassification from karsimkh +author: John Snow Labs +name: roberta_large_finetuned_abbr_weightdecay0_1 +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_abbr_weightdecay0_1` is a English model originally trained by karsimkh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_abbr_weightdecay0_1_en_5.5.0_3.0_1726853558700.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_abbr_weightdecay0_1_en_5.5.0_3.0_1726853558700.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier consumes the document and token annotations produced above
tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_finetuned_abbr_weightdecay0_1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_finetuned_abbr_weightdecay0_1", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_abbr_weightdecay0_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/karsimkh/roberta-large-finetuned-abbr-WeightDecay0.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_finetuned_abbr_weightdecay0_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_finetuned_abbr_weightdecay0_1_pipeline_en.md new file mode 100644 index 00000000000000..bc9661cc5c8ec6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_finetuned_abbr_weightdecay0_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_finetuned_abbr_weightdecay0_1_pipeline pipeline RoBertaForTokenClassification from karsimkh +author: John Snow Labs +name: roberta_large_finetuned_abbr_weightdecay0_1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_abbr_weightdecay0_1_pipeline` is a English model originally trained by karsimkh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_abbr_weightdecay0_1_pipeline_en_5.5.0_3.0_1726853627083.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_abbr_weightdecay0_1_pipeline_en_5.5.0_3.0_1726853627083.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_large_finetuned_abbr_weightdecay0_1_pipeline", lang = "en")

# any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_finetuned_abbr_weightdecay0_1_pipeline", lang = "en")

// any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_abbr_weightdecay0_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/karsimkh/roberta-large-finetuned-abbr-WeightDecay0.1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_finetuned_combined_ds_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_finetuned_combined_ds_pipeline_en.md new file mode 100644 index 00000000000000..93c2347ee6b43c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_finetuned_combined_ds_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_finetuned_combined_ds_pipeline pipeline RoBertaForSequenceClassification from IIIT-L +author: John Snow Labs +name: roberta_large_finetuned_combined_ds_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_combined_ds_pipeline` is a English model originally trained by IIIT-L. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_combined_ds_pipeline_en_5.5.0_3.0_1726850825162.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_combined_ds_pipeline_en_5.5.0_3.0_1726850825162.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_large_finetuned_combined_ds_pipeline", lang = "en")

# any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_finetuned_combined_ds_pipeline", lang = "en")

// any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_combined_ds_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/IIIT-L/roberta-large-finetuned-combined-DS + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_lora_2_63m_snli_model1_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_lora_2_63m_snli_model1_en.md new file mode 100644 index 00000000000000..72d8d9cb25af9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_lora_2_63m_snli_model1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_lora_2_63m_snli_model1 RoBertaForSequenceClassification from varun-v-rao +author: John Snow Labs +name: roberta_large_lora_2_63m_snli_model1 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_lora_2_63m_snli_model1` is a English model originally trained by varun-v-rao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_lora_2_63m_snli_model1_en_5.5.0_3.0_1726804895956.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_lora_2_63m_snli_model1_en_5.5.0_3.0_1726804895956.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier consumes the document and token annotations produced above
sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_lora_2_63m_snli_model1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_lora_2_63m_snli_model1", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_lora_2_63m_snli_model1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|915.0 MB| + +## References + +https://huggingface.co/varun-v-rao/roberta-large-lora-2.63M-snli-model1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_ncbi_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_ncbi_en.md new file mode 100644 index 00000000000000..cf2002564f0bbb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_ncbi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_ncbi RoBertaForTokenClassification from CheccoCando +author: John Snow Labs +name: roberta_large_ncbi +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_ncbi` is a English model originally trained by CheccoCando. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_ncbi_en_5.5.0_3.0_1726862320259.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_ncbi_en_5.5.0_3.0_1726862320259.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Input columns must match the assembler ("document") and tokenizer ("token") outputs
tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_ncbi","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_ncbi", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
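
To line tokens up with their predicted tags, the `token` and `ner` annotation columns can be selected side by side. A minimal sketch, assuming the `pipelineDF` DataFrame produced above:

```python
# Minimal sketch, assuming `pipelineDF` from the example above.
# Each row holds the token sequence and the matching tag sequence.
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```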
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_ncbi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/CheccoCando/roberta-large_ncbi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_ontonotes_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_ontonotes_en.md new file mode 100644 index 00000000000000..7063fa5878cb7b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_ontonotes_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_ontonotes RoBertaForTokenClassification from CheccoCando +author: John Snow Labs +name: roberta_large_ontonotes +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_ontonotes` is a English model originally trained by CheccoCando. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_ontonotes_en_5.5.0_3.0_1726862878309.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_ontonotes_en_5.5.0_3.0_1726862878309.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_ontonotes","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_ontonotes", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_ontonotes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/CheccoCando/roberta-large_Ontonotes \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_ontonotes_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_ontonotes_pipeline_en.md new file mode 100644 index 00000000000000..ce1681ad7430cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_ontonotes_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_ontonotes_pipeline pipeline RoBertaForTokenClassification from CheccoCando +author: John Snow Labs +name: roberta_large_ontonotes_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_ontonotes_pipeline` is a English model originally trained by CheccoCando. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_ontonotes_pipeline_en_5.5.0_3.0_1726862944918.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_ontonotes_pipeline_en_5.5.0_3.0_1726862944918.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_large_ontonotes_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_ontonotes_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
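
For quick checks, a pretrained pipeline can also be queried without building a DataFrame. A minimal sketch using `annotate`, assuming the `pipeline` loaded above and that its output columns are named `token` and `ner`:

```python
# Minimal sketch, assuming `pipeline` from the example above.
# `annotate` returns a dict keyed by the pipeline's output column names
# (assumed here to include "token" and "ner").
result = pipeline.annotate("John Snow Labs is based in Delaware.")
print(list(zip(result["token"], result["ner"])))
```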
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_ontonotes_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/CheccoCando/roberta-large_Ontonotes + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_polyglotner_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_polyglotner_en.md new file mode 100644 index 00000000000000..c36c700bde608e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_polyglotner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_polyglotner RoBertaForTokenClassification from CheccoCando +author: John Snow Labs +name: roberta_large_polyglotner +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_polyglotner` is a English model originally trained by CheccoCando. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_polyglotner_en_5.5.0_3.0_1726862590690.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_polyglotner_en_5.5.0_3.0_1726862590690.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_polyglotner","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_polyglotner", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_polyglotner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/CheccoCando/roberta-large_PolyglotNER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_polyglotner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_polyglotner_pipeline_en.md new file mode 100644 index 00000000000000..808c67d7fc7121 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_polyglotner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_polyglotner_pipeline pipeline RoBertaForTokenClassification from CheccoCando +author: John Snow Labs +name: roberta_large_polyglotner_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_polyglotner_pipeline` is a English model originally trained by CheccoCando. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_polyglotner_pipeline_en_5.5.0_3.0_1726862656273.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_polyglotner_pipeline_en_5.5.0_3.0_1726862656273.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_large_polyglotner_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_polyglotner_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_polyglotner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/CheccoCando/roberta-large_PolyglotNER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_en.md new file mode 100644 index 00000000000000..660b7c07b634e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample RoBertaEmbeddings from HPL +author: John Snow Labs +name: roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample` is a English model originally trained by HPL. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_en_5.5.0_3.0_1726857595167.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_en_5.5.0_3.0_1726857595167.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
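
Each token's embedding vector is stored in the `embeddings` field of the `embeddings` annotation column. A minimal sketch for pulling the raw vectors out, assuming the `pipelineDF` DataFrame produced above:

```python
# Minimal sketch, assuming `pipelineDF` from the example above.
# Explodes the per-token embedding arrays into one vector per row.
pipelineDF.selectExpr("explode(embeddings.embeddings) as token_embedding").show(truncate=80)
```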
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/HPL/roberta-large-unlabeled-labeled-gab-reddit-task-semeval2023-t10-270000sample \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_pipeline_en.md new file mode 100644 index 00000000000000..07682d2b42f962 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_pipeline pipeline RoBertaEmbeddings from HPL +author: John Snow Labs +name: roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_pipeline` is a English model originally trained by HPL. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_pipeline_en_5.5.0_3.0_1726857658972.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_pipeline_en_5.5.0_3.0_1726857658972.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/HPL/roberta-large-unlabeled-labeled-gab-reddit-task-semeval2023-t10-270000sample + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_with_labeled_data_and_unlabeled_gab_reddit_semeval2023_task10_13300_labeled_sample_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_with_labeled_data_and_unlabeled_gab_reddit_semeval2023_task10_13300_labeled_sample_pipeline_en.md new file mode 100644 index 00000000000000..65a9bba748aada --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_with_labeled_data_and_unlabeled_gab_reddit_semeval2023_task10_13300_labeled_sample_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_with_labeled_data_and_unlabeled_gab_reddit_semeval2023_task10_13300_labeled_sample_pipeline pipeline RoBertaEmbeddings from Hudee +author: John Snow Labs +name: roberta_large_with_labeled_data_and_unlabeled_gab_reddit_semeval2023_task10_13300_labeled_sample_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_with_labeled_data_and_unlabeled_gab_reddit_semeval2023_task10_13300_labeled_sample_pipeline` is a English model originally trained by Hudee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_with_labeled_data_and_unlabeled_gab_reddit_semeval2023_task10_13300_labeled_sample_pipeline_en_5.5.0_3.0_1726793875961.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_with_labeled_data_and_unlabeled_gab_reddit_semeval2023_task10_13300_labeled_sample_pipeline_en_5.5.0_3.0_1726793875961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_large_with_labeled_data_and_unlabeled_gab_reddit_semeval2023_task10_13300_labeled_sample_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_with_labeled_data_and_unlabeled_gab_reddit_semeval2023_task10_13300_labeled_sample_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_with_labeled_data_and_unlabeled_gab_reddit_semeval2023_task10_13300_labeled_sample_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Hudee/roberta-large-with-labeled-data-and-unlabeled-gab-reddit-semeval2023-task10-13300-labeled-sample + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_link_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_link_en.md new file mode 100644 index 00000000000000..488eef0111bb96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_link_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_link RoBertaForTokenClassification from chanwoopark +author: John Snow Labs +name: roberta_link +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_link` is a English model originally trained by chanwoopark. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_link_en_5.5.0_3.0_1726846890436.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_link_en_5.5.0_3.0_1726846890436.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_link","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_link", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_link| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|462.4 MB| + +## References + +https://huggingface.co/chanwoopark/roberta-link \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_link_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_link_pipeline_en.md new file mode 100644 index 00000000000000..4840352e3d87f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_link_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_link_pipeline pipeline RoBertaForTokenClassification from chanwoopark +author: John Snow Labs +name: roberta_link_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_link_pipeline` is a English model originally trained by chanwoopark. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_link_pipeline_en_5.5.0_3.0_1726846913970.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_link_pipeline_en_5.5.0_3.0_1726846913970.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_link_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_link_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_link_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|462.5 MB| + +## References + +https://huggingface.co/chanwoopark/roberta-link + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_medium_word_chinese_cluecorpussmall_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_medium_word_chinese_cluecorpussmall_pipeline_zh.md new file mode 100644 index 00000000000000..c94582effa34b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_medium_word_chinese_cluecorpussmall_pipeline_zh.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Chinese roberta_medium_word_chinese_cluecorpussmall_pipeline pipeline BertEmbeddings from uer +author: John Snow Labs +name: roberta_medium_word_chinese_cluecorpussmall_pipeline +date: 2024-09-20 +tags: [zh, open_source, pipeline, onnx] +task: Embeddings +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_medium_word_chinese_cluecorpussmall_pipeline` is a Chinese model originally trained by uer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_medium_word_chinese_cluecorpussmall_pipeline_zh_5.5.0_3.0_1726806326404.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_medium_word_chinese_cluecorpussmall_pipeline_zh_5.5.0_3.0_1726806326404.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_medium_word_chinese_cluecorpussmall_pipeline", lang = "zh")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_medium_word_chinese_cluecorpussmall_pipeline", lang = "zh")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_medium_word_chinese_cluecorpussmall_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|287.4 MB| + +## References + +https://huggingface.co/uer/roberta-medium-word-chinese-cluecorpussmall + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_medium_word_chinese_cluecorpussmall_zh.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_medium_word_chinese_cluecorpussmall_zh.md new file mode 100644 index 00000000000000..50fc5454650d99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_medium_word_chinese_cluecorpussmall_zh.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Chinese roberta_medium_word_chinese_cluecorpussmall BertEmbeddings from uer +author: John Snow Labs +name: roberta_medium_word_chinese_cluecorpussmall +date: 2024-09-20 +tags: [zh, open_source, onnx, embeddings, bert] +task: Embeddings +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_medium_word_chinese_cluecorpussmall` is a Chinese model originally trained by uer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_medium_word_chinese_cluecorpussmall_zh_5.5.0_3.0_1726806311796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_medium_word_chinese_cluecorpussmall_zh_5.5.0_3.0_1726806311796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("roberta_medium_word_chinese_cluecorpussmall","zh") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("roberta_medium_word_chinese_cluecorpussmall","zh")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
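
If the embeddings need to feed a downstream Spark ML stage, they can be converted into Spark ML vectors with `EmbeddingsFinisher`. A minimal sketch, assuming the `pipelineDF` DataFrame produced above; the output column name `finished_embeddings` is illustrative:

```python
from sparknlp.base import EmbeddingsFinisher

# Minimal sketch, assuming `pipelineDF` from the example above.
# Converts annotation embeddings into Spark ML vectors for downstream stages.
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

finisher.transform(pipelineDF) \
    .selectExpr("explode(finished_embeddings) as vector") \
    .show(truncate=80)
```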
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_medium_word_chinese_cluecorpussmall| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|zh| +|Size:|287.4 MB| + +## References + +https://huggingface.co/uer/roberta-medium-word-chinese-cluecorpussmall \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_ner_devanshrj_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_ner_devanshrj_pipeline_en.md new file mode 100644 index 00000000000000..a0506d2c0f06fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_ner_devanshrj_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_ner_devanshrj_pipeline pipeline RoBertaForTokenClassification from devanshrj +author: John Snow Labs +name: roberta_ner_devanshrj_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_ner_devanshrj_pipeline` is a English model originally trained by devanshrj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_ner_devanshrj_pipeline_en_5.5.0_3.0_1726847651924.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_ner_devanshrj_pipeline_en_5.5.0_3.0_1726847651924.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_ner_devanshrj_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_ner_devanshrj_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_ner_devanshrj_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|428.2 MB| + +## References + +https://huggingface.co/devanshrj/roberta-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_nerc_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_nerc_en.md new file mode 100644 index 00000000000000..9921786de3b2f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_nerc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_nerc RoBertaForTokenClassification from pawlo2013 +author: John Snow Labs +name: roberta_nerc +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_nerc` is a English model originally trained by pawlo2013. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_nerc_en_5.5.0_3.0_1726853476626.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_nerc_en_5.5.0_3.0_1726853476626.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_nerc","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_nerc", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_nerc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|432.0 MB| + +## References + +https://huggingface.co/pawlo2013/roberta-nerc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_nerc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_nerc_pipeline_en.md new file mode 100644 index 00000000000000..48ac8d326e3123 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_nerc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_nerc_pipeline pipeline RoBertaForTokenClassification from pawlo2013 +author: John Snow Labs +name: roberta_nerc_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_nerc_pipeline` is a English model originally trained by pawlo2013. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_nerc_pipeline_en_5.5.0_3.0_1726853511540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_nerc_pipeline_en_5.5.0_3.0_1726853511540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_nerc_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_nerc_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_nerc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|432.0 MB| + +## References + +https://huggingface.co/pawlo2013/roberta-nerc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_on_movie_review_data_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_on_movie_review_data_en.md new file mode 100644 index 00000000000000..658ae5a82adf6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_on_movie_review_data_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_on_movie_review_data RoBertaForSequenceClassification from allevelly +author: John Snow Labs +name: roberta_on_movie_review_data +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_on_movie_review_data` is a English model originally trained by allevelly. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_on_movie_review_data_en_5.5.0_3.0_1726850273428.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_on_movie_review_data_en_5.5.0_3.0_1726850273428.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_on_movie_review_data","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_on_movie_review_data", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_on_movie_review_data| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|442.9 MB| + +## References + +https://huggingface.co/allevelly/roberta_on_Movie_Review_Data \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_on_movie_review_data_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_on_movie_review_data_pipeline_en.md new file mode 100644 index 00000000000000..109eefd900562b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_on_movie_review_data_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_on_movie_review_data_pipeline pipeline RoBertaForSequenceClassification from allevelly +author: John Snow Labs +name: roberta_on_movie_review_data_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_on_movie_review_data_pipeline` is a English model originally trained by allevelly. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_on_movie_review_data_pipeline_en_5.5.0_3.0_1726850304787.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_on_movie_review_data_pipeline_en_5.5.0_3.0_1726850304787.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_on_movie_review_data_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_on_movie_review_data_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_on_movie_review_data_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|442.9 MB| + +## References + +https://huggingface.co/allevelly/roberta_on_Movie_Review_Data + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_retrained_russian_covid_papers_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_retrained_russian_covid_papers_en.md new file mode 100644 index 00000000000000..7a946e14d4595c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_retrained_russian_covid_papers_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_retrained_russian_covid_papers RoBertaEmbeddings from Daryaflp +author: John Snow Labs +name: roberta_retrained_russian_covid_papers +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_retrained_russian_covid_papers` is a English model originally trained by Daryaflp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_retrained_russian_covid_papers_en_5.5.0_3.0_1726796737593.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_retrained_russian_covid_papers_en_5.5.0_3.0_1726796737593.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("roberta_retrained_russian_covid_papers","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("roberta_retrained_russian_covid_papers","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_retrained_russian_covid_papers| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|463.9 MB| + +## References + +https://huggingface.co/Daryaflp/roberta-retrained_ru_covid_papers \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_sayula_popoluca_tagging_amir01_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_sayula_popoluca_tagging_amir01_en.md new file mode 100644 index 00000000000000..4c4a5180c4f1c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_sayula_popoluca_tagging_amir01_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_sayula_popoluca_tagging_amir01 RoBertaForTokenClassification from Amir01 +author: John Snow Labs +name: roberta_sayula_popoluca_tagging_amir01 +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_sayula_popoluca_tagging_amir01` is a English model originally trained by Amir01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_sayula_popoluca_tagging_amir01_en_5.5.0_3.0_1726853246028.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_sayula_popoluca_tagging_amir01_en_5.5.0_3.0_1726853246028.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_sayula_popoluca_tagging_amir01","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_sayula_popoluca_tagging_amir01", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_sayula_popoluca_tagging_amir01| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|464.5 MB| + +## References + +https://huggingface.co/Amir01/roberta-pos-tagging \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_sayula_popoluca_tagging_amir01_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_sayula_popoluca_tagging_amir01_pipeline_en.md new file mode 100644 index 00000000000000..4fc134b6e1c070 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_sayula_popoluca_tagging_amir01_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_sayula_popoluca_tagging_amir01_pipeline pipeline RoBertaForTokenClassification from Amir01 +author: John Snow Labs +name: roberta_sayula_popoluca_tagging_amir01_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_sayula_popoluca_tagging_amir01_pipeline` is a English model originally trained by Amir01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_sayula_popoluca_tagging_amir01_pipeline_en_5.5.0_3.0_1726853272400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_sayula_popoluca_tagging_amir01_pipeline_en_5.5.0_3.0_1726853272400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_sayula_popoluca_tagging_amir01_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_sayula_popoluca_tagging_amir01_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_sayula_popoluca_tagging_amir01_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.5 MB| + +## References + +https://huggingface.co/Amir01/roberta-pos-tagging + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_small_word_chinese_cluecorpussmall_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_small_word_chinese_cluecorpussmall_pipeline_zh.md new file mode 100644 index 00000000000000..186f14ded1469e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_small_word_chinese_cluecorpussmall_pipeline_zh.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Chinese roberta_small_word_chinese_cluecorpussmall_pipeline pipeline BertEmbeddings from uer +author: John Snow Labs +name: roberta_small_word_chinese_cluecorpussmall_pipeline +date: 2024-09-20 +tags: [zh, open_source, pipeline, onnx] +task: Embeddings +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_small_word_chinese_cluecorpussmall_pipeline` is a Chinese model originally trained by uer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_small_word_chinese_cluecorpussmall_pipeline_zh_5.5.0_3.0_1726805984748.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_small_word_chinese_cluecorpussmall_pipeline_zh_5.5.0_3.0_1726805984748.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_small_word_chinese_cluecorpussmall_pipeline", lang = "zh")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_small_word_chinese_cluecorpussmall_pipeline", lang = "zh")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_small_word_chinese_cluecorpussmall_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|240.3 MB| + +## References + +https://huggingface.co/uer/roberta-small-word-chinese-cluecorpussmall + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_japanese_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_japanese_en.md new file mode 100644 index 00000000000000..f0404bc9094f39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_japanese_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_tagalog_base_ft_udpos213_japanese RoBertaForTokenClassification from iceman2434 +author: John Snow Labs +name: roberta_tagalog_base_ft_udpos213_japanese +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tagalog_base_ft_udpos213_japanese` is a English model originally trained by iceman2434. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_japanese_en_5.5.0_3.0_1726862140811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_japanese_en_5.5.0_3.0_1726862140811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Input columns must match the outputs of the stages above ("document", "token")
tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_tagalog_base_ft_udpos213_japanese","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_tagalog_base_ft_udpos213_japanese", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
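To inspect the predictions, the `ner` output column can be zipped with the tokens and exploded into one row per token. This is a usage sketch added for convenience; the tag set itself is not listed in this card:

```python
import pyspark.sql.functions as F

# One row per token with its predicted tag
pipelineDF.select(F.explode(F.arrays_zip(pipelineDF.token.result,
                                         pipelineDF.ner.result)).alias("cols")) \
    .select(F.expr("cols['0']").alias("token"),
            F.expr("cols['1']").alias("predicted_tag")) \
    .show(truncate=False)
```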
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tagalog_base_ft_udpos213_japanese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/iceman2434/roberta-tagalog-base-ft-udpos213-ja \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_japanese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_japanese_pipeline_en.md new file mode 100644 index 00000000000000..de2f5042610ae3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_japanese_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_tagalog_base_ft_udpos213_japanese_pipeline pipeline RoBertaForTokenClassification from iceman2434 +author: John Snow Labs +name: roberta_tagalog_base_ft_udpos213_japanese_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tagalog_base_ft_udpos213_japanese_pipeline` is a English model originally trained by iceman2434. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_japanese_pipeline_en_5.5.0_3.0_1726862159845.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_japanese_pipeline_en_5.5.0_3.0_1726862159845.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_tagalog_base_ft_udpos213_japanese_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_tagalog_base_ft_udpos213_japanese_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
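As in the other pretrained-pipeline cards, `df` stands for any DataFrame with a `text` column. Below is a sketch of one way to run it; the `ner` output column name is an assumption carried over from the standalone model card above, not something this pipeline card states:

```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an existing SparkSession `spark` with Spark NLP loaded.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("roberta_tagalog_base_ft_udpos213_japanese_pipeline", lang = "en")
annotations = pipeline.transform(df)

# Column name "ner" is assumed; check annotations.printSchema() if it differs.
annotations.selectExpr("explode(ner.result) as predicted_tag").show(truncate=False)
```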
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tagalog_base_ft_udpos213_japanese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/iceman2434/roberta-tagalog-base-ft-udpos213-ja + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_manx_pipeline_tl.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_manx_pipeline_tl.md new file mode 100644 index 00000000000000..840b4892a85e05 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_manx_pipeline_tl.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Tagalog roberta_tagalog_base_ft_udpos213_manx_pipeline pipeline RoBertaForTokenClassification from iceman2434 +author: John Snow Labs +name: roberta_tagalog_base_ft_udpos213_manx_pipeline +date: 2024-09-20 +tags: [tl, open_source, pipeline, onnx] +task: Named Entity Recognition +language: tl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tagalog_base_ft_udpos213_manx_pipeline` is a Tagalog model originally trained by iceman2434. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_manx_pipeline_tl_5.5.0_3.0_1726847021470.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_manx_pipeline_tl_5.5.0_3.0_1726847021470.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_tagalog_base_ft_udpos213_manx_pipeline", lang = "tl") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_tagalog_base_ft_udpos213_manx_pipeline", lang = "tl") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tagalog_base_ft_udpos213_manx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tl| +|Size:|407.2 MB| + +## References + +https://huggingface.co/iceman2434/roberta-tagalog-base-ft-udpos213-gv + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_manx_tl.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_manx_tl.md new file mode 100644 index 00000000000000..8db15cf3f2c4f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_manx_tl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Tagalog roberta_tagalog_base_ft_udpos213_manx RoBertaForTokenClassification from iceman2434 +author: John Snow Labs +name: roberta_tagalog_base_ft_udpos213_manx +date: 2024-09-20 +tags: [tl, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: tl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tagalog_base_ft_udpos213_manx` is a Tagalog model originally trained by iceman2434. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_manx_tl_5.5.0_3.0_1726847001856.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_manx_tl_5.5.0_3.0_1726847001856.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Input columns must match the outputs of the stages above ("document", "token")
tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_tagalog_base_ft_udpos213_manx","tl") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_tagalog_base_ft_udpos213_manx", "tl")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tagalog_base_ft_udpos213_manx| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|tl| +|Size:|407.2 MB| + +## References + +https://huggingface.co/iceman2434/roberta-tagalog-base-ft-udpos213-gv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_top3lang_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_top3lang_en.md new file mode 100644 index 00000000000000..6d420d20d713b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_top3lang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_tagalog_base_ft_udpos213_top3lang RoBertaForTokenClassification from katrinatan +author: John Snow Labs +name: roberta_tagalog_base_ft_udpos213_top3lang +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tagalog_base_ft_udpos213_top3lang` is a English model originally trained by katrinatan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_top3lang_en_5.5.0_3.0_1726853498989.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_top3lang_en_5.5.0_3.0_1726853498989.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Input columns must match the outputs of the stages above ("document", "token")
tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_tagalog_base_ft_udpos213_top3lang","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_tagalog_base_ft_udpos213_top3lang", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tagalog_base_ft_udpos213_top3lang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/katrinatan/roberta-tagalog-base-ft-udpos213-top3lang \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_top3lang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_top3lang_pipeline_en.md new file mode 100644 index 00000000000000..e2fed027f83062 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_top3lang_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_tagalog_base_ft_udpos213_top3lang_pipeline pipeline RoBertaForTokenClassification from katrinatan +author: John Snow Labs +name: roberta_tagalog_base_ft_udpos213_top3lang_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tagalog_base_ft_udpos213_top3lang_pipeline` is a English model originally trained by katrinatan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_top3lang_pipeline_en_5.5.0_3.0_1726853518338.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_top3lang_pipeline_en_5.5.0_3.0_1726853518338.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_tagalog_base_ft_udpos213_top3lang_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_tagalog_base_ft_udpos213_top3lang_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tagalog_base_ft_udpos213_top3lang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/katrinatan/roberta-tagalog-base-ft-udpos213-top3lang + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_topic_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_topic_en.md new file mode 100644 index 00000000000000..53123d607c433d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_topic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_topic RoBertaForSequenceClassification from pawlo2013 +author: John Snow Labs +name: roberta_topic +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_topic` is a English model originally trained by pawlo2013. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_topic_en_5.5.0_3.0_1726851699050.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_topic_en_5.5.0_3.0_1726851699050.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Input columns must match the outputs of the stages above ("document", "token")
sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_topic","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_topic", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
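After the pipeline runs, each document receives one predicted label in the `class` column. A short inspection sketch follows; the card publishes no reference outputs, so the values depend entirely on your data:

```python
# One predicted topic label per input row
pipelineDF.selectExpr("text", "`class`.result as predicted_topic").show(truncate=False)
```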
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_topic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|465.6 MB| + +## References + +https://huggingface.co/pawlo2013/roberta_topic \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_topic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_topic_pipeline_en.md new file mode 100644 index 00000000000000..84ad65b2abe605 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_topic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_topic_pipeline pipeline RoBertaForSequenceClassification from pawlo2013 +author: John Snow Labs +name: roberta_topic_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_topic_pipeline` is a English model originally trained by pawlo2013. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_topic_pipeline_en_5.5.0_3.0_1726851722277.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_topic_pipeline_en_5.5.0_3.0_1726851722277.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_topic_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_topic_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_topic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.6 MB| + +## References + +https://huggingface.co/pawlo2013/roberta_topic + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-robertalex_mlm_armas_inga_estrella_en.md b/docs/_posts/ahmedlone127/2024-09-20-robertalex_mlm_armas_inga_estrella_en.md new file mode 100644 index 00000000000000..c645cb60526159 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-robertalex_mlm_armas_inga_estrella_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robertalex_mlm_armas_inga_estrella RoBertaEmbeddings from JFernandoGRE +author: John Snow Labs +name: robertalex_mlm_armas_inga_estrella +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertalex_mlm_armas_inga_estrella` is a English model originally trained by JFernandoGRE. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertalex_mlm_armas_inga_estrella_en_5.5.0_3.0_1726857591924.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertalex_mlm_armas_inga_estrella_en_5.5.0_3.0_1726857591924.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("robertalex_mlm_armas_inga_estrella","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("robertalex_mlm_armas_inga_estrella","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
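The `embeddings` column holds one vector per token. A quick sanity check on the output, added as a usage sketch; for a RoBERTa base-sized model the vectors should be 768-dimensional:

```python
import pyspark.sql.functions as F

pipelineDF.select(F.explode("embeddings").alias("emb")) \
    .select(F.col("emb.result").alias("token"),
            F.size("emb.embeddings").alias("dimension")) \
    .show(truncate=False)
```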
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertalex_mlm_armas_inga_estrella| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/JFernandoGRE/RoBERTalex_mlm_ARMAS_INGA_ESTRELLA \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-robertalex_mlm_armas_inga_estrella_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-robertalex_mlm_armas_inga_estrella_pipeline_en.md new file mode 100644 index 00000000000000..fbd61caf6434f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-robertalex_mlm_armas_inga_estrella_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robertalex_mlm_armas_inga_estrella_pipeline pipeline RoBertaEmbeddings from JFernandoGRE +author: John Snow Labs +name: robertalex_mlm_armas_inga_estrella_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertalex_mlm_armas_inga_estrella_pipeline` is a English model originally trained by JFernandoGRE. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertalex_mlm_armas_inga_estrella_pipeline_en_5.5.0_3.0_1726857614791.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertalex_mlm_armas_inga_estrella_pipeline_en_5.5.0_3.0_1726857614791.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robertalex_mlm_armas_inga_estrella_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robertalex_mlm_armas_inga_estrella_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertalex_mlm_armas_inga_estrella_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/JFernandoGRE/RoBERTalex_mlm_ARMAS_INGA_ESTRELLA + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-rubert_tiny_sberquad_6ep_1_en.md b/docs/_posts/ahmedlone127/2024-09-20-rubert_tiny_sberquad_6ep_1_en.md new file mode 100644 index 00000000000000..8efa91bf3aae9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-rubert_tiny_sberquad_6ep_1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English rubert_tiny_sberquad_6ep_1 BertForQuestionAnswering from Mathnub +author: John Snow Labs +name: rubert_tiny_sberquad_6ep_1 +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_tiny_sberquad_6ep_1` is a English model originally trained by Mathnub. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_tiny_sberquad_6ep_1_en_5.5.0_3.0_1726820368432.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_tiny_sberquad_6ep_1_en_5.5.0_3.0_1726820368432.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("rubert_tiny_sberquad_6ep_1","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
# Input column names must match the assembler's input columns ("question", "context")
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("rubert_tiny_sberquad_6ep_1", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
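The predicted answer span lands in the `answer` output column; reading it back is a one-liner (sketch only, since no reference output is published for this card):

```python
pipelineDF.selectExpr("document_question.result as question",
                      "answer.result as answer").show(truncate=False)
```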
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_tiny_sberquad_6ep_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|43.8 MB| + +## References + +https://huggingface.co/Mathnub/rubert-tiny-sberquad-6ep-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-rubert_tiny_sberquad_6ep_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-rubert_tiny_sberquad_6ep_1_pipeline_en.md new file mode 100644 index 00000000000000..f874c6e1611bfa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-rubert_tiny_sberquad_6ep_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English rubert_tiny_sberquad_6ep_1_pipeline pipeline BertForQuestionAnswering from Mathnub +author: John Snow Labs +name: rubert_tiny_sberquad_6ep_1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_tiny_sberquad_6ep_1_pipeline` is a English model originally trained by Mathnub. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_tiny_sberquad_6ep_1_pipeline_en_5.5.0_3.0_1726820370859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_tiny_sberquad_6ep_1_pipeline_en_5.5.0_3.0_1726820370859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rubert_tiny_sberquad_6ep_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rubert_tiny_sberquad_6ep_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
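Question-answering pipelines take two input columns rather than a single `text` column. Below is a sketch of what `df` could look like here; the `question` and `context` column names are carried over from the standalone rubert_tiny_sberquad_6ep_1 example and are an assumption, not something stated in this pipeline card:

```python
from sparknlp.pretrained import PretrainedPipeline

# Column names assumed from the standalone model card; verify with printSchema()
# if the exported MultiDocumentAssembler expects different input columns.
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")

pipeline = PretrainedPipeline("rubert_tiny_sberquad_6ep_1_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.selectExpr("answer.result as answer").show(truncate=False)
```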
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_tiny_sberquad_6ep_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|43.8 MB| + +## References + +https://huggingface.co/Mathnub/rubert-tiny-sberquad-6ep-1 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-rulebert_v0_5_k0_it.md b/docs/_posts/ahmedlone127/2024-09-20-rulebert_v0_5_k0_it.md new file mode 100644 index 00000000000000..9600f6114dde77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-rulebert_v0_5_k0_it.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Italian rulebert_v0_5_k0 XlmRoBertaForSequenceClassification from ribesstefano +author: John Snow Labs +name: rulebert_v0_5_k0 +date: 2024-09-20 +tags: [it, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rulebert_v0_5_k0` is a Italian model originally trained by ribesstefano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rulebert_v0_5_k0_it_5.5.0_3.0_1726845885713.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rulebert_v0_5_k0_it_5.5.0_3.0_1726845885713.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Input columns must match the outputs of the stages above ("document", "token")
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("rulebert_v0_5_k0","it") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("rulebert_v0_5_k0", "it")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rulebert_v0_5_k0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|it| +|Size:|870.4 MB| + +## References + +https://huggingface.co/ribesstefano/RuleBert-v0.5-k0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-rulebert_v0_5_k0_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-20-rulebert_v0_5_k0_pipeline_it.md new file mode 100644 index 00000000000000..0cda18b8fdf9f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-rulebert_v0_5_k0_pipeline_it.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Italian rulebert_v0_5_k0_pipeline pipeline XlmRoBertaForSequenceClassification from ribesstefano +author: John Snow Labs +name: rulebert_v0_5_k0_pipeline +date: 2024-09-20 +tags: [it, open_source, pipeline, onnx] +task: Text Classification +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rulebert_v0_5_k0_pipeline` is a Italian model originally trained by ribesstefano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rulebert_v0_5_k0_pipeline_it_5.5.0_3.0_1726845990061.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rulebert_v0_5_k0_pipeline_it_5.5.0_3.0_1726845990061.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rulebert_v0_5_k0_pipeline", lang = "it") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rulebert_v0_5_k0_pipeline", lang = "it") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rulebert_v0_5_k0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|870.5 MB| + +## References + +https://huggingface.co/ribesstefano/RuleBert-v0.5-k0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sarcasm_detection_bert_base_uncased_sayula_popoluca_en.md b/docs/_posts/ahmedlone127/2024-09-20-sarcasm_detection_bert_base_uncased_sayula_popoluca_en.md new file mode 100644 index 00000000000000..c2e21e3d990965 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sarcasm_detection_bert_base_uncased_sayula_popoluca_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sarcasm_detection_bert_base_uncased_sayula_popoluca BertForSequenceClassification from jkhan447 +author: John Snow Labs +name: sarcasm_detection_bert_base_uncased_sayula_popoluca +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sarcasm_detection_bert_base_uncased_sayula_popoluca` is a English model originally trained by jkhan447. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sarcasm_detection_bert_base_uncased_sayula_popoluca_en_5.5.0_3.0_1726860135168.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sarcasm_detection_bert_base_uncased_sayula_popoluca_en_5.5.0_3.0_1726860135168.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Input columns must match the outputs of the stages above ("document", "token")
sequenceClassifier = BertForSequenceClassification.pretrained("sarcasm_detection_bert_base_uncased_sayula_popoluca","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("sarcasm_detection_bert_base_uncased_sayula_popoluca", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sarcasm_detection_bert_base_uncased_sayula_popoluca| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/jkhan447/sarcasm-detection-Bert-base-uncased-POS \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sarcasm_detection_bert_base_uncased_sayula_popoluca_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-sarcasm_detection_bert_base_uncased_sayula_popoluca_pipeline_en.md new file mode 100644 index 00000000000000..3c92aeb17092cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sarcasm_detection_bert_base_uncased_sayula_popoluca_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sarcasm_detection_bert_base_uncased_sayula_popoluca_pipeline pipeline BertForSequenceClassification from jkhan447 +author: John Snow Labs +name: sarcasm_detection_bert_base_uncased_sayula_popoluca_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sarcasm_detection_bert_base_uncased_sayula_popoluca_pipeline` is a English model originally trained by jkhan447. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sarcasm_detection_bert_base_uncased_sayula_popoluca_pipeline_en_5.5.0_3.0_1726860155220.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sarcasm_detection_bert_base_uncased_sayula_popoluca_pipeline_en_5.5.0_3.0_1726860155220.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sarcasm_detection_bert_base_uncased_sayula_popoluca_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sarcasm_detection_bert_base_uncased_sayula_popoluca_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sarcasm_detection_bert_base_uncased_sayula_popoluca_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/jkhan447/sarcasm-detection-Bert-base-uncased-POS + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-scam_detection_zelchy_en.md b/docs/_posts/ahmedlone127/2024-09-20-scam_detection_zelchy_en.md new file mode 100644 index 00000000000000..f41625507dec07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-scam_detection_zelchy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English scam_detection_zelchy BertForSequenceClassification from zelchy +author: John Snow Labs +name: scam_detection_zelchy +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scam_detection_zelchy` is a English model originally trained by zelchy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scam_detection_zelchy_en_5.5.0_3.0_1726828994543.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scam_detection_zelchy_en_5.5.0_3.0_1726828994543.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Input columns must match the outputs of the stages above ("document", "token")
sequenceClassifier = BertForSequenceClassification.pretrained("scam_detection_zelchy","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("scam_detection_zelchy", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scam_detection_zelchy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/zelchy/scam-detection \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1111_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1111_pipeline_en.md new file mode 100644 index 00000000000000..c687b8c0505f71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1111_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1111_pipeline pipeline XlmRoBertaForSequenceClassification from haryoaw +author: John Snow Labs +name: scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1111_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1111_pipeline` is a English model originally trained by haryoaw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1111_pipeline_en_5.5.0_3.0_1726799920060.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1111_pipeline_en_5.5.0_3.0_1726799920060.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1111_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1111_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scenario_non_kd_scr_d2_data_amazonscience_massive_all_1_1111_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|884.3 MB| + +## References + +https://huggingface.co/haryoaw/scenario-NON-KD-SCR-D2_data-AmazonScience_massive_all_1_1111 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sead_l_6_h_256_a_8_qqp_en.md b/docs/_posts/ahmedlone127/2024-09-20-sead_l_6_h_256_a_8_qqp_en.md new file mode 100644 index 00000000000000..fa231a2bf20a62 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sead_l_6_h_256_a_8_qqp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sead_l_6_h_256_a_8_qqp BertForSequenceClassification from C5i +author: John Snow Labs +name: sead_l_6_h_256_a_8_qqp +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sead_l_6_h_256_a_8_qqp` is a English model originally trained by C5i. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sead_l_6_h_256_a_8_qqp_en_5.5.0_3.0_1726803479431.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sead_l_6_h_256_a_8_qqp_en_5.5.0_3.0_1726803479431.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Input columns must match the outputs of the stages above ("document", "token")
sequenceClassifier = BertForSequenceClassification.pretrained("sead_l_6_h_256_a_8_qqp","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("sead_l_6_h_256_a_8_qqp", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sead_l_6_h_256_a_8_qqp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|47.4 MB| + +## References + +https://huggingface.co/C5i/SEAD-L-6_H-256_A-8-qqp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sead_l_6_h_256_a_8_qqp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-sead_l_6_h_256_a_8_qqp_pipeline_en.md new file mode 100644 index 00000000000000..39247d97906041 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sead_l_6_h_256_a_8_qqp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sead_l_6_h_256_a_8_qqp_pipeline pipeline BertForSequenceClassification from C5i +author: John Snow Labs +name: sead_l_6_h_256_a_8_qqp_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sead_l_6_h_256_a_8_qqp_pipeline` is a English model originally trained by C5i. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sead_l_6_h_256_a_8_qqp_pipeline_en_5.5.0_3.0_1726803482245.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sead_l_6_h_256_a_8_qqp_pipeline_en_5.5.0_3.0_1726803482245.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sead_l_6_h_256_a_8_qqp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sead_l_6_h_256_a_8_qqp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sead_l_6_h_256_a_8_qqp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|47.4 MB| + +## References + +https://huggingface.co/C5i/SEAD-L-6_H-256_A-8-qqp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_artificial_languages_des_bert_large_cased_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_artificial_languages_des_bert_large_cased_en.md new file mode 100644 index 00000000000000..f6813b909a4f20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_artificial_languages_des_bert_large_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_artificial_languages_des_bert_large_cased BertSentenceEmbeddings from bencyc1129 +author: John Snow Labs +name: sent_artificial_languages_des_bert_large_cased +date: 2024-09-20 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_artificial_languages_des_bert_large_cased` is a English model originally trained by bencyc1129. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_artificial_languages_des_bert_large_cased_en_5.5.0_3.0_1726867412998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_artificial_languages_des_bert_large_cased_en_5.5.0_3.0_1726867412998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_artificial_languages_des_bert_large_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_artificial_languages_des_bert_large_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
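Each detected sentence receives a single embedding vector. A quick way to check the output, added as a usage sketch; for a BERT-large model the vectors should be 1024-dimensional:

```python
import pyspark.sql.functions as F

pipelineDF.select(F.explode("embeddings").alias("sent")) \
    .select(F.col("sent.result").alias("sentence"),
            F.size("sent.embeddings").alias("dimension")) \
    .show(truncate=False)
```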
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_artificial_languages_des_bert_large_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/bencyc1129/art-des-bert-large-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_artificial_languages_des_bert_large_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_artificial_languages_des_bert_large_cased_pipeline_en.md new file mode 100644 index 00000000000000..5f967be85e242f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_artificial_languages_des_bert_large_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_artificial_languages_des_bert_large_cased_pipeline pipeline BertSentenceEmbeddings from bencyc1129 +author: John Snow Labs +name: sent_artificial_languages_des_bert_large_cased_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_artificial_languages_des_bert_large_cased_pipeline` is a English model originally trained by bencyc1129. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_artificial_languages_des_bert_large_cased_pipeline_en_5.5.0_3.0_1726867469322.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_artificial_languages_des_bert_large_cased_pipeline_en_5.5.0_3.0_1726867469322.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session started with sparknlp.start()
pipeline = PretrainedPipeline("sent_artificial_languages_des_bert_large_cased_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Assumes an active Spark session started with SparkNLP.start()
val pipeline = new PretrainedPipeline("sent_artificial_languages_des_bert_large_cased_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_artificial_languages_des_bert_large_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/bencyc1129/art-des-bert-large-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_bookcorpus_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_bookcorpus_pipeline_en.md new file mode 100644 index 00000000000000..ff250c36e3b4af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_bookcorpus_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_bookcorpus_pipeline pipeline BertSentenceEmbeddings from AiresPucrs +author: John Snow Labs +name: sent_bert_base_bookcorpus_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_bookcorpus_pipeline` is a English model originally trained by AiresPucrs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_bookcorpus_pipeline_en_5.5.0_3.0_1726868327591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_bookcorpus_pipeline_en_5.5.0_3.0_1726868327591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session started with sparknlp.start()
pipeline = PretrainedPipeline("sent_bert_base_bookcorpus_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Assumes an active Spark session started with SparkNLP.start()
val pipeline = new PretrainedPipeline("sent_bert_base_bookcorpus_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_bookcorpus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/AiresPucrs/bert-base-bookcorpus + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_embedding_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_embedding_en.md new file mode 100644 index 00000000000000..1b69d306929937 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_embedding_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_embedding BertSentenceEmbeddings from CH3COOK +author: John Snow Labs +name: sent_bert_base_embedding +date: 2024-09-20 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_embedding` is a English model originally trained by CH3COOK. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_embedding_en_5.5.0_3.0_1726868130007.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_embedding_en_5.5.0_3.0_1726868130007.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_embedding","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_embedding","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
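For low-latency inference outside of a distributed job, the fitted pipeline can be wrapped in a `LightPipeline`, which annotates plain Python strings in memory. This is a sketch under the same assumptions as the example above; the `parse_embeddings` flag and the sample sentence are shown for illustration.

```python
from sparknlp.base import LightPipeline

# Wrap the fitted PipelineModel from the example above;
# parse_embeddings=True keeps the embedding vectors in the output
light = LightPipeline(pipelineModel, parse_embeddings=True)

annotated = light.fullAnnotate(["Sentence embeddings are useful for semantic search."])
for annotation in annotated[0]["embeddings"]:
    print(annotation.result, len(annotation.embeddings))
```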
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_embedding| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.9 MB| + +## References + +https://huggingface.co/CH3COOK/bert-base-embedding \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_embedding_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_embedding_pipeline_en.md new file mode 100644 index 00000000000000..78d55ae950e3d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_embedding_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_embedding_pipeline pipeline BertSentenceEmbeddings from CH3COOK +author: John Snow Labs +name: sent_bert_base_embedding_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_embedding_pipeline` is a English model originally trained by CH3COOK. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_embedding_pipeline_en_5.5.0_3.0_1726868149361.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_embedding_pipeline_en_5.5.0_3.0_1726868149361.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session started with sparknlp.start()
pipeline = PretrainedPipeline("sent_bert_base_embedding_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Assumes an active Spark session started with SparkNLP.start()
val pipeline = new PretrainedPipeline("sent_bert_base_embedding_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_embedding_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.5 MB| + +## References + +https://huggingface.co/CH3COOK/bert-base-embedding + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_english_russian_cased_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_english_russian_cased_en.md new file mode 100644 index 00000000000000..125720683f6734 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_english_russian_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_russian_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_russian_cased +date: 2024-09-20 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_russian_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_russian_cased_en_5.5.0_3.0_1726867228351.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_russian_cased_en_5.5.0_3.0_1726867228351.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_russian_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_russian_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_russian_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|428.3 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-ru-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_english_russian_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_english_russian_cased_pipeline_en.md new file mode 100644 index 00000000000000..6e25f6a91d54f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_english_russian_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_russian_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_russian_cased_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_russian_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_russian_cased_pipeline_en_5.5.0_3.0_1726867249705.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_russian_cased_pipeline_en_5.5.0_3.0_1726867249705.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session started with sparknlp.start()
pipeline = PretrainedPipeline("sent_bert_base_english_russian_cased_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Assumes an active Spark session started with SparkNLP.start()
val pipeline = new PretrainedPipeline("sent_bert_base_english_russian_cased_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
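Each of these ready-made pipelines is an ordinary Spark `PipelineModel` under the hood, so the stages listed under "Included Models" at the end of this page can also be inspected programmatically. A brief sketch, assuming the `pipeline` object from the example above; the `model` attribute is how the Python wrapper exposes the underlying PipelineModel.

```python
# Print the concrete annotator class behind each stage of the downloaded pipeline
for stage in pipeline.model.stages:
    print(type(stage).__name__)
```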
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_russian_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|428.8 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-ru-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_swedish_cased_alpha_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_swedish_cased_alpha_en.md new file mode 100644 index 00000000000000..ecb8c70df43d50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_swedish_cased_alpha_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_swedish_cased_alpha BertSentenceEmbeddings from KBLab +author: John Snow Labs +name: sent_bert_base_swedish_cased_alpha +date: 2024-09-20 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_swedish_cased_alpha` is a English model originally trained by KBLab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_swedish_cased_alpha_en_5.5.0_3.0_1726868118921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_swedish_cased_alpha_en_5.5.0_3.0_1726868118921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_swedish_cased_alpha","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_swedish_cased_alpha","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_swedish_cased_alpha| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|464.2 MB| + +## References + +https://huggingface.co/KBLab/bert-base-swedish-cased-alpha \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_swedish_cased_alpha_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_swedish_cased_alpha_pipeline_en.md new file mode 100644 index 00000000000000..1c745291490bf4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_swedish_cased_alpha_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_swedish_cased_alpha_pipeline pipeline BertSentenceEmbeddings from KBLab +author: John Snow Labs +name: sent_bert_base_swedish_cased_alpha_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_swedish_cased_alpha_pipeline` is a English model originally trained by KBLab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_swedish_cased_alpha_pipeline_en_5.5.0_3.0_1726868145830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_swedish_cased_alpha_pipeline_en_5.5.0_3.0_1726868145830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session started with sparknlp.start()
pipeline = PretrainedPipeline("sent_bert_base_swedish_cased_alpha_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Assumes an active Spark session started with SparkNLP.start()
val pipeline = new PretrainedPipeline("sent_bert_base_swedish_cased_alpha_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_swedish_cased_alpha_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.8 MB| + +## References + +https://huggingface.co/KBLab/bert-base-swedish-cased-alpha + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_uncased_git_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_uncased_git_pipeline_zh.md new file mode 100644 index 00000000000000..38836927a3445a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_uncased_git_pipeline_zh.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Chinese sent_bert_base_uncased_git_pipeline pipeline BertSentenceEmbeddings from littlebird13 +author: John Snow Labs +name: sent_bert_base_uncased_git_pipeline +date: 2024-09-20 +tags: [zh, open_source, pipeline, onnx] +task: Embeddings +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_git_pipeline` is a Chinese model originally trained by littlebird13. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_git_pipeline_zh_5.5.0_3.0_1726867246871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_git_pipeline_zh_5.5.0_3.0_1726867246871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session started with sparknlp.start()
pipeline = PretrainedPipeline("sent_bert_base_uncased_git_pipeline", lang = "zh")
df = spark.createDataFrame([["我喜欢 Spark NLP"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Assumes an active Spark session started with SparkNLP.start()
val pipeline = new PretrainedPipeline("sent_bert_base_uncased_git_pipeline", lang = "zh")
val df = Seq("我喜欢 Spark NLP").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_git_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|407.7 MB| + +## References + +https://huggingface.co/littlebird13/bert-base-uncased-git + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_uncased_issues_128_phnghiapro_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_uncased_issues_128_phnghiapro_en.md new file mode 100644 index 00000000000000..0e7d504da2e5cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_uncased_issues_128_phnghiapro_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_phnghiapro BertSentenceEmbeddings from phnghiapro +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_phnghiapro +date: 2024-09-20 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_phnghiapro` is a English model originally trained by phnghiapro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_phnghiapro_en_5.5.0_3.0_1726814861960.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_phnghiapro_en_5.5.0_3.0_1726814861960.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_phnghiapro","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_phnghiapro","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
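Sentence embeddings such as these are usually compared with cosine similarity. Below is a small, self-contained sketch that embeds two sentences with the fitted pipeline from the example above and compares them on the driver; NumPy is assumed to be available and the sentences are arbitrary.

```python
import numpy as np

pairs = spark.createDataFrame(
    [["Spark NLP provides sentence embeddings."],
     ["Sentence vectors can power semantic search."]]
).toDF("text")

result = pipelineModel.transform(pairs)

# Collect one embedding vector per sentence (the data is tiny, so collecting is fine)
rows = result.selectExpr("explode(embeddings.embeddings) as vector").collect()
vec_a, vec_b = np.array(rows[0]["vector"]), np.array(rows[1]["vector"])

cosine = float(vec_a @ vec_b / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b)))
print(f"cosine similarity: {cosine:.4f}")
```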
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_phnghiapro| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/phnghiapro/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_uncased_issues_128_susnato_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_uncased_issues_128_susnato_en.md new file mode 100644 index 00000000000000..25e3eda4ad86d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_uncased_issues_128_susnato_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_susnato BertSentenceEmbeddings from susnato +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_susnato +date: 2024-09-20 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_susnato` is a English model originally trained by susnato. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_susnato_en_5.5.0_3.0_1726868015750.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_susnato_en_5.5.0_3.0_1726868015750.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_susnato","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_susnato","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_susnato| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/susnato/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_uncased_issues_128_susnato_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_uncased_issues_128_susnato_pipeline_en.md new file mode 100644 index 00000000000000..d44c3127fec1d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_uncased_issues_128_susnato_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_susnato_pipeline pipeline BertSentenceEmbeddings from susnato +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_susnato_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_susnato_pipeline` is a English model originally trained by susnato. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_susnato_pipeline_en_5.5.0_3.0_1726868035654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_susnato_pipeline_en_5.5.0_3.0_1726868035654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session started with sparknlp.start()
pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_susnato_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Assumes an active Spark session started with SparkNLP.start()
val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_susnato_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_susnato_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/susnato/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_uncased_issues_128_takaiwai_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_uncased_issues_128_takaiwai_en.md new file mode 100644 index 00000000000000..a24714e49ced94 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_uncased_issues_128_takaiwai_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_takaiwai BertSentenceEmbeddings from takaiwai +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_takaiwai +date: 2024-09-20 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_takaiwai` is a English model originally trained by takaiwai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_takaiwai_en_5.5.0_3.0_1726801358089.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_takaiwai_en_5.5.0_3.0_1726801358089.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_takaiwai","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_takaiwai","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_takaiwai| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/takaiwai/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_uncased_issues_128_takaiwai_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_uncased_issues_128_takaiwai_pipeline_en.md new file mode 100644 index 00000000000000..803ae257916843 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_uncased_issues_128_takaiwai_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_takaiwai_pipeline pipeline BertSentenceEmbeddings from takaiwai +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_takaiwai_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_takaiwai_pipeline` is a English model originally trained by takaiwai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_takaiwai_pipeline_en_5.5.0_3.0_1726801376234.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_takaiwai_pipeline_en_5.5.0_3.0_1726801376234.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session started with sparknlp.start()
pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_takaiwai_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Assumes an active Spark session started with SparkNLP.start()
val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_takaiwai_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_takaiwai_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/takaiwai/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_uncased_sparse_85_unstructured_pruneofa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_uncased_sparse_85_unstructured_pruneofa_pipeline_en.md new file mode 100644 index 00000000000000..0ca6a8ad1e0ae4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_uncased_sparse_85_unstructured_pruneofa_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_sparse_85_unstructured_pruneofa_pipeline pipeline BertSentenceEmbeddings from Intel +author: John Snow Labs +name: sent_bert_base_uncased_sparse_85_unstructured_pruneofa_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_sparse_85_unstructured_pruneofa_pipeline` is a English model originally trained by Intel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_sparse_85_unstructured_pruneofa_pipeline_en_5.5.0_3.0_1726815113148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_sparse_85_unstructured_pruneofa_pipeline_en_5.5.0_3.0_1726815113148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session started with sparknlp.start()
pipeline = PretrainedPipeline("sent_bert_base_uncased_sparse_85_unstructured_pruneofa_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Assumes an active Spark session started with SparkNLP.start()
val pipeline = new PretrainedPipeline("sent_bert_base_uncased_sparse_85_unstructured_pruneofa_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_sparse_85_unstructured_pruneofa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|176.2 MB| + +## References + +https://huggingface.co/Intel/bert-base-uncased-sparse-85-unstructured-pruneofa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_bert_gb_2021_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_gb_2021_en.md new file mode 100644 index 00000000000000..c46b62a8f2ffd5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_gb_2021_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_gb_2021 BertSentenceEmbeddings from mossaic-candle +author: John Snow Labs +name: sent_bert_gb_2021 +date: 2024-09-20 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_gb_2021` is a English model originally trained by mossaic-candle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_gb_2021_en_5.5.0_3.0_1726854450196.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_gb_2021_en_5.5.0_3.0_1726854450196.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_gb_2021","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_gb_2021","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
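Once fitted, the whole pipeline can be persisted and reloaded like any other Spark ML model, which avoids re-downloading the pretrained weights on every run. A short sketch follows; the output path is illustrative.

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline from the example above
pipelineModel.write().overwrite().save("/tmp/sent_bert_gb_2021_pipeline")

# Reload it later (for example, in a different job) and reuse it directly
restored = PipelineModel.load("/tmp/sent_bert_gb_2021_pipeline")
restoredDF = restored.transform(data)
```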
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_gb_2021| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|258.6 MB| + +## References + +https://huggingface.co/mossaic-candle/bert-gb-2021 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_bert_gb_2021_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_gb_2021_pipeline_en.md new file mode 100644 index 00000000000000..0e976494925faa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_gb_2021_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_gb_2021_pipeline pipeline BertSentenceEmbeddings from mossaic-candle +author: John Snow Labs +name: sent_bert_gb_2021_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_gb_2021_pipeline` is a English model originally trained by mossaic-candle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_gb_2021_pipeline_en_5.5.0_3.0_1726854523699.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_gb_2021_pipeline_en_5.5.0_3.0_1726854523699.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session started with sparknlp.start()
pipeline = PretrainedPipeline("sent_bert_gb_2021_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Assumes an active Spark session started with SparkNLP.start()
val pipeline = new PretrainedPipeline("sent_bert_gb_2021_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_gb_2021_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|259.2 MB| + +## References + +https://huggingface.co/mossaic-candle/bert-gb-2021 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_bert_patent_reference_extraction_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_patent_reference_extraction_en.md new file mode 100644 index 00000000000000..525f2d7b542909 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_patent_reference_extraction_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_patent_reference_extraction BertSentenceEmbeddings from kaesve +author: John Snow Labs +name: sent_bert_patent_reference_extraction +date: 2024-09-20 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_patent_reference_extraction` is a English model originally trained by kaesve. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_patent_reference_extraction_en_5.5.0_3.0_1726868269687.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_patent_reference_extraction_en_5.5.0_3.0_1726868269687.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_patent_reference_extraction","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_patent_reference_extraction","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_patent_reference_extraction| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/kaesve/BERT_patent_reference_extraction \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_bert_patent_reference_extraction_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_patent_reference_extraction_pipeline_en.md new file mode 100644 index 00000000000000..fcbe7bf1cac0ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_patent_reference_extraction_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_patent_reference_extraction_pipeline pipeline BertSentenceEmbeddings from kaesve +author: John Snow Labs +name: sent_bert_patent_reference_extraction_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_patent_reference_extraction_pipeline` is a English model originally trained by kaesve. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_patent_reference_extraction_pipeline_en_5.5.0_3.0_1726868289713.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_patent_reference_extraction_pipeline_en_5.5.0_3.0_1726868289713.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session started with sparknlp.start()
pipeline = PretrainedPipeline("sent_bert_patent_reference_extraction_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Assumes an active Spark session started with SparkNLP.start()
val pipeline = new PretrainedPipeline("sent_bert_patent_reference_extraction_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
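The transformed DataFrame keeps one column per stage of the pipeline, so printing the schema is a quick way to see exactly what was produced before selecting specific fields. A short sketch, assuming the `annotations` DataFrame from the example above.

```python
# Inspect the columns added by the pipeline and preview their contents
annotations.printSchema()
annotations.show(truncate=60)
```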
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_patent_reference_extraction_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.2 MB| + +## References + +https://huggingface.co/kaesve/BERT_patent_reference_extraction + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_distilbertu_base_cased_0_5_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_distilbertu_base_cased_0_5_en.md new file mode 100644 index 00000000000000..ffcb06a3c6a193 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_distilbertu_base_cased_0_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_distilbertu_base_cased_0_5 BertSentenceEmbeddings from amitness +author: John Snow Labs +name: sent_distilbertu_base_cased_0_5 +date: 2024-09-20 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_distilbertu_base_cased_0_5` is a English model originally trained by amitness. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_distilbertu_base_cased_0_5_en_5.5.0_3.0_1726867942280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_distilbertu_base_cased_0_5_en_5.5.0_3.0_1726867942280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_distilbertu_base_cased_0_5","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_distilbertu_base_cased_0_5","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_distilbertu_base_cased_0_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|469.9 MB| + +## References + +https://huggingface.co/amitness/distilbertu-base-cased-0.5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_distilbertu_base_cased_0_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_distilbertu_base_cased_0_5_pipeline_en.md new file mode 100644 index 00000000000000..7c07dfc368d91d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_distilbertu_base_cased_0_5_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_distilbertu_base_cased_0_5_pipeline pipeline BertSentenceEmbeddings from amitness +author: John Snow Labs +name: sent_distilbertu_base_cased_0_5_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_distilbertu_base_cased_0_5_pipeline` is a English model originally trained by amitness. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_distilbertu_base_cased_0_5_pipeline_en_5.5.0_3.0_1726867965085.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_distilbertu_base_cased_0_5_pipeline_en_5.5.0_3.0_1726867965085.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_distilbertu_base_cased_0_5_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_distilbertu_base_cased_0_5_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
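For a quick interactive check without building a DataFrame, the `PretrainedPipeline` object also exposes `annotate()`. A small sketch (the example string is only illustrative; the output keys follow the pipeline's output columns):

```python
# Assumes the `pipeline` object created in the snippet above.
result = pipeline.annotate("I love spark-nlp")
print(result.keys())  # one key per output column of the included stages
```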
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_distilbertu_base_cased_0_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|470.5 MB| + +## References + +https://huggingface.co/amitness/distilbertu-base-cased-0.5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_distilbertu_base_cased_1_0_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_distilbertu_base_cased_1_0_en.md new file mode 100644 index 00000000000000..3614e612e892ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_distilbertu_base_cased_1_0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_distilbertu_base_cased_1_0 BertSentenceEmbeddings from amitness +author: John Snow Labs +name: sent_distilbertu_base_cased_1_0 +date: 2024-09-20 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_distilbertu_base_cased_1_0` is a English model originally trained by amitness. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_distilbertu_base_cased_1_0_en_5.5.0_3.0_1726868118645.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_distilbertu_base_cased_1_0_en_5.5.0_3.0_1726868118645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_distilbertu_base_cased_1_0","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_distilbertu_base_cased_1_0","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_distilbertu_base_cased_1_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|470.4 MB| + +## References + +https://huggingface.co/amitness/distilbertu-base-cased-1.0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_distilbertu_base_cased_1_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_distilbertu_base_cased_1_0_pipeline_en.md new file mode 100644 index 00000000000000..95d8f0535e38ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_distilbertu_base_cased_1_0_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_distilbertu_base_cased_1_0_pipeline pipeline BertSentenceEmbeddings from amitness +author: John Snow Labs +name: sent_distilbertu_base_cased_1_0_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_distilbertu_base_cased_1_0_pipeline` is a English model originally trained by amitness. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_distilbertu_base_cased_1_0_pipeline_en_5.5.0_3.0_1726868146022.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_distilbertu_base_cased_1_0_pipeline_en_5.5.0_3.0_1726868146022.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_distilbertu_base_cased_1_0_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_distilbertu_base_cased_1_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_distilbertu_base_cased_1_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|470.9 MB| + +## References + +https://huggingface.co/amitness/distilbertu-base-cased-1.0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_finest_bert_fi.md b/docs/_posts/ahmedlone127/2024-09-20-sent_finest_bert_fi.md new file mode 100644 index 00000000000000..69094a702a1e50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_finest_bert_fi.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Finnish sent_finest_bert BertSentenceEmbeddings from EMBEDDIA +author: John Snow Labs +name: sent_finest_bert +date: 2024-09-20 +tags: [fi, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: fi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_finest_bert` is a Finnish model originally trained by EMBEDDIA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_finest_bert_fi_5.5.0_3.0_1726815227913.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_finest_bert_fi_5.5.0_3.0_1726815227913.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_finest_bert","fi") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_finest_bert","fi") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_finest_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|fi| +|Size:|535.1 MB| + +## References + +https://huggingface.co/EMBEDDIA/finest-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_mental_bert_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_mental_bert_en.md new file mode 100644 index 00000000000000..59d0a9ef5b5961 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_mental_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_mental_bert BertSentenceEmbeddings from Zamoranesis +author: John Snow Labs +name: sent_mental_bert +date: 2024-09-20 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_mental_bert` is a English model originally trained by Zamoranesis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_mental_bert_en_5.5.0_3.0_1726801301293.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_mental_bert_en_5.5.0_3.0_1726801301293.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_mental_bert","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_mental_bert","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_mental_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|406.6 MB| + +## References + +https://huggingface.co/Zamoranesis/mental_bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_mental_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_mental_bert_pipeline_en.md new file mode 100644 index 00000000000000..4ddf07d3892d85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_mental_bert_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_mental_bert_pipeline pipeline BertSentenceEmbeddings from Zamoranesis +author: John Snow Labs +name: sent_mental_bert_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_mental_bert_pipeline` is a English model originally trained by Zamoranesis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_mental_bert_pipeline_en_5.5.0_3.0_1726801319792.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_mental_bert_pipeline_en_5.5.0_3.0_1726801319792.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_mental_bert_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_mental_bert_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_mental_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Zamoranesis/mental_bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_notram_bert_norwegian_cased_080321_no.md b/docs/_posts/ahmedlone127/2024-09-20-sent_notram_bert_norwegian_cased_080321_no.md new file mode 100644 index 00000000000000..d1494d6026181f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_notram_bert_norwegian_cased_080321_no.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Norwegian sent_notram_bert_norwegian_cased_080321 BertSentenceEmbeddings from NbAiLab +author: John Snow Labs +name: sent_notram_bert_norwegian_cased_080321 +date: 2024-09-20 +tags: ["no", open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: "no" +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_notram_bert_norwegian_cased_080321` is a Norwegian model originally trained by NbAiLab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_notram_bert_norwegian_cased_080321_no_5.5.0_3.0_1726867046861.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_notram_bert_norwegian_cased_080321_no_5.5.0_3.0_1726867046861.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_notram_bert_norwegian_cased_080321","no") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_notram_bert_norwegian_cased_080321","no") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_notram_bert_norwegian_cased_080321| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|no| +|Size:|663.0 MB| + +## References + +https://huggingface.co/NbAiLab/notram-bert-norwegian-cased-080321 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_notram_bert_norwegian_cased_080321_pipeline_no.md b/docs/_posts/ahmedlone127/2024-09-20-sent_notram_bert_norwegian_cased_080321_pipeline_no.md new file mode 100644 index 00000000000000..9ce07218082764 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_notram_bert_norwegian_cased_080321_pipeline_no.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Norwegian sent_notram_bert_norwegian_cased_080321_pipeline pipeline BertSentenceEmbeddings from NbAiLab +author: John Snow Labs +name: sent_notram_bert_norwegian_cased_080321_pipeline +date: 2024-09-20 +tags: ["no", open_source, pipeline, onnx] +task: Embeddings +language: "no" +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_notram_bert_norwegian_cased_080321_pipeline` is a Norwegian model originally trained by NbAiLab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_notram_bert_norwegian_cased_080321_pipeline_no_5.5.0_3.0_1726867078698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_notram_bert_norwegian_cased_080321_pipeline_no_5.5.0_3.0_1726867078698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_notram_bert_norwegian_cased_080321_pipeline", lang = "no")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_notram_bert_norwegian_cased_080321_pipeline", lang = "no")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_notram_bert_norwegian_cased_080321_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|no| +|Size:|663.6 MB| + +## References + +https://huggingface.co/NbAiLab/notram-bert-norwegian-cased-080321 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_roberta_base_word_chinese_cluecorpussmall_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-20-sent_roberta_base_word_chinese_cluecorpussmall_pipeline_zh.md new file mode 100644 index 00000000000000..283e1d09a57436 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_roberta_base_word_chinese_cluecorpussmall_pipeline_zh.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Chinese sent_roberta_base_word_chinese_cluecorpussmall_pipeline pipeline BertSentenceEmbeddings from uer +author: John Snow Labs +name: sent_roberta_base_word_chinese_cluecorpussmall_pipeline +date: 2024-09-20 +tags: [zh, open_source, pipeline, onnx] +task: Embeddings +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_roberta_base_word_chinese_cluecorpussmall_pipeline` is a Chinese model originally trained by uer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_roberta_base_word_chinese_cluecorpussmall_pipeline_zh_5.5.0_3.0_1726854431769.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_roberta_base_word_chinese_cluecorpussmall_pipeline_zh_5.5.0_3.0_1726854431769.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_roberta_base_word_chinese_cluecorpussmall_pipeline", lang = "zh")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_roberta_base_word_chinese_cluecorpussmall_pipeline", lang = "zh")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_roberta_base_word_chinese_cluecorpussmall_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|607.8 MB| + +## References + +https://huggingface.co/uer/roberta-base-word-chinese-cluecorpussmall + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_roberta_base_word_chinese_cluecorpussmall_zh.md b/docs/_posts/ahmedlone127/2024-09-20-sent_roberta_base_word_chinese_cluecorpussmall_zh.md new file mode 100644 index 00000000000000..ffeb16696709d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_roberta_base_word_chinese_cluecorpussmall_zh.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Chinese sent_roberta_base_word_chinese_cluecorpussmall BertSentenceEmbeddings from uer +author: John Snow Labs +name: sent_roberta_base_word_chinese_cluecorpussmall +date: 2024-09-20 +tags: [zh, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_roberta_base_word_chinese_cluecorpussmall` is a Chinese model originally trained by uer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_roberta_base_word_chinese_cluecorpussmall_zh_5.5.0_3.0_1726854402062.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_roberta_base_word_chinese_cluecorpussmall_zh_5.5.0_3.0_1726854402062.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_roberta_base_word_chinese_cluecorpussmall","zh") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_roberta_base_word_chinese_cluecorpussmall","zh") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
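Because this model was pretrained on Chinese text, you will typically get more meaningful vectors by feeding Chinese input rather than the English placeholder sentence used above. A hedged variation of the example data (the sentence below is only illustrative and not part of the original card):

```python
# Swap the example DataFrame for Chinese text before fitting/transforming,
# reusing the `pipeline` defined in the snippet above.
data = spark.createDataFrame([["我喜欢 Spark NLP"]]).toDF("text")
pipelineDF = pipeline.fit(data).transform(data)
```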
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_roberta_base_word_chinese_cluecorpussmall| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|zh| +|Size:|607.3 MB| + +## References + +https://huggingface.co/uer/roberta-base-word-chinese-cluecorpussmall \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_wobert_chinese_plus_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-20-sent_wobert_chinese_plus_pipeline_zh.md new file mode 100644 index 00000000000000..e1a7325227e00a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_wobert_chinese_plus_pipeline_zh.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Chinese sent_wobert_chinese_plus_pipeline pipeline BertSentenceEmbeddings from qinluo +author: John Snow Labs +name: sent_wobert_chinese_plus_pipeline +date: 2024-09-20 +tags: [zh, open_source, pipeline, onnx] +task: Embeddings +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_wobert_chinese_plus_pipeline` is a Chinese model originally trained by qinluo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_wobert_chinese_plus_pipeline_zh_5.5.0_3.0_1726866898156.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_wobert_chinese_plus_pipeline_zh_5.5.0_3.0_1726866898156.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_wobert_chinese_plus_pipeline", lang = "zh")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_wobert_chinese_plus_pipeline", lang = "zh")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_wobert_chinese_plus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|465.0 MB| + +## References + +https://huggingface.co/qinluo/wobert-chinese-plus + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_wobert_chinese_plus_zh.md b/docs/_posts/ahmedlone127/2024-09-20-sent_wobert_chinese_plus_zh.md new file mode 100644 index 00000000000000..ae0b28626e0647 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_wobert_chinese_plus_zh.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Chinese sent_wobert_chinese_plus BertSentenceEmbeddings from qinluo +author: John Snow Labs +name: sent_wobert_chinese_plus +date: 2024-09-20 +tags: [zh, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_wobert_chinese_plus` is a Chinese model originally trained by qinluo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_wobert_chinese_plus_zh_5.5.0_3.0_1726866876022.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_wobert_chinese_plus_zh_5.5.0_3.0_1726866876022.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_wobert_chinese_plus","zh") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_wobert_chinese_plus","zh") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_wobert_chinese_plus| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|zh| +|Size:|464.5 MB| + +## References + +https://huggingface.co/qinluo/wobert-chinese-plus \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sentience_classification_score_pytorch_en.md b/docs/_posts/ahmedlone127/2024-09-20-sentience_classification_score_pytorch_en.md new file mode 100644 index 00000000000000..ab55a1e6a29482 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sentience_classification_score_pytorch_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentience_classification_score_pytorch DistilBertForSequenceClassification from aeaee +author: John Snow Labs +name: sentience_classification_score_pytorch +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentience_classification_score_pytorch` is a English model originally trained by aeaee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentience_classification_score_pytorch_en_5.5.0_3.0_1726823615020.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentience_classification_score_pytorch_en_5.5.0_3.0_1726823615020.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentience_classification_score_pytorch","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentience_classification_score_pytorch", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
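To read the prediction back from the snippet above, select the `result` field of the `class` column. A minimal sketch assuming the column names used in the example:

```python
# Each row's `class.result` holds the predicted label(s) for that text.
pipelineDF.select("text", "class.result").show(truncate=False)
```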
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentience_classification_score_pytorch| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.4 MB| + +## References + +https://huggingface.co/aeaee/SENTIENCE_Classification_Score_pytorch \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sentiment_analysis_base_rslora_en.md b/docs/_posts/ahmedlone127/2024-09-20-sentiment_analysis_base_rslora_en.md new file mode 100644 index 00000000000000..f11791c9154561 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sentiment_analysis_base_rslora_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_base_rslora RoBertaForSequenceClassification from Shotaro30678 +author: John Snow Labs +name: sentiment_analysis_base_rslora +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_base_rslora` is a English model originally trained by Shotaro30678. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_base_rslora_en_5.5.0_3.0_1726851633427.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_base_rslora_en_5.5.0_3.0_1726851633427.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_analysis_base_rslora","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_analysis_base_rslora", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_base_rslora| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.8 MB| + +## References + +https://huggingface.co/Shotaro30678/sentiment-analysis-base-rslora \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sentiment_analysis_finalmodel_en.md b/docs/_posts/ahmedlone127/2024-09-20-sentiment_analysis_finalmodel_en.md new file mode 100644 index 00000000000000..f458afbfe31ae7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sentiment_analysis_finalmodel_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_finalmodel DistilBertForSequenceClassification from OmidAghili +author: John Snow Labs +name: sentiment_analysis_finalmodel +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_finalmodel` is a English model originally trained by OmidAghili. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_finalmodel_en_5.5.0_3.0_1726823838829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_finalmodel_en_5.5.0_3.0_1726823838829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_finalmodel","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_finalmodel", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_finalmodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/OmidAghili/Sentiment_Analysis_FinalModel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sentiment_analysis_for_emotion_chat_bot_en.md b/docs/_posts/ahmedlone127/2024-09-20-sentiment_analysis_for_emotion_chat_bot_en.md new file mode 100644 index 00000000000000..5229a849e530e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sentiment_analysis_for_emotion_chat_bot_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_for_emotion_chat_bot RoBertaForSequenceClassification from Shotaro30678 +author: John Snow Labs +name: sentiment_analysis_for_emotion_chat_bot +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_for_emotion_chat_bot` is a English model originally trained by Shotaro30678. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_for_emotion_chat_bot_en_5.5.0_3.0_1726849942035.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_for_emotion_chat_bot_en_5.5.0_3.0_1726849942035.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_analysis_for_emotion_chat_bot","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_analysis_for_emotion_chat_bot", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_for_emotion_chat_bot| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.8 MB| + +## References + +https://huggingface.co/Shotaro30678/sentiment_analysis_for_emotion_chat_bot \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sentiment_analysis_for_emotion_chat_bot_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-sentiment_analysis_for_emotion_chat_bot_pipeline_en.md new file mode 100644 index 00000000000000..050e63688ded9c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sentiment_analysis_for_emotion_chat_bot_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_for_emotion_chat_bot_pipeline pipeline RoBertaForSequenceClassification from Shotaro30678 +author: John Snow Labs +name: sentiment_analysis_for_emotion_chat_bot_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_for_emotion_chat_bot_pipeline` is a English model originally trained by Shotaro30678. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_for_emotion_chat_bot_pipeline_en_5.5.0_3.0_1726849956337.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_for_emotion_chat_bot_pipeline_en_5.5.0_3.0_1726849956337.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sentiment_analysis_for_emotion_chat_bot_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sentiment_analysis_for_emotion_chat_bot_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_for_emotion_chat_bot_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.9 MB| + +## References + +https://huggingface.co/Shotaro30678/sentiment_analysis_for_emotion_chat_bot + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sentiment_analysis_model_trained_en.md b/docs/_posts/ahmedlone127/2024-09-20-sentiment_analysis_model_trained_en.md new file mode 100644 index 00000000000000..fe88df5e46df38 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sentiment_analysis_model_trained_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_model_trained DistilBertForSequenceClassification from Lasghar +author: John Snow Labs +name: sentiment_analysis_model_trained +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_model_trained` is a English model originally trained by Lasghar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_trained_en_5.5.0_3.0_1726860999400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_trained_en_5.5.0_3.0_1726860999400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_model_trained","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_model_trained", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_model_trained| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Lasghar/sentiment-analysis-model-trained \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sentiment_analysis_model_trained_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-sentiment_analysis_model_trained_pipeline_en.md new file mode 100644 index 00000000000000..5a6b03e7612818 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sentiment_analysis_model_trained_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_model_trained_pipeline pipeline DistilBertForSequenceClassification from Lasghar +author: John Snow Labs +name: sentiment_analysis_model_trained_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_model_trained_pipeline` is a English model originally trained by Lasghar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_trained_pipeline_en_5.5.0_3.0_1726861011705.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_trained_pipeline_en_5.5.0_3.0_1726861011705.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sentiment_analysis_model_trained_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sentiment_analysis_model_trained_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_model_trained_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Lasghar/sentiment-analysis-model-trained + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-setfit_jt_en.md b/docs/_posts/ahmedlone127/2024-09-20-setfit_jt_en.md new file mode 100644 index 00000000000000..2b5cda4749b8f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-setfit_jt_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English setfit_jt MPNetForSequenceClassification from akswasti +author: John Snow Labs +name: setfit_jt +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, mpnet] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`setfit_jt` is a English model originally trained by akswasti. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/setfit_jt_en_5.5.0_3.0_1726824630667.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/setfit_jt_en_5.5.0_3.0_1726824630667.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = MPNetForSequenceClassification.pretrained("setfit_jt","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = MPNetForSequenceClassification.pretrained("setfit_jt", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|setfit_jt| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.0 MB| + +## References + +https://huggingface.co/akswasti/setfit-jt \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-setfit_jt_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-setfit_jt_pipeline_en.md new file mode 100644 index 00000000000000..99d5360b5287a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-setfit_jt_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English setfit_jt_pipeline pipeline MPNetForSequenceClassification from akswasti +author: John Snow Labs +name: setfit_jt_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`setfit_jt_pipeline` is a English model originally trained by akswasti. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/setfit_jt_pipeline_en_5.5.0_3.0_1726824649926.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/setfit_jt_pipeline_en_5.5.0_3.0_1726824649926.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("setfit_jt_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("setfit_jt_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
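For quick checks on a few strings, the same pretrained pipeline can also be called without building a DataFrame first. A minimal sketch, assuming the `pipeline` object created above (the exact dictionary keys depend on the output columns of the pipeline's stages):

```python
# annotate() runs the full pipeline on a plain string and returns a dict of annotator outputs
light_result = pipeline.annotate("I love spark-nlp")
print(light_result)
```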
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|setfit_jt_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.1 MB| + +## References + +https://huggingface.co/akswasti/setfit-jt + +## Included Models + +- DocumentAssembler +- TokenizerModel +- MPNetForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-snli_roberta_large_seed_1_en.md b/docs/_posts/ahmedlone127/2024-09-20-snli_roberta_large_seed_1_en.md new file mode 100644 index 00000000000000..089d79a0d083fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-snli_roberta_large_seed_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English snli_roberta_large_seed_1 RoBertaForSequenceClassification from utahnlp +author: John Snow Labs +name: snli_roberta_large_seed_1 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`snli_roberta_large_seed_1` is a English model originally trained by utahnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/snli_roberta_large_seed_1_en_5.5.0_3.0_1726851711061.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/snli_roberta_large_seed_1_en_5.5.0_3.0_1726851711061.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Convert the raw text column into Spark NLP document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; it consumes the document and token columns
sequenceClassifier = RoBertaForSequenceClassification.pretrained("snli_roberta_large_seed_1","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("snli_roberta_large_seed_1", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|snli_roberta_large_seed_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/utahnlp/snli_roberta-large_seed-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-snli_roberta_large_seed_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-snli_roberta_large_seed_1_pipeline_en.md new file mode 100644 index 00000000000000..a8a13b654970fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-snli_roberta_large_seed_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English snli_roberta_large_seed_1_pipeline pipeline RoBertaForSequenceClassification from utahnlp +author: John Snow Labs +name: snli_roberta_large_seed_1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`snli_roberta_large_seed_1_pipeline` is a English model originally trained by utahnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/snli_roberta_large_seed_1_pipeline_en_5.5.0_3.0_1726851788290.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/snli_roberta_large_seed_1_pipeline_en_5.5.0_3.0_1726851788290.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("snli_roberta_large_seed_1_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("snli_roberta_large_seed_1_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|snli_roberta_large_seed_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/utahnlp/snli_roberta-large_seed-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-spea_2_en.md b/docs/_posts/ahmedlone127/2024-09-20-spea_2_en.md new file mode 100644 index 00000000000000..fdc59700bbb1c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-spea_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English spea_2 RoBertaForSequenceClassification from BaronSch +author: John Snow Labs +name: spea_2 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spea_2` is a English model originally trained by BaronSch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spea_2_en_5.5.0_3.0_1726849670024.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spea_2_en_5.5.0_3.0_1726849670024.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Convert the raw text column into Spark NLP document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; it consumes the document and token columns
sequenceClassifier = RoBertaForSequenceClassification.pretrained("spea_2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("spea_2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spea_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/BaronSch/Spea_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-spea_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-spea_2_pipeline_en.md new file mode 100644 index 00000000000000..a2c9fe7500ae77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-spea_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English spea_2_pipeline pipeline RoBertaForSequenceClassification from BaronSch +author: John Snow Labs +name: spea_2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spea_2_pipeline` is a English model originally trained by BaronSch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spea_2_pipeline_en_5.5.0_3.0_1726849692853.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spea_2_pipeline_en_5.5.0_3.0_1726849692853.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("spea_2_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("spea_2_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spea_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/BaronSch/Spea_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-splade_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-splade_roberta_pipeline_en.md new file mode 100644 index 00000000000000..f783e56495e26b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-splade_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English splade_roberta_pipeline pipeline RoBertaEmbeddings from maximedb +author: John Snow Labs +name: splade_roberta_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`splade_roberta_pipeline` is a English model originally trained by maximedb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/splade_roberta_pipeline_en_5.5.0_3.0_1726796612748.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/splade_roberta_pipeline_en_5.5.0_3.0_1726796612748.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("splade_roberta_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("splade_roberta_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|splade_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/maximedb/splade-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sst2_padding20model_en.md b/docs/_posts/ahmedlone127/2024-09-20-sst2_padding20model_en.md new file mode 100644 index 00000000000000..98cbfe37f5babe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sst2_padding20model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sst2_padding20model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: sst2_padding20model +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sst2_padding20model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sst2_padding20model_en_5.5.0_3.0_1726823468438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sst2_padding20model_en_5.5.0_3.0_1726823468438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Convert the raw text column into Spark NLP document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; it consumes the document and token columns
sequenceClassifier = DistilBertForSequenceClassification.pretrained("sst2_padding20model","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sst2_padding20model", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sst2_padding20model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/sst2_padding20model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sst2_padding20model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-sst2_padding20model_pipeline_en.md new file mode 100644 index 00000000000000..eabe84040ba74a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sst2_padding20model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sst2_padding20model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: sst2_padding20model_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sst2_padding20model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sst2_padding20model_pipeline_en_5.5.0_3.0_1726823480739.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sst2_padding20model_pipeline_en_5.5.0_3.0_1726823480739.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sst2_padding20model_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sst2_padding20model_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sst2_padding20model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/sst2_padding20model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sst5_padding100model_en.md b/docs/_posts/ahmedlone127/2024-09-20-sst5_padding100model_en.md new file mode 100644 index 00000000000000..564883377f36bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sst5_padding100model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sst5_padding100model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: sst5_padding100model +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sst5_padding100model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sst5_padding100model_en_5.5.0_3.0_1726848683901.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sst5_padding100model_en_5.5.0_3.0_1726848683901.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Convert the raw text column into Spark NLP document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; it consumes the document and token columns
sequenceClassifier = DistilBertForSequenceClassification.pretrained("sst5_padding100model","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sst5_padding100model", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sst5_padding100model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/sst5_padding100model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-stego_classifier_checkpoint_epoch_10_2024_07_26_16_03_28_en.md b/docs/_posts/ahmedlone127/2024-09-20-stego_classifier_checkpoint_epoch_10_2024_07_26_16_03_28_en.md new file mode 100644 index 00000000000000..7135f58a360700 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-stego_classifier_checkpoint_epoch_10_2024_07_26_16_03_28_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_10_2024_07_26_16_03_28 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_10_2024_07_26_16_03_28 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_10_2024_07_26_16_03_28` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_10_2024_07_26_16_03_28_en_5.5.0_3.0_1726792176186.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_10_2024_07_26_16_03_28_en_5.5.0_3.0_1726792176186.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Convert the raw text column into Spark NLP document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; it consumes the document and token columns
sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_10_2024_07_26_16_03_28","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_10_2024_07_26_16_03_28", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_10_2024_07_26_16_03_28| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-10-2024-07-26_16-03-28 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-stego_classifier_checkpoint_epoch_10_2024_07_26_16_03_28_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-stego_classifier_checkpoint_epoch_10_2024_07_26_16_03_28_pipeline_en.md new file mode 100644 index 00000000000000..8981922090cbe8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-stego_classifier_checkpoint_epoch_10_2024_07_26_16_03_28_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_10_2024_07_26_16_03_28_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_10_2024_07_26_16_03_28_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_10_2024_07_26_16_03_28_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_10_2024_07_26_16_03_28_pipeline_en_5.5.0_3.0_1726792188595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_10_2024_07_26_16_03_28_pipeline_en_5.5.0_3.0_1726792188595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("stego_classifier_checkpoint_epoch_10_2024_07_26_16_03_28_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("stego_classifier_checkpoint_epoch_10_2024_07_26_16_03_28_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_10_2024_07_26_16_03_28_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-10-2024-07-26_16-03-28 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-stego_classifier_checkpoint_epoch_90_2024_07_26_16_03_28_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-stego_classifier_checkpoint_epoch_90_2024_07_26_16_03_28_pipeline_en.md new file mode 100644 index 00000000000000..08739e8c8e8bd6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-stego_classifier_checkpoint_epoch_90_2024_07_26_16_03_28_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_90_2024_07_26_16_03_28_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_90_2024_07_26_16_03_28_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_90_2024_07_26_16_03_28_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_90_2024_07_26_16_03_28_pipeline_en_5.5.0_3.0_1726792390324.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_90_2024_07_26_16_03_28_pipeline_en_5.5.0_3.0_1726792390324.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("stego_classifier_checkpoint_epoch_90_2024_07_26_16_03_28_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("stego_classifier_checkpoint_epoch_90_2024_07_26_16_03_28_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_90_2024_07_26_16_03_28_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-90-2024-07-26_16-03-28 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-stereotype_italian_it.md b/docs/_posts/ahmedlone127/2024-09-20-stereotype_italian_it.md new file mode 100644 index 00000000000000..aa2062f3c0f1f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-stereotype_italian_it.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Italian stereotype_italian BertForSequenceClassification from aequa-tech +author: John Snow Labs +name: stereotype_italian +date: 2024-09-20 +tags: [it, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stereotype_italian` is a Italian model originally trained by aequa-tech. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stereotype_italian_it_5.5.0_3.0_1726859952233.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stereotype_italian_it_5.5.0_3.0_1726859952233.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Convert the raw text column into Spark NLP document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; it consumes the document and token columns
sequenceClassifier = BertForSequenceClassification.pretrained("stereotype_italian","it") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("stereotype_italian", "it")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
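When single sentences need to be scored with low latency, the fitted pipeline can be wrapped in a LightPipeline instead of calling transform on a DataFrame. A minimal sketch, assuming the `pipelineModel` fitted above; the Italian sentence is only an illustrative input:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# Returns a dict keyed by output column name; "class" holds the predicted label
annotations = light.annotate("Questa è una frase di esempio.")
print(annotations["class"])
```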
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stereotype_italian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|it| +|Size:|691.9 MB| + +## References + +https://huggingface.co/aequa-tech/stereotype-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-stereotype_italian_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-20-stereotype_italian_pipeline_it.md new file mode 100644 index 00000000000000..99806185836de4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-stereotype_italian_pipeline_it.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Italian stereotype_italian_pipeline pipeline BertForSequenceClassification from aequa-tech +author: John Snow Labs +name: stereotype_italian_pipeline +date: 2024-09-20 +tags: [it, open_source, pipeline, onnx] +task: Text Classification +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stereotype_italian_pipeline` is a Italian model originally trained by aequa-tech. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stereotype_italian_pipeline_it_5.5.0_3.0_1726859987957.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stereotype_italian_pipeline_it_5.5.0_3.0_1726859987957.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("stereotype_italian_pipeline", lang = "it")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("stereotype_italian_pipeline", lang = "it")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stereotype_italian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|691.9 MB| + +## References + +https://huggingface.co/aequa-tech/stereotype-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-stress_mentalbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-stress_mentalbert_pipeline_en.md new file mode 100644 index 00000000000000..00500657e7ed4d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-stress_mentalbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stress_mentalbert_pipeline pipeline BertForSequenceClassification from tiya1012 +author: John Snow Labs +name: stress_mentalbert_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stress_mentalbert_pipeline` is a English model originally trained by tiya1012. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stress_mentalbert_pipeline_en_5.5.0_3.0_1726803968928.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stress_mentalbert_pipeline_en_5.5.0_3.0_1726803968928.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("stress_mentalbert_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("stress_mentalbert_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stress_mentalbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.9 MB| + +## References + +https://huggingface.co/tiya1012/stress_mentalbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-subjectivity_detection_for_chatgpt_sentiment_en.md b/docs/_posts/ahmedlone127/2024-09-20-subjectivity_detection_for_chatgpt_sentiment_en.md new file mode 100644 index 00000000000000..678cf91d090690 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-subjectivity_detection_for_chatgpt_sentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English subjectivity_detection_for_chatgpt_sentiment RoBertaForSequenceClassification from Re0x10 +author: John Snow Labs +name: subjectivity_detection_for_chatgpt_sentiment +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`subjectivity_detection_for_chatgpt_sentiment` is a English model originally trained by Re0x10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/subjectivity_detection_for_chatgpt_sentiment_en_5.5.0_3.0_1726850334101.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/subjectivity_detection_for_chatgpt_sentiment_en_5.5.0_3.0_1726850334101.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Convert the raw text column into Spark NLP document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; it consumes the document and token columns
sequenceClassifier = RoBertaForSequenceClassification.pretrained("subjectivity_detection_for_chatgpt_sentiment","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("subjectivity_detection_for_chatgpt_sentiment", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|subjectivity_detection_for_chatgpt_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/Re0x10/subjectivity-detection-for-ChatGPT-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-subjectivity_detection_for_chatgpt_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-subjectivity_detection_for_chatgpt_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..bea3d7cc2a6e4d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-subjectivity_detection_for_chatgpt_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English subjectivity_detection_for_chatgpt_sentiment_pipeline pipeline RoBertaForSequenceClassification from Re0x10 +author: John Snow Labs +name: subjectivity_detection_for_chatgpt_sentiment_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`subjectivity_detection_for_chatgpt_sentiment_pipeline` is a English model originally trained by Re0x10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/subjectivity_detection_for_chatgpt_sentiment_pipeline_en_5.5.0_3.0_1726850356761.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/subjectivity_detection_for_chatgpt_sentiment_pipeline_en_5.5.0_3.0_1726850356761.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("subjectivity_detection_for_chatgpt_sentiment_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("subjectivity_detection_for_chatgpt_sentiment_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|subjectivity_detection_for_chatgpt_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/Re0x10/subjectivity-detection-for-ChatGPT-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-symptom_ner_en.md b/docs/_posts/ahmedlone127/2024-09-20-symptom_ner_en.md new file mode 100644 index 00000000000000..44aa2b5cc03632 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-symptom_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English symptom_ner RoBertaForTokenClassification from biololab +author: John Snow Labs +name: symptom_ner +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`symptom_ner` is a English model originally trained by biololab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/symptom_ner_en_5.5.0_3.0_1726853492980.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/symptom_ner_en_5.5.0_3.0_1726853492980.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Convert the raw text column into Spark NLP document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained token classifier; it tags each token in the input
tokenClassifier = RoBertaForTokenClassification.pretrained("symptom_ner","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("symptom_ner", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
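The token classifier emits one tag per token in the `ner` column. If entity spans rather than raw tags are needed, a NerConverter stage can be appended to the same pipeline, reusing the column names set above. A minimal sketch of that extension:

```python
from sparknlp.annotator import NerConverter

# Groups IOB-tagged tokens into entity chunks (for example, multi-token symptom mentions)
converter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, converter])
```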
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|symptom_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|433.3 MB| + +## References + +https://huggingface.co/biololab/symptom_ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-symptom_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-symptom_ner_pipeline_en.md new file mode 100644 index 00000000000000..12cb6fa5d7182b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-symptom_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English symptom_ner_pipeline pipeline RoBertaForTokenClassification from biololab +author: John Snow Labs +name: symptom_ner_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`symptom_ner_pipeline` is a English model originally trained by biololab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/symptom_ner_pipeline_en_5.5.0_3.0_1726853523152.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/symptom_ner_pipeline_en_5.5.0_3.0_1726853523152.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("symptom_ner_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("symptom_ner_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|symptom_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|433.3 MB| + +## References + +https://huggingface.co/biololab/symptom_ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-t_100002_en.md b/docs/_posts/ahmedlone127/2024-09-20-t_100002_en.md new file mode 100644 index 00000000000000..ea45a4421425bf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-t_100002_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English t_100002 RoBertaForSequenceClassification from Pablojmed +author: John Snow Labs +name: t_100002 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t_100002` is a English model originally trained by Pablojmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t_100002_en_5.5.0_3.0_1726852190915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t_100002_en_5.5.0_3.0_1726852190915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Convert the raw text column into Spark NLP document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; it consumes the document and token columns
sequenceClassifier = RoBertaForSequenceClassification.pretrained("t_100002","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("t_100002", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t_100002| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|438.2 MB| + +## References + +https://huggingface.co/Pablojmed/t_100002 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-t_100002_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-t_100002_pipeline_en.md new file mode 100644 index 00000000000000..7cc87c45c76c60 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-t_100002_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English t_100002_pipeline pipeline RoBertaForSequenceClassification from Pablojmed +author: John Snow Labs +name: t_100002_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t_100002_pipeline` is a English model originally trained by Pablojmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t_100002_pipeline_en_5.5.0_3.0_1726852221483.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t_100002_pipeline_en_5.5.0_3.0_1726852221483.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("t_100002_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be passed straight to the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("t_100002_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t_100002_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.2 MB| + +## References + +https://huggingface.co/Pablojmed/t_100002 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-test1_onlysmokehuazi_en.md b/docs/_posts/ahmedlone127/2024-09-20-test1_onlysmokehuazi_en.md new file mode 100644 index 00000000000000..5ac99e7ffb5c98 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-test1_onlysmokehuazi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test1_onlysmokehuazi DistilBertForSequenceClassification from Onlysmokehuazi +author: John Snow Labs +name: test1_onlysmokehuazi +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test1_onlysmokehuazi` is a English model originally trained by Onlysmokehuazi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test1_onlysmokehuazi_en_5.5.0_3.0_1726832506501.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test1_onlysmokehuazi_en_5.5.0_3.0_1726832506501.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Convert the raw text column into Spark NLP document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; it consumes the document and token columns
sequenceClassifier = DistilBertForSequenceClassification.pretrained("test1_onlysmokehuazi","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test1_onlysmokehuazi", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test1_onlysmokehuazi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Onlysmokehuazi/test1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-test1_onlysmokehuazi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-test1_onlysmokehuazi_pipeline_en.md new file mode 100644 index 00000000000000..337443d4c0e1ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-test1_onlysmokehuazi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test1_onlysmokehuazi_pipeline pipeline DistilBertForSequenceClassification from Onlysmokehuazi +author: John Snow Labs +name: test1_onlysmokehuazi_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test1_onlysmokehuazi_pipeline` is a English model originally trained by Onlysmokehuazi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test1_onlysmokehuazi_pipeline_en_5.5.0_3.0_1726832519545.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test1_onlysmokehuazi_pipeline_en_5.5.0_3.0_1726832519545.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test1_onlysmokehuazi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test1_onlysmokehuazi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
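
For quick, single-string checks the same pretrained pipeline can be used without building a DataFrame; a small sketch (the example sentence is illustrative):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("test1_onlysmokehuazi_pipeline", lang="en")

# annotate() runs the whole pipeline on one string and returns a dict
# keyed by the pipeline's output columns (the classifier writes "class")
result = pipeline.annotate("I love spark-nlp")
print(result["class"])
```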
+
{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|test1_onlysmokehuazi_pipeline|
|Type:|pipeline|
|Compatibility:|Spark NLP 5.5.0+|
|License:|Open Source|
|Edition:|Official|
|Language:|en|
|Size:|249.5 MB|

## References

https://huggingface.co/Onlysmokehuazi/test1

## Included Models

- DocumentAssembler
- TokenizerModel
- DistilBertForSequenceClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-09-20-test2_en.md b/docs/_posts/ahmedlone127/2024-09-20-test2_en.md
new file mode 100644
index 00000000000000..5f9702b3d0465c
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-09-20-test2_en.md
@@ -0,0 +1,92 @@
+---
+layout: model
+title: English test2 DistilBertForTokenClassification from yam1ke
+author: John Snow Labs
+name: test2
+date: 2024-09-20
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.5.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: DistilBertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `test2` is an English model originally trained by yam1ke.
+
+## Predicted Entities
+
+
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test2_en_5.5.0_3.0_1726812626002.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test2_en_5.5.0_3.0_1726812626002.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = DistilBertForTokenClassification.pretrained("test2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])

data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipelineModel = pipeline.fit(data)

pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val tokenClassifier = DistilBertForTokenClassification
    .pretrained("test2", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))

val data = Seq("I love spark-nlp").toDS.toDF("text")

val pipelineModel = pipeline.fit(data)

val pipelineDF = pipelineModel.transform(data)
```
</div>
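
Because this is a token classifier, each token receives its own tag in the `ner` column; a brief sketch of viewing tokens next to their predicted tags after running the pipeline above:

```python
# Tokens and their predicted tags are parallel arrays per row
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```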
+
{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|test2|
|Compatibility:|Spark NLP 5.5.0+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[document, token]|
|Output Labels:|[ner]|
|Language:|en|
|Size:|1.7 GB|

## References

https://huggingface.co/yam1ke/test2
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-09-20-test_model_name_en.md b/docs/_posts/ahmedlone127/2024-09-20-test_model_name_en.md
new file mode 100644
index 00000000000000..15f7a68ffd7ff5
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-09-20-test_model_name_en.md
@@ -0,0 +1,94 @@
+---
+layout: model
+title: English test_model_name DistilBertForSequenceClassification from lingaying
+author: John Snow Labs
+name: test_model_name
+date: 2024-09-20
+tags: [en, open_source, onnx, sequence_classification, distilbert]
+task: Text Classification
+language: en
+edition: Spark NLP 5.5.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: DistilBertForSequenceClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `test_model_name` is an English model originally trained by lingaying.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_model_name_en_5.5.0_3.0_1726848654259.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_model_name_en_5.5.0_3.0_1726848654259.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_model_name","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_model_name", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_model_name| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lingaying/test_model_name \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-test_model_name_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-test_model_name_pipeline_en.md new file mode 100644 index 00000000000000..66ea4e47ee91d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-test_model_name_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_model_name_pipeline pipeline DistilBertForSequenceClassification from lingaying +author: John Snow Labs +name: test_model_name_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_model_name_pipeline` is a English model originally trained by lingaying. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_model_name_pipeline_en_5.5.0_3.0_1726848666312.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_model_name_pipeline_en_5.5.0_3.0_1726848666312.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_model_name_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_model_name_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_model_name_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lingaying/test_model_name + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-test_trainer_raghavsharma06_en.md b/docs/_posts/ahmedlone127/2024-09-20-test_trainer_raghavsharma06_en.md new file mode 100644 index 00000000000000..174bf824769b9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-test_trainer_raghavsharma06_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_trainer_raghavsharma06 DistilBertForSequenceClassification from raghavsharma06 +author: John Snow Labs +name: test_trainer_raghavsharma06 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_trainer_raghavsharma06` is a English model originally trained by raghavsharma06. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_trainer_raghavsharma06_en_5.5.0_3.0_1726861223495.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_trainer_raghavsharma06_en_5.5.0_3.0_1726861223495.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_trainer_raghavsharma06","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_trainer_raghavsharma06", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_trainer_raghavsharma06| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/raghavsharma06/test_trainer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-test_trainer_raghavsharma06_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-test_trainer_raghavsharma06_pipeline_en.md new file mode 100644 index 00000000000000..216994b3e94817 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-test_trainer_raghavsharma06_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_trainer_raghavsharma06_pipeline pipeline DistilBertForSequenceClassification from raghavsharma06 +author: John Snow Labs +name: test_trainer_raghavsharma06_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_trainer_raghavsharma06_pipeline` is a English model originally trained by raghavsharma06. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_trainer_raghavsharma06_pipeline_en_5.5.0_3.0_1726861235212.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_trainer_raghavsharma06_pipeline_en_5.5.0_3.0_1726861235212.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_trainer_raghavsharma06_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_trainer_raghavsharma06_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_trainer_raghavsharma06_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/raghavsharma06/test_trainer + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-test_whisper_tiny_thai_kwanchiva_en.md b/docs/_posts/ahmedlone127/2024-09-20-test_whisper_tiny_thai_kwanchiva_en.md new file mode 100644 index 00000000000000..cfa7502ed6e140 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-test_whisper_tiny_thai_kwanchiva_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English test_whisper_tiny_thai_kwanchiva WhisperForCTC from kwanchiva +author: John Snow Labs +name: test_whisper_tiny_thai_kwanchiva +date: 2024-09-20 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_whisper_tiny_thai_kwanchiva` is a English model originally trained by kwanchiva. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_whisper_tiny_thai_kwanchiva_en_5.5.0_3.0_1726813864712.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_whisper_tiny_thai_kwanchiva_en_5.5.0_3.0_1726813864712.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# "data" is assumed to be a DataFrame with an "audio_content" column
# holding the raw audio samples (an array of floats per row)
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("test_whisper_tiny_thai_kwanchiva","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
// "data" is assumed to be a DataFrame with an "audio_content" column
// holding the raw audio samples (an Array[Float] per row)
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("test_whisper_tiny_thai_kwanchiva", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
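
The `data` DataFrame in the example above needs a column of raw audio samples rather than text. A sketch of building one from a local file, assuming the `librosa` library and a 16 kHz mono recording (file name and library are illustrative assumptions):

```python
import librosa

# Whisper checkpoints expect 16 kHz mono audio
audio, _ = librosa.load("sample.wav", sr=16000)

# AudioAssembler reads the float array from the "audio_content" column
data = spark.createDataFrame([[audio.tolist()]]).toDF("audio_content")

result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```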
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_whisper_tiny_thai_kwanchiva| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/kwanchiva/test-whisper-tiny-th \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-test_whisper_tiny_thai_kwanchiva_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-test_whisper_tiny_thai_kwanchiva_pipeline_en.md new file mode 100644 index 00000000000000..b8b8a012f10d8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-test_whisper_tiny_thai_kwanchiva_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English test_whisper_tiny_thai_kwanchiva_pipeline pipeline WhisperForCTC from kwanchiva +author: John Snow Labs +name: test_whisper_tiny_thai_kwanchiva_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_whisper_tiny_thai_kwanchiva_pipeline` is a English model originally trained by kwanchiva. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_whisper_tiny_thai_kwanchiva_pipeline_en_5.5.0_3.0_1726813885848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_whisper_tiny_thai_kwanchiva_pipeline_en_5.5.0_3.0_1726813885848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_whisper_tiny_thai_kwanchiva_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_whisper_tiny_thai_kwanchiva_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_whisper_tiny_thai_kwanchiva_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/kwanchiva/test-whisper-tiny-th + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-test_whisper_tiny_thai_sipang_en.md b/docs/_posts/ahmedlone127/2024-09-20-test_whisper_tiny_thai_sipang_en.md new file mode 100644 index 00000000000000..09d0170a46642a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-test_whisper_tiny_thai_sipang_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English test_whisper_tiny_thai_sipang WhisperForCTC from Sipang +author: John Snow Labs +name: test_whisper_tiny_thai_sipang +date: 2024-09-20 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_whisper_tiny_thai_sipang` is a English model originally trained by Sipang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_whisper_tiny_thai_sipang_en_5.5.0_3.0_1726874307654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_whisper_tiny_thai_sipang_en_5.5.0_3.0_1726874307654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# "data" is assumed to be a DataFrame with an "audio_content" column
# holding the raw audio samples (an array of floats per row)
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("test_whisper_tiny_thai_sipang","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
// "data" is assumed to be a DataFrame with an "audio_content" column
// holding the raw audio samples (an Array[Float] per row)
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("test_whisper_tiny_thai_sipang", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_whisper_tiny_thai_sipang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/Sipang/test-whisper-tiny-th \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-test_whisper_tiny_thai_sipang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-test_whisper_tiny_thai_sipang_pipeline_en.md new file mode 100644 index 00000000000000..9f3949d215fb58 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-test_whisper_tiny_thai_sipang_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English test_whisper_tiny_thai_sipang_pipeline pipeline WhisperForCTC from Sipang +author: John Snow Labs +name: test_whisper_tiny_thai_sipang_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_whisper_tiny_thai_sipang_pipeline` is a English model originally trained by Sipang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_whisper_tiny_thai_sipang_pipeline_en_5.5.0_3.0_1726874333460.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_whisper_tiny_thai_sipang_pipeline_en_5.5.0_3.0_1726874333460.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_whisper_tiny_thai_sipang_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_whisper_tiny_thai_sipang_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_whisper_tiny_thai_sipang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/Sipang/test-whisper-tiny-th + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-tester2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-tester2_pipeline_en.md new file mode 100644 index 00000000000000..89ffe4bd1fb5d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-tester2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tester2_pipeline pipeline DistilBertForSequenceClassification from thomasbeetz +author: John Snow Labs +name: tester2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tester2_pipeline` is a English model originally trained by thomasbeetz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tester2_pipeline_en_5.5.0_3.0_1726792101011.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tester2_pipeline_en_5.5.0_3.0_1726792101011.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tester2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tester2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tester2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thomasbeetz/tester2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-testing_model_isom5240sp24_en.md b/docs/_posts/ahmedlone127/2024-09-20-testing_model_isom5240sp24_en.md new file mode 100644 index 00000000000000..d1046ecb9aaff9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-testing_model_isom5240sp24_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English testing_model_isom5240sp24 DistilBertForSequenceClassification from isom5240sp24 +author: John Snow Labs +name: testing_model_isom5240sp24 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`testing_model_isom5240sp24` is a English model originally trained by isom5240sp24. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/testing_model_isom5240sp24_en_5.5.0_3.0_1726871777088.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/testing_model_isom5240sp24_en_5.5.0_3.0_1726871777088.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("testing_model_isom5240sp24","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("testing_model_isom5240sp24", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
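
The fitted pipeline can be persisted and reloaded with the standard Spark ML API, which is often more convenient than re-downloading the model; a sketch with an illustrative path:

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline (path is an example)
pipelineModel.write().overwrite().save("/tmp/testing_model_isom5240sp24_pipeline")

# Reload it later and reuse it directly
restored = PipelineModel.load("/tmp/testing_model_isom5240sp24_pipeline")
restored.transform(data).select("class.result").show(truncate=False)
```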
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|testing_model_isom5240sp24| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/isom5240sp24/testing_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-testing_model_isom5240sp24_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-testing_model_isom5240sp24_pipeline_en.md new file mode 100644 index 00000000000000..f7bb1f995e21ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-testing_model_isom5240sp24_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English testing_model_isom5240sp24_pipeline pipeline DistilBertForSequenceClassification from isom5240sp24 +author: John Snow Labs +name: testing_model_isom5240sp24_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`testing_model_isom5240sp24_pipeline` is a English model originally trained by isom5240sp24. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/testing_model_isom5240sp24_pipeline_en_5.5.0_3.0_1726871788612.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/testing_model_isom5240sp24_pipeline_en_5.5.0_3.0_1726871788612.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("testing_model_isom5240sp24_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("testing_model_isom5240sp24_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|testing_model_isom5240sp24_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/isom5240sp24/testing_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-testxlmmulti3_en.md b/docs/_posts/ahmedlone127/2024-09-20-testxlmmulti3_en.md new file mode 100644 index 00000000000000..6f4e27a1f9c621 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-testxlmmulti3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English testxlmmulti3 XlmRoBertaForSequenceClassification from sheduele +author: John Snow Labs +name: testxlmmulti3 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`testxlmmulti3` is a English model originally trained by sheduele. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/testxlmmulti3_en_5.5.0_3.0_1726845583467.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/testxlmmulti3_en_5.5.0_3.0_1726845583467.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("testxlmmulti3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("testxlmmulti3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|testxlmmulti3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|834.3 MB| + +## References + +https://huggingface.co/sheduele/testXLMMULTI3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-testxlmmulti3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-testxlmmulti3_pipeline_en.md new file mode 100644 index 00000000000000..30646a588266a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-testxlmmulti3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English testxlmmulti3_pipeline pipeline XlmRoBertaForSequenceClassification from sheduele +author: John Snow Labs +name: testxlmmulti3_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`testxlmmulti3_pipeline` is a English model originally trained by sheduele. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/testxlmmulti3_pipeline_en_5.5.0_3.0_1726845690127.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/testxlmmulti3_pipeline_en_5.5.0_3.0_1726845690127.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("testxlmmulti3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("testxlmmulti3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|testxlmmulti3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|834.3 MB| + +## References + +https://huggingface.co/sheduele/testXLMMULTI3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-text_classification_100_en.md b/docs/_posts/ahmedlone127/2024-09-20-text_classification_100_en.md new file mode 100644 index 00000000000000..3005a7302f6323 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-text_classification_100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English text_classification_100 DistilBertForSequenceClassification from Neroism8422 +author: John Snow Labs +name: text_classification_100 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_classification_100` is a English model originally trained by Neroism8422. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_classification_100_en_5.5.0_3.0_1726841339097.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_classification_100_en_5.5.0_3.0_1726841339097.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_classification_100","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_classification_100", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
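
On larger datasets, throughput is mostly governed by how many rows the transformer scores per inference call; Spark NLP classifiers expose this via `setBatchSize`. A sketch reusing the classifier defined above (the value 16 is only illustrative):

```python
# Larger batches trade memory for speed; tune to the available hardware
sequenceClassifier = sequenceClassifier.setBatchSize(16)

pipelineDF = pipeline.fit(data).transform(data)
```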
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_classification_100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Neroism8422/Text_Classification_100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-text_classification_100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-text_classification_100_pipeline_en.md new file mode 100644 index 00000000000000..8291bd82766d2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-text_classification_100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English text_classification_100_pipeline pipeline DistilBertForSequenceClassification from Neroism8422 +author: John Snow Labs +name: text_classification_100_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_classification_100_pipeline` is a English model originally trained by Neroism8422. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_classification_100_pipeline_en_5.5.0_3.0_1726841351309.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_classification_100_pipeline_en_5.5.0_3.0_1726841351309.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("text_classification_100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("text_classification_100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_classification_100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Neroism8422/Text_Classification_100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-text_clf_en.md b/docs/_posts/ahmedlone127/2024-09-20-text_clf_en.md new file mode 100644 index 00000000000000..cabf2b28816135 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-text_clf_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English text_clf DistilBertForSequenceClassification from SLKpnu +author: John Snow Labs +name: text_clf +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_clf` is a English model originally trained by SLKpnu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_clf_en_5.5.0_3.0_1726823798299.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_clf_en_5.5.0_3.0_1726823798299.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_clf","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_clf", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
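
For low-latency scoring of individual documents, the fitted pipeline can be wrapped in a `LightPipeline`, which runs the annotators without launching a full Spark job per request; a minimal sketch:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# Returns a dict keyed by the pipeline's output columns (e.g. "class")
print(light.annotate("I love spark-nlp"))
```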
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_clf| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/SLKpnu/text_clf \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-text_clf_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-text_clf_pipeline_en.md new file mode 100644 index 00000000000000..71db10b59bee93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-text_clf_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English text_clf_pipeline pipeline DistilBertForSequenceClassification from SLKpnu +author: John Snow Labs +name: text_clf_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_clf_pipeline` is a English model originally trained by SLKpnu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_clf_pipeline_en_5.5.0_3.0_1726823810450.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_clf_pipeline_en_5.5.0_3.0_1726823810450.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("text_clf_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("text_clf_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_clf_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/SLKpnu/text_clf + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-tmp_trainer_rajendrabaskota_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-tmp_trainer_rajendrabaskota_pipeline_en.md new file mode 100644 index 00000000000000..0fe6d2efd02f8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-tmp_trainer_rajendrabaskota_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tmp_trainer_rajendrabaskota_pipeline pipeline RoBertaForSequenceClassification from rajendrabaskota +author: John Snow Labs +name: tmp_trainer_rajendrabaskota_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tmp_trainer_rajendrabaskota_pipeline` is a English model originally trained by rajendrabaskota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tmp_trainer_rajendrabaskota_pipeline_en_5.5.0_3.0_1726804491369.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tmp_trainer_rajendrabaskota_pipeline_en_5.5.0_3.0_1726804491369.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tmp_trainer_rajendrabaskota_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tmp_trainer_rajendrabaskota_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tmp_trainer_rajendrabaskota_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|457.4 MB| + +## References + +https://huggingface.co/rajendrabaskota/tmp_trainer + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-tmp_trainer_vicman229_en.md b/docs/_posts/ahmedlone127/2024-09-20-tmp_trainer_vicman229_en.md new file mode 100644 index 00000000000000..aebc6e649d4780 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-tmp_trainer_vicman229_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tmp_trainer_vicman229 DistilBertForSequenceClassification from Vicman229 +author: John Snow Labs +name: tmp_trainer_vicman229 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tmp_trainer_vicman229` is a English model originally trained by Vicman229. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tmp_trainer_vicman229_en_5.5.0_3.0_1726823748924.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tmp_trainer_vicman229_en_5.5.0_3.0_1726823748924.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("tmp_trainer_vicman229","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("tmp_trainer_vicman229", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tmp_trainer_vicman229| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Vicman229/tmp_trainer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-topic_topic_random0_seed0_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-20-topic_topic_random0_seed0_bernice_en.md new file mode 100644 index 00000000000000..99281648ef9ec5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-topic_topic_random0_seed0_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English topic_topic_random0_seed0_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random0_seed0_bernice +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random0_seed0_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random0_seed0_bernice_en_5.5.0_3.0_1726865253086.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random0_seed0_bernice_en_5.5.0_3.0_1726865253086.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("topic_topic_random0_seed0_bernice","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("topic_topic_random0_seed0_bernice", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random0_seed0_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|805.6 MB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random0_seed0-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-topic_topic_random0_seed0_bernice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-topic_topic_random0_seed0_bernice_pipeline_en.md new file mode 100644 index 00000000000000..cde9c03aa28d98 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-topic_topic_random0_seed0_bernice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English topic_topic_random0_seed0_bernice_pipeline pipeline XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random0_seed0_bernice_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random0_seed0_bernice_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random0_seed0_bernice_pipeline_en_5.5.0_3.0_1726865387882.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random0_seed0_bernice_pipeline_en_5.5.0_3.0_1726865387882.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("topic_topic_random0_seed0_bernice_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("topic_topic_random0_seed0_bernice_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
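+
+For scoring a handful of strings without building a DataFrame, `PretrainedPipeline` also exposes `annotate()`. A small sketch; the keys of the returned dictionary depend on the pipeline's output columns:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("topic_topic_random0_seed0_bernice_pipeline", lang="en")
+# annotate() runs the whole pipeline on a single string and returns a plain dict
+print(pipeline.annotate("I love spark-nlp"))
+```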
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random0_seed0_bernice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|805.7 MB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random0_seed0-bernice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-topic_topic_random1_seed0_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-20-topic_topic_random1_seed0_bernice_en.md new file mode 100644 index 00000000000000..1610008a4bdf15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-topic_topic_random1_seed0_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English topic_topic_random1_seed0_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random1_seed0_bernice +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random1_seed0_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random1_seed0_bernice_en_5.5.0_3.0_1726872646660.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random1_seed0_bernice_en_5.5.0_3.0_1726872646660.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("topic_topic_random1_seed0_bernice","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("topic_topic_random1_seed0_bernice", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random1_seed0_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|805.5 MB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random1_seed0-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-topic_topic_random1_seed0_bernice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-topic_topic_random1_seed0_bernice_pipeline_en.md new file mode 100644 index 00000000000000..c88d6011c35c96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-topic_topic_random1_seed0_bernice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English topic_topic_random1_seed0_bernice_pipeline pipeline XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random1_seed0_bernice_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random1_seed0_bernice_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random1_seed0_bernice_pipeline_en_5.5.0_3.0_1726872779836.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random1_seed0_bernice_pipeline_en_5.5.0_3.0_1726872779836.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("topic_topic_random1_seed0_bernice_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("topic_topic_random1_seed0_bernice_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random1_seed0_bernice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|805.5 MB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random1_seed0-bernice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-toxic_detection_en.md b/docs/_posts/ahmedlone127/2024-09-20-toxic_detection_en.md new file mode 100644 index 00000000000000..f7f2f8f8505ede --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-toxic_detection_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English toxic_detection XlmRoBertaForSequenceClassification from sonnv +author: John Snow Labs +name: toxic_detection +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`toxic_detection` is a English model originally trained by sonnv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/toxic_detection_en_5.5.0_3.0_1726873454516.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/toxic_detection_en_5.5.0_3.0_1726873454516.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("toxic_detection","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("toxic_detection", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|toxic_detection| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|693.4 MB| + +## References + +https://huggingface.co/sonnv/toxic_detection \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-toxic_detection_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-toxic_detection_pipeline_en.md new file mode 100644 index 00000000000000..d9cd3617f2ef68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-toxic_detection_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English toxic_detection_pipeline pipeline XlmRoBertaForSequenceClassification from sonnv +author: John Snow Labs +name: toxic_detection_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`toxic_detection_pipeline` is a English model originally trained by sonnv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/toxic_detection_pipeline_en_5.5.0_3.0_1726873619523.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/toxic_detection_pipeline_en_5.5.0_3.0_1726873619523.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("toxic_detection_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("toxic_detection_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
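+
+If the per-class scores are needed rather than only the winning label, `fullAnnotate()` keeps the annotation metadata. A sketch, assuming the classifier stage of this pipeline writes to a column named `class`:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("toxic_detection_pipeline", lang="en")
+result = pipeline.fullAnnotate("I love spark-nlp")[0]
+# the metadata on each annotation typically carries the score per label
+print(result["class"])
+```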
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|toxic_detection_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|693.4 MB| + +## References + +https://huggingface.co/sonnv/toxic_detection + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-trained_sentiment_analyzer_ipex_en.md b/docs/_posts/ahmedlone127/2024-09-20-trained_sentiment_analyzer_ipex_en.md new file mode 100644 index 00000000000000..3a736f82229ac0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-trained_sentiment_analyzer_ipex_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trained_sentiment_analyzer_ipex DistilBertForSequenceClassification from redbaron007 +author: John Snow Labs +name: trained_sentiment_analyzer_ipex +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trained_sentiment_analyzer_ipex` is a English model originally trained by redbaron007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trained_sentiment_analyzer_ipex_en_5.5.0_3.0_1726823720949.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trained_sentiment_analyzer_ipex_en_5.5.0_3.0_1726823720949.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("trained_sentiment_analyzer_ipex","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("trained_sentiment_analyzer_ipex", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
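+
+For low-latency inference on individual documents, the fitted model can also be wrapped in a `LightPipeline`, which runs outside Spark DataFrames. A sketch assuming the `pipelineModel` fitted in the snippet above:
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+# returns a plain Python dict per input string instead of a DataFrame
+print(light.annotate("I love spark-nlp"))
+```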
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trained_sentiment_analyzer_ipex| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/redbaron007/trained-sentiment-analyzer-ipex \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-trained_sentiment_analyzer_ipex_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-trained_sentiment_analyzer_ipex_pipeline_en.md new file mode 100644 index 00000000000000..e923e74dd2245a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-trained_sentiment_analyzer_ipex_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trained_sentiment_analyzer_ipex_pipeline pipeline DistilBertForSequenceClassification from redbaron007 +author: John Snow Labs +name: trained_sentiment_analyzer_ipex_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trained_sentiment_analyzer_ipex_pipeline` is a English model originally trained by redbaron007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trained_sentiment_analyzer_ipex_pipeline_en_5.5.0_3.0_1726823732750.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trained_sentiment_analyzer_ipex_pipeline_en_5.5.0_3.0_1726823732750.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("trained_sentiment_analyzer_ipex_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("trained_sentiment_analyzer_ipex_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trained_sentiment_analyzer_ipex_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/redbaron007/trained-sentiment-analyzer-ipex + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-trainer3b_en.md b/docs/_posts/ahmedlone127/2024-09-20-trainer3b_en.md new file mode 100644 index 00000000000000..208d065e1f0386 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-trainer3b_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trainer3b DistilBertForSequenceClassification from SimoneJLaudani +author: John Snow Labs +name: trainer3b +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trainer3b` is a English model originally trained by SimoneJLaudani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trainer3b_en_5.5.0_3.0_1726823830451.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trainer3b_en_5.5.0_3.0_1726823830451.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("trainer3b","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("trainer3b", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
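+
+Fitting the pipeline triggers a download of the pretrained weights, so it is usually worth persisting the fitted `PipelineModel` and reloading it later. A sketch using the standard Spark ML persistence API; the path is illustrative:
+
+```python
+from pyspark.ml import PipelineModel
+
+# save the fitted pipeline so the model does not need to be re-downloaded
+pipelineModel.write().overwrite().save("/tmp/trainer3b_pipeline")
+restored = PipelineModel.load("/tmp/trainer3b_pipeline")
+```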
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trainer3b| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/SimoneJLaudani/trainer3b \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-trainerh2_en.md b/docs/_posts/ahmedlone127/2024-09-20-trainerh2_en.md new file mode 100644 index 00000000000000..4839584d189fdc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-trainerh2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trainerh2 DistilBertForSequenceClassification from SimoneJLaudani +author: John Snow Labs +name: trainerh2 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trainerh2` is a English model originally trained by SimoneJLaudani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trainerh2_en_5.5.0_3.0_1726848926481.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trainerh2_en_5.5.0_3.0_1726848926481.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("trainerh2","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("trainerh2", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trainerh2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/SimoneJLaudani/trainerH2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-trainerh2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-trainerh2_pipeline_en.md new file mode 100644 index 00000000000000..6e62113d0b6002 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-trainerh2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trainerh2_pipeline pipeline DistilBertForSequenceClassification from SimoneJLaudani +author: John Snow Labs +name: trainerh2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trainerh2_pipeline` is a English model originally trained by SimoneJLaudani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trainerh2_pipeline_en_5.5.0_3.0_1726848938623.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trainerh2_pipeline_en_5.5.0_3.0_1726848938623.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("trainerh2_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("trainerh2_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trainerh2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/SimoneJLaudani/trainerH2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-try_model_icelandic_health_en.md b/docs/_posts/ahmedlone127/2024-09-20-try_model_icelandic_health_en.md new file mode 100644 index 00000000000000..ebc068d3486fe1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-try_model_icelandic_health_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English try_model_icelandic_health DistilBertForSequenceClassification from iamaries +author: John Snow Labs +name: try_model_icelandic_health +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`try_model_icelandic_health` is a English model originally trained by iamaries. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/try_model_icelandic_health_en_5.5.0_3.0_1726841329126.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/try_model_icelandic_health_en_5.5.0_3.0_1726841329126.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("try_model_icelandic_health","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("try_model_icelandic_health", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|try_model_icelandic_health| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/iamaries/try_model_is_health \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-try_model_icelandic_health_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-try_model_icelandic_health_pipeline_en.md new file mode 100644 index 00000000000000..c15145944bed9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-try_model_icelandic_health_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English try_model_icelandic_health_pipeline pipeline DistilBertForSequenceClassification from iamaries +author: John Snow Labs +name: try_model_icelandic_health_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`try_model_icelandic_health_pipeline` is a English model originally trained by iamaries. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/try_model_icelandic_health_pipeline_en_5.5.0_3.0_1726841341662.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/try_model_icelandic_health_pipeline_en_5.5.0_3.0_1726841341662.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("try_model_icelandic_health_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("try_model_icelandic_health_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|try_model_icelandic_health_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/iamaries/try_model_is_health + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-twiiter_try6_fold1_en.md b/docs/_posts/ahmedlone127/2024-09-20-twiiter_try6_fold1_en.md new file mode 100644 index 00000000000000..3be8e1f1f16a01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-twiiter_try6_fold1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twiiter_try6_fold1 XlmRoBertaForSequenceClassification from yanezh +author: John Snow Labs +name: twiiter_try6_fold1 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twiiter_try6_fold1` is a English model originally trained by yanezh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twiiter_try6_fold1_en_5.5.0_3.0_1726872739216.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twiiter_try6_fold1_en_5.5.0_3.0_1726872739216.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("twiiter_try6_fold1","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("twiiter_try6_fold1", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
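+
+Beyond the winning label, the annotation metadata produced by the classifier typically carries the score assigned to each class. A sketch for surfacing it from the `pipelineDF` DataFrame computed above:
+
+```python
+from pyspark.sql.functions import explode
+
+# explode the annotation array so each prediction becomes one row
+pipelineDF.select(explode("class").alias("prediction")) \
+    .select("prediction.result", "prediction.metadata") \
+    .show(truncate=False)
+```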
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twiiter_try6_fold1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/yanezh/twiiter_try6_fold1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-twitchleaguebert_1000k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-twitchleaguebert_1000k_pipeline_en.md new file mode 100644 index 00000000000000..bac475912586c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-twitchleaguebert_1000k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitchleaguebert_1000k_pipeline pipeline RoBertaEmbeddings from Epidot +author: John Snow Labs +name: twitchleaguebert_1000k_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitchleaguebert_1000k_pipeline` is a English model originally trained by Epidot. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitchleaguebert_1000k_pipeline_en_5.5.0_3.0_1726796346204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitchleaguebert_1000k_pipeline_en_5.5.0_3.0_1726796346204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("twitchleaguebert_1000k_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("twitchleaguebert_1000k_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitchleaguebert_1000k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|305.5 MB| + +## References + +https://huggingface.co/Epidot/TwitchLeagueBert-1000k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-twitter_roberta_base_mar2021_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-twitter_roberta_base_mar2021_pipeline_en.md new file mode 100644 index 00000000000000..5e216cc7ccdd14 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-twitter_roberta_base_mar2021_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitter_roberta_base_mar2021_pipeline pipeline RoBertaEmbeddings from cardiffnlp +author: John Snow Labs +name: twitter_roberta_base_mar2021_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_mar2021_pipeline` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_mar2021_pipeline_en_5.5.0_3.0_1726816167077.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_mar2021_pipeline_en_5.5.0_3.0_1726816167077.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("twitter_roberta_base_mar2021_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("twitter_roberta_base_mar2021_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
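+
+Because this pipeline ends in a RoBertaEmbeddings stage, its output is token-level embedding annotations rather than class labels. The exact output column name can be checked on the returned DataFrame; the second statement below assumes it is called `embeddings`:
+
+```python
+# list the annotation columns the pretrained pipeline produced
+annotations.printSchema()
+
+# assuming the embeddings stage writes to a column named "embeddings"
+annotations.selectExpr("explode(embeddings) AS token") \
+    .select("token.result", "token.embeddings") \
+    .show(5)
+```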
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_mar2021_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/cardiffnlp/twitter-roberta-base-mar2021 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-twitter_xlm_roberta_base_sentiment_geomeife_en.md b/docs/_posts/ahmedlone127/2024-09-20-twitter_xlm_roberta_base_sentiment_geomeife_en.md new file mode 100644 index 00000000000000..a52c4ec5d5ffe7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-twitter_xlm_roberta_base_sentiment_geomeife_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twitter_xlm_roberta_base_sentiment_geomeife XlmRoBertaForSequenceClassification from Geomeife +author: John Snow Labs +name: twitter_xlm_roberta_base_sentiment_geomeife +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_xlm_roberta_base_sentiment_geomeife` is a English model originally trained by Geomeife. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_xlm_roberta_base_sentiment_geomeife_en_5.5.0_3.0_1726800690703.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_xlm_roberta_base_sentiment_geomeife_en_5.5.0_3.0_1726800690703.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("twitter_xlm_roberta_base_sentiment_geomeife","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("twitter_xlm_roberta_base_sentiment_geomeife", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_xlm_roberta_base_sentiment_geomeife| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Geomeife/twitter-xlm-roberta-base-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-twitterfin_padding40model_en.md b/docs/_posts/ahmedlone127/2024-09-20-twitterfin_padding40model_en.md new file mode 100644 index 00000000000000..86bc89dba47f7a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-twitterfin_padding40model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twitterfin_padding40model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: twitterfin_padding40model +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitterfin_padding40model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitterfin_padding40model_en_5.5.0_3.0_1726841097603.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitterfin_padding40model_en_5.5.0_3.0_1726841097603.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("twitterfin_padding40model","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("twitterfin_padding40model", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
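+
+A quick way to sanity-check the model on a larger sample is to look at the distribution of predicted labels. A sketch over the `pipelineDF` DataFrame from the snippet above:
+
+```python
+from pyspark.sql.functions import explode
+
+# count how often each predicted label occurs across the scored rows
+pipelineDF.select(explode("class.result").alias("label")) \
+    .groupBy("label").count().show()
+```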
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitterfin_padding40model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/twitterfin_padding40model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-twitterfin_padding40model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-twitterfin_padding40model_pipeline_en.md new file mode 100644 index 00000000000000..e00cab2f71735b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-twitterfin_padding40model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitterfin_padding40model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: twitterfin_padding40model_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitterfin_padding40model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitterfin_padding40model_pipeline_en_5.5.0_3.0_1726841109749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitterfin_padding40model_pipeline_en_5.5.0_3.0_1726841109749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("twitterfin_padding40model_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("twitterfin_padding40model_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitterfin_padding40model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/twitterfin_padding40model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_42_en.md b/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_42_en.md new file mode 100644 index 00000000000000..65dc80fa079c8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_42_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English uned_tfg_08_42 RoBertaForSequenceClassification from alexisdr +author: John Snow Labs +name: uned_tfg_08_42 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`uned_tfg_08_42` is a English model originally trained by alexisdr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/uned_tfg_08_42_en_5.5.0_3.0_1726852227572.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/uned_tfg_08_42_en_5.5.0_3.0_1726852227572.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("uned_tfg_08_42","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("uned_tfg_08_42", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
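+
+When adapting the snippet above, the classifier annotator exposes a few commonly tuned options; the values below are illustrative rather than recommended settings:
+
+```python
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("uned_tfg_08_42", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class") \
+    .setCaseSensitive(True) \
+    .setBatchSize(8) \
+    .setMaxSentenceLength(512)
+```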
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|uned_tfg_08_42| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|426.5 MB| + +## References + +https://huggingface.co/alexisdr/uned-tfg-08.42 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_42_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_42_pipeline_en.md new file mode 100644 index 00000000000000..a35adb2ff35d54 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_42_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English uned_tfg_08_42_pipeline pipeline RoBertaForSequenceClassification from alexisdr +author: John Snow Labs +name: uned_tfg_08_42_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`uned_tfg_08_42_pipeline` is a English model originally trained by alexisdr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/uned_tfg_08_42_pipeline_en_5.5.0_3.0_1726852264168.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/uned_tfg_08_42_pipeline_en_5.5.0_3.0_1726852264168.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("uned_tfg_08_42_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("uned_tfg_08_42_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|uned_tfg_08_42_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|426.5 MB| + +## References + +https://huggingface.co/alexisdr/uned-tfg-08.42 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_56_en.md b/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_56_en.md new file mode 100644 index 00000000000000..fc0cc882419c4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_56_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English uned_tfg_08_56 RoBertaForSequenceClassification from alexisdr +author: John Snow Labs +name: uned_tfg_08_56 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`uned_tfg_08_56` is a English model originally trained by alexisdr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/uned_tfg_08_56_en_5.5.0_3.0_1726851486673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/uned_tfg_08_56_en_5.5.0_3.0_1726851486673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("uned_tfg_08_56","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("uned_tfg_08_56", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|uned_tfg_08_56| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|438.6 MB| + +## References + +https://huggingface.co/alexisdr/uned-tfg-08.56 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_56_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_56_pipeline_en.md new file mode 100644 index 00000000000000..4e30a32d1a8344 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_56_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English uned_tfg_08_56_pipeline pipeline RoBertaForSequenceClassification from alexisdr +author: John Snow Labs +name: uned_tfg_08_56_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`uned_tfg_08_56_pipeline` is a English model originally trained by alexisdr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/uned_tfg_08_56_pipeline_en_5.5.0_3.0_1726851518349.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/uned_tfg_08_56_pipeline_en_5.5.0_3.0_1726851518349.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a "text" column holding the documents to classify
pipeline = PretrainedPipeline("uned_tfg_08_56_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame with a "text" column holding the documents to classify
val pipeline = new PretrainedPipeline("uned_tfg_08_56_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|uned_tfg_08_56_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.7 MB| + +## References + +https://huggingface.co/alexisdr/uned-tfg-08.56 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_64_mas_frecuentes_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_64_mas_frecuentes_pipeline_en.md new file mode 100644 index 00000000000000..c23f374e016c4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_64_mas_frecuentes_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English uned_tfg_08_64_mas_frecuentes_pipeline pipeline RoBertaForSequenceClassification from alexisdr +author: John Snow Labs +name: uned_tfg_08_64_mas_frecuentes_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`uned_tfg_08_64_mas_frecuentes_pipeline` is a English model originally trained by alexisdr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/uned_tfg_08_64_mas_frecuentes_pipeline_en_5.5.0_3.0_1726804387863.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/uned_tfg_08_64_mas_frecuentes_pipeline_en_5.5.0_3.0_1726804387863.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a "text" column holding the documents to classify
pipeline = PretrainedPipeline("uned_tfg_08_64_mas_frecuentes_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame with a "text" column holding the documents to classify
val pipeline = new PretrainedPipeline("uned_tfg_08_64_mas_frecuentes_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|uned_tfg_08_64_mas_frecuentes_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|429.9 MB| + +## References + +https://huggingface.co/alexisdr/uned-tfg-08.64_mas_frecuentes + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_78_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_78_pipeline_en.md new file mode 100644 index 00000000000000..979cfadfe70c3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_78_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English uned_tfg_08_78_pipeline pipeline RoBertaForSequenceClassification from alexisdr +author: John Snow Labs +name: uned_tfg_08_78_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`uned_tfg_08_78_pipeline` is a English model originally trained by alexisdr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/uned_tfg_08_78_pipeline_en_5.5.0_3.0_1726852103895.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/uned_tfg_08_78_pipeline_en_5.5.0_3.0_1726852103895.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a "text" column holding the documents to classify
pipeline = PretrainedPipeline("uned_tfg_08_78_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame with a "text" column holding the documents to classify
val pipeline = new PretrainedPipeline("uned_tfg_08_78_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|uned_tfg_08_78_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|443.4 MB| + +## References + +https://huggingface.co/alexisdr/uned-tfg-08.78 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-wandb_test_5e_5_en.md b/docs/_posts/ahmedlone127/2024-09-20-wandb_test_5e_5_en.md new file mode 100644 index 00000000000000..bbb9ad95cc99c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-wandb_test_5e_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English wandb_test_5e_5 XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: wandb_test_5e_5 +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wandb_test_5e_5` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wandb_test_5e_5_en_5.5.0_3.0_1726843955223.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wandb_test_5e_5_en_5.5.0_3.0_1726843955223.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("wandb_test_5e_5","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("wandb_test_5e_5", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
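The predicted tags land in the `ner` column configured above, aligned one-to-one with the tokens, so both can be shown side by side:

```python
# Tokens and their predicted entity tags (column names follow the example above)
pipelineDF.select("token.result", "ner.result").show(truncate = False)
```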
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wandb_test_5e_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/wandb_test_5e-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-wandb_test_5e_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-wandb_test_5e_5_pipeline_en.md new file mode 100644 index 00000000000000..84fe1665951fe4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-wandb_test_5e_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English wandb_test_5e_5_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: wandb_test_5e_5_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wandb_test_5e_5_pipeline` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wandb_test_5e_5_pipeline_en_5.5.0_3.0_1726844006065.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wandb_test_5e_5_pipeline_en_5.5.0_3.0_1726844006065.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a "text" column holding the documents to tag
pipeline = PretrainedPipeline("wandb_test_5e_5_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame with a "text" column holding the documents to tag
val pipeline = new PretrainedPipeline("wandb_test_5e_5_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wandb_test_5e_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/wandb_test_5e-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-wannasleep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-wannasleep_pipeline_en.md new file mode 100644 index 00000000000000..ee661bc8235e94 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-wannasleep_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English wannasleep_pipeline pipeline DistilBertForSequenceClassification from kithangw +author: John Snow Labs +name: wannasleep_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wannasleep_pipeline` is a English model originally trained by kithangw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wannasleep_pipeline_en_5.5.0_3.0_1726809654745.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wannasleep_pipeline_en_5.5.0_3.0_1726809654745.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a "text" column holding the documents to classify
pipeline = PretrainedPipeline("wannasleep_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame with a "text" column holding the documents to classify
val pipeline = new PretrainedPipeline("wannasleep_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
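For quick experiments, the pretrained pipeline can also be applied to a single string with `annotate()` instead of building a DataFrame. A small sketch; the sentence is illustrative and the `class` key assumes the classifier keeps its usual output column name:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("wannasleep_pipeline", lang = "en")

# annotate() returns a dict keyed by the pipeline's output columns
result = pipeline.annotate("I really need to get some sleep")
print(result["class"])
```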
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wannasleep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kithangw/wannasleep + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_base_atco2_en.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_base_atco2_en.md new file mode 100644 index 00000000000000..9423b12694c9a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_base_atco2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_atco2 WhisperForCTC from FunPang +author: John Snow Labs +name: whisper_base_atco2 +date: 2024-09-20 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_atco2` is a English model originally trained by FunPang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_atco2_en_5.5.0_3.0_1726874736873.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_atco2_en_5.5.0_3.0_1726874736873.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_base_atco2","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# data: a Spark DataFrame with raw audio samples in an "audio_content" column
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_base_atco2", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// data: a Spark DataFrame with raw audio samples in an "audio_content" column
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
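The `data` DataFrame above must contain the raw audio samples in the `audio_content` column read by the AudioAssembler. One possible way to build it, assuming a local WAV file and using librosa purely as an example loader (any library that yields 16 kHz float samples works):

```python
import librosa

# Load the recording as 16 kHz mono float samples; the path is illustrative
raw_floats, _ = librosa.load("/path/to/recording.wav", sr=16000)

# One row per recording, with the samples in the column the AudioAssembler reads
data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")
```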
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_atco2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|642.1 MB| + +## References + +https://huggingface.co/FunPang/whisper_base_atco2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_base_atco2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_base_atco2_pipeline_en.md new file mode 100644 index 00000000000000..7e4d4690b1fe97 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_base_atco2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_atco2_pipeline pipeline WhisperForCTC from FunPang +author: John Snow Labs +name: whisper_base_atco2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_atco2_pipeline` is a English model originally trained by FunPang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_atco2_pipeline_en_5.5.0_3.0_1726874775207.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_atco2_pipeline_en_5.5.0_3.0_1726874775207.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the raw audio content for the pipeline's AudioAssembler stage
pipeline = PretrainedPipeline("whisper_base_atco2_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame holding the raw audio content for the pipeline's AudioAssembler stage
val pipeline = new PretrainedPipeline("whisper_base_atco2_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
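Unlike the text classification pipelines above, this pipeline transcribes audio, so `df` must hold raw audio samples rather than text. A minimal sketch, assuming `audio_floats` is a Python list of 16 kHz samples prepared elsewhere and that the pipeline's first stage reads an `audio_content` column, as on the standalone `whisper_base_atco2` card:

```python
from sparknlp.pretrained import PretrainedPipeline

# audio_floats: a plain Python list of 16 kHz float samples, prepared elsewhere
df = spark.createDataFrame([[audio_floats]]).toDF("audio_content")

pipeline = PretrainedPipeline("whisper_base_atco2_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```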
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_atco2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|642.1 MB| + +## References + +https://huggingface.co/FunPang/whisper_base_atco2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_base_pashto_ihanif_en.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_base_pashto_ihanif_en.md new file mode 100644 index 00000000000000..8720a87d41cc7f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_base_pashto_ihanif_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_pashto_ihanif WhisperForCTC from ihanif +author: John Snow Labs +name: whisper_base_pashto_ihanif +date: 2024-09-20 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_pashto_ihanif` is a English model originally trained by ihanif. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_pashto_ihanif_en_5.5.0_3.0_1726810197147.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_pashto_ihanif_en_5.5.0_3.0_1726810197147.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_base_pashto_ihanif","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# data: a Spark DataFrame with raw audio samples in an "audio_content" column
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_base_pashto_ihanif", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// data: a Spark DataFrame with raw audio samples in an "audio_content" column
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_pashto_ihanif| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|643.9 MB| + +## References + +https://huggingface.co/ihanif/whisper-base-ps \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_base_pashto_ihanif_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_base_pashto_ihanif_pipeline_en.md new file mode 100644 index 00000000000000..6022245aa9b836 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_base_pashto_ihanif_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_pashto_ihanif_pipeline pipeline WhisperForCTC from ihanif +author: John Snow Labs +name: whisper_base_pashto_ihanif_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_pashto_ihanif_pipeline` is a English model originally trained by ihanif. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_pashto_ihanif_pipeline_en_5.5.0_3.0_1726810229623.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_pashto_ihanif_pipeline_en_5.5.0_3.0_1726810229623.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the raw audio content for the pipeline's AudioAssembler stage
pipeline = PretrainedPipeline("whisper_base_pashto_ihanif_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame holding the raw audio content for the pipeline's AudioAssembler stage
val pipeline = new PretrainedPipeline("whisper_base_pashto_ihanif_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_pashto_ihanif_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|643.9 MB| + +## References + +https://huggingface.co/ihanif/whisper-base-ps + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_base_portuguese_old_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_base_portuguese_old_pipeline_pt.md new file mode 100644 index 00000000000000..83c7c2c7ac38d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_base_portuguese_old_pipeline_pt.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Portuguese whisper_base_portuguese_old_pipeline pipeline WhisperForCTC from zuazo +author: John Snow Labs +name: whisper_base_portuguese_old_pipeline +date: 2024-09-20 +tags: [pt, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_portuguese_old_pipeline` is a Portuguese model originally trained by zuazo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_portuguese_old_pipeline_pt_5.5.0_3.0_1726874362473.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_portuguese_old_pipeline_pt_5.5.0_3.0_1726874362473.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the raw audio content for the pipeline's AudioAssembler stage
pipeline = PretrainedPipeline("whisper_base_portuguese_old_pipeline", lang = "pt")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame holding the raw audio content for the pipeline's AudioAssembler stage
val pipeline = new PretrainedPipeline("whisper_base_portuguese_old_pipeline", lang = "pt")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_portuguese_old_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|641.4 MB| + +## References + +https://huggingface.co/zuazo/whisper-base-pt-old + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_base_portuguese_old_pt.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_base_portuguese_old_pt.md new file mode 100644 index 00000000000000..5a7aecb8bfd9ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_base_portuguese_old_pt.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Portuguese whisper_base_portuguese_old WhisperForCTC from zuazo +author: John Snow Labs +name: whisper_base_portuguese_old +date: 2024-09-20 +tags: [pt, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_portuguese_old` is a Portuguese model originally trained by zuazo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_portuguese_old_pt_5.5.0_3.0_1726874329464.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_portuguese_old_pt_5.5.0_3.0_1726874329464.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_base_portuguese_old","pt") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# data: a Spark DataFrame with raw audio samples in an "audio_content" column
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_base_portuguese_old", "pt")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// data: a Spark DataFrame with raw audio samples in an "audio_content" column
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_portuguese_old| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|pt| +|Size:|641.4 MB| + +## References + +https://huggingface.co/zuazo/whisper-base-pt-old \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_finetuning_kr.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_finetuning_kr.md new file mode 100644 index 00000000000000..36ede53c41c13a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_finetuning_kr.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Kanuri whisper_finetuning WhisperForCTC from doongsae +author: John Snow Labs +name: whisper_finetuning +date: 2024-09-20 +tags: [kr, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: kr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_finetuning` is a Kanuri model originally trained by doongsae. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_finetuning_kr_5.5.0_3.0_1726876760105.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_finetuning_kr_5.5.0_3.0_1726876760105.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_finetuning","kr") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# data: a Spark DataFrame with raw audio samples in an "audio_content" column
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_finetuning", "kr")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// data: a Spark DataFrame with raw audio samples in an "audio_content" column
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_finetuning| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|kr| +|Size:|642.3 MB| + +## References + +https://huggingface.co/doongsae/whisper_finetuning \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_finetuning_pipeline_kr.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_finetuning_pipeline_kr.md new file mode 100644 index 00000000000000..f0a98989cd40a1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_finetuning_pipeline_kr.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Kanuri whisper_finetuning_pipeline pipeline WhisperForCTC from doongsae +author: John Snow Labs +name: whisper_finetuning_pipeline +date: 2024-09-20 +tags: [kr, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: kr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_finetuning_pipeline` is a Kanuri model originally trained by doongsae. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_finetuning_pipeline_kr_5.5.0_3.0_1726876793877.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_finetuning_pipeline_kr_5.5.0_3.0_1726876793877.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the raw audio content for the pipeline's AudioAssembler stage
pipeline = PretrainedPipeline("whisper_finetuning_pipeline", lang = "kr")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame holding the raw audio content for the pipeline's AudioAssembler stage
val pipeline = new PretrainedPipeline("whisper_finetuning_pipeline", lang = "kr")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_finetuning_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|kr| +|Size:|642.3 MB| + +## References + +https://huggingface.co/doongsae/whisper_finetuning + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_small200speedysep6_spanish_es.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_small200speedysep6_spanish_es.md new file mode 100644 index 00000000000000..3c222654592b8d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_small200speedysep6_spanish_es.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Castilian, Spanish whisper_small200speedysep6_spanish WhisperForCTC from jessicadiveai +author: John Snow Labs +name: whisper_small200speedysep6_spanish +date: 2024-09-20 +tags: [es, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small200speedysep6_spanish` is a Castilian, Spanish model originally trained by jessicadiveai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small200speedysep6_spanish_es_5.5.0_3.0_1726814314277.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small200speedysep6_spanish_es_5.5.0_3.0_1726814314277.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small200speedysep6_spanish","es") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# data: a Spark DataFrame with raw audio samples in an "audio_content" column
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small200speedysep6_spanish", "es")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// data: a Spark DataFrame with raw audio samples in an "audio_content" column
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small200speedysep6_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|es| +|Size:|1.7 GB| + +## References + +https://huggingface.co/jessicadiveai/whisper-small200speedysep6-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_small200speedysep6_spanish_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_small200speedysep6_spanish_pipeline_es.md new file mode 100644 index 00000000000000..2c6309623e8adc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_small200speedysep6_spanish_pipeline_es.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Castilian, Spanish whisper_small200speedysep6_spanish_pipeline pipeline WhisperForCTC from jessicadiveai +author: John Snow Labs +name: whisper_small200speedysep6_spanish_pipeline +date: 2024-09-20 +tags: [es, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small200speedysep6_spanish_pipeline` is a Castilian, Spanish model originally trained by jessicadiveai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small200speedysep6_spanish_pipeline_es_5.5.0_3.0_1726814396240.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small200speedysep6_spanish_pipeline_es_5.5.0_3.0_1726814396240.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the raw audio content for the pipeline's AudioAssembler stage
pipeline = PretrainedPipeline("whisper_small200speedysep6_spanish_pipeline", lang = "es")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame holding the raw audio content for the pipeline's AudioAssembler stage
val pipeline = new PretrainedPipeline("whisper_small200speedysep6_spanish_pipeline", lang = "es")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small200speedysep6_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|1.7 GB| + +## References + +https://huggingface.co/jessicadiveai/whisper-small200speedysep6-es + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_small_breton_en.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_breton_en.md new file mode 100644 index 00000000000000..4e940cc429c2cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_breton_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_breton WhisperForCTC from gweltou +author: John Snow Labs +name: whisper_small_breton +date: 2024-09-20 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_breton` is a English model originally trained by gweltou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_breton_en_5.5.0_3.0_1726814094059.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_breton_en_5.5.0_3.0_1726814094059.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_breton","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# data: a Spark DataFrame with raw audio samples in an "audio_content" column
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_breton", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// data: a Spark DataFrame with raw audio samples in an "audio_content" column
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_breton| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/gweltou/whisper-small-br \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_small_divehi_arpan_das_astrophysics_en.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_divehi_arpan_das_astrophysics_en.md new file mode 100644 index 00000000000000..d560e7bc663e0d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_divehi_arpan_das_astrophysics_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_divehi_arpan_das_astrophysics WhisperForCTC from arpan-das-astrophysics +author: John Snow Labs +name: whisper_small_divehi_arpan_das_astrophysics +date: 2024-09-20 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_arpan_das_astrophysics` is a English model originally trained by arpan-das-astrophysics. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_arpan_das_astrophysics_en_5.5.0_3.0_1726874472982.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_arpan_das_astrophysics_en_5.5.0_3.0_1726874472982.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_divehi_arpan_das_astrophysics","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# data: a Spark DataFrame with raw audio samples in an "audio_content" column
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_divehi_arpan_das_astrophysics", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// data: a Spark DataFrame with raw audio samples in an "audio_content" column
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_arpan_das_astrophysics| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/arpan-das-astrophysics/whisper-small-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_small_divehi_arpan_das_astrophysics_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_divehi_arpan_das_astrophysics_pipeline_en.md new file mode 100644 index 00000000000000..3d09b0ccf9c9e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_divehi_arpan_das_astrophysics_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_divehi_arpan_das_astrophysics_pipeline pipeline WhisperForCTC from arpan-das-astrophysics +author: John Snow Labs +name: whisper_small_divehi_arpan_das_astrophysics_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_arpan_das_astrophysics_pipeline` is a English model originally trained by arpan-das-astrophysics. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_arpan_das_astrophysics_pipeline_en_5.5.0_3.0_1726874492670.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_arpan_das_astrophysics_pipeline_en_5.5.0_3.0_1726874492670.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the raw audio content for the pipeline's AudioAssembler stage
pipeline = PretrainedPipeline("whisper_small_divehi_arpan_das_astrophysics_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame holding the raw audio content for the pipeline's AudioAssembler stage
val pipeline = new PretrainedPipeline("whisper_small_divehi_arpan_das_astrophysics_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_arpan_das_astrophysics_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/arpan-das-astrophysics/whisper-small-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_small_hindi_ver2_hi.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_hindi_ver2_hi.md new file mode 100644 index 00000000000000..e429f7769b487c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_hindi_ver2_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_small_hindi_ver2 WhisperForCTC from saxenagauravhf +author: John Snow Labs +name: whisper_small_hindi_ver2 +date: 2024-09-20 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_ver2` is a Hindi model originally trained by saxenagauravhf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_ver2_hi_5.5.0_3.0_1726814107573.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_ver2_hi_5.5.0_3.0_1726814107573.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_hindi_ver2","hi") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# data: a Spark DataFrame with raw audio samples in an "audio_content" column
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_ver2", "hi")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// data: a Spark DataFrame with raw audio samples in an "audio_content" column
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_ver2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/saxenagauravhf/whisper-small-hi-ver2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_small_hindi_ver2_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_hindi_ver2_pipeline_hi.md new file mode 100644 index 00000000000000..06d45fb0cdc2db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_hindi_ver2_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_small_hindi_ver2_pipeline pipeline WhisperForCTC from saxenagauravhf +author: John Snow Labs +name: whisper_small_hindi_ver2_pipeline +date: 2024-09-20 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_ver2_pipeline` is a Hindi model originally trained by saxenagauravhf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_ver2_pipeline_hi_5.5.0_3.0_1726814198361.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_ver2_pipeline_hi_5.5.0_3.0_1726814198361.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the raw audio content for the pipeline's AudioAssembler stage
pipeline = PretrainedPipeline("whisper_small_hindi_ver2_pipeline", lang = "hi")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame holding the raw audio content for the pipeline's AudioAssembler stage
val pipeline = new PretrainedPipeline("whisper_small_hindi_ver2_pipeline", lang = "hi")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_ver2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/saxenagauravhf/whisper-small-hi-ver2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_small_indonesian_zeinhasan_hi.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_indonesian_zeinhasan_hi.md new file mode 100644 index 00000000000000..b2be33bd4eb4e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_indonesian_zeinhasan_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_small_indonesian_zeinhasan WhisperForCTC from zeinhasan +author: John Snow Labs +name: whisper_small_indonesian_zeinhasan +date: 2024-09-20 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_indonesian_zeinhasan` is a Hindi model originally trained by zeinhasan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_zeinhasan_hi_5.5.0_3.0_1726811954031.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_zeinhasan_hi_5.5.0_3.0_1726811954031.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_indonesian_zeinhasan","hi") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# data: a Spark DataFrame with raw audio samples in an "audio_content" column
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_indonesian_zeinhasan", "hi")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// data: a Spark DataFrame with raw audio samples in an "audio_content" column
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_indonesian_zeinhasan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|389.8 MB| + +## References + +https://huggingface.co/zeinhasan/whisper-small-id \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_small_indonesian_zeinhasan_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_indonesian_zeinhasan_pipeline_hi.md new file mode 100644 index 00000000000000..2a6d2112340887 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_indonesian_zeinhasan_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_small_indonesian_zeinhasan_pipeline pipeline WhisperForCTC from zeinhasan +author: John Snow Labs +name: whisper_small_indonesian_zeinhasan_pipeline +date: 2024-09-20 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_indonesian_zeinhasan_pipeline` is a Hindi model originally trained by zeinhasan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_zeinhasan_pipeline_hi_5.5.0_3.0_1726811972858.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_zeinhasan_pipeline_hi_5.5.0_3.0_1726811972858.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the raw audio content for the pipeline's AudioAssembler stage
pipeline = PretrainedPipeline("whisper_small_indonesian_zeinhasan_pipeline", lang = "hi")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame holding the raw audio content for the pipeline's AudioAssembler stage
val pipeline = new PretrainedPipeline("whisper_small_indonesian_zeinhasan_pipeline", lang = "hi")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_indonesian_zeinhasan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|389.8 MB| + +## References + +https://huggingface.co/zeinhasan/whisper-small-id + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_small_oriya_pipeline_bn.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_oriya_pipeline_bn.md new file mode 100644 index 00000000000000..1e263b35553f6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_oriya_pipeline_bn.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Bengali whisper_small_oriya_pipeline pipeline WhisperForCTC from amitkayal +author: John Snow Labs +name: whisper_small_oriya_pipeline +date: 2024-09-20 +tags: [bn, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_oriya_pipeline` is a Bengali model originally trained by amitkayal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_oriya_pipeline_bn_5.5.0_3.0_1726811909944.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_oriya_pipeline_bn_5.5.0_3.0_1726811909944.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_oriya_pipeline", lang = "bn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_oriya_pipeline", lang = "bn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_oriya_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/amitkayal/whisper-small-or + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_small_persian_farsi_tavakoli_fa.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_persian_farsi_tavakoli_fa.md new file mode 100644 index 00000000000000..5972e1fd9e044f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_persian_farsi_tavakoli_fa.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Persian whisper_small_persian_farsi_tavakoli WhisperForCTC from Tavakoli +author: John Snow Labs +name: whisper_small_persian_farsi_tavakoli +date: 2024-09-20 +tags: [fa, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_persian_farsi_tavakoli` is a Persian model originally trained by Tavakoli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_persian_farsi_tavakoli_fa_5.5.0_3.0_1726876625515.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_persian_farsi_tavakoli_fa_5.5.0_3.0_1726876625515.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be an existing DataFrame with a raw-audio float column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_persian_farsi_tavakoli","fa") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// "data" is assumed to be an existing DataFrame with a raw-audio float column named "audio_content"
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_persian_farsi_tavakoli", "fa")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_persian_farsi_tavakoli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|fa| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Tavakoli/whisper-small-fa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_small_persian_farsi_tavakoli_pipeline_fa.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_persian_farsi_tavakoli_pipeline_fa.md new file mode 100644 index 00000000000000..f54702074f58ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_persian_farsi_tavakoli_pipeline_fa.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Persian whisper_small_persian_farsi_tavakoli_pipeline pipeline WhisperForCTC from Tavakoli +author: John Snow Labs +name: whisper_small_persian_farsi_tavakoli_pipeline +date: 2024-09-20 +tags: [fa, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_persian_farsi_tavakoli_pipeline` is a Persian model originally trained by Tavakoli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_persian_farsi_tavakoli_pipeline_fa_5.5.0_3.0_1726876712959.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_persian_farsi_tavakoli_pipeline_fa_5.5.0_3.0_1726876712959.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_persian_farsi_tavakoli_pipeline", lang = "fa") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_persian_farsi_tavakoli_pipeline", lang = "fa") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_persian_farsi_tavakoli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fa| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Tavakoli/whisper-small-fa + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_small_portuguese_estelle1emerson_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_portuguese_estelle1emerson_pipeline_pt.md new file mode 100644 index 00000000000000..c3e9d20da2ec69 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_portuguese_estelle1emerson_pipeline_pt.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Portuguese whisper_small_portuguese_estelle1emerson_pipeline pipeline WhisperForCTC from estelle1emerson +author: John Snow Labs +name: whisper_small_portuguese_estelle1emerson_pipeline +date: 2024-09-20 +tags: [pt, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_portuguese_estelle1emerson_pipeline` is a Portuguese model originally trained by estelle1emerson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_estelle1emerson_pipeline_pt_5.5.0_3.0_1726874720934.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_estelle1emerson_pipeline_pt_5.5.0_3.0_1726874720934.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_portuguese_estelle1emerson_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_portuguese_estelle1emerson_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_portuguese_estelle1emerson_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|1.7 GB| + +## References + +https://huggingface.co/estelle1emerson/whisper-small-pt + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_small_portuguese_estelle1emerson_pt.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_portuguese_estelle1emerson_pt.md new file mode 100644 index 00000000000000..64715e573d0fcc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_portuguese_estelle1emerson_pt.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Portuguese whisper_small_portuguese_estelle1emerson WhisperForCTC from estelle1emerson +author: John Snow Labs +name: whisper_small_portuguese_estelle1emerson +date: 2024-09-20 +tags: [pt, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_portuguese_estelle1emerson` is a Portuguese model originally trained by estelle1emerson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_estelle1emerson_pt_5.5.0_3.0_1726874637633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_estelle1emerson_pt_5.5.0_3.0_1726874637633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be an existing DataFrame with a raw-audio float column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_portuguese_estelle1emerson","pt") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// "data" is assumed to be an existing DataFrame with a raw-audio float column named "audio_content"
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_portuguese_estelle1emerson", "pt")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_portuguese_estelle1emerson| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|pt| +|Size:|1.7 GB| + +## References + +https://huggingface.co/estelle1emerson/whisper-small-pt \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_small_yue_chinese_hk_retrained_1_en.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_yue_chinese_hk_retrained_1_en.md new file mode 100644 index 00000000000000..a740d30f2799d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_yue_chinese_hk_retrained_1_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_yue_chinese_hk_retrained_1 WhisperForCTC from wcyat +author: John Snow Labs +name: whisper_small_yue_chinese_hk_retrained_1 +date: 2024-09-20 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_yue_chinese_hk_retrained_1` is a English model originally trained by wcyat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_yue_chinese_hk_retrained_1_en_5.5.0_3.0_1726813472719.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_yue_chinese_hk_retrained_1_en_5.5.0_3.0_1726813472719.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be an existing DataFrame with a raw-audio float column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_yue_chinese_hk_retrained_1","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// "data" is assumed to be an existing DataFrame with a raw-audio float column named "audio_content"
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_yue_chinese_hk_retrained_1", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
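
Once the pipeline has been fitted and applied, the transcription can be read from the `text` output column declared below. A short sketch:

```python
# each row of "text" holds Whisper annotations; "result" carries the transcribed string
pipelineDF.select("text.result").show(truncate=False)
```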
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_yue_chinese_hk_retrained_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/wcyat/whisper-small-yue-hk-retrained-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_arabic_hi.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_arabic_hi.md new file mode 100644 index 00000000000000..467b3addc27c56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_arabic_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_tiny_arabic WhisperForCTC from arbml +author: John Snow Labs +name: whisper_tiny_arabic +date: 2024-09-20 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_arabic` is a Hindi model originally trained by arbml. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_arabic_hi_5.5.0_3.0_1726874656691.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_arabic_hi_5.5.0_3.0_1726874656691.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be an existing DataFrame with a raw-audio float column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_arabic","hi") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// "data" is assumed to be an existing DataFrame with a raw-audio float column named "audio_content"
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_arabic", "hi")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_arabic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|390.7 MB| + +## References + +https://huggingface.co/arbml/whisper-tiny-ar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_arabic_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_arabic_pipeline_hi.md new file mode 100644 index 00000000000000..6a00ebaebe213b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_arabic_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_tiny_arabic_pipeline pipeline WhisperForCTC from arbml +author: John Snow Labs +name: whisper_tiny_arabic_pipeline +date: 2024-09-20 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_arabic_pipeline` is a Hindi model originally trained by arbml. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_arabic_pipeline_hi_5.5.0_3.0_1726874676801.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_arabic_pipeline_hi_5.5.0_3.0_1726874676801.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_arabic_pipeline", lang = "hi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_arabic_pipeline", lang = "hi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_arabic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|390.7 MB| + +## References + +https://huggingface.co/arbml/whisper-tiny-ar + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_finetuned_minds14_jackismyshephard_en.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_finetuned_minds14_jackismyshephard_en.md new file mode 100644 index 00000000000000..b9fdb14098ead4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_finetuned_minds14_jackismyshephard_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_finetuned_minds14_jackismyshephard WhisperForCTC from JackismyShephard +author: John Snow Labs +name: whisper_tiny_finetuned_minds14_jackismyshephard +date: 2024-09-20 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_finetuned_minds14_jackismyshephard` is a English model originally trained by JackismyShephard. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetuned_minds14_jackismyshephard_en_5.5.0_3.0_1726874308111.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetuned_minds14_jackismyshephard_en_5.5.0_3.0_1726874308111.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be an existing DataFrame with a raw-audio float column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_finetuned_minds14_jackismyshephard","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// "data" is assumed to be an existing DataFrame with a raw-audio float column named "audio_content"
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_finetuned_minds14_jackismyshephard", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_finetuned_minds14_jackismyshephard| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.7 MB| + +## References + +https://huggingface.co/JackismyShephard/whisper-tiny-finetuned-minds14 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_finetuned_minds14_jackismyshephard_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_finetuned_minds14_jackismyshephard_pipeline_en.md new file mode 100644 index 00000000000000..735b9aaf027621 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_finetuned_minds14_jackismyshephard_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_finetuned_minds14_jackismyshephard_pipeline pipeline WhisperForCTC from JackismyShephard +author: John Snow Labs +name: whisper_tiny_finetuned_minds14_jackismyshephard_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_finetuned_minds14_jackismyshephard_pipeline` is a English model originally trained by JackismyShephard. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetuned_minds14_jackismyshephard_pipeline_en_5.5.0_3.0_1726874333467.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetuned_minds14_jackismyshephard_pipeline_en_5.5.0_3.0_1726874333467.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_finetuned_minds14_jackismyshephard_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_finetuned_minds14_jackismyshephard_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_finetuned_minds14_jackismyshephard_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.7 MB| + +## References + +https://huggingface.co/JackismyShephard/whisper-tiny-finetuned-minds14 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_finetuned_multilingual_5_languages_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_finetuned_multilingual_5_languages_pipeline_xx.md new file mode 100644 index 00000000000000..d4ea5c5929dc4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_finetuned_multilingual_5_languages_pipeline_xx.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Multilingual whisper_tiny_finetuned_multilingual_5_languages_pipeline pipeline WhisperForCTC from Prasetyow12 +author: John Snow Labs +name: whisper_tiny_finetuned_multilingual_5_languages_pipeline +date: 2024-09-20 +tags: [xx, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_finetuned_multilingual_5_languages_pipeline` is a Multilingual model originally trained by Prasetyow12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetuned_multilingual_5_languages_pipeline_xx_5.5.0_3.0_1726874328827.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetuned_multilingual_5_languages_pipeline_xx_5.5.0_3.0_1726874328827.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_finetuned_multilingual_5_languages_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_finetuned_multilingual_5_languages_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_finetuned_multilingual_5_languages_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|390.0 MB| + +## References + +https://huggingface.co/Prasetyow12/whisper-tiny-finetuned-multilingual-5-languages + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_finetuned_multilingual_5_languages_xx.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_finetuned_multilingual_5_languages_xx.md new file mode 100644 index 00000000000000..ae68c7ae88571a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_finetuned_multilingual_5_languages_xx.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Multilingual whisper_tiny_finetuned_multilingual_5_languages WhisperForCTC from Prasetyow12 +author: John Snow Labs +name: whisper_tiny_finetuned_multilingual_5_languages +date: 2024-09-20 +tags: [xx, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_finetuned_multilingual_5_languages` is a Multilingual model originally trained by Prasetyow12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetuned_multilingual_5_languages_xx_5.5.0_3.0_1726874307504.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetuned_multilingual_5_languages_xx_5.5.0_3.0_1726874307504.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be an existing DataFrame with a raw-audio float column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_finetuned_multilingual_5_languages","xx") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// "data" is assumed to be an existing DataFrame with a raw-audio float column named "audio_content"
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_finetuned_multilingual_5_languages", "xx")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_finetuned_multilingual_5_languages| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|xx| +|Size:|389.9 MB| + +## References + +https://huggingface.co/Prasetyow12/whisper-tiny-finetuned-multilingual-5-languages \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_galician_pipeline_gl.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_galician_pipeline_gl.md new file mode 100644 index 00000000000000..964ae5f2313509 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_galician_pipeline_gl.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Galician whisper_tiny_galician_pipeline pipeline WhisperForCTC from zuazo +author: John Snow Labs +name: whisper_tiny_galician_pipeline +date: 2024-09-20 +tags: [gl, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: gl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_galician_pipeline` is a Galician model originally trained by zuazo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_galician_pipeline_gl_5.5.0_3.0_1726812459915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_galician_pipeline_gl_5.5.0_3.0_1726812459915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_galician_pipeline", lang = "gl") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_galician_pipeline", lang = "gl") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_galician_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|gl| +|Size:|390.7 MB| + +## References + +https://huggingface.co/zuazo/whisper-tiny-gl + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_italian_4_it.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_italian_4_it.md new file mode 100644 index 00000000000000..f041df5db847d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_italian_4_it.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Italian whisper_tiny_italian_4 WhisperForCTC from GIanlucaRub +author: John Snow Labs +name: whisper_tiny_italian_4 +date: 2024-09-20 +tags: [it, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_italian_4` is a Italian model originally trained by GIanlucaRub. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_italian_4_it_5.5.0_3.0_1726874990796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_italian_4_it_5.5.0_3.0_1726874990796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be an existing DataFrame with a raw-audio float column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_italian_4","it") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// "data" is assumed to be an existing DataFrame with a raw-audio float column named "audio_content"
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_italian_4", "it")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_italian_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|it| +|Size:|390.4 MB| + +## References + +https://huggingface.co/GIanlucaRub/whisper-tiny-it-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_italian_4_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_italian_4_pipeline_it.md new file mode 100644 index 00000000000000..b2ead2de2c6945 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_italian_4_pipeline_it.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Italian whisper_tiny_italian_4_pipeline pipeline WhisperForCTC from GIanlucaRub +author: John Snow Labs +name: whisper_tiny_italian_4_pipeline +date: 2024-09-20 +tags: [it, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_italian_4_pipeline` is a Italian model originally trained by GIanlucaRub. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_italian_4_pipeline_it_5.5.0_3.0_1726875011501.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_italian_4_pipeline_it_5.5.0_3.0_1726875011501.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_italian_4_pipeline", lang = "it") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_italian_4_pipeline", lang = "it") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_italian_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|390.4 MB| + +## References + +https://huggingface.co/GIanlucaRub/whisper-tiny-it-4 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_tamil_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_tamil_pipeline_hi.md new file mode 100644 index 00000000000000..d089eed6c1fba4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_tamil_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_tiny_tamil_pipeline pipeline WhisperForCTC from Sammarieo +author: John Snow Labs +name: whisper_tiny_tamil_pipeline +date: 2024-09-20 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_tamil_pipeline` is a Hindi model originally trained by Sammarieo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_tamil_pipeline_hi_5.5.0_3.0_1726813718761.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_tamil_pipeline_hi_5.5.0_3.0_1726813718761.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_tamil_pipeline", lang = "hi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_tamil_pipeline", lang = "hi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_tamil_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|390.7 MB| + +## References + +https://huggingface.co/Sammarieo/whisper-tiny-ta + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_telugu_en.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_telugu_en.md new file mode 100644 index 00000000000000..a4c3ffe14317f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_telugu_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_telugu WhisperForCTC from parambharat +author: John Snow Labs +name: whisper_tiny_telugu +date: 2024-09-20 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_telugu` is a English model originally trained by parambharat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_telugu_en_5.5.0_3.0_1726814263563.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_telugu_en_5.5.0_3.0_1726814263563.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be an existing DataFrame with a raw-audio float column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_telugu","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// "data" is assumed to be an existing DataFrame with a raw-audio float column named "audio_content"
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_telugu", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_telugu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|391.1 MB| + +## References + +https://huggingface.co/parambharat/whisper-tiny-te \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_telugu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_telugu_pipeline_en.md new file mode 100644 index 00000000000000..c91330c59ba4de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_telugu_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_telugu_pipeline pipeline WhisperForCTC from parambharat +author: John Snow Labs +name: whisper_tiny_telugu_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_telugu_pipeline` is a English model originally trained by parambharat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_telugu_pipeline_en_5.5.0_3.0_1726814283408.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_telugu_pipeline_en_5.5.0_3.0_1726814283408.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_telugu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_telugu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_telugu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|391.1 MB| + +## References + +https://huggingface.co/parambharat/whisper-tiny-te + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_us_agercas_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_us_agercas_pipeline_en.md new file mode 100644 index 00000000000000..383ed8afda6f7f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_us_agercas_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_us_agercas_pipeline pipeline WhisperForCTC from agercas +author: John Snow Labs +name: whisper_tiny_us_agercas_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_us_agercas_pipeline` is a English model originally trained by agercas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_us_agercas_pipeline_en_5.5.0_3.0_1726811709623.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_us_agercas_pipeline_en_5.5.0_3.0_1726811709623.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_us_agercas_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_us_agercas_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_us_agercas_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/agercas/whisper-tiny-us + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-wisesight_sentiment_xlm_r_en.md b/docs/_posts/ahmedlone127/2024-09-20-wisesight_sentiment_xlm_r_en.md new file mode 100644 index 00000000000000..efefeb6f19a2c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-wisesight_sentiment_xlm_r_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English wisesight_sentiment_xlm_r XlmRoBertaForSequenceClassification from Cincin-nvp +author: John Snow Labs +name: wisesight_sentiment_xlm_r +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wisesight_sentiment_xlm_r` is a English model originally trained by Cincin-nvp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wisesight_sentiment_xlm_r_en_5.5.0_3.0_1726846617205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wisesight_sentiment_xlm_r_en_5.5.0_3.0_1726846617205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("wisesight_sentiment_xlm_r","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("wisesight_sentiment_xlm_r", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
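
After the pipeline runs, the predicted label for each input row sits in the `class` output column declared below. A short sketch:

```python
# "class.result" holds the predicted label for each row of the input column "text"
pipelineDF.select("text", "class.result").show(truncate=False)
```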
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wisesight_sentiment_xlm_r| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|853.0 MB| + +## References + +https://huggingface.co/Cincin-nvp/wisesight_sentiment_XLM-R \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-wisesight_sentiment_xlm_r_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-wisesight_sentiment_xlm_r_pipeline_en.md new file mode 100644 index 00000000000000..cef57e1ad42ae0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-wisesight_sentiment_xlm_r_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English wisesight_sentiment_xlm_r_pipeline pipeline XlmRoBertaForSequenceClassification from Cincin-nvp +author: John Snow Labs +name: wisesight_sentiment_xlm_r_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wisesight_sentiment_xlm_r_pipeline` is a English model originally trained by Cincin-nvp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wisesight_sentiment_xlm_r_pipeline_en_5.5.0_3.0_1726846679445.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wisesight_sentiment_xlm_r_pipeline_en_5.5.0_3.0_1726846679445.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("wisesight_sentiment_xlm_r_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("wisesight_sentiment_xlm_r_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wisesight_sentiment_xlm_r_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.0 MB| + +## References + +https://huggingface.co/Cincin-nvp/wisesight_sentiment_XLM-R + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-withinapps_ndd_mrbs_test_content_tags_cwadj_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-withinapps_ndd_mrbs_test_content_tags_cwadj_pipeline_en.md new file mode 100644 index 00000000000000..dfbd76ed62d837 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-withinapps_ndd_mrbs_test_content_tags_cwadj_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English withinapps_ndd_mrbs_test_content_tags_cwadj_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_mrbs_test_content_tags_cwadj_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_mrbs_test_content_tags_cwadj_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mrbs_test_content_tags_cwadj_pipeline_en_5.5.0_3.0_1726792032389.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mrbs_test_content_tags_cwadj_pipeline_en_5.5.0_3.0_1726792032389.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("withinapps_ndd_mrbs_test_content_tags_cwadj_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("withinapps_ndd_mrbs_test_content_tags_cwadj_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_mrbs_test_content_tags_cwadj_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-mrbs_test-content_tags-CWAdj + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-withinapps_ndd_ppma_test_tags_cwadj_en.md b/docs/_posts/ahmedlone127/2024-09-20-withinapps_ndd_ppma_test_tags_cwadj_en.md new file mode 100644 index 00000000000000..51d185ed0e119b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-withinapps_ndd_ppma_test_tags_cwadj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English withinapps_ndd_ppma_test_tags_cwadj DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_ppma_test_tags_cwadj +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_ppma_test_tags_cwadj` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_ppma_test_tags_cwadj_en_5.5.0_3.0_1726792181182.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_ppma_test_tags_cwadj_en_5.5.0_3.0_1726792181182.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Turn raw text into document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; its inputs are the "document" and "token" columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_ppma_test_tags_cwadj","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Turn raw text into document annotations
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

// Load the pretrained classifier; its inputs are the "document" and "token" columns produced above
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_ppma_test_tags_cwadj", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_ppma_test_tags_cwadj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-ppma_test-tags-CWAdj \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-withinapps_ndd_ppma_test_tags_cwadj_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-withinapps_ndd_ppma_test_tags_cwadj_pipeline_en.md new file mode 100644 index 00000000000000..4d2a03bd54c1d4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-withinapps_ndd_ppma_test_tags_cwadj_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English withinapps_ndd_ppma_test_tags_cwadj_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_ppma_test_tags_cwadj_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_ppma_test_tags_cwadj_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_ppma_test_tags_cwadj_pipeline_en_5.5.0_3.0_1726792193625.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_ppma_test_tags_cwadj_pipeline_en_5.5.0_3.0_1726792193625.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("withinapps_ndd_ppma_test_tags_cwadj_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("withinapps_ndd_ppma_test_tags_cwadj_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
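
For a quick sanity check without building a DataFrame, `PretrainedPipeline.annotate` can be called on a plain string. The sketch below is illustrative and assumes a running Spark NLP session:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("withinapps_ndd_ppma_test_tags_cwadj_pipeline", lang = "en")

# Returns a Python dict keyed by the output columns of the pipeline stages
result = pipeline.annotate("I love spark-nlp")
print(result)
```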
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_ppma_test_tags_cwadj_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-ppma_test-tags-CWAdj + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_robert_finetune_model_thai_french_mm_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_robert_finetune_model_thai_french_mm_en.md new file mode 100644 index 00000000000000..49db2f75464b6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_robert_finetune_model_thai_french_mm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_robert_finetune_model_thai_french_mm XlmRoBertaForTokenClassification from zhangwenzhe +author: John Snow Labs +name: xlm_robert_finetune_model_thai_french_mm +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_robert_finetune_model_thai_french_mm` is a English model originally trained by zhangwenzhe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_robert_finetune_model_thai_french_mm_en_5.5.0_3.0_1726844434243.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_robert_finetune_model_thai_french_mm_en_5.5.0_3.0_1726844434243.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Turn raw text into document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained NER model; its inputs are the "document" and "token" columns produced above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_robert_finetune_model_thai_french_mm","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Turn raw text into document annotations
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// Load the pretrained NER model; its inputs are the "document" and "token" columns produced above
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_robert_finetune_model_thai_french_mm", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_robert_finetune_model_thai_french_mm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|773.1 MB| + +## References + +https://huggingface.co/zhangwenzhe/XLM-Robert-finetune-model-THAI-FR-MM \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_robert_finetune_model_thai_french_mm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_robert_finetune_model_thai_french_mm_pipeline_en.md new file mode 100644 index 00000000000000..9e19da5ddac3b7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_robert_finetune_model_thai_french_mm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_robert_finetune_model_thai_french_mm_pipeline pipeline XlmRoBertaForTokenClassification from zhangwenzhe +author: John Snow Labs +name: xlm_robert_finetune_model_thai_french_mm_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_robert_finetune_model_thai_french_mm_pipeline` is a English model originally trained by zhangwenzhe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_robert_finetune_model_thai_french_mm_pipeline_en_5.5.0_3.0_1726844575810.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_robert_finetune_model_thai_french_mm_pipeline_en_5.5.0_3.0_1726844575810.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_robert_finetune_model_thai_french_mm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_robert_finetune_model_thai_french_mm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
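
The `df` used above is assumed to be a DataFrame with a `text` column. A small illustrative sketch, using an input sentence that actually contains entities for the NER stage to pick up:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()  # reuse your existing Spark session if one is already running

pipeline = PretrainedPipeline("xlm_robert_finetune_model_thai_french_mm_pipeline", lang = "en")

# Example input with person and location mentions
df = spark.createDataFrame([["My name is Clara and I live in Berkeley, California."]]).toDF("text")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```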
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_robert_finetune_model_thai_french_mm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|773.2 MB| + +## References + +https://huggingface.co/zhangwenzhe/XLM-Robert-finetune-model-THAI-FR-MM + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_balance_vietnam_train_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_balance_vietnam_train_en.md new file mode 100644 index 00000000000000..38fd1302459171 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_balance_vietnam_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_balance_vietnam_train XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_balance_vietnam_train +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_balance_vietnam_train` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_train_en_5.5.0_3.0_1726865047942.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_train_en_5.5.0_3.0_1726865047942.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Turn raw text into document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; its inputs are the "document" and "token" columns produced above
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_balance_vietnam_train","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Turn raw text into document annotations
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

// Load the pretrained classifier; its inputs are the "document" and "token" columns produced above
val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_balance_vietnam_train", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_balance_vietnam_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|815.1 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Balance_VietNam-train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_balance_vietnam_train_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_balance_vietnam_train_pipeline_en.md new file mode 100644 index 00000000000000..c670e77a310191 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_balance_vietnam_train_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_balance_vietnam_train_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_balance_vietnam_train_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_balance_vietnam_train_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_train_pipeline_en_5.5.0_3.0_1726865145513.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_train_pipeline_en_5.5.0_3.0_1726865145513.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_balance_vietnam_train_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_balance_vietnam_train_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
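
The `df` above is assumed to be a DataFrame with a `text` column. A minimal sketch of building that input and inspecting the output, assuming a running Spark NLP session:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("xlm_roberta_base_balance_vietnam_train_pipeline", lang = "en")

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```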
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_balance_vietnam_train_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|815.1 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Balance_VietNam-train + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_claimbuster_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_claimbuster_pipeline_en.md new file mode 100644 index 00000000000000..a2120b086dbddb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_claimbuster_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_claimbuster_pipeline pipeline XlmRoBertaForSequenceClassification from Nithiwat +author: John Snow Labs +name: xlm_roberta_base_claimbuster_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_claimbuster_pipeline` is a English model originally trained by Nithiwat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_claimbuster_pipeline_en_5.5.0_3.0_1726800482552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_claimbuster_pipeline_en_5.5.0_3.0_1726800482552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_claimbuster_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_claimbuster_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
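
Besides `transform`, `fullAnnotate` returns the richer Annotation objects (including their metadata) rather than plain strings; a short illustrative sketch, assuming a running Spark NLP session:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_claimbuster_pipeline", lang = "en")

# One list entry per input text, each mapping output column -> annotations
results = pipeline.fullAnnotate("I love spark-nlp")
print(results[0])
```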
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_claimbuster_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|845.2 MB| + +## References + +https://huggingface.co/Nithiwat/xlm-roberta-base_claimbuster + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_english_sentweet_sentiment_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_english_sentweet_sentiment_en.md new file mode 100644 index 00000000000000..3744419373282d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_english_sentweet_sentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_english_sentweet_sentiment XlmRoBertaForSequenceClassification from jayanta +author: John Snow Labs +name: xlm_roberta_base_english_sentweet_sentiment +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_english_sentweet_sentiment` is a English model originally trained by jayanta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_english_sentweet_sentiment_en_5.5.0_3.0_1726873079975.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_english_sentweet_sentiment_en_5.5.0_3.0_1726873079975.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Turn raw text into document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; its inputs are the "document" and "token" columns produced above
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_english_sentweet_sentiment","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Turn raw text into document annotations
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

// Load the pretrained classifier; its inputs are the "document" and "token" columns produced above
val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_english_sentweet_sentiment", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_english_sentweet_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|785.8 MB| + +## References + +https://huggingface.co/jayanta/xlm-roberta-base-english-sentweet-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_english_sentweet_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_english_sentweet_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..6b58292ad25440 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_english_sentweet_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_english_sentweet_sentiment_pipeline pipeline XlmRoBertaForSequenceClassification from jayanta +author: John Snow Labs +name: xlm_roberta_base_english_sentweet_sentiment_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_english_sentweet_sentiment_pipeline` is a English model originally trained by jayanta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_english_sentweet_sentiment_pipeline_en_5.5.0_3.0_1726873214651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_english_sentweet_sentiment_pipeline_en_5.5.0_3.0_1726873214651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_english_sentweet_sentiment_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_english_sentweet_sentiment_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
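
The `df` referenced above is assumed to hold a `text` column. An illustrative sketch of preparing that input for this sentiment pipeline, assuming a running Spark NLP session:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("xlm_roberta_base_english_sentweet_sentiment_pipeline", lang = "en")

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```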
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_english_sentweet_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|785.8 MB| + +## References + +https://huggingface.co/jayanta/xlm-roberta-base-english-sentweet-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_fakenews_dravidian_nt_3e_4_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_fakenews_dravidian_nt_3e_4_en.md new file mode 100644 index 00000000000000..d84d83c968314f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_fakenews_dravidian_nt_3e_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_fakenews_dravidian_nt_3e_4 XlmRoBertaForSequenceClassification from mdosama39 +author: John Snow Labs +name: xlm_roberta_base_fakenews_dravidian_nt_3e_4 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_fakenews_dravidian_nt_3e_4` is a English model originally trained by mdosama39. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_fakenews_dravidian_nt_3e_4_en_5.5.0_3.0_1726873102294.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_fakenews_dravidian_nt_3e_4_en_5.5.0_3.0_1726873102294.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Turn raw text into document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; its inputs are the "document" and "token" columns produced above
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_fakenews_dravidian_nt_3e_4","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Turn raw text into document annotations
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

// Load the pretrained classifier; its inputs are the "document" and "token" columns produced above
val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_fakenews_dravidian_nt_3e_4", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_fakenews_dravidian_nt_3e_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|849.2 MB| + +## References + +https://huggingface.co/mdosama39/xlm-roberta-base-FakeNews-Dravidian-NT-3e-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_fakenews_dravidian_nt_3e_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_fakenews_dravidian_nt_3e_4_pipeline_en.md new file mode 100644 index 00000000000000..d3691cc1372329 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_fakenews_dravidian_nt_3e_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_fakenews_dravidian_nt_3e_4_pipeline pipeline XlmRoBertaForSequenceClassification from mdosama39 +author: John Snow Labs +name: xlm_roberta_base_fakenews_dravidian_nt_3e_4_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_fakenews_dravidian_nt_3e_4_pipeline` is a English model originally trained by mdosama39. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_fakenews_dravidian_nt_3e_4_pipeline_en_5.5.0_3.0_1726873163639.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_fakenews_dravidian_nt_3e_4_pipeline_en_5.5.0_3.0_1726873163639.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_fakenews_dravidian_nt_3e_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_fakenews_dravidian_nt_3e_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
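
For a quick check without a DataFrame, `annotate` accepts a plain string; the sketch below is illustrative and assumes a running Spark NLP session:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_fakenews_dravidian_nt_3e_4_pipeline", lang = "en")

# Dict keyed by the output columns of the pipeline stages
result = pipeline.annotate("I love spark-nlp")
print(result)
```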
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_fakenews_dravidian_nt_3e_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|849.2 MB| + +## References + +https://huggingface.co/mdosama39/xlm-roberta-base-FakeNews-Dravidian-NT-3e-4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_final_mixed_aug_delete_2_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_final_mixed_aug_delete_2_en.md new file mode 100644 index 00000000000000..e0a73f791bcc66 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_final_mixed_aug_delete_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_final_mixed_aug_delete_2 XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_final_mixed_aug_delete_2 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_final_mixed_aug_delete_2` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_aug_delete_2_en_5.5.0_3.0_1726872384312.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_aug_delete_2_en_5.5.0_3.0_1726872384312.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Turn raw text into document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; its inputs are the "document" and "token" columns produced above
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_final_mixed_aug_delete_2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Turn raw text into document annotations
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

// Load the pretrained classifier; its inputs are the "document" and "token" columns produced above
val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_final_mixed_aug_delete_2", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_final_mixed_aug_delete_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|794.8 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Final_Mixed-aug_delete-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_final_mixed_aug_delete_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_final_mixed_aug_delete_2_pipeline_en.md new file mode 100644 index 00000000000000..4762324a2a039b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_final_mixed_aug_delete_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_final_mixed_aug_delete_2_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_final_mixed_aug_delete_2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_final_mixed_aug_delete_2_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_aug_delete_2_pipeline_en_5.5.0_3.0_1726872507134.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_aug_delete_2_pipeline_en_5.5.0_3.0_1726872507134.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_final_mixed_aug_delete_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_final_mixed_aug_delete_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
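
The `df` above is assumed to carry a `text` column. A minimal illustrative sketch of constructing that input, assuming a running Spark NLP session:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("xlm_roberta_base_final_mixed_aug_delete_2_pipeline", lang = "en")

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```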
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_final_mixed_aug_delete_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|794.8 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Final_Mixed-aug_delete-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_final_mixed_aug_insert_w2v_2_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_final_mixed_aug_insert_w2v_2_en.md new file mode 100644 index 00000000000000..2e58ab357efbee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_final_mixed_aug_insert_w2v_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_final_mixed_aug_insert_w2v_2 XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_final_mixed_aug_insert_w2v_2 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_final_mixed_aug_insert_w2v_2` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_aug_insert_w2v_2_en_5.5.0_3.0_1726872389227.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_aug_insert_w2v_2_en_5.5.0_3.0_1726872389227.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Turn raw text into document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; its inputs are the "document" and "token" columns produced above
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_final_mixed_aug_insert_w2v_2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Turn raw text into document annotations
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

// Load the pretrained classifier; its inputs are the "document" and "token" columns produced above
val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_final_mixed_aug_insert_w2v_2", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_final_mixed_aug_insert_w2v_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|796.5 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Final_Mixed-aug_insert_w2v-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_final_mixed_aug_insert_w2v_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_final_mixed_aug_insert_w2v_2_pipeline_en.md new file mode 100644 index 00000000000000..9574080c848608 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_final_mixed_aug_insert_w2v_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_final_mixed_aug_insert_w2v_2_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_final_mixed_aug_insert_w2v_2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_final_mixed_aug_insert_w2v_2_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_aug_insert_w2v_2_pipeline_en_5.5.0_3.0_1726872511729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_aug_insert_w2v_2_pipeline_en_5.5.0_3.0_1726872511729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_final_mixed_aug_insert_w2v_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_final_mixed_aug_insert_w2v_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
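
As a lightweight alternative to `transform`, the pipeline can annotate a single string directly; an illustrative sketch, assuming a running Spark NLP session:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_final_mixed_aug_insert_w2v_2_pipeline", lang = "en")

# Dict keyed by the output columns of the pipeline stages
result = pipeline.annotate("I love spark-nlp")
print(result)
```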
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_final_mixed_aug_insert_w2v_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|796.5 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Final_Mixed-aug_insert_w2v-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_marc_anshengmay_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_marc_anshengmay_en.md new file mode 100644 index 00000000000000..aedb7274cecf91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_marc_anshengmay_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_marc_anshengmay XlmRoBertaForSequenceClassification from anshengmay +author: John Snow Labs +name: xlm_roberta_base_finetuned_marc_anshengmay +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_marc_anshengmay` is a English model originally trained by anshengmay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_anshengmay_en_5.5.0_3.0_1726865355015.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_anshengmay_en_5.5.0_3.0_1726865355015.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Turn raw text into document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; its inputs are the "document" and "token" columns produced above
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_marc_anshengmay","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Turn raw text into document annotations
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

// Load the pretrained classifier; its inputs are the "document" and "token" columns produced above
val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_marc_anshengmay", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_marc_anshengmay| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|835.1 MB| + +## References + +https://huggingface.co/anshengmay/xlm-roberta-base-finetuned-marc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_marc_anshengmay_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_marc_anshengmay_pipeline_en.md new file mode 100644 index 00000000000000..bf2b1260bb4f48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_marc_anshengmay_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_marc_anshengmay_pipeline pipeline XlmRoBertaForSequenceClassification from anshengmay +author: John Snow Labs +name: xlm_roberta_base_finetuned_marc_anshengmay_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_marc_anshengmay_pipeline` is a English model originally trained by anshengmay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_anshengmay_pipeline_en_5.5.0_3.0_1726865436930.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_anshengmay_pipeline_en_5.5.0_3.0_1726865436930.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_marc_anshengmay_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_marc_anshengmay_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
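
The `df` above is assumed to be a DataFrame with a `text` column; for this review-classification pipeline a product-review-style sentence is a natural input. An illustrative sketch, assuming a running Spark NLP session:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_marc_anshengmay_pipeline", lang = "en")

df = spark.createDataFrame([["This product exceeded my expectations."]]).toDF("text")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```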
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_marc_anshengmay_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|835.1 MB| + +## References + +https://huggingface.co/anshengmay/xlm-roberta-base-finetuned-marc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_huangjia_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_huangjia_en.md new file mode 100644 index 00000000000000..d1b15aaf0565ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_huangjia_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_huangjia XlmRoBertaForTokenClassification from huangjia +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_huangjia +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_huangjia` is a English model originally trained by huangjia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_huangjia_en_5.5.0_3.0_1726843750806.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_huangjia_en_5.5.0_3.0_1726843750806.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Turn raw text into document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained NER model; its inputs are the "document" and "token" columns produced above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_huangjia","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Turn raw text into document annotations
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// Load the pretrained NER model; its inputs are the "document" and "token" columns produced above
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_huangjia", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_huangjia| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|860.9 MB| + +## References + +https://huggingface.co/huangjia/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_huangjia_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_huangjia_pipeline_en.md new file mode 100644 index 00000000000000..16c98efda63b7f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_huangjia_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_huangjia_pipeline pipeline XlmRoBertaForTokenClassification from huangjia +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_huangjia_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_huangjia_pipeline` is a English model originally trained by huangjia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_huangjia_pipeline_en_5.5.0_3.0_1726843819609.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_huangjia_pipeline_en_5.5.0_3.0_1726843819609.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_huangjia_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_huangjia_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
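
The `df` above is assumed to hold a `text` column; an entity-bearing sentence makes a more informative test input for this NER pipeline. A minimal illustrative sketch, assuming a running Spark NLP session:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_huangjia_pipeline", lang = "en")

df = spark.createDataFrame([["My name is Clara and I live in Berkeley, California."]]).toDF("text")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```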
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_huangjia_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/huangjia/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_jaemin12_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_jaemin12_en.md new file mode 100644 index 00000000000000..e79d856f5b073a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_jaemin12_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_jaemin12 XlmRoBertaForTokenClassification from jaemin12 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_jaemin12 +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_jaemin12` is a English model originally trained by jaemin12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_jaemin12_en_5.5.0_3.0_1726844050291.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_jaemin12_en_5.5.0_3.0_1726844050291.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Turn raw text into document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained NER model; its inputs are the "document" and "token" columns produced above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_jaemin12","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Turn raw text into document annotations
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// Load the pretrained NER model; its inputs are the "document" and "token" columns produced above
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_jaemin12", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_jaemin12| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/jaemin12/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_jaemin12_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_jaemin12_pipeline_en.md new file mode 100644 index 00000000000000..75e40c74797ada --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_jaemin12_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_jaemin12_pipeline pipeline XlmRoBertaForTokenClassification from jaemin12 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_jaemin12_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_jaemin12_pipeline` is a English model originally trained by jaemin12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_jaemin12_pipeline_en_5.5.0_3.0_1726844114149.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_jaemin12_pipeline_en_5.5.0_3.0_1726844114149.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any Spark DataFrame with a "text" column works as input, for example:
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_jaemin12_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column works as input, for example:
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_jaemin12_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
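For quick ad-hoc checks, a `PretrainedPipeline` can also be applied to a plain string via `annotate`. This is a small sketch reusing the `pipeline` object from the Python example above; the keys of the returned dictionary depend on the pipeline's internal output column names, so inspect them first:

```python
# annotate() returns a dict keyed by the pipeline's output column names.
result = pipeline.annotate("I love spark-nlp")
print(result.keys())
print(result)
```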
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_jaemin12_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/jaemin12/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_koroku_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_koroku_pipeline_en.md new file mode 100644 index 00000000000000..13d4f3954dad13 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_koroku_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_koroku_pipeline pipeline XlmRoBertaForTokenClassification from koroku +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_koroku_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_koroku_pipeline` is a English model originally trained by koroku. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_koroku_pipeline_en_5.5.0_3.0_1726844308912.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_koroku_pipeline_en_5.5.0_3.0_1726844308912.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any Spark DataFrame with a "text" column works as input, for example:
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_koroku_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column works as input, for example:
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_koroku_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_koroku_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.2 MB| + +## References + +https://huggingface.co/koroku/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_lee_soha_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_lee_soha_en.md new file mode 100644 index 00000000000000..861ca4be0db6a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_lee_soha_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_lee_soha XlmRoBertaForTokenClassification from Lee-soha +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_lee_soha +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_lee_soha` is a English model originally trained by Lee-soha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_lee_soha_en_5.5.0_3.0_1726844508600.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_lee_soha_en_5.5.0_3.0_1726844508600.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForTokenClassification
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_lee_soha","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_lee_soha", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_lee_soha| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/Lee-soha/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_lee_soha_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_lee_soha_pipeline_en.md new file mode 100644 index 00000000000000..3aba9703a47ac7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_lee_soha_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_lee_soha_pipeline pipeline XlmRoBertaForTokenClassification from Lee-soha +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_lee_soha_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_lee_soha_pipeline` is a English model originally trained by Lee-soha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_lee_soha_pipeline_en_5.5.0_3.0_1726844590280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_lee_soha_pipeline_en_5.5.0_3.0_1726844590280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any Spark DataFrame with a "text" column works as input, for example:
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_lee_soha_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column works as input, for example:
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_lee_soha_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_lee_soha_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/Lee-soha/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_english_andrew45_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_english_andrew45_en.md new file mode 100644 index 00000000000000..6c245b7831c952 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_english_andrew45_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_andrew45 XlmRoBertaForTokenClassification from andrew45 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_andrew45 +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_andrew45` is a English model originally trained by andrew45. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_andrew45_en_5.5.0_3.0_1726843382983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_andrew45_en_5.5.0_3.0_1726843382983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForTokenClassification
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_andrew45","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_andrew45", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_andrew45| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/andrew45/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_english_andrew45_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_english_andrew45_pipeline_en.md new file mode 100644 index 00000000000000..4835be958bd2d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_english_andrew45_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_andrew45_pipeline pipeline XlmRoBertaForTokenClassification from andrew45 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_andrew45_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_andrew45_pipeline` is a English model originally trained by andrew45. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_andrew45_pipeline_en_5.5.0_3.0_1726843489036.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_andrew45_pipeline_en_5.5.0_3.0_1726843489036.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any Spark DataFrame with a "text" column works as input, for example:
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_andrew45_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column works as input, for example:
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_andrew45_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_andrew45_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/andrew45/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_french_takizawa_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_french_takizawa_en.md new file mode 100644 index 00000000000000..20eb9ed38c69fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_french_takizawa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_takizawa XlmRoBertaForTokenClassification from takizawa +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_takizawa +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_takizawa` is a English model originally trained by takizawa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_takizawa_en_5.5.0_3.0_1726843594238.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_takizawa_en_5.5.0_3.0_1726843594238.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForTokenClassification
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_takizawa","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_takizawa", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_takizawa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/takizawa/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_french_takizawa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_french_takizawa_pipeline_en.md new file mode 100644 index 00000000000000..ea30fae4be48de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_french_takizawa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_takizawa_pipeline pipeline XlmRoBertaForTokenClassification from takizawa +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_takizawa_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_takizawa_pipeline` is a English model originally trained by takizawa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_takizawa_pipeline_en_5.5.0_3.0_1726843671871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_takizawa_pipeline_en_5.5.0_3.0_1726843671871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any Spark DataFrame with a "text" column works as input, for example:
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_takizawa_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column works as input, for example:
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_takizawa_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_takizawa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/takizawa/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_french_yurit04_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_french_yurit04_en.md new file mode 100644 index 00000000000000..12771ba64deb06 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_french_yurit04_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_yurit04 XlmRoBertaForTokenClassification from yurit04 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_yurit04 +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_yurit04` is a English model originally trained by yurit04. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_yurit04_en_5.5.0_3.0_1726844116316.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_yurit04_en_5.5.0_3.0_1726844116316.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForTokenClassification
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_yurit04","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_yurit04", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_yurit04| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/yurit04/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_french_yurit04_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_french_yurit04_pipeline_en.md new file mode 100644 index 00000000000000..08cbd5533f53bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_french_yurit04_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_yurit04_pipeline pipeline XlmRoBertaForTokenClassification from yurit04 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_yurit04_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_yurit04_pipeline` is a English model originally trained by yurit04. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_yurit04_pipeline_en_5.5.0_3.0_1726844200751.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_yurit04_pipeline_en_5.5.0_3.0_1726844200751.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any Spark DataFrame with a "text" column works as input, for example:
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_yurit04_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column works as input, for example:
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_yurit04_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_yurit04_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/yurit04/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_cogitur_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_cogitur_en.md new file mode 100644 index 00000000000000..fbb00dfd7a13d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_cogitur_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_cogitur XlmRoBertaForTokenClassification from cogitur +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_cogitur +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_cogitur` is a English model originally trained by cogitur. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_cogitur_en_5.5.0_3.0_1726843253285.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_cogitur_en_5.5.0_3.0_1726843253285.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForTokenClassification
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_cogitur","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_cogitur", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_cogitur| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/cogitur/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_cogitur_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_cogitur_pipeline_en.md new file mode 100644 index 00000000000000..22dd634c647249 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_cogitur_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_cogitur_pipeline pipeline XlmRoBertaForTokenClassification from cogitur +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_cogitur_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_cogitur_pipeline` is a English model originally trained by cogitur. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_cogitur_pipeline_en_5.5.0_3.0_1726843321244.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_cogitur_pipeline_en_5.5.0_3.0_1726843321244.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any Spark DataFrame with a "text" column works as input, for example:
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_cogitur_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column works as input, for example:
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_cogitur_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_cogitur_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/cogitur/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_bennef_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_bennef_en.md new file mode 100644 index 00000000000000..f9f4fb0adbb326 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_bennef_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_bennef XlmRoBertaForTokenClassification from BenneF +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_bennef +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_bennef` is a English model originally trained by BenneF. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_bennef_en_5.5.0_3.0_1726844717944.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_bennef_en_5.5.0_3.0_1726844717944.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForTokenClassification
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_bennef","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_bennef", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_bennef| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/BenneF/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_bennef_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_bennef_pipeline_en.md new file mode 100644 index 00000000000000..eb363c6075479a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_bennef_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_bennef_pipeline pipeline XlmRoBertaForTokenClassification from BenneF +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_bennef_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_bennef_pipeline` is a English model originally trained by BenneF. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_bennef_pipeline_en_5.5.0_3.0_1726844802305.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_bennef_pipeline_en_5.5.0_3.0_1726844802305.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any Spark DataFrame with a "text" column works as input, for example:
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_bennef_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column works as input, for example:
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_bennef_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_bennef_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/BenneF/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_guroruseru_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_guroruseru_en.md new file mode 100644 index 00000000000000..b411a232b08349 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_guroruseru_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_guroruseru XlmRoBertaForTokenClassification from Guroruseru +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_guroruseru +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_guroruseru` is a English model originally trained by Guroruseru. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_guroruseru_en_5.5.0_3.0_1726843107067.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_guroruseru_en_5.5.0_3.0_1726843107067.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForTokenClassification
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_guroruseru","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_guroruseru", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_guroruseru| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/Guroruseru/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_guroruseru_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_guroruseru_pipeline_en.md new file mode 100644 index 00000000000000..0ab4a72d021c4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_guroruseru_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_guroruseru_pipeline pipeline XlmRoBertaForTokenClassification from Guroruseru +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_guroruseru_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_guroruseru_pipeline` is a English model originally trained by Guroruseru. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_guroruseru_pipeline_en_5.5.0_3.0_1726843172077.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_guroruseru_pipeline_en_5.5.0_3.0_1726843172077.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any Spark DataFrame with a "text" column works as input, for example:
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_guroruseru_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column works as input, for example:
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_guroruseru_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_guroruseru_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/Guroruseru/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_kbleejohn_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_kbleejohn_en.md new file mode 100644 index 00000000000000..bbe7fc5a655a65 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_kbleejohn_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_kbleejohn XlmRoBertaForTokenClassification from kbleejohn +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_kbleejohn +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_kbleejohn` is a English model originally trained by kbleejohn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_kbleejohn_en_5.5.0_3.0_1726844501552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_kbleejohn_en_5.5.0_3.0_1726844501552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForTokenClassification
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_kbleejohn","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_kbleejohn", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_kbleejohn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/kbleejohn/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_kbleejohn_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_kbleejohn_pipeline_en.md new file mode 100644 index 00000000000000..4a3134f91cb234 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_kbleejohn_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_kbleejohn_pipeline pipeline XlmRoBertaForTokenClassification from kbleejohn +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_kbleejohn_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_kbleejohn_pipeline` is a English model originally trained by kbleejohn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_kbleejohn_pipeline_en_5.5.0_3.0_1726844566516.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_kbleejohn_pipeline_en_5.5.0_3.0_1726844566516.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any Spark DataFrame with a "text" column works as input, for example:
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_kbleejohn_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column works as input, for example:
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_kbleejohn_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_kbleejohn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/kbleejohn/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_kenhoffman_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_kenhoffman_en.md new file mode 100644 index 00000000000000..0a99ba5b8653a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_kenhoffman_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_kenhoffman XlmRoBertaForTokenClassification from kenhoffman +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_kenhoffman +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_kenhoffman` is a English model originally trained by kenhoffman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_kenhoffman_en_5.5.0_3.0_1726843910561.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_kenhoffman_en_5.5.0_3.0_1726843910561.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForTokenClassification
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_kenhoffman","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_kenhoffman", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_kenhoffman| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/kenhoffman/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_kenhoffman_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_kenhoffman_pipeline_en.md new file mode 100644 index 00000000000000..dd920fb4f0491d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_kenhoffman_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_kenhoffman_pipeline pipeline XlmRoBertaForTokenClassification from kenhoffman +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_kenhoffman_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_kenhoffman_pipeline` is a English model originally trained by kenhoffman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_kenhoffman_pipeline_en_5.5.0_3.0_1726843976661.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_kenhoffman_pipeline_en_5.5.0_3.0_1726843976661.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# Sample input; assumes an active Spark session started via sparknlp.start().
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_kenhoffman_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Sample input; assumes an active Spark session.
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_kenhoffman_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
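+
+For a quick check without building a DataFrame, the same pretrained pipeline can also be run on a single string with `annotate`. This is a sketch: the dictionary keys (`token`, `ner`) assume the bundled stages keep their default output column names.
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_kenhoffman_pipeline", lang = "en")
+
+# annotate() returns a dict mapping each output column to its result strings.
+annotations = pipeline.annotate("John works at John Snow Labs in London.")
+print(list(zip(annotations["token"], annotations["ner"])))
+```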
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_kenhoffman_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/kenhoffman/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_qilin1_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_qilin1_en.md new file mode 100644 index 00000000000000..3784b1a76ca8fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_qilin1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_qilin1 XlmRoBertaForTokenClassification from qilin1 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_qilin1 +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_qilin1` is a English model originally trained by qilin1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_qilin1_en_5.5.0_3.0_1726843222905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_qilin1_en_5.5.0_3.0_1726843222905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_qilin1","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_qilin1", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_qilin1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.6 MB| + +## References + +https://huggingface.co/qilin1/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_qilin1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_qilin1_pipeline_en.md new file mode 100644 index 00000000000000..e133deaf8817dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_qilin1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_qilin1_pipeline pipeline XlmRoBertaForTokenClassification from qilin1 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_qilin1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_qilin1_pipeline` is a English model originally trained by qilin1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_qilin1_pipeline_en_5.5.0_3.0_1726843286319.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_qilin1_pipeline_en_5.5.0_3.0_1726843286319.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# Sample input; assumes an active Spark session started via sparknlp.start().
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_qilin1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Sample input; assumes an active Spark session.
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_qilin1_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_qilin1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.7 MB| + +## References + +https://huggingface.co/qilin1/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_zardian_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_zardian_en.md new file mode 100644 index 00000000000000..cb62268e2f073c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_zardian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_zardian XlmRoBertaForTokenClassification from Zardian +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_zardian +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_zardian` is a English model originally trained by Zardian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_zardian_en_5.5.0_3.0_1726844250806.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_zardian_en_5.5.0_3.0_1726844250806.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_zardian","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_zardian", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_zardian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/Zardian/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_zardian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_zardian_pipeline_en.md new file mode 100644 index 00000000000000..1c7cf228b506e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_zardian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_zardian_pipeline pipeline XlmRoBertaForTokenClassification from Zardian +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_zardian_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_zardian_pipeline` is a English model originally trained by Zardian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_zardian_pipeline_en_5.5.0_3.0_1726844336344.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_zardian_pipeline_en_5.5.0_3.0_1726844336344.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# Sample input; assumes an active Spark session started via sparknlp.start().
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_zardian_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Sample input; assumes an active Spark session.
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_zardian_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_zardian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/Zardian/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_vantaa32_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_vantaa32_en.md new file mode 100644 index 00000000000000..f14fb49da26bd9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_vantaa32_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_vantaa32 XlmRoBertaForTokenClassification from vantaa32 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_vantaa32 +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_vantaa32` is a English model originally trained by vantaa32. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_vantaa32_en_5.5.0_3.0_1726843410192.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_vantaa32_en_5.5.0_3.0_1726843410192.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_vantaa32","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_vantaa32", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_vantaa32| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/vantaa32/xlm-roberta-base-finetuned_panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_vantaa32_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_vantaa32_pipeline_en.md new file mode 100644 index 00000000000000..68ea90a907368a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_vantaa32_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_vantaa32_pipeline pipeline XlmRoBertaForTokenClassification from vantaa32 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_vantaa32_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_vantaa32_pipeline` is a English model originally trained by vantaa32. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_vantaa32_pipeline_en_5.5.0_3.0_1726843497533.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_vantaa32_pipeline_en_5.5.0_3.0_1726843497533.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# Sample input; assumes an active Spark session started via sparknlp.start().
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_vantaa32_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Sample input; assumes an active Spark session.
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_vantaa32_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_vantaa32_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/vantaa32/xlm-roberta-base-finetuned_panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_lr0_0001_seed42_basic_original_esp_kinyarwanda_eng_train_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_lr0_0001_seed42_basic_original_esp_kinyarwanda_eng_train_en.md new file mode 100644 index 00000000000000..c10e6e61403b11 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_lr0_0001_seed42_basic_original_esp_kinyarwanda_eng_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_lr0_0001_seed42_basic_original_esp_kinyarwanda_eng_train XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr0_0001_seed42_basic_original_esp_kinyarwanda_eng_train +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr0_0001_seed42_basic_original_esp_kinyarwanda_eng_train` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_0001_seed42_basic_original_esp_kinyarwanda_eng_train_en_5.5.0_3.0_1726872771744.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_0001_seed42_basic_original_esp_kinyarwanda_eng_train_en_5.5.0_3.0_1726872771744.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr0_0001_seed42_basic_original_esp_kinyarwanda_eng_train","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr0_0001_seed42_basic_original_esp_kinyarwanda_eng_train", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
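+
+For sequence classification the prediction lands in the `class` column as a single annotation per row, and its `result` field holds the predicted label. A minimal sketch, assuming the Python pipeline above has been run:
+
+```python
+# Show each input text next to its predicted label.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```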
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr0_0001_seed42_basic_original_esp_kinyarwanda_eng_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|819.4 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr0.0001_seed42_basic_original_esp-kin-eng_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_lr0_0001_seed42_basic_original_esp_kinyarwanda_eng_train_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_lr0_0001_seed42_basic_original_esp_kinyarwanda_eng_train_pipeline_en.md new file mode 100644 index 00000000000000..019cc369058046 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_lr0_0001_seed42_basic_original_esp_kinyarwanda_eng_train_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_lr0_0001_seed42_basic_original_esp_kinyarwanda_eng_train_pipeline pipeline XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr0_0001_seed42_basic_original_esp_kinyarwanda_eng_train_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr0_0001_seed42_basic_original_esp_kinyarwanda_eng_train_pipeline` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_0001_seed42_basic_original_esp_kinyarwanda_eng_train_pipeline_en_5.5.0_3.0_1726872897843.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_0001_seed42_basic_original_esp_kinyarwanda_eng_train_pipeline_en_5.5.0_3.0_1726872897843.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# Sample input; assumes an active Spark session started via sparknlp.start().
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_lr0_0001_seed42_basic_original_esp_kinyarwanda_eng_train_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Sample input; assumes an active Spark session.
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_lr0_0001_seed42_basic_original_esp_kinyarwanda_eng_train_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr0_0001_seed42_basic_original_esp_kinyarwanda_eng_train_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|819.4 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr0.0001_seed42_basic_original_esp-kin-eng_train + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_lr5e_06_seed42_kinyarwanda_hau_eng_train_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_lr5e_06_seed42_kinyarwanda_hau_eng_train_en.md new file mode 100644 index 00000000000000..993bf74aaac24f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_lr5e_06_seed42_kinyarwanda_hau_eng_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_lr5e_06_seed42_kinyarwanda_hau_eng_train XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr5e_06_seed42_kinyarwanda_hau_eng_train +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr5e_06_seed42_kinyarwanda_hau_eng_train` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_kinyarwanda_hau_eng_train_en_5.5.0_3.0_1726873023366.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_kinyarwanda_hau_eng_train_en_5.5.0_3.0_1726873023366.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr5e_06_seed42_kinyarwanda_hau_eng_train","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr5e_06_seed42_kinyarwanda_hau_eng_train", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr5e_06_seed42_kinyarwanda_hau_eng_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|800.7 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr5e-06_seed42_kin-hau-eng_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_lr5e_06_seed42_kinyarwanda_hau_eng_train_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_lr5e_06_seed42_kinyarwanda_hau_eng_train_pipeline_en.md new file mode 100644 index 00000000000000..739cc05b9cda19 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_lr5e_06_seed42_kinyarwanda_hau_eng_train_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_lr5e_06_seed42_kinyarwanda_hau_eng_train_pipeline pipeline XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr5e_06_seed42_kinyarwanda_hau_eng_train_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr5e_06_seed42_kinyarwanda_hau_eng_train_pipeline` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_kinyarwanda_hau_eng_train_pipeline_en_5.5.0_3.0_1726873153049.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_kinyarwanda_hau_eng_train_pipeline_en_5.5.0_3.0_1726873153049.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# Sample input; assumes an active Spark session started via sparknlp.start().
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_lr5e_06_seed42_kinyarwanda_hau_eng_train_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Sample input; assumes an active Spark session.
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_lr5e_06_seed42_kinyarwanda_hau_eng_train_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr5e_06_seed42_kinyarwanda_hau_eng_train_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|800.7 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr5e-06_seed42_kin-hau-eng_train + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1_en.md new file mode 100644 index 00000000000000..e0f623d945a7e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1 XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1_en_5.5.0_3.0_1726800470246.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1_en_5.5.0_3.0_1726800470246.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|793.6 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-New_VietNam-aug_delete-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1_pipeline_en.md new file mode 100644 index 00000000000000..e5ebc1c5bfe8f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1_pipeline_en_5.5.0_3.0_1726800595026.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1_pipeline_en_5.5.0_3.0_1726800595026.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# Sample input; assumes an active Spark session started via sparknlp.start().
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Sample input; assumes an active Spark session.
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|793.7 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-New_VietNam-aug_delete-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_w2v_1_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_w2v_1_en.md new file mode 100644 index 00000000000000..a95b7ec767c355 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_w2v_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_w2v_1 XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_w2v_1 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_w2v_1` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_w2v_1_en_5.5.0_3.0_1726873212944.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_w2v_1_en_5.5.0_3.0_1726873212944.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_w2v_1","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_w2v_1", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_w2v_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|796.3 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-New_VietNam-aug_insert_w2v-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_w2v_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_w2v_1_pipeline_en.md new file mode 100644 index 00000000000000..907031812bb88f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_w2v_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_w2v_1_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_w2v_1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_w2v_1_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_w2v_1_pipeline_en_5.5.0_3.0_1726873336700.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_w2v_1_pipeline_en_5.5.0_3.0_1726873336700.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# Sample input; assumes an active Spark session started via sparknlp.start().
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_w2v_1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Sample input; assumes an active Spark session.
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_w2v_1_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_w2v_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|796.3 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-New_VietNam-aug_insert_w2v-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_stress_identification_task_2_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_stress_identification_task_2_en.md new file mode 100644 index 00000000000000..23267309dd09ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_stress_identification_task_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_stress_identification_task_2 XlmRoBertaForSequenceClassification from mdosama39 +author: John Snow Labs +name: xlm_roberta_base_stress_identification_task_2 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_stress_identification_task_2` is a English model originally trained by mdosama39. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_stress_identification_task_2_en_5.5.0_3.0_1726799860352.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_stress_identification_task_2_en_5.5.0_3.0_1726799860352.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_stress_identification_task_2","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_stress_identification_task_2", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_stress_identification_task_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|811.2 MB| + +## References + +https://huggingface.co/mdosama39/xlm-roberta-base-Stress-identification-task-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_trimmed_arabic_10000_tweet_sentiment_arabic_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_trimmed_arabic_10000_tweet_sentiment_arabic_en.md new file mode 100644 index 00000000000000..a2d18acf6d53bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_trimmed_arabic_10000_tweet_sentiment_arabic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_arabic_10000_tweet_sentiment_arabic XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_arabic_10000_tweet_sentiment_arabic +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_arabic_10000_tweet_sentiment_arabic` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_arabic_10000_tweet_sentiment_arabic_en_5.5.0_3.0_1726872069222.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_arabic_10000_tweet_sentiment_arabic_en_5.5.0_3.0_1726872069222.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_arabic_10000_tweet_sentiment_arabic","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_arabic_10000_tweet_sentiment_arabic", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_arabic_10000_tweet_sentiment_arabic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|350.3 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-ar-10000-tweet-sentiment-ar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_trimmed_arabic_10000_tweet_sentiment_arabic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_trimmed_arabic_10000_tweet_sentiment_arabic_pipeline_en.md new file mode 100644 index 00000000000000..2cee1cd1867872 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_trimmed_arabic_10000_tweet_sentiment_arabic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_arabic_10000_tweet_sentiment_arabic_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_arabic_10000_tweet_sentiment_arabic_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_arabic_10000_tweet_sentiment_arabic_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_arabic_10000_tweet_sentiment_arabic_pipeline_en_5.5.0_3.0_1726872087530.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_arabic_10000_tweet_sentiment_arabic_pipeline_en_5.5.0_3.0_1726872087530.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# Sample input; assumes an active Spark session started via sparknlp.start().
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_trimmed_arabic_10000_tweet_sentiment_arabic_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Sample input; assumes an active Spark session.
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_trimmed_arabic_10000_tweet_sentiment_arabic_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_arabic_10000_tweet_sentiment_arabic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|350.3 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-ar-10000-tweet-sentiment-ar + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_trimmed_arabic_30000_xnli_arabic_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_trimmed_arabic_30000_xnli_arabic_en.md new file mode 100644 index 00000000000000..b2001fcbf08db6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_trimmed_arabic_30000_xnli_arabic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_arabic_30000_xnli_arabic XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_arabic_30000_xnli_arabic +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_arabic_30000_xnli_arabic` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_arabic_30000_xnli_arabic_en_5.5.0_3.0_1726864792038.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_arabic_30000_xnli_arabic_en_5.5.0_3.0_1726864792038.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_arabic_30000_xnli_arabic","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_arabic_30000_xnli_arabic", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_arabic_30000_xnli_arabic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|398.4 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-ar-30000-xnli-ar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_trimmed_arabic_30000_xnli_arabic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_trimmed_arabic_30000_xnli_arabic_pipeline_en.md new file mode 100644 index 00000000000000..5d734e79440757 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_trimmed_arabic_30000_xnli_arabic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_arabic_30000_xnli_arabic_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_arabic_30000_xnli_arabic_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_arabic_30000_xnli_arabic_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_arabic_30000_xnli_arabic_pipeline_en_5.5.0_3.0_1726864816167.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_arabic_30000_xnli_arabic_pipeline_en_5.5.0_3.0_1726864816167.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("xlm_roberta_base_trimmed_arabic_30000_xnli_arabic_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_trimmed_arabic_30000_xnli_arabic_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_arabic_30000_xnli_arabic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|398.5 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-ar-30000-xnli-ar + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_trimmed_french_60000_xnli_french_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_trimmed_french_60000_xnli_french_en.md new file mode 100644 index 00000000000000..086054948161b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_trimmed_french_60000_xnli_french_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_french_60000_xnli_french XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_french_60000_xnli_french +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_french_60000_xnli_french` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_french_60000_xnli_french_en_5.5.0_3.0_1726865275589.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_french_60000_xnli_french_en_5.5.0_3.0_1726865275589.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_french_60000_xnli_french","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_french_60000_xnli_french", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_french_60000_xnli_french| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|467.3 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-fr-60000-xnli-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_trimmed_french_60000_xnli_french_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_trimmed_french_60000_xnli_french_pipeline_en.md new file mode 100644 index 00000000000000..074fe30637b5bf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_trimmed_french_60000_xnli_french_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_french_60000_xnli_french_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_french_60000_xnli_french_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_french_60000_xnli_french_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_french_60000_xnli_french_pipeline_en_5.5.0_3.0_1726865308589.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_french_60000_xnli_french_pipeline_en_5.5.0_3.0_1726865308589.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("xlm_roberta_base_trimmed_french_60000_xnli_french_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_trimmed_french_60000_xnli_french_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
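For quick checks on plain strings, `PretrainedPipeline` also exposes `annotate`. A small sketch, assuming the `pipeline` object loaded above and that the classifier stage writes to a `class` column:

```python
# Returns a dict keyed by output column name; "class" is assumed to be the classifier's output.
result = pipeline.annotate("I love spark-nlp")
print(result["class"])
```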
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_french_60000_xnli_french_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|467.4 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-fr-60000-xnli-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_tweet_sentiment_french_trimmed_french_30000_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_tweet_sentiment_french_trimmed_french_30000_en.md new file mode 100644 index 00000000000000..27722696d657c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_tweet_sentiment_french_trimmed_french_30000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_tweet_sentiment_french_trimmed_french_30000 XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_tweet_sentiment_french_trimmed_french_30000 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_tweet_sentiment_french_trimmed_french_30000` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_french_trimmed_french_30000_en_5.5.0_3.0_1726872132269.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_french_trimmed_french_30000_en_5.5.0_3.0_1726872132269.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_tweet_sentiment_french_trimmed_french_30000","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_tweet_sentiment_french_trimmed_french_30000", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_tweet_sentiment_french_trimmed_french_30000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|388.2 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-tweet-sentiment-fr-trimmed-fr-30000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_tweet_sentiment_french_trimmed_french_30000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_tweet_sentiment_french_trimmed_french_30000_pipeline_en.md new file mode 100644 index 00000000000000..71141c6854d2fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_tweet_sentiment_french_trimmed_french_30000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_tweet_sentiment_french_trimmed_french_30000_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_tweet_sentiment_french_trimmed_french_30000_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_tweet_sentiment_french_trimmed_french_30000_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_french_trimmed_french_30000_pipeline_en_5.5.0_3.0_1726872164141.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_french_trimmed_french_30000_pipeline_en_5.5.0_3.0_1726872164141.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("xlm_roberta_base_tweet_sentiment_french_trimmed_french_30000_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_tweet_sentiment_french_trimmed_french_30000_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_tweet_sentiment_french_trimmed_french_30000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|388.2 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-tweet-sentiment-fr-trimmed-fr-30000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_tweet_sentiment_german_trimmed_german_10000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_tweet_sentiment_german_trimmed_german_10000_pipeline_en.md new file mode 100644 index 00000000000000..426f50ecff9623 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_tweet_sentiment_german_trimmed_german_10000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_tweet_sentiment_german_trimmed_german_10000_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_tweet_sentiment_german_trimmed_german_10000_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_tweet_sentiment_german_trimmed_german_10000_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_german_trimmed_german_10000_pipeline_en_5.5.0_3.0_1726800030431.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_german_trimmed_german_10000_pipeline_en_5.5.0_3.0_1726800030431.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("xlm_roberta_base_tweet_sentiment_german_trimmed_german_10000_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_tweet_sentiment_german_trimmed_german_10000_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_tweet_sentiment_german_trimmed_german_10000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|350.1 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-tweet-sentiment-de-trimmed-de-10000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_en.md new file mode 100644 index 00000000000000..109f61e5c9ae86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000 XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_en_5.5.0_3.0_1726865340704.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_en_5.5.0_3.0_1726865340704.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|359.2 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-tweet-sentiment-es-trimmed-es-15000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_pipeline_en.md new file mode 100644 index 00000000000000..f3f6efa0d678d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_pipeline_en_5.5.0_3.0_1726865361736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_pipeline_en_5.5.0_3.0_1726865361736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|359.2 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-tweet-sentiment-es-trimmed-es-15000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_xnli_german_trimmed_german_60000_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_xnli_german_trimmed_german_60000_en.md new file mode 100644 index 00000000000000..5a8230168b40f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_xnli_german_trimmed_german_60000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_xnli_german_trimmed_german_60000 XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_xnli_german_trimmed_german_60000 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_xnli_german_trimmed_german_60000` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_german_trimmed_german_60000_en_5.5.0_3.0_1726799736990.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_german_trimmed_german_60000_en_5.5.0_3.0_1726799736990.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_german_trimmed_german_60000","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_german_trimmed_german_60000", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
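For low-latency inference on small batches, the fitted pipeline can be wrapped in a `LightPipeline`, which avoids the DataFrame round trip. A sketch assuming `pipelineModel` from the example above:

```python
from sparknlp.base import LightPipeline

# Annotate a single string without creating a Spark DataFrame.
light = LightPipeline(pipelineModel)
print(light.annotate("I love spark-nlp")["class"])
```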
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_xnli_german_trimmed_german_60000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|469.4 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-xnli-de-trimmed-de-60000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_xnli_spanish_trimmed_spanish_60000_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_xnli_spanish_trimmed_spanish_60000_en.md new file mode 100644 index 00000000000000..75a8926f9a18e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_xnli_spanish_trimmed_spanish_60000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_xnli_spanish_trimmed_spanish_60000 XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_xnli_spanish_trimmed_spanish_60000 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_xnli_spanish_trimmed_spanish_60000` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_spanish_trimmed_spanish_60000_en_5.5.0_3.0_1726865521273.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_spanish_trimmed_spanish_60000_en_5.5.0_3.0_1726865521273.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_spanish_trimmed_spanish_60000","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_spanish_trimmed_spanish_60000", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_xnli_spanish_trimmed_spanish_60000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|470.1 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-xnli-es-trimmed-es-60000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_xnli_spanish_trimmed_spanish_60000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_xnli_spanish_trimmed_spanish_60000_pipeline_en.md new file mode 100644 index 00000000000000..77109cd8028629 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_xnli_spanish_trimmed_spanish_60000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_xnli_spanish_trimmed_spanish_60000_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_xnli_spanish_trimmed_spanish_60000_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_xnli_spanish_trimmed_spanish_60000_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_spanish_trimmed_spanish_60000_pipeline_en_5.5.0_3.0_1726865556560.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_spanish_trimmed_spanish_60000_pipeline_en_5.5.0_3.0_1726865556560.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("xlm_roberta_base_xnli_spanish_trimmed_spanish_60000_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_xnli_spanish_trimmed_spanish_60000_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_xnli_spanish_trimmed_spanish_60000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|470.2 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-xnli-es-trimmed-es-60000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_emotion_spanish_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_emotion_spanish_en.md new file mode 100644 index 00000000000000..4fe2ca8f59f82a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_emotion_spanish_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_emotion_spanish XlmRoBertaForSequenceClassification from Cesar42 +author: John Snow Labs +name: xlm_roberta_emotion_spanish +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_emotion_spanish` is a English model originally trained by Cesar42. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_emotion_spanish_en_5.5.0_3.0_1726865190099.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_emotion_spanish_en_5.5.0_3.0_1726865190099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_emotion_spanish","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_emotion_spanish", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_emotion_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Cesar42/xlm-roberta-emotion-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_emotion_spanish_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_emotion_spanish_pipeline_en.md new file mode 100644 index 00000000000000..ed1688cacd00af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_emotion_spanish_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_emotion_spanish_pipeline pipeline XlmRoBertaForSequenceClassification from Cesar42 +author: John Snow Labs +name: xlm_roberta_emotion_spanish_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_emotion_spanish_pipeline` is a English model originally trained by Cesar42. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_emotion_spanish_pipeline_en_5.5.0_3.0_1726865240242.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_emotion_spanish_pipeline_en_5.5.0_3.0_1726865240242.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("xlm_roberta_emotion_spanish_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_emotion_spanish_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_emotion_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Cesar42/xlm-roberta-emotion-es + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_finetuned_emojis_1_client_toxic_fedavg_non_iid_fed_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_finetuned_emojis_1_client_toxic_fedavg_non_iid_fed_en.md new file mode 100644 index 00000000000000..3f85917b302719 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_finetuned_emojis_1_client_toxic_fedavg_non_iid_fed_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_finetuned_emojis_1_client_toxic_fedavg_non_iid_fed XlmRoBertaForSequenceClassification from Karim-Gamal +author: John Snow Labs +name: xlm_roberta_finetuned_emojis_1_client_toxic_fedavg_non_iid_fed +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_finetuned_emojis_1_client_toxic_fedavg_non_iid_fed` is a English model originally trained by Karim-Gamal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_1_client_toxic_fedavg_non_iid_fed_en_5.5.0_3.0_1726846363148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_1_client_toxic_fedavg_non_iid_fed_en_5.5.0_3.0_1726846363148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_finetuned_emojis_1_client_toxic_fedavg_non_iid_fed","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_finetuned_emojis_1_client_toxic_fedavg_non_iid_fed", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
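Inference settings on the classifier can be adjusted before fitting. A sketch with illustrative values (these are examples only, not recommended defaults):

```python
# Illustrative values only; omit these setters to keep the model's defaults.
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_finetuned_emojis_1_client_toxic_fedavg_non_iid_fed","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class") \
    .setBatchSize(8) \
    .setCaseSensitive(True) \
    .setMaxSentenceLength(128)
```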
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_finetuned_emojis_1_client_toxic_fedavg_non_iid_fed| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Karim-Gamal/XLM-Roberta-finetuned-emojis-1-client-toxic-FedAvg-non-IID-Fed \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_finetuned_emojis_1_client_toxic_fedavg_non_iid_fed_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_finetuned_emojis_1_client_toxic_fedavg_non_iid_fed_pipeline_en.md new file mode 100644 index 00000000000000..d8457dbc009fc7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_finetuned_emojis_1_client_toxic_fedavg_non_iid_fed_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_finetuned_emojis_1_client_toxic_fedavg_non_iid_fed_pipeline pipeline XlmRoBertaForSequenceClassification from Karim-Gamal +author: John Snow Labs +name: xlm_roberta_finetuned_emojis_1_client_toxic_fedavg_non_iid_fed_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_finetuned_emojis_1_client_toxic_fedavg_non_iid_fed_pipeline` is a English model originally trained by Karim-Gamal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_1_client_toxic_fedavg_non_iid_fed_pipeline_en_5.5.0_3.0_1726846412683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_1_client_toxic_fedavg_non_iid_fed_pipeline_en_5.5.0_3.0_1726846412683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("xlm_roberta_finetuned_emojis_1_client_toxic_fedavg_non_iid_fed_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_finetuned_emojis_1_client_toxic_fedavg_non_iid_fed_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_finetuned_emojis_1_client_toxic_fedavg_non_iid_fed_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Karim-Gamal/XLM-Roberta-finetuned-emojis-1-client-toxic-FedAvg-non-IID-Fed + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_v_base_xnli_french_trimmed_french_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_v_base_xnli_french_trimmed_french_en.md new file mode 100644 index 00000000000000..5d03cf1d59e9f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_v_base_xnli_french_trimmed_french_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_v_base_xnli_french_trimmed_french XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_v_base_xnli_french_trimmed_french +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_v_base_xnli_french_trimmed_french` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_v_base_xnli_french_trimmed_french_en_5.5.0_3.0_1726800591158.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_v_base_xnli_french_trimmed_french_en_5.5.0_3.0_1726800591158.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_v_base_xnli_french_trimmed_french","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_v_base_xnli_french_trimmed_french", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_v_base_xnli_french_trimmed_french| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|778.5 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-v-base-xnli-fr-trimmed-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlmr_english_chinese_train_shuffled_1986_test2000_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlmr_english_chinese_train_shuffled_1986_test2000_en.md new file mode 100644 index 00000000000000..b5ec43e9826beb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlmr_english_chinese_train_shuffled_1986_test2000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_english_chinese_train_shuffled_1986_test2000 XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_english_chinese_train_shuffled_1986_test2000 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_english_chinese_train_shuffled_1986_test2000` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_english_chinese_train_shuffled_1986_test2000_en_5.5.0_3.0_1726865787739.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_english_chinese_train_shuffled_1986_test2000_en_5.5.0_3.0_1726865787739.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_english_chinese_train_shuffled_1986_test2000","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_english_chinese_train_shuffled_1986_test2000", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_english_chinese_train_shuffled_1986_test2000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|825.9 MB| + +## References + +https://huggingface.co/patpizio/xlmr-en-zh-train_shuffled-1986-test2000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlmr_english_chinese_train_shuffled_1986_test2000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlmr_english_chinese_train_shuffled_1986_test2000_pipeline_en.md new file mode 100644 index 00000000000000..4d83cf1fae7add --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlmr_english_chinese_train_shuffled_1986_test2000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_english_chinese_train_shuffled_1986_test2000_pipeline pipeline XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_english_chinese_train_shuffled_1986_test2000_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_english_chinese_train_shuffled_1986_test2000_pipeline` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_english_chinese_train_shuffled_1986_test2000_pipeline_en_5.5.0_3.0_1726865900090.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_english_chinese_train_shuffled_1986_test2000_pipeline_en_5.5.0_3.0_1726865900090.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("xlmr_english_chinese_train_shuffled_1986_test2000_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlmr_english_chinese_train_shuffled_1986_test2000_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
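When per-label scores or other annotation metadata are needed, `fullAnnotate` keeps them alongside the result. A sketch assuming the `pipeline` loaded above, and that the classifier stage writes to a `class` column:

```python
# fullAnnotate returns Annotation objects; metadata typically carries the label scores.
full = pipeline.fullAnnotate("I love spark-nlp")[0]
print(full["class"][0].result, full["class"][0].metadata)
```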
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_english_chinese_train_shuffled_1986_test2000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|825.9 MB| + +## References + +https://huggingface.co/patpizio/xlmr-en-zh-train_shuffled-1986-test2000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlmr_sinhalese_english_all_shuffled_1986_test2000_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlmr_sinhalese_english_all_shuffled_1986_test2000_en.md new file mode 100644 index 00000000000000..bcf6b858759b46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlmr_sinhalese_english_all_shuffled_1986_test2000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_sinhalese_english_all_shuffled_1986_test2000 XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_sinhalese_english_all_shuffled_1986_test2000 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_sinhalese_english_all_shuffled_1986_test2000` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_sinhalese_english_all_shuffled_1986_test2000_en_5.5.0_3.0_1726864925595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_sinhalese_english_all_shuffled_1986_test2000_en_5.5.0_3.0_1726864925595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_sinhalese_english_all_shuffled_1986_test2000","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_sinhalese_english_all_shuffled_1986_test2000", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
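
Once `pipelineDF` has been computed as above, the predicted label and the per-label confidence scores can be pulled out of the `class` annotation column. A minimal sketch, assuming Spark NLP's standard annotation layout (`result` holds the label, `metadata` the score map):

```python
pipelineDF.selectExpr(
    "text",
    "class.result[0] AS predicted_label",
    "class.metadata[0] AS confidence_scores"
).show(truncate=False)
```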
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_sinhalese_english_all_shuffled_1986_test2000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|814.5 MB| + +## References + +https://huggingface.co/patpizio/xlmr-si-en-all_shuffled-1986-test2000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlmr_sinhalese_english_all_shuffled_1986_test2000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlmr_sinhalese_english_all_shuffled_1986_test2000_pipeline_en.md new file mode 100644 index 00000000000000..e73f80a67361ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlmr_sinhalese_english_all_shuffled_1986_test2000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_sinhalese_english_all_shuffled_1986_test2000_pipeline pipeline XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_sinhalese_english_all_shuffled_1986_test2000_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_sinhalese_english_all_shuffled_1986_test2000_pipeline` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_sinhalese_english_all_shuffled_1986_test2000_pipeline_en_5.5.0_3.0_1726865046378.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_sinhalese_english_all_shuffled_1986_test2000_pipeline_en_5.5.0_3.0_1726865046378.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlmr_sinhalese_english_all_shuffled_1986_test2000_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlmr_sinhalese_english_all_shuffled_1986_test2000_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
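
The same pretrained pipeline also scales to larger batches. A small sketch, assuming the `pipeline` object from the snippet above and an active `spark` session:

```python
texts = [["I love spark-nlp"], ["A second example sentence"], ["And a third one"]]
batch_df = spark.createDataFrame(texts).toDF("text")

# Count how often each label is predicted across the batch.
predictions = pipeline.transform(batch_df)
predictions.selectExpr("explode(class.result) AS label") \
    .groupBy("label").count() \
    .show()
```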
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_sinhalese_english_all_shuffled_1986_test2000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.5 MB| + +## References + +https://huggingface.co/patpizio/xlmr-si-en-all_shuffled-1986-test2000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_english_pipeline_en.md new file mode 100644 index 00000000000000..00745fbf79e8ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xnli_xlm_r_only_english_pipeline pipeline XlmRoBertaForSequenceClassification from semindan +author: John Snow Labs +name: xnli_xlm_r_only_english_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xnli_xlm_r_only_english_pipeline` is a English model originally trained by semindan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_english_pipeline_en_5.5.0_3.0_1726800691101.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_english_pipeline_en_5.5.0_3.0_1726800691101.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xnli_xlm_r_only_english_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xnli_xlm_r_only_english_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xnli_xlm_r_only_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|811.9 MB| + +## References + +https://huggingface.co/semindan/xnli_xlm_r_only_en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_french_en.md b/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_french_en.md new file mode 100644 index 00000000000000..5357d6aea05870 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_french_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xnli_xlm_r_only_french XlmRoBertaForSequenceClassification from semindan +author: John Snow Labs +name: xnli_xlm_r_only_french +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xnli_xlm_r_only_french` is a English model originally trained by semindan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_french_en_5.5.0_3.0_1726872639105.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_french_en_5.5.0_3.0_1726872639105.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xnli_xlm_r_only_french","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xnli_xlm_r_only_french", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
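
Because the fitted pipeline is a regular Spark ML `PipelineModel`, it can be written to disk once and reloaded later without downloading the model again. A sketch, assuming the `pipelineModel` from the snippet above and an illustrative path:

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline (the path is only an example).
pipelineModel.write().overwrite().save("/tmp/xnli_xlm_r_only_french_pipeline_model")

# Reload it later and reuse it for inference.
restored = PipelineModel.load("/tmp/xnli_xlm_r_only_french_pipeline_model")
restored.transform(data).select("class.result").show(truncate=False)
```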
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xnli_xlm_r_only_french| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|810.8 MB| + +## References + +https://huggingface.co/semindan/xnli_xlm_r_only_fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_french_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_french_pipeline_en.md new file mode 100644 index 00000000000000..afd53dbdffb43b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_french_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xnli_xlm_r_only_french_pipeline pipeline XlmRoBertaForSequenceClassification from semindan +author: John Snow Labs +name: xnli_xlm_r_only_french_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xnli_xlm_r_only_french_pipeline` is a English model originally trained by semindan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_french_pipeline_en_5.5.0_3.0_1726872765998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_french_pipeline_en_5.5.0_3.0_1726872765998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xnli_xlm_r_only_french_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xnli_xlm_r_only_french_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xnli_xlm_r_only_french_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|810.8 MB| + +## References + +https://huggingface.co/semindan/xnli_xlm_r_only_fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_hindi_en.md b/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_hindi_en.md new file mode 100644 index 00000000000000..d241590f3cca62 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_hindi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xnli_xlm_r_only_hindi XlmRoBertaForSequenceClassification from semindan +author: John Snow Labs +name: xnli_xlm_r_only_hindi +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xnli_xlm_r_only_hindi` is a English model originally trained by semindan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_hindi_en_5.5.0_3.0_1726845592925.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_hindi_en_5.5.0_3.0_1726845592925.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xnli_xlm_r_only_hindi","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xnli_xlm_r_only_hindi", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
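
To see exactly what the classifier returns, it can help to print the schema of the output column: each entry is a Spark NLP annotation struct. A minimal sketch using the `pipelineDF` from the snippet above:

```python
# The `class` column is an array of annotation structs with fields such as
# annotatorType, begin, end, result and metadata.
pipelineDF.select("class").printSchema()
pipelineDF.select("class.result").show(truncate=False)
```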
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xnli_xlm_r_only_hindi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|803.2 MB| + +## References + +https://huggingface.co/semindan/xnli_xlm_r_only_hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_hindi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_hindi_pipeline_en.md new file mode 100644 index 00000000000000..416cd69a257383 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_hindi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xnli_xlm_r_only_hindi_pipeline pipeline XlmRoBertaForSequenceClassification from semindan +author: John Snow Labs +name: xnli_xlm_r_only_hindi_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xnli_xlm_r_only_hindi_pipeline` is a English model originally trained by semindan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_hindi_pipeline_en_5.5.0_3.0_1726845724886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_hindi_pipeline_en_5.5.0_3.0_1726845724886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xnli_xlm_r_only_hindi_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xnli_xlm_r_only_hindi_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xnli_xlm_r_only_hindi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|803.2 MB| + +## References + +https://huggingface.co/semindan/xnli_xlm_r_only_hi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_spanish_en.md b/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_spanish_en.md new file mode 100644 index 00000000000000..5a8a76b4258b74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_spanish_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xnli_xlm_r_only_spanish XlmRoBertaForSequenceClassification from semindan +author: John Snow Labs +name: xnli_xlm_r_only_spanish +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xnli_xlm_r_only_spanish` is a English model originally trained by semindan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_spanish_en_5.5.0_3.0_1726865235552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_spanish_en_5.5.0_3.0_1726865235552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xnli_xlm_r_only_spanish","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xnli_xlm_r_only_spanish", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xnli_xlm_r_only_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|814.7 MB| + +## References + +https://huggingface.co/semindan/xnli_xlm_r_only_es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_spanish_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_spanish_pipeline_en.md new file mode 100644 index 00000000000000..39ceb62ce048de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_spanish_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xnli_xlm_r_only_spanish_pipeline pipeline XlmRoBertaForSequenceClassification from semindan +author: John Snow Labs +name: xnli_xlm_r_only_spanish_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xnli_xlm_r_only_spanish_pipeline` is a English model originally trained by semindan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_spanish_pipeline_en_5.5.0_3.0_1726865361232.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_spanish_pipeline_en_5.5.0_3.0_1726865361232.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xnli_xlm_r_only_spanish_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xnli_xlm_r_only_spanish_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xnli_xlm_r_only_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.7 MB| + +## References + +https://huggingface.co/semindan/xnli_xlm_r_only_es + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-your_model_checkpoint_finetuned_your_task_en.md b/docs/_posts/ahmedlone127/2024-09-20-your_model_checkpoint_finetuned_your_task_en.md new file mode 100644 index 00000000000000..c9c5e34fba64e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-your_model_checkpoint_finetuned_your_task_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English your_model_checkpoint_finetuned_your_task DistilBertForSequenceClassification from hanzla107 +author: John Snow Labs +name: your_model_checkpoint_finetuned_your_task +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`your_model_checkpoint_finetuned_your_task` is a English model originally trained by hanzla107. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/your_model_checkpoint_finetuned_your_task_en_5.5.0_3.0_1726842011995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/your_model_checkpoint_finetuned_your_task_en_5.5.0_3.0_1726842011995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("your_model_checkpoint_finetuned_your_task","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("your_model_checkpoint_finetuned_your_task", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
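
For low-latency, single-document inference (for example behind an API), the fitted model can be wrapped in a `LightPipeline`, which runs on the driver instead of launching a distributed job. A sketch, assuming the `pipelineModel` from the snippet above:

```python
from sparknlp.base import LightPipeline

light_model = LightPipeline(pipelineModel)

# Returns a plain dictionary keyed by the output columns of the pipeline.
print(light_model.annotate("I love spark-nlp"))
```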
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|your_model_checkpoint_finetuned_your_task| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hanzla107/your_model_checkpoint-finetuned-your_task \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-your_model_checkpoint_finetuned_your_task_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-your_model_checkpoint_finetuned_your_task_pipeline_en.md new file mode 100644 index 00000000000000..344f83c1a2d06b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-your_model_checkpoint_finetuned_your_task_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English your_model_checkpoint_finetuned_your_task_pipeline pipeline DistilBertForSequenceClassification from hanzla107 +author: John Snow Labs +name: your_model_checkpoint_finetuned_your_task_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`your_model_checkpoint_finetuned_your_task_pipeline` is a English model originally trained by hanzla107. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/your_model_checkpoint_finetuned_your_task_pipeline_en_5.5.0_3.0_1726842023984.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/your_model_checkpoint_finetuned_your_task_pipeline_en_5.5.0_3.0_1726842023984.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("your_model_checkpoint_finetuned_your_task_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("your_model_checkpoint_finetuned_your_task_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|your_model_checkpoint_finetuned_your_task_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hanzla107/your_model_checkpoint-finetuned-your_task + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-your_repo_name_iwaves_en.md b/docs/_posts/ahmedlone127/2024-09-20-your_repo_name_iwaves_en.md new file mode 100644 index 00000000000000..4a6e152c357865 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-your_repo_name_iwaves_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English your_repo_name_iwaves DistilBertForSequenceClassification from Iwaves +author: John Snow Labs +name: your_repo_name_iwaves +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`your_repo_name_iwaves` is a English model originally trained by Iwaves. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/your_repo_name_iwaves_en_5.5.0_3.0_1726832695360.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/your_repo_name_iwaves_en_5.5.0_3.0_1726832695360.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("your_repo_name_iwaves","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("your_repo_name_iwaves", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|your_repo_name_iwaves| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Iwaves/your-repo-name \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-your_repo_name_iwaves_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-your_repo_name_iwaves_pipeline_en.md new file mode 100644 index 00000000000000..84d15f7d82612c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-your_repo_name_iwaves_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English your_repo_name_iwaves_pipeline pipeline DistilBertForSequenceClassification from Iwaves +author: John Snow Labs +name: your_repo_name_iwaves_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`your_repo_name_iwaves_pipeline` is a English model originally trained by Iwaves. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/your_repo_name_iwaves_pipeline_en_5.5.0_3.0_1726832707735.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/your_repo_name_iwaves_pipeline_en_5.5.0_3.0_1726832707735.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("your_repo_name_iwaves_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("your_repo_name_iwaves_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|your_repo_name_iwaves_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Iwaves/your-repo-name + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-0730bert_en.md b/docs/_posts/ahmedlone127/2024-09-21-0730bert_en.md new file mode 100644 index 00000000000000..507205b50d3148 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-0730bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 0730bert BertForSequenceClassification from ryan0218 +author: John Snow Labs +name: 0730bert +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`0730bert` is a English model originally trained by ryan0218. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/0730bert_en_5.5.0_3.0_1726955822083.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/0730bert_en_5.5.0_3.0_1726955822083.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("0730bert","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("0730bert", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|0730bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ryan0218/0730Bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-0730bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-0730bert_pipeline_en.md new file mode 100644 index 00000000000000..e0780be401f03e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-0730bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 0730bert_pipeline pipeline BertForSequenceClassification from ryan0218 +author: John Snow Labs +name: 0730bert_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`0730bert_pipeline` is a English model originally trained by ryan0218. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/0730bert_pipeline_en_5.5.0_3.0_1726955844718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/0730bert_pipeline_en_5.5.0_3.0_1726955844718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("0730bert_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("0730bert_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|0730bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ryan0218/0730Bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-2020_q1_50p_filtered_en.md b/docs/_posts/ahmedlone127/2024-09-21-2020_q1_50p_filtered_en.md new file mode 100644 index 00000000000000..b05759467987e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-2020_q1_50p_filtered_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2020_q1_50p_filtered RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q1_50p_filtered +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q1_50p_filtered` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q1_50p_filtered_en_5.5.0_3.0_1726934376925.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q1_50p_filtered_en_5.5.0_3.0_1726934376925.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("2020_q1_50p_filtered","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("2020_q1_50p_filtered","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
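
The `embeddings` column holds one annotation per token. If downstream Spark ML stages need plain vectors, an `EmbeddingsFinisher` can be appended to the pipeline. A minimal sketch extending the Python pipeline above:

```python
from sparknlp.base import EmbeddingsFinisher

# Convert token-level embedding annotations into Spark ML vectors.
embeddingsFinisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, embeddingsFinisher])
pipeline.fit(data).transform(data) \
    .selectExpr("explode(finished_embeddings) AS token_vector") \
    .show(3)
```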
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q1_50p_filtered| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q1-50p-filtered \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-2020_q1_50p_filtered_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-2020_q1_50p_filtered_pipeline_en.md new file mode 100644 index 00000000000000..dab9348d02665c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-2020_q1_50p_filtered_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2020_q1_50p_filtered_pipeline pipeline RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q1_50p_filtered_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q1_50p_filtered_pipeline` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q1_50p_filtered_pipeline_en_5.5.0_3.0_1726934397872.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q1_50p_filtered_pipeline_en_5.5.0_3.0_1726934397872.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("2020_q1_50p_filtered_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("2020_q1_50p_filtered_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q1_50p_filtered_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q1-50p-filtered + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-2020_q1_50p_filtered_random_en.md b/docs/_posts/ahmedlone127/2024-09-21-2020_q1_50p_filtered_random_en.md new file mode 100644 index 00000000000000..591c546b55c9a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-2020_q1_50p_filtered_random_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2020_q1_50p_filtered_random RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q1_50p_filtered_random +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q1_50p_filtered_random` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q1_50p_filtered_random_en_5.5.0_3.0_1726957715846.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q1_50p_filtered_random_en_5.5.0_3.0_1726957715846.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("2020_q1_50p_filtered_random","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("2020_q1_50p_filtered_random","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
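
Token vectors can also be pooled into a single vector per document with the `SentenceEmbeddings` annotator, which is often what similarity or classification tasks need. A sketch extending the Python pipeline above, assuming average pooling:

```python
# Pool the RoBERTa token embeddings into one sentence-level vector per document.
sentenceEmbeddings = SentenceEmbeddings() \
    .setInputCols(["document", "embeddings"]) \
    .setOutputCol("sentence_embeddings") \
    .setPoolingStrategy("AVERAGE")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, sentenceEmbeddings])
pipeline.fit(data).transform(data).select("sentence_embeddings.embeddings").show(1)
```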
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q1_50p_filtered_random| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q1-50p-filtered-random \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-2020_q1_50p_filtered_random_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-2020_q1_50p_filtered_random_pipeline_en.md new file mode 100644 index 00000000000000..0a4a5188206173 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-2020_q1_50p_filtered_random_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2020_q1_50p_filtered_random_pipeline pipeline RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q1_50p_filtered_random_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q1_50p_filtered_random_pipeline` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q1_50p_filtered_random_pipeline_en_5.5.0_3.0_1726957738187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q1_50p_filtered_random_pipeline_en_5.5.0_3.0_1726957738187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("2020_q1_50p_filtered_random_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("2020_q1_50p_filtered_random_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q1_50p_filtered_random_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q1-50p-filtered-random + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-2020_q1_75p_filtered_random_en.md b/docs/_posts/ahmedlone127/2024-09-21-2020_q1_75p_filtered_random_en.md new file mode 100644 index 00000000000000..09f336254f305d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-2020_q1_75p_filtered_random_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2020_q1_75p_filtered_random RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q1_75p_filtered_random +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q1_75p_filtered_random` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q1_75p_filtered_random_en_5.5.0_3.0_1726958120107.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q1_75p_filtered_random_en_5.5.0_3.0_1726958120107.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("2020_q1_75p_filtered_random","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("2020_q1_75p_filtered_random","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q1_75p_filtered_random| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q1-75p-filtered-random \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-2020_q1_75p_filtered_random_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-2020_q1_75p_filtered_random_pipeline_en.md new file mode 100644 index 00000000000000..9963e5580591d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-2020_q1_75p_filtered_random_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2020_q1_75p_filtered_random_pipeline pipeline RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q1_75p_filtered_random_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q1_75p_filtered_random_pipeline` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q1_75p_filtered_random_pipeline_en_5.5.0_3.0_1726958141978.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q1_75p_filtered_random_pipeline_en_5.5.0_3.0_1726958141978.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("2020_q1_75p_filtered_random_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("2020_q1_75p_filtered_random_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q1_75p_filtered_random_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q1-75p-filtered-random + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-2020_q3_50p_filtered_random_prog_from_q2_en.md b/docs/_posts/ahmedlone127/2024-09-21-2020_q3_50p_filtered_random_prog_from_q2_en.md new file mode 100644 index 00000000000000..4a859d937e95fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-2020_q3_50p_filtered_random_prog_from_q2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2020_q3_50p_filtered_random_prog_from_q2 RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q3_50p_filtered_random_prog_from_q2 +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q3_50p_filtered_random_prog_from_q2` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q3_50p_filtered_random_prog_from_q2_en_5.5.0_3.0_1726882061910.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q3_50p_filtered_random_prog_from_q2_en_5.5.0_3.0_1726882061910.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("2020_q3_50p_filtered_random_prog_from_q2","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("2020_q3_50p_filtered_random_prog_from_q2","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
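Assuming the `pipelineDF` produced above, the token-level vectors can be inspected from the `embeddings` output column; the snippet below is a minimal sketch of that post-processing step:

```python
from pyspark.sql import functions as F

# One row per token: the token text and its embedding vector
pipelineDF.select(F.explode("embeddings").alias("ann")) \
    .select(F.col("ann.result").alias("token"), F.col("ann.embeddings").alias("vector")) \
    .show(truncate = 60)
```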
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q3_50p_filtered_random_prog_from_q2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q3-50p-filtered-random-prog_from_Q2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-2020_q3_50p_filtered_random_prog_from_q2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-2020_q3_50p_filtered_random_prog_from_q2_pipeline_en.md new file mode 100644 index 00000000000000..0ffa6112336f79 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-2020_q3_50p_filtered_random_prog_from_q2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2020_q3_50p_filtered_random_prog_from_q2_pipeline pipeline RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q3_50p_filtered_random_prog_from_q2_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q3_50p_filtered_random_prog_from_q2_pipeline` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q3_50p_filtered_random_prog_from_q2_pipeline_en_5.5.0_3.0_1726882083858.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q3_50p_filtered_random_prog_from_q2_pipeline_en_5.5.0_3.0_1726882083858.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("2020_q3_50p_filtered_random_prog_from_q2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("2020_q3_50p_filtered_random_prog_from_q2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q3_50p_filtered_random_prog_from_q2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q3-50p-filtered-random-prog_from_Q2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-2020_q4_50p_filtered_random_en.md b/docs/_posts/ahmedlone127/2024-09-21-2020_q4_50p_filtered_random_en.md new file mode 100644 index 00000000000000..b0a918ce372a50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-2020_q4_50p_filtered_random_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2020_q4_50p_filtered_random RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q4_50p_filtered_random +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q4_50p_filtered_random` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q4_50p_filtered_random_en_5.5.0_3.0_1726957640736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q4_50p_filtered_random_en_5.5.0_3.0_1726957640736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("2020_q4_50p_filtered_random","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("2020_q4_50p_filtered_random","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q4_50p_filtered_random| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q4-50p-filtered-random \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-2020_q4_50p_filtered_random_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-2020_q4_50p_filtered_random_pipeline_en.md new file mode 100644 index 00000000000000..8c7b428422cef7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-2020_q4_50p_filtered_random_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2020_q4_50p_filtered_random_pipeline pipeline RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q4_50p_filtered_random_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q4_50p_filtered_random_pipeline` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q4_50p_filtered_random_pipeline_en_5.5.0_3.0_1726957662686.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q4_50p_filtered_random_pipeline_en_5.5.0_3.0_1726957662686.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("2020_q4_50p_filtered_random_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("2020_q4_50p_filtered_random_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q4_50p_filtered_random_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q4-50p-filtered-random + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-abcnew_en.md b/docs/_posts/ahmedlone127/2024-09-21-abcnew_en.md new file mode 100644 index 00000000000000..4c8990c071d9a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-abcnew_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English abcnew RoBertaEmbeddings from Alemat +author: John Snow Labs +name: abcnew +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`abcnew` is a English model originally trained by Alemat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/abcnew_en_5.5.0_3.0_1726958031473.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/abcnew_en_5.5.0_3.0_1726958031473.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("abcnew","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("abcnew","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|abcnew| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|464.9 MB| + +## References + +https://huggingface.co/Alemat/abcnew \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-abcnew_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-abcnew_pipeline_en.md new file mode 100644 index 00000000000000..3f9f984bbc1f42 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-abcnew_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English abcnew_pipeline pipeline RoBertaEmbeddings from Alemat +author: John Snow Labs +name: abcnew_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`abcnew_pipeline` is a English model originally trained by Alemat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/abcnew_pipeline_en_5.5.0_3.0_1726958056299.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/abcnew_pipeline_en_5.5.0_3.0_1726958056299.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("abcnew_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("abcnew_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|abcnew_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.0 MB| + +## References + +https://huggingface.co/Alemat/abcnew + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-acrossapps_ndd_pagekit_test_content_en.md b/docs/_posts/ahmedlone127/2024-09-21-acrossapps_ndd_pagekit_test_content_en.md new file mode 100644 index 00000000000000..8758bf9d720bfa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-acrossapps_ndd_pagekit_test_content_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English acrossapps_ndd_pagekit_test_content DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: acrossapps_ndd_pagekit_test_content +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`acrossapps_ndd_pagekit_test_content` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/acrossapps_ndd_pagekit_test_content_en_5.5.0_3.0_1726952959133.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/acrossapps_ndd_pagekit_test_content_en_5.5.0_3.0_1726952959133.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("acrossapps_ndd_pagekit_test_content","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("acrossapps_ndd_pagekit_test_content", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
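Once the pipeline has run, the predicted label for each row can be read from the `class` output column; the following line is a small usage sketch based on the example above:

```python
# Show each input text next to its predicted class label
pipelineDF.select("text", "class.result").show(truncate = False)
```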
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|acrossapps_ndd_pagekit_test_content| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/ACROSSAPPS_NDD-pagekit_test-content \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-acrossapps_ndd_pagekit_test_content_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-acrossapps_ndd_pagekit_test_content_pipeline_en.md new file mode 100644 index 00000000000000..244901ee7e51d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-acrossapps_ndd_pagekit_test_content_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English acrossapps_ndd_pagekit_test_content_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: acrossapps_ndd_pagekit_test_content_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`acrossapps_ndd_pagekit_test_content_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/acrossapps_ndd_pagekit_test_content_pipeline_en_5.5.0_3.0_1726952971052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/acrossapps_ndd_pagekit_test_content_pipeline_en_5.5.0_3.0_1726952971052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("acrossapps_ndd_pagekit_test_content_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("acrossapps_ndd_pagekit_test_content_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|acrossapps_ndd_pagekit_test_content_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/ACROSSAPPS_NDD-pagekit_test-content + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-afriberta_base_finetuned_igbo_2e_4_en.md b/docs/_posts/ahmedlone127/2024-09-21-afriberta_base_finetuned_igbo_2e_4_en.md new file mode 100644 index 00000000000000..40c4a4226ae4e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-afriberta_base_finetuned_igbo_2e_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English afriberta_base_finetuned_igbo_2e_4 XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afriberta_base_finetuned_igbo_2e_4 +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`afriberta_base_finetuned_igbo_2e_4` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_base_finetuned_igbo_2e_4_en_5.5.0_3.0_1726883669364.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_base_finetuned_igbo_2e_4_en_5.5.0_3.0_1726883669364.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_base_finetuned_igbo_2e_4","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_base_finetuned_igbo_2e_4", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
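With the `pipelineDF` from the example above, the tokens and their predicted tags sit in the `token` and `ner` output columns as parallel arrays; a quick way to inspect them:

```python
# Parallel arrays of tokens and predicted tags (the label set depends on the fine-tuning data)
pipelineDF.select("token.result", "ner.result").show(truncate = False)
```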
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afriberta_base_finetuned_igbo_2e_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|415.2 MB| + +## References + +https://huggingface.co/grace-pro/afriberta-base-finetuned-igbo-2e-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-afriberta_base_finetuned_igbo_2e_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-afriberta_base_finetuned_igbo_2e_4_pipeline_en.md new file mode 100644 index 00000000000000..27233638abbb92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-afriberta_base_finetuned_igbo_2e_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English afriberta_base_finetuned_igbo_2e_4_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afriberta_base_finetuned_igbo_2e_4_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`afriberta_base_finetuned_igbo_2e_4_pipeline` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_base_finetuned_igbo_2e_4_pipeline_en_5.5.0_3.0_1726883689564.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_base_finetuned_igbo_2e_4_pipeline_en_5.5.0_3.0_1726883689564.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("afriberta_base_finetuned_igbo_2e_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("afriberta_base_finetuned_igbo_2e_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afriberta_base_finetuned_igbo_2e_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|415.3 MB| + +## References + +https://huggingface.co/grace-pro/afriberta-base-finetuned-igbo-2e-4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-afrispeech_whisper_tiny_en.md b/docs/_posts/ahmedlone127/2024-09-21-afrispeech_whisper_tiny_en.md new file mode 100644 index 00000000000000..1ac5489b46721b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-afrispeech_whisper_tiny_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English afrispeech_whisper_tiny WhisperForCTC from kanyekuthi +author: John Snow Labs +name: afrispeech_whisper_tiny +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`afrispeech_whisper_tiny` is a English model originally trained by kanyekuthi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afrispeech_whisper_tiny_en_5.5.0_3.0_1726904718188.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afrispeech_whisper_tiny_en_5.5.0_3.0_1726904718188.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("afrispeech_whisper_tiny","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# `data` is assumed to be a DataFrame with an "audio_content" column of audio samples (float values)
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("afrispeech_whisper_tiny", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// `data` is assumed to be a DataFrame with an "audio_content" column of audio samples (float values)
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
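The `data` DataFrame referenced above is not built in the snippet. One possible way to construct it from a local WAV file is sketched below; `librosa` and the file path are assumptions and not part of Spark NLP:

```python
import librosa  # assumed to be installed; any decoder that yields float samples works

audio_path = "sample_audio.wav"  # hypothetical local file
samples, _ = librosa.load(audio_path, sr=16000)  # Whisper expects 16 kHz mono audio

# Single-row DataFrame with the "audio_content" column consumed by AudioAssembler
data = spark.createDataFrame([[samples.tolist()]]).toDF("audio_content")
```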
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afrispeech_whisper_tiny| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|391.3 MB| + +## References + +https://huggingface.co/kanyekuthi/AfriSpeech-whisper-tiny \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-afrispeech_whisper_tiny_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-afrispeech_whisper_tiny_pipeline_en.md new file mode 100644 index 00000000000000..8b2bb711f41277 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-afrispeech_whisper_tiny_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English afrispeech_whisper_tiny_pipeline pipeline WhisperForCTC from kanyekuthi +author: John Snow Labs +name: afrispeech_whisper_tiny_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`afrispeech_whisper_tiny_pipeline` is a English model originally trained by kanyekuthi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afrispeech_whisper_tiny_pipeline_en_5.5.0_3.0_1726904737347.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afrispeech_whisper_tiny_pipeline_en_5.5.0_3.0_1726904737347.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("afrispeech_whisper_tiny_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("afrispeech_whisper_tiny_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afrispeech_whisper_tiny_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|391.3 MB| + +## References + +https://huggingface.co/kanyekuthi/AfriSpeech-whisper-tiny + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-albert_model__29_2_en.md b/docs/_posts/ahmedlone127/2024-09-21-albert_model__29_2_en.md new file mode 100644 index 00000000000000..3efc224244c462 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-albert_model__29_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English albert_model__29_2 DistilBertForSequenceClassification from KalaiselvanD +author: John Snow Labs +name: albert_model__29_2 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_model__29_2` is a English model originally trained by KalaiselvanD. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_model__29_2_en_5.5.0_3.0_1726953706792.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_model__29_2_en_5.5.0_3.0_1726953706792.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("albert_model__29_2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("albert_model__29_2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_model__29_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KalaiselvanD/albert_model__29_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-albert_model__29_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-albert_model__29_2_pipeline_en.md new file mode 100644 index 00000000000000..4f0dc48cf54c3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-albert_model__29_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English albert_model__29_2_pipeline pipeline DistilBertForSequenceClassification from KalaiselvanD +author: John Snow Labs +name: albert_model__29_2_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_model__29_2_pipeline` is a English model originally trained by KalaiselvanD. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_model__29_2_pipeline_en_5.5.0_3.0_1726953718477.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_model__29_2_pipeline_en_5.5.0_3.0_1726953718477.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("albert_model__29_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("albert_model__29_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_model__29_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KalaiselvanD/albert_model__29_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-albert_qa_chinese_en.md b/docs/_posts/ahmedlone127/2024-09-21-albert_qa_chinese_en.md new file mode 100644 index 00000000000000..e95e3ebafd10bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-albert_qa_chinese_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English albert_qa_chinese BertForQuestionAnswering from FooJiaYin +author: John Snow Labs +name: albert_qa_chinese +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_qa_chinese` is a English model originally trained by FooJiaYin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_qa_chinese_en_5.5.0_3.0_1726928680678.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_qa_chinese_en_5.5.0_3.0_1726928680678.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("albert_qa_chinese","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("albert_qa_chinese", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
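After the pipeline runs, the predicted answer span for each question/context pair is available in the `answer` output column:

```python
# Extract the predicted answer text for each row
pipelineDF.select("answer.result").show(truncate = False)
```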
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_qa_chinese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|58.3 MB| + +## References + +https://huggingface.co/FooJiaYin/albert-qa-chinese \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-albert_qa_chinese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-albert_qa_chinese_pipeline_en.md new file mode 100644 index 00000000000000..62f4ad8241a663 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-albert_qa_chinese_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English albert_qa_chinese_pipeline pipeline BertForQuestionAnswering from FooJiaYin +author: John Snow Labs +name: albert_qa_chinese_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_qa_chinese_pipeline` is a English model originally trained by FooJiaYin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_qa_chinese_pipeline_en_5.5.0_3.0_1726928683687.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_qa_chinese_pipeline_en_5.5.0_3.0_1726928683687.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("albert_qa_chinese_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("albert_qa_chinese_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_qa_chinese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|58.3 MB| + +## References + +https://huggingface.co/FooJiaYin/albert-qa-chinese + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-aleksis_heb_small_he.md b/docs/_posts/ahmedlone127/2024-09-21-aleksis_heb_small_he.md new file mode 100644 index 00000000000000..1734cf6985fa14 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-aleksis_heb_small_he.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hebrew aleksis_heb_small WhisperForCTC from Alex2575 +author: John Snow Labs +name: aleksis_heb_small +date: 2024-09-21 +tags: [he, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`aleksis_heb_small` is a Hebrew model originally trained by Alex2575. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/aleksis_heb_small_he_5.5.0_3.0_1726947648816.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/aleksis_heb_small_he_5.5.0_3.0_1726947648816.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("aleksis_heb_small","he") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# `data` is assumed to be a DataFrame with an "audio_content" column of audio samples (float values)
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("aleksis_heb_small", "he")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// `data` is assumed to be a DataFrame with an "audio_content" column of audio samples (float values)
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|aleksis_heb_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|he| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Alex2575/aleksis_heb_small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-all_roberta_large_v1_kitchen_and_dining_6_16_5_en.md b/docs/_posts/ahmedlone127/2024-09-21-all_roberta_large_v1_kitchen_and_dining_6_16_5_en.md new file mode 100644 index 00000000000000..068b8277d32448 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-all_roberta_large_v1_kitchen_and_dining_6_16_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_kitchen_and_dining_6_16_5 RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_kitchen_and_dining_6_16_5 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_kitchen_and_dining_6_16_5` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_kitchen_and_dining_6_16_5_en_5.5.0_3.0_1726900132408.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_kitchen_and_dining_6_16_5_en_5.5.0_3.0_1726900132408.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_kitchen_and_dining_6_16_5","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_kitchen_and_dining_6_16_5", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_kitchen_and_dining_6_16_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-kitchen_and_dining-6-16-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-all_roberta_large_v1_kitchen_and_dining_6_16_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-all_roberta_large_v1_kitchen_and_dining_6_16_5_pipeline_en.md new file mode 100644 index 00000000000000..4e40b85166cafa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-all_roberta_large_v1_kitchen_and_dining_6_16_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English all_roberta_large_v1_kitchen_and_dining_6_16_5_pipeline pipeline RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_kitchen_and_dining_6_16_5_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_kitchen_and_dining_6_16_5_pipeline` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_kitchen_and_dining_6_16_5_pipeline_en_5.5.0_3.0_1726900194480.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_kitchen_and_dining_6_16_5_pipeline_en_5.5.0_3.0_1726900194480.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("all_roberta_large_v1_kitchen_and_dining_6_16_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("all_roberta_large_v1_kitchen_and_dining_6_16_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_kitchen_and_dining_6_16_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-kitchen_and_dining-6-16-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-amazon_topical_chat_sentiment_en.md b/docs/_posts/ahmedlone127/2024-09-21-amazon_topical_chat_sentiment_en.md new file mode 100644 index 00000000000000..a4585dc78146c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-amazon_topical_chat_sentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English amazon_topical_chat_sentiment DistilBertForSequenceClassification from reichenbach +author: John Snow Labs +name: amazon_topical_chat_sentiment +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amazon_topical_chat_sentiment` is a English model originally trained by reichenbach. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amazon_topical_chat_sentiment_en_5.5.0_3.0_1726889129674.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amazon_topical_chat_sentiment_en_5.5.0_3.0_1726889129674.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("amazon_topical_chat_sentiment","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("amazon_topical_chat_sentiment", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amazon_topical_chat_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/reichenbach/amazon_topical_chat_sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-amazon_topical_chat_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-amazon_topical_chat_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..b891dcfb0297bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-amazon_topical_chat_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English amazon_topical_chat_sentiment_pipeline pipeline DistilBertForSequenceClassification from reichenbach +author: John Snow Labs +name: amazon_topical_chat_sentiment_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amazon_topical_chat_sentiment_pipeline` is a English model originally trained by reichenbach. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amazon_topical_chat_sentiment_pipeline_en_5.5.0_3.0_1726889140922.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amazon_topical_chat_sentiment_pipeline_en_5.5.0_3.0_1726889140922.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("amazon_topical_chat_sentiment_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("amazon_topical_chat_sentiment_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amazon_topical_chat_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/reichenbach/amazon_topical_chat_sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-angry_en.md b/docs/_posts/ahmedlone127/2024-09-21-angry_en.md new file mode 100644 index 00000000000000..b1deb7dc773640 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-angry_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English angry RoBertaEmbeddings from MatthijsN +author: John Snow Labs +name: angry +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`angry` is a English model originally trained by MatthijsN. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angry_en_5.5.0_3.0_1726942707991.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angry_en_5.5.0_3.0_1726942707991.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("angry","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("angry","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
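+To inspect the output of the example above, the token-level vectors can be pulled out of the `embeddings` column. A minimal sketch, assuming the pipeline was run exactly as shown:
+
+```python
+# Each row of "embeddings" holds one annotation per token; explode to one row per token.
+pipelineDF.selectExpr("explode(embeddings) as emb") \
+    .selectExpr("emb.result as token", "emb.embeddings as vector") \
+    .show(truncate=False)
+```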
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angry| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/MatthijsN/angry \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-angry_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-angry_pipeline_en.md new file mode 100644 index 00000000000000..6662a1613c2316 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-angry_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English angry_pipeline pipeline RoBertaEmbeddings from MatthijsN +author: John Snow Labs +name: angry_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`angry_pipeline` is a English model originally trained by MatthijsN. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angry_pipeline_en_5.5.0_3.0_1726942728859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angry_pipeline_en_5.5.0_3.0_1726942728859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("angry_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("angry_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angry_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/MatthijsN/angry + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-arabicbert_arabic_dialect_identification_ar.md b/docs/_posts/ahmedlone127/2024-09-21-arabicbert_arabic_dialect_identification_ar.md new file mode 100644 index 00000000000000..d214e50197eb85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-arabicbert_arabic_dialect_identification_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic arabicbert_arabic_dialect_identification BertForSequenceClassification from lafifi-24 +author: John Snow Labs +name: arabicbert_arabic_dialect_identification +date: 2024-09-21 +tags: [ar, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`arabicbert_arabic_dialect_identification` is a Arabic model originally trained by lafifi-24. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arabicbert_arabic_dialect_identification_ar_5.5.0_3.0_1726954747998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arabicbert_arabic_dialect_identification_ar_5.5.0_3.0_1726954747998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("arabicbert_arabic_dialect_identification","ar") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("arabicbert_arabic_dialect_identification", "ar")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
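+As a quick check of the example above (assuming the pipeline was fitted as shown), the predicted dialect label for each input row can be read from the `class` column:
+
+```python
+# "class.result" holds the predicted label strings, one array entry per document.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```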
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arabicbert_arabic_dialect_identification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|ar| +|Size:|414.3 MB| + +## References + +https://huggingface.co/lafifi-24/arabicBert_arabic_dialect_identification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-arabicbert_arabic_dialect_identification_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-21-arabicbert_arabic_dialect_identification_pipeline_ar.md new file mode 100644 index 00000000000000..4cb20e5e51c270 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-arabicbert_arabic_dialect_identification_pipeline_ar.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Arabic arabicbert_arabic_dialect_identification_pipeline pipeline BertForSequenceClassification from lafifi-24 +author: John Snow Labs +name: arabicbert_arabic_dialect_identification_pipeline +date: 2024-09-21 +tags: [ar, open_source, pipeline, onnx] +task: Text Classification +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`arabicbert_arabic_dialect_identification_pipeline` is a Arabic model originally trained by lafifi-24. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arabicbert_arabic_dialect_identification_pipeline_ar_5.5.0_3.0_1726954767946.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arabicbert_arabic_dialect_identification_pipeline_ar_5.5.0_3.0_1726954767946.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("arabicbert_arabic_dialect_identification_pipeline", lang = "ar")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("arabicbert_arabic_dialect_identification_pipeline", lang = "ar")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arabicbert_arabic_dialect_identification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|414.3 MB| + +## References + +https://huggingface.co/lafifi-24/arabicBert_arabic_dialect_identification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-ariabert_finetuned_digimag_epoch_6_lr_2e_5_freezed_en.md b/docs/_posts/ahmedlone127/2024-09-21-ariabert_finetuned_digimag_epoch_6_lr_2e_5_freezed_en.md new file mode 100644 index 00000000000000..37106d9a6b5543 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-ariabert_finetuned_digimag_epoch_6_lr_2e_5_freezed_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ariabert_finetuned_digimag_epoch_6_lr_2e_5_freezed RoBertaForSequenceClassification from farzanrahmani +author: John Snow Labs +name: ariabert_finetuned_digimag_epoch_6_lr_2e_5_freezed +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ariabert_finetuned_digimag_epoch_6_lr_2e_5_freezed` is a English model originally trained by farzanrahmani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ariabert_finetuned_digimag_epoch_6_lr_2e_5_freezed_en_5.5.0_3.0_1726900362872.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ariabert_finetuned_digimag_epoch_6_lr_2e_5_freezed_en_5.5.0_3.0_1726900362872.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("ariabert_finetuned_digimag_epoch_6_lr_2e_5_freezed","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("ariabert_finetuned_digimag_epoch_6_lr_2e_5_freezed", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ariabert_finetuned_digimag_epoch_6_lr_2e_5_freezed| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|492.0 MB| + +## References + +https://huggingface.co/farzanrahmani/AriaBERT_finetuned_digimag_Epoch_6_lr_2e_5_freezed \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-ariabert_finetuned_digimag_epoch_6_lr_2e_5_freezed_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-ariabert_finetuned_digimag_epoch_6_lr_2e_5_freezed_pipeline_en.md new file mode 100644 index 00000000000000..806130182eb595 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-ariabert_finetuned_digimag_epoch_6_lr_2e_5_freezed_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ariabert_finetuned_digimag_epoch_6_lr_2e_5_freezed_pipeline pipeline RoBertaForSequenceClassification from farzanrahmani +author: John Snow Labs +name: ariabert_finetuned_digimag_epoch_6_lr_2e_5_freezed_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ariabert_finetuned_digimag_epoch_6_lr_2e_5_freezed_pipeline` is a English model originally trained by farzanrahmani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ariabert_finetuned_digimag_epoch_6_lr_2e_5_freezed_pipeline_en_5.5.0_3.0_1726900385927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ariabert_finetuned_digimag_epoch_6_lr_2e_5_freezed_pipeline_en_5.5.0_3.0_1726900385927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("ariabert_finetuned_digimag_epoch_6_lr_2e_5_freezed_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("ariabert_finetuned_digimag_epoch_6_lr_2e_5_freezed_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ariabert_finetuned_digimag_epoch_6_lr_2e_5_freezed_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|492.0 MB| + +## References + +https://huggingface.co/farzanrahmani/AriaBERT_finetuned_digimag_Epoch_6_lr_2e_5_freezed + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-asr_id.md b/docs/_posts/ahmedlone127/2024-09-21-asr_id.md new file mode 100644 index 00000000000000..0f6462ac4c7e1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-asr_id.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Indonesian asr WhisperForCTC from yusufagung29 +author: John Snow Labs +name: asr +date: 2024-09-21 +tags: [id, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr` is a Indonesian model originally trained by yusufagung29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_id_5.5.0_3.0_1726912255269.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_id_5.5.0_3.0_1726912255269.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr","id") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# "data" must be a DataFrame with the raw audio samples in an "audio_content" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr", "id")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// "data" must be a DataFrame with the raw audio samples in an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
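+The snippet above assumes `data` already holds the recording. A minimal sketch of building such a DataFrame from a 16 kHz WAV file; the file name and the `librosa` dependency are illustrative assumptions, not part of this card:
+
+```python
+import librosa
+
+# Load a mono recording at the 16 kHz rate Whisper models expect ("sample.wav" is a placeholder path).
+audio, _ = librosa.load("sample.wav", sr=16000)
+data = spark.createDataFrame([(audio.tolist(),)], ["audio_content"])
+```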
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|id| +|Size:|1.1 GB| + +## References + +https://huggingface.co/yusufagung29/asr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-asr_iot_en.md b/docs/_posts/ahmedlone127/2024-09-21-asr_iot_en.md new file mode 100644 index 00000000000000..255bb3079b1907 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-asr_iot_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English asr_iot WhisperForCTC from Phong1807 +author: John Snow Labs +name: asr_iot +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_iot` is a English model originally trained by Phong1807. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_iot_en_5.5.0_3.0_1726910973206.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_iot_en_5.5.0_3.0_1726910973206.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("asr_iot","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# "data" must be a DataFrame with the raw audio samples in an "audio_content" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("asr_iot", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// "data" must be a DataFrame with the raw audio samples in an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
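+As in the other Whisper cards, `data` is expected to contain raw audio samples. One way to prepare it, assuming a 16 kHz WAV file and the `librosa` package (neither is part of the original example):
+
+```python
+import librosa
+
+audio, _ = librosa.load("sample.wav", sr=16000)  # placeholder path, mono 16 kHz
+data = spark.createDataFrame([(audio.tolist(),)], ["audio_content"])
+```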
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_iot| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Phong1807/ASR_IoT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-asr_iot_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-asr_iot_pipeline_en.md new file mode 100644 index 00000000000000..af7449b782ae0d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-asr_iot_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English asr_iot_pipeline pipeline WhisperForCTC from Phong1807 +author: John Snow Labs +name: asr_iot_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_iot_pipeline` is a English model originally trained by Phong1807. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_iot_pipeline_en_5.5.0_3.0_1726911065446.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_iot_pipeline_en_5.5.0_3.0_1726911065446.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" must be a DataFrame with the raw audio samples in an "audio_content" column
+pipeline = PretrainedPipeline("asr_iot_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" must be a DataFrame with the raw audio samples in an "audio_content" column
+val pipeline = new PretrainedPipeline("asr_iot_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
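+A hedged sketch of preparing `df` for this pipeline (the path and the `librosa` dependency are assumptions, not part of the original card):
+
+```python
+import librosa
+
+audio, _ = librosa.load("sample.wav", sr=16000)  # placeholder path, mono 16 kHz
+df = spark.createDataFrame([(audio.tolist(),)], ["audio_content"])
+```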
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_iot_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Phong1807/ASR_IoT + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-asr_pipeline_id.md b/docs/_posts/ahmedlone127/2024-09-21-asr_pipeline_id.md new file mode 100644 index 00000000000000..5d14c2ae1584ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-asr_pipeline_id.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Indonesian asr_pipeline pipeline WhisperForCTC from yusufagung29 +author: John Snow Labs +name: asr_pipeline +date: 2024-09-21 +tags: [id, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_pipeline` is a Indonesian model originally trained by yusufagung29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_pipeline_id_5.5.0_3.0_1726912553487.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_pipeline_id_5.5.0_3.0_1726912553487.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" must be a DataFrame with the raw audio samples in an "audio_content" column
+pipeline = PretrainedPipeline("asr_pipeline", lang = "id")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" must be a DataFrame with the raw audio samples in an "audio_content" column
+val pipeline = new PretrainedPipeline("asr_pipeline", lang = "id")
+val annotations = pipeline.transform(df)
+
+```
+
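+As with the model card for `asr`, the input DataFrame needs an `audio_content` column of raw samples. One possible way to build it (the `librosa` dependency and the file name are assumptions):
+
+```python
+import librosa
+
+audio, _ = librosa.load("rekaman.wav", sr=16000)  # hypothetical recording, mono 16 kHz
+df = spark.createDataFrame([(audio.tolist(),)], ["audio_content"])
+```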
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|id| +|Size:|1.1 GB| + +## References + +https://huggingface.co/yusufagung29/asr + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-autotrain_5um8a_sa81u_en.md b/docs/_posts/ahmedlone127/2024-09-21-autotrain_5um8a_sa81u_en.md new file mode 100644 index 00000000000000..56e3d14a36acc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-autotrain_5um8a_sa81u_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English autotrain_5um8a_sa81u DistilBertForSequenceClassification from karthikrathod +author: John Snow Labs +name: autotrain_5um8a_sa81u +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_5um8a_sa81u` is a English model originally trained by karthikrathod. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_5um8a_sa81u_en_5.5.0_3.0_1726888596665.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_5um8a_sa81u_en_5.5.0_3.0_1726888596665.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("autotrain_5um8a_sa81u","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("autotrain_5um8a_sa81u", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
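+If the classifier's confidence is also of interest, the per-label scores are typically available in the annotation metadata. A small sketch, assuming the pipeline above was run as shown:
+
+```python
+from pyspark.sql.functions import explode, col
+
+# Explode the "class" annotations and read the predicted label plus its score map.
+pipelineDF.select(explode("class").alias("pred")) \
+    .select(col("pred.result").alias("label"), col("pred.metadata").alias("scores")) \
+    .show(truncate=False)
+```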
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_5um8a_sa81u| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/karthikrathod/autotrain-5um8a-sa81u \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-autotrain_5um8a_sa81u_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-autotrain_5um8a_sa81u_pipeline_en.md new file mode 100644 index 00000000000000..58c935156033f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-autotrain_5um8a_sa81u_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autotrain_5um8a_sa81u_pipeline pipeline DistilBertForSequenceClassification from karthikrathod +author: John Snow Labs +name: autotrain_5um8a_sa81u_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_5um8a_sa81u_pipeline` is a English model originally trained by karthikrathod. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_5um8a_sa81u_pipeline_en_5.5.0_3.0_1726888613991.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_5um8a_sa81u_pipeline_en_5.5.0_3.0_1726888613991.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("autotrain_5um8a_sa81u_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("autotrain_5um8a_sa81u_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_5um8a_sa81u_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/karthikrathod/autotrain-5um8a-sa81u + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-autotrain_intent_classification_6categories_roberta_89129143858_en.md b/docs/_posts/ahmedlone127/2024-09-21-autotrain_intent_classification_6categories_roberta_89129143858_en.md new file mode 100644 index 00000000000000..d20482afa3c50f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-autotrain_intent_classification_6categories_roberta_89129143858_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English autotrain_intent_classification_6categories_roberta_89129143858 XlmRoBertaForSequenceClassification from yeye776 +author: John Snow Labs +name: autotrain_intent_classification_6categories_roberta_89129143858 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_intent_classification_6categories_roberta_89129143858` is a English model originally trained by yeye776. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_intent_classification_6categories_roberta_89129143858_en_5.5.0_3.0_1726932498765.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_intent_classification_6categories_roberta_89129143858_en_5.5.0_3.0_1726932498765.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("autotrain_intent_classification_6categories_roberta_89129143858","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("autotrain_intent_classification_6categories_roberta_89129143858", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
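+For ad-hoc testing of single sentences, a `LightPipeline` avoids building a DataFrame. A minimal sketch, reusing the fitted `pipelineModel` from the example above:
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+# annotate() returns a dict keyed by output column; "class" holds the predicted intent label(s).
+print(light.annotate("I love spark-nlp")["class"])
+```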
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_intent_classification_6categories_roberta_89129143858| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|770.2 MB| + +## References + +https://huggingface.co/yeye776/autotrain-intent-classification-6categories-roberta-89129143858 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-autotrain_intent_classification_6categories_roberta_89129143858_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-autotrain_intent_classification_6categories_roberta_89129143858_pipeline_en.md new file mode 100644 index 00000000000000..f6a9b29cf1f51a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-autotrain_intent_classification_6categories_roberta_89129143858_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autotrain_intent_classification_6categories_roberta_89129143858_pipeline pipeline XlmRoBertaForSequenceClassification from yeye776 +author: John Snow Labs +name: autotrain_intent_classification_6categories_roberta_89129143858_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_intent_classification_6categories_roberta_89129143858_pipeline` is a English model originally trained by yeye776. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_intent_classification_6categories_roberta_89129143858_pipeline_en_5.5.0_3.0_1726932637016.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_intent_classification_6categories_roberta_89129143858_pipeline_en_5.5.0_3.0_1726932637016.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("autotrain_intent_classification_6categories_roberta_89129143858_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("autotrain_intent_classification_6categories_roberta_89129143858_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_intent_classification_6categories_roberta_89129143858_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|770.2 MB| + +## References + +https://huggingface.co/yeye776/autotrain-intent-classification-6categories-roberta-89129143858 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_en.md b/docs/_posts/ahmedlone127/2024-09-21-autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_en.md new file mode 100644 index 00000000000000..fdd45794a45f46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk BertForSequenceClassification from Saripudin +author: John Snow Labs +name: autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk` is a English model originally trained by Saripudin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_en_5.5.0_3.0_1726956239230.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_en_5.5.0_3.0_1726956239230.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Saripudin/autotrain-model-datasaur-NzFmYmY2NzU-OWY1NjQ2Zjk \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_pipeline_en.md new file mode 100644 index 00000000000000..5c87f448f139d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_pipeline pipeline BertForSequenceClassification from Saripudin +author: John Snow Labs +name: autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_pipeline` is a English model originally trained by Saripudin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_pipeline_en_5.5.0_3.0_1726956258247.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_pipeline_en_5.5.0_3.0_1726956258247.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Saripudin/autotrain-model-datasaur-NzFmYmY2NzU-OWY1NjQ2Zjk + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-autotrain_tes2_en.md b/docs/_posts/ahmedlone127/2024-09-21-autotrain_tes2_en.md new file mode 100644 index 00000000000000..bd2803d0682d17 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-autotrain_tes2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English autotrain_tes2 RoBertaForSequenceClassification from vuk123 +author: John Snow Labs +name: autotrain_tes2 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_tes2` is a English model originally trained by vuk123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_tes2_en_5.5.0_3.0_1726900059409.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_tes2_en_5.5.0_3.0_1726900059409.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("autotrain_tes2","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("autotrain_tes2", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_tes2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|418.8 MB| + +## References + +https://huggingface.co/vuk123/autotrain-tes2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-autotrain_tes2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-autotrain_tes2_pipeline_en.md new file mode 100644 index 00000000000000..9c61a635e10126 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-autotrain_tes2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autotrain_tes2_pipeline pipeline RoBertaForSequenceClassification from vuk123 +author: John Snow Labs +name: autotrain_tes2_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_tes2_pipeline` is a English model originally trained by vuk123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_tes2_pipeline_en_5.5.0_3.0_1726900098520.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_tes2_pipeline_en_5.5.0_3.0_1726900098520.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("autotrain_tes2_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("autotrain_tes2_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_tes2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|418.8 MB| + +## References + +https://huggingface.co/vuk123/autotrain-tes2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-banking77_distilbert_jidhu_mohan_en.md b/docs/_posts/ahmedlone127/2024-09-21-banking77_distilbert_jidhu_mohan_en.md new file mode 100644 index 00000000000000..b890f48eeedaf7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-banking77_distilbert_jidhu_mohan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English banking77_distilbert_jidhu_mohan DistilBertForSequenceClassification from jidhu-mohan +author: John Snow Labs +name: banking77_distilbert_jidhu_mohan +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`banking77_distilbert_jidhu_mohan` is a English model originally trained by jidhu-mohan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/banking77_distilbert_jidhu_mohan_en_5.5.0_3.0_1726924185100.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/banking77_distilbert_jidhu_mohan_en_5.5.0_3.0_1726924185100.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("banking77_distilbert_jidhu_mohan","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("banking77_distilbert_jidhu_mohan", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|banking77_distilbert_jidhu_mohan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/jidhu-mohan/banking77-distilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-banking77_distilbert_jidhu_mohan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-banking77_distilbert_jidhu_mohan_pipeline_en.md new file mode 100644 index 00000000000000..0b9f2326ccaffb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-banking77_distilbert_jidhu_mohan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English banking77_distilbert_jidhu_mohan_pipeline pipeline DistilBertForSequenceClassification from jidhu-mohan +author: John Snow Labs +name: banking77_distilbert_jidhu_mohan_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`banking77_distilbert_jidhu_mohan_pipeline` is a English model originally trained by jidhu-mohan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/banking77_distilbert_jidhu_mohan_pipeline_en_5.5.0_3.0_1726924197494.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/banking77_distilbert_jidhu_mohan_pipeline_en_5.5.0_3.0_1726924197494.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("banking77_distilbert_jidhu_mohan_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("banking77_distilbert_jidhu_mohan_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|banking77_distilbert_jidhu_mohan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/jidhu-mohan/banking77-distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-base_english_combined_v4_2_0_1_16_1e_06_balmy_sweep_40_en.md b/docs/_posts/ahmedlone127/2024-09-21-base_english_combined_v4_2_0_1_16_1e_06_balmy_sweep_40_en.md new file mode 100644 index 00000000000000..54e2579de9e346 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-base_english_combined_v4_2_0_1_16_1e_06_balmy_sweep_40_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English base_english_combined_v4_2_0_1_16_1e_06_balmy_sweep_40 WhisperForCTC from saahith +author: John Snow Labs +name: base_english_combined_v4_2_0_1_16_1e_06_balmy_sweep_40 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_english_combined_v4_2_0_1_16_1e_06_balmy_sweep_40` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_english_combined_v4_2_0_1_16_1e_06_balmy_sweep_40_en_5.5.0_3.0_1726878251687.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_english_combined_v4_2_0_1_16_1e_06_balmy_sweep_40_en_5.5.0_3.0_1726878251687.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("base_english_combined_v4_2_0_1_16_1e_06_balmy_sweep_40","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# "data" must be a DataFrame with the raw audio samples in an "audio_content" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("base_english_combined_v4_2_0_1_16_1e_06_balmy_sweep_40", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// "data" must be a DataFrame with the raw audio samples in an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
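+Here too, `data` must already contain the audio. A sketch of one possible setup, under the assumption of a 16 kHz WAV file and the `librosa` package (neither is part of the original example):
+
+```python
+import librosa
+
+audio, _ = librosa.load("meeting_clip.wav", sr=16000)  # hypothetical file, mono 16 kHz
+data = spark.createDataFrame([(audio.tolist(),)], ["audio_content"])
+```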
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_english_combined_v4_2_0_1_16_1e_06_balmy_sweep_40| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|646.2 MB| + +## References + +https://huggingface.co/saahith/base.en-combined_v4-2-0.1-16-1e-06-balmy-sweep-40 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-base_english_combined_v4_2_0_1_16_1e_06_balmy_sweep_40_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-base_english_combined_v4_2_0_1_16_1e_06_balmy_sweep_40_pipeline_en.md new file mode 100644 index 00000000000000..4d4af16534383f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-base_english_combined_v4_2_0_1_16_1e_06_balmy_sweep_40_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English base_english_combined_v4_2_0_1_16_1e_06_balmy_sweep_40_pipeline pipeline WhisperForCTC from saahith +author: John Snow Labs +name: base_english_combined_v4_2_0_1_16_1e_06_balmy_sweep_40_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_english_combined_v4_2_0_1_16_1e_06_balmy_sweep_40_pipeline` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_english_combined_v4_2_0_1_16_1e_06_balmy_sweep_40_pipeline_en_5.5.0_3.0_1726878283355.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_english_combined_v4_2_0_1_16_1e_06_balmy_sweep_40_pipeline_en_5.5.0_3.0_1726878283355.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("base_english_combined_v4_2_0_1_16_1e_06_balmy_sweep_40_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("base_english_combined_v4_2_0_1_16_1e_06_balmy_sweep_40_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
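+
+A minimal usage sketch for this pretrained pipeline (the audio-loading step and the "text" output column are assumptions carried over from the underlying model card; `df` only needs an "audio_content" column of float samples):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("base_english_combined_v4_2_0_1_16_1e_06_balmy_sweep_40_pipeline", lang = "en")
+df = spark.createDataFrame([[audio_samples]]).toDF("audio_content")  # audio_samples: a list of floats sampled at 16 kHz
+annotations = pipeline.transform(df)
+annotations.select("text.result").show(truncate=False)
+```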
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_english_combined_v4_2_0_1_16_1e_06_balmy_sweep_40_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|646.2 MB| + +## References + +https://huggingface.co/saahith/base.en-combined_v4-2-0.1-16-1e-06-balmy-sweep-40 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-base_english_combined_v4_4_0_8_1e_05_divine_sweep_17_en.md b/docs/_posts/ahmedlone127/2024-09-21-base_english_combined_v4_4_0_8_1e_05_divine_sweep_17_en.md new file mode 100644 index 00000000000000..41ce14259acc91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-base_english_combined_v4_4_0_8_1e_05_divine_sweep_17_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English base_english_combined_v4_4_0_8_1e_05_divine_sweep_17 WhisperForCTC from saahith +author: John Snow Labs +name: base_english_combined_v4_4_0_8_1e_05_divine_sweep_17 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_english_combined_v4_4_0_8_1e_05_divine_sweep_17` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_english_combined_v4_4_0_8_1e_05_divine_sweep_17_en_5.5.0_3.0_1726960713043.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_english_combined_v4_4_0_8_1e_05_divine_sweep_17_en_5.5.0_3.0_1726960713043.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (floats)
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("base_english_combined_v4_4_0_8_1e_05_divine_sweep_17","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (floats)
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("base_english_combined_v4_4_0_8_1e_05_divine_sweep_17", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_english_combined_v4_4_0_8_1e_05_divine_sweep_17| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|646.6 MB| + +## References + +https://huggingface.co/saahith/base.en-combined_v4-4-0-8-1e-05-divine-sweep-17 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-base_english_combined_v4_4_0_8_1e_05_divine_sweep_17_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-base_english_combined_v4_4_0_8_1e_05_divine_sweep_17_pipeline_en.md new file mode 100644 index 00000000000000..fe314a7f2ddf9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-base_english_combined_v4_4_0_8_1e_05_divine_sweep_17_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English base_english_combined_v4_4_0_8_1e_05_divine_sweep_17_pipeline pipeline WhisperForCTC from saahith +author: John Snow Labs +name: base_english_combined_v4_4_0_8_1e_05_divine_sweep_17_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_english_combined_v4_4_0_8_1e_05_divine_sweep_17_pipeline` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_english_combined_v4_4_0_8_1e_05_divine_sweep_17_pipeline_en_5.5.0_3.0_1726960744722.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_english_combined_v4_4_0_8_1e_05_divine_sweep_17_pipeline_en_5.5.0_3.0_1726960744722.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("base_english_combined_v4_4_0_8_1e_05_divine_sweep_17_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("base_english_combined_v4_4_0_8_1e_05_divine_sweep_17_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_english_combined_v4_4_0_8_1e_05_divine_sweep_17_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|646.6 MB| + +## References + +https://huggingface.co/saahith/base.en-combined_v4-4-0-8-1e-05-divine-sweep-17 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-base_english_final_combined_2_0_1_8_1e_05_radiant_sweep_11_en.md b/docs/_posts/ahmedlone127/2024-09-21-base_english_final_combined_2_0_1_8_1e_05_radiant_sweep_11_en.md new file mode 100644 index 00000000000000..0771e81fd739d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-base_english_final_combined_2_0_1_8_1e_05_radiant_sweep_11_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English base_english_final_combined_2_0_1_8_1e_05_radiant_sweep_11 WhisperForCTC from saahith +author: John Snow Labs +name: base_english_final_combined_2_0_1_8_1e_05_radiant_sweep_11 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_english_final_combined_2_0_1_8_1e_05_radiant_sweep_11` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_english_final_combined_2_0_1_8_1e_05_radiant_sweep_11_en_5.5.0_3.0_1726909036448.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_english_final_combined_2_0_1_8_1e_05_radiant_sweep_11_en_5.5.0_3.0_1726909036448.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (floats)
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("base_english_final_combined_2_0_1_8_1e_05_radiant_sweep_11","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (floats)
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("base_english_final_combined_2_0_1_8_1e_05_radiant_sweep_11", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_english_final_combined_2_0_1_8_1e_05_radiant_sweep_11| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|646.4 MB| + +## References + +https://huggingface.co/saahith/base.en-final-combined-2-0.1-8-1e-05-radiant-sweep-11 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-base_english_final_combined_2_0_1_8_1e_05_radiant_sweep_11_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-base_english_final_combined_2_0_1_8_1e_05_radiant_sweep_11_pipeline_en.md new file mode 100644 index 00000000000000..0d448990014109 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-base_english_final_combined_2_0_1_8_1e_05_radiant_sweep_11_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English base_english_final_combined_2_0_1_8_1e_05_radiant_sweep_11_pipeline pipeline WhisperForCTC from saahith +author: John Snow Labs +name: base_english_final_combined_2_0_1_8_1e_05_radiant_sweep_11_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_english_final_combined_2_0_1_8_1e_05_radiant_sweep_11_pipeline` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_english_final_combined_2_0_1_8_1e_05_radiant_sweep_11_pipeline_en_5.5.0_3.0_1726909066495.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_english_final_combined_2_0_1_8_1e_05_radiant_sweep_11_pipeline_en_5.5.0_3.0_1726909066495.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("base_english_final_combined_2_0_1_8_1e_05_radiant_sweep_11_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("base_english_final_combined_2_0_1_8_1e_05_radiant_sweep_11_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_english_final_combined_2_0_1_8_1e_05_radiant_sweep_11_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|646.5 MB| + +## References + +https://huggingface.co/saahith/base.en-final-combined-2-0.1-8-1e-05-radiant-sweep-11 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_en.md b/docs/_posts/ahmedlone127/2024-09-21-base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_en.md new file mode 100644 index 00000000000000..51f15d2bbc0e5c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English base_english_final_combined_2_0_8_1e_05_balmy_sweep_1 WhisperForCTC from saahith +author: John Snow Labs +name: base_english_final_combined_2_0_8_1e_05_balmy_sweep_1 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_english_final_combined_2_0_8_1e_05_balmy_sweep_1` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_en_5.5.0_3.0_1726950024504.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_en_5.5.0_3.0_1726950024504.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (floats)
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("base_english_final_combined_2_0_8_1e_05_balmy_sweep_1","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (floats)
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("base_english_final_combined_2_0_8_1e_05_balmy_sweep_1", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_english_final_combined_2_0_8_1e_05_balmy_sweep_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|646.6 MB| + +## References + +https://huggingface.co/saahith/base.en-final-combined-2-0-8-1e-05-balmy-sweep-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_pipeline_en.md new file mode 100644 index 00000000000000..9d9ee19d13cb85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_pipeline pipeline WhisperForCTC from saahith +author: John Snow Labs +name: base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_pipeline` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_pipeline_en_5.5.0_3.0_1726950055300.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_pipeline_en_5.5.0_3.0_1726950055300.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|646.6 MB| + +## References + +https://huggingface.co/saahith/base.en-final-combined-2-0-8-1e-05-balmy-sweep-1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-base_english_final_combined_2_0_8_1e_05_pretty_sweep_7_en.md b/docs/_posts/ahmedlone127/2024-09-21-base_english_final_combined_2_0_8_1e_05_pretty_sweep_7_en.md new file mode 100644 index 00000000000000..f43e0dc94f9240 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-base_english_final_combined_2_0_8_1e_05_pretty_sweep_7_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English base_english_final_combined_2_0_8_1e_05_pretty_sweep_7 WhisperForCTC from saahith +author: John Snow Labs +name: base_english_final_combined_2_0_8_1e_05_pretty_sweep_7 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_english_final_combined_2_0_8_1e_05_pretty_sweep_7` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_english_final_combined_2_0_8_1e_05_pretty_sweep_7_en_5.5.0_3.0_1726905289447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_english_final_combined_2_0_8_1e_05_pretty_sweep_7_en_5.5.0_3.0_1726905289447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (floats)
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("base_english_final_combined_2_0_8_1e_05_pretty_sweep_7","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (floats)
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("base_english_final_combined_2_0_8_1e_05_pretty_sweep_7", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_english_final_combined_2_0_8_1e_05_pretty_sweep_7| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|646.4 MB| + +## References + +https://huggingface.co/saahith/base.en-final-combined-2-0-8-1e-05-pretty-sweep-7 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-base_english_final_combined_2_0_8_1e_05_pretty_sweep_7_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-base_english_final_combined_2_0_8_1e_05_pretty_sweep_7_pipeline_en.md new file mode 100644 index 00000000000000..961d92de52fe27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-base_english_final_combined_2_0_8_1e_05_pretty_sweep_7_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English base_english_final_combined_2_0_8_1e_05_pretty_sweep_7_pipeline pipeline WhisperForCTC from saahith +author: John Snow Labs +name: base_english_final_combined_2_0_8_1e_05_pretty_sweep_7_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_english_final_combined_2_0_8_1e_05_pretty_sweep_7_pipeline` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_english_final_combined_2_0_8_1e_05_pretty_sweep_7_pipeline_en_5.5.0_3.0_1726905323189.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_english_final_combined_2_0_8_1e_05_pretty_sweep_7_pipeline_en_5.5.0_3.0_1726905323189.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("base_english_final_combined_2_0_8_1e_05_pretty_sweep_7_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("base_english_final_combined_2_0_8_1e_05_pretty_sweep_7_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_english_final_combined_2_0_8_1e_05_pretty_sweep_7_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|646.4 MB| + +## References + +https://huggingface.co/saahith/base.en-final-combined-2-0-8-1e-05-pretty-sweep-7 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bengali_whisper_base_bn.md b/docs/_posts/ahmedlone127/2024-09-21-bengali_whisper_base_bn.md new file mode 100644 index 00000000000000..feb91c9af7f455 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bengali_whisper_base_bn.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Bengali bengali_whisper_base WhisperForCTC from emon-j +author: John Snow Labs +name: bengali_whisper_base +date: 2024-09-21 +tags: [bn, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bengali_whisper_base` is a Bengali model originally trained by emon-j. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bengali_whisper_base_bn_5.5.0_3.0_1726906138544.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bengali_whisper_base_bn_5.5.0_3.0_1726906138544.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (floats)
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("bengali_whisper_base","bn") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (floats)
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("bengali_whisper_base", "bn")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bengali_whisper_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|bn| +|Size:|642.0 MB| + +## References + +https://huggingface.co/emon-j/Bengali-Whisper-Base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bengali_whisper_base_pipeline_bn.md b/docs/_posts/ahmedlone127/2024-09-21-bengali_whisper_base_pipeline_bn.md new file mode 100644 index 00000000000000..315a97dd959e84 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bengali_whisper_base_pipeline_bn.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Bengali bengali_whisper_base_pipeline pipeline WhisperForCTC from emon-j +author: John Snow Labs +name: bengali_whisper_base_pipeline +date: 2024-09-21 +tags: [bn, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bengali_whisper_base_pipeline` is a Bengali model originally trained by emon-j. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bengali_whisper_base_pipeline_bn_5.5.0_3.0_1726906170359.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bengali_whisper_base_pipeline_bn_5.5.0_3.0_1726906170359.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bengali_whisper_base_pipeline", lang = "bn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bengali_whisper_base_pipeline", lang = "bn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bengali_whisper_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bn| +|Size:|642.0 MB| + +## References + +https://huggingface.co/emon-j/Bengali-Whisper-Base + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_210_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_210_en.md new file mode 100644 index 00000000000000..30b263ba9bb290 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_210_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_210 DistilBertForSequenceClassification from DRAGOO +author: John Snow Labs +name: bert_210 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_210` is a English model originally trained by DRAGOO. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_210_en_5.5.0_3.0_1726924033899.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_210_en_5.5.0_3.0_1726924033899.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_210","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_210", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
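+
+Once the pipeline has run, the predicted label can be read back from the "class" annotation column (a short sketch reusing the column names above):
+
+```python
+# Each row pairs the input text with the predicted class label
+pipelineDF.select("text", "class.result").show(truncate=False)
+```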
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_210| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|250.1 MB| + +## References + +https://huggingface.co/DRAGOO/bert_210 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_210_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_210_pipeline_en.md new file mode 100644 index 00000000000000..4de64252ac095d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_210_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_210_pipeline pipeline DistilBertForSequenceClassification from DRAGOO +author: John Snow Labs +name: bert_210_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_210_pipeline` is a English model originally trained by DRAGOO. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_210_pipeline_en_5.5.0_3.0_1726924046839.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_210_pipeline_en_5.5.0_3.0_1726924046839.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_210_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_210_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
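+
+A minimal usage sketch for this pretrained pipeline (the "text" input column and "class" output column are assumptions carried over from the underlying model card):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("bert_210_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+annotations.select("class.result").show(truncate=False)
+```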
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_210_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|250.1 MB| + +## References + +https://huggingface.co/DRAGOO/bert_210 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_dutch_cased_finetuned_ner8_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_dutch_cased_finetuned_ner8_en.md new file mode 100644 index 00000000000000..d9ff2f732f00c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_dutch_cased_finetuned_ner8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_dutch_cased_finetuned_ner8 BertForTokenClassification from Matthijsvanhof +author: John Snow Labs +name: bert_base_dutch_cased_finetuned_ner8 +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_dutch_cased_finetuned_ner8` is a English model originally trained by Matthijsvanhof. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_dutch_cased_finetuned_ner8_en_5.5.0_3.0_1726889649969.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_dutch_cased_finetuned_ner8_en_5.5.0_3.0_1726889649969.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_base_dutch_cased_finetuned_ner8","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_base_dutch_cased_finetuned_ner8", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
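+
+The "ner" column holds per-token IOB tags. A common follow-up step, shown here only as a sketch (NerConverter is not part of this card), groups those tags into entity chunks:
+
+```python
+from sparknlp.annotator import NerConverter
+
+nerConverter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, nerConverter])
+pipeline.fit(data).transform(data).select("ner_chunk.result").show(truncate=False)
+```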
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_dutch_cased_finetuned_ner8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/Matthijsvanhof/bert-base-dutch-cased-finetuned-NER8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_dutch_cased_finetuned_ner8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_dutch_cased_finetuned_ner8_pipeline_en.md new file mode 100644 index 00000000000000..44c8db91f0eee3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_dutch_cased_finetuned_ner8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_dutch_cased_finetuned_ner8_pipeline pipeline BertForTokenClassification from Matthijsvanhof +author: John Snow Labs +name: bert_base_dutch_cased_finetuned_ner8_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_dutch_cased_finetuned_ner8_pipeline` is a English model originally trained by Matthijsvanhof. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_dutch_cased_finetuned_ner8_pipeline_en_5.5.0_3.0_1726889669063.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_dutch_cased_finetuned_ner8_pipeline_en_5.5.0_3.0_1726889669063.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_dutch_cased_finetuned_ner8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_dutch_cased_finetuned_ner8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_dutch_cased_finetuned_ner8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/Matthijsvanhof/bert-base-dutch-cased-finetuned-NER8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_imdb2_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_imdb2_en.md new file mode 100644 index 00000000000000..bda3a382cece53 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_imdb2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_imdb2 BertForSequenceClassification from Vishwas1 +author: John Snow Labs +name: bert_base_imdb2 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_imdb2` is a English model originally trained by Vishwas1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_imdb2_en_5.5.0_3.0_1726925538448.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_imdb2_en_5.5.0_3.0_1726925538448.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_imdb2","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_imdb2", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_imdb2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Vishwas1/bert-base-imdb2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_imdb2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_imdb2_pipeline_en.md new file mode 100644 index 00000000000000..be9f71c6c969c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_imdb2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_imdb2_pipeline pipeline BertForSequenceClassification from Vishwas1 +author: John Snow Labs +name: bert_base_imdb2_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_imdb2_pipeline` is a English model originally trained by Vishwas1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_imdb2_pipeline_en_5.5.0_3.0_1726925556975.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_imdb2_pipeline_en_5.5.0_3.0_1726925556975.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_imdb2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_imdb2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_imdb2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Vishwas1/bert-base-imdb2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_multilingual_cased_0_8_finetuned_squad_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_multilingual_cased_0_8_finetuned_squad_pipeline_xx.md new file mode 100644 index 00000000000000..da1b4843dc8984 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_multilingual_cased_0_8_finetuned_squad_pipeline_xx.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_0_8_finetuned_squad_pipeline pipeline BertForQuestionAnswering from alikanakar +author: John Snow Labs +name: bert_base_multilingual_cased_0_8_finetuned_squad_pipeline +date: 2024-09-21 +tags: [xx, open_source, pipeline, onnx] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_0_8_finetuned_squad_pipeline` is a Multilingual model originally trained by alikanakar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_0_8_finetuned_squad_pipeline_xx_5.5.0_3.0_1726947005890.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_0_8_finetuned_squad_pipeline_xx_5.5.0_3.0_1726947005890.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_multilingual_cased_0_8_finetuned_squad_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_multilingual_cased_0_8_finetuned_squad_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_0_8_finetuned_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/alikanakar/bert-base-multilingual-cased-0_8-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_multilingual_cased_0_8_finetuned_squad_xx.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_multilingual_cased_0_8_finetuned_squad_xx.md new file mode 100644 index 00000000000000..4b55d44c46638a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_multilingual_cased_0_8_finetuned_squad_xx.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_0_8_finetuned_squad BertForQuestionAnswering from alikanakar +author: John Snow Labs +name: bert_base_multilingual_cased_0_8_finetuned_squad +date: 2024-09-21 +tags: [xx, open_source, onnx, question_answering, bert] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_0_8_finetuned_squad` is a Multilingual model originally trained by alikanakar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_0_8_finetuned_squad_xx_5.5.0_3.0_1726946975588.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_0_8_finetuned_squad_xx_5.5.0_3.0_1726946975588.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_multilingual_cased_0_8_finetuned_squad","xx") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_multilingual_cased_0_8_finetuned_squad", "xx")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_0_8_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/alikanakar/bert-base-multilingual-cased-0_8-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_en.md new file mode 100644 index 00000000000000..00098ad5f2b962 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230 +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_en_5.5.0_3.0_1726946868480.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_en_5.5.0_3.0_1726946868480.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
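+
+After the transform, the predicted answer span for each question/context pair can be read from the "answer" column (a short sketch reusing the column names above):
+
+```python
+pipelineDF.select("document_question.result", "answer.result").show(truncate=False)
+```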
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915010230 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_pipeline_en.md new file mode 100644 index 00000000000000..56481bdb323b03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_pipeline_en_5.5.0_3.0_1726946887770.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_pipeline_en_5.5.0_3.0_1726946887770.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915010230 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915021306_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915021306_en.md new file mode 100644 index 00000000000000..6cff61cf2c1c70 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915021306_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915021306 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915021306 +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915021306` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915021306_en_5.5.0_3.0_1726922193595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915021306_en_5.5.0_3.0_1726922193595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915021306","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915021306", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
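The predicted spans end up in the `answer` annotation column configured above. As a minimal follow-up sketch (using the `pipelineDF` and column name from the example), the answer text can be inspected like this:

```python
# Show only the extracted answer strings from the "answer" annotation column
pipelineDF.select("answer.result").show(truncate=False)
```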
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915021306| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915021306 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915021306_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915021306_pipeline_en.md new file mode 100644 index 00000000000000..576880bef1f4a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915021306_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915021306_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915021306_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915021306_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915021306_pipeline_en_5.5.0_3.0_1726922212218.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915021306_pipeline_en_5.5.0_3.0_1726922212218.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that already holds the question/context text columns the pipeline expects
pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915021306_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that already holds the question/context text columns the pipeline expects
val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915021306_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
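The snippet above assumes an existing DataFrame `df`. A minimal sketch for building one is shown below; the column names `question` and `context` are an assumption and must match the input columns expected by the document assembler saved inside this pipeline:

```python
# Hypothetical input DataFrame; adjust the column names to the pipeline's expected inputs
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")
```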
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915021306_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.0 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915021306 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818_en.md new file mode 100644 index 00000000000000..02877518c8d87e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818 +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818_en_5.5.0_3.0_1726929009669.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818_en_5.5.0_3.0_1726929009669.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
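The predicted spans end up in the `answer` annotation column configured above. As a minimal follow-up sketch (using the `pipelineDF` and column name from the example), the answer text can be inspected like this:

```python
# Show only the extracted answer strings from the "answer" annotation column
pipelineDF.select("answer.result").show(truncate=False)
```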
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915122818 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818_pipeline_en.md new file mode 100644 index 00000000000000..4c50ea9f7808c8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818_pipeline_en_5.5.0_3.0_1726929027327.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818_pipeline_en_5.5.0_3.0_1726929027327.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that already holds the question/context text columns the pipeline expects
pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that already holds the question/context text columns the pipeline expects
val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
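The snippet above assumes an existing DataFrame `df`. A minimal sketch for building one is shown below; the column names `question` and `context` are an assumption and must match the input columns expected by the document assembler saved inside this pipeline:

```python
# Hypothetical input DataFrame; adjust the column names to the pipeline's expected inputs
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")
```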
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915122818 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_ep_10_0_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_1000_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_ep_10_0_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_1000_en.md new file mode 100644 index 00000000000000..1fe97aa3005e66 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_ep_10_0_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_1000_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_10_0_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_1000 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_10_0_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_1000 +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_10_0_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_1000` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_10_0_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_1000_en_5.5.0_3.0_1726947179597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_10_0_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_1000_en_5.5.0_3.0_1726947179597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_10_0_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_1000","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_10_0_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_1000", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
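The predicted spans end up in the `answer` annotation column configured above. As a minimal follow-up sketch (using the `pipelineDF` and column name from the example), the answer text can be inspected like this:

```python
# Show only the extracted answer strings from the "answer" annotation column
pipelineDF.select("answer.result").show(truncate=False)
```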
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_10_0_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-10.0-b-32-lr-1.2e-06-dp-0.3-ss-0-st-False-fh-False-hs-1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_ep_10_0_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_ep_10_0_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline_en.md new file mode 100644 index 00000000000000..dba36334a5b55d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_ep_10_0_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_10_0_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_10_0_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_10_0_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_10_0_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline_en_5.5.0_3.0_1726947198546.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_10_0_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline_en_5.5.0_3.0_1726947198546.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that already holds the question/context text columns the pipeline expects
pipeline = PretrainedPipeline("bert_base_uncased_ep_10_0_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that already holds the question/context text columns the pipeline expects
val pipeline = new PretrainedPipeline("bert_base_uncased_ep_10_0_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
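The snippet above assumes an existing DataFrame `df`. A minimal sketch for building one is shown below; the column names `question` and `context` are an assumption and must match the input columns expected by the document assembler saved inside this pipeline:

```python
# Hypothetical input DataFrame; adjust the column names to the pipeline's expected inputs
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")
```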
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_10_0_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-10.0-b-32-lr-1.2e-06-dp-0.3-ss-0-st-False-fh-False-hs-1000 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_en.md new file mode 100644 index 00000000000000..e828696ccc2f7a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400 +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_en_5.5.0_3.0_1726947192551.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_en_5.5.0_3.0_1726947192551.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
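The predicted spans end up in the `answer` annotation column configured above. As a minimal follow-up sketch (using the `pipelineDF` and column name from the example), the answer text can be inspected like this:

```python
# Show only the extracted answer strings from the "answer" annotation column
pipelineDF.select("answer.result").show(truncate=False)
```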
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.56-b-8-lr-4e-07-dp-1.0-ss-0-st-False-fh-False-hs-400 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_pipeline_en.md new file mode 100644 index 00000000000000..13357e1d5dc7cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_pipeline_en_5.5.0_3.0_1726947210977.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_pipeline_en_5.5.0_3.0_1726947210977.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that already holds the question/context text columns the pipeline expects
pipeline = PretrainedPipeline("bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that already holds the question/context text columns the pipeline expects
val pipeline = new PretrainedPipeline("bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
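The snippet above assumes an existing DataFrame `df`. A minimal sketch for building one is shown below; the column names `question` and `context` are an assumption and must match the input columns expected by the document assembler saved inside this pipeline:

```python
# Hypothetical input DataFrame; adjust the column names to the pipeline's expected inputs
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")
```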
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.56-b-8-lr-4e-07-dp-1.0-ss-0-st-False-fh-False-hs-400 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_0_lr_0_0005_wd_0_01_dp_0_99_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_0_lr_0_0005_wd_0_01_dp_0_99_en.md new file mode 100644 index 00000000000000..0648e9b15d56af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_0_lr_0_0005_wd_0_01_dp_0_99_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_0_0005_wd_0_01_dp_0_99 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_0_0005_wd_0_01_dp_0_99 +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_0_0005_wd_0_01_dp_0_99` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_0_0005_wd_0_01_dp_0_99_en_5.5.0_3.0_1726946321521.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_0_0005_wd_0_01_dp_0_99_en_5.5.0_3.0_1726946321521.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_0_0005_wd_0_01_dp_0_99","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_0_0005_wd_0_01_dp_0_99", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
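The predicted spans end up in the `answer` annotation column configured above. As a minimal follow-up sketch (using the `pipelineDF` and column name from the example), the answer text can be inspected like this:

```python
# Show only the extracted answer strings from the "answer" annotation column
pipelineDF.select("answer.result").show(truncate=False)
```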
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_0_0005_wd_0_01_dp_0_99| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-0.0005-wd-0.01-dp-0.99 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_0_southern_sotho_false_fh_false_hs_100_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_0_southern_sotho_false_fh_false_hs_100_en.md new file mode 100644 index 00000000000000..d32f29de6bc382 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_0_southern_sotho_false_fh_false_hs_100_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_0_southern_sotho_false_fh_false_hs_100 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_0_southern_sotho_false_fh_false_hs_100 +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_0_southern_sotho_false_fh_false_hs_100` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_0_southern_sotho_false_fh_false_hs_100_en_5.5.0_3.0_1726946574056.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_0_southern_sotho_false_fh_false_hs_100_en_5.5.0_3.0_1726946574056.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_0_southern_sotho_false_fh_false_hs_100","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_0_southern_sotho_false_fh_false_hs_100", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
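The predicted spans end up in the `answer` annotation column configured above. As a minimal follow-up sketch (using the `pipelineDF` and column name from the example), the answer text can be inspected like this:

```python
# Show only the extracted answer strings from the "answer" annotation column
pipelineDF.select("answer.result").show(truncate=False)
```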
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_0_southern_sotho_false_fh_false_hs_100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-06-wd-0.001-dp-0.2-ss-0-st-False-fh-False-hs-100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_0_southern_sotho_false_fh_false_hs_100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_0_southern_sotho_false_fh_false_hs_100_pipeline_en.md new file mode 100644 index 00000000000000..8b03de129d7f05 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_0_southern_sotho_false_fh_false_hs_100_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_0_southern_sotho_false_fh_false_hs_100_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_0_southern_sotho_false_fh_false_hs_100_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_0_southern_sotho_false_fh_false_hs_100_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_0_southern_sotho_false_fh_false_hs_100_pipeline_en_5.5.0_3.0_1726946592664.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_0_southern_sotho_false_fh_false_hs_100_pipeline_en_5.5.0_3.0_1726946592664.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that already holds the question/context text columns the pipeline expects
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_0_southern_sotho_false_fh_false_hs_100_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that already holds the question/context text columns the pipeline expects
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_0_southern_sotho_false_fh_false_hs_100_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
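The snippet above assumes an existing DataFrame `df`. A minimal sketch for building one is shown below; the column names `question` and `context` are an assumption and must match the input columns expected by the document assembler saved inside this pipeline:

```python
# Hypothetical input DataFrame; adjust the column names to the pipeline's expected inputs
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")
```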
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_0_southern_sotho_false_fh_false_hs_100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-06-wd-0.001-dp-0.2-ss-0-st-False-fh-False-hs-100 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_0_lr_8e_07_wd_0_08_dp_0_6_swati_20_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_0_lr_8e_07_wd_0_08_dp_0_6_swati_20_pipeline_en.md new file mode 100644 index 00000000000000..b27be263542012 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_0_lr_8e_07_wd_0_08_dp_0_6_swati_20_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_8e_07_wd_0_08_dp_0_6_swati_20_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_8e_07_wd_0_08_dp_0_6_swati_20_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_8e_07_wd_0_08_dp_0_6_swati_20_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_8e_07_wd_0_08_dp_0_6_swati_20_pipeline_en_5.5.0_3.0_1726946344313.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_8e_07_wd_0_08_dp_0_6_swati_20_pipeline_en_5.5.0_3.0_1726946344313.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that already holds the question/context text columns the pipeline expects
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_8e_07_wd_0_08_dp_0_6_swati_20_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that already holds the question/context text columns the pipeline expects
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_8e_07_wd_0_08_dp_0_6_swati_20_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
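The snippet above assumes an existing DataFrame `df`. A minimal sketch for building one is shown below; the column names `question` and `context` are an assumption and must match the input columns expected by the document assembler saved inside this pipeline:

```python
# Hypothetical input DataFrame; adjust the column names to the pipeline's expected inputs
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")
```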
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_8e_07_wd_0_08_dp_0_6_swati_20_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-8e-07-wd-0.08-dp-0.6-ss-20 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500_en.md new file mode 100644 index 00000000000000..e9531520c2c4db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500 +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500_en_5.5.0_3.0_1726946619944.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500_en_5.5.0_3.0_1726946619944.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
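The predicted spans end up in the `answer` annotation column configured above. As a minimal follow-up sketch (using the `pipelineDF` and column name from the example), the answer text can be inspected like this:

```python
# Show only the extracted answer strings from the "answer" annotation column
pipelineDF.select("answer.result").show(truncate=False)
```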
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.87-lr-4e-07-wd-1e-05-dp-1.0-ss-0-st-False-fh-False-hs-500 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500_pipeline_en.md new file mode 100644 index 00000000000000..2746abfbd649af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500_pipeline_en_5.5.0_3.0_1726946638698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500_pipeline_en_5.5.0_3.0_1726946638698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that already holds the question/context text columns the pipeline expects
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that already holds the question/context text columns the pipeline expects
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
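The snippet above assumes an existing DataFrame `df`. A minimal sketch for building one is shown below; the column names `question` and `context` are an assumption and must match the input columns expected by the document assembler saved inside this pipeline:

```python
# Hypothetical input DataFrame; adjust the column names to the pipeline's expected inputs
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")
```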
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.87-lr-4e-07-wd-1e-05-dp-1.0-ss-0-st-False-fh-False-hs-500 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_en.md new file mode 100644 index 00000000000000..0f72df0cb48911 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9 +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_en_5.5.0_3.0_1726946545397.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_en_5.5.0_3.0_1726946545397.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
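The predicted spans end up in the `answer` annotation column configured above. As a minimal follow-up sketch (using the `pipelineDF` and column name from the example), the answer text can be inspected like this:

```python
# Show only the extracted answer strings from the "answer" annotation column
pipelineDF.select("answer.result").show(truncate=False)
```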
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-0.0001-wd-0.001-dp-0.9 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_pipeline_en.md new file mode 100644 index 00000000000000..3adae85d63cf54 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_pipeline_en_5.5.0_3.0_1726946565998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_pipeline_en_5.5.0_3.0_1726946565998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that already holds the question/context text columns the pipeline expects
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that already holds the question/context text columns the pipeline expects
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
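The snippet above assumes an existing DataFrame `df`. A minimal sketch for building one is shown below; the column names `question` and `context` are an assumption and must match the input columns expected by the document assembler saved inside this pipeline:

```python
# Hypothetical input DataFrame; adjust the column names to the pipeline's expected inputs
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")
```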
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-0.0001-wd-0.001-dp-0.9 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_01_swati_0_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_01_swati_0_en.md new file mode 100644 index 00000000000000..336be0aab09ed3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_01_swati_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_01_swati_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_01_swati_0 +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_01_swati_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_01_swati_0_en_5.5.0_3.0_1726946351971.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_01_swati_0_en_5.5.0_3.0_1726946351971.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_01_swati_0","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_01_swati_0", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
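The predicted spans end up in the `answer` annotation column configured above. As a minimal follow-up sketch (using the `pipelineDF` and column name from the example), the answer text can be inspected like this:

```python
# Show only the extracted answer strings from the "answer" annotation column
pipelineDF.select("answer.result").show(truncate=False)
```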
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_01_swati_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-1e-05-wd-0.001-dp-0.01-ss-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_01_swati_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_01_swati_0_pipeline_en.md new file mode 100644 index 00000000000000..dab30618ae58bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_01_swati_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_01_swati_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_01_swati_0_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_01_swati_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_01_swati_0_pipeline_en_5.5.0_3.0_1726946370602.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_01_swati_0_pipeline_en_5.5.0_3.0_1726946370602.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame holding the raw input column(s) expected by the pipeline's first stage
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_01_swati_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame holding the raw input column(s) expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_01_swati_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
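If you need to build `df` yourself, the sketch below shows one possible setup. The `question` and `context` column names are an assumption based on the MultiDocumentAssembler stage listed under Included Models; check the downloaded pipeline's stages to confirm them.

```python
from sparknlp.pretrained import PretrainedPipeline

# Hypothetical input frame; column names assume the pipeline's assembler
# reads "question" and "context".
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")

pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_01_swati_0_pipeline", lang="en")
pipeline.transform(df).select("answer.result").show(truncate=False)
```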
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_01_swati_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-1e-05-wd-0.001-dp-0.01-ss-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_8_swati_0_southern_sotho_true_fh_true_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_8_swati_0_southern_sotho_true_fh_true_en.md new file mode 100644 index 00000000000000..30a3884bc51e5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_8_swati_0_southern_sotho_true_fh_true_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_8_swati_0_southern_sotho_true_fh_true BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_8_swati_0_southern_sotho_true_fh_true +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_8_swati_0_southern_sotho_true_fh_true` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_8_swati_0_southern_sotho_true_fh_true_en_5.5.0_3.0_1726946682393.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_8_swati_0_southern_sotho_true_fh_true_en_5.5.0_3.0_1726946682393.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_8_swati_0_southern_sotho_true_fh_true","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_8_swati_0_southern_sotho_true_fh_true", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_8_swati_0_southern_sotho_true_fh_true| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-1e-06-wd-0.001-dp-0.8-ss-0-st-True-fh-True \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_8_swati_0_southern_sotho_true_fh_true_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_8_swati_0_southern_sotho_true_fh_true_pipeline_en.md new file mode 100644 index 00000000000000..f3a120a0129ed3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_8_swati_0_southern_sotho_true_fh_true_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_8_swati_0_southern_sotho_true_fh_true_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_8_swati_0_southern_sotho_true_fh_true_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_8_swati_0_southern_sotho_true_fh_true_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_8_swati_0_southern_sotho_true_fh_true_pipeline_en_5.5.0_3.0_1726946701420.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_8_swati_0_southern_sotho_true_fh_true_pipeline_en_5.5.0_3.0_1726946701420.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame holding the raw input column(s) expected by the pipeline's first stage
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_8_swati_0_southern_sotho_true_fh_true_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame holding the raw input column(s) expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_8_swati_0_southern_sotho_true_fh_true_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_8_swati_0_southern_sotho_true_fh_true_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-1e-06-wd-0.001-dp-0.8-ss-0-st-True-fh-True + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_800_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_800_en.md new file mode 100644 index 00000000000000..1fdc219b050c68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_800_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_800 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_800 +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_800` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_800_en_5.5.0_3.0_1726946458174.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_800_en_5.5.0_3.0_1726946458174.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_800","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_800", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_800| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-3.44-lr-4e-07-wd-1e-05-dp-1.0-ss-0-st-False-fh-False-hs-800 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_800_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_800_pipeline_en.md new file mode 100644 index 00000000000000..34377362b59e83 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_800_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_800_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_800_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_800_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_800_pipeline_en_5.5.0_3.0_1726946477206.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_800_pipeline_en_5.5.0_3.0_1726946477206.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame holding the raw input column(s) expected by the pipeline's first stage
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_800_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame holding the raw input column(s) expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_800_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_800_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-3.44-lr-4e-07-wd-1e-05-dp-1.0-ss-0-st-False-fh-False-hs-800 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetuned_quac_1qahistory_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetuned_quac_1qahistory_en.md new file mode 100644 index 00000000000000..9a7c882faecc34 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetuned_quac_1qahistory_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_quac_1qahistory BertForQuestionAnswering from Jellevdl +author: John Snow Labs +name: bert_base_uncased_finetuned_quac_1qahistory +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_quac_1qahistory` is a English model originally trained by Jellevdl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_quac_1qahistory_en_5.5.0_3.0_1726946740836.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_quac_1qahistory_en_5.5.0_3.0_1726946740836.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetuned_quac_1qahistory","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetuned_quac_1qahistory", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_quac_1qahistory| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Jellevdl/bert-base-uncased-finetuned-quac-1QAhistory \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetuned_quac_1qahistory_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetuned_quac_1qahistory_pipeline_en.md new file mode 100644 index 00000000000000..4cccbffcfb7a01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetuned_quac_1qahistory_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_quac_1qahistory_pipeline pipeline BertForQuestionAnswering from Jellevdl +author: John Snow Labs +name: bert_base_uncased_finetuned_quac_1qahistory_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_quac_1qahistory_pipeline` is a English model originally trained by Jellevdl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_quac_1qahistory_pipeline_en_5.5.0_3.0_1726946760086.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_quac_1qahistory_pipeline_en_5.5.0_3.0_1726946760086.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame holding the raw input column(s) expected by the pipeline's first stage
pipeline = PretrainedPipeline("bert_base_uncased_finetuned_quac_1qahistory_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame holding the raw input column(s) expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_quac_1qahistory_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_quac_1qahistory_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Jellevdl/bert-base-uncased-finetuned-quac-1QAhistory + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_mrpc_serjssv_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_mrpc_serjssv_en.md new file mode 100644 index 00000000000000..e4297e72bf0dc1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_mrpc_serjssv_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_mrpc_serjssv BertForSequenceClassification from Serjssv +author: John Snow Labs +name: bert_base_uncased_mrpc_serjssv +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_mrpc_serjssv` is a English model originally trained by Serjssv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_mrpc_serjssv_en_5.5.0_3.0_1726954966664.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_mrpc_serjssv_en_5.5.0_3.0_1726954966664.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_mrpc_serjssv","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_mrpc_serjssv", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
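To read the predicted labels, select the `class` annotation results from the transformed DataFrame. A minimal sketch, assuming the `pipelineDF` variable from the example above:

```python
# Illustrative only: show each input text next to its predicted class label(s).
pipelineDF.select("text", "class.result").show(truncate=False)
```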
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_mrpc_serjssv| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Serjssv/bert-base-uncased-mrpc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_mrpc_serjssv_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_mrpc_serjssv_pipeline_en.md new file mode 100644 index 00000000000000..87f7f41a8936bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_mrpc_serjssv_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_mrpc_serjssv_pipeline pipeline BertForSequenceClassification from Serjssv +author: John Snow Labs +name: bert_base_uncased_mrpc_serjssv_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_mrpc_serjssv_pipeline` is a English model originally trained by Serjssv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_mrpc_serjssv_pipeline_en_5.5.0_3.0_1726954985709.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_mrpc_serjssv_pipeline_en_5.5.0_3.0_1726954985709.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame holding the raw input column(s) expected by the pipeline's first stage
pipeline = PretrainedPipeline("bert_base_uncased_mrpc_serjssv_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame holding the raw input column(s) expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("bert_base_uncased_mrpc_serjssv_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_mrpc_serjssv_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Serjssv/bert-base-uncased-mrpc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_newscategoryclassification_fullmodel_2_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_newscategoryclassification_fullmodel_2_en.md new file mode 100644 index 00000000000000..e19c8829a87de0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_newscategoryclassification_fullmodel_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_newscategoryclassification_fullmodel_2 DistilBertForSequenceClassification from akashmaggon +author: John Snow Labs +name: bert_base_uncased_newscategoryclassification_fullmodel_2 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_newscategoryclassification_fullmodel_2` is a English model originally trained by akashmaggon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_newscategoryclassification_fullmodel_2_en_5.5.0_3.0_1726953044175.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_newscategoryclassification_fullmodel_2_en_5.5.0_3.0_1726953044175.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_base_uncased_newscategoryclassification_fullmodel_2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_base_uncased_newscategoryclassification_fullmodel_2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_newscategoryclassification_fullmodel_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/akashmaggon/bert-base-uncased-newscategoryclassification-fullmodel-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_newscategoryclassification_fullmodel_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_newscategoryclassification_fullmodel_2_pipeline_en.md new file mode 100644 index 00000000000000..2b8820c6869091 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_newscategoryclassification_fullmodel_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_newscategoryclassification_fullmodel_2_pipeline pipeline DistilBertForSequenceClassification from akashmaggon +author: John Snow Labs +name: bert_base_uncased_newscategoryclassification_fullmodel_2_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_newscategoryclassification_fullmodel_2_pipeline` is a English model originally trained by akashmaggon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_newscategoryclassification_fullmodel_2_pipeline_en_5.5.0_3.0_1726953056432.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_newscategoryclassification_fullmodel_2_pipeline_en_5.5.0_3.0_1726953056432.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame holding the raw input column(s) expected by the pipeline's first stage
pipeline = PretrainedPipeline("bert_base_uncased_newscategoryclassification_fullmodel_2_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame holding the raw input column(s) expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("bert_base_uncased_newscategoryclassification_fullmodel_2_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_newscategoryclassification_fullmodel_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/akashmaggon/bert-base-uncased-newscategoryclassification-fullmodel-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_squad_v1_finetuned_clickbait_detection_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_squad_v1_finetuned_clickbait_detection_en.md new file mode 100644 index 00000000000000..dd5582a6992810 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_squad_v1_finetuned_clickbait_detection_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_squad_v1_finetuned_clickbait_detection BertForQuestionAnswering from abdulmanaam +author: John Snow Labs +name: bert_base_uncased_squad_v1_finetuned_clickbait_detection +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_squad_v1_finetuned_clickbait_detection` is a English model originally trained by abdulmanaam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_squad_v1_finetuned_clickbait_detection_en_5.5.0_3.0_1726947064641.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_squad_v1_finetuned_clickbait_detection_en_5.5.0_3.0_1726947064641.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_squad_v1_finetuned_clickbait_detection","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_squad_v1_finetuned_clickbait_detection", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_squad_v1_finetuned_clickbait_detection| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/abdulmanaam/bert-base-uncased-squad-v1-finetuned-clickbait-detection \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_squad_v1_finetuned_clickbait_detection_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_squad_v1_finetuned_clickbait_detection_pipeline_en.md new file mode 100644 index 00000000000000..05c4b42c97c168 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_squad_v1_finetuned_clickbait_detection_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_squad_v1_finetuned_clickbait_detection_pipeline pipeline BertForQuestionAnswering from abdulmanaam +author: John Snow Labs +name: bert_base_uncased_squad_v1_finetuned_clickbait_detection_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_squad_v1_finetuned_clickbait_detection_pipeline` is a English model originally trained by abdulmanaam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_squad_v1_finetuned_clickbait_detection_pipeline_en_5.5.0_3.0_1726947083855.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_squad_v1_finetuned_clickbait_detection_pipeline_en_5.5.0_3.0_1726947083855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame holding the raw input column(s) expected by the pipeline's first stage
pipeline = PretrainedPipeline("bert_base_uncased_squad_v1_finetuned_clickbait_detection_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame holding the raw input column(s) expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("bert_base_uncased_squad_v1_finetuned_clickbait_detection_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_squad_v1_finetuned_clickbait_detection_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/abdulmanaam/bert-base-uncased-squad-v1-finetuned-clickbait-detection + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_en.md new file mode 100644 index 00000000000000..62c2b8c7993f98 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27 BertForTokenClassification from ali2066 +author: John Snow Labs +name: bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27 +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_en_5.5.0_3.0_1726889341266.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_en_5.5.0_3.0_1726889341266.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
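Token-level predictions can be paired with their tokens using Spark's `arrays_zip`. A minimal sketch, assuming the `pipelineDF` variable from the example above:

```python
# Illustrative only: align each token with its predicted NER tag.
pipelineDF.selectExpr("explode(arrays_zip(token.result, ner.result)) as token_tag") \
    .show(truncate=False)
```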
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ali2066/bert-base-uncased_token_itr0_0.0001_all_01_03_2022-04_48_27 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_pipeline_en.md new file mode 100644 index 00000000000000..a3de40b9026012 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_pipeline pipeline BertForTokenClassification from ali2066 +author: John Snow Labs +name: bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_pipeline` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_pipeline_en_5.5.0_3.0_1726889361014.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_pipeline_en_5.5.0_3.0_1726889361014.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame holding the raw input column(s) expected by the pipeline's first stage
pipeline = PretrainedPipeline("bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame holding the raw input column(s) expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
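Besides `transform`, `PretrainedPipeline` also offers a lightweight `annotate` call for single strings. A minimal sketch; the keys of the returned dictionary correspond to the pipeline's output columns:

```python
from sparknlp.pretrained import PretrainedPipeline

# Illustrative only: annotate a single string without building a DataFrame.
pipeline = PretrainedPipeline("bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_pipeline", lang="en")
result = pipeline.annotate("I love spark-nlp")
print(result)  # dict keyed by the pipeline's annotator output columns
```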
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ali2066/bert-base-uncased_token_itr0_0.0001_all_01_03_2022-04_48_27 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_classifier_bert_base_multilingual_uncased_sentiment_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-21-bert_classifier_bert_base_multilingual_uncased_sentiment_pipeline_xx.md new file mode 100644 index 00000000000000..5268dc518e6463 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_classifier_bert_base_multilingual_uncased_sentiment_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_classifier_bert_base_multilingual_uncased_sentiment_pipeline pipeline BertForSequenceClassification from nlptown +author: John Snow Labs +name: bert_classifier_bert_base_multilingual_uncased_sentiment_pipeline +date: 2024-09-21 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_classifier_bert_base_multilingual_uncased_sentiment_pipeline` is a Multilingual model originally trained by nlptown. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_classifier_bert_base_multilingual_uncased_sentiment_pipeline_xx_5.5.0_3.0_1726955077035.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_classifier_bert_base_multilingual_uncased_sentiment_pipeline_xx_5.5.0_3.0_1726955077035.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame holding the raw input column(s) expected by the pipeline's first stage
pipeline = PretrainedPipeline("bert_classifier_bert_base_multilingual_uncased_sentiment_pipeline", lang = "xx")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame holding the raw input column(s) expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("bert_classifier_bert_base_multilingual_uncased_sentiment_pipeline", lang = "xx")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_classifier_bert_base_multilingual_uncased_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|627.8 MB| + +## References + +https://huggingface.co/nlptownbert-base-multilingual-uncased-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_classifier_bert_base_multilingual_uncased_sentiment_xx.md b/docs/_posts/ahmedlone127/2024-09-21-bert_classifier_bert_base_multilingual_uncased_sentiment_xx.md new file mode 100644 index 00000000000000..e6ef5c19448509 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_classifier_bert_base_multilingual_uncased_sentiment_xx.md @@ -0,0 +1,95 @@ +--- +layout: model +title: Multilingual BertForSequenceClassification Base Uncased model (from nlptown) +author: John Snow Labs +name: bert_classifier_bert_base_multilingual_uncased_sentiment +date: 2024-09-21 +tags: [sequence_classification, bert, openvino, xx, open_source, onnx] +task: Zero-Shot Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +“ + + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. bert-base-multilingual-uncased-sentiment is a MultiLingual model originally trained by nlptown. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_classifier_bert_base_multilingual_uncased_sentiment_xx_5.5.0_3.0_1726955046733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_classifier_bert_base_multilingual_uncased_sentiment_xx_5.5.0_3.0_1726955046733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

seq_classifier = BertForSequenceClassification.pretrained("bert_classifier_bert_base_multilingual_uncased_sentiment","xx") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline(stages=[documentAssembler, tokenizer, seq_classifier])

data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")

result = pipeline.fit(data).transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val seq_classifier = BertForSequenceClassification.pretrained("bert_classifier_bert_base_multilingual_uncased_sentiment","xx")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, seq_classifier))

val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
```
</div>
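Since the model is multilingual, a single batch can mix languages. A minimal sketch, reusing the `pipeline` defined above:

```python
# Illustrative only: score a small multilingual batch in one pass.
data = spark.createDataFrame(
    [["This product is great!"],
     ["Ce produit est terrible."],
     ["Das Essen war in Ordnung."]]
).toDF("text")

pipeline.fit(data).transform(data).select("text", "class.result").show(truncate=False)
```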
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_classifier_bert_base_multilingual_uncased_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|627.7 MB| \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_en.md new file mode 100644 index 00000000000000..ebd63bbad3a3ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined DistilBertForSequenceClassification from ArafatBHossain +author: John Snow Labs +name: bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined` is a English model originally trained by ArafatBHossain. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_en_5.5.0_3.0_1726953055560.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_en_5.5.0_3.0_1726953055560.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ArafatBHossain/bert-distilled-multi_teacher_model_random_mind_epoch7_alpha0.8_refined \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_pipeline_en.md new file mode 100644 index 00000000000000..c1b3c588110dc6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_pipeline pipeline DistilBertForSequenceClassification from ArafatBHossain +author: John Snow Labs +name: bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_pipeline` is a English model originally trained by ArafatBHossain. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_pipeline_en_5.5.0_3.0_1726953068052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_pipeline_en_5.5.0_3.0_1726953068052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame containing the input column(s) this pipeline expects
+pipeline = PretrainedPipeline("bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame containing the input column(s) this pipeline expects
+val pipeline = new PretrainedPipeline("bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
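+
+For quick experiments, a `PretrainedPipeline` can also annotate a single string without building a DataFrame first; a minimal sketch, assuming the pipeline has been downloaded as above:
+
+```python
+# Returns a dict mapping each output column to its annotations for one input string
+result = pipeline.annotate("I love spark-nlp")
+print(result)
+```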
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ArafatBHossain/bert-distilled-multi_teacher_model_random_mind_epoch7_alpha0.8_refined + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_job_recommendation_model_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_job_recommendation_model_en.md new file mode 100644 index 00000000000000..0d99e0e4582a4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_job_recommendation_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_job_recommendation_model BertForSequenceClassification from vanninh2101 +author: John Snow Labs +name: bert_job_recommendation_model +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_job_recommendation_model` is a English model originally trained by vanninh2101. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_job_recommendation_model_en_5.5.0_3.0_1726955973230.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_job_recommendation_model_en_5.5.0_3.0_1726955973230.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_job_recommendation_model","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_job_recommendation_model", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
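+
+To see which labels this classifier can return before running a full pipeline, the loaded annotator exposes its label set; a small sketch using the `sequenceClassifier` defined above:
+
+```python
+# Print the class labels known to the sequence classification head
+print(sequenceClassifier.getClasses())
+```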
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_job_recommendation_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|438.3 MB| + +## References + +https://huggingface.co/vanninh2101/bert_job_recommendation_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_job_recommendation_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_job_recommendation_model_pipeline_en.md new file mode 100644 index 00000000000000..5b1633523c0d48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_job_recommendation_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_job_recommendation_model_pipeline pipeline BertForSequenceClassification from vanninh2101 +author: John Snow Labs +name: bert_job_recommendation_model_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_job_recommendation_model_pipeline` is a English model originally trained by vanninh2101. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_job_recommendation_model_pipeline_en_5.5.0_3.0_1726955993908.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_job_recommendation_model_pipeline_en_5.5.0_3.0_1726955993908.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame containing the input column(s) this pipeline expects
+pipeline = PretrainedPipeline("bert_job_recommendation_model_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame containing the input column(s) this pipeline expects
+val pipeline = new PretrainedPipeline("bert_job_recommendation_model_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_job_recommendation_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.3 MB| + +## References + +https://huggingface.co/vanninh2101/bert_job_recommendation_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_large_cased_squad_model2_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_large_cased_squad_model2_en.md new file mode 100644 index 00000000000000..56f8b7a22951f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_large_cased_squad_model2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_large_cased_squad_model2 BertForQuestionAnswering from varun-v-rao +author: John Snow Labs +name: bert_large_cased_squad_model2 +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_cased_squad_model2` is a English model originally trained by varun-v-rao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_cased_squad_model2_en_5.5.0_3.0_1726946820562.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_cased_squad_model2_en_5.5.0_3.0_1726946820562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_large_cased_squad_model2","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_large_cased_squad_model2", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
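+
+The extracted answer spans end up in the `answer` output column defined above. A minimal sketch for inspecting them (column names follow the snippet above):
+
+```python
+# Show each question alongside the extracted answer span
+pipelineDF.select("question", "answer.result").show(truncate=False)
+```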
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_cased_squad_model2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/varun-v-rao/bert-large-cased-squad-model2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_large_cased_squad_model2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_large_cased_squad_model2_pipeline_en.md new file mode 100644 index 00000000000000..c0df10526376bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_large_cased_squad_model2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_large_cased_squad_model2_pipeline pipeline BertForQuestionAnswering from varun-v-rao +author: John Snow Labs +name: bert_large_cased_squad_model2_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_cased_squad_model2_pipeline` is a English model originally trained by varun-v-rao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_cased_squad_model2_pipeline_en_5.5.0_3.0_1726946879264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_cased_squad_model2_pipeline_en_5.5.0_3.0_1726946879264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame containing the question/context column(s) this pipeline expects
+pipeline = PretrainedPipeline("bert_large_cased_squad_model2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame containing the question/context column(s) this pipeline expects
+val pipeline = new PretrainedPipeline("bert_large_cased_squad_model2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_cased_squad_model2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/varun-v-rao/bert-large-cased-squad-model2 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_large_finetuned_tqa_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_large_finetuned_tqa_en.md new file mode 100644 index 00000000000000..2a84af452dcd72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_large_finetuned_tqa_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_large_finetuned_tqa BertForQuestionAnswering from tvsharish +author: John Snow Labs +name: bert_large_finetuned_tqa +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_finetuned_tqa` is a English model originally trained by tvsharish. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_finetuned_tqa_en_5.5.0_3.0_1726946543912.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_finetuned_tqa_en_5.5.0_3.0_1726946543912.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_large_finetuned_tqa","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_large_finetuned_tqa", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_finetuned_tqa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tvsharish/bert-large-finetuned-tqa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_large_finetuned_tqa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_large_finetuned_tqa_pipeline_en.md new file mode 100644 index 00000000000000..057bddd366db8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_large_finetuned_tqa_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_large_finetuned_tqa_pipeline pipeline BertForQuestionAnswering from tvsharish +author: John Snow Labs +name: bert_large_finetuned_tqa_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_finetuned_tqa_pipeline` is a English model originally trained by tvsharish. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_finetuned_tqa_pipeline_en_5.5.0_3.0_1726946603968.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_finetuned_tqa_pipeline_en_5.5.0_3.0_1726946603968.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame containing the question/context column(s) this pipeline expects
+pipeline = PretrainedPipeline("bert_large_finetuned_tqa_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame containing the question/context column(s) this pipeline expects
+val pipeline = new PretrainedPipeline("bert_large_finetuned_tqa_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_finetuned_tqa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tvsharish/bert-large-finetuned-tqa + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_large_uncased_finetuned_squad_v2_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_large_uncased_finetuned_squad_v2_en.md new file mode 100644 index 00000000000000..8d2662726afb40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_large_uncased_finetuned_squad_v2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_large_uncased_finetuned_squad_v2 BertForQuestionAnswering from ALOQAS +author: John Snow Labs +name: bert_large_uncased_finetuned_squad_v2 +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_finetuned_squad_v2` is a English model originally trained by ALOQAS. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_finetuned_squad_v2_en_5.5.0_3.0_1726946691672.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_finetuned_squad_v2_en_5.5.0_3.0_1726946691672.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_finetuned_squad_v2","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_finetuned_squad_v2", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_finetuned_squad_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ALOQAS/bert-large-uncased-finetuned-squad-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_large_uncased_finetuned_squad_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_large_uncased_finetuned_squad_v2_pipeline_en.md new file mode 100644 index 00000000000000..2a124bd1c26b5e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_large_uncased_finetuned_squad_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_large_uncased_finetuned_squad_v2_pipeline pipeline BertForQuestionAnswering from ALOQAS +author: John Snow Labs +name: bert_large_uncased_finetuned_squad_v2_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_finetuned_squad_v2_pipeline` is a English model originally trained by ALOQAS. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_finetuned_squad_v2_pipeline_en_5.5.0_3.0_1726946750705.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_finetuned_squad_v2_pipeline_en_5.5.0_3.0_1726946750705.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame containing the question/context column(s) this pipeline expects
+pipeline = PretrainedPipeline("bert_large_uncased_finetuned_squad_v2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame containing the question/context column(s) this pipeline expects
+val pipeline = new PretrainedPipeline("bert_large_uncased_finetuned_squad_v2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_finetuned_squad_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ALOQAS/bert-large-uncased-finetuned-squad-v2 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_mini_mnli_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_mini_mnli_en.md new file mode 100644 index 00000000000000..6f0fbfe993f157 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_mini_mnli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_mini_mnli BertForSequenceClassification from prajjwal1 +author: John Snow Labs +name: bert_mini_mnli +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_mini_mnli` is a English model originally trained by prajjwal1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_mini_mnli_en_5.5.0_3.0_1726902229482.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_mini_mnli_en_5.5.0_3.0_1726902229482.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_mini_mnli","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_mini_mnli", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_mini_mnli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|42.1 MB| + +## References + +https://huggingface.co/prajjwal1/bert-mini-mnli \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_mini_mnli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_mini_mnli_pipeline_en.md new file mode 100644 index 00000000000000..63c928457fc076 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_mini_mnli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_mini_mnli_pipeline pipeline BertForSequenceClassification from prajjwal1 +author: John Snow Labs +name: bert_mini_mnli_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_mini_mnli_pipeline` is a English model originally trained by prajjwal1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_mini_mnli_pipeline_en_5.5.0_3.0_1726902231735.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_mini_mnli_pipeline_en_5.5.0_3.0_1726902231735.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame containing the input column(s) this pipeline expects
+pipeline = PretrainedPipeline("bert_mini_mnli_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame containing the input column(s) this pipeline expects
+val pipeline = new PretrainedPipeline("bert_mini_mnli_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_mini_mnli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|42.1 MB| + +## References + +https://huggingface.co/prajjwal1/bert-mini-mnli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_multi_turkish_tweet_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-21-bert_multi_turkish_tweet_pipeline_tr.md new file mode 100644 index 00000000000000..654e32f768533d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_multi_turkish_tweet_pipeline_tr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Turkish bert_multi_turkish_tweet_pipeline pipeline BertForSequenceClassification from anilguven +author: John Snow Labs +name: bert_multi_turkish_tweet_pipeline +date: 2024-09-21 +tags: [tr, open_source, pipeline, onnx] +task: Text Classification +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_multi_turkish_tweet_pipeline` is a Turkish model originally trained by anilguven. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_multi_turkish_tweet_pipeline_tr_5.5.0_3.0_1726902415063.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_multi_turkish_tweet_pipeline_tr_5.5.0_3.0_1726902415063.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame containing the input column(s) this pipeline expects
+pipeline = PretrainedPipeline("bert_multi_turkish_tweet_pipeline", lang = "tr")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame containing the input column(s) this pipeline expects
+val pipeline = new PretrainedPipeline("bert_multi_turkish_tweet_pipeline", lang = "tr")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_multi_turkish_tweet_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|627.8 MB| + +## References + +https://huggingface.co/anilguven/bert_multi_turkish_tweet + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_multi_turkish_tweet_tr.md b/docs/_posts/ahmedlone127/2024-09-21-bert_multi_turkish_tweet_tr.md new file mode 100644 index 00000000000000..9b8879c3a0040f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_multi_turkish_tweet_tr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Turkish bert_multi_turkish_tweet BertForSequenceClassification from anilguven +author: John Snow Labs +name: bert_multi_turkish_tweet +date: 2024-09-21 +tags: [tr, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_multi_turkish_tweet` is a Turkish model originally trained by anilguven. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_multi_turkish_tweet_tr_5.5.0_3.0_1726902385804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_multi_turkish_tweet_tr_5.5.0_3.0_1726902385804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_multi_turkish_tweet","tr") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_multi_turkish_tweet", "tr")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_multi_turkish_tweet| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|tr| +|Size:|627.7 MB| + +## References + +https://huggingface.co/anilguven/bert_multi_turkish_tweet \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_multilingual_nature_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-21-bert_multilingual_nature_pipeline_xx.md new file mode 100644 index 00000000000000..8d5b9c515d9c02 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_multilingual_nature_pipeline_xx.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Multilingual bert_multilingual_nature_pipeline pipeline BertForQuestionAnswering from sue123456 +author: John Snow Labs +name: bert_multilingual_nature_pipeline +date: 2024-09-21 +tags: [xx, open_source, pipeline, onnx] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_multilingual_nature_pipeline` is a Multilingual model originally trained by sue123456. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_multilingual_nature_pipeline_xx_5.5.0_3.0_1726947078970.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_multilingual_nature_pipeline_xx_5.5.0_3.0_1726947078970.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame containing the question/context column(s) this pipeline expects
+pipeline = PretrainedPipeline("bert_multilingual_nature_pipeline", lang = "xx")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame containing the question/context column(s) this pipeline expects
+val pipeline = new PretrainedPipeline("bert_multilingual_nature_pipeline", lang = "xx")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_multilingual_nature_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/sue123456/bert-multilingual-nature + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_multilingual_nature_xx.md b/docs/_posts/ahmedlone127/2024-09-21-bert_multilingual_nature_xx.md new file mode 100644 index 00000000000000..a5469e5e97cc83 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_multilingual_nature_xx.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Multilingual bert_multilingual_nature BertForQuestionAnswering from sue123456 +author: John Snow Labs +name: bert_multilingual_nature +date: 2024-09-21 +tags: [xx, open_source, onnx, question_answering, bert] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_multilingual_nature` is a Multilingual model originally trained by sue123456. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_multilingual_nature_xx_5.5.0_3.0_1726947047998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_multilingual_nature_xx_5.5.0_3.0_1726947047998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_multilingual_nature","xx") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_multilingual_nature", "xx")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_multilingual_nature| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/sue123456/bert-multilingual-nature \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_rtgender_opgender_annotations_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_rtgender_opgender_annotations_en.md new file mode 100644 index 00000000000000..55e8771a8d3a10 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_rtgender_opgender_annotations_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_rtgender_opgender_annotations BertForSequenceClassification from Cameron +author: John Snow Labs +name: bert_rtgender_opgender_annotations +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_rtgender_opgender_annotations` is a English model originally trained by Cameron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_rtgender_opgender_annotations_en_5.5.0_3.0_1726956591257.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_rtgender_opgender_annotations_en_5.5.0_3.0_1726956591257.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_rtgender_opgender_annotations","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_rtgender_opgender_annotations", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_rtgender_opgender_annotations| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Cameron/BERT-rtgender-opgender-annotations \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_rtgender_opgender_annotations_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_rtgender_opgender_annotations_pipeline_en.md new file mode 100644 index 00000000000000..118ab0914dbfa8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_rtgender_opgender_annotations_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_rtgender_opgender_annotations_pipeline pipeline BertForSequenceClassification from Cameron +author: John Snow Labs +name: bert_rtgender_opgender_annotations_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_rtgender_opgender_annotations_pipeline` is a English model originally trained by Cameron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_rtgender_opgender_annotations_pipeline_en_5.5.0_3.0_1726956610302.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_rtgender_opgender_annotations_pipeline_en_5.5.0_3.0_1726956610302.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame containing the input column(s) this pipeline expects
+pipeline = PretrainedPipeline("bert_rtgender_opgender_annotations_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame containing the input column(s) this pipeline expects
+val pipeline = new PretrainedPipeline("bert_rtgender_opgender_annotations_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_rtgender_opgender_annotations_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Cameron/BERT-rtgender-opgender-annotations + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1_en.md new file mode 100644 index 00000000000000..7103cc148f18d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1_en_5.5.0_3.0_1726953548753.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1_en_5.5.0_3.0_1726953548753.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_vllm-gemma2b-llmOversight-0.5-noDropSus_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1_pipeline_en.md new file mode 100644 index 00000000000000..e04a51324170e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1_pipeline_en_5.5.0_3.0_1726953561042.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1_pipeline_en_5.5.0_3.0_1726953561042.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame containing the input column(s) this pipeline expects
+pipeline = PretrainedPipeline("bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame containing the input column(s) this pipeline expects
+val pipeline = new PretrainedPipeline("bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_vllm-gemma2b-llmOversight-0.5-noDropSus_1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bertouch_en.md b/docs/_posts/ahmedlone127/2024-09-21-bertouch_en.md new file mode 100644 index 00000000000000..855295e62e68ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bertouch_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bertouch BertForSequenceClassification from AbderrahmanSkiredj1 +author: John Snow Labs +name: bertouch +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertouch` is a English model originally trained by AbderrahmanSkiredj1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertouch_en_5.5.0_3.0_1726954946471.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertouch_en_5.5.0_3.0_1726954946471.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bertouch","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bertouch", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertouch| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.3 MB| + +## References + +https://huggingface.co/AbderrahmanSkiredj1/BERTouch \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bertouch_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bertouch_pipeline_en.md new file mode 100644 index 00000000000000..0b13191a848571 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bertouch_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bertouch_pipeline pipeline BertForSequenceClassification from AbderrahmanSkiredj1 +author: John Snow Labs +name: bertouch_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertouch_pipeline` is a English model originally trained by AbderrahmanSkiredj1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertouch_pipeline_en_5.5.0_3.0_1726954969814.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertouch_pipeline_en_5.5.0_3.0_1726954969814.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is a Spark DataFrame containing the input column(s) this pipeline expects
+pipeline = PretrainedPipeline("bertouch_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is a Spark DataFrame containing the input column(s) this pipeline expects
+val pipeline = new PretrainedPipeline("bertouch_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertouch_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.4 MB| + +## References + +https://huggingface.co/AbderrahmanSkiredj1/BERTouch + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-berturk_earthquake_tweets_classification_en.md b/docs/_posts/ahmedlone127/2024-09-21-berturk_earthquake_tweets_classification_en.md new file mode 100644 index 00000000000000..dda949b2e134f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-berturk_earthquake_tweets_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English berturk_earthquake_tweets_classification BertForSequenceClassification from yhaslan +author: John Snow Labs +name: berturk_earthquake_tweets_classification +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`berturk_earthquake_tweets_classification` is a English model originally trained by yhaslan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/berturk_earthquake_tweets_classification_en_5.5.0_3.0_1726955822332.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/berturk_earthquake_tweets_classification_en_5.5.0_3.0_1726955822332.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assemble raw text into Spark NLP documents
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# Input columns must match the outputs of the stages above ("document", "token")
+sequenceClassifier = BertForSequenceClassification.pretrained("berturk_earthquake_tweets_classification","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+// Input columns must match the outputs of the stages above ("document", "token")
+val sequenceClassifier = BertForSequenceClassification.pretrained("berturk_earthquake_tweets_classification", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
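+
+To inspect the predictions, one option is to select the `result` field of the `class` column (a sketch using the `pipelineDF` built above):
+
+```python
+# Show each input text next to its predicted label
+pipelineDF.select("text", "class.result").show(truncate=False)
+```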
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|berturk_earthquake_tweets_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|414.5 MB| + +## References + +https://huggingface.co/yhaslan/berturk-earthquake-tweets-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-berturk_earthquake_tweets_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-berturk_earthquake_tweets_classification_pipeline_en.md new file mode 100644 index 00000000000000..1c4e8d31ebdb39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-berturk_earthquake_tweets_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English berturk_earthquake_tweets_classification_pipeline pipeline BertForSequenceClassification from yhaslan +author: John Snow Labs +name: berturk_earthquake_tweets_classification_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`berturk_earthquake_tweets_classification_pipeline` is a English model originally trained by yhaslan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/berturk_earthquake_tweets_classification_pipeline_en_5.5.0_3.0_1726955844805.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/berturk_earthquake_tweets_classification_pipeline_en_5.5.0_3.0_1726955844805.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("berturk_earthquake_tweets_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("berturk_earthquake_tweets_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
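+
+With a Spark NLP session already running, `annotate` is a convenient alternative to `transform` for single strings; a sketch (the Turkish example tweet is made up):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("berturk_earthquake_tweets_classification_pipeline", lang = "en")
+
+# annotate() runs the same stages on one string and returns a dict keyed by output column
+result = pipeline.annotate("Deprem sonrası yardım çalışmaları devam ediyor.")
+print(result["class"])
+```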
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|berturk_earthquake_tweets_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.5 MB| + +## References + +https://huggingface.co/yhaslan/berturk-earthquake-tweets-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bertweet_large_reddit_gab_16000sample_en.md b/docs/_posts/ahmedlone127/2024-09-21-bertweet_large_reddit_gab_16000sample_en.md new file mode 100644 index 00000000000000..282095292321eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bertweet_large_reddit_gab_16000sample_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bertweet_large_reddit_gab_16000sample RoBertaEmbeddings from HPL +author: John Snow Labs +name: bertweet_large_reddit_gab_16000sample +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertweet_large_reddit_gab_16000sample` is a English model originally trained by HPL. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertweet_large_reddit_gab_16000sample_en_5.5.0_3.0_1726957849707.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertweet_large_reddit_gab_16000sample_en_5.5.0_3.0_1726957849707.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("bertweet_large_reddit_gab_16000sample","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("bertweet_large_reddit_gab_16000sample","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
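+
+The embeddings are returned as one annotation per token in the `embeddings` column. A sketch for pulling out the raw vectors, relying on the standard Spark NLP annotation schema (`result` holds the token text, `embeddings` the float vector):
+
+```python
+from pyspark.sql.functions import explode, col
+
+# One row per token with its embedding vector
+pipelineDF.select(explode(col("embeddings")).alias("tok")) \
+    .select("tok.result", "tok.embeddings") \
+    .show(truncate=False)
+```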
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertweet_large_reddit_gab_16000sample| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/HPL/bertweet-large-reddit-gab-16000sample \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bertweet_large_reddit_gab_16000sample_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bertweet_large_reddit_gab_16000sample_pipeline_en.md new file mode 100644 index 00000000000000..0e52e5d2650367 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bertweet_large_reddit_gab_16000sample_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bertweet_large_reddit_gab_16000sample_pipeline pipeline RoBertaEmbeddings from HPL +author: John Snow Labs +name: bertweet_large_reddit_gab_16000sample_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertweet_large_reddit_gab_16000sample_pipeline` is a English model originally trained by HPL. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertweet_large_reddit_gab_16000sample_pipeline_en_5.5.0_3.0_1726957912393.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertweet_large_reddit_gab_16000sample_pipeline_en_5.5.0_3.0_1726957912393.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bertweet_large_reddit_gab_16000sample_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bertweet_large_reddit_gab_16000sample_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
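+
+At roughly 1.3 GB, this pipeline benefits from a generously sized driver. If your Spark NLP version exposes a `memory` argument on `sparknlp.start()`, a possible setup looks like this (the value is only a suggestion):
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+# Give the driver more heap before downloading the large pipeline
+spark = sparknlp.start(memory="16G")
+
+pipeline = PretrainedPipeline("bertweet_large_reddit_gab_16000sample_pipeline", lang = "en")
+```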
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertweet_large_reddit_gab_16000sample_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/HPL/bertweet-large-reddit-gab-16000sample + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-brwac_v1_2__checkpoint_8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-brwac_v1_2__checkpoint_8_pipeline_en.md new file mode 100644 index 00000000000000..82dddfa545aeb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-brwac_v1_2__checkpoint_8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English brwac_v1_2__checkpoint_8_pipeline pipeline RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: brwac_v1_2__checkpoint_8_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`brwac_v1_2__checkpoint_8_pipeline` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/brwac_v1_2__checkpoint_8_pipeline_en_5.5.0_3.0_1726942690545.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/brwac_v1_2__checkpoint_8_pipeline_en_5.5.0_3.0_1726942690545.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("brwac_v1_2__checkpoint_8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("brwac_v1_2__checkpoint_8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|brwac_v1_2__checkpoint_8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|298.4 MB| + +## References + +https://huggingface.co/eduagarcia-temp/brwac_v1_2__checkpoint_8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-brwac_v1_3__checkpoint_last_en.md b/docs/_posts/ahmedlone127/2024-09-21-brwac_v1_3__checkpoint_last_en.md new file mode 100644 index 00000000000000..cb14f083793de3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-brwac_v1_3__checkpoint_last_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English brwac_v1_3__checkpoint_last RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: brwac_v1_3__checkpoint_last +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`brwac_v1_3__checkpoint_last` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/brwac_v1_3__checkpoint_last_en_5.5.0_3.0_1726943523628.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/brwac_v1_3__checkpoint_last_en_5.5.0_3.0_1726943523628.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("brwac_v1_3__checkpoint_last","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("brwac_v1_3__checkpoint_last","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
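+
+If downstream Spark ML stages need plain vectors rather than annotation structs, an `EmbeddingsFinisher` can be appended to the pipeline defined above (a sketch reusing the stages from the example; not part of the original card):
+
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import EmbeddingsFinisher
+
+# Convert the annotation structs into Spark ML vectors
+embeddingsFinisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, embeddingsFinisher])
+```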
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|brwac_v1_3__checkpoint_last| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|298.6 MB| + +## References + +https://huggingface.co/eduagarcia-temp/brwac_v1_3__checkpoint_last \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-brwac_v1_3__checkpoint_last_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-brwac_v1_3__checkpoint_last_pipeline_en.md new file mode 100644 index 00000000000000..e2ed9d6b4ec186 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-brwac_v1_3__checkpoint_last_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English brwac_v1_3__checkpoint_last_pipeline pipeline RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: brwac_v1_3__checkpoint_last_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`brwac_v1_3__checkpoint_last_pipeline` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/brwac_v1_3__checkpoint_last_pipeline_en_5.5.0_3.0_1726943612362.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/brwac_v1_3__checkpoint_last_pipeline_en_5.5.0_3.0_1726943612362.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("brwac_v1_3__checkpoint_last_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("brwac_v1_3__checkpoint_last_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
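+
+To confirm which stages the downloaded pipeline contains (they should match the Included Models list below), the wrapped Spark ML `PipelineModel` can be inspected; the `model` attribute below assumes the Python `PretrainedPipeline` wrapper:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Assumes a Spark session started via sparknlp.start()
+pipeline = PretrainedPipeline("brwac_v1_3__checkpoint_last_pipeline", lang = "en")
+
+for stage in pipeline.model.stages:
+    print(type(stage).__name__)
+```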
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|brwac_v1_3__checkpoint_last_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|298.6 MB| + +## References + +https://huggingface.co/eduagarcia-temp/brwac_v1_3__checkpoint_last + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-buntesgelaber_en.md b/docs/_posts/ahmedlone127/2024-09-21-buntesgelaber_en.md new file mode 100644 index 00000000000000..0ed58ee40c125d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-buntesgelaber_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English buntesgelaber RoBertaEmbeddings from Janst1000 +author: John Snow Labs +name: buntesgelaber +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`buntesgelaber` is a English model originally trained by Janst1000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/buntesgelaber_en_5.5.0_3.0_1726934185614.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/buntesgelaber_en_5.5.0_3.0_1726934185614.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("buntesgelaber","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("buntesgelaber","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|buntesgelaber| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|312.0 MB| + +## References + +https://huggingface.co/Janst1000/buntesgelaber \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-buntesgelaber_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-buntesgelaber_pipeline_en.md new file mode 100644 index 00000000000000..eea55926ce3a3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-buntesgelaber_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English buntesgelaber_pipeline pipeline RoBertaEmbeddings from Janst1000 +author: John Snow Labs +name: buntesgelaber_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`buntesgelaber_pipeline` is a English model originally trained by Janst1000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/buntesgelaber_pipeline_en_5.5.0_3.0_1726934199700.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/buntesgelaber_pipeline_en_5.5.0_3.0_1726934199700.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("buntesgelaber_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("buntesgelaber_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|buntesgelaber_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|312.0 MB| + +## References + +https://huggingface.co/Janst1000/buntesgelaber + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_chunwoolee0_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_chunwoolee0_en.md new file mode 100644 index 00000000000000..d8d67f092026fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_chunwoolee0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_chunwoolee0 RoBertaEmbeddings from chunwoolee0 +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_chunwoolee0 +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_chunwoolee0` is a English model originally trained by chunwoolee0. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_chunwoolee0_en_5.5.0_3.0_1726957673399.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_chunwoolee0_en_5.5.0_3.0_1726957673399.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_chunwoolee0","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_chunwoolee0","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
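+
+For quick local experiments without building a DataFrame, the fitted `pipelineModel` from the example above can be wrapped in a `LightPipeline` (a sketch; the sentence is illustrative):
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+annotations = light.fullAnnotate("Spark NLP ships hundreds of pretrained embeddings.")[0]
+
+# One embedding annotation per token
+print(len(annotations["embeddings"]))
+```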
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_chunwoolee0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/chunwoolee0/my_awesome_eli5_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_chunwoolee0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_chunwoolee0_pipeline_en.md new file mode 100644 index 00000000000000..a06b26ba5ec9eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_chunwoolee0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_chunwoolee0_pipeline pipeline RoBertaEmbeddings from chunwoolee0 +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_chunwoolee0_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_chunwoolee0_pipeline` is a English model originally trained by chunwoolee0. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_chunwoolee0_pipeline_en_5.5.0_3.0_1726957688046.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_chunwoolee0_pipeline_en_5.5.0_3.0_1726957688046.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_chunwoolee0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_chunwoolee0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_chunwoolee0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/chunwoolee0/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_confunius_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_confunius_en.md new file mode 100644 index 00000000000000..5a4994a87f624b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_confunius_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_confunius RoBertaEmbeddings from confunius +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_confunius +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_confunius` is a English model originally trained by confunius. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_confunius_en_5.5.0_3.0_1726934698843.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_confunius_en_5.5.0_3.0_1726934698843.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_confunius","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_confunius","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_confunius| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/confunius/my_awesome_eli5_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_confunius_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_confunius_pipeline_en.md new file mode 100644 index 00000000000000..3138cde892a874 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_confunius_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_confunius_pipeline pipeline RoBertaEmbeddings from confunius +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_confunius_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_confunius_pipeline` is a English model originally trained by confunius. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_confunius_pipeline_en_5.5.0_3.0_1726934712999.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_confunius_pipeline_en_5.5.0_3.0_1726934712999.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_confunius_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_confunius_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_confunius_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/confunius/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_jaiiiiii_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_jaiiiiii_en.md new file mode 100644 index 00000000000000..6ea4faef930779 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_jaiiiiii_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_jaiiiiii RoBertaEmbeddings from Jaiiiiii +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_jaiiiiii +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_jaiiiiii` is a English model originally trained by Jaiiiiii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_jaiiiiii_en_5.5.0_3.0_1726943731472.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_jaiiiiii_en_5.5.0_3.0_1726943731472.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_jaiiiiii","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_jaiiiiii","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_jaiiiiii| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.3 MB| + +## References + +https://huggingface.co/Jaiiiiii/my_awesome_eli5_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_jaiiiiii_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_jaiiiiii_pipeline_en.md new file mode 100644 index 00000000000000..9aca11f509c482 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_jaiiiiii_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_jaiiiiii_pipeline pipeline RoBertaEmbeddings from Jaiiiiii +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_jaiiiiii_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_jaiiiiii_pipeline` is a English model originally trained by Jaiiiiii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_jaiiiiii_pipeline_en_5.5.0_3.0_1726943746270.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_jaiiiiii_pipeline_en_5.5.0_3.0_1726943746270.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_jaiiiiii_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_jaiiiiii_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_jaiiiiii_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.3 MB| + +## References + +https://huggingface.co/Jaiiiiii/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_skotha_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_skotha_en.md new file mode 100644 index 00000000000000..1400ee9d44a980 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_skotha_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_skotha RoBertaEmbeddings from skotha +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_skotha +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_skotha` is a English model originally trained by skotha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_skotha_en_5.5.0_3.0_1726934226103.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_skotha_en_5.5.0_3.0_1726934226103.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_skotha","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_skotha","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_skotha| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/skotha/my_awesome_eli5_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_skotha_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_skotha_pipeline_en.md new file mode 100644 index 00000000000000..819513a34f05ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_skotha_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_skotha_pipeline pipeline RoBertaEmbeddings from skotha +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_skotha_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_skotha_pipeline` is a English model originally trained by skotha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_skotha_pipeline_en_5.5.0_3.0_1726934240200.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_skotha_pipeline_en_5.5.0_3.0_1726934240200.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_skotha_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_skotha_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_skotha_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/skotha/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_vulture_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_vulture_en.md new file mode 100644 index 00000000000000..d8332a37060d06 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_vulture_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_vulture RoBertaEmbeddings from vulture +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_vulture +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_vulture` is a English model originally trained by vulture. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_vulture_en_5.5.0_3.0_1726934244530.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_vulture_en_5.5.0_3.0_1726934244530.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_vulture","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_vulture","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_vulture| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/vulture/my_awesome_eli5_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_vulture_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_vulture_pipeline_en.md new file mode 100644 index 00000000000000..4dd638a385eeb2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_vulture_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_vulture_pipeline pipeline RoBertaEmbeddings from vulture +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_vulture_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_vulture_pipeline` is a English model originally trained by vulture. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_vulture_pipeline_en_5.5.0_3.0_1726934258765.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_vulture_pipeline_en_5.5.0_3.0_1726934258765.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_vulture_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_vulture_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_vulture_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/vulture/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model2_amv146_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model2_amv146_en.md new file mode 100644 index 00000000000000..5fa2dd9591ed15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model2_amv146_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model2_amv146 DistilBertForSequenceClassification from amv146 +author: John Snow Labs +name: burmese_awesome_model2_amv146 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model2_amv146` is a English model originally trained by amv146. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model2_amv146_en_5.5.0_3.0_1726888930652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model2_amv146_en_5.5.0_3.0_1726888930652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assemble raw text into Spark NLP documents
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# Input columns must match the outputs of the stages above ("document", "token")
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model2_amv146","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+// Input columns must match the outputs of the stages above ("document", "token")
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model2_amv146", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
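+
+Each `class` annotation also carries per-label scores in its metadata, which can be useful for thresholding; a sketch of surfacing them from the `pipelineDF` built above:
+
+```python
+from pyspark.sql.functions import explode, col
+
+pipelineDF.select(explode(col("class")).alias("prediction")) \
+    .select("prediction.result", "prediction.metadata") \
+    .show(truncate=False)
+```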
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model2_amv146| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/amv146/my_awesome_model2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model2_amv146_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model2_amv146_pipeline_en.md new file mode 100644 index 00000000000000..78baf0cf09d1b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model2_amv146_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model2_amv146_pipeline pipeline DistilBertForSequenceClassification from amv146 +author: John Snow Labs +name: burmese_awesome_model2_amv146_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model2_amv146_pipeline` is a English model originally trained by amv146. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model2_amv146_pipeline_en_5.5.0_3.0_1726888943028.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model2_amv146_pipeline_en_5.5.0_3.0_1726888943028.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model2_amv146_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model2_amv146_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model2_amv146_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/amv146/my_awesome_model2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_5cean_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_5cean_en.md new file mode 100644 index 00000000000000..13017edb7b16d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_5cean_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_5cean DistilBertForSequenceClassification from 5cean +author: John Snow Labs +name: burmese_awesome_model_5cean +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_5cean` is a English model originally trained by 5cean. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_5cean_en_5.5.0_3.0_1726953461564.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_5cean_en_5.5.0_3.0_1726953461564.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_5cean","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_5cean", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
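+
+To inspect the predictions produced by the example above, the `class` output column can be unpacked into plain label strings. A short follow-up sketch that reuses the `pipelineDF` DataFrame defined above:
+
+```python
+# Each row of "class" holds Spark NLP annotations; ".result" exposes the predicted labels.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```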
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_5cean| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/5cean/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_5cean_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_5cean_pipeline_en.md new file mode 100644 index 00000000000000..eef05a426534d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_5cean_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_5cean_pipeline pipeline DistilBertForSequenceClassification from 5cean +author: John Snow Labs +name: burmese_awesome_model_5cean_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_5cean_pipeline` is a English model originally trained by 5cean. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_5cean_pipeline_en_5.5.0_3.0_1726953474246.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_5cean_pipeline_en_5.5.0_3.0_1726953474246.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("burmese_awesome_model_5cean_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("burmese_awesome_model_5cean_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_5cean_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/5cean/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_catbult_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_catbult_en.md new file mode 100644 index 00000000000000..6666a849f40bf9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_catbult_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_catbult DistilBertForSequenceClassification from catbult +author: John Snow Labs +name: burmese_awesome_model_catbult +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_catbult` is a English model originally trained by catbult. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_catbult_en_5.5.0_3.0_1726953276609.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_catbult_en_5.5.0_3.0_1726953276609.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_catbult","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_catbult", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_catbult| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/catbult/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_catbult_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_catbult_pipeline_en.md new file mode 100644 index 00000000000000..6f1b7ee0182461 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_catbult_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_catbult_pipeline pipeline DistilBertForSequenceClassification from catbult +author: John Snow Labs +name: burmese_awesome_model_catbult_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_catbult_pipeline` is a English model originally trained by catbult. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_catbult_pipeline_en_5.5.0_3.0_1726953289116.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_catbult_pipeline_en_5.5.0_3.0_1726953289116.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("burmese_awesome_model_catbult_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("burmese_awesome_model_catbult_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_catbult_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/catbult/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_ilanpar_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_ilanpar_en.md new file mode 100644 index 00000000000000..d817070c069c5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_ilanpar_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_ilanpar DistilBertForSequenceClassification from ilanPar +author: John Snow Labs +name: burmese_awesome_model_ilanpar +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_ilanpar` is a English model originally trained by ilanPar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ilanpar_en_5.5.0_3.0_1726953059247.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ilanpar_en_5.5.0_3.0_1726953059247.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_ilanpar","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_ilanpar", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_ilanpar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ilanPar/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_ilanpar_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_ilanpar_pipeline_en.md new file mode 100644 index 00000000000000..554440c2c25a73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_ilanpar_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_ilanpar_pipeline pipeline DistilBertForSequenceClassification from ilanPar +author: John Snow Labs +name: burmese_awesome_model_ilanpar_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_ilanpar_pipeline` is a English model originally trained by ilanPar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ilanpar_pipeline_en_5.5.0_3.0_1726953071581.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ilanpar_pipeline_en_5.5.0_3.0_1726953071581.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("burmese_awesome_model_ilanpar_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("burmese_awesome_model_ilanpar_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_ilanpar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ilanPar/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_maggiezhang_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_maggiezhang_en.md new file mode 100644 index 00000000000000..aed51cd747707a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_maggiezhang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_maggiezhang DistilBertForSequenceClassification from MaggieZhang +author: John Snow Labs +name: burmese_awesome_model_maggiezhang +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_maggiezhang` is a English model originally trained by MaggieZhang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_maggiezhang_en_5.5.0_3.0_1726884550680.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_maggiezhang_en_5.5.0_3.0_1726884550680.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_maggiezhang","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_maggiezhang", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_maggiezhang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MaggieZhang/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_maggiezhang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_maggiezhang_pipeline_en.md new file mode 100644 index 00000000000000..f3982dfd5148f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_maggiezhang_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_maggiezhang_pipeline pipeline DistilBertForSequenceClassification from MaggieZhang +author: John Snow Labs +name: burmese_awesome_model_maggiezhang_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_maggiezhang_pipeline` is a English model originally trained by MaggieZhang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_maggiezhang_pipeline_en_5.5.0_3.0_1726884562560.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_maggiezhang_pipeline_en_5.5.0_3.0_1726884562560.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("burmese_awesome_model_maggiezhang_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("burmese_awesome_model_maggiezhang_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_maggiezhang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MaggieZhang/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_riaraju_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_riaraju_en.md new file mode 100644 index 00000000000000..246264f1c82364 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_riaraju_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_riaraju DistilBertForSequenceClassification from riaraju +author: John Snow Labs +name: burmese_awesome_model_riaraju +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_riaraju` is a English model originally trained by riaraju. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_riaraju_en_5.5.0_3.0_1726884936396.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_riaraju_en_5.5.0_3.0_1726884936396.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_riaraju","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_riaraju", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_riaraju| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/riaraju/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_riaraju_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_riaraju_pipeline_en.md new file mode 100644 index 00000000000000..5e2b82e6f64e8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_riaraju_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_riaraju_pipeline pipeline DistilBertForSequenceClassification from riaraju +author: John Snow Labs +name: burmese_awesome_model_riaraju_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_riaraju_pipeline` is a English model originally trained by riaraju. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_riaraju_pipeline_en_5.5.0_3.0_1726884947900.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_riaraju_pipeline_en_5.5.0_3.0_1726884947900.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("burmese_awesome_model_riaraju_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("burmese_awesome_model_riaraju_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_riaraju_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/riaraju/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_rozzacreat_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_rozzacreat_en.md new file mode 100644 index 00000000000000..f72186a920d5e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_rozzacreat_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_rozzacreat DistilBertForSequenceClassification from RozzaCreat +author: John Snow Labs +name: burmese_awesome_model_rozzacreat +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_rozzacreat` is a English model originally trained by RozzaCreat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_rozzacreat_en_5.5.0_3.0_1726884740846.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_rozzacreat_en_5.5.0_3.0_1726884740846.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_rozzacreat","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_rozzacreat", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_rozzacreat| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/RozzaCreat/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_rozzacreat_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_rozzacreat_pipeline_en.md new file mode 100644 index 00000000000000..69417c88ca137a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_rozzacreat_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_rozzacreat_pipeline pipeline DistilBertForSequenceClassification from RozzaCreat +author: John Snow Labs +name: burmese_awesome_model_rozzacreat_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_rozzacreat_pipeline` is a English model originally trained by RozzaCreat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_rozzacreat_pipeline_en_5.5.0_3.0_1726884753556.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_rozzacreat_pipeline_en_5.5.0_3.0_1726884753556.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("burmese_awesome_model_rozzacreat_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("burmese_awesome_model_rozzacreat_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_rozzacreat_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/RozzaCreat/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_tjspross_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_tjspross_en.md new file mode 100644 index 00000000000000..5f3459d772a4cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_tjspross_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_tjspross DistilBertForSequenceClassification from tjspross +author: John Snow Labs +name: burmese_awesome_model_tjspross +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_tjspross` is a English model originally trained by tjspross. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_tjspross_en_5.5.0_3.0_1726953106447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_tjspross_en_5.5.0_3.0_1726953106447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_tjspross","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_tjspross", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_tjspross| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tjspross/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_tjspross_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_tjspross_pipeline_en.md new file mode 100644 index 00000000000000..22dbf07d64765e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_tjspross_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_tjspross_pipeline pipeline DistilBertForSequenceClassification from tjspross +author: John Snow Labs +name: burmese_awesome_model_tjspross_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_tjspross_pipeline` is a English model originally trained by tjspross. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_tjspross_pipeline_en_5.5.0_3.0_1726953118519.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_tjspross_pipeline_en_5.5.0_3.0_1726953118519.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("burmese_awesome_model_tjspross_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("burmese_awesome_model_tjspross_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_tjspross_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tjspross/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_tomchristensen474_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_tomchristensen474_en.md new file mode 100644 index 00000000000000..2607c1ad9f62a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_tomchristensen474_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_tomchristensen474 DistilBertForSequenceClassification from TomChristensen474 +author: John Snow Labs +name: burmese_awesome_model_tomchristensen474 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_tomchristensen474` is a English model originally trained by TomChristensen474. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_tomchristensen474_en_5.5.0_3.0_1726924004582.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_tomchristensen474_en_5.5.0_3.0_1726924004582.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_tomchristensen474","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_tomchristensen474", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_tomchristensen474| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/TomChristensen474/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_tomchristensen474_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_tomchristensen474_pipeline_en.md new file mode 100644 index 00000000000000..bf8798cd9a8a32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_tomchristensen474_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_tomchristensen474_pipeline pipeline DistilBertForSequenceClassification from TomChristensen474 +author: John Snow Labs +name: burmese_awesome_model_tomchristensen474_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_tomchristensen474_pipeline` is a English model originally trained by TomChristensen474. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_tomchristensen474_pipeline_en_5.5.0_3.0_1726924016705.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_tomchristensen474_pipeline_en_5.5.0_3.0_1726924016705.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("burmese_awesome_model_tomchristensen474_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("burmese_awesome_model_tomchristensen474_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_tomchristensen474_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/TomChristensen474/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_bert_question_answering_model3_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_bert_question_answering_model3_en.md new file mode 100644 index 00000000000000..2b7198904ba657 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_bert_question_answering_model3_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_bert_question_answering_model3 BertForQuestionAnswering from Ashkh0099 +author: John Snow Labs +name: burmese_bert_question_answering_model3 +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_bert_question_answering_model3` is a English model originally trained by Ashkh0099. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_bert_question_answering_model3_en_5.5.0_3.0_1726921735109.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_bert_question_answering_model3_en_5.5.0_3.0_1726921735109.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("burmese_bert_question_answering_model3","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("burmese_bert_question_answering_model3", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
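+
+To read the extracted answer span from the example above, the `answer` output column can be unpacked. A short follow-up sketch that reuses the `pipelineDF` DataFrame defined above:
+
+```python
+# "answer.result" holds the predicted answer strings for each question/context pair.
+pipelineDF.select("answer.result").show(truncate=False)
+```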
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_bert_question_answering_model3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Ashkh0099/my-bert-question-answering-model3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_bert_question_answering_model3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_bert_question_answering_model3_pipeline_en.md new file mode 100644 index 00000000000000..12047dbafa9de1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_bert_question_answering_model3_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_bert_question_answering_model3_pipeline pipeline BertForQuestionAnswering from Ashkh0099 +author: John Snow Labs +name: burmese_bert_question_answering_model3_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_bert_question_answering_model3_pipeline` is a English model originally trained by Ashkh0099. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_bert_question_answering_model3_pipeline_en_5.5.0_3.0_1726921754492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_bert_question_answering_model3_pipeline_en_5.5.0_3.0_1726921754492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to hold the question/context columns expected by the pipeline
+pipeline = PretrainedPipeline("burmese_bert_question_answering_model3_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to hold the question/context columns expected by the pipeline
+val pipeline = new PretrainedPipeline("burmese_bert_question_answering_model3_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_bert_question_answering_model3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Ashkh0099/my-bert-question-answering-model3 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_distilbert_imdb_model_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_distilbert_imdb_model_en.md new file mode 100644 index 00000000000000..10c517060cde58 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_distilbert_imdb_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_distilbert_imdb_model DistilBertForSequenceClassification from chaseme +author: John Snow Labs +name: burmese_distilbert_imdb_model +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_distilbert_imdb_model` is a English model originally trained by chaseme. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_distilbert_imdb_model_en_5.5.0_3.0_1726923712543.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_distilbert_imdb_model_en_5.5.0_3.0_1726923712543.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_distilbert_imdb_model","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_distilbert_imdb_model", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_distilbert_imdb_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chaseme/my_distilbert_imdb_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_distilbert_imdb_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_distilbert_imdb_model_pipeline_en.md new file mode 100644 index 00000000000000..cc9a33e7124b06 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_distilbert_imdb_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_distilbert_imdb_model_pipeline pipeline DistilBertForSequenceClassification from chaseme +author: John Snow Labs +name: burmese_distilbert_imdb_model_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_distilbert_imdb_model_pipeline` is a English model originally trained by chaseme. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_distilbert_imdb_model_pipeline_en_5.5.0_3.0_1726923727382.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_distilbert_imdb_model_pipeline_en_5.5.0_3.0_1726923727382.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("burmese_distilbert_imdb_model_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("burmese_distilbert_imdb_model_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_distilbert_imdb_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chaseme/my_distilbert_imdb_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-cagbert_base_fl32_checkpoint_15852_de.md b/docs/_posts/ahmedlone127/2024-09-21-cagbert_base_fl32_checkpoint_15852_de.md new file mode 100644 index 00000000000000..dc66d32216a527 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-cagbert_base_fl32_checkpoint_15852_de.md @@ -0,0 +1,94 @@ +--- +layout: model +title: German cagbert_base_fl32_checkpoint_15852 BertForTokenClassification from MSey +author: John Snow Labs +name: cagbert_base_fl32_checkpoint_15852 +date: 2024-09-21 +tags: [de, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cagbert_base_fl32_checkpoint_15852` is a German model originally trained by MSey. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cagbert_base_fl32_checkpoint_15852_de_5.5.0_3.0_1726890042532.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cagbert_base_fl32_checkpoint_15852_de_5.5.0_3.0_1726890042532.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("cagbert_base_fl32_checkpoint_15852","de") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("cagbert_base_fl32_checkpoint_15852", "de")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
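+
+To group the per-token tags into entity chunks, Spark NLP's `NerConverter` can be appended to the pipeline. A minimal sketch, not part of the original card, which assumes the stages defined above are reused unchanged:
+
+```python
+from sparknlp.annotator import NerConverter
+
+# Groups consecutive B-/I- tags from the "ner" column into entity chunks.
+converter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, converter])
+result = pipeline.fit(data).transform(data)
+result.select("ner_chunk.result").show(truncate=False)
+```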
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cagbert_base_fl32_checkpoint_15852| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|409.8 MB| + +## References + +https://huggingface.co/MSey/CaGBERT-base_fl32_checkpoint-15852 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-cagbert_base_fl32_checkpoint_15852_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-21-cagbert_base_fl32_checkpoint_15852_pipeline_de.md new file mode 100644 index 00000000000000..f148d68e3ae028 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-cagbert_base_fl32_checkpoint_15852_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German cagbert_base_fl32_checkpoint_15852_pipeline pipeline BertForTokenClassification from MSey +author: John Snow Labs +name: cagbert_base_fl32_checkpoint_15852_pipeline +date: 2024-09-21 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cagbert_base_fl32_checkpoint_15852_pipeline` is a German model originally trained by MSey. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cagbert_base_fl32_checkpoint_15852_pipeline_de_5.5.0_3.0_1726890061211.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cagbert_base_fl32_checkpoint_15852_pipeline_de_5.5.0_3.0_1726890061211.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cagbert_base_fl32_checkpoint_15852_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cagbert_base_fl32_checkpoint_15852_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cagbert_base_fl32_checkpoint_15852_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|409.8 MB| + +## References + +https://huggingface.co/MSey/CaGBERT-base_fl32_checkpoint-15852 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-camembert_base_fr.md b/docs/_posts/ahmedlone127/2024-09-21-camembert_base_fr.md new file mode 100644 index 00000000000000..44b5b6e3ba6dc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-camembert_base_fr.md @@ -0,0 +1,92 @@ +--- +layout: model +title: CamemBERT Base Model +author: John Snow Labs +name: camembert_base +date: 2024-09-21 +tags: [fr, french, embeddings, camembert, base, open_source, onnx, openvino] +task: Embeddings +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: openvino +annotator: CamemBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +[CamemBERT](https://arxiv.org/abs/1911.03894) is a state-of-the-art language model for French based on the RoBERTa model. +For further information or requests, please go to [Camembert Website](https://camembert-model.fr/) + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/camembert_base_fr_5.5.0_3.0_1726906183343.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/camembert_base_fr_5.5.0_3.0_1726906183343.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +embeddings = CamemBertEmbeddings.pretrained("camembert_base", "fr") \ +.setInputCols("sentence", "token") \ +.setOutputCol("embeddings") +``` +```scala +val embeddings = CamemBertEmbeddings.pretrained("camembert_base", "fr") +.setInputCols("sentence", "token") +.setOutputCol("embeddings") +``` + +{:.nlu-block} +```python +import nlu +nlu.load("fr.embed.camembert_base").predict("""Put your text here.""") +``` +
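+
+The snippet above shows only the embedding annotator; it expects `sentence` and `token` columns from upstream stages. A minimal sketch of a complete pipeline that supplies them might look like this (the upstream annotators, the example sentence, and the final projection are illustrative assumptions, not part of the original card):
+
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import SentenceDetector, Tokenizer, CamemBertEmbeddings
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetector() \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("token")
+
+embeddings = CamemBertEmbeddings.pretrained("camembert_base", "fr") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDetector, tokenizer, embeddings])
+data = spark.createDataFrame([["J'aime Spark NLP."]]).toDF("text")
+result = pipeline.fit(data).transform(data)
+
+# One embedding annotation per token; each carries its vector in the "embeddings" field
+result.selectExpr("explode(embeddings.embeddings) as vector").show(1)
+```
+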
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|camembert_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[embeddings]| +|Language:|fr| +|Size:|263.6 MB| +|Case sensitive:|true| +|Max sentence length:|512| + +## References + +https://huggingface.co/almanach/camembert-base + +## Benchmarking + +```bash + + +| Model | #params | Arch. | Training data | +|--------------------------------|--------------------------------|-------|-----------------------------------| +| `camembert-base` | 110M | Base | OSCAR (138 GB of text) | +| `camembert/camembert-large` | 335M | Large | CCNet (135 GB of text) | +| `camembert/camembert-base-ccnet` | 110M | Base | CCNet (135 GB of text) | +| `camembert/camembert-base-wikipedia-4gb` | 110M | Base | Wikipedia (4 GB of text) | +| `camembert/camembert-base-oscar-4gb` | 110M | Base | Subsample of OSCAR (4 GB of text) | +| `camembert/camembert-base-ccnet-4gb` | 110M | Base | Subsample of CCNet (4 GB of text) | +``` \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-case_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-21-case_classifier_en.md new file mode 100644 index 00000000000000..d12f7fac092f2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-case_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English case_classifier DistilBertForSequenceClassification from LahiruProjects +author: John Snow Labs +name: case_classifier +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`case_classifier` is a English model originally trained by LahiruProjects. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/case_classifier_en_5.5.0_3.0_1726888826809.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/case_classifier_en_5.5.0_3.0_1726888826809.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("case_classifier","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("case_classifier", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
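+
+Once the pipeline has run, the predicted label for each input row sits in the `class` column as an annotation. A short illustrative projection (not part of the original card):
+
+```python
+# Show each input text next to its predicted label(s)
+pipelineDF.select("text", "class.result").show(truncate=False)
+```
+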
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|case_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LahiruProjects/case-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-case_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-case_classifier_pipeline_en.md new file mode 100644 index 00000000000000..eb51ea8582eceb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-case_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English case_classifier_pipeline pipeline DistilBertForSequenceClassification from LahiruProjects +author: John Snow Labs +name: case_classifier_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`case_classifier_pipeline` is a English model originally trained by LahiruProjects. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/case_classifier_pipeline_en_5.5.0_3.0_1726888838380.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/case_classifier_pipeline_en_5.5.0_3.0_1726888838380.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("case_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("case_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|case_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LahiruProjects/case-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-centerpartisan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-centerpartisan_pipeline_en.md new file mode 100644 index 00000000000000..6f41ab3c4c6550 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-centerpartisan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English centerpartisan_pipeline pipeline DistilBertForSequenceClassification from spencerh +author: John Snow Labs +name: centerpartisan_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`centerpartisan_pipeline` is a English model originally trained by spencerh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/centerpartisan_pipeline_en_5.5.0_3.0_1726884480148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/centerpartisan_pipeline_en_5.5.0_3.0_1726884480148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("centerpartisan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("centerpartisan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|centerpartisan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/spencerh/centerpartisan + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-clas_0_en.md b/docs/_posts/ahmedlone127/2024-09-21-clas_0_en.md new file mode 100644 index 00000000000000..c3eaf25b172387 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-clas_0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English clas_0 RoBertaForSequenceClassification from BaronSch +author: John Snow Labs +name: clas_0 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clas_0` is a English model originally trained by BaronSch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clas_0_en_5.5.0_3.0_1726900777696.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clas_0_en_5.5.0_3.0_1726900777696.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("clas_0","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("clas_0", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clas_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/BaronSch/Clas_0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-clas_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-clas_0_pipeline_en.md new file mode 100644 index 00000000000000..843ae3f19bbe31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-clas_0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English clas_0_pipeline pipeline RoBertaForSequenceClassification from BaronSch +author: John Snow Labs +name: clas_0_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clas_0_pipeline` is a English model originally trained by BaronSch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clas_0_pipeline_en_5.5.0_3.0_1726900799685.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clas_0_pipeline_en_5.5.0_3.0_1726900799685.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("clas_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("clas_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clas_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/BaronSch/Clas_0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-clasificadorcorreosoportedistilespanol_dataser_en.md b/docs/_posts/ahmedlone127/2024-09-21-clasificadorcorreosoportedistilespanol_dataser_en.md new file mode 100644 index 00000000000000..6a92a67e6b2a6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-clasificadorcorreosoportedistilespanol_dataser_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English clasificadorcorreosoportedistilespanol_dataser DistilBertForSequenceClassification from Arodrigo +author: John Snow Labs +name: clasificadorcorreosoportedistilespanol_dataser +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clasificadorcorreosoportedistilespanol_dataser` is a English model originally trained by Arodrigo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clasificadorcorreosoportedistilespanol_dataser_en_5.5.0_3.0_1726884564724.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clasificadorcorreosoportedistilespanol_dataser_en_5.5.0_3.0_1726884564724.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("clasificadorcorreosoportedistilespanol_dataser","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("clasificadorcorreosoportedistilespanol_dataser", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clasificadorcorreosoportedistilespanol_dataser| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|252.4 MB| + +## References + +https://huggingface.co/Arodrigo/ClasificadorCorreoSoporteDistilEspanol-dataser \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-clasificadorcorreosoportedistilespanol_dataser_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-clasificadorcorreosoportedistilespanol_dataser_pipeline_en.md new file mode 100644 index 00000000000000..3afda0e7ba90c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-clasificadorcorreosoportedistilespanol_dataser_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English clasificadorcorreosoportedistilespanol_dataser_pipeline pipeline DistilBertForSequenceClassification from Arodrigo +author: John Snow Labs +name: clasificadorcorreosoportedistilespanol_dataser_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clasificadorcorreosoportedistilespanol_dataser_pipeline` is a English model originally trained by Arodrigo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clasificadorcorreosoportedistilespanol_dataser_pipeline_en_5.5.0_3.0_1726884577971.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clasificadorcorreosoportedistilespanol_dataser_pipeline_en_5.5.0_3.0_1726884577971.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("clasificadorcorreosoportedistilespanol_dataser_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("clasificadorcorreosoportedistilespanol_dataser_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clasificadorcorreosoportedistilespanol_dataser_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|252.5 MB| + +## References + +https://huggingface.co/Arodrigo/ClasificadorCorreoSoporteDistilEspanol-dataser + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-climate_percentage_regression_en.md b/docs/_posts/ahmedlone127/2024-09-21-climate_percentage_regression_en.md new file mode 100644 index 00000000000000..7fce82be6fb9a3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-climate_percentage_regression_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English climate_percentage_regression BertForSequenceClassification from alex-miller +author: John Snow Labs +name: climate_percentage_regression +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`climate_percentage_regression` is a English model originally trained by alex-miller. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/climate_percentage_regression_en_5.5.0_3.0_1726925930088.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/climate_percentage_regression_en_5.5.0_3.0_1726925930088.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("climate_percentage_regression","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("climate_percentage_regression", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|climate_percentage_regression| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|630.9 MB| + +## References + +https://huggingface.co/alex-miller/climate-percentage-regression \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-climate_percentage_regression_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-climate_percentage_regression_pipeline_en.md new file mode 100644 index 00000000000000..a42e872cd3c625 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-climate_percentage_regression_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English climate_percentage_regression_pipeline pipeline BertForSequenceClassification from alex-miller +author: John Snow Labs +name: climate_percentage_regression_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`climate_percentage_regression_pipeline` is a English model originally trained by alex-miller. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/climate_percentage_regression_pipeline_en_5.5.0_3.0_1726925959682.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/climate_percentage_regression_pipeline_en_5.5.0_3.0_1726925959682.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("climate_percentage_regression_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("climate_percentage_regression_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|climate_percentage_regression_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|630.9 MB| + +## References + +https://huggingface.co/alex-miller/climate-percentage-regression + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-clinicalbert_aci_bench_section_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-21-clinicalbert_aci_bench_section_classifier_en.md new file mode 100644 index 00000000000000..ea5f7648af726d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-clinicalbert_aci_bench_section_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English clinicalbert_aci_bench_section_classifier DistilBertForSequenceClassification from dhananjay2912 +author: John Snow Labs +name: clinicalbert_aci_bench_section_classifier +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clinicalbert_aci_bench_section_classifier` is a English model originally trained by dhananjay2912. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinicalbert_aci_bench_section_classifier_en_5.5.0_3.0_1726953088971.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinicalbert_aci_bench_section_classifier_en_5.5.0_3.0_1726953088971.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("clinicalbert_aci_bench_section_classifier","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("clinicalbert_aci_bench_section_classifier", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinicalbert_aci_bench_section_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/dhananjay2912/clinicalbert_aci_bench_section_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-clinicalbert_aci_bench_section_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-clinicalbert_aci_bench_section_classifier_pipeline_en.md new file mode 100644 index 00000000000000..cc466e3ae197c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-clinicalbert_aci_bench_section_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English clinicalbert_aci_bench_section_classifier_pipeline pipeline DistilBertForSequenceClassification from dhananjay2912 +author: John Snow Labs +name: clinicalbert_aci_bench_section_classifier_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clinicalbert_aci_bench_section_classifier_pipeline` is a English model originally trained by dhananjay2912. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinicalbert_aci_bench_section_classifier_pipeline_en_5.5.0_3.0_1726953112849.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinicalbert_aci_bench_section_classifier_pipeline_en_5.5.0_3.0_1726953112849.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("clinicalbert_aci_bench_section_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("clinicalbert_aci_bench_section_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinicalbert_aci_bench_section_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/dhananjay2912/clinicalbert_aci_bench_section_classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-cnj_large_v1_2_dinkytrain_weird_en.md b/docs/_posts/ahmedlone127/2024-09-21-cnj_large_v1_2_dinkytrain_weird_en.md new file mode 100644 index 00000000000000..18802606d264ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-cnj_large_v1_2_dinkytrain_weird_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cnj_large_v1_2_dinkytrain_weird RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: cnj_large_v1_2_dinkytrain_weird +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cnj_large_v1_2_dinkytrain_weird` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cnj_large_v1_2_dinkytrain_weird_en_5.5.0_3.0_1726934605802.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cnj_large_v1_2_dinkytrain_weird_en_5.5.0_3.0_1726934605802.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("cnj_large_v1_2_dinkytrain_weird","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("cnj_large_v1_2_dinkytrain_weird","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
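+
+The `embeddings` column produced above contains one annotation per token, each carrying its vector in an `embeddings` field. An illustrative way to pull the raw vectors out for downstream use (a sketch, not part of the original card):
+
+```python
+# Token strings and their embedding vectors for the first row of the result
+row = pipelineDF.select("embeddings.result", "embeddings.embeddings").first()
+tokens, vectors = row[0], row[1]
+print(tokens[0], vectors[0][:5])  # first token and the first few dimensions of its vector
+```
+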
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cnj_large_v1_2_dinkytrain_weird| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|845.2 MB| + +## References + +https://huggingface.co/eduagarcia-temp/cnj_large_v1_2_dinkytrain_weird \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-cnj_large_v1_2_dinkytrain_weird_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-cnj_large_v1_2_dinkytrain_weird_pipeline_en.md new file mode 100644 index 00000000000000..115a7bcb4f6bed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-cnj_large_v1_2_dinkytrain_weird_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cnj_large_v1_2_dinkytrain_weird_pipeline pipeline RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: cnj_large_v1_2_dinkytrain_weird_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cnj_large_v1_2_dinkytrain_weird_pipeline` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cnj_large_v1_2_dinkytrain_weird_pipeline_en_5.5.0_3.0_1726934845727.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cnj_large_v1_2_dinkytrain_weird_pipeline_en_5.5.0_3.0_1726934845727.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cnj_large_v1_2_dinkytrain_weird_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cnj_large_v1_2_dinkytrain_weird_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cnj_large_v1_2_dinkytrain_weird_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|845.3 MB| + +## References + +https://huggingface.co/eduagarcia-temp/cnj_large_v1_2_dinkytrain_weird + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-coha1860s_en.md b/docs/_posts/ahmedlone127/2024-09-21-coha1860s_en.md new file mode 100644 index 00000000000000..9e14d61ceea60c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-coha1860s_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English coha1860s RoBertaEmbeddings from simonmun +author: John Snow Labs +name: coha1860s +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`coha1860s` is a English model originally trained by simonmun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/coha1860s_en_5.5.0_3.0_1726934221985.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/coha1860s_en_5.5.0_3.0_1726934221985.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("coha1860s","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("coha1860s","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|coha1860s| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|312.0 MB| + +## References + +https://huggingface.co/simonmun/COHA1860s \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-colombian_sign_language_small_biased_random_20_en.md b/docs/_posts/ahmedlone127/2024-09-21-colombian_sign_language_small_biased_random_20_en.md new file mode 100644 index 00000000000000..b9c25ca537c5cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-colombian_sign_language_small_biased_random_20_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English colombian_sign_language_small_biased_random_20 RoBertaEmbeddings from antolin +author: John Snow Labs +name: colombian_sign_language_small_biased_random_20 +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`colombian_sign_language_small_biased_random_20` is a English model originally trained by antolin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/colombian_sign_language_small_biased_random_20_en_5.5.0_3.0_1726958030993.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/colombian_sign_language_small_biased_random_20_en_5.5.0_3.0_1726958030993.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("colombian_sign_language_small_biased_random_20","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("colombian_sign_language_small_biased_random_20","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|colombian_sign_language_small_biased_random_20| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|470.6 MB| + +## References + +https://huggingface.co/antolin/csn-small-biased-random-20 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-colombian_sign_language_small_biased_random_20_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-colombian_sign_language_small_biased_random_20_pipeline_en.md new file mode 100644 index 00000000000000..867b45b1c68813 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-colombian_sign_language_small_biased_random_20_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English colombian_sign_language_small_biased_random_20_pipeline pipeline RoBertaEmbeddings from antolin +author: John Snow Labs +name: colombian_sign_language_small_biased_random_20_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`colombian_sign_language_small_biased_random_20_pipeline` is a English model originally trained by antolin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/colombian_sign_language_small_biased_random_20_pipeline_en_5.5.0_3.0_1726958055226.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/colombian_sign_language_small_biased_random_20_pipeline_en_5.5.0_3.0_1726958055226.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("colombian_sign_language_small_biased_random_20_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("colombian_sign_language_small_biased_random_20_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|colombian_sign_language_small_biased_random_20_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|470.6 MB| + +## References + +https://huggingface.co/antolin/csn-small-biased-random-20 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-convbert_base_turkish_mc4_toxicity_uncased_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-21-convbert_base_turkish_mc4_toxicity_uncased_pipeline_tr.md new file mode 100644 index 00000000000000..e4d07f6539bc52 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-convbert_base_turkish_mc4_toxicity_uncased_pipeline_tr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Turkish convbert_base_turkish_mc4_toxicity_uncased_pipeline pipeline BertForSequenceClassification from gokceuludogan +author: John Snow Labs +name: convbert_base_turkish_mc4_toxicity_uncased_pipeline +date: 2024-09-21 +tags: [tr, open_source, pipeline, onnx] +task: Text Classification +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`convbert_base_turkish_mc4_toxicity_uncased_pipeline` is a Turkish model originally trained by gokceuludogan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/convbert_base_turkish_mc4_toxicity_uncased_pipeline_tr_5.5.0_3.0_1726902426552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/convbert_base_turkish_mc4_toxicity_uncased_pipeline_tr_5.5.0_3.0_1726902426552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("convbert_base_turkish_mc4_toxicity_uncased_pipeline", lang = "tr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("convbert_base_turkish_mc4_toxicity_uncased_pipeline", lang = "tr") +val annotations = pipeline.transform(df) + +``` +
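+
+For quick checks on individual strings, a `PretrainedPipeline` can also be driven through its `annotate` helper instead of a DataFrame. A brief sketch (the Turkish example sentence and the `class` result key are assumptions for illustration, not part of the original card):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("convbert_base_turkish_mc4_toxicity_uncased_pipeline", lang = "tr")
+
+# annotate() runs the whole pipeline on one string and returns a dict keyed by output column
+result = pipeline.annotate("Bu harika bir gün.")
+print(result.get("class"))
+```
+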
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|convbert_base_turkish_mc4_toxicity_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|402.3 MB| + +## References + +https://huggingface.co/gokceuludogan/convbert-base-turkish-mc4-toxicity-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-convbert_base_turkish_mc4_toxicity_uncased_tr.md b/docs/_posts/ahmedlone127/2024-09-21-convbert_base_turkish_mc4_toxicity_uncased_tr.md new file mode 100644 index 00000000000000..d6700eb49ce9a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-convbert_base_turkish_mc4_toxicity_uncased_tr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Turkish convbert_base_turkish_mc4_toxicity_uncased BertForSequenceClassification from gokceuludogan +author: John Snow Labs +name: convbert_base_turkish_mc4_toxicity_uncased +date: 2024-09-21 +tags: [tr, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`convbert_base_turkish_mc4_toxicity_uncased` is a Turkish model originally trained by gokceuludogan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/convbert_base_turkish_mc4_toxicity_uncased_tr_5.5.0_3.0_1726902408476.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/convbert_base_turkish_mc4_toxicity_uncased_tr_5.5.0_3.0_1726902408476.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("convbert_base_turkish_mc4_toxicity_uncased","tr") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("convbert_base_turkish_mc4_toxicity_uncased", "tr")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|convbert_base_turkish_mc4_toxicity_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|tr| +|Size:|402.3 MB| + +## References + +https://huggingface.co/gokceuludogan/convbert-base-turkish-mc4-toxicity-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-cooking_info_need_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-21-cooking_info_need_classifier_en.md new file mode 100644 index 00000000000000..21bd477de37b4c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-cooking_info_need_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cooking_info_need_classifier BertForSequenceClassification from AlexFr +author: John Snow Labs +name: cooking_info_need_classifier +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cooking_info_need_classifier` is a English model originally trained by AlexFr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cooking_info_need_classifier_en_5.5.0_3.0_1726902517024.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cooking_info_need_classifier_en_5.5.0_3.0_1726902517024.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
  .setInputCol('text') \
  .setOutputCol('document')

tokenizer = Tokenizer() \
  .setInputCols(['document']) \
  .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("cooking_info_need_classifier","en") \
  .setInputCols(["document","token"]) \
  .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("cooking_info_need_classifier", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cooking_info_need_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/AlexFr/cooking-info-need-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-cooking_info_need_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-cooking_info_need_classifier_pipeline_en.md new file mode 100644 index 00000000000000..e5709246df9673 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-cooking_info_need_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cooking_info_need_classifier_pipeline pipeline BertForSequenceClassification from AlexFr +author: John Snow Labs +name: cooking_info_need_classifier_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cooking_info_need_classifier_pipeline` is a English model originally trained by AlexFr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cooking_info_need_classifier_pipeline_en_5.5.0_3.0_1726902535228.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cooking_info_need_classifier_pipeline_en_5.5.0_3.0_1726902535228.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cooking_info_need_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cooking_info_need_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
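
The snippets above assume an existing DataFrame `df` with a `text` column and an active Spark session. A minimal, self-contained sketch of preparing that input and reading the predictions back is shown below; the example sentence is illustrative, and the `class` output column name is assumed from the standalone model card.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Hypothetical input: the pipeline's DocumentAssembler stage reads a "text" column.
df = spark.createDataFrame(
    [["How long should I knead bread dough before baking?"]],
    ["text"]
)

pipeline = PretrainedPipeline("cooking_info_need_classifier_pipeline", lang="en")
annotations = pipeline.transform(df)

# The classifier stage writes its prediction to the "class" column (assumed).
annotations.select("text", "class.result").show(truncate=False)
```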
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cooking_info_need_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/AlexFr/cooking-info-need-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-cv9_special_batch8_small_id.md b/docs/_posts/ahmedlone127/2024-09-21-cv9_special_batch8_small_id.md new file mode 100644 index 00000000000000..cb4f79cab4228d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-cv9_special_batch8_small_id.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Indonesian cv9_special_batch8_small WhisperForCTC from TheRains +author: John Snow Labs +name: cv9_special_batch8_small +date: 2024-09-21 +tags: [id, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cv9_special_batch8_small` is a Indonesian model originally trained by TheRains. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cv9_special_batch8_small_id_5.5.0_3.0_1726890823275.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cv9_special_batch8_small_id_5.5.0_3.0_1726890823275.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("cv9_special_batch8_small","id") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("cv9_special_batch8_small", "id")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
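
Both snippets above reference a `data` DataFrame that is not defined: `AudioAssembler` expects a column holding the raw audio samples as an array of floats. One hedged way to build it is sketched below; the file name and the use of librosa for decoding are illustrative assumptions, and Whisper models generally expect 16 kHz mono input.

```python
import librosa  # assumed available; any decoder that yields float samples works

# Decode an audio file to 16 kHz mono float samples (file name is illustrative).
raw_floats, _ = librosa.load("indonesian_sample.wav", sr=16000, mono=True)

data = spark.createDataFrame([[raw_floats.tolist()]], ["audio_content"])

# With `data` defined, the pipeline from the snippet above can be fit and applied.
result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```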
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cv9_special_batch8_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|id| +|Size:|1.7 GB| + +## References + +https://huggingface.co/TheRains/cv9-special-batch8-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-cv9_special_batch8_small_pipeline_id.md b/docs/_posts/ahmedlone127/2024-09-21-cv9_special_batch8_small_pipeline_id.md new file mode 100644 index 00000000000000..8dff6853b80b8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-cv9_special_batch8_small_pipeline_id.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Indonesian cv9_special_batch8_small_pipeline pipeline WhisperForCTC from TheRains +author: John Snow Labs +name: cv9_special_batch8_small_pipeline +date: 2024-09-21 +tags: [id, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cv9_special_batch8_small_pipeline` is a Indonesian model originally trained by TheRains. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cv9_special_batch8_small_pipeline_id_5.5.0_3.0_1726890911615.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cv9_special_batch8_small_pipeline_id_5.5.0_3.0_1726890911615.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cv9_special_batch8_small_pipeline", lang = "id") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cv9_special_batch8_small_pipeline", lang = "id") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cv9_special_batch8_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|id| +|Size:|1.7 GB| + +## References + +https://huggingface.co/TheRains/cv9-special-batch8-small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-deepset_bert_base_cased_squad2_orkg_unchanged_5e_05_en.md b/docs/_posts/ahmedlone127/2024-09-21-deepset_bert_base_cased_squad2_orkg_unchanged_5e_05_en.md new file mode 100644 index 00000000000000..1b99215cd5ac5a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-deepset_bert_base_cased_squad2_orkg_unchanged_5e_05_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English deepset_bert_base_cased_squad2_orkg_unchanged_5e_05 BertForQuestionAnswering from Moussab +author: John Snow Labs +name: deepset_bert_base_cased_squad2_orkg_unchanged_5e_05 +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deepset_bert_base_cased_squad2_orkg_unchanged_5e_05` is a English model originally trained by Moussab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deepset_bert_base_cased_squad2_orkg_unchanged_5e_05_en_5.5.0_3.0_1726946445195.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deepset_bert_base_cased_squad2_orkg_unchanged_5e_05_en_5.5.0_3.0_1726946445195.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
  .setInputCols(["question", "context"]) \
  .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("deepset_bert_base_cased_squad2_orkg_unchanged_5e_05","en") \
  .setInputCols(["document_question","document_context"]) \
  .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("deepset_bert_base_cased_squad2_orkg_unchanged_5e_05", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
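
After the transform, the extracted span sits in the `answer` column alongside the assembled question. A minimal sketch of reading it back, assuming the `pipelineDF` produced by the Python snippet above:

```python
from pyspark.sql.functions import col

# "document_question.result" keeps the original question text;
# "answer.result" holds the span predicted by the question-answering model.
pipelineDF.select(
    col("document_question.result").alias("question"),
    col("answer.result").alias("answer")
).show(truncate=False)
```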
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deepset_bert_base_cased_squad2_orkg_unchanged_5e_05| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Moussab/deepset_bert-base-cased-squad2-orkg-unchanged-5e-05 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-deepset_bert_base_cased_squad2_orkg_unchanged_5e_05_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-deepset_bert_base_cased_squad2_orkg_unchanged_5e_05_pipeline_en.md new file mode 100644 index 00000000000000..5768d66488ae5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-deepset_bert_base_cased_squad2_orkg_unchanged_5e_05_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English deepset_bert_base_cased_squad2_orkg_unchanged_5e_05_pipeline pipeline BertForQuestionAnswering from Moussab +author: John Snow Labs +name: deepset_bert_base_cased_squad2_orkg_unchanged_5e_05_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deepset_bert_base_cased_squad2_orkg_unchanged_5e_05_pipeline` is a English model originally trained by Moussab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deepset_bert_base_cased_squad2_orkg_unchanged_5e_05_pipeline_en_5.5.0_3.0_1726946464694.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deepset_bert_base_cased_squad2_orkg_unchanged_5e_05_pipeline_en_5.5.0_3.0_1726946464694.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deepset_bert_base_cased_squad2_orkg_unchanged_5e_05_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deepset_bert_base_cased_squad2_orkg_unchanged_5e_05_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deepset_bert_base_cased_squad2_orkg_unchanged_5e_05_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Moussab/deepset_bert-base-cased-squad2-orkg-unchanged-5e-05 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-delivery_distilbert_base_uncased_v1_en.md b/docs/_posts/ahmedlone127/2024-09-21-delivery_distilbert_base_uncased_v1_en.md new file mode 100644 index 00000000000000..6ce4dadcd639f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-delivery_distilbert_base_uncased_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English delivery_distilbert_base_uncased_v1 DistilBertForSequenceClassification from chuuhtetnaing +author: John Snow Labs +name: delivery_distilbert_base_uncased_v1 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`delivery_distilbert_base_uncased_v1` is a English model originally trained by chuuhtetnaing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/delivery_distilbert_base_uncased_v1_en_5.5.0_3.0_1726953015849.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/delivery_distilbert_base_uncased_v1_en_5.5.0_3.0_1726953015849.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
  .setInputCol('text') \
  .setOutputCol('document')

tokenizer = Tokenizer() \
  .setInputCols(['document']) \
  .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("delivery_distilbert_base_uncased_v1","en") \
  .setInputCols(["document","token"]) \
  .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("delivery_distilbert_base_uncased_v1", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
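
For ad-hoc scoring of individual strings, Spark NLP's `LightPipeline` can wrap the fitted `pipelineModel` from the Python snippet above and skip DataFrame construction entirely. A small sketch follows; the example sentence is illustrative, and the model's actual label set depends on its training data.

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# annotate() returns a dict keyed by output column; "class" holds the labels.
annotations = light.annotate("My package still has not arrived after two weeks.")
print(annotations["class"])
```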
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|delivery_distilbert_base_uncased_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chuuhtetnaing/delivery-distilbert-base-uncased-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-delivery_distilbert_base_uncased_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-delivery_distilbert_base_uncased_v1_pipeline_en.md new file mode 100644 index 00000000000000..53535223b93f76 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-delivery_distilbert_base_uncased_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English delivery_distilbert_base_uncased_v1_pipeline pipeline DistilBertForSequenceClassification from chuuhtetnaing +author: John Snow Labs +name: delivery_distilbert_base_uncased_v1_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`delivery_distilbert_base_uncased_v1_pipeline` is a English model originally trained by chuuhtetnaing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/delivery_distilbert_base_uncased_v1_pipeline_en_5.5.0_3.0_1726953028387.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/delivery_distilbert_base_uncased_v1_pipeline_en_5.5.0_3.0_1726953028387.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("delivery_distilbert_base_uncased_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("delivery_distilbert_base_uncased_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|delivery_distilbert_base_uncased_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chuuhtetnaing/delivery-distilbert-base-uncased-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-demo_whisper_en.md b/docs/_posts/ahmedlone127/2024-09-21-demo_whisper_en.md new file mode 100644 index 00000000000000..6b8a62467bb913 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-demo_whisper_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English demo_whisper WhisperForCTC from yash072 +author: John Snow Labs +name: demo_whisper +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`demo_whisper` is a English model originally trained by yash072. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/demo_whisper_en_5.5.0_3.0_1726877747985.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/demo_whisper_en_5.5.0_3.0_1726877747985.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("demo_whisper","en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("demo_whisper", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|demo_whisper| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/yash072/demo_whisper \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-demo_whisper_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-demo_whisper_pipeline_en.md new file mode 100644 index 00000000000000..ce68ad87d440c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-demo_whisper_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English demo_whisper_pipeline pipeline WhisperForCTC from yash072 +author: John Snow Labs +name: demo_whisper_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`demo_whisper_pipeline` is a English model originally trained by yash072. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/demo_whisper_pipeline_en_5.5.0_3.0_1726877830352.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/demo_whisper_pipeline_en_5.5.0_3.0_1726877830352.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("demo_whisper_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("demo_whisper_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|demo_whisper_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/yash072/demo_whisper + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-discord_message_small_en.md b/docs/_posts/ahmedlone127/2024-09-21-discord_message_small_en.md new file mode 100644 index 00000000000000..608bcdac47214a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-discord_message_small_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English discord_message_small RoBertaEmbeddings from TheDiamondKing +author: John Snow Labs +name: discord_message_small +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`discord_message_small` is a English model originally trained by TheDiamondKing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/discord_message_small_en_5.5.0_3.0_1726943616114.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/discord_message_small_en_5.5.0_3.0_1726943616114.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("discord_message_small","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("discord_message_small","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
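
The `embeddings` column produced above holds one annotation per token, with the vector stored in the annotation's `embeddings` field. A short sketch of flattening it, assuming the `pipelineDF` DataFrame from the Python snippet above:

```python
from pyspark.sql.functions import col, explode

token_vectors = pipelineDF.select(
    explode(col("embeddings")).alias("emb")
).select(
    col("emb.result").alias("token"),       # the token text
    col("emb.embeddings").alias("vector")   # the token's embedding vector
)

token_vectors.show(truncate=80)
```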
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|discord_message_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.8 GB| + +## References + +https://huggingface.co/TheDiamondKing/Discord-Message-Small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distil_whisper_english_en.md b/docs/_posts/ahmedlone127/2024-09-21-distil_whisper_english_en.md new file mode 100644 index 00000000000000..36dda7c5c9ddef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distil_whisper_english_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English distil_whisper_english WhisperForCTC from pravin96 +author: John Snow Labs +name: distil_whisper_english +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distil_whisper_english` is a English model originally trained by pravin96. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_whisper_english_en_5.5.0_3.0_1726948860770.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_whisper_english_en_5.5.0_3.0_1726948860770.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("distil_whisper_english","en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("distil_whisper_english", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distil_whisper_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/pravin96/distil_whisper_en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distil_whisper_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distil_whisper_english_pipeline_en.md new file mode 100644 index 00000000000000..5d6cfd04ea5245 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distil_whisper_english_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distil_whisper_english_pipeline pipeline WhisperForCTC from pravin96 +author: John Snow Labs +name: distil_whisper_english_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distil_whisper_english_pipeline` is a English model originally trained by pravin96. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_whisper_english_pipeline_en_5.5.0_3.0_1726948921691.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_whisper_english_pipeline_en_5.5.0_3.0_1726948921691.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distil_whisper_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distil_whisper_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distil_whisper_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/pravin96/distil_whisper_en + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert4_pipeline_en.md new file mode 100644 index 00000000000000..76e089ee8d300c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert4_pipeline pipeline DistilBertForSequenceClassification from deptage +author: John Snow Labs +name: distilbert4_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert4_pipeline` is a English model originally trained by deptage. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert4_pipeline_en_5.5.0_3.0_1726953679128.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert4_pipeline_en_5.5.0_3.0_1726953679128.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/deptage/distilbert4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_agnews_padding30model_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_agnews_padding30model_en.md new file mode 100644 index 00000000000000..bfeb7306b8e0f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_agnews_padding30model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_agnews_padding30model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_agnews_padding30model +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_agnews_padding30model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding30model_en_5.5.0_3.0_1726953317712.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding30model_en_5.5.0_3.0_1726953317712.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
  .setInputCol('text') \
  .setOutputCol('document')

tokenizer = Tokenizer() \
  .setInputCols(['document']) \
  .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_agnews_padding30model","en") \
  .setInputCols(["document","token"]) \
  .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_agnews_padding30model", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_agnews_padding30model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_agnews_padding30model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_agnews_padding30model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_agnews_padding30model_pipeline_en.md new file mode 100644 index 00000000000000..88679461af5bcf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_agnews_padding30model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_agnews_padding30model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_agnews_padding30model_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_agnews_padding30model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding30model_pipeline_en_5.5.0_3.0_1726953330217.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding30model_pipeline_en_5.5.0_3.0_1726953330217.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_agnews_padding30model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_agnews_padding30model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_agnews_padding30model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_agnews_padding30model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_amazon_software_reviews_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_amazon_software_reviews_finetuned_en.md new file mode 100644 index 00000000000000..b9646d84600cb3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_amazon_software_reviews_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_amazon_software_reviews_finetuned DistilBertForSequenceClassification from PHILIPPUNI +author: John Snow Labs +name: distilbert_amazon_software_reviews_finetuned +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_amazon_software_reviews_finetuned` is a English model originally trained by PHILIPPUNI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_amazon_software_reviews_finetuned_en_5.5.0_3.0_1726953265149.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_amazon_software_reviews_finetuned_en_5.5.0_3.0_1726953265149.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
  .setInputCol('text') \
  .setOutputCol('document')

tokenizer = Tokenizer() \
  .setInputCols(['document']) \
  .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_amazon_software_reviews_finetuned","en") \
  .setInputCols(["document","token"]) \
  .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_amazon_software_reviews_finetuned", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_amazon_software_reviews_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/PHILIPPUNI/distilbert-amazon-software-reviews-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_amazon_software_reviews_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_amazon_software_reviews_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..9f9a1e47a55317 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_amazon_software_reviews_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_amazon_software_reviews_finetuned_pipeline pipeline DistilBertForSequenceClassification from PHILIPPUNI +author: John Snow Labs +name: distilbert_amazon_software_reviews_finetuned_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_amazon_software_reviews_finetuned_pipeline` is a English model originally trained by PHILIPPUNI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_amazon_software_reviews_finetuned_pipeline_en_5.5.0_3.0_1726953277769.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_amazon_software_reviews_finetuned_pipeline_en_5.5.0_3.0_1726953277769.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_amazon_software_reviews_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_amazon_software_reviews_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_amazon_software_reviews_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/PHILIPPUNI/distilbert-amazon-software-reviews-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_multilingual_cased_finetuned_kor_8_emotions_v1_4_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_multilingual_cased_finetuned_kor_8_emotions_v1_4_pipeline_xx.md new file mode 100644 index 00000000000000..3c3aba709a1f58 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_multilingual_cased_finetuned_kor_8_emotions_v1_4_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual distilbert_base_multilingual_cased_finetuned_kor_8_emotions_v1_4_pipeline pipeline DistilBertForSequenceClassification from blue2959 +author: John Snow Labs +name: distilbert_base_multilingual_cased_finetuned_kor_8_emotions_v1_4_pipeline +date: 2024-09-21 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_multilingual_cased_finetuned_kor_8_emotions_v1_4_pipeline` is a Multilingual model originally trained by blue2959. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_finetuned_kor_8_emotions_v1_4_pipeline_xx_5.5.0_3.0_1726924016536.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_finetuned_kor_8_emotions_v1_4_pipeline_xx_5.5.0_3.0_1726924016536.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_multilingual_cased_finetuned_kor_8_emotions_v1_4_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_multilingual_cased_finetuned_kor_8_emotions_v1_4_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_multilingual_cased_finetuned_kor_8_emotions_v1_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|507.7 MB| + +## References + +https://huggingface.co/blue2959/distilbert-base-multilingual-cased-finetuned-kor-8-emotions_v1.4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_multilingual_cased_finetuned_kor_8_emotions_v1_4_xx.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_multilingual_cased_finetuned_kor_8_emotions_v1_4_xx.md new file mode 100644 index 00000000000000..3b7cc9ea1594a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_multilingual_cased_finetuned_kor_8_emotions_v1_4_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual distilbert_base_multilingual_cased_finetuned_kor_8_emotions_v1_4 DistilBertForSequenceClassification from blue2959 +author: John Snow Labs +name: distilbert_base_multilingual_cased_finetuned_kor_8_emotions_v1_4 +date: 2024-09-21 +tags: [xx, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_multilingual_cased_finetuned_kor_8_emotions_v1_4` is a Multilingual model originally trained by blue2959. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_finetuned_kor_8_emotions_v1_4_xx_5.5.0_3.0_1726923992706.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_finetuned_kor_8_emotions_v1_4_xx_5.5.0_3.0_1726923992706.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
  .setInputCol('text') \
  .setOutputCol('document')

tokenizer = Tokenizer() \
  .setInputCols(['document']) \
  .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_multilingual_cased_finetuned_kor_8_emotions_v1_4","xx") \
  .setInputCols(["document","token"]) \
  .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_multilingual_cased_finetuned_kor_8_emotions_v1_4", "xx")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
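
Once fit, the whole pipeline, including the downloaded classifier, can be persisted and reloaded without fetching the pretrained weights again. A minimal sketch continuing from the Python snippet above; the local path is illustrative.

```python
from pyspark.ml import PipelineModel

# Path is illustrative; any local, HDFS, or S3 path Spark can write to works.
pipelineModel.write().overwrite().save("/tmp/kor_emotions_pipeline")

restored = PipelineModel.load("/tmp/kor_emotions_pipeline")
restored.transform(data).select("class.result").show(truncate=False)
```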
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_multilingual_cased_finetuned_kor_8_emotions_v1_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|507.6 MB| + +## References + +https://huggingface.co/blue2959/distilbert-base-multilingual-cased-finetuned-kor-8-emotions_v1.4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_multilingual_cased_finetuned_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_multilingual_cased_finetuned_pipeline_xx.md new file mode 100644 index 00000000000000..4b2b9c306db9e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_multilingual_cased_finetuned_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual distilbert_base_multilingual_cased_finetuned_pipeline pipeline DistilBertForSequenceClassification from Esmail275 +author: John Snow Labs +name: distilbert_base_multilingual_cased_finetuned_pipeline +date: 2024-09-21 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_multilingual_cased_finetuned_pipeline` is a Multilingual model originally trained by Esmail275. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_finetuned_pipeline_xx_5.5.0_3.0_1726924353140.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_finetuned_pipeline_xx_5.5.0_3.0_1726924353140.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_multilingual_cased_finetuned_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_multilingual_cased_finetuned_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_multilingual_cased_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|507.7 MB| + +## References + +https://huggingface.co/Esmail275/distilbert-base-multilingual-cased-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_multilingual_cased_finetuned_xx.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_multilingual_cased_finetuned_xx.md new file mode 100644 index 00000000000000..471d6b9e2fc944 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_multilingual_cased_finetuned_xx.md @@ -0,0 +1,98 @@ +--- +layout: model +title: Multilingual distilbert_base_multilingual_cased_finetuned DistilBertForSequenceClassification from chiatzu +author: John Snow Labs +name: distilbert_base_multilingual_cased_finetuned +date: 2024-09-21 +tags: [bert, xx, open_source, sequence_classification, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_multilingual_cased_finetuned` is a Multilingual model originally trained by chiatzu. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_finetuned_xx_5.5.0_3.0_1726924328021.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_finetuned_xx_5.5.0_3.0_1726924328021.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+document_assembler = DocumentAssembler()\
+    .setInputCol("text")\
+    .setOutputCol("document")
+
+tokenizer = Tokenizer()\
+    .setInputCols("document")\
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_multilingual_cased_finetuned","xx")\
+    .setInputCols(["document","token"])\
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val document_assembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_multilingual_cased_finetuned","xx")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, sequenceClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
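+
+Once `result` is computed, the predicted label can be read back as plain text. A short sketch, continuing from the Python example above and assuming the `class` output column configured there:
+
+```python
+# Each row of "class" holds annotations; their `result` field is the predicted label
+result.selectExpr("text", "class.result as predicted_label").show(truncate = False)
+```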
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_multilingual_cased_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|507.6 MB| + +## References + +References + +https://huggingface.co/chiatzu/distilbert-base-multilingual-cased-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_banking_zphr_0st72_ut52ut1_plain_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_banking_zphr_0st72_ut52ut1_plain_simsp_pipeline_en.md new file mode 100644 index 00000000000000..be0f0c67ea363f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_banking_zphr_0st72_ut52ut1_plain_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_banking_zphr_0st72_ut52ut1_plain_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_banking_zphr_0st72_ut52ut1_plain_simsp_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_banking_zphr_0st72_ut52ut1_plain_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_banking_zphr_0st72_ut52ut1_plain_simsp_pipeline_en_5.5.0_3.0_1726923919930.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_banking_zphr_0st72_ut52ut1_plain_simsp_pipeline_en_5.5.0_3.0_1726923919930.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_banking_zphr_0st72_ut52ut1_plain_simsp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_banking_zphr_0st72_ut52ut1_plain_simsp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
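+
+For ad-hoc requests, `annotate` runs the same pipeline on plain strings without building a DataFrame first. A minimal sketch; the sample sentence is an assumption:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_banking_zphr_0st72_ut52ut1_plain_simsp_pipeline", lang = "en")
+# annotate() accepts a single string (or a list of strings) and returns the output columns as a dict
+print(pipeline.annotate("I would like to open a new savings account"))
+```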
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_banking_zphr_0st72_ut52ut1_plain_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_banking_zphr_0st72_ut52ut1_plain_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_en.md new file mode 100644 index 00000000000000..850d25e1390bde --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_en_5.5.0_3.0_1726923883088.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_en_5.5.0_3.0_1726923883088.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
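+
+The fitted `pipelineModel` from the Python example above can be saved and reloaded so later jobs skip the fit step. A sketch with a hypothetical local path:
+
+```python
+from pyspark.ml import PipelineModel
+
+# Persist the fitted pipeline once, then reload it in later jobs
+pipelineModel.write().overwrite().save("/tmp/banking_intent_pipeline")  # hypothetical path
+reloaded = PipelineModel.load("/tmp/banking_intent_pipeline")
+reloaded.transform(data).select("class.result").show(truncate = False)
+```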
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_pipeline_en.md new file mode 100644 index 00000000000000..11826b18e53b0f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_pipeline_en_5.5.0_3.0_1726923896313.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_pipeline_en_5.5.0_3.0_1726923896313.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_distilled_clinc_mrwetsnow_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_distilled_clinc_mrwetsnow_en.md new file mode 100644 index 00000000000000..5b34b647b80320 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_distilled_clinc_mrwetsnow_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_mrwetsnow DistilBertForSequenceClassification from MrWetsnow +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_mrwetsnow +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_mrwetsnow` is a English model originally trained by MrWetsnow. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_mrwetsnow_en_5.5.0_3.0_1726953235001.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_mrwetsnow_en_5.5.0_3.0_1726953235001.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_mrwetsnow","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_mrwetsnow", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
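+
+Beyond the label itself, Spark NLP classifier annotations carry per-label scores in their metadata map. A quick way to inspect both, continuing from the Python example above; the field names follow the generic annotation schema rather than anything stated on this card:
+
+```python
+# "result" is the predicted label, "metadata" holds the per-class confidence scores
+pipelineDF.select("class.result", "class.metadata").show(truncate = False)
+```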
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_mrwetsnow| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/MrWetsnow/distilbert-base-uncased-distilled-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline_en.md new file mode 100644 index 00000000000000..4aac673f2aae89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline pipeline DistilBertForSequenceClassification from MrWetsnow +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline` is a English model originally trained by MrWetsnow. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline_en_5.5.0_3.0_1726953247058.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline_en_5.5.0_3.0_1726953247058.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
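+
+Any DataFrame with a `text` column can be scored in bulk. A sketch reading utterances from a plain-text file, assuming an active SparkSession `spark`; the path is hypothetical:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# spark.read.text yields a "value" column; rename it to the expected "text" column
+df = spark.read.text("/data/utterances.txt").withColumnRenamed("value", "text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline", lang = "en")
+pipeline.transform(df).select("text", "class.result").show(5, truncate = False)
+```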
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/MrWetsnow/distilbert-base-uncased-distilled-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_adl_hw1_penguinyeh_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_adl_hw1_penguinyeh_en.md new file mode 100644 index 00000000000000..5356cd54107596 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_adl_hw1_penguinyeh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_adl_hw1_penguinyeh DistilBertForSequenceClassification from penguinyeh +author: John Snow Labs +name: distilbert_base_uncased_finetuned_adl_hw1_penguinyeh +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_adl_hw1_penguinyeh` is a English model originally trained by penguinyeh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw1_penguinyeh_en_5.5.0_3.0_1726924361395.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw1_penguinyeh_en_5.5.0_3.0_1726924361395.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_adl_hw1_penguinyeh","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_adl_hw1_penguinyeh", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_adl_hw1_penguinyeh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/penguinyeh/distilbert-base-uncased-finetuned-adl_hw1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_adl_hw1_penguinyeh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_adl_hw1_penguinyeh_pipeline_en.md new file mode 100644 index 00000000000000..94e7daf1dc3368 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_adl_hw1_penguinyeh_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_adl_hw1_penguinyeh_pipeline pipeline DistilBertForSequenceClassification from penguinyeh +author: John Snow Labs +name: distilbert_base_uncased_finetuned_adl_hw1_penguinyeh_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_adl_hw1_penguinyeh_pipeline` is a English model originally trained by penguinyeh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw1_penguinyeh_pipeline_en_5.5.0_3.0_1726924374076.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw1_penguinyeh_pipeline_en_5.5.0_3.0_1726924374076.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_adl_hw1_penguinyeh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_adl_hw1_penguinyeh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_adl_hw1_penguinyeh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/penguinyeh/distilbert-base-uncased-finetuned-adl_hw1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_cola_cltsai_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_cola_cltsai_en.md new file mode 100644 index 00000000000000..862810eca25855 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_cola_cltsai_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_cltsai DistilBertForSequenceClassification from cltsai +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_cltsai +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_cltsai` is a English model originally trained by cltsai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_cltsai_en_5.5.0_3.0_1726888780862.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_cltsai_en_5.5.0_3.0_1726888780862.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_cltsai","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_cltsai", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
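+
+For clusters without internet access, the archive behind the Download button can be unzipped and loaded from disk instead of calling `pretrained()`. A sketch; the local path is hypothetical:
+
+```python
+# Load the unzipped model directory directly, without network access
+sequenceClassifier = DistilBertForSequenceClassification.load("/models/distilbert_base_uncased_finetuned_cola_cltsai_en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+```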
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_cltsai| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cltsai/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_cola_cltsai_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_cola_cltsai_pipeline_en.md new file mode 100644 index 00000000000000..529b8d290582ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_cola_cltsai_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_cltsai_pipeline pipeline DistilBertForSequenceClassification from cltsai +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_cltsai_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_cltsai_pipeline` is a English model originally trained by cltsai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_cltsai_pipeline_en_5.5.0_3.0_1726888792279.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_cltsai_pipeline_en_5.5.0_3.0_1726888792279.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_cltsai_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_cltsai_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_cltsai_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cltsai/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_a5med_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_a5med_en.md new file mode 100644 index 00000000000000..0b9a01c4da7262 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_a5med_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_a5med DistilBertForSequenceClassification from a5med +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_a5med +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_a5med` is a English model originally trained by a5med. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_a5med_en_5.5.0_3.0_1726884690124.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_a5med_en_5.5.0_3.0_1726884690124.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_a5med","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_a5med", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
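+
+Scoring several sentences at once only requires more rows in the input DataFrame. A sketch continuing from the Python example above; the example sentences are assumptions:
+
+```python
+data = spark.createDataFrame([
+    ["I can't stop smiling today"],
+    ["This is so frustrating"]
+]).toDF("text")
+pipelineModel.transform(data).select("text", "class.result").show(truncate = False)
+```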
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_a5med| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/a5med/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_a5med_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_a5med_pipeline_en.md new file mode 100644 index 00000000000000..9f3de1cdc00f89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_a5med_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_a5med_pipeline pipeline DistilBertForSequenceClassification from a5med +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_a5med_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_a5med_pipeline` is a English model originally trained by a5med. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_a5med_pipeline_en_5.5.0_3.0_1726884701891.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_a5med_pipeline_en_5.5.0_3.0_1726884701891.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_a5med_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_a5med_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_a5med_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/a5med/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_basantsubba_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_basantsubba_en.md new file mode 100644 index 00000000000000..ddd9c69306b568 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_basantsubba_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_basantsubba DistilBertForSequenceClassification from BasantSubba +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_basantsubba +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_basantsubba` is a English model originally trained by BasantSubba. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_basantsubba_en_5.5.0_3.0_1726952965786.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_basantsubba_en_5.5.0_3.0_1726952965786.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_basantsubba","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_basantsubba", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_basantsubba| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BasantSubba/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_basantsubba_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_basantsubba_pipeline_en.md new file mode 100644 index 00000000000000..da09c78b28c38c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_basantsubba_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_basantsubba_pipeline pipeline DistilBertForSequenceClassification from BasantSubba +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_basantsubba_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_basantsubba_pipeline` is a English model originally trained by BasantSubba. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_basantsubba_pipeline_en_5.5.0_3.0_1726952979055.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_basantsubba_pipeline_en_5.5.0_3.0_1726952979055.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_basantsubba_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_basantsubba_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_basantsubba_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BasantSubba/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_book_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_book_en.md new file mode 100644 index 00000000000000..ab1bdc8a740105 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_book_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_book DistilBertForSequenceClassification from tibi96 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_book +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_book` is a English model originally trained by tibi96. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_book_en_5.5.0_3.0_1726888868335.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_book_en_5.5.0_3.0_1726888868335.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_book","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_book", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
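+
+For quick inspection of small result sets, the predictions can be pulled into pandas; `class.result[0]` takes the first (and only) label per row. A sketch continuing from the Python example above:
+
+```python
+# Collect to pandas for small datasets only
+preds = pipelineDF.selectExpr("text", "class.result[0] as emotion").toPandas()
+print(preds.head())
+```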
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_book| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tibi96/distilbert-base-uncased-finetuned-emotion-book \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_book_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_book_pipeline_en.md new file mode 100644 index 00000000000000..50997ee0170de7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_book_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_book_pipeline pipeline DistilBertForSequenceClassification from tibi96 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_book_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_book_pipeline` is a English model originally trained by tibi96. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_book_pipeline_en_5.5.0_3.0_1726888880743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_book_pipeline_en_5.5.0_3.0_1726888880743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_book_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_book_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_book_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tibi96/distilbert-base-uncased-finetuned-emotion-book + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_boringyogurt_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_boringyogurt_en.md new file mode 100644 index 00000000000000..7162f4bdd0a553 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_boringyogurt_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_boringyogurt DistilBertForSequenceClassification from BoringYogurt +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_boringyogurt +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_boringyogurt` is a English model originally trained by BoringYogurt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_boringyogurt_en_5.5.0_3.0_1726952966506.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_boringyogurt_en_5.5.0_3.0_1726952966506.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_boringyogurt","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_boringyogurt", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_boringyogurt| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BoringYogurt/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_boringyogurt_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_boringyogurt_pipeline_en.md new file mode 100644 index 00000000000000..1124c5f87126c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_boringyogurt_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_boringyogurt_pipeline pipeline DistilBertForSequenceClassification from BoringYogurt +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_boringyogurt_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_boringyogurt_pipeline` is a English model originally trained by BoringYogurt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_boringyogurt_pipeline_en_5.5.0_3.0_1726952979106.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_boringyogurt_pipeline_en_5.5.0_3.0_1726952979106.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_boringyogurt_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_boringyogurt_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_boringyogurt_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BoringYogurt/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_crazymoment_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_crazymoment_en.md new file mode 100644 index 00000000000000..512971eeb2dc9c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_crazymoment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_crazymoment DistilBertForSequenceClassification from CrazyMoment +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_crazymoment +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_crazymoment` is a English model originally trained by CrazyMoment. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_crazymoment_en_5.5.0_3.0_1726953203911.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_crazymoment_en_5.5.0_3.0_1726953203911.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_crazymoment","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_crazymoment", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_crazymoment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/CrazyMoment/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_crazymoment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_crazymoment_pipeline_en.md new file mode 100644 index 00000000000000..80dddccf4378b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_crazymoment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_crazymoment_pipeline pipeline DistilBertForSequenceClassification from CrazyMoment +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_crazymoment_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_crazymoment_pipeline` is a English model originally trained by CrazyMoment. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_crazymoment_pipeline_en_5.5.0_3.0_1726953215778.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_crazymoment_pipeline_en_5.5.0_3.0_1726953215778.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_crazymoment_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_crazymoment_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
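
Besides `transform` on a DataFrame, a `PretrainedPipeline` can also be applied directly to a single string for quick checks. A small sketch follows; the available output keys depend on the stages bundled in this pipeline, so inspect `result.keys()` rather than assuming a particular key name:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_crazymoment_pipeline", lang="en")

# annotate() runs the whole pipeline on one string and returns a dict
# keyed by the output columns of the included stages.
result = pipeline.annotate("I love spark-nlp")
print(result.keys())
print(result)
```
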
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_crazymoment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/CrazyMoment/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_dasooo_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_dasooo_en.md new file mode 100644 index 00000000000000..43d807a229e1ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_dasooo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_dasooo DistilBertForSequenceClassification from daSooo +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_dasooo +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_dasooo` is a English model originally trained by daSooo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_dasooo_en_5.5.0_3.0_1726888865889.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_dasooo_en_5.5.0_3.0_1726888865889.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_dasooo", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_dasooo", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_dasooo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/daSooo/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_dasooo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_dasooo_pipeline_en.md new file mode 100644 index 00000000000000..0f94decb602274 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_dasooo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_dasooo_pipeline pipeline DistilBertForSequenceClassification from daSooo +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_dasooo_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_dasooo_pipeline` is a English model originally trained by daSooo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_dasooo_pipeline_en_5.5.0_3.0_1726888877540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_dasooo_pipeline_en_5.5.0_3.0_1726888877540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_dasooo_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_dasooo_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_dasooo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/daSooo/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_hyadav22_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_hyadav22_en.md new file mode 100644 index 00000000000000..b7df3a30ee5876 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_hyadav22_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_hyadav22 DistilBertForSequenceClassification from hyadav22 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_hyadav22 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_hyadav22` is a English model originally trained by hyadav22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hyadav22_en_5.5.0_3.0_1726924161548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hyadav22_en_5.5.0_3.0_1726924161548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_hyadav22", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_hyadav22", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_hyadav22| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hyadav22/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_hyadav22_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_hyadav22_pipeline_en.md new file mode 100644 index 00000000000000..f01ca34aa7168f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_hyadav22_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_hyadav22_pipeline pipeline DistilBertForSequenceClassification from hyadav22 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_hyadav22_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_hyadav22_pipeline` is a English model originally trained by hyadav22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hyadav22_pipeline_en_5.5.0_3.0_1726924173837.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hyadav22_pipeline_en_5.5.0_3.0_1726924173837.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_hyadav22_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_hyadav22_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_hyadav22_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hyadav22/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_jiogenes_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_jiogenes_en.md new file mode 100644 index 00000000000000..f70af11c93993d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_jiogenes_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jiogenes DistilBertForSequenceClassification from jiogenes +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jiogenes +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jiogenes` is a English model originally trained by jiogenes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jiogenes_en_5.5.0_3.0_1726953188262.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jiogenes_en_5.5.0_3.0_1726953188262.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jiogenes", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jiogenes", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jiogenes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jiogenes/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_jiogenes_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_jiogenes_pipeline_en.md new file mode 100644 index 00000000000000..0cc8fce8625cac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_jiogenes_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jiogenes_pipeline pipeline DistilBertForSequenceClassification from jiogenes +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jiogenes_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jiogenes_pipeline` is a English model originally trained by jiogenes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jiogenes_pipeline_en_5.5.0_3.0_1726953200491.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jiogenes_pipeline_en_5.5.0_3.0_1726953200491.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jiogenes_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jiogenes_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jiogenes_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jiogenes/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_large_sets_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_large_sets_en.md new file mode 100644 index 00000000000000..96fed544ea58c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_large_sets_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_large_sets DistilBertForSequenceClassification from A01794620 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_large_sets +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_large_sets` is a English model originally trained by A01794620. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_large_sets_en_5.5.0_3.0_1726888739873.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_large_sets_en_5.5.0_3.0_1726888739873.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_large_sets", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_large_sets", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
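
Because the fitted pipeline is a regular Spark ML `PipelineModel`, it can be persisted once and reloaded later instead of downloading the pretrained weights again. A brief sketch with a placeholder path (adjust the path to your own storage):

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline to a placeholder path (local, HDFS or S3).
pipelineModel.write().overwrite().save("/tmp/emotion_distilbert_pipeline")

# Reload it later and reuse it for inference.
loadedModel = PipelineModel.load("/tmp/emotion_distilbert_pipeline")
loadedModel.transform(data).select("class.result").show(truncate=False)
```
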
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_large_sets| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/A01794620/distilbert-base-uncased-finetuned-emotion-large-sets \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_large_sets_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_large_sets_pipeline_en.md new file mode 100644 index 00000000000000..1832f790b081b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_large_sets_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_large_sets_pipeline pipeline DistilBertForSequenceClassification from A01794620 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_large_sets_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_large_sets_pipeline` is a English model originally trained by A01794620. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_large_sets_pipeline_en_5.5.0_3.0_1726888752502.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_large_sets_pipeline_en_5.5.0_3.0_1726888752502.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_large_sets_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_large_sets_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_large_sets_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/A01794620/distilbert-base-uncased-finetuned-emotion-large-sets + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_leeht0113_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_leeht0113_en.md new file mode 100644 index 00000000000000..a2b2ed6be19572 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_leeht0113_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_leeht0113 DistilBertForSequenceClassification from leeht0113 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_leeht0113 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_leeht0113` is a English model originally trained by leeht0113. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_leeht0113_en_5.5.0_3.0_1726952857645.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_leeht0113_en_5.5.0_3.0_1726952857645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_leeht0113", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_leeht0113", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_leeht0113| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/leeht0113/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_leeht0113_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_leeht0113_pipeline_en.md new file mode 100644 index 00000000000000..0aef1e8beb5933 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_leeht0113_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_leeht0113_pipeline pipeline DistilBertForSequenceClassification from leeht0113 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_leeht0113_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_leeht0113_pipeline` is a English model originally trained by leeht0113. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_leeht0113_pipeline_en_5.5.0_3.0_1726952869985.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_leeht0113_pipeline_en_5.5.0_3.0_1726952869985.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_leeht0113_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_leeht0113_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_leeht0113_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/leeht0113/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_leotunganh_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_leotunganh_en.md new file mode 100644 index 00000000000000..cda90210db340d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_leotunganh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_leotunganh DistilBertForSequenceClassification from LeoTungAnh +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_leotunganh +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_leotunganh` is a English model originally trained by LeoTungAnh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_leotunganh_en_5.5.0_3.0_1726888764161.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_leotunganh_en_5.5.0_3.0_1726888764161.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_leotunganh", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_leotunganh", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_leotunganh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LeoTungAnh/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_leotunganh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_leotunganh_pipeline_en.md new file mode 100644 index 00000000000000..2c9f9b5fd0f6dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_leotunganh_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_leotunganh_pipeline pipeline DistilBertForSequenceClassification from LeoTungAnh +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_leotunganh_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_leotunganh_pipeline` is a English model originally trained by LeoTungAnh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_leotunganh_pipeline_en_5.5.0_3.0_1726888776471.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_leotunganh_pipeline_en_5.5.0_3.0_1726888776471.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_leotunganh_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_leotunganh_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_leotunganh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LeoTungAnh/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_omersubasi_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_omersubasi_en.md new file mode 100644 index 00000000000000..2c4c9ffaba2a94 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_omersubasi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_omersubasi DistilBertForSequenceClassification from omersubasi +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_omersubasi +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_omersubasi` is a English model originally trained by omersubasi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_omersubasi_en_5.5.0_3.0_1726952857600.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_omersubasi_en_5.5.0_3.0_1726952857600.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_omersubasi", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_omersubasi", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_omersubasi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/omersubasi/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_omersubasi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_omersubasi_pipeline_en.md new file mode 100644 index 00000000000000..cbf785d382ea9a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_omersubasi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_omersubasi_pipeline pipeline DistilBertForSequenceClassification from omersubasi +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_omersubasi_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_omersubasi_pipeline` is a English model originally trained by omersubasi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_omersubasi_pipeline_en_5.5.0_3.0_1726952871947.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_omersubasi_pipeline_en_5.5.0_3.0_1726952871947.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_omersubasi_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_omersubasi_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_omersubasi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/omersubasi/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_parksuna_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_parksuna_en.md new file mode 100644 index 00000000000000..e4a98cfeb6a5e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_parksuna_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_parksuna DistilBertForSequenceClassification from parksuna +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_parksuna +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_parksuna` is a English model originally trained by parksuna. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_parksuna_en_5.5.0_3.0_1726953577536.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_parksuna_en_5.5.0_3.0_1726953577536.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_parksuna", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_parksuna", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_parksuna| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/parksuna/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_parksuna_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_parksuna_pipeline_en.md new file mode 100644 index 00000000000000..504e919db999f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_parksuna_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_parksuna_pipeline pipeline DistilBertForSequenceClassification from parksuna +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_parksuna_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_parksuna_pipeline` is a English model originally trained by parksuna. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_parksuna_pipeline_en_5.5.0_3.0_1726953589868.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_parksuna_pipeline_en_5.5.0_3.0_1726953589868.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_parksuna_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_parksuna_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_parksuna_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/parksuna/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_rorra_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_rorra_en.md new file mode 100644 index 00000000000000..e49d453a5dd8ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_rorra_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_rorra DistilBertForSequenceClassification from rorra +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_rorra +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_rorra` is a English model originally trained by rorra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_rorra_en_5.5.0_3.0_1726924056773.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_rorra_en_5.5.0_3.0_1726924056773.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_rorra", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_rorra", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_rorra| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rorra/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_rorra_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_rorra_pipeline_en.md new file mode 100644 index 00000000000000..6e1fb494825e45 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_rorra_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_rorra_pipeline pipeline DistilBertForSequenceClassification from rorra +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_rorra_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_rorra_pipeline` is a English model originally trained by rorra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_rorra_pipeline_en_5.5.0_3.0_1726924069293.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_rorra_pipeline_en_5.5.0_3.0_1726924069293.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_rorra_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_rorra_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_rorra_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rorra/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_tingting0104_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_tingting0104_en.md new file mode 100644 index 00000000000000..3cfb680edbd7d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_tingting0104_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_tingting0104 DistilBertForSequenceClassification from TingTing0104 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_tingting0104 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_tingting0104` is a English model originally trained by TingTing0104. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_tingting0104_en_5.5.0_3.0_1726889004519.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_tingting0104_en_5.5.0_3.0_1726889004519.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_tingting0104","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_tingting0104", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
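
Continuing the Python example above, the predicted label can be read back from the `class` output column once the fitted pipeline has been applied. This is a sketch; the exact annotation metadata depends on the annotator.

```python
# Show the input text next to the predicted label; "class.result" holds the label strings
pipelineDF.select("text", "class.result").show(truncate = False)
```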
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_tingting0104| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/TingTing0104/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_tingting0104_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_tingting0104_pipeline_en.md new file mode 100644 index 00000000000000..a289e6613c9de7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_tingting0104_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_tingting0104_pipeline pipeline DistilBertForSequenceClassification from TingTing0104 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_tingting0104_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_tingting0104_pipeline` is a English model originally trained by TingTing0104. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_tingting0104_pipeline_en_5.5.0_3.0_1726889016212.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_tingting0104_pipeline_en_5.5.0_3.0_1726889016212.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a "text" column holding the documents to annotate
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_tingting0104_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with a "text" column holding the documents to annotate
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_tingting0104_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_tingting0104_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/TingTing0104/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_yerkekz_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_yerkekz_en.md new file mode 100644 index 00000000000000..42d14e009398d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_yerkekz_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_yerkekz DistilBertForSequenceClassification from yerkekz +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_yerkekz +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_yerkekz` is a English model originally trained by yerkekz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yerkekz_en_5.5.0_3.0_1726884582353.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yerkekz_en_5.5.0_3.0_1726884582353.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_yerkekz","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_yerkekz", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_yerkekz| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yerkekz/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_yerkekz_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_yerkekz_pipeline_en.md new file mode 100644 index 00000000000000..37e28e514e1f97 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_yerkekz_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_yerkekz_pipeline pipeline DistilBertForSequenceClassification from yerkekz +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_yerkekz_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_yerkekz_pipeline` is a English model originally trained by yerkekz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yerkekz_pipeline_en_5.5.0_3.0_1726884594212.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yerkekz_pipeline_en_5.5.0_3.0_1726884594212.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a "text" column holding the documents to annotate
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_yerkekz_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with a "text" column holding the documents to annotate
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_yerkekz_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_yerkekz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yerkekz/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotions_piernik_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotions_piernik_en.md new file mode 100644 index 00000000000000..9a98644f5b28f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotions_piernik_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotions_piernik DistilBertForSequenceClassification from piernikr +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotions_piernik +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotions_piernik` is a English model originally trained by piernikr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_piernik_en_5.5.0_3.0_1726884465041.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_piernik_en_5.5.0_3.0_1726884465041.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotions_piernik","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotions_piernik", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotions_piernik| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/piernikr/distilbert-base-uncased-finetuned-emotions-piernik \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotions_piernik_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotions_piernik_pipeline_en.md new file mode 100644 index 00000000000000..a170ff6ea4573a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotions_piernik_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotions_piernik_pipeline pipeline DistilBertForSequenceClassification from piernikr +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotions_piernik_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotions_piernik_pipeline` is a English model originally trained by piernikr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_piernik_pipeline_en_5.5.0_3.0_1726884476772.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_piernik_pipeline_en_5.5.0_3.0_1726884476772.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a "text" column holding the documents to annotate
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotions_piernik_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with a "text" column holding the documents to annotate
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotions_piernik_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotions_piernik_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/piernikr/distilbert-base-uncased-finetuned-emotions-piernik + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_language_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_language_pipeline_en.md new file mode 100644 index 00000000000000..cb1ead946a031d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_language_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_language_pipeline pipeline DistilBertForSequenceClassification from davidliu0716 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_language_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_language_pipeline` is a English model originally trained by davidliu0716. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_language_pipeline_en_5.5.0_3.0_1726924110292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_language_pipeline_en_5.5.0_3.0_1726924110292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a "text" column holding the documents to annotate
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_language_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with a "text" column holding the documents to annotate
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_language_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_language_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/davidliu0716/distilbert-base-uncased-finetuned-language + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_nlp_letters_full_text_degendered_class_weighted_ben_yu_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_nlp_letters_full_text_degendered_class_weighted_ben_yu_en.md new file mode 100644 index 00000000000000..ad9eec2c363cc2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_nlp_letters_full_text_degendered_class_weighted_ben_yu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_nlp_letters_full_text_degendered_class_weighted_ben_yu DistilBertForSequenceClassification from ben-yu +author: John Snow Labs +name: distilbert_base_uncased_finetuned_nlp_letters_full_text_degendered_class_weighted_ben_yu +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_nlp_letters_full_text_degendered_class_weighted_ben_yu` is a English model originally trained by ben-yu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_nlp_letters_full_text_degendered_class_weighted_ben_yu_en_5.5.0_3.0_1726884857751.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_nlp_letters_full_text_degendered_class_weighted_ben_yu_en_5.5.0_3.0_1726884857751.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_nlp_letters_full_text_degendered_class_weighted_ben_yu","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_nlp_letters_full_text_degendered_class_weighted_ben_yu", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_nlp_letters_full_text_degendered_class_weighted_ben_yu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ben-yu/distilbert-base-uncased-finetuned-nlp-letters-full_text-degendered-class-weighted \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_nlp_letters_full_text_degendered_class_weighted_ben_yu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_nlp_letters_full_text_degendered_class_weighted_ben_yu_pipeline_en.md new file mode 100644 index 00000000000000..616e45f2a9b18c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_nlp_letters_full_text_degendered_class_weighted_ben_yu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_nlp_letters_full_text_degendered_class_weighted_ben_yu_pipeline pipeline DistilBertForSequenceClassification from ben-yu +author: John Snow Labs +name: distilbert_base_uncased_finetuned_nlp_letters_full_text_degendered_class_weighted_ben_yu_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_nlp_letters_full_text_degendered_class_weighted_ben_yu_pipeline` is a English model originally trained by ben-yu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_nlp_letters_full_text_degendered_class_weighted_ben_yu_pipeline_en_5.5.0_3.0_1726884869452.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_nlp_letters_full_text_degendered_class_weighted_ben_yu_pipeline_en_5.5.0_3.0_1726884869452.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a "text" column holding the documents to annotate
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_nlp_letters_full_text_degendered_class_weighted_ben_yu_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with a "text" column holding the documents to annotate
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_nlp_letters_full_text_degendered_class_weighted_ben_yu_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_nlp_letters_full_text_degendered_class_weighted_ben_yu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ben-yu/distilbert-base-uncased-finetuned-nlp-letters-full_text-degendered-class-weighted + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_qnli_abhinavreddy17_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_qnli_abhinavreddy17_en.md new file mode 100644 index 00000000000000..dc208a6bebd341 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_qnli_abhinavreddy17_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_qnli_abhinavreddy17 DistilBertForSequenceClassification from abhinavreddy17 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_qnli_abhinavreddy17 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_qnli_abhinavreddy17` is a English model originally trained by abhinavreddy17. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_qnli_abhinavreddy17_en_5.5.0_3.0_1726884494367.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_qnli_abhinavreddy17_en_5.5.0_3.0_1726884494367.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_qnli_abhinavreddy17","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_qnli_abhinavreddy17", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_qnli_abhinavreddy17| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abhinavreddy17/distilbert-base-uncased-finetuned-qnli \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_qnli_abhinavreddy17_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_qnli_abhinavreddy17_pipeline_en.md new file mode 100644 index 00000000000000..1c5557116f5300 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_qnli_abhinavreddy17_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_qnli_abhinavreddy17_pipeline pipeline DistilBertForSequenceClassification from abhinavreddy17 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_qnli_abhinavreddy17_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_qnli_abhinavreddy17_pipeline` is a English model originally trained by abhinavreddy17. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_qnli_abhinavreddy17_pipeline_en_5.5.0_3.0_1726884506167.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_qnli_abhinavreddy17_pipeline_en_5.5.0_3.0_1726884506167.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a "text" column holding the documents to annotate
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_qnli_abhinavreddy17_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with a "text" column holding the documents to annotate
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_qnli_abhinavreddy17_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_qnli_abhinavreddy17_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abhinavreddy17/distilbert-base-uncased-finetuned-qnli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_en.md new file mode 100644 index 00000000000000..50c7d04e1203ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000 DistilBertForSequenceClassification from Vicman229 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000` is a English model originally trained by Vicman229. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_en_5.5.0_3.0_1726884669543.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_en_5.5.0_3.0_1726884669543.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Vicman229/distilbert-base-uncased-finetuned-sst-2-english-tuning-amazon-baby-5000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_pipeline_en.md new file mode 100644 index 00000000000000..2dc6b6d373d870 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_pipeline pipeline DistilBertForSequenceClassification from Vicman229 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_pipeline` is a English model originally trained by Vicman229. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_pipeline_en_5.5.0_3.0_1726884681513.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_pipeline_en_5.5.0_3.0_1726884681513.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a "text" column holding the documents to annotate
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with a "text" column holding the documents to annotate
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Vicman229/distilbert-base-uncased-finetuned-sst-2-english-tuning-amazon-baby-5000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_en.md new file mode 100644 index 00000000000000..9687a297d1e1dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_en_5.5.0_3.0_1726884786154.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_en_5.5.0_3.0_1726884786154.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1large10PfxNf_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_pipeline_en.md new file mode 100644 index 00000000000000..d3b11a8be5e1d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_pipeline_en_5.5.0_3.0_1726884797954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_pipeline_en_5.5.0_3.0_1726884797954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a "text" column holding the documents to annotate
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with a "text" column holding the documents to annotate
val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1large10PfxNf_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp_en.md new file mode 100644 index 00000000000000..f398150cfcaf96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp_en_5.5.0_3.0_1726884765202.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp_en_5.5.0_3.0_1726884765202.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1large53PfxNf_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp_pipeline_en.md new file mode 100644 index 00000000000000..de5f147770d17c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp_pipeline_en_5.5.0_3.0_1726884777282.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp_pipeline_en_5.5.0_3.0_1726884777282.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a "text" column holding the documents to annotate
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with a "text" column holding the documents to annotate
val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1large53PfxNf_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut1large6pfxnf_simsp400_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut1large6pfxnf_simsp400_clean200_en.md new file mode 100644 index 00000000000000..913a2975288a6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut1large6pfxnf_simsp400_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st6sd_ut72ut1large6pfxnf_simsp400_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st6sd_ut72ut1large6pfxnf_simsp400_clean200 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st6sd_ut72ut1large6pfxnf_simsp400_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st6sd_ut72ut1large6pfxnf_simsp400_clean200_en_5.5.0_3.0_1726952913388.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st6sd_ut72ut1large6pfxnf_simsp400_clean200_en_5.5.0_3.0_1726952913388.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st6sd_ut72ut1large6pfxnf_simsp400_clean200","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st6sd_ut72ut1large6pfxnf_simsp400_clean200", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st6sd_ut72ut1large6pfxnf_simsp400_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st6sd_ut72ut1large6PfxNf_simsp400_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut1large6pfxnf_simsp400_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut1large6pfxnf_simsp400_clean200_pipeline_en.md new file mode 100644 index 00000000000000..b022d4624a5520 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut1large6pfxnf_simsp400_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st6sd_ut72ut1large6pfxnf_simsp400_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st6sd_ut72ut1large6pfxnf_simsp400_clean200_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st6sd_ut72ut1large6pfxnf_simsp400_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st6sd_ut72ut1large6pfxnf_simsp400_clean200_pipeline_en_5.5.0_3.0_1726952925884.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st6sd_ut72ut1large6pfxnf_simsp400_clean200_pipeline_en_5.5.0_3.0_1726952925884.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st6sd_ut72ut1large6pfxnf_simsp400_clean200_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st6sd_ut72ut1large6pfxnf_simsp400_clean200_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
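
Besides transforming a DataFrame, a loaded `PretrainedPipeline` can also be applied directly to a single string. A minimal sketch, assuming a running Spark NLP session and the pipeline name used above:

```python
from sparknlp.pretrained import PretrainedPipeline

# annotate() runs every stage on one string and returns a dict keyed by output column
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st6sd_ut72ut1large6pfxnf_simsp400_clean200_pipeline", lang="en")
print(pipeline.annotate("I love spark-nlp"))
```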
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st6sd_ut72ut1large6pfxnf_simsp400_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st6sd_ut72ut1large6PfxNf_simsp400_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st8sd_ut52ut1_plprefix0stlarge_simsp100_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st8sd_ut52ut1_plprefix0stlarge_simsp100_en.md new file mode 100644 index 00000000000000..1d1b43cc9cfaa0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st8sd_ut52ut1_plprefix0stlarge_simsp100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st8sd_ut52ut1_plprefix0stlarge_simsp100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st8sd_ut52ut1_plprefix0stlarge_simsp100 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st8sd_ut52ut1_plprefix0stlarge_simsp100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st8sd_ut52ut1_plprefix0stlarge_simsp100_en_5.5.0_3.0_1726884762567.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st8sd_ut52ut1_plprefix0stlarge_simsp100_en_5.5.0_3.0_1726884762567.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st8sd_ut52ut1_plprefix0stlarge_simsp100","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st8sd_ut52ut1_plprefix0stlarge_simsp100", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st8sd_ut52ut1_plprefix0stlarge_simsp100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st8sd_ut52ut1_PLPrefix0stlarge_simsp100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st8sd_ut52ut1_plprefix0stlarge_simsp100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st8sd_ut52ut1_plprefix0stlarge_simsp100_pipeline_en.md new file mode 100644 index 00000000000000..5ebb8d0ee30e49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st8sd_ut52ut1_plprefix0stlarge_simsp100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st8sd_ut52ut1_plprefix0stlarge_simsp100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st8sd_ut52ut1_plprefix0stlarge_simsp100_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st8sd_ut52ut1_plprefix0stlarge_simsp100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st8sd_ut52ut1_plprefix0stlarge_simsp100_pipeline_en_5.5.0_3.0_1726884774548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st8sd_ut52ut1_plprefix0stlarge_simsp100_pipeline_en_5.5.0_3.0_1726884774548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st8sd_ut52ut1_plprefix0stlarge_simsp100_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st8sd_ut52ut1_plprefix0stlarge_simsp100_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st8sd_ut52ut1_plprefix0stlarge_simsp100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st8sd_ut52ut1_PLPrefix0stlarge_simsp100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_rile_v1_fully_frozen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_rile_v1_fully_frozen_pipeline_en.md new file mode 100644 index 00000000000000..2ec578d671179f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_rile_v1_fully_frozen_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_rile_v1_fully_frozen_pipeline pipeline DistilBertForSequenceClassification from kghanlon +author: John Snow Labs +name: distilbert_base_uncased_rile_v1_fully_frozen_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_rile_v1_fully_frozen_pipeline` is a English model originally trained by kghanlon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_rile_v1_fully_frozen_pipeline_en_5.5.0_3.0_1726924448372.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_rile_v1_fully_frozen_pipeline_en_5.5.0_3.0_1726924448372.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_rile_v1_fully_frozen_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_rile_v1_fully_frozen_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_rile_v1_fully_frozen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kghanlon/distilbert-base-uncased-RILE-v1_fully_frozen + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_en.md new file mode 100644 index 00000000000000..9243f6d04693dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_en_5.5.0_3.0_1726923818099.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_en_5.5.0_3.0_1726923818099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_pipeline_en.md new file mode 100644 index 00000000000000..151a6f1fbca169 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_pipeline_en_5.5.0_3.0_1726923830310.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_pipeline_en_5.5.0_3.0_1726923830310.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_refine_cl_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_refine_cl_en.md new file mode 100644 index 00000000000000..2dc77db2b7d492 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_refine_cl_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_refine_cl DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_refine_cl +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_refine_cl` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_refine_cl_en_5.5.0_3.0_1726924300028.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_refine_cl_en_5.5.0_3.0_1726924300028.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_refine_cl","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_refine_cl", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_refine_cl| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_refine_cl \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_refine_cl_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_refine_cl_pipeline_en.md new file mode 100644 index 00000000000000..c975ff522b1620 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_refine_cl_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_refine_cl_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_refine_cl_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_refine_cl_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_refine_cl_pipeline_en_5.5.0_3.0_1726924311919.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_refine_cl_pipeline_en_5.5.0_3.0_1726924311919.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_refine_cl_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_refine_cl_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_refine_cl_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_refine_cl + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_en.md new file mode 100644 index 00000000000000..56f317e1afc0ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_en_5.5.0_3.0_1726952939075.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_en_5.5.0_3.0_1726952939075.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut102ut1_plainPrefix_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_pipeline_en.md new file mode 100644 index 00000000000000..d6f94cf2a9ea29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_pipeline_en_5.5.0_3.0_1726952952240.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_pipeline_en_5.5.0_3.0_1726952952240.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut102ut1_plainPrefix_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_en.md new file mode 100644 index 00000000000000..18c7e16d3d0789 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_en_5.5.0_3.0_1726884911510.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_en_5.5.0_3.0_1726884911510.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut72ut7_plain_sp_clean \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_pipeline_en.md new file mode 100644 index 00000000000000..2f1557953875cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_pipeline_en_5.5.0_3.0_1726884923526.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_pipeline_en_5.5.0_3.0_1726884923526.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut72ut7_plain_sp_clean + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_en.md new file mode 100644 index 00000000000000..3ef792e984d8b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_en_5.5.0_3.0_1726889100361.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_en_5.5.0_3.0_1726889100361.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_pipeline_en.md new file mode 100644 index 00000000000000..d6d1763e28cdac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_pipeline_en_5.5.0_3.0_1726889111995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_pipeline_en_5.5.0_3.0_1726889111995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_emotion_aliciiavs_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_emotion_aliciiavs_en.md new file mode 100644 index 00000000000000..ae577092e0fa8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_emotion_aliciiavs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotion_aliciiavs DistilBertForSequenceClassification from aliciiavs +author: John Snow Labs +name: distilbert_emotion_aliciiavs +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_aliciiavs` is a English model originally trained by aliciiavs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_aliciiavs_en_5.5.0_3.0_1726888838940.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_aliciiavs_en_5.5.0_3.0_1726888838940.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_aliciiavs","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_aliciiavs", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_aliciiavs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aliciiavs/distilbert-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_emotion_aliciiavs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_emotion_aliciiavs_pipeline_en.md new file mode 100644 index 00000000000000..fc848f2702fcf1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_emotion_aliciiavs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_emotion_aliciiavs_pipeline pipeline DistilBertForSequenceClassification from aliciiavs +author: John Snow Labs +name: distilbert_emotion_aliciiavs_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_aliciiavs_pipeline` is a English model originally trained by aliciiavs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_aliciiavs_pipeline_en_5.5.0_3.0_1726888852343.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_aliciiavs_pipeline_en_5.5.0_3.0_1726888852343.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_emotion_aliciiavs_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_emotion_aliciiavs_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_aliciiavs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aliciiavs/distilbert-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_emotion_sangmitra_06_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_emotion_sangmitra_06_en.md new file mode 100644 index 00000000000000..5099ff9280ff39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_emotion_sangmitra_06_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotion_sangmitra_06 DistilBertForSequenceClassification from Sangmitra-06 +author: John Snow Labs +name: distilbert_emotion_sangmitra_06 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_sangmitra_06` is a English model originally trained by Sangmitra-06. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_sangmitra_06_en_5.5.0_3.0_1726884945356.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_sangmitra_06_en_5.5.0_3.0_1726884945356.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_sangmitra_06","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_sangmitra_06", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_sangmitra_06| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Sangmitra-06/DistilBERT_emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_emotion_sangmitra_06_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_emotion_sangmitra_06_pipeline_en.md new file mode 100644 index 00000000000000..3826a268d816b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_emotion_sangmitra_06_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_emotion_sangmitra_06_pipeline pipeline DistilBertForSequenceClassification from Sangmitra-06 +author: John Snow Labs +name: distilbert_emotion_sangmitra_06_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_sangmitra_06_pipeline` is a English model originally trained by Sangmitra-06. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_sangmitra_06_pipeline_en_5.5.0_3.0_1726884956825.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_sangmitra_06_pipeline_en_5.5.0_3.0_1726884956825.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_emotion_sangmitra_06_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_emotion_sangmitra_06_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_sangmitra_06_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Sangmitra-06/DistilBERT_emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_finetuned_imdb_sentiment_uchaturvedi_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_finetuned_imdb_sentiment_uchaturvedi_en.md new file mode 100644 index 00000000000000..425f087d6ec24e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_finetuned_imdb_sentiment_uchaturvedi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_finetuned_imdb_sentiment_uchaturvedi DistilBertForSequenceClassification from uchaturvedi +author: John Snow Labs +name: distilbert_finetuned_imdb_sentiment_uchaturvedi +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_imdb_sentiment_uchaturvedi` is a English model originally trained by uchaturvedi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_sentiment_uchaturvedi_en_5.5.0_3.0_1726953156754.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_sentiment_uchaturvedi_en_5.5.0_3.0_1726953156754.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_finetuned_imdb_sentiment_uchaturvedi","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_finetuned_imdb_sentiment_uchaturvedi", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_imdb_sentiment_uchaturvedi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/uchaturvedi/distilbert-finetuned-imdb-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_finetuned_imdb_sentiment_uchaturvedi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_finetuned_imdb_sentiment_uchaturvedi_pipeline_en.md new file mode 100644 index 00000000000000..88723e1e64097f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_finetuned_imdb_sentiment_uchaturvedi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_finetuned_imdb_sentiment_uchaturvedi_pipeline pipeline DistilBertForSequenceClassification from uchaturvedi +author: John Snow Labs +name: distilbert_finetuned_imdb_sentiment_uchaturvedi_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_imdb_sentiment_uchaturvedi_pipeline` is a English model originally trained by uchaturvedi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_sentiment_uchaturvedi_pipeline_en_5.5.0_3.0_1726953168793.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_sentiment_uchaturvedi_pipeline_en_5.5.0_3.0_1726953168793.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_finetuned_imdb_sentiment_uchaturvedi_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_finetuned_imdb_sentiment_uchaturvedi_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
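+
+As a quick, illustrative alternative (an assumption, not part of the original card), the loaded pipeline can also annotate plain strings in memory; the output keys depend on the pipeline's output columns, assumed here to include `class`.
+
+```python
+# Annotate a single string; returns a dict keyed by the pipeline's output columns
+result = pipeline.annotate("I love spark-nlp")
+print(result.get("class"))
+```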
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_imdb_sentiment_uchaturvedi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/uchaturvedi/distilbert-finetuned-imdb-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_finetuning_sentime_movie_model_skspend_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_finetuning_sentime_movie_model_skspend_en.md new file mode 100644 index 00000000000000..c2e988830c74fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_finetuning_sentime_movie_model_skspend_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_finetuning_sentime_movie_model_skspend DistilBertForSequenceClassification from skspend +author: John Snow Labs +name: distilbert_finetuning_sentime_movie_model_skspend +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuning_sentime_movie_model_skspend` is a English model originally trained by skspend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuning_sentime_movie_model_skspend_en_5.5.0_3.0_1726888699651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuning_sentime_movie_model_skspend_en_5.5.0_3.0_1726888699651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_finetuning_sentime_movie_model_skspend","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_finetuning_sentime_movie_model_skspend", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuning_sentime_movie_model_skspend| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/skspend/distilbert_finetuning-sentime-movie-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_finetuning_sentime_movie_model_skspend_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_finetuning_sentime_movie_model_skspend_pipeline_en.md new file mode 100644 index 00000000000000..4d8b8137a975bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_finetuning_sentime_movie_model_skspend_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_finetuning_sentime_movie_model_skspend_pipeline pipeline DistilBertForSequenceClassification from skspend +author: John Snow Labs +name: distilbert_finetuning_sentime_movie_model_skspend_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuning_sentime_movie_model_skspend_pipeline` is a English model originally trained by skspend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuning_sentime_movie_model_skspend_pipeline_en_5.5.0_3.0_1726888711702.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuning_sentime_movie_model_skspend_pipeline_en_5.5.0_3.0_1726888711702.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_finetuning_sentime_movie_model_skspend_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_finetuning_sentime_movie_model_skspend_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuning_sentime_movie_model_skspend_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/skspend/distilbert_finetuning-sentime-movie-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_food_vespinoza_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_food_vespinoza_en.md new file mode 100644 index 00000000000000..7e6e1b180be208 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_food_vespinoza_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_food_vespinoza DistilBertForSequenceClassification from vespinoza +author: John Snow Labs +name: distilbert_food_vespinoza +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_food_vespinoza` is a English model originally trained by vespinoza. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_food_vespinoza_en_5.5.0_3.0_1726924213614.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_food_vespinoza_en_5.5.0_3.0_1726924213614.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_food_vespinoza","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_food_vespinoza", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_food_vespinoza| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/vespinoza/distilbert-food \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_food_vespinoza_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_food_vespinoza_pipeline_en.md new file mode 100644 index 00000000000000..974bdbcd582dc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_food_vespinoza_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_food_vespinoza_pipeline pipeline DistilBertForSequenceClassification from vespinoza +author: John Snow Labs +name: distilbert_food_vespinoza_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_food_vespinoza_pipeline` is a English model originally trained by vespinoza. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_food_vespinoza_pipeline_en_5.5.0_3.0_1726924226174.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_food_vespinoza_pipeline_en_5.5.0_3.0_1726924226174.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_food_vespinoza_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_food_vespinoza_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_food_vespinoza_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/vespinoza/distilbert-food + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_mouse_enhancers_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_mouse_enhancers_en.md new file mode 100644 index 00000000000000..f3047b3678a5b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_mouse_enhancers_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_mouse_enhancers DistilBertForSequenceClassification from addykan +author: John Snow Labs +name: distilbert_mouse_enhancers +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_mouse_enhancers` is a English model originally trained by addykan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_mouse_enhancers_en_5.5.0_3.0_1726953290100.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_mouse_enhancers_en_5.5.0_3.0_1726953290100.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_mouse_enhancers","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_mouse_enhancers", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_mouse_enhancers| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/addykan/distilbert-mouse-enhancers \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_mouse_enhancers_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_mouse_enhancers_pipeline_en.md new file mode 100644 index 00000000000000..44abc2a015bf81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_mouse_enhancers_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_mouse_enhancers_pipeline pipeline DistilBertForSequenceClassification from addykan +author: John Snow Labs +name: distilbert_mouse_enhancers_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_mouse_enhancers_pipeline` is a English model originally trained by addykan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_mouse_enhancers_pipeline_en_5.5.0_3.0_1726953302309.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_mouse_enhancers_pipeline_en_5.5.0_3.0_1726953302309.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_mouse_enhancers_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_mouse_enhancers_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_mouse_enhancers_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/addykan/distilbert-mouse-enhancers + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_en.md new file mode 100644 index 00000000000000..762dc806bdb9e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_en_5.5.0_3.0_1726953479755.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_en_5.5.0_3.0_1726953479755.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|111.8 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_data_aug_cola_384 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_pipeline_en.md new file mode 100644 index 00000000000000..4241b24339e645 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_pipeline_en_5.5.0_3.0_1726953485436.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_pipeline_en_5.5.0_3.0_1726953485436.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|111.9 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_data_aug_cola_384 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_sql_timeout_classifier_with_trained_tokenizer_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_sql_timeout_classifier_with_trained_tokenizer_en.md new file mode 100644 index 00000000000000..c502d96771eb3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_sql_timeout_classifier_with_trained_tokenizer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sql_timeout_classifier_with_trained_tokenizer DistilBertForSequenceClassification from Lifehouse +author: John Snow Labs +name: distilbert_sql_timeout_classifier_with_trained_tokenizer +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sql_timeout_classifier_with_trained_tokenizer` is a English model originally trained by Lifehouse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sql_timeout_classifier_with_trained_tokenizer_en_5.5.0_3.0_1726923713907.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sql_timeout_classifier_with_trained_tokenizer_en_5.5.0_3.0_1726923713907.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sql_timeout_classifier_with_trained_tokenizer","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sql_timeout_classifier_with_trained_tokenizer", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sql_timeout_classifier_with_trained_tokenizer| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|266.9 MB| + +## References + +https://huggingface.co/Lifehouse/distilbert-sql-timeout-classifier-with-trained-tokenizer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_sql_timeout_classifier_with_trained_tokenizer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_sql_timeout_classifier_with_trained_tokenizer_pipeline_en.md new file mode 100644 index 00000000000000..9c99a0473a5dfd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_sql_timeout_classifier_with_trained_tokenizer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sql_timeout_classifier_with_trained_tokenizer_pipeline pipeline DistilBertForSequenceClassification from Lifehouse +author: John Snow Labs +name: distilbert_sql_timeout_classifier_with_trained_tokenizer_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sql_timeout_classifier_with_trained_tokenizer_pipeline` is a English model originally trained by Lifehouse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sql_timeout_classifier_with_trained_tokenizer_pipeline_en_5.5.0_3.0_1726923728799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sql_timeout_classifier_with_trained_tokenizer_pipeline_en_5.5.0_3.0_1726923728799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_sql_timeout_classifier_with_trained_tokenizer_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_sql_timeout_classifier_with_trained_tokenizer_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sql_timeout_classifier_with_trained_tokenizer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|267.0 MB| + +## References + +https://huggingface.co/Lifehouse/distilbert-sql-timeout-classifier-with-trained-tokenizer + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_sst2_padding80model_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_sst2_padding80model_en.md new file mode 100644 index 00000000000000..cc653889eb2eda --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_sst2_padding80model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sst2_padding80model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_sst2_padding80model +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sst2_padding80model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sst2_padding80model_en_5.5.0_3.0_1726884465019.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sst2_padding80model_en_5.5.0_3.0_1726884465019.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sst2_padding80model","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sst2_padding80model", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sst2_padding80model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_sst2_padding80model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_sst2_padding80model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_sst2_padding80model_pipeline_en.md new file mode 100644 index 00000000000000..800fb5206c735a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_sst2_padding80model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sst2_padding80model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_sst2_padding80model_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sst2_padding80model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sst2_padding80model_pipeline_en_5.5.0_3.0_1726884480097.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sst2_padding80model_pipeline_en_5.5.0_3.0_1726884480097.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_sst2_padding80model_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_sst2_padding80model_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sst2_padding80model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_sst2_padding80model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_twitterfin_padding100model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_twitterfin_padding100model_pipeline_en.md new file mode 100644 index 00000000000000..7e401e62da65b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_twitterfin_padding100model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_twitterfin_padding100model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_twitterfin_padding100model_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_twitterfin_padding100model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding100model_pipeline_en_5.5.0_3.0_1726953383698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding100model_pipeline_en_5.5.0_3.0_1726953383698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_twitterfin_padding100model_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_twitterfin_padding100model_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_twitterfin_padding100model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_twitterfin_padding100model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_twitterfin_padding10model_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_twitterfin_padding10model_en.md new file mode 100644 index 00000000000000..9cfa916c4f682a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_twitterfin_padding10model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_twitterfin_padding10model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_twitterfin_padding10model +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_twitterfin_padding10model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding10model_en_5.5.0_3.0_1726888596795.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding10model_en_5.5.0_3.0_1726888596795.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_twitterfin_padding10model","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_twitterfin_padding10model", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_twitterfin_padding10model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_twitterfin_padding10model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_twitterfin_padding10model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_twitterfin_padding10model_pipeline_en.md new file mode 100644 index 00000000000000..4899389fb62faa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_twitterfin_padding10model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_twitterfin_padding10model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_twitterfin_padding10model_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_twitterfin_padding10model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding10model_pipeline_en_5.5.0_3.0_1726888614013.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding10model_pipeline_en_5.5.0_3.0_1726888614013.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_twitterfin_padding10model_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_twitterfin_padding10model_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_twitterfin_padding10model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_twitterfin_padding10model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilft_frisian_1ha_fy.md b/docs/_posts/ahmedlone127/2024-09-21-distilft_frisian_1ha_fy.md new file mode 100644 index 00000000000000..4623fbb9b86973 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilft_frisian_1ha_fy.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Western Frisian distilft_frisian_1ha WhisperForCTC from Pageee +author: John Snow Labs +name: distilft_frisian_1ha +date: 2024-09-21 +tags: [fy, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: fy +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilft_frisian_1ha` is a Western Frisian model originally trained by Pageee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilft_frisian_1ha_fy_5.5.0_3.0_1726894650532.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilft_frisian_1ha_fy_5.5.0_3.0_1726894650532.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("distilft_frisian_1ha","fy") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: a Spark DataFrame with an "audio_content" column holding arrays of raw audio samples
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("distilft_frisian_1ha", "fy")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: a DataFrame with an "audio_content" column holding arrays of raw audio samples
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
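+
+The example above references a `data` DataFrame of raw audio samples without constructing it. A minimal, hypothetical sketch of one way to build it is shown below; it assumes a local mono 16 kHz WAV file and the external `soundfile` library, neither of which is part of the original card.
+
+```python
+import soundfile as sf
+
+# Hypothetical local file; Whisper models expect 16 kHz mono audio
+samples, sampling_rate = sf.read("recording.wav", dtype="float32")
+
+# The AudioAssembler above reads raw samples from the "audio_content" column
+data = spark.createDataFrame([[samples.tolist()]]).toDF("audio_content")
+```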
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilft_frisian_1ha| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|fy| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Pageee/DistilFT-Frisian-1ha \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilft_frisian_1ha_pipeline_fy.md b/docs/_posts/ahmedlone127/2024-09-21-distilft_frisian_1ha_pipeline_fy.md new file mode 100644 index 00000000000000..f95894d5b24d30 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilft_frisian_1ha_pipeline_fy.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Western Frisian distilft_frisian_1ha_pipeline pipeline WhisperForCTC from Pageee +author: John Snow Labs +name: distilft_frisian_1ha_pipeline +date: 2024-09-21 +tags: [fy, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: fy +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilft_frisian_1ha_pipeline` is a Western Frisian model originally trained by Pageee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilft_frisian_1ha_pipeline_fy_5.5.0_3.0_1726894709854.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilft_frisian_1ha_pipeline_fy_5.5.0_3.0_1726894709854.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilft_frisian_1ha_pipeline", lang = "fy")
+# df: a Spark DataFrame with an "audio_content" column holding arrays of raw audio samples
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("distilft_frisian_1ha_pipeline", lang = "fy")
+// df: a DataFrame with an "audio_content" column holding arrays of raw audio samples
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilft_frisian_1ha_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fy| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Pageee/DistilFT-Frisian-1ha + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilft_frisian_1hd_fy.md b/docs/_posts/ahmedlone127/2024-09-21-distilft_frisian_1hd_fy.md new file mode 100644 index 00000000000000..de5221aa7c9385 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilft_frisian_1hd_fy.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Western Frisian distilft_frisian_1hd WhisperForCTC from Pageee +author: John Snow Labs +name: distilft_frisian_1hd +date: 2024-09-21 +tags: [fy, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: fy +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilft_frisian_1hd` is a Western Frisian model originally trained by Pageee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilft_frisian_1hd_fy_5.5.0_3.0_1726903833514.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilft_frisian_1hd_fy_5.5.0_3.0_1726903833514.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("distilft_frisian_1hd","fy") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: a Spark DataFrame with an "audio_content" column holding arrays of raw audio samples
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("distilft_frisian_1hd", "fy")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: a DataFrame with an "audio_content" column holding arrays of raw audio samples
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilft_frisian_1hd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|fy| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Pageee/DistilFT-Frisian-1hd \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilft_frisian_1hd_pipeline_fy.md b/docs/_posts/ahmedlone127/2024-09-21-distilft_frisian_1hd_pipeline_fy.md new file mode 100644 index 00000000000000..87232b72e2e131 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilft_frisian_1hd_pipeline_fy.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Western Frisian distilft_frisian_1hd_pipeline pipeline WhisperForCTC from Pageee +author: John Snow Labs +name: distilft_frisian_1hd_pipeline +date: 2024-09-21 +tags: [fy, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: fy +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilft_frisian_1hd_pipeline` is a Western Frisian model originally trained by Pageee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilft_frisian_1hd_pipeline_fy_5.5.0_3.0_1726903893132.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilft_frisian_1hd_pipeline_fy_5.5.0_3.0_1726903893132.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilft_frisian_1hd_pipeline", lang = "fy")
+# df: a Spark DataFrame with an "audio_content" column holding arrays of raw audio samples
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("distilft_frisian_1hd_pipeline", lang = "fy")
+// df: a DataFrame with an "audio_content" column holding arrays of raw audio samples
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilft_frisian_1hd_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fy| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Pageee/DistilFT-Frisian-1hd + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilkobert_ep3_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilkobert_ep3_en.md new file mode 100644 index 00000000000000..2772f42581e2de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilkobert_ep3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilkobert_ep3 DistilBertForSequenceClassification from yeye776 +author: John Snow Labs +name: distilkobert_ep3 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilkobert_ep3` is a English model originally trained by yeye776. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilkobert_ep3_en_5.5.0_3.0_1726923699841.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilkobert_ep3_en_5.5.0_3.0_1726923699841.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilkobert_ep3","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilkobert_ep3", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilkobert_ep3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|106.5 MB| + +## References + +https://huggingface.co/yeye776/DistilKoBERT-ep3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilkobert_ep3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilkobert_ep3_pipeline_en.md new file mode 100644 index 00000000000000..4f7cb94e098424 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilkobert_ep3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilkobert_ep3_pipeline pipeline DistilBertForSequenceClassification from yeye776 +author: John Snow Labs +name: distilkobert_ep3_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilkobert_ep3_pipeline` is a English model originally trained by yeye776. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilkobert_ep3_pipeline_en_5.5.0_3.0_1726923705730.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilkobert_ep3_pipeline_en_5.5.0_3.0_1726923705730.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilkobert_ep3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilkobert_ep3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
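The snippet above assumes a DataFrame `df` already exists. A minimal sketch of building one is below, assuming the pipeline reads a string column named `text` (the usual convention for DocumentAssembler-based pipelines such as this one):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilkobert_ep3_pipeline", lang="en")

# Any DataFrame with a string column named "text" can be scored.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

annotations = pipeline.transform(df)
annotations.show(truncate=False)
```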
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilkobert_ep3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|106.5 MB| + +## References + +https://huggingface.co/yeye776/DistilKoBERT-ep3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distill_pi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distill_pi_pipeline_en.md new file mode 100644 index 00000000000000..30205287c3c84a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distill_pi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distill_pi_pipeline pipeline DistilBertForSequenceClassification from dawidmt +author: John Snow Labs +name: distill_pi_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distill_pi_pipeline` is a English model originally trained by dawidmt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distill_pi_pipeline_en_5.5.0_3.0_1726953369769.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distill_pi_pipeline_en_5.5.0_3.0_1726953369769.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distill_pi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distill_pi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distill_pi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dawidmt/distill_pi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distillbert_8_en.md b/docs/_posts/ahmedlone127/2024-09-21-distillbert_8_en.md new file mode 100644 index 00000000000000..315047012b77b7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distillbert_8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distillbert_8 DistilBertForSequenceClassification from dzd828 +author: John Snow Labs +name: distillbert_8 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distillbert_8` is a English model originally trained by dzd828. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distillbert_8_en_5.5.0_3.0_1726924115259.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distillbert_8_en_5.5.0_3.0_1726924115259.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Build a simple pipeline: document assembly -> tokenization -> sequence classification.
documentAssembler = DocumentAssembler() \
  .setInputCol('text') \
  .setOutputCol('document')

tokenizer = Tokenizer() \
  .setInputCols(['document']) \
  .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distillbert_8","en") \
  .setInputCols(["document", "token"]) \
  .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distillbert_8", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distillbert_8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dzd828/distillbert-8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distillbert_8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distillbert_8_pipeline_en.md new file mode 100644 index 00000000000000..8ff136b7825f8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distillbert_8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distillbert_8_pipeline pipeline DistilBertForSequenceClassification from dzd828 +author: John Snow Labs +name: distillbert_8_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distillbert_8_pipeline` is a English model originally trained by dzd828. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distillbert_8_pipeline_en_5.5.0_3.0_1726924127253.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distillbert_8_pipeline_en_5.5.0_3.0_1726924127253.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distillbert_8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distillbert_8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
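For quick, single-string checks you can skip the DataFrame entirely: `PretrainedPipeline` also exposes `annotate`, which returns a plain dict keyed by the pipeline's output columns. A short sketch (the exact keys depend on the stages listed further below):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distillbert_8_pipeline", lang="en")

# Returns a dict such as {"document": [...], "token": [...], "class": [...]}.
result = pipeline.annotate("I love spark-nlp")
print(result)
```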
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distillbert_8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dzd828/distillbert-8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distillbert_mentalv2_en.md b/docs/_posts/ahmedlone127/2024-09-21-distillbert_mentalv2_en.md new file mode 100644 index 00000000000000..49223a03d1d187 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distillbert_mentalv2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distillbert_mentalv2 DistilBertForSequenceClassification from ColinCcz +author: John Snow Labs +name: distillbert_mentalv2 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distillbert_mentalv2` is a English model originally trained by ColinCcz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distillbert_mentalv2_en_5.5.0_3.0_1726923870165.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distillbert_mentalv2_en_5.5.0_3.0_1726923870165.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Build a simple pipeline: document assembly -> tokenization -> sequence classification.
documentAssembler = DocumentAssembler() \
  .setInputCol('text') \
  .setOutputCol('document')

tokenizer = Tokenizer() \
  .setInputCols(['document']) \
  .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distillbert_mentalv2","en") \
  .setInputCols(["document", "token"]) \
  .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distillbert_mentalv2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distillbert_mentalv2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ColinCcz/distillBERT_mentalv2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distillbert_mentalv2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distillbert_mentalv2_pipeline_en.md new file mode 100644 index 00000000000000..e74b63fb4af73e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distillbert_mentalv2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distillbert_mentalv2_pipeline pipeline DistilBertForSequenceClassification from ColinCcz +author: John Snow Labs +name: distillbert_mentalv2_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distillbert_mentalv2_pipeline` is a English model originally trained by ColinCcz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distillbert_mentalv2_pipeline_en_5.5.0_3.0_1726923882179.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distillbert_mentalv2_pipeline_en_5.5.0_3.0_1726923882179.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distillbert_mentalv2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distillbert_mentalv2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distillbert_mentalv2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ColinCcz/distillBERT_mentalv2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_finetuned_wikitext2_ethanoutangoun_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_finetuned_wikitext2_ethanoutangoun_en.md new file mode 100644 index 00000000000000..4adb0df8b435b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_finetuned_wikitext2_ethanoutangoun_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_finetuned_wikitext2_ethanoutangoun RoBertaEmbeddings from ethanoutangoun +author: John Snow Labs +name: distilroberta_base_finetuned_wikitext2_ethanoutangoun +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_wikitext2_ethanoutangoun` is a English model originally trained by ethanoutangoun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_ethanoutangoun_en_5.5.0_3.0_1726934771608.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_ethanoutangoun_en_5.5.0_3.0_1726934771608.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_finetuned_wikitext2_ethanoutangoun","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_finetuned_wikitext2_ethanoutangoun","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
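Each element of the `embeddings` column is a Spark NLP annotation whose `embeddings` field holds the token vector. Assuming the Python pipeline above has been run, the output can be flattened into one row per token like this:

```python
from pyspark.sql.functions import col, explode

token_vectors = (
    pipelineDF
    .select(explode(col("embeddings")).alias("ann"))
    .select(col("ann.result").alias("token"), col("ann.embeddings").alias("vector"))
)

token_vectors.show(truncate=80)
```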
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_wikitext2_ethanoutangoun| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/ethanoutangoun/distilroberta-base-finetuned-wikitext2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_finetuned_wikitext2_ethanoutangoun_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_finetuned_wikitext2_ethanoutangoun_pipeline_en.md new file mode 100644 index 00000000000000..be1a099c43fcc8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_finetuned_wikitext2_ethanoutangoun_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_finetuned_wikitext2_ethanoutangoun_pipeline pipeline RoBertaEmbeddings from ethanoutangoun +author: John Snow Labs +name: distilroberta_base_finetuned_wikitext2_ethanoutangoun_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_wikitext2_ethanoutangoun_pipeline` is a English model originally trained by ethanoutangoun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_ethanoutangoun_pipeline_en_5.5.0_3.0_1726934785651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_ethanoutangoun_pipeline_en_5.5.0_3.0_1726934785651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_finetuned_wikitext2_ethanoutangoun_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_finetuned_wikitext2_ethanoutangoun_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_wikitext2_ethanoutangoun_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/ethanoutangoun/distilroberta-base-finetuned-wikitext2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_finetuned_wikitext2_giannifiore_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_finetuned_wikitext2_giannifiore_en.md new file mode 100644 index 00000000000000..776e9ff66faa7d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_finetuned_wikitext2_giannifiore_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_finetuned_wikitext2_giannifiore RoBertaEmbeddings from giannifiore +author: John Snow Labs +name: distilroberta_base_finetuned_wikitext2_giannifiore +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_wikitext2_giannifiore` is a English model originally trained by giannifiore. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_giannifiore_en_5.5.0_3.0_1726942219279.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_giannifiore_en_5.5.0_3.0_1726942219279.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_finetuned_wikitext2_giannifiore","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_finetuned_wikitext2_giannifiore","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
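If downstream Spark ML stages need plain vectors rather than annotation structs, Spark NLP's `EmbeddingsFinisher` can be appended to the same pipeline. A sketch reusing the Python stages defined above:

```python
from sparknlp.base import EmbeddingsFinisher
from pyspark.ml import Pipeline

finisher = EmbeddingsFinisher() \
  .setInputCols(["embeddings"]) \
  .setOutputCols(["finished_embeddings"]) \
  .setOutputAsVector(True)

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, finisher])
model = pipeline.fit(data)
model.transform(data).select("finished_embeddings").show(truncate=False)
```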
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_wikitext2_giannifiore| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/giannifiore/distilroberta-base-finetuned-wikitext2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_finetuned_wikitext2_giannifiore_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_finetuned_wikitext2_giannifiore_pipeline_en.md new file mode 100644 index 00000000000000..c8056db63f4c14 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_finetuned_wikitext2_giannifiore_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_finetuned_wikitext2_giannifiore_pipeline pipeline RoBertaEmbeddings from giannifiore +author: John Snow Labs +name: distilroberta_base_finetuned_wikitext2_giannifiore_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_wikitext2_giannifiore_pipeline` is a English model originally trained by giannifiore. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_giannifiore_pipeline_en_5.5.0_3.0_1726942235990.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_giannifiore_pipeline_en_5.5.0_3.0_1726942235990.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_finetuned_wikitext2_giannifiore_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_finetuned_wikitext2_giannifiore_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_wikitext2_giannifiore_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/giannifiore/distilroberta-base-finetuned-wikitext2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_climateskeptics_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_climateskeptics_en.md new file mode 100644 index 00000000000000..2467b54b2832ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_climateskeptics_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_ft_climateskeptics RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_climateskeptics +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_climateskeptics` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_climateskeptics_en_5.5.0_3.0_1726943703970.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_climateskeptics_en_5.5.0_3.0_1726943703970.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_climateskeptics","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_climateskeptics","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_climateskeptics| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-climateskeptics \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_climateskeptics_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_climateskeptics_pipeline_en.md new file mode 100644 index 00000000000000..7919c4a3fe2447 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_climateskeptics_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_ft_climateskeptics_pipeline pipeline RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_climateskeptics_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_climateskeptics_pipeline` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_climateskeptics_pipeline_en_5.5.0_3.0_1726943719532.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_climateskeptics_pipeline_en_5.5.0_3.0_1726943719532.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_ft_climateskeptics_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_ft_climateskeptics_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
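If you are unsure what a downloaded pipeline expects or produces, you can inspect its stages; `PretrainedPipeline` wraps an ordinary Spark ML `PipelineModel` (exposed as the `model` attribute in the Python API, as assumed below):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilroberta_base_ft_climateskeptics_pipeline", lang="en")

# Should list DocumentAssembler, TokenizerModel and RoBertaEmbeddings,
# matching the "Included Models" section of this card.
for stage in pipeline.model.stages:
    print(type(stage).__name__)
```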
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_climateskeptics_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-climateskeptics + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_consoom_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_consoom_en.md new file mode 100644 index 00000000000000..3b2e1576afe683 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_consoom_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_ft_consoom RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_consoom +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_consoom` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_consoom_en_5.5.0_3.0_1726943857114.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_consoom_en_5.5.0_3.0_1726943857114.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_consoom","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_consoom","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_consoom| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-consoom \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_consoom_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_consoom_pipeline_en.md new file mode 100644 index 00000000000000..cb0d98bd4016cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_consoom_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_ft_consoom_pipeline pipeline RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_consoom_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_consoom_pipeline` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_consoom_pipeline_en_5.5.0_3.0_1726943871596.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_consoom_pipeline_en_5.5.0_3.0_1726943871596.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_ft_consoom_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_ft_consoom_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_consoom_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-consoom + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_dating_advice_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_dating_advice_en.md new file mode 100644 index 00000000000000..ce34d6161f5d0c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_dating_advice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_ft_dating_advice RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_dating_advice +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_dating_advice` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_dating_advice_en_5.5.0_3.0_1726958209818.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_dating_advice_en_5.5.0_3.0_1726958209818.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_dating_advice","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_dating_advice","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
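Token vectors can be mean-pooled into a single document vector for quick similarity experiments. A rough sketch, assuming the Python pipeline above has been run; it collects to the driver, which is fine for a handful of rows but not for large datasets:

```python
import numpy as np

rows = pipelineDF.select("embeddings.embeddings").collect()

for row in rows:
    token_vectors = np.array(row["embeddings"])  # shape: (num_tokens, hidden_size)
    doc_vector = token_vectors.mean(axis=0)      # simple mean pooling
    print(doc_vector.shape)
```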
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_dating_advice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-dating_advice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_dating_advice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_dating_advice_pipeline_en.md new file mode 100644 index 00000000000000..a7342be1b7d63b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_dating_advice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_ft_dating_advice_pipeline pipeline RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_dating_advice_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_dating_advice_pipeline` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_dating_advice_pipeline_en_5.5.0_3.0_1726958224471.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_dating_advice_pipeline_en_5.5.0_3.0_1726958224471.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_ft_dating_advice_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_ft_dating_advice_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_dating_advice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-dating_advice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_goldandblack_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_goldandblack_en.md new file mode 100644 index 00000000000000..fc91f13a92eab4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_goldandblack_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_ft_goldandblack RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_goldandblack +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_goldandblack` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_goldandblack_en_5.5.0_3.0_1726957688499.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_goldandblack_en_5.5.0_3.0_1726957688499.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_goldandblack","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_goldandblack","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
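The fitted `pipelineModel` can be reused to score any number of rows; there is no need to fit again per document. A brief sketch with two hypothetical texts:

```python
batch = spark.createDataFrame(
    [["Gold prices moved sharply today"],
     ["The central bank left rates unchanged"]]
).toDF("text")

# "embeddings.result" shows the tokens that were embedded for each row.
pipelineModel.transform(batch).select("text", "embeddings.result").show(truncate=60)
```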
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_goldandblack| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-goldandblack \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_reuters_bloomberg_ep30_ep20_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_reuters_bloomberg_ep30_ep20_en.md new file mode 100644 index 00000000000000..6f80bf15bf7e49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_reuters_bloomberg_ep30_ep20_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_reuters_bloomberg_ep30_ep20 RoBertaEmbeddings from judy93536 +author: John Snow Labs +name: distilroberta_base_reuters_bloomberg_ep30_ep20 +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_reuters_bloomberg_ep30_ep20` is a English model originally trained by judy93536. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_reuters_bloomberg_ep30_ep20_en_5.5.0_3.0_1726882494020.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_reuters_bloomberg_ep30_ep20_en_5.5.0_3.0_1726882494020.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_reuters_bloomberg_ep30_ep20","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_reuters_bloomberg_ep30_ep20","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_reuters_bloomberg_ep30_ep20| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.0 MB| + +## References + +https://huggingface.co/judy93536/distilroberta-base-reuters-bloomberg-ep30-ep20 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_reuters_bloomberg_ep30_ep20_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_reuters_bloomberg_ep30_ep20_pipeline_en.md new file mode 100644 index 00000000000000..9ec30a95aca223 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_reuters_bloomberg_ep30_ep20_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_reuters_bloomberg_ep30_ep20_pipeline pipeline RoBertaEmbeddings from judy93536 +author: John Snow Labs +name: distilroberta_base_reuters_bloomberg_ep30_ep20_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_reuters_bloomberg_ep30_ep20_pipeline` is a English model originally trained by judy93536. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_reuters_bloomberg_ep30_ep20_pipeline_en_5.5.0_3.0_1726882508007.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_reuters_bloomberg_ep30_ep20_pipeline_en_5.5.0_3.0_1726882508007.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_reuters_bloomberg_ep30_ep20_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_reuters_bloomberg_ep30_ep20_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_reuters_bloomberg_ep30_ep20_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.0 MB| + +## References + +https://huggingface.co/judy93536/distilroberta-base-reuters-bloomberg-ep30-ep20 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_test_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_test_en.md new file mode 100644 index 00000000000000..e05d40527af2e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_test_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_test RoBertaForSequenceClassification from yu-jia-wang +author: John Snow Labs +name: distilroberta_base_test +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_test` is a English model originally trained by yu-jia-wang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_test_en_5.5.0_3.0_1726940674661.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_test_en_5.5.0_3.0_1726940674661.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Build a simple pipeline: document assembly -> tokenization -> sequence classification.
documentAssembler = DocumentAssembler() \
  .setInputCol('text') \
  .setOutputCol('document')

tokenizer = Tokenizer() \
  .setInputCols(['document']) \
  .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_base_test","en") \
  .setInputCols(["document", "token"]) \
  .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_base_test", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
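Besides the predicted label in `result`, the `class` annotations usually carry per-label confidence scores in their `metadata` map (the exact keys depend on the model's label names). A short follow-up, assuming the Python pipeline above:

```python
from pyspark.sql.functions import col, explode

(
    pipelineDF
    .select(explode(col("class")).alias("ann"))
    .select(col("ann.result").alias("label"), col("ann.metadata").alias("scores"))
    .show(truncate=False)
)
```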
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_test| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.5 MB| + +## References + +https://huggingface.co/yu-jia-wang/distilroberta-base-test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_test_pipeline_en.md new file mode 100644 index 00000000000000..2e6b2f024d2804 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_test_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_test_pipeline pipeline RoBertaForSequenceClassification from yu-jia-wang +author: John Snow Labs +name: distilroberta_base_test_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_test_pipeline` is a English model originally trained by yu-jia-wang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_test_pipeline_en_5.5.0_3.0_1726940689856.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_test_pipeline_en_5.5.0_3.0_1726940689856.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_test_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_test_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
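
The snippet above assumes a DataFrame `df` with a `text` column already exists. A minimal self-contained sketch, assuming an active Spark NLP session and access to the public models repository:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Build a small DataFrame with the "text" column the pipeline expects.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilroberta_base_test_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.show(truncate=False)

# For a single string, annotate() returns a plain Python dict of results.
print(pipeline.annotate("I love spark-nlp"))
```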
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.5 MB| + +## References + +https://huggingface.co/yu-jia-wang/distilroberta-base-test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_testingsb_testingsb_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_testingsb_testingsb_en.md new file mode 100644 index 00000000000000..eea6f182c3fe85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_testingsb_testingsb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_testingsb_testingsb RoBertaEmbeddings from MistahCase +author: John Snow Labs +name: distilroberta_base_testingsb_testingsb +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_testingsb_testingsb` is a English model originally trained by MistahCase. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_testingsb_testingsb_en_5.5.0_3.0_1726882067130.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_testingsb_testingsb_en_5.5.0_3.0_1726882067130.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_testingsb_testingsb","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_testingsb_testingsb","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
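
Each annotation in the `embeddings` column keeps the token text in its `result` field and the vector in its `embeddings` field. A small sketch for flattening the output into (token, vector) rows, assuming the `pipelineDF` produced above:

```python
from pyspark.sql import functions as F

# Explode the token-level annotations and keep the token text and its vector.
tokens = pipelineDF.select(F.explode("embeddings").alias("ann")) \
    .select(F.col("ann.result").alias("token"),
            F.col("ann.embeddings").alias("vector"))
tokens.show(truncate=80)
```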
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_testingsb_testingsb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/MistahCase/distilroberta-base-testingSB-testingSB \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_testingsb_testingsb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_testingsb_testingsb_pipeline_en.md new file mode 100644 index 00000000000000..ec15ba6bb47a02 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_testingsb_testingsb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_testingsb_testingsb_pipeline pipeline RoBertaEmbeddings from MistahCase +author: John Snow Labs +name: distilroberta_base_testingsb_testingsb_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_testingsb_testingsb_pipeline` is a English model originally trained by MistahCase. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_testingsb_testingsb_pipeline_en_5.5.0_3.0_1726882081506.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_testingsb_testingsb_pipeline_en_5.5.0_3.0_1726882081506.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_testingsb_testingsb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_testingsb_testingsb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_testingsb_testingsb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/MistahCase/distilroberta-base-testingSB-testingSB + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_rbm213k_ep40_ep20_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_rbm213k_ep40_ep20_en.md new file mode 100644 index 00000000000000..0ca656c3ec42e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_rbm213k_ep40_ep20_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_rbm213k_ep40_ep20 RoBertaEmbeddings from judy93536 +author: John Snow Labs +name: distilroberta_rbm213k_ep40_ep20 +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_rbm213k_ep40_ep20` is a English model originally trained by judy93536. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_rbm213k_ep40_ep20_en_5.5.0_3.0_1726957827336.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_rbm213k_ep40_ep20_en_5.5.0_3.0_1726957827336.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_rbm213k_ep40_ep20","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_rbm213k_ep40_ep20","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
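
When a single vector per document is preferred over per-token vectors, the token embeddings from the example above can be averaged with the `SentenceEmbeddings` annotator. A sketch of one common setup, reusing the stages defined above:

```python
from pyspark.ml import Pipeline
from sparknlp.annotator import SentenceEmbeddings

# Pool the token vectors from the "embeddings" column into one vector per document.
sentenceEmbeddings = SentenceEmbeddings() \
    .setInputCols(["document", "embeddings"]) \
    .setOutputCol("sentence_embeddings") \
    .setPoolingStrategy("AVERAGE")

pooledPipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, sentenceEmbeddings])
pooledDF = pooledPipeline.fit(data).transform(data)
pooledDF.selectExpr("explode(sentence_embeddings.embeddings) as doc_vector").show(truncate=80)
```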
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_rbm213k_ep40_ep20| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.1 MB| + +## References + +https://huggingface.co/judy93536/distilroberta-rbm213k-ep40-ep20 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_rbm213k_ep40_ep20_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_rbm213k_ep40_ep20_pipeline_en.md new file mode 100644 index 00000000000000..511ca523cddb5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_rbm213k_ep40_ep20_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_rbm213k_ep40_ep20_pipeline pipeline RoBertaEmbeddings from judy93536 +author: John Snow Labs +name: distilroberta_rbm213k_ep40_ep20_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_rbm213k_ep40_ep20_pipeline` is a English model originally trained by judy93536. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_rbm213k_ep40_ep20_pipeline_en_5.5.0_3.0_1726957842105.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_rbm213k_ep40_ep20_pipeline_en_5.5.0_3.0_1726957842105.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_rbm213k_ep40_ep20_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_rbm213k_ep40_ep20_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_rbm213k_ep40_ep20_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.1 MB| + +## References + +https://huggingface.co/judy93536/distilroberta-rbm213k-ep40-ep20 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_topic_classification_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_topic_classification_3_pipeline_en.md new file mode 100644 index 00000000000000..f328a55821b159 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_topic_classification_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_topic_classification_3_pipeline pipeline RoBertaForSequenceClassification from abdulmatinomotoso +author: John Snow Labs +name: distilroberta_topic_classification_3_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_topic_classification_3_pipeline` is a English model originally trained by abdulmatinomotoso. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_topic_classification_3_pipeline_en_5.5.0_3.0_1726940706676.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_topic_classification_3_pipeline_en_5.5.0_3.0_1726940706676.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_topic_classification_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_topic_classification_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_topic_classification_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.4 MB| + +## References + +https://huggingface.co/abdulmatinomotoso/distilroberta-topic-classification_3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-domain_specific_finetuning_en.md b/docs/_posts/ahmedlone127/2024-09-21-domain_specific_finetuning_en.md new file mode 100644 index 00000000000000..a3932e8747f798 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-domain_specific_finetuning_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English domain_specific_finetuning RoBertaEmbeddings from pavi156 +author: John Snow Labs +name: domain_specific_finetuning +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`domain_specific_finetuning` is a English model originally trained by pavi156. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/domain_specific_finetuning_en_5.5.0_3.0_1726934086142.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/domain_specific_finetuning_en_5.5.0_3.0_1726934086142.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("domain_specific_finetuning","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("domain_specific_finetuning","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|domain_specific_finetuning| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.4 MB| + +## References + +https://huggingface.co/pavi156/domain_specific_finetuning \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_en.md b/docs/_posts/ahmedlone127/2024-09-21-emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_en.md new file mode 100644 index 00000000000000..d4e33c763fa16a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1726900183767.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1726900183767.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.4 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random0_seed0-twitter-roberta-base-2022-154m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_pipeline_en.md new file mode 100644 index 00000000000000..a1f19f1f77af80 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1726900206240.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1726900206240.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.4 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random0_seed0-twitter-roberta-base-2022-154m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_en.md b/docs/_posts/ahmedlone127/2024-09-21-emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_en.md new file mode 100644 index 00000000000000..f5cf8240f6d8af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3 RoBertaForSequenceClassification from adnanakbr +author: John Snow Labs +name: emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3` is a English model originally trained by adnanakbr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_en_5.5.0_3.0_1726940774612.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_en_5.5.0_3.0_1726940774612.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/adnanakbr/emotion-english-distilroberta-base-fine_tuned_for_amazon_english_reviews_on_200K_review_v3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_pipeline_en.md new file mode 100644 index 00000000000000..7d7fd69b21bf3e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_pipeline pipeline RoBertaForSequenceClassification from adnanakbr +author: John Snow Labs +name: emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_pipeline` is a English model originally trained by adnanakbr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_pipeline_en_5.5.0_3.0_1726940789123.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_pipeline_en_5.5.0_3.0_1726940789123.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.1 MB| + +## References + +https://huggingface.co/adnanakbr/emotion-english-distilroberta-base-fine_tuned_for_amazon_english_reviews_on_200K_review_v3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-emscad_skill_extraction_conference_en.md b/docs/_posts/ahmedlone127/2024-09-21-emscad_skill_extraction_conference_en.md new file mode 100644 index 00000000000000..bb4dcf2d75259f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-emscad_skill_extraction_conference_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emscad_skill_extraction_conference BertForSequenceClassification from Ivo +author: John Snow Labs +name: emscad_skill_extraction_conference +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emscad_skill_extraction_conference` is a English model originally trained by Ivo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emscad_skill_extraction_conference_en_5.5.0_3.0_1726956396052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emscad_skill_extraction_conference_en_5.5.0_3.0_1726956396052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("emscad_skill_extraction_conference","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("emscad_skill_extraction_conference", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emscad_skill_extraction_conference| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Ivo/emscad-skill-extraction-conference \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-emscad_skill_extraction_conference_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-emscad_skill_extraction_conference_pipeline_en.md new file mode 100644 index 00000000000000..97b79cdb4c54df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-emscad_skill_extraction_conference_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emscad_skill_extraction_conference_pipeline pipeline BertForSequenceClassification from Ivo +author: John Snow Labs +name: emscad_skill_extraction_conference_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emscad_skill_extraction_conference_pipeline` is a English model originally trained by Ivo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emscad_skill_extraction_conference_pipeline_en_5.5.0_3.0_1726956415213.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emscad_skill_extraction_conference_pipeline_en_5.5.0_3.0_1726956415213.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("emscad_skill_extraction_conference_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("emscad_skill_extraction_conference_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emscad_skill_extraction_conference_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Ivo/emscad-skill-extraction-conference + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-enviroduediligence_lm_en.md b/docs/_posts/ahmedlone127/2024-09-21-enviroduediligence_lm_en.md new file mode 100644 index 00000000000000..53b0921ffad7f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-enviroduediligence_lm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English enviroduediligence_lm RoBertaEmbeddings from d4data +author: John Snow Labs +name: enviroduediligence_lm +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`enviroduediligence_lm` is a English model originally trained by d4data. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/enviroduediligence_lm_en_5.5.0_3.0_1726944063472.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/enviroduediligence_lm_en_5.5.0_3.0_1726944063472.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("enviroduediligence_lm","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("enviroduediligence_lm","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|enviroduediligence_lm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|310.4 MB| + +## References + +https://huggingface.co/d4data/EnviroDueDiligence_LM \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-enviroduediligence_lm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-enviroduediligence_lm_pipeline_en.md new file mode 100644 index 00000000000000..6b056952eb449e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-enviroduediligence_lm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English enviroduediligence_lm_pipeline pipeline RoBertaEmbeddings from d4data +author: John Snow Labs +name: enviroduediligence_lm_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`enviroduediligence_lm_pipeline` is a English model originally trained by d4data. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/enviroduediligence_lm_pipeline_en_5.5.0_3.0_1726944077556.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/enviroduediligence_lm_pipeline_en_5.5.0_3.0_1726944077556.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("enviroduediligence_lm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("enviroduediligence_lm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|enviroduediligence_lm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|310.4 MB| + +## References + +https://huggingface.co/d4data/EnviroDueDiligence_LM + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-esperberto_corgikoh_en.md b/docs/_posts/ahmedlone127/2024-09-21-esperberto_corgikoh_en.md new file mode 100644 index 00000000000000..243f6722b29018 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-esperberto_corgikoh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English esperberto_corgikoh RoBertaEmbeddings from corgikoh +author: John Snow Labs +name: esperberto_corgikoh +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`esperberto_corgikoh` is a English model originally trained by corgikoh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/esperberto_corgikoh_en_5.5.0_3.0_1726881955142.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/esperberto_corgikoh_en_5.5.0_3.0_1726881955142.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("esperberto_corgikoh","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("esperberto_corgikoh","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|esperberto_corgikoh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.4 MB| + +## References + +https://huggingface.co/corgikoh/EsperBERTo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-esperberto_corgikoh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-esperberto_corgikoh_pipeline_en.md new file mode 100644 index 00000000000000..ed69dfb9d3887b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-esperberto_corgikoh_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English esperberto_corgikoh_pipeline pipeline RoBertaEmbeddings from corgikoh +author: John Snow Labs +name: esperberto_corgikoh_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`esperberto_corgikoh_pipeline` is a English model originally trained by corgikoh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/esperberto_corgikoh_pipeline_en_5.5.0_3.0_1726881969960.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/esperberto_corgikoh_pipeline_en_5.5.0_3.0_1726881969960.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("esperberto_corgikoh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("esperberto_corgikoh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|esperberto_corgikoh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.5 MB| + +## References + +https://huggingface.co/corgikoh/EsperBERTo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-fairlex_scotus_minilm_en.md b/docs/_posts/ahmedlone127/2024-09-21-fairlex_scotus_minilm_en.md new file mode 100644 index 00000000000000..054ab6d50de4c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-fairlex_scotus_minilm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fairlex_scotus_minilm RoBertaEmbeddings from coastalcph +author: John Snow Labs +name: fairlex_scotus_minilm +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fairlex_scotus_minilm` is a English model originally trained by coastalcph. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fairlex_scotus_minilm_en_5.5.0_3.0_1726934184452.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fairlex_scotus_minilm_en_5.5.0_3.0_1726934184452.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("fairlex_scotus_minilm","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("fairlex_scotus_minilm","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fairlex_scotus_minilm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|114.0 MB| + +## References + +https://huggingface.co/coastalcph/fairlex-scotus-minilm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-fairlex_scotus_minilm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-fairlex_scotus_minilm_pipeline_en.md new file mode 100644 index 00000000000000..5d92de2096fb97 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-fairlex_scotus_minilm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fairlex_scotus_minilm_pipeline pipeline RoBertaEmbeddings from coastalcph +author: John Snow Labs +name: fairlex_scotus_minilm_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fairlex_scotus_minilm_pipeline` is a English model originally trained by coastalcph. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fairlex_scotus_minilm_pipeline_en_5.5.0_3.0_1726934189708.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fairlex_scotus_minilm_pipeline_en_5.5.0_3.0_1726934189708.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fairlex_scotus_minilm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fairlex_scotus_minilm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fairlex_scotus_minilm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|114.0 MB| + +## References + +https://huggingface.co/coastalcph/fairlex-scotus-minilm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-fieldclassifier_v2_en.md b/docs/_posts/ahmedlone127/2024-09-21-fieldclassifier_v2_en.md new file mode 100644 index 00000000000000..f622cec9ca7fb9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-fieldclassifier_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fieldclassifier_v2 BertForSequenceClassification from CleveGreen +author: John Snow Labs +name: fieldclassifier_v2 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fieldclassifier_v2` is a English model originally trained by CleveGreen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fieldclassifier_v2_en_5.5.0_3.0_1726956616182.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fieldclassifier_v2_en_5.5.0_3.0_1726956616182.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("fieldclassifier_v2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("fieldclassifier_v2", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
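
For quick single-text inference without assembling a DataFrame, the fitted pipeline can be wrapped in a `LightPipeline`. A sketch reusing `pipelineModel` from the example above; the input sentence is only illustrative, and the returned labels depend on the fine-tuned model:

```python
from sparknlp.base import LightPipeline

# In-memory inference on a single string with the fitted PipelineModel.
light = LightPipeline(pipelineModel)
result = light.annotate("Advances in machine learning are reshaping many research fields.")
print(result["class"])
```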
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fieldclassifier_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/CleveGreen/FieldClassifier_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-fieldclassifier_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-fieldclassifier_v2_pipeline_en.md new file mode 100644 index 00000000000000..41f68ac1ca5d2d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-fieldclassifier_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fieldclassifier_v2_pipeline pipeline BertForSequenceClassification from CleveGreen +author: John Snow Labs +name: fieldclassifier_v2_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fieldclassifier_v2_pipeline` is a English model originally trained by CleveGreen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fieldclassifier_v2_pipeline_en_5.5.0_3.0_1726956634651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fieldclassifier_v2_pipeline_en_5.5.0_3.0_1726956634651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fieldclassifier_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fieldclassifier_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fieldclassifier_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.0 MB| + +## References + +https://huggingface.co/CleveGreen/FieldClassifier_v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-final_finetuned_model_en.md b/docs/_posts/ahmedlone127/2024-09-21-final_finetuned_model_en.md new file mode 100644 index 00000000000000..6133c3400f1c6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-final_finetuned_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English final_finetuned_model DistilBertForSequenceClassification from rahulgaikwad007 +author: John Snow Labs +name: final_finetuned_model +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_finetuned_model` is a English model originally trained by rahulgaikwad007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_finetuned_model_en_5.5.0_3.0_1726924222361.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_finetuned_model_en_5.5.0_3.0_1726924222361.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("final_finetuned_model","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("final_finetuned_model", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
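The predicted labels are returned as Spark NLP annotations in the `class` column. A minimal sketch for inspecting them, assuming the `pipelineDF` produced above:

```python
# Pull the predicted label string(s) out of the annotation structs (assumes pipelineDF from above)
pipelineDF.select("text", "class.result").show(truncate=False)
```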
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_finetuned_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rahulgaikwad007/Final-Finetuned-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-final_finetuned_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-final_finetuned_model_pipeline_en.md new file mode 100644 index 00000000000000..b20e01b6174fae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-final_finetuned_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English final_finetuned_model_pipeline pipeline DistilBertForSequenceClassification from rahulgaikwad007 +author: John Snow Labs +name: final_finetuned_model_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_finetuned_model_pipeline` is a English model originally trained by rahulgaikwad007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_finetuned_model_pipeline_en_5.5.0_3.0_1726924234603.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_finetuned_model_pipeline_en_5.5.0_3.0_1726924234603.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame holding the input column(s) this pipeline expects
pipeline = PretrainedPipeline("final_finetuned_model_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame holding the input column(s) this pipeline expects
val pipeline = new PretrainedPipeline("final_finetuned_model_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_finetuned_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rahulgaikwad007/Final-Finetuned-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_en.md b/docs/_posts/ahmedlone127/2024-09-21-fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_en.md new file mode 100644 index 00000000000000..e534c19381d391 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001 BertForQuestionAnswering from muhammadravi251001 +author: John Snow Labs +name: fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001 +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001` is a English model originally trained by muhammadravi251001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_en_5.5.0_3.0_1726928935633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_en_5.5.0_3.0_1726928935633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
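The extracted answer span is emitted in the `answer` column. A minimal sketch for reading it back out, assuming the `pipelineDF` built above:

```python
# Each row keeps the question alongside the answer text predicted by the span classifier
pipelineDF.selectExpr("question", "answer.result as answer").show(truncate=False)
```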
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|411.7 MB| + +## References + +https://huggingface.co/muhammadravi251001/fine-tuned-DatasetQAS-Squad-ID-with-indobert-base-uncased-without-ITTL-without-freeze-LR-1e-05 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline_en.md new file mode 100644 index 00000000000000..cc5e0eb55be495 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline pipeline BertForQuestionAnswering from muhammadravi251001 +author: John Snow Labs +name: fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline` is a English model originally trained by muhammadravi251001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline_en_5.5.0_3.0_1726928954251.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline_en_5.5.0_3.0_1726928954251.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame holding the input column(s) this pipeline expects
pipeline = PretrainedPipeline("fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame holding the input column(s) this pipeline expects
val pipeline = new PretrainedPipeline("fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.7 MB| + +## References + +https://huggingface.co/muhammadravi251001/fine-tuned-DatasetQAS-Squad-ID-with-indobert-base-uncased-without-ITTL-without-freeze-LR-1e-05 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-fine_tuned_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-fine_tuned_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..3016ff1f2591c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-fine_tuned_distilbert_pipeline_en.md @@ -0,0 +1,72 @@ +--- +layout: model +title: English fine_tuned_distilbert_pipeline pipeline DistilBertForQuestionAnswering from Roamify +author: John Snow Labs +name: fine_tuned_distilbert_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_distilbert_pipeline` is a English model originally trained by Roamify. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_distilbert_pipeline_en_5.5.0_3.0_1726924135235.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_distilbert_pipeline_en_5.5.0_3.0_1726924135235.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame holding the input column(s) this pipeline expects
pipeline = PretrainedPipeline("fine_tuned_distilbert_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame holding the input column(s) this pipeline expects
val pipeline = new PretrainedPipeline("fine_tuned_distilbert_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +References + +https://huggingface.co/Roamify/fine-tuned-distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuned_demo_2_nardellu_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuned_demo_2_nardellu_en.md new file mode 100644 index 00000000000000..d86e2719b72b08 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuned_demo_2_nardellu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_demo_2_nardellu DistilBertForSequenceClassification from nardellu +author: John Snow Labs +name: finetuned_demo_2_nardellu +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_demo_2_nardellu` is a English model originally trained by nardellu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_demo_2_nardellu_en_5.5.0_3.0_1726953348834.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_demo_2_nardellu_en_5.5.0_3.0_1726953348834.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_demo_2_nardellu","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_demo_2_nardellu", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_demo_2_nardellu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/nardellu/finetuned_demo_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuned_demo_2_nardellu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuned_demo_2_nardellu_pipeline_en.md new file mode 100644 index 00000000000000..ae07d1ba32f314 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuned_demo_2_nardellu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_demo_2_nardellu_pipeline pipeline DistilBertForSequenceClassification from nardellu +author: John Snow Labs +name: finetuned_demo_2_nardellu_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_demo_2_nardellu_pipeline` is a English model originally trained by nardellu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_demo_2_nardellu_pipeline_en_5.5.0_3.0_1726953360871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_demo_2_nardellu_pipeline_en_5.5.0_3.0_1726953360871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame holding the input column(s) this pipeline expects
pipeline = PretrainedPipeline("finetuned_demo_2_nardellu_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame holding the input column(s) this pipeline expects
val pipeline = new PretrainedPipeline("finetuned_demo_2_nardellu_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_demo_2_nardellu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/nardellu/finetuned_demo_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuned_distilbert_for_reddit_depression_detection_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuned_distilbert_for_reddit_depression_detection_en.md new file mode 100644 index 00000000000000..4d7ce3348bbede --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuned_distilbert_for_reddit_depression_detection_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_distilbert_for_reddit_depression_detection DistilBertForSequenceClassification from sunF1ow3r +author: John Snow Labs +name: finetuned_distilbert_for_reddit_depression_detection +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_distilbert_for_reddit_depression_detection` is a English model originally trained by sunF1ow3r. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_distilbert_for_reddit_depression_detection_en_5.5.0_3.0_1726953713804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_distilbert_for_reddit_depression_detection_en_5.5.0_3.0_1726953713804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_distilbert_for_reddit_depression_detection","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_distilbert_for_reddit_depression_detection", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
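Once fitted, the whole pipeline can be persisted and reloaded without re-downloading the model. A minimal sketch, assuming `pipelineModel` from above; the path is only an example, any local or HDFS path works:

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline and load it back later
pipelineModel.write().overwrite().save("/tmp/reddit_depression_pipeline")
restored = PipelineModel.load("/tmp/reddit_depression_pipeline")
```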
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_distilbert_for_reddit_depression_detection| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sunF1ow3r/finetuned-distilBERT-for-reddit-depression-detection \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuned_distilbert_for_reddit_depression_detection_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuned_distilbert_for_reddit_depression_detection_pipeline_en.md new file mode 100644 index 00000000000000..d3649033381f4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuned_distilbert_for_reddit_depression_detection_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_distilbert_for_reddit_depression_detection_pipeline pipeline DistilBertForSequenceClassification from sunF1ow3r +author: John Snow Labs +name: finetuned_distilbert_for_reddit_depression_detection_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_distilbert_for_reddit_depression_detection_pipeline` is a English model originally trained by sunF1ow3r. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_distilbert_for_reddit_depression_detection_pipeline_en_5.5.0_3.0_1726953725388.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_distilbert_for_reddit_depression_detection_pipeline_en_5.5.0_3.0_1726953725388.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame holding the input column(s) this pipeline expects
pipeline = PretrainedPipeline("finetuned_distilbert_for_reddit_depression_detection_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame holding the input column(s) this pipeline expects
val pipeline = new PretrainedPipeline("finetuned_distilbert_for_reddit_depression_detection_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_distilbert_for_reddit_depression_detection_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sunF1ow3r/finetuned-distilBERT-for-reddit-depression-detection + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuned_sail2017_additionalpretrained_xlm_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuned_sail2017_additionalpretrained_xlm_roberta_base_en.md new file mode 100644 index 00000000000000..8c3a2f6a806689 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuned_sail2017_additionalpretrained_xlm_roberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_sail2017_additionalpretrained_xlm_roberta_base XlmRoBertaForSequenceClassification from aditeyabaral +author: John Snow Labs +name: finetuned_sail2017_additionalpretrained_xlm_roberta_base +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_sail2017_additionalpretrained_xlm_roberta_base` is a English model originally trained by aditeyabaral. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_sail2017_additionalpretrained_xlm_roberta_base_en_5.5.0_3.0_1726919054586.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_sail2017_additionalpretrained_xlm_roberta_base_en_5.5.0_3.0_1726919054586.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("finetuned_sail2017_additionalpretrained_xlm_roberta_base","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("finetuned_sail2017_additionalpretrained_xlm_roberta_base", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
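For single-document or low-latency use, the fitted pipeline can be wrapped in a `LightPipeline`, which skips the DataFrame round trip. A minimal sketch, assuming the `pipelineModel` fitted above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# annotate() returns a dict keyed by output column; "class" holds the predicted label(s)
print(light.annotate("I love spark-nlp")["class"])
```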
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_sail2017_additionalpretrained_xlm_roberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|855.5 MB| + +## References + +https://huggingface.co/aditeyabaral/finetuned-sail2017-additionalpretrained-xlm-roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuned_sail2017_additionalpretrained_xlm_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuned_sail2017_additionalpretrained_xlm_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..8e6611d96dddc7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuned_sail2017_additionalpretrained_xlm_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_sail2017_additionalpretrained_xlm_roberta_base_pipeline pipeline XlmRoBertaForSequenceClassification from aditeyabaral +author: John Snow Labs +name: finetuned_sail2017_additionalpretrained_xlm_roberta_base_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_sail2017_additionalpretrained_xlm_roberta_base_pipeline` is a English model originally trained by aditeyabaral. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_sail2017_additionalpretrained_xlm_roberta_base_pipeline_en_5.5.0_3.0_1726919111944.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_sail2017_additionalpretrained_xlm_roberta_base_pipeline_en_5.5.0_3.0_1726919111944.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame holding the input column(s) this pipeline expects
pipeline = PretrainedPipeline("finetuned_sail2017_additionalpretrained_xlm_roberta_base_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame holding the input column(s) this pipeline expects
val pipeline = new PretrainedPipeline("finetuned_sail2017_additionalpretrained_xlm_roberta_base_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_sail2017_additionalpretrained_xlm_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|855.6 MB| + +## References + +https://huggingface.co/aditeyabaral/finetuned-sail2017-additionalpretrained-xlm-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_2_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_2_en.md new file mode 100644 index 00000000000000..62fe7f00d22e28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_2 DistilBertForSequenceClassification from OscarSuarez +author: John Snow Labs +name: finetuning_sentiment_model_2 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_2` is a English model originally trained by OscarSuarez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_2_en_5.5.0_3.0_1726953146959.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_2_en_5.5.0_3.0_1726953146959.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_2", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/OscarSuarez/finetuning-sentiment-model-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_2_pipeline_en.md new file mode 100644 index 00000000000000..2c8faac06bcd53 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_2_pipeline pipeline DistilBertForSequenceClassification from OscarSuarez +author: John Snow Labs +name: finetuning_sentiment_model_2_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_2_pipeline` is a English model originally trained by OscarSuarez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_2_pipeline_en_5.5.0_3.0_1726953159138.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_2_pipeline_en_5.5.0_3.0_1726953159138.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame holding the input column(s) this pipeline expects
pipeline = PretrainedPipeline("finetuning_sentiment_model_2_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame holding the input column(s) this pipeline expects
val pipeline = new PretrainedPipeline("finetuning_sentiment_model_2_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/OscarSuarez/finetuning-sentiment-model-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_diegodelvalle_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_diegodelvalle_en.md new file mode 100644 index 00000000000000..b893041370da78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_diegodelvalle_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_diegodelvalle DistilBertForSequenceClassification from DiegodelValle +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_diegodelvalle +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_diegodelvalle` is a English model originally trained by DiegodelValle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_diegodelvalle_en_5.5.0_3.0_1726884672597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_diegodelvalle_en_5.5.0_3.0_1726884672597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_diegodelvalle","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_diegodelvalle", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
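Besides the label itself, the classifier annotations carry additional details (such as per-class scores) in their metadata map. A minimal sketch, assuming `pipelineDF` from above; the field names follow the generic Spark NLP annotation schema rather than anything specific to this model:

```python
# result holds the predicted labels, metadata holds the accompanying score map per annotation
pipelineDF.select("class.result", "class.metadata").show(truncate=False)
```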
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_diegodelvalle| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DiegodelValle/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_diegodelvalle_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_diegodelvalle_pipeline_en.md new file mode 100644 index 00000000000000..4ad170f08300c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_diegodelvalle_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_diegodelvalle_pipeline pipeline DistilBertForSequenceClassification from DiegodelValle +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_diegodelvalle_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_diegodelvalle_pipeline` is a English model originally trained by DiegodelValle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_diegodelvalle_pipeline_en_5.5.0_3.0_1726884684711.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_diegodelvalle_pipeline_en_5.5.0_3.0_1726884684711.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame holding the input column(s) this pipeline expects
pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_diegodelvalle_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame holding the input column(s) this pipeline expects
val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_diegodelvalle_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_diegodelvalle_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DiegodelValle/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_iossifpalli_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_iossifpalli_en.md new file mode 100644 index 00000000000000..84b82710b361df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_iossifpalli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_iossifpalli DistilBertForSequenceClassification from IossifPalli +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_iossifpalli +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_iossifpalli` is a English model originally trained by IossifPalli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_iossifpalli_en_5.5.0_3.0_1726888596801.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_iossifpalli_en_5.5.0_3.0_1726888596801.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_iossifpalli","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_iossifpalli", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
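These snippets assume an active SparkSession with Spark NLP on the classpath. A minimal sketch of the usual setup, in case one is not already running:

```python
import sparknlp

# Starts (or reuses) a SparkSession with the Spark NLP dependency attached
spark = sparknlp.start()
print("Spark NLP version:", sparknlp.version())
```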
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_iossifpalli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/IossifPalli/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_iossifpalli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_iossifpalli_pipeline_en.md new file mode 100644 index 00000000000000..6f6669ea117ba0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_iossifpalli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_iossifpalli_pipeline pipeline DistilBertForSequenceClassification from IossifPalli +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_iossifpalli_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_iossifpalli_pipeline` is a English model originally trained by IossifPalli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_iossifpalli_pipeline_en_5.5.0_3.0_1726888613997.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_iossifpalli_pipeline_en_5.5.0_3.0_1726888613997.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame holding the input column(s) this pipeline expects
pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_iossifpalli_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame holding the input column(s) this pipeline expects
val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_iossifpalli_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_iossifpalli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/IossifPalli/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_lilianvoss_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_lilianvoss_en.md new file mode 100644 index 00000000000000..5d44e5adc4571a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_lilianvoss_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_lilianvoss DistilBertForSequenceClassification from LilianVoss +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_lilianvoss +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_lilianvoss` is a English model originally trained by LilianVoss. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_lilianvoss_en_5.5.0_3.0_1726953390507.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_lilianvoss_en_5.5.0_3.0_1726953390507.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_lilianvoss","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_lilianvoss", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_lilianvoss| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LilianVoss/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_lilianvoss_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_lilianvoss_pipeline_en.md new file mode 100644 index 00000000000000..302c53a25ba3f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_lilianvoss_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_lilianvoss_pipeline pipeline DistilBertForSequenceClassification from LilianVoss +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_lilianvoss_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_lilianvoss_pipeline` is a English model originally trained by LilianVoss. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_lilianvoss_pipeline_en_5.5.0_3.0_1726953403085.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_lilianvoss_pipeline_en_5.5.0_3.0_1726953403085.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame holding the input column(s) this pipeline expects
pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_lilianvoss_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame holding the input column(s) this pipeline expects
val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_lilianvoss_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_lilianvoss_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LilianVoss/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_marcelarosalesj_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_marcelarosalesj_en.md new file mode 100644 index 00000000000000..4345c9b0a1285c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_marcelarosalesj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_marcelarosalesj DistilBertForSequenceClassification from marcelarosalesj +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_marcelarosalesj +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_marcelarosalesj` is a English model originally trained by marcelarosalesj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_marcelarosalesj_en_5.5.0_3.0_1726923712368.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_marcelarosalesj_en_5.5.0_3.0_1726923712368.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_marcelarosalesj","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_marcelarosalesj", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
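The example above stops at `transform`; to actually look at the sentiment prediction you can pull the `result` field out of the `class` output column (listed in the Model Information table below). A minimal sketch, assuming the `pipelineDF` DataFrame from the Python snippet above:

```python
# Hypothetical follow-up to the Python example: show the predicted label for each input row.
pipelineDF.select("text", "class.result").show(truncate=False)
```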
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_marcelarosalesj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/marcelarosalesj/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_marcelarosalesj_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_marcelarosalesj_pipeline_en.md new file mode 100644 index 00000000000000..58da77d0069ca4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_marcelarosalesj_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_marcelarosalesj_pipeline pipeline DistilBertForSequenceClassification from marcelarosalesj +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_marcelarosalesj_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_marcelarosalesj_pipeline` is a English model originally trained by marcelarosalesj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_marcelarosalesj_pipeline_en_5.5.0_3.0_1726923725204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_marcelarosalesj_pipeline_en_5.5.0_3.0_1726923725204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with a "text" column to annotate.
pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_marcelarosalesj_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a Spark DataFrame with a "text" column to annotate.
val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_marcelarosalesj_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_marcelarosalesj_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/marcelarosalesj/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_ramabhishek_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_ramabhishek_en.md new file mode 100644 index 00000000000000..5d132c1b469abe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_ramabhishek_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ramabhishek DistilBertForSequenceClassification from RamAbhishek +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ramabhishek +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ramabhishek` is a English model originally trained by RamAbhishek. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ramabhishek_en_5.5.0_3.0_1726953643510.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ramabhishek_en_5.5.0_3.0_1726953643510.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ramabhishek","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ramabhishek", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
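The example above stops at `transform`; to inspect the sentiment prediction you can select the `result` field of the `class` output column (listed in the Model Information table below). A minimal sketch, assuming the `pipelineDF` DataFrame from the Python snippet above:

```python
# Hypothetical follow-up to the Python example: show the predicted label for each input row.
pipelineDF.select("text", "class.result").show(truncate=False)
```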
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ramabhishek| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/RamAbhishek/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_ramabhishek_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_ramabhishek_pipeline_en.md new file mode 100644 index 00000000000000..731156cd834cc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_ramabhishek_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ramabhishek_pipeline pipeline DistilBertForSequenceClassification from RamAbhishek +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ramabhishek_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ramabhishek_pipeline` is a English model originally trained by RamAbhishek. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ramabhishek_pipeline_en_5.5.0_3.0_1726953655537.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ramabhishek_pipeline_en_5.5.0_3.0_1726953655537.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with a "text" column to annotate.
pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_ramabhishek_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a Spark DataFrame with a "text" column to annotate.
val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_ramabhishek_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ramabhishek_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/RamAbhishek/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_ramanen_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_ramanen_en.md new file mode 100644 index 00000000000000..3b65927e824f89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_ramanen_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ramanen DistilBertForSequenceClassification from Ramanen +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ramanen +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ramanen` is a English model originally trained by Ramanen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ramanen_en_5.5.0_3.0_1726889108044.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ramanen_en_5.5.0_3.0_1726889108044.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ramanen","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ramanen", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
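To view the prediction produced by the pipeline above, you can select the `result` field of the `class` output column (see Output Labels below). A minimal sketch, assuming the `pipelineDF` DataFrame from the Python snippet above:

```python
# Hypothetical follow-up to the Python example: show the predicted label for each input row.
pipelineDF.select("text", "class.result").show(truncate=False)
```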
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ramanen| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Ramanen/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_ramanen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_ramanen_pipeline_en.md new file mode 100644 index 00000000000000..8cbacdbd8c341d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_ramanen_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ramanen_pipeline pipeline DistilBertForSequenceClassification from Ramanen +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ramanen_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ramanen_pipeline` is a English model originally trained by Ramanen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ramanen_pipeline_en_5.5.0_3.0_1726889120186.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ramanen_pipeline_en_5.5.0_3.0_1726889120186.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with a "text" column to annotate.
pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_ramanen_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a Spark DataFrame with a "text" column to annotate.
val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_ramanen_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ramanen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Ramanen/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_vladcarare_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_vladcarare_en.md new file mode 100644 index 00000000000000..a8d9bc012f490b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_vladcarare_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_vladcarare DistilBertForSequenceClassification from VladCarare +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_vladcarare +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_vladcarare` is a English model originally trained by VladCarare. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_vladcarare_en_5.5.0_3.0_1726884586527.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_vladcarare_en_5.5.0_3.0_1726884586527.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_vladcarare","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_vladcarare", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
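To view the prediction produced by the pipeline above, you can select the `result` field of the `class` output column (see Output Labels below). A minimal sketch, assuming the `pipelineDF` DataFrame from the Python snippet above:

```python
# Hypothetical follow-up to the Python example: show the predicted label for each input row.
pipelineDF.select("text", "class.result").show(truncate=False)
```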
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_vladcarare| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/VladCarare/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_vladcarare_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_vladcarare_pipeline_en.md new file mode 100644 index 00000000000000..f0facb87ff346c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_vladcarare_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_vladcarare_pipeline pipeline DistilBertForSequenceClassification from VladCarare +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_vladcarare_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_vladcarare_pipeline` is a English model originally trained by VladCarare. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_vladcarare_pipeline_en_5.5.0_3.0_1726884598423.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_vladcarare_pipeline_en_5.5.0_3.0_1726884598423.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with a "text" column to annotate.
pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_vladcarare_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a Spark DataFrame with a "text" column to annotate.
val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_vladcarare_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_vladcarare_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/VladCarare/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_uzb_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_uzb_en.md new file mode 100644 index 00000000000000..a1b98e38fd018d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_uzb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_uzb DistilBertForSequenceClassification from blackhole33 +author: John Snow Labs +name: finetuning_sentiment_model_uzb +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_uzb` is a English model originally trained by blackhole33. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_uzb_en_5.5.0_3.0_1726884887679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_uzb_en_5.5.0_3.0_1726884887679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_uzb","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_uzb", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
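To view the prediction produced by the pipeline above, you can select the `result` field of the `class` output column (see Output Labels below). A minimal sketch, assuming the `pipelineDF` DataFrame from the Python snippet above:

```python
# Hypothetical follow-up to the Python example: show the predicted label for each input row.
pipelineDF.select("text", "class.result").show(truncate=False)
```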
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_uzb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/blackhole33/finetuning-sentiment-model-uzb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_uzb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_uzb_pipeline_en.md new file mode 100644 index 00000000000000..c72c293cee8dfd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_uzb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_uzb_pipeline pipeline DistilBertForSequenceClassification from blackhole33 +author: John Snow Labs +name: finetuning_sentiment_model_uzb_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_uzb_pipeline` is a English model originally trained by blackhole33. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_uzb_pipeline_en_5.5.0_3.0_1726884899813.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_uzb_pipeline_en_5.5.0_3.0_1726884899813.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with a "text" column to annotate.
pipeline = PretrainedPipeline("finetuning_sentiment_model_uzb_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a Spark DataFrame with a "text" column to annotate.
val pipeline = new PretrainedPipeline("finetuning_sentiment_model_uzb_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_uzb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/blackhole33/finetuning-sentiment-model-uzb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finnews_sentimentanalysis_v1_en.md b/docs/_posts/ahmedlone127/2024-09-21-finnews_sentimentanalysis_v1_en.md new file mode 100644 index 00000000000000..32b98899fd0b7e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finnews_sentimentanalysis_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finnews_sentimentanalysis_v1 DistilBertForSequenceClassification from JoanParanoid +author: John Snow Labs +name: finnews_sentimentanalysis_v1 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finnews_sentimentanalysis_v1` is a English model originally trained by JoanParanoid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finnews_sentimentanalysis_v1_en_5.5.0_3.0_1726953427943.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finnews_sentimentanalysis_v1_en_5.5.0_3.0_1726953427943.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finnews_sentimentanalysis_v1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finnews_sentimentanalysis_v1", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
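The example above stops at `transform`; to read the sentiment assigned to a news headline you can select the `result` field of the `class` output column (see Output Labels below). A minimal sketch, assuming the `pipelineDF` DataFrame from the Python snippet above:

```python
# Hypothetical follow-up to the Python example: show the predicted label for each input row.
pipelineDF.select("text", "class.result").show(truncate=False)
```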
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finnews_sentimentanalysis_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/JoanParanoid/FinNews_SentimentAnalysis_v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finnews_sentimentanalysis_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-finnews_sentimentanalysis_v1_pipeline_en.md new file mode 100644 index 00000000000000..d346e4fcdc3cbb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finnews_sentimentanalysis_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finnews_sentimentanalysis_v1_pipeline pipeline DistilBertForSequenceClassification from JoanParanoid +author: John Snow Labs +name: finnews_sentimentanalysis_v1_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finnews_sentimentanalysis_v1_pipeline` is a English model originally trained by JoanParanoid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finnews_sentimentanalysis_v1_pipeline_en_5.5.0_3.0_1726953452749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finnews_sentimentanalysis_v1_pipeline_en_5.5.0_3.0_1726953452749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with a "text" column to annotate.
pipeline = PretrainedPipeline("finnews_sentimentanalysis_v1_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a Spark DataFrame with a "text" column to annotate.
val pipeline = new PretrainedPipeline("finnews_sentimentanalysis_v1_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finnews_sentimentanalysis_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/JoanParanoid/FinNews_SentimentAnalysis_v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-fintunned_v2_roberta_irish_en.md b/docs/_posts/ahmedlone127/2024-09-21-fintunned_v2_roberta_irish_en.md new file mode 100644 index 00000000000000..7a8272fb20276c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-fintunned_v2_roberta_irish_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fintunned_v2_roberta_irish RoBertaForSequenceClassification from nebiyu29 +author: John Snow Labs +name: fintunned_v2_roberta_irish +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fintunned_v2_roberta_irish` is a English model originally trained by nebiyu29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fintunned_v2_roberta_irish_en_5.5.0_3.0_1726900225486.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fintunned_v2_roberta_irish_en_5.5.0_3.0_1726900225486.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("fintunned_v2_roberta_irish","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("fintunned_v2_roberta_irish", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
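To view the classification produced by the pipeline above, you can select the `result` field of the `class` output column (see Output Labels below). A minimal sketch, assuming the `pipelineDF` DataFrame from the Python snippet above:

```python
# Hypothetical follow-up to the Python example: show the predicted label for each input row.
pipelineDF.select("text", "class.result").show(truncate=False)
```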
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fintunned_v2_roberta_irish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|435.3 MB| + +## References + +https://huggingface.co/nebiyu29/fintunned-v2-roberta_GA \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-fintunned_v2_roberta_irish_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-fintunned_v2_roberta_irish_pipeline_en.md new file mode 100644 index 00000000000000..1377ca3bbf2ddf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-fintunned_v2_roberta_irish_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fintunned_v2_roberta_irish_pipeline pipeline RoBertaForSequenceClassification from nebiyu29 +author: John Snow Labs +name: fintunned_v2_roberta_irish_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fintunned_v2_roberta_irish_pipeline` is a English model originally trained by nebiyu29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fintunned_v2_roberta_irish_pipeline_en_5.5.0_3.0_1726900252789.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fintunned_v2_roberta_irish_pipeline_en_5.5.0_3.0_1726900252789.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with a "text" column to annotate.
pipeline = PretrainedPipeline("fintunned_v2_roberta_irish_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a Spark DataFrame with a "text" column to annotate.
val pipeline = new PretrainedPipeline("fintunned_v2_roberta_irish_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fintunned_v2_roberta_irish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|435.3 MB| + +## References + +https://huggingface.co/nebiyu29/fintunned-v2-roberta_GA + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-firmner_v2_small_en.md b/docs/_posts/ahmedlone127/2024-09-21-firmner_v2_small_en.md new file mode 100644 index 00000000000000..92f44c32e805db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-firmner_v2_small_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English firmner_v2_small BertForTokenClassification from loyoladatamining +author: John Snow Labs +name: firmner_v2_small +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`firmner_v2_small` is a English model originally trained by loyoladatamining. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/firmner_v2_small_en_5.5.0_3.0_1726889728528.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/firmner_v2_small_en_5.5.0_3.0_1726889728528.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("firmner_v2_small","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("firmner_v2_small", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
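To see the token-level tags predicted above, you can select the `result` field of the `ner` output column (see Output Labels below). A minimal sketch, assuming the `pipelineDF` DataFrame from the Python snippet above:

```python
# Hypothetical follow-up to the Python example: show the NER tag assigned to each token.
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```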
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|firmner_v2_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|107.0 MB| + +## References + +https://huggingface.co/loyoladatamining/firmNER-v2-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-firmner_v2_small_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-firmner_v2_small_pipeline_en.md new file mode 100644 index 00000000000000..911c0719f9c720 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-firmner_v2_small_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English firmner_v2_small_pipeline pipeline BertForTokenClassification from loyoladatamining +author: John Snow Labs +name: firmner_v2_small_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`firmner_v2_small_pipeline` is a English model originally trained by loyoladatamining. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/firmner_v2_small_pipeline_en_5.5.0_3.0_1726889734236.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/firmner_v2_small_pipeline_en_5.5.0_3.0_1726889734236.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with a "text" column to annotate.
pipeline = PretrainedPipeline("firmner_v2_small_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a Spark DataFrame with a "text" column to annotate.
val pipeline = new PretrainedPipeline("firmner_v2_small_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|firmner_v2_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|107.0 MB| + +## References + +https://huggingface.co/loyoladatamining/firmNER-v2-small + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-frozen_news_classifier_ft_pipeline_ru.md b/docs/_posts/ahmedlone127/2024-09-21-frozen_news_classifier_ft_pipeline_ru.md new file mode 100644 index 00000000000000..70648a01f38469 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-frozen_news_classifier_ft_pipeline_ru.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Russian frozen_news_classifier_ft_pipeline pipeline BertForSequenceClassification from data-silence +author: John Snow Labs +name: frozen_news_classifier_ft_pipeline +date: 2024-09-21 +tags: [ru, open_source, pipeline, onnx] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`frozen_news_classifier_ft_pipeline` is a Russian model originally trained by data-silence. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frozen_news_classifier_ft_pipeline_ru_5.5.0_3.0_1726955255135.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frozen_news_classifier_ft_pipeline_ru_5.5.0_3.0_1726955255135.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with a "text" column of Russian text to annotate.
pipeline = PretrainedPipeline("frozen_news_classifier_ft_pipeline", lang = "ru")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a Spark DataFrame with a "text" column of Russian text to annotate.
val pipeline = new PretrainedPipeline("frozen_news_classifier_ft_pipeline", lang = "ru")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frozen_news_classifier_ft_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ru| +|Size:|1.8 GB| + +## References + +https://huggingface.co/data-silence/frozen_news_classifier_ft + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-frozen_news_classifier_ft_ru.md b/docs/_posts/ahmedlone127/2024-09-21-frozen_news_classifier_ft_ru.md new file mode 100644 index 00000000000000..0e55ccb985d585 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-frozen_news_classifier_ft_ru.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Russian frozen_news_classifier_ft BertForSequenceClassification from data-silence +author: John Snow Labs +name: frozen_news_classifier_ft +date: 2024-09-21 +tags: [ru, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`frozen_news_classifier_ft` is a Russian model originally trained by data-silence. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frozen_news_classifier_ft_ru_5.5.0_3.0_1726955176000.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frozen_news_classifier_ft_ru_5.5.0_3.0_1726955176000.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("frozen_news_classifier_ft","ru") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("frozen_news_classifier_ft", "ru")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
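To read the news category predicted above, you can select the `result` field of the `class` output column (see Output Labels below); in practice the input rows would be Russian news texts. A minimal sketch, assuming the `pipelineDF` DataFrame from the Python snippet above:

```python
# Hypothetical follow-up to the Python example: show the predicted class for each input row.
pipelineDF.select("text", "class.result").show(truncate=False)
```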
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frozen_news_classifier_ft| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|ru| +|Size:|1.8 GB| + +## References + +https://huggingface.co/data-silence/frozen_news_classifier_ft \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-ft_distilbert_en.md b/docs/_posts/ahmedlone127/2024-09-21-ft_distilbert_en.md new file mode 100644 index 00000000000000..a061781fd4c5d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-ft_distilbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ft_distilbert DistilBertForSequenceClassification from kumbi500 +author: John Snow Labs +name: ft_distilbert +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ft_distilbert` is a English model originally trained by kumbi500. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ft_distilbert_en_5.5.0_3.0_1726888759607.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ft_distilbert_en_5.5.0_3.0_1726888759607.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("ft_distilbert","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("ft_distilbert", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
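To view the prediction produced by the pipeline above, you can select the `result` field of the `class` output column (see Output Labels below). A minimal sketch, assuming the `pipelineDF` DataFrame from the Python snippet above:

```python
# Hypothetical follow-up to the Python example: show the predicted label for each input row.
pipelineDF.select("text", "class.result").show(truncate=False)
```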
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ft_distilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kumbi500/FT_DistilBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-ft_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-ft_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..7856deb0f12b5a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-ft_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ft_distilbert_pipeline pipeline DistilBertForSequenceClassification from kumbi500 +author: John Snow Labs +name: ft_distilbert_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ft_distilbert_pipeline` is a English model originally trained by kumbi500. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ft_distilbert_pipeline_en_5.5.0_3.0_1726888771915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ft_distilbert_pipeline_en_5.5.0_3.0_1726888771915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with a "text" column to annotate.
pipeline = PretrainedPipeline("ft_distilbert_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a Spark DataFrame with a "text" column to annotate.
val pipeline = new PretrainedPipeline("ft_distilbert_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ft_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kumbi500/FT_DistilBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-ft_english_10maa_en.md b/docs/_posts/ahmedlone127/2024-09-21-ft_english_10maa_en.md new file mode 100644 index 00000000000000..c15a6c8cd22851 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-ft_english_10maa_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English ft_english_10maa WhisperForCTC from Pageee +author: John Snow Labs +name: ft_english_10maa +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ft_english_10maa` is a English model originally trained by Pageee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ft_english_10maa_en_5.5.0_3.0_1726912918803.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ft_english_10maa_en_5.5.0_3.0_1726912918803.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# data: a Spark DataFrame with an "audio_content" column of raw audio samples (16 kHz)
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("ft_english_10maa","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// data: a DataFrame with an "audio_content" column of raw audio samples (16 kHz)
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("ft_english_10maa", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
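The example above uses a DataFrame `data` that is not defined in the snippet. One possible way to build it is sketched below, assuming a local file `sample.wav` (a placeholder path) and the `librosa` library for loading and resampling; Whisper models expect 16 kHz mono audio, and `AudioAssembler` reads raw float samples from the `audio_content` column.

```python
import librosa

# "sample.wav" is a placeholder path; resample to 16 kHz mono for Whisper
audio_samples, sampling_rate = librosa.load("sample.wav", sr=16000, mono=True)

# One row per recording, with the raw samples in the "audio_content" column
data = spark.createDataFrame([[audio_samples.tolist()]]).toDF("audio_content")

# With `data` defined, the pipeline above can be fitted and applied
result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```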
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ft_english_10maa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Pageee/FT-English-10maa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-ft_english_10maa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-ft_english_10maa_pipeline_en.md new file mode 100644 index 00000000000000..2f51cc5d07260f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-ft_english_10maa_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English ft_english_10maa_pipeline pipeline WhisperForCTC from Pageee +author: John Snow Labs +name: ft_english_10maa_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ft_english_10maa_pipeline` is a English model originally trained by Pageee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ft_english_10maa_pipeline_en_5.5.0_3.0_1726913000741.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ft_english_10maa_pipeline_en_5.5.0_3.0_1726913000741.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ft_english_10maa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ft_english_10maa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ft_english_10maa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Pageee/FT-English-10maa + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-ghpoliticsbert_en.md b/docs/_posts/ahmedlone127/2024-09-21-ghpoliticsbert_en.md new file mode 100644 index 00000000000000..44d688c68f6458 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-ghpoliticsbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ghpoliticsbert RoBertaEmbeddings from JoAmps +author: John Snow Labs +name: ghpoliticsbert +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ghpoliticsbert` is a English model originally trained by JoAmps. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ghpoliticsbert_en_5.5.0_3.0_1726943756034.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ghpoliticsbert_en_5.5.0_3.0_1726943756034.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("ghpoliticsbert","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("ghpoliticsbert","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
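The `embeddings` column produced above contains one annotation per token. If plain vectors are needed for downstream Spark ML stages, an `EmbeddingsFinisher` can be appended; a minimal sketch:

```python
from sparknlp.base import EmbeddingsFinisher

finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

finished = finisher.transform(pipelineDF)

# One dense vector per token of the input text
finished.selectExpr("explode(finished_embeddings) as token_embedding").show(truncate=80)
```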
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ghpoliticsbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/JoAmps/GhPoliticsBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-ghpoliticsbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-ghpoliticsbert_pipeline_en.md new file mode 100644 index 00000000000000..094a0ddc31e5f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-ghpoliticsbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ghpoliticsbert_pipeline pipeline RoBertaEmbeddings from JoAmps +author: John Snow Labs +name: ghpoliticsbert_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ghpoliticsbert_pipeline` is a English model originally trained by JoAmps. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ghpoliticsbert_pipeline_en_5.5.0_3.0_1726943770610.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ghpoliticsbert_pipeline_en_5.5.0_3.0_1726943770610.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ghpoliticsbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ghpoliticsbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ghpoliticsbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/JoAmps/GhPoliticsBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-ground_english_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-21-ground_english_roberta_base_en.md new file mode 100644 index 00000000000000..5ca2e2128e81c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-ground_english_roberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ground_english_roberta_base RoBertaEmbeddings from dreamerdeo +author: John Snow Labs +name: ground_english_roberta_base +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ground_english_roberta_base` is a English model originally trained by dreamerdeo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ground_english_roberta_base_en_5.5.0_3.0_1726934664039.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ground_english_roberta_base_en_5.5.0_3.0_1726934664039.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("ground_english_roberta_base","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("ground_english_roberta_base","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ground_english_roberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/dreamerdeo/ground-en-roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-ground_english_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-ground_english_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..7cec08e2a574aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-ground_english_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ground_english_roberta_base_pipeline pipeline RoBertaEmbeddings from dreamerdeo +author: John Snow Labs +name: ground_english_roberta_base_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ground_english_roberta_base_pipeline` is a English model originally trained by dreamerdeo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ground_english_roberta_base_pipeline_en_5.5.0_3.0_1726934685904.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ground_english_roberta_base_pipeline_en_5.5.0_3.0_1726934685904.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ground_english_roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ground_english_roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ground_english_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.2 MB| + +## References + +https://huggingface.co/dreamerdeo/ground-en-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-hate_hate_balance_random0_seed0_bernice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-hate_hate_balance_random0_seed0_bernice_pipeline_en.md new file mode 100644 index 00000000000000..92bbce1a083aaf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-hate_hate_balance_random0_seed0_bernice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_balance_random0_seed0_bernice_pipeline pipeline XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random0_seed0_bernice_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random0_seed0_bernice_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random0_seed0_bernice_pipeline_en_5.5.0_3.0_1726933410534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random0_seed0_bernice_pipeline_en_5.5.0_3.0_1726933410534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hate_hate_balance_random0_seed0_bernice_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hate_hate_balance_random0_seed0_bernice_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random0_seed0_bernice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|783.4 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random0_seed0-bernice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-hate_hate_balance_random2_seed1_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-21-hate_hate_balance_random2_seed1_bernice_en.md new file mode 100644 index 00000000000000..6395c398be5bab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-hate_hate_balance_random2_seed1_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_balance_random2_seed1_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random2_seed1_bernice +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random2_seed1_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random2_seed1_bernice_en_5.5.0_3.0_1726918596987.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random2_seed1_bernice_en_5.5.0_3.0_1726918596987.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hate_hate_balance_random2_seed1_bernice","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hate_hate_balance_random2_seed1_bernice", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
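Once `pipelineDF` has been computed, the predicted label and the per-label scores (stored in the annotation metadata) can be inspected directly:

```python
# Predicted label per input row
pipelineDF.select("text", "class.result").show(truncate=False)

# Per-label confidence scores are kept in the annotation metadata
pipelineDF.selectExpr("explode(`class`) as prediction") \
    .selectExpr("prediction.result", "prediction.metadata") \
    .show(truncate=False)
```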
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random2_seed1_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|783.5 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random2_seed1-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-hate_hate_balance_random2_seed1_bernice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-hate_hate_balance_random2_seed1_bernice_pipeline_en.md new file mode 100644 index 00000000000000..fa3681c0417994 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-hate_hate_balance_random2_seed1_bernice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_balance_random2_seed1_bernice_pipeline pipeline XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random2_seed1_bernice_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random2_seed1_bernice_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random2_seed1_bernice_pipeline_en_5.5.0_3.0_1726918742539.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random2_seed1_bernice_pipeline_en_5.5.0_3.0_1726918742539.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hate_hate_balance_random2_seed1_bernice_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hate_hate_balance_random2_seed1_bernice_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random2_seed1_bernice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|783.5 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random2_seed1-bernice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-hate_hate_balance_temporal_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-21-hate_hate_balance_temporal_bernice_en.md new file mode 100644 index 00000000000000..1ef53fe6a479fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-hate_hate_balance_temporal_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_balance_temporal_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_temporal_bernice +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_temporal_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_temporal_bernice_en_5.5.0_3.0_1726918123464.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_temporal_bernice_en_5.5.0_3.0_1726918123464.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hate_hate_balance_temporal_bernice","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hate_hate_balance_temporal_bernice", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_temporal_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|783.6 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_temporal-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-hate_hate_balance_temporal_bernice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-hate_hate_balance_temporal_bernice_pipeline_en.md new file mode 100644 index 00000000000000..e651f731eab514 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-hate_hate_balance_temporal_bernice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_balance_temporal_bernice_pipeline pipeline XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_temporal_bernice_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_temporal_bernice_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_temporal_bernice_pipeline_en_5.5.0_3.0_1726918268977.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_temporal_bernice_pipeline_en_5.5.0_3.0_1726918268977.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hate_hate_balance_temporal_bernice_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hate_hate_balance_temporal_bernice_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_temporal_bernice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|783.7 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_temporal-bernice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-helf_tiny_english_en.md b/docs/_posts/ahmedlone127/2024-09-21-helf_tiny_english_en.md new file mode 100644 index 00000000000000..b08915ef1dca78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-helf_tiny_english_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English helf_tiny_english WhisperForCTC from ChitNan +author: John Snow Labs +name: helf_tiny_english +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`helf_tiny_english` is a English model originally trained by ChitNan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/helf_tiny_english_en_5.5.0_3.0_1726937788711.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/helf_tiny_english_en_5.5.0_3.0_1726937788711.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# data: a Spark DataFrame with an "audio_content" column of raw audio samples (16 kHz)
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("helf_tiny_english","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// data: a DataFrame with an "audio_content" column of raw audio samples (16 kHz)
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("helf_tiny_english", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|helf_tiny_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.8 MB| + +## References + +https://huggingface.co/ChitNan/helf-tiny-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-helf_tiny_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-helf_tiny_english_pipeline_en.md new file mode 100644 index 00000000000000..6f3e3a906524db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-helf_tiny_english_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English helf_tiny_english_pipeline pipeline WhisperForCTC from ChitNan +author: John Snow Labs +name: helf_tiny_english_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`helf_tiny_english_pipeline` is a English model originally trained by ChitNan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/helf_tiny_english_pipeline_en_5.5.0_3.0_1726937808954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/helf_tiny_english_pipeline_en_5.5.0_3.0_1726937808954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("helf_tiny_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("helf_tiny_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|helf_tiny_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.8 MB| + +## References + +https://huggingface.co/ChitNan/helf-tiny-en + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-hin_trac2_en.md b/docs/_posts/ahmedlone127/2024-09-21-hin_trac2_en.md new file mode 100644 index 00000000000000..fa26f6fbfda102 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-hin_trac2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hin_trac2 BertForSequenceClassification from Maha +author: John Snow Labs +name: hin_trac2 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hin_trac2` is a English model originally trained by Maha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hin_trac2_en_5.5.0_3.0_1726956335279.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hin_trac2_en_5.5.0_3.0_1726956335279.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("hin_trac2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("hin_trac2", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hin_trac2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|667.3 MB| + +## References + +https://huggingface.co/Maha/hin-trac2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-hin_trac2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-hin_trac2_pipeline_en.md new file mode 100644 index 00000000000000..8d57e8a81bd91c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-hin_trac2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hin_trac2_pipeline pipeline BertForSequenceClassification from Maha +author: John Snow Labs +name: hin_trac2_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hin_trac2_pipeline` is a English model originally trained by Maha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hin_trac2_pipeline_en_5.5.0_3.0_1726956366178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hin_trac2_pipeline_en_5.5.0_3.0_1726956366178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hin_trac2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hin_trac2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hin_trac2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|667.3 MB| + +## References + +https://huggingface.co/Maha/hin-trac2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-homework001_en.md b/docs/_posts/ahmedlone127/2024-09-21-homework001_en.md new file mode 100644 index 00000000000000..fbb998efcfbf5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-homework001_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English homework001 DistilBertForSequenceClassification from andyWuTw +author: John Snow Labs +name: homework001 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`homework001` is a English model originally trained by andyWuTw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/homework001_en_5.5.0_3.0_1726923742874.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/homework001_en_5.5.0_3.0_1726923742874.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("homework001","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("homework001", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|homework001| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/andyWuTw/homework001 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-homework001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-homework001_pipeline_en.md new file mode 100644 index 00000000000000..ca8b9788c2b911 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-homework001_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English homework001_pipeline pipeline DistilBertForSequenceClassification from andyWuTw +author: John Snow Labs +name: homework001_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`homework001_pipeline` is a English model originally trained by andyWuTw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/homework001_pipeline_en_5.5.0_3.0_1726923754709.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/homework001_pipeline_en_5.5.0_3.0_1726923754709.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("homework001_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("homework001_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|homework001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/andyWuTw/homework001 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-hp_book_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-21-hp_book_classifier_en.md new file mode 100644 index 00000000000000..db802cfb5f5c9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-hp_book_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hp_book_classifier RoBertaForSequenceClassification from chrisliu298 +author: John Snow Labs +name: hp_book_classifier +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hp_book_classifier` is a English model originally trained by chrisliu298. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hp_book_classifier_en_5.5.0_3.0_1726900160127.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hp_book_classifier_en_5.5.0_3.0_1726900160127.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("hp_book_classifier","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("hp_book_classifier", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hp_book_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|454.7 MB| + +## References + +https://huggingface.co/chrisliu298/hp_book_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-hp_book_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-hp_book_classifier_pipeline_en.md new file mode 100644 index 00000000000000..3297ec6dc3c331 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-hp_book_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hp_book_classifier_pipeline pipeline RoBertaForSequenceClassification from chrisliu298 +author: John Snow Labs +name: hp_book_classifier_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hp_book_classifier_pipeline` is a English model originally trained by chrisliu298. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hp_book_classifier_pipeline_en_5.5.0_3.0_1726900181337.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hp_book_classifier_pipeline_en_5.5.0_3.0_1726900181337.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hp_book_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hp_book_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hp_book_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|454.8 MB| + +## References + +https://huggingface.co/chrisliu298/hp_book_classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-hraf_multilabel_hierarchical_en.md b/docs/_posts/ahmedlone127/2024-09-21-hraf_multilabel_hierarchical_en.md new file mode 100644 index 00000000000000..73994f20c72c8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-hraf_multilabel_hierarchical_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hraf_multilabel_hierarchical DistilBertForSequenceClassification from Chantland +author: John Snow Labs +name: hraf_multilabel_hierarchical +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hraf_multilabel_hierarchical` is a English model originally trained by Chantland. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hraf_multilabel_hierarchical_en_5.5.0_3.0_1726953472235.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hraf_multilabel_hierarchical_en_5.5.0_3.0_1726953472235.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("hraf_multilabel_hierarchical","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("hraf_multilabel_hierarchical", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hraf_multilabel_hierarchical| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Chantland/HRAF_MultiLabel_Hierarchical \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-inlegalbert_en.md b/docs/_posts/ahmedlone127/2024-09-21-inlegalbert_en.md new file mode 100644 index 00000000000000..ae4bd58d121ee7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-inlegalbert_en.md @@ -0,0 +1,92 @@ +--- +layout: model +title: English inlegalbert BertEmbeddings from law-ai +author: John Snow Labs +name: inlegalbert +date: 2024-09-21 +tags: [bert, en, open_source, fill_mask, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`inlegalbert` is a English model originally trained by law-ai. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/inlegalbert_en_5.5.0_3.0_1726956471784.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/inlegalbert_en_5.5.0_3.0_1726956471784.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("inlegalbert","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([document_assembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = BertEmbeddings
  .pretrained("inlegalbert", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|inlegalbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.5 MB| + +## References + +References + +https://huggingface.co/law-ai/InLegalBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-inlegalbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-inlegalbert_pipeline_en.md new file mode 100644 index 00000000000000..4e98da507994ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-inlegalbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English inlegalbert_pipeline pipeline BertForSequenceClassification from xshubhamx +author: John Snow Labs +name: inlegalbert_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`inlegalbert_pipeline` is a English model originally trained by xshubhamx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/inlegalbert_pipeline_en_5.5.0_3.0_1726956490522.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/inlegalbert_pipeline_en_5.5.0_3.0_1726956490522.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("inlegalbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("inlegalbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|inlegalbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/xshubhamx/InLegalBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-inria_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-21-inria_roberta_en.md new file mode 100644 index 00000000000000..a70cb22041e43a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-inria_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English inria_roberta RoBertaEmbeddings from subbareddyiiit +author: John Snow Labs +name: inria_roberta +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`inria_roberta` is a English model originally trained by subbareddyiiit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/inria_roberta_en_5.5.0_3.0_1726942567776.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/inria_roberta_en_5.5.0_3.0_1726942567776.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("inria_roberta","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("inria_roberta","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|inria_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/subbareddyiiit/inria_roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-inria_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-inria_roberta_pipeline_en.md new file mode 100644 index 00000000000000..f9b2549877dddb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-inria_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English inria_roberta_pipeline pipeline RoBertaEmbeddings from subbareddyiiit +author: John Snow Labs +name: inria_roberta_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`inria_roberta_pipeline` is a English model originally trained by subbareddyiiit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/inria_roberta_pipeline_en_5.5.0_3.0_1726942589300.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/inria_roberta_pipeline_en_5.5.0_3.0_1726942589300.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("inria_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("inria_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|inria_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/subbareddyiiit/inria_roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-japanese_fine_tuned_whisper_model_nadiaholmlund_ja.md b/docs/_posts/ahmedlone127/2024-09-21-japanese_fine_tuned_whisper_model_nadiaholmlund_ja.md new file mode 100644 index 00000000000000..879722b12e8562 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-japanese_fine_tuned_whisper_model_nadiaholmlund_ja.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Japanese japanese_fine_tuned_whisper_model_nadiaholmlund WhisperForCTC from NadiaHolmlund +author: John Snow Labs +name: japanese_fine_tuned_whisper_model_nadiaholmlund +date: 2024-09-21 +tags: [ja, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ja +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`japanese_fine_tuned_whisper_model_nadiaholmlund` is a Japanese model originally trained by NadiaHolmlund. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/japanese_fine_tuned_whisper_model_nadiaholmlund_ja_5.5.0_3.0_1726904512990.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/japanese_fine_tuned_whisper_model_nadiaholmlund_ja_5.5.0_3.0_1726904512990.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# data: a DataFrame with an "audio_content" column holding arrays of 16 kHz audio floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("japanese_fine_tuned_whisper_model_nadiaholmlund","ja") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// data: a DataFrame with an "audio_content" column holding arrays of 16 kHz audio floats
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("japanese_fine_tuned_whisper_model_nadiaholmlund", "ja")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
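
The snippet above assumes a `data` DataFrame that already holds raw audio. A sketch of building one is below; `librosa` and the file path are assumptions, and any 16 kHz mono float array in an `audio_content` column should work:

```python
import librosa
from pyspark.sql.functions import col

# Load a local WAV file as 16 kHz mono floats (hypothetical path)
audio, _ = librosa.load("sample_ja.wav", sr=16000)

# AudioAssembler reads the raw samples from the "audio_content" column;
# cast to a float array to match the assembler's expected input
data = spark.createDataFrame([(audio.tolist(),)], ["audio_content"]) \
    .withColumn("audio_content", col("audio_content").cast("array<float>"))

model = pipeline.fit(data)
model.transform(data).select("text.result").show(truncate = False)
```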
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|japanese_fine_tuned_whisper_model_nadiaholmlund| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ja| +|Size:|390.9 MB| + +## References + +https://huggingface.co/NadiaHolmlund/Japanese_Fine_Tuned_Whisper_Model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-japanese_fine_tuned_whisper_model_nadiaholmlund_pipeline_ja.md b/docs/_posts/ahmedlone127/2024-09-21-japanese_fine_tuned_whisper_model_nadiaholmlund_pipeline_ja.md new file mode 100644 index 00000000000000..aff9f49667bb0a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-japanese_fine_tuned_whisper_model_nadiaholmlund_pipeline_ja.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Japanese japanese_fine_tuned_whisper_model_nadiaholmlund_pipeline pipeline WhisperForCTC from NadiaHolmlund +author: John Snow Labs +name: japanese_fine_tuned_whisper_model_nadiaholmlund_pipeline +date: 2024-09-21 +tags: [ja, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ja +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`japanese_fine_tuned_whisper_model_nadiaholmlund_pipeline` is a Japanese model originally trained by NadiaHolmlund. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/japanese_fine_tuned_whisper_model_nadiaholmlund_pipeline_ja_5.5.0_3.0_1726904533776.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/japanese_fine_tuned_whisper_model_nadiaholmlund_pipeline_ja_5.5.0_3.0_1726904533776.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("japanese_fine_tuned_whisper_model_nadiaholmlund_pipeline", lang = "ja") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("japanese_fine_tuned_whisper_model_nadiaholmlund_pipeline", lang = "ja") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|japanese_fine_tuned_whisper_model_nadiaholmlund_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ja| +|Size:|390.9 MB| + +## References + +https://huggingface.co/NadiaHolmlund/Japanese_Fine_Tuned_Whisper_Model + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-jerteh355sentneg4_en.md b/docs/_posts/ahmedlone127/2024-09-21-jerteh355sentneg4_en.md new file mode 100644 index 00000000000000..1aeafda02461fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-jerteh355sentneg4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English jerteh355sentneg4 RoBertaForSequenceClassification from Tanor +author: John Snow Labs +name: jerteh355sentneg4 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jerteh355sentneg4` is a English model originally trained by Tanor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jerteh355sentneg4_en_5.5.0_3.0_1726900880289.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jerteh355sentneg4_en_5.5.0_3.0_1726900880289.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("jerteh355sentneg4","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("jerteh355sentneg4", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
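
To read the predictions out of `pipelineDF` from the example above, the label strings sit in the `result` field of the `class` annotations (a short sketch, assuming the column names used above):

```python
# Each row carries one array of predicted labels per document
pipelineDF.select("text", "class.result").show(truncate = False)

# Collect the first predicted label as a plain Python string
first_label = pipelineDF.select("class.result").first()["result"][0]
print(first_label)
```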
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jerteh355sentneg4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Tanor/Jerteh355SENTNEG4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-jerteh355sentneg4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-jerteh355sentneg4_pipeline_en.md new file mode 100644 index 00000000000000..b72a4b21462703 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-jerteh355sentneg4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English jerteh355sentneg4_pipeline pipeline RoBertaForSequenceClassification from Tanor +author: John Snow Labs +name: jerteh355sentneg4_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jerteh355sentneg4_pipeline` is a English model originally trained by Tanor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jerteh355sentneg4_pipeline_en_5.5.0_3.0_1726900943137.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jerteh355sentneg4_pipeline_en_5.5.0_3.0_1726900943137.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("jerteh355sentneg4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("jerteh355sentneg4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jerteh355sentneg4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Tanor/Jerteh355SENTNEG4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-m2_mlm_en.md b/docs/_posts/ahmedlone127/2024-09-21-m2_mlm_en.md new file mode 100644 index 00000000000000..8b9c7602cc31b7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-m2_mlm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English m2_mlm RoBertaEmbeddings from S2312dal +author: John Snow Labs +name: m2_mlm +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`m2_mlm` is a English model originally trained by S2312dal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/m2_mlm_en_5.5.0_3.0_1726934340884.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/m2_mlm_en_5.5.0_3.0_1726934340884.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("m2_mlm","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("m2_mlm","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
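
To work with the vectors produced above, note that each annotation in the `embeddings` column carries one token's vector in its `embeddings` field (a minimal sketch, reusing `pipelineDF` from the example):

```python
from pyspark.sql import functions as F

# One row per token, with its surface form and embedding vector
tokens_with_vectors = pipelineDF \
    .select(F.explode("embeddings").alias("ann")) \
    .select(F.col("ann.result").alias("token"), F.col("ann.embeddings").alias("vector"))

tokens_with_vectors.show(5, truncate = 80)
```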
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|m2_mlm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/S2312dal/M2_MLM \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-mal_asr_whisper_small_imasc_1000_nan.md b/docs/_posts/ahmedlone127/2024-09-21-mal_asr_whisper_small_imasc_1000_nan.md new file mode 100644 index 00000000000000..5a4e2673ee9e44 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-mal_asr_whisper_small_imasc_1000_nan.md @@ -0,0 +1,84 @@ +--- +layout: model +title: None mal_asr_whisper_small_imasc_1000 WhisperForCTC from leenag +author: John Snow Labs +name: mal_asr_whisper_small_imasc_1000 +date: 2024-09-21 +tags: [nan, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: nan +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mal_asr_whisper_small_imasc_1000` is a None model originally trained by leenag. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mal_asr_whisper_small_imasc_1000_nan_5.5.0_3.0_1726960205194.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mal_asr_whisper_small_imasc_1000_nan_5.5.0_3.0_1726960205194.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# data: a DataFrame with an "audio_content" column holding arrays of 16 kHz audio floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("mal_asr_whisper_small_imasc_1000","nan") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// data: a DataFrame with an "audio_content" column holding arrays of 16 kHz audio floats
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("mal_asr_whisper_small_imasc_1000", "nan")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mal_asr_whisper_small_imasc_1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|nan| +|Size:|1.7 GB| + +## References + +https://huggingface.co/leenag/Mal_ASR_Whisper_small_imasc_1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-mal_asr_whisper_small_imasc_1000_pipeline_nan.md b/docs/_posts/ahmedlone127/2024-09-21-mal_asr_whisper_small_imasc_1000_pipeline_nan.md new file mode 100644 index 00000000000000..574a46e283dc3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-mal_asr_whisper_small_imasc_1000_pipeline_nan.md @@ -0,0 +1,69 @@ +--- +layout: model +title: None mal_asr_whisper_small_imasc_1000_pipeline pipeline WhisperForCTC from leenag +author: John Snow Labs +name: mal_asr_whisper_small_imasc_1000_pipeline +date: 2024-09-21 +tags: [nan, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: nan +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mal_asr_whisper_small_imasc_1000_pipeline` is a None model originally trained by leenag. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mal_asr_whisper_small_imasc_1000_pipeline_nan_5.5.0_3.0_1726960285629.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mal_asr_whisper_small_imasc_1000_pipeline_nan_5.5.0_3.0_1726960285629.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mal_asr_whisper_small_imasc_1000_pipeline", lang = "nan") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mal_asr_whisper_small_imasc_1000_pipeline", lang = "nan") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mal_asr_whisper_small_imasc_1000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|nan| +|Size:|1.7 GB| + +## References + +https://huggingface.co/leenag/Mal_ASR_Whisper_small_imasc_1000 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-malasar_asr_final_nan.md b/docs/_posts/ahmedlone127/2024-09-21-malasar_asr_final_nan.md new file mode 100644 index 00000000000000..bb0bca3233dd8e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-malasar_asr_final_nan.md @@ -0,0 +1,84 @@ +--- +layout: model +title: None malasar_asr_final WhisperForCTC from basilkr +author: John Snow Labs +name: malasar_asr_final +date: 2024-09-21 +tags: [nan, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: nan +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`malasar_asr_final` is a None model originally trained by basilkr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/malasar_asr_final_nan_5.5.0_3.0_1726912360697.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/malasar_asr_final_nan_5.5.0_3.0_1726912360697.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# data: a DataFrame with an "audio_content" column holding arrays of 16 kHz audio floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("malasar_asr_final","nan") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// data: a DataFrame with an "audio_content" column holding arrays of 16 kHz audio floats
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("malasar_asr_final", "nan")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|malasar_asr_final| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|nan| +|Size:|4.8 GB| + +## References + +https://huggingface.co/basilkr/Malasar_ASr_Final \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-malayalam_news_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-21-malayalam_news_classifier_en.md new file mode 100644 index 00000000000000..0bb462a45071cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-malayalam_news_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English malayalam_news_classifier XlmRoBertaForSequenceClassification from rajeshradhakrishnan +author: John Snow Labs +name: malayalam_news_classifier +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`malayalam_news_classifier` is a English model originally trained by rajeshradhakrishnan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/malayalam_news_classifier_en_5.5.0_3.0_1726918471323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/malayalam_news_classifier_en_5.5.0_3.0_1726918471323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("malayalam_news_classifier","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("malayalam_news_classifier", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
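
The same pipeline scores several texts in one pass; a small sketch with hypothetical inputs, reusing `pipeline` from the example above:

```python
texts = spark.createDataFrame(
    [["First sample headline"], ["Second sample headline"]]
).toDF("text")

pipeline.fit(texts).transform(texts).select("text", "class.result").show(truncate = False)
```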
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|malayalam_news_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|777.4 MB| + +## References + +https://huggingface.co/rajeshradhakrishnan/malayalam_news_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-malayalam_news_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-malayalam_news_classifier_pipeline_en.md new file mode 100644 index 00000000000000..0e44f0fcb42332 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-malayalam_news_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English malayalam_news_classifier_pipeline pipeline XlmRoBertaForSequenceClassification from rajeshradhakrishnan +author: John Snow Labs +name: malayalam_news_classifier_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`malayalam_news_classifier_pipeline` is a English model originally trained by rajeshradhakrishnan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/malayalam_news_classifier_pipeline_en_5.5.0_3.0_1726918614546.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/malayalam_news_classifier_pipeline_en_5.5.0_3.0_1726918614546.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("malayalam_news_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("malayalam_news_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|malayalam_news_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|777.4 MB| + +## References + +https://huggingface.co/rajeshradhakrishnan/malayalam_news_classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-markuus_bert_base_multilingual_squad_cqa_english_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-21-markuus_bert_base_multilingual_squad_cqa_english_pipeline_xx.md new file mode 100644 index 00000000000000..8108cdd38929e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-markuus_bert_base_multilingual_squad_cqa_english_pipeline_xx.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Multilingual markuus_bert_base_multilingual_squad_cqa_english_pipeline pipeline BertForQuestionAnswering from imrazaa +author: John Snow Labs +name: markuus_bert_base_multilingual_squad_cqa_english_pipeline +date: 2024-09-21 +tags: [xx, open_source, pipeline, onnx] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`markuus_bert_base_multilingual_squad_cqa_english_pipeline` is a Multilingual model originally trained by imrazaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/markuus_bert_base_multilingual_squad_cqa_english_pipeline_xx_5.5.0_3.0_1726946516988.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/markuus_bert_base_multilingual_squad_cqa_english_pipeline_xx_5.5.0_3.0_1726946516988.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("markuus_bert_base_multilingual_squad_cqa_english_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("markuus_bert_base_multilingual_squad_cqa_english_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|markuus_bert_base_multilingual_squad_cqa_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|625.5 MB| + +## References + +https://huggingface.co/imrazaa/markuus-bert-base-multilingual-squad-cqa-en + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-markuus_bert_base_multilingual_squad_cqa_english_xx.md b/docs/_posts/ahmedlone127/2024-09-21-markuus_bert_base_multilingual_squad_cqa_english_xx.md new file mode 100644 index 00000000000000..b971308a4d1272 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-markuus_bert_base_multilingual_squad_cqa_english_xx.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Multilingual markuus_bert_base_multilingual_squad_cqa_english BertForQuestionAnswering from imrazaa +author: John Snow Labs +name: markuus_bert_base_multilingual_squad_cqa_english +date: 2024-09-21 +tags: [xx, open_source, onnx, question_answering, bert] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`markuus_bert_base_multilingual_squad_cqa_english` is a Multilingual model originally trained by imrazaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/markuus_bert_base_multilingual_squad_cqa_english_xx_5.5.0_3.0_1726946487722.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/markuus_bert_base_multilingual_squad_cqa_english_xx_5.5.0_3.0_1726946487722.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("markuus_bert_base_multilingual_squad_cqa_english","xx") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("markuus_bert_base_multilingual_squad_cqa_english", "xx")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
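
The extracted answer span ends up in the `answer` column of `pipelineDF`; a short sketch for reading it back (column names follow the example above):

```python
pipelineDF.select("document_question.result", "answer.result").show(truncate = False)

# Or as plain strings
row = pipelineDF.select("answer.result").first()
print(row["result"][0] if row["result"] else None)
```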
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|markuus_bert_base_multilingual_squad_cqa_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|xx| +|Size:|625.5 MB| + +## References + +https://huggingface.co/imrazaa/markuus-bert-base-multilingual-squad-cqa-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-markuus_bert_base_multilingual_squad_cqa_urdu_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-21-markuus_bert_base_multilingual_squad_cqa_urdu_pipeline_xx.md new file mode 100644 index 00000000000000..8784e92aab7dd7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-markuus_bert_base_multilingual_squad_cqa_urdu_pipeline_xx.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Multilingual markuus_bert_base_multilingual_squad_cqa_urdu_pipeline pipeline BertForQuestionAnswering from imrazaa +author: John Snow Labs +name: markuus_bert_base_multilingual_squad_cqa_urdu_pipeline +date: 2024-09-21 +tags: [xx, open_source, pipeline, onnx] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`markuus_bert_base_multilingual_squad_cqa_urdu_pipeline` is a Multilingual model originally trained by imrazaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/markuus_bert_base_multilingual_squad_cqa_urdu_pipeline_xx_5.5.0_3.0_1726946921579.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/markuus_bert_base_multilingual_squad_cqa_urdu_pipeline_xx_5.5.0_3.0_1726946921579.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("markuus_bert_base_multilingual_squad_cqa_urdu_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("markuus_bert_base_multilingual_squad_cqa_urdu_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|markuus_bert_base_multilingual_squad_cqa_urdu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|625.5 MB| + +## References + +https://huggingface.co/imrazaa/markuus-bert-base-multilingual-squad-cqa-ur + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-markuus_bert_base_multilingual_squad_cqa_urdu_xx.md b/docs/_posts/ahmedlone127/2024-09-21-markuus_bert_base_multilingual_squad_cqa_urdu_xx.md new file mode 100644 index 00000000000000..e2337414ab3c1d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-markuus_bert_base_multilingual_squad_cqa_urdu_xx.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Multilingual markuus_bert_base_multilingual_squad_cqa_urdu BertForQuestionAnswering from imrazaa +author: John Snow Labs +name: markuus_bert_base_multilingual_squad_cqa_urdu +date: 2024-09-21 +tags: [xx, open_source, onnx, question_answering, bert] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`markuus_bert_base_multilingual_squad_cqa_urdu` is a Multilingual model originally trained by imrazaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/markuus_bert_base_multilingual_squad_cqa_urdu_xx_5.5.0_3.0_1726946892102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/markuus_bert_base_multilingual_squad_cqa_urdu_xx_5.5.0_3.0_1726946892102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("markuus_bert_base_multilingual_squad_cqa_urdu","xx") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("markuus_bert_base_multilingual_squad_cqa_urdu", "xx")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|markuus_bert_base_multilingual_squad_cqa_urdu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|xx| +|Size:|625.5 MB| + +## References + +https://huggingface.co/imrazaa/markuus-bert-base-multilingual-squad-cqa-ur \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-maskedberton_en.md b/docs/_posts/ahmedlone127/2024-09-21-maskedberton_en.md new file mode 100644 index 00000000000000..7e5614834846a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-maskedberton_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English maskedberton RoBertaEmbeddings from jeremierostan +author: John Snow Labs +name: maskedberton +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`maskedberton` is a English model originally trained by jeremierostan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/maskedberton_en_5.5.0_3.0_1726958059592.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/maskedberton_en_5.5.0_3.0_1726958059592.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("maskedberton","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("maskedberton","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|maskedberton| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.4 MB| + +## References + +https://huggingface.co/jeremierostan/maskedBERTon \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-maskedberton_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-maskedberton_pipeline_en.md new file mode 100644 index 00000000000000..f275b2f4501053 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-maskedberton_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English maskedberton_pipeline pipeline RoBertaEmbeddings from jeremierostan +author: John Snow Labs +name: maskedberton_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`maskedberton_pipeline` is a English model originally trained by jeremierostan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/maskedberton_pipeline_en_5.5.0_3.0_1726958075588.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/maskedberton_pipeline_en_5.5.0_3.0_1726958075588.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("maskedberton_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("maskedberton_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|maskedberton_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.5 MB| + +## References + +https://huggingface.co/jeremierostan/maskedBERTon + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-medium_24_2_tpu_timestamped_prob_0_2_en.md b/docs/_posts/ahmedlone127/2024-09-21-medium_24_2_tpu_timestamped_prob_0_2_en.md new file mode 100644 index 00000000000000..0486b18fce1ee9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-medium_24_2_tpu_timestamped_prob_0_2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English medium_24_2_tpu_timestamped_prob_0_2 WhisperForCTC from sanchit-gandhi +author: John Snow Labs +name: medium_24_2_tpu_timestamped_prob_0_2 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`medium_24_2_tpu_timestamped_prob_0_2` is a English model originally trained by sanchit-gandhi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/medium_24_2_tpu_timestamped_prob_0_2_en_5.5.0_3.0_1726912640380.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/medium_24_2_tpu_timestamped_prob_0_2_en_5.5.0_3.0_1726912640380.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# data: a DataFrame with an "audio_content" column holding arrays of 16 kHz audio floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("medium_24_2_tpu_timestamped_prob_0_2","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// data: a DataFrame with an "audio_content" column holding arrays of 16 kHz audio floats
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("medium_24_2_tpu_timestamped_prob_0_2", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|medium_24_2_tpu_timestamped_prob_0_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.8 GB| + +## References + +https://huggingface.co/sanchit-gandhi/medium-24-2-tpu-timestamped-prob-0.2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-medium_24_2_tpu_timestamped_prob_0_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-medium_24_2_tpu_timestamped_prob_0_2_pipeline_en.md new file mode 100644 index 00000000000000..ec992929deba4c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-medium_24_2_tpu_timestamped_prob_0_2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English medium_24_2_tpu_timestamped_prob_0_2_pipeline pipeline WhisperForCTC from sanchit-gandhi +author: John Snow Labs +name: medium_24_2_tpu_timestamped_prob_0_2_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`medium_24_2_tpu_timestamped_prob_0_2_pipeline` is a English model originally trained by sanchit-gandhi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/medium_24_2_tpu_timestamped_prob_0_2_pipeline_en_5.5.0_3.0_1726912899567.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/medium_24_2_tpu_timestamped_prob_0_2_pipeline_en_5.5.0_3.0_1726912899567.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("medium_24_2_tpu_timestamped_prob_0_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("medium_24_2_tpu_timestamped_prob_0_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|medium_24_2_tpu_timestamped_prob_0_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.8 GB| + +## References + +https://huggingface.co/sanchit-gandhi/medium-24-2-tpu-timestamped-prob-0.2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103_en.md b/docs/_posts/ahmedlone127/2024-09-21-minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103_en.md new file mode 100644 index 00000000000000..6e9d4b94ff8b3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103 RoBertaEmbeddings from saghar +author: John Snow Labs +name: minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103 +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103` is a English model originally trained by saghar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103_en_5.5.0_3.0_1726943613547.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103_en_5.5.0_3.0_1726943613547.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|196.0 MB| + +## References + +https://huggingface.co/saghar/MiniLMv2-L6-H768-distilled-from-RoBERTa-Large-finetuned-wikitext103 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103_pipeline_en.md new file mode 100644 index 00000000000000..90d0be61e1d145 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103_pipeline pipeline RoBertaEmbeddings from saghar +author: John Snow Labs +name: minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103_pipeline` is a English model originally trained by saghar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103_pipeline_en_5.5.0_3.0_1726943671538.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103_pipeline_en_5.5.0_3.0_1726943671538.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|196.0 MB| + +## References + +https://huggingface.co/saghar/MiniLMv2-L6-H768-distilled-from-RoBERTa-Large-finetuned-wikitext103 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-mlroberta_en.md b/docs/_posts/ahmedlone127/2024-09-21-mlroberta_en.md new file mode 100644 index 00000000000000..206a436c0cd86b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-mlroberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mlroberta RoBertaEmbeddings from shrutisingh +author: John Snow Labs +name: mlroberta +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mlroberta` is a English model originally trained by shrutisingh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mlroberta_en_5.5.0_3.0_1726942219409.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mlroberta_en_5.5.0_3.0_1726942219409.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("mlroberta","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("mlroberta","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
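+
+A quick way to inspect the resulting vectors, sketched under the assumption that the Python pipeline above has already been run, is to select the `embeddings` output column configured there:
+
+```python
+# "embeddings.result" lists the tokens and "embeddings.embeddings" the corresponding vectors
+pipelineDF.select("embeddings.result", "embeddings.embeddings").show(truncate=False)
+```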
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mlroberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.7 MB| + +## References + +https://huggingface.co/shrutisingh/MLRoBERTa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-mlroberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-mlroberta_pipeline_en.md new file mode 100644 index 00000000000000..4f4bd80f3c52d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-mlroberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mlroberta_pipeline pipeline RoBertaEmbeddings from shrutisingh +author: John Snow Labs +name: mlroberta_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mlroberta_pipeline` is a English model originally trained by shrutisingh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mlroberta_pipeline_en_5.5.0_3.0_1726942236334.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mlroberta_pipeline_en_5.5.0_3.0_1726942236334.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("mlroberta_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("mlroberta_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mlroberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.8 MB| + +## References + +https://huggingface.co/shrutisingh/MLRoBERTa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-model_1_cannotbolt_en.md b/docs/_posts/ahmedlone127/2024-09-21-model_1_cannotbolt_en.md new file mode 100644 index 00000000000000..65f18bd04d9ed4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-model_1_cannotbolt_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_1_cannotbolt BertForSequenceClassification from cannotbolt +author: John Snow Labs +name: model_1_cannotbolt +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_1_cannotbolt` is a English model originally trained by cannotbolt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_1_cannotbolt_en_5.5.0_3.0_1726956305698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_1_cannotbolt_en_5.5.0_3.0_1726956305698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("model_1_cannotbolt","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("model_1_cannotbolt", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
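+
+To see what the classifier actually returns, here is a minimal follow-up sketch (assuming the Python pipeline above has been run as shown; it only relies on the `class` output column configured there):
+
+```python
+# Each entry of "class.result" holds the label predicted for the matching input text
+pipelineDF.select("text", "class.result").show(truncate=False)
+```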
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_1_cannotbolt| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/cannotbolt/model_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-model_1_cannotbolt_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-model_1_cannotbolt_pipeline_en.md new file mode 100644 index 00000000000000..04929a6fd03590 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-model_1_cannotbolt_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_1_cannotbolt_pipeline pipeline BertForSequenceClassification from cannotbolt +author: John Snow Labs +name: model_1_cannotbolt_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_1_cannotbolt_pipeline` is a English model originally trained by cannotbolt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_1_cannotbolt_pipeline_en_5.5.0_3.0_1726956324979.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_1_cannotbolt_pipeline_en_5.5.0_3.0_1726956324979.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("model_1_cannotbolt_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("model_1_cannotbolt_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_1_cannotbolt_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/cannotbolt/model_1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-model_leotrim_en.md b/docs/_posts/ahmedlone127/2024-09-21-model_leotrim_en.md new file mode 100644 index 00000000000000..afeaca582c1343 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-model_leotrim_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_leotrim DistilBertForSequenceClassification from Leotrim +author: John Snow Labs +name: model_leotrim +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_leotrim` is a English model originally trained by Leotrim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_leotrim_en_5.5.0_3.0_1726889039574.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_leotrim_en_5.5.0_3.0_1726889039574.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("model_leotrim","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("model_leotrim", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
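+
+A quick way to inspect the predictions, sketched under the assumption that the Python pipeline above has already been run, is to select the `class` output column configured there:
+
+```python
+# Each entry of "class.result" holds the label predicted for the matching input text
+pipelineDF.select("text", "class.result").show(truncate=False)
+```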
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_leotrim| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Leotrim/model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-model_leotrim_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-model_leotrim_pipeline_en.md new file mode 100644 index 00000000000000..2f478ec5d609b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-model_leotrim_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_leotrim_pipeline pipeline DistilBertForSequenceClassification from Leotrim +author: John Snow Labs +name: model_leotrim_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_leotrim_pipeline` is a English model originally trained by Leotrim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_leotrim_pipeline_en_5.5.0_3.0_1726889052238.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_leotrim_pipeline_en_5.5.0_3.0_1726889052238.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("model_leotrim_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("model_leotrim_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_leotrim_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Leotrim/model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-model_mental_williamywy_en.md b/docs/_posts/ahmedlone127/2024-09-21-model_mental_williamywy_en.md new file mode 100644 index 00000000000000..53fbc127ae83a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-model_mental_williamywy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_mental_williamywy DistilBertForSequenceClassification from WilliamYWY +author: John Snow Labs +name: model_mental_williamywy +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_mental_williamywy` is a English model originally trained by WilliamYWY. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_mental_williamywy_en_5.5.0_3.0_1726923778353.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_mental_williamywy_en_5.5.0_3.0_1726923778353.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("model_mental_williamywy","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("model_mental_williamywy", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
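+
+Once the Python pipeline above has been fitted and applied, the predicted labels can be checked with a short sketch like the following (it only uses the `class` output column set above):
+
+```python
+# Each entry of "class.result" holds the label predicted for the matching input text
+pipelineDF.select("text", "class.result").show(truncate=False)
+```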
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_mental_williamywy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/WilliamYWY/model_mental \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-model_mental_williamywy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-model_mental_williamywy_pipeline_en.md new file mode 100644 index 00000000000000..95f71ad5eceb65 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-model_mental_williamywy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_mental_williamywy_pipeline pipeline DistilBertForSequenceClassification from WilliamYWY +author: John Snow Labs +name: model_mental_williamywy_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_mental_williamywy_pipeline` is a English model originally trained by WilliamYWY. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_mental_williamywy_pipeline_en_5.5.0_3.0_1726923790224.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_mental_williamywy_pipeline_en_5.5.0_3.0_1726923790224.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("model_mental_williamywy_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("model_mental_williamywy_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_mental_williamywy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/WilliamYWY/model_mental + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-movie_remarks_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-21-movie_remarks_classifier_en.md new file mode 100644 index 00000000000000..39ef44c9185264 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-movie_remarks_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English movie_remarks_classifier DistilBertForSequenceClassification from Liusuthu +author: John Snow Labs +name: movie_remarks_classifier +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`movie_remarks_classifier` is a English model originally trained by Liusuthu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/movie_remarks_classifier_en_5.5.0_3.0_1726884999895.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/movie_remarks_classifier_en_5.5.0_3.0_1726884999895.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("movie_remarks_classifier","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("movie_remarks_classifier", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
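+
+To see what the classifier returns for each review, here is a minimal follow-up sketch (assuming the Python pipeline above has been run as shown; it reuses the `class` output column configured there):
+
+```python
+# Each entry of "class.result" holds the label predicted for the matching input text
+pipelineDF.select("text", "class.result").show(truncate=False)
+```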
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|movie_remarks_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Liusuthu/movie_remarks_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-movie_remarks_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-movie_remarks_classifier_pipeline_en.md new file mode 100644 index 00000000000000..57728d8cc8e0a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-movie_remarks_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English movie_remarks_classifier_pipeline pipeline DistilBertForSequenceClassification from Liusuthu +author: John Snow Labs +name: movie_remarks_classifier_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`movie_remarks_classifier_pipeline` is a English model originally trained by Liusuthu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/movie_remarks_classifier_pipeline_en_5.5.0_3.0_1726885011838.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/movie_remarks_classifier_pipeline_en_5.5.0_3.0_1726885011838.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("movie_remarks_classifier_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("movie_remarks_classifier_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|movie_remarks_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Liusuthu/movie_remarks_classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-mu_phobert_v1_en.md b/docs/_posts/ahmedlone127/2024-09-21-mu_phobert_v1_en.md new file mode 100644 index 00000000000000..826742f590d45f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-mu_phobert_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mu_phobert_v1 RoBertaEmbeddings from keepitreal +author: John Snow Labs +name: mu_phobert_v1 +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mu_phobert_v1` is a English model originally trained by keepitreal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mu_phobert_v1_en_5.5.0_3.0_1726942224178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mu_phobert_v1_en_5.5.0_3.0_1726942224178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("mu_phobert_v1","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("mu_phobert_v1","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
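+
+Once the Python pipeline above has been fitted and applied, the token vectors can be checked with a short sketch like the following (it only uses the `embeddings` output column set above):
+
+```python
+# "embeddings.result" lists the tokens and "embeddings.embeddings" the corresponding vectors
+pipelineDF.select("embeddings.result", "embeddings.embeddings").show(truncate=False)
+```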
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mu_phobert_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|349.4 MB| + +## References + +https://huggingface.co/keepitreal/mu-phobert-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-mu_phobert_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-mu_phobert_v1_pipeline_en.md new file mode 100644 index 00000000000000..7c72bd27973c2d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-mu_phobert_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mu_phobert_v1_pipeline pipeline RoBertaEmbeddings from keepitreal +author: John Snow Labs +name: mu_phobert_v1_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mu_phobert_v1_pipeline` is a English model originally trained by keepitreal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mu_phobert_v1_pipeline_en_5.5.0_3.0_1726942241085.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mu_phobert_v1_pipeline_en_5.5.0_3.0_1726942241085.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("mu_phobert_v1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("mu_phobert_v1_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mu_phobert_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|349.4 MB| + +## References + +https://huggingface.co/keepitreal/mu-phobert-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-multiclass_model_en.md b/docs/_posts/ahmedlone127/2024-09-21-multiclass_model_en.md new file mode 100644 index 00000000000000..3dc0b68bbd24cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-multiclass_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English multiclass_model XlmRoBertaForSequenceClassification from Maximich +author: John Snow Labs +name: multiclass_model +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multiclass_model` is a English model originally trained by Maximich. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multiclass_model_en_5.5.0_3.0_1726933013317.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multiclass_model_en_5.5.0_3.0_1726933013317.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("multiclass_model","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("multiclass_model", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
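+
+To see what the XLM-RoBERTa classifier actually returns, here is a minimal follow-up sketch (assuming the Python pipeline above has been run as shown; it only relies on the `class` output column configured there):
+
+```python
+# Each entry of "class.result" holds the label predicted for the matching input text
+pipelineDF.select("text", "class.result").show(truncate=False)
+```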
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multiclass_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|821.3 MB| + +## References + +https://huggingface.co/Maximich/multiclass-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-multiclass_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-multiclass_model_pipeline_en.md new file mode 100644 index 00000000000000..051e8842020744 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-multiclass_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English multiclass_model_pipeline pipeline XlmRoBertaForSequenceClassification from Maximich +author: John Snow Labs +name: multiclass_model_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multiclass_model_pipeline` is a English model originally trained by Maximich. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multiclass_model_pipeline_en_5.5.0_3.0_1726933124280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multiclass_model_pipeline_en_5.5.0_3.0_1726933124280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("multiclass_model_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("multiclass_model_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multiclass_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|821.4 MB| + +## References + +https://huggingface.co/Maximich/multiclass-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-n_distilbert_sst2_padding10model_realgon_en.md b/docs/_posts/ahmedlone127/2024-09-21-n_distilbert_sst2_padding10model_realgon_en.md new file mode 100644 index 00000000000000..9583ed4c2a64dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-n_distilbert_sst2_padding10model_realgon_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_distilbert_sst2_padding10model_realgon DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_sst2_padding10model_realgon +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_sst2_padding10model_realgon` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_sst2_padding10model_realgon_en_5.5.0_3.0_1726953628545.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_sst2_padding10model_realgon_en_5.5.0_3.0_1726953628545.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst2_padding10model_realgon","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst2_padding10model_realgon", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
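+
+A quick way to inspect the predictions, sketched under the assumption that the Python pipeline above has already been run, is to select the `class` output column configured there:
+
+```python
+# Each entry of "class.result" holds the label predicted for the matching input text
+pipelineDF.select("text", "class.result").show(truncate=False)
+```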
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_sst2_padding10model_realgon| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_sst2_padding10model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-n_distilbert_sst2_padding10model_realgon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-n_distilbert_sst2_padding10model_realgon_pipeline_en.md new file mode 100644 index 00000000000000..474b0daa1a4342 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-n_distilbert_sst2_padding10model_realgon_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_distilbert_sst2_padding10model_realgon_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_sst2_padding10model_realgon_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_sst2_padding10model_realgon_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_sst2_padding10model_realgon_pipeline_en_5.5.0_3.0_1726953640500.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_sst2_padding10model_realgon_pipeline_en_5.5.0_3.0_1726953640500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("n_distilbert_sst2_padding10model_realgon_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("n_distilbert_sst2_padding10model_realgon_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_sst2_padding10model_realgon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_sst2_padding10model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-nbbert_ed1_en.md b/docs/_posts/ahmedlone127/2024-09-21-nbbert_ed1_en.md new file mode 100644 index 00000000000000..ba06f806e313b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-nbbert_ed1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nbbert_ed1 BertForSequenceClassification from yemen2016 +author: John Snow Labs +name: nbbert_ed1 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nbbert_ed1` is a English model originally trained by yemen2016. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nbbert_ed1_en_5.5.0_3.0_1726956035993.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nbbert_ed1_en_5.5.0_3.0_1726956035993.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("nbbert_ed1","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("nbbert_ed1", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
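+
+Once the Python pipeline above has been fitted and applied, the predicted labels can be checked with a short sketch like the following (it only uses the `class` output column set above):
+
+```python
+# Each entry of "class.result" holds the label predicted for the matching input text
+pipelineDF.select("text", "class.result").show(truncate=False)
+```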
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nbbert_ed1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|668.4 MB| + +## References + +https://huggingface.co/yemen2016/nbbert_ED1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-nbbert_ed1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-nbbert_ed1_pipeline_en.md new file mode 100644 index 00000000000000..67ee52ffb5b3e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-nbbert_ed1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nbbert_ed1_pipeline pipeline BertForSequenceClassification from yemen2016 +author: John Snow Labs +name: nbbert_ed1_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nbbert_ed1_pipeline` is a English model originally trained by yemen2016. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nbbert_ed1_pipeline_en_5.5.0_3.0_1726956066713.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nbbert_ed1_pipeline_en_5.5.0_3.0_1726956066713.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("nbbert_ed1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("nbbert_ed1_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nbbert_ed1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|668.5 MB| + +## References + +https://huggingface.co/yemen2016/nbbert_ED1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-nepal_bhasa_finetuning_covidsenti_distilbert_model_en.md b/docs/_posts/ahmedlone127/2024-09-21-nepal_bhasa_finetuning_covidsenti_distilbert_model_en.md new file mode 100644 index 00000000000000..313108df9f5c88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-nepal_bhasa_finetuning_covidsenti_distilbert_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nepal_bhasa_finetuning_covidsenti_distilbert_model DistilBertForSequenceClassification from Letrica +author: John Snow Labs +name: nepal_bhasa_finetuning_covidsenti_distilbert_model +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_finetuning_covidsenti_distilbert_model` is a English model originally trained by Letrica. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_finetuning_covidsenti_distilbert_model_en_5.5.0_3.0_1726924360512.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_finetuning_covidsenti_distilbert_model_en_5.5.0_3.0_1726924360512.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("nepal_bhasa_finetuning_covidsenti_distilbert_model","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("nepal_bhasa_finetuning_covidsenti_distilbert_model", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
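+
+To see what the classifier actually returns, here is a minimal follow-up sketch (assuming the Python pipeline above has been run as shown; it only relies on the `class` output column configured there):
+
+```python
+# Each entry of "class.result" holds the label predicted for the matching input text
+pipelineDF.select("text", "class.result").show(truncate=False)
+```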
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_finetuning_covidsenti_distilbert_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Letrica/new-finetuning-COVIDSenti-distilbert-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-nepal_bhasa_finetuning_covidsenti_distilbert_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-nepal_bhasa_finetuning_covidsenti_distilbert_model_pipeline_en.md new file mode 100644 index 00000000000000..72874efb701845 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-nepal_bhasa_finetuning_covidsenti_distilbert_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nepal_bhasa_finetuning_covidsenti_distilbert_model_pipeline pipeline DistilBertForSequenceClassification from Letrica +author: John Snow Labs +name: nepal_bhasa_finetuning_covidsenti_distilbert_model_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_finetuning_covidsenti_distilbert_model_pipeline` is a English model originally trained by Letrica. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_finetuning_covidsenti_distilbert_model_pipeline_en_5.5.0_3.0_1726924372546.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_finetuning_covidsenti_distilbert_model_pipeline_en_5.5.0_3.0_1726924372546.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("nepal_bhasa_finetuning_covidsenti_distilbert_model_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("nepal_bhasa_finetuning_covidsenti_distilbert_model_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_finetuning_covidsenti_distilbert_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Letrica/new-finetuning-COVIDSenti-distilbert-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-nlp_hf_workshop_amirai24_en.md b/docs/_posts/ahmedlone127/2024-09-21-nlp_hf_workshop_amirai24_en.md new file mode 100644 index 00000000000000..3bde48eb353d18 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-nlp_hf_workshop_amirai24_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nlp_hf_workshop_amirai24 DistilBertForSequenceClassification from AmirAI24 +author: John Snow Labs +name: nlp_hf_workshop_amirai24 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_hf_workshop_amirai24` is a English model originally trained by AmirAI24. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop_amirai24_en_5.5.0_3.0_1726884828274.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop_amirai24_en_5.5.0_3.0_1726884828274.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp_hf_workshop_amirai24","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp_hf_workshop_amirai24", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
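+
+A quick way to inspect the predictions, sketched under the assumption that the Python pipeline above has already been run, is to select the `class` output column configured there:
+
+```python
+# Each entry of "class.result" holds the label predicted for the matching input text
+pipelineDF.select("text", "class.result").show(truncate=False)
+```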
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_hf_workshop_amirai24| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/AmirAI24/NLP_HF_Workshop \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-nlp_hf_workshop_amirai24_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-nlp_hf_workshop_amirai24_pipeline_en.md new file mode 100644 index 00000000000000..e58c225a8df420 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-nlp_hf_workshop_amirai24_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nlp_hf_workshop_amirai24_pipeline pipeline DistilBertForSequenceClassification from AmirAI24 +author: John Snow Labs +name: nlp_hf_workshop_amirai24_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_hf_workshop_amirai24_pipeline` is a English model originally trained by AmirAI24. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop_amirai24_pipeline_en_5.5.0_3.0_1726884839938.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop_amirai24_pipeline_en_5.5.0_3.0_1726884839938.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nlp_hf_workshop_amirai24_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nlp_hf_workshop_amirai24_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
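The snippet above assumes an existing Spark session and an input DataFrame `df`. A minimal, self-contained Python sketch (the sample sentence in the `text` column is illustrative) could look like this:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("nlp_hf_workshop_amirai24_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```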
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_hf_workshop_amirai24_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/AmirAI24/NLP_HF_Workshop + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-nlp_hf_workshop_mahdi_fathian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-nlp_hf_workshop_mahdi_fathian_pipeline_en.md new file mode 100644 index 00000000000000..872b4c6a7f2806 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-nlp_hf_workshop_mahdi_fathian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nlp_hf_workshop_mahdi_fathian_pipeline pipeline DistilBertForSequenceClassification from mahdi-fathian +author: John Snow Labs +name: nlp_hf_workshop_mahdi_fathian_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_hf_workshop_mahdi_fathian_pipeline` is a English model originally trained by mahdi-fathian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop_mahdi_fathian_pipeline_en_5.5.0_3.0_1726885029415.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop_mahdi_fathian_pipeline_en_5.5.0_3.0_1726885029415.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nlp_hf_workshop_mahdi_fathian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nlp_hf_workshop_mahdi_fathian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
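The snippet above assumes an existing Spark session and an input DataFrame `df`. A minimal, self-contained Python sketch (the sample sentence in the `text` column is illustrative) could look like this:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("nlp_hf_workshop_mahdi_fathian_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```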
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_hf_workshop_mahdi_fathian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/mahdi-fathian/NLP_HF_Workshop + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-odyssey_test_9_en.md b/docs/_posts/ahmedlone127/2024-09-21-odyssey_test_9_en.md new file mode 100644 index 00000000000000..3fa20cc1a833b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-odyssey_test_9_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English odyssey_test_9 WhisperForCTC from zoe145768586678 +author: John Snow Labs +name: odyssey_test_9 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`odyssey_test_9` is a English model originally trained by zoe145768586678. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/odyssey_test_9_en_5.5.0_3.0_1726876971503.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/odyssey_test_9_en_5.5.0_3.0_1726876971503.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("odyssey_test_9","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
# "data" is a DataFrame with a float-array column named "audio_content" holding the raw audio samples
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("odyssey_test_9", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
// "data" is a DataFrame with a float-array column named "audio_content" holding the raw audio samples
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|odyssey_test_9| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/zoe145768586678/odyssey-test-9 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-odyssey_test_9_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-odyssey_test_9_pipeline_en.md new file mode 100644 index 00000000000000..adbe6f2f5f1d54 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-odyssey_test_9_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English odyssey_test_9_pipeline pipeline WhisperForCTC from zoe145768586678 +author: John Snow Labs +name: odyssey_test_9_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`odyssey_test_9_pipeline` is a English model originally trained by zoe145768586678. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/odyssey_test_9_pipeline_en_5.5.0_3.0_1726877065139.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/odyssey_test_9_pipeline_en_5.5.0_3.0_1726877065139.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("odyssey_test_9_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("odyssey_test_9_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
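The snippet above assumes a DataFrame `df` that already contains audio. A minimal Python sketch is shown below; the `rawFloats` array (one second of silence) is only a placeholder for real audio samples, and the `audio_content` column name follows the AudioAssembler convention used by this pipeline:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
# Placeholder audio: one second of silence at 16 kHz as a float array
rawFloats = [0.0] * 16000
df = spark.createDataFrame([[rawFloats]]).toDF("audio_content")

pipeline = PretrainedPipeline("odyssey_test_9_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```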
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|odyssey_test_9_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/zoe145768586678/odyssey-test-9 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-openai_whisper_small_zoomrx_colab_1_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-21-openai_whisper_small_zoomrx_colab_1_pipeline_xx.md new file mode 100644 index 00000000000000..d548faf807f00d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-openai_whisper_small_zoomrx_colab_1_pipeline_xx.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Multilingual openai_whisper_small_zoomrx_colab_1_pipeline pipeline WhisperForCTC from PraveenJesu +author: John Snow Labs +name: openai_whisper_small_zoomrx_colab_1_pipeline +date: 2024-09-21 +tags: [xx, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`openai_whisper_small_zoomrx_colab_1_pipeline` is a Multilingual model originally trained by PraveenJesu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/openai_whisper_small_zoomrx_colab_1_pipeline_xx_5.5.0_3.0_1726939315984.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/openai_whisper_small_zoomrx_colab_1_pipeline_xx_5.5.0_3.0_1726939315984.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("openai_whisper_small_zoomrx_colab_1_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("openai_whisper_small_zoomrx_colab_1_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
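The snippet above assumes a DataFrame `df` that already contains audio. A minimal Python sketch is shown below; the `rawFloats` array (one second of silence) is only a placeholder for real audio samples, and the `audio_content` column name follows the AudioAssembler convention used by this pipeline:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
# Placeholder audio: one second of silence at 16 kHz as a float array
rawFloats = [0.0] * 16000
df = spark.createDataFrame([[rawFloats]]).toDF("audio_content")

pipeline = PretrainedPipeline("openai_whisper_small_zoomrx_colab_1_pipeline", lang = "xx")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```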
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|openai_whisper_small_zoomrx_colab_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|1.1 GB| + +## References + +https://huggingface.co/PraveenJesu/openai-whisper-small-zoomrx-colab-1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-openai_whisper_tiny_spanish_ecu911_2_es.md b/docs/_posts/ahmedlone127/2024-09-21-openai_whisper_tiny_spanish_ecu911_2_es.md new file mode 100644 index 00000000000000..f5680182355972 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-openai_whisper_tiny_spanish_ecu911_2_es.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Castilian, Spanish openai_whisper_tiny_spanish_ecu911_2 WhisperForCTC from DanielMarquez +author: John Snow Labs +name: openai_whisper_tiny_spanish_ecu911_2 +date: 2024-09-21 +tags: [es, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`openai_whisper_tiny_spanish_ecu911_2` is a Castilian, Spanish model originally trained by DanielMarquez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/openai_whisper_tiny_spanish_ecu911_2_es_5.5.0_3.0_1726891604555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/openai_whisper_tiny_spanish_ecu911_2_es_5.5.0_3.0_1726891604555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("openai_whisper_tiny_spanish_ecu911_2","es") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
# "data" is a DataFrame with a float-array column named "audio_content" holding the raw audio samples
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("openai_whisper_tiny_spanish_ecu911_2", "es")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
// "data" is a DataFrame with a float-array column named "audio_content" holding the raw audio samples
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|openai_whisper_tiny_spanish_ecu911_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|es| +|Size:|379.6 MB| + +## References + +https://huggingface.co/DanielMarquez/openai-whisper-tiny-es_ecu911-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-openai_whisper_tiny_spanish_ecu911_2_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-21-openai_whisper_tiny_spanish_ecu911_2_pipeline_es.md new file mode 100644 index 00000000000000..ececbed4b2babb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-openai_whisper_tiny_spanish_ecu911_2_pipeline_es.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Castilian, Spanish openai_whisper_tiny_spanish_ecu911_2_pipeline pipeline WhisperForCTC from DanielMarquez +author: John Snow Labs +name: openai_whisper_tiny_spanish_ecu911_2_pipeline +date: 2024-09-21 +tags: [es, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`openai_whisper_tiny_spanish_ecu911_2_pipeline` is a Castilian, Spanish model originally trained by DanielMarquez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/openai_whisper_tiny_spanish_ecu911_2_pipeline_es_5.5.0_3.0_1726891627975.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/openai_whisper_tiny_spanish_ecu911_2_pipeline_es_5.5.0_3.0_1726891627975.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("openai_whisper_tiny_spanish_ecu911_2_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("openai_whisper_tiny_spanish_ecu911_2_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
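The snippet above assumes a DataFrame `df` that already contains audio. A minimal Python sketch is shown below; the `rawFloats` array (one second of silence) is only a placeholder for real audio samples, and the `audio_content` column name follows the AudioAssembler convention used by this pipeline:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
# Placeholder audio: one second of silence at 16 kHz as a float array
rawFloats = [0.0] * 16000
df = spark.createDataFrame([[rawFloats]]).toDF("audio_content")

pipeline = PretrainedPipeline("openai_whisper_tiny_spanish_ecu911_2_pipeline", lang = "es")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```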
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|openai_whisper_tiny_spanish_ecu911_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|379.6 MB| + +## References + +https://huggingface.co/DanielMarquez/openai-whisper-tiny-es_ecu911-2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-outputs_kyumeo_en.md b/docs/_posts/ahmedlone127/2024-09-21-outputs_kyumeo_en.md new file mode 100644 index 00000000000000..bfac7da336e84b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-outputs_kyumeo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English outputs_kyumeo DistilBertForSequenceClassification from Kyumeo +author: John Snow Labs +name: outputs_kyumeo +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`outputs_kyumeo` is a English model originally trained by Kyumeo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/outputs_kyumeo_en_5.5.0_3.0_1726924204731.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/outputs_kyumeo_en_5.5.0_3.0_1726924204731.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("outputs_kyumeo","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("outputs_kyumeo", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|outputs_kyumeo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Kyumeo/outputs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-outputs_kyumeo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-outputs_kyumeo_pipeline_en.md new file mode 100644 index 00000000000000..bb2f5eb68223fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-outputs_kyumeo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English outputs_kyumeo_pipeline pipeline DistilBertForSequenceClassification from Kyumeo +author: John Snow Labs +name: outputs_kyumeo_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`outputs_kyumeo_pipeline` is a English model originally trained by Kyumeo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/outputs_kyumeo_pipeline_en_5.5.0_3.0_1726924217060.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/outputs_kyumeo_pipeline_en_5.5.0_3.0_1726924217060.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("outputs_kyumeo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("outputs_kyumeo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
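The snippet above assumes an existing Spark session and an input DataFrame `df`. A minimal, self-contained Python sketch (the sample sentence in the `text` column is illustrative) could look like this:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("outputs_kyumeo_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```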
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|outputs_kyumeo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Kyumeo/outputs + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-plot_classification_en.md b/docs/_posts/ahmedlone127/2024-09-21-plot_classification_en.md new file mode 100644 index 00000000000000..3ae9f9d8fb9f0c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-plot_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English plot_classification DistilBertForSequenceClassification from dduy193 +author: John Snow Labs +name: plot_classification +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`plot_classification` is a English model originally trained by dduy193. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/plot_classification_en_5.5.0_3.0_1726953531871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/plot_classification_en_5.5.0_3.0_1726953531871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("plot_classification","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("plot_classification", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|plot_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dduy193/plot-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-plot_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-plot_classification_pipeline_en.md new file mode 100644 index 00000000000000..0cb08dec6a7da9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-plot_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English plot_classification_pipeline pipeline DistilBertForSequenceClassification from dduy193 +author: John Snow Labs +name: plot_classification_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`plot_classification_pipeline` is a English model originally trained by dduy193. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/plot_classification_pipeline_en_5.5.0_3.0_1726953544027.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/plot_classification_pipeline_en_5.5.0_3.0_1726953544027.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("plot_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("plot_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
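The snippet above assumes an existing Spark session and an input DataFrame `df`. A minimal, self-contained Python sketch (the sample sentence in the `text` column is illustrative) could look like this:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("plot_classification_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```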
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|plot_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dduy193/plot-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-pmp_h256_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-21-pmp_h256_pipeline_zh.md new file mode 100644 index 00000000000000..d022bdf3960331 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-pmp_h256_pipeline_zh.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Chinese pmp_h256_pipeline pipeline BertForTokenClassification from rickltt +author: John Snow Labs +name: pmp_h256_pipeline +date: 2024-09-21 +tags: [zh, open_source, pipeline, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pmp_h256_pipeline` is a Chinese model originally trained by rickltt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pmp_h256_pipeline_zh_5.5.0_3.0_1726881193340.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pmp_h256_pipeline_zh_5.5.0_3.0_1726881193340.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("pmp_h256_pipeline", lang = "zh") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("pmp_h256_pipeline", lang = "zh") +val annotations = pipeline.transform(df) + +``` +
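The snippet above assumes an existing Spark session and an input DataFrame `df`. A minimal, self-contained Python sketch could look like this (the English sample sentence is only a placeholder; the underlying model expects Chinese text):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("pmp_h256_pipeline", lang = "zh")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```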
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pmp_h256_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|38.7 MB| + +## References + +https://huggingface.co/rickltt/pmp-h256 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-pmp_h256_zh.md b/docs/_posts/ahmedlone127/2024-09-21-pmp_h256_zh.md new file mode 100644 index 00000000000000..87d731c6e55000 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-pmp_h256_zh.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Chinese pmp_h256 BertForTokenClassification from rickltt +author: John Snow Labs +name: pmp_h256 +date: 2024-09-21 +tags: [zh, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pmp_h256` is a Chinese model originally trained by rickltt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pmp_h256_zh_5.5.0_3.0_1726881191136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pmp_h256_zh_5.5.0_3.0_1726881191136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("pmp_h256","zh") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("pmp_h256", "zh")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pmp_h256| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|38.7 MB| + +## References + +https://huggingface.co/rickltt/pmp-h256 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-predict_perception_xlmr_cause_none_en.md b/docs/_posts/ahmedlone127/2024-09-21-predict_perception_xlmr_cause_none_en.md new file mode 100644 index 00000000000000..e2740b38201613 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-predict_perception_xlmr_cause_none_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English predict_perception_xlmr_cause_none XlmRoBertaForSequenceClassification from responsibility-framing +author: John Snow Labs +name: predict_perception_xlmr_cause_none +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`predict_perception_xlmr_cause_none` is a English model originally trained by responsibility-framing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/predict_perception_xlmr_cause_none_en_5.5.0_3.0_1726918107795.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/predict_perception_xlmr_cause_none_en_5.5.0_3.0_1726918107795.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("predict_perception_xlmr_cause_none","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("predict_perception_xlmr_cause_none", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|predict_perception_xlmr_cause_none| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|837.6 MB| + +## References + +https://huggingface.co/responsibility-framing/predict-perception-xlmr-cause-none \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-predict_perception_xlmr_cause_none_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-predict_perception_xlmr_cause_none_pipeline_en.md new file mode 100644 index 00000000000000..84b80a7f26a6dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-predict_perception_xlmr_cause_none_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English predict_perception_xlmr_cause_none_pipeline pipeline XlmRoBertaForSequenceClassification from responsibility-framing +author: John Snow Labs +name: predict_perception_xlmr_cause_none_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`predict_perception_xlmr_cause_none_pipeline` is a English model originally trained by responsibility-framing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/predict_perception_xlmr_cause_none_pipeline_en_5.5.0_3.0_1726918172732.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/predict_perception_xlmr_cause_none_pipeline_en_5.5.0_3.0_1726918172732.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("predict_perception_xlmr_cause_none_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("predict_perception_xlmr_cause_none_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
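The snippet above assumes an existing Spark session and an input DataFrame `df`. A minimal, self-contained Python sketch (the sample sentence in the `text` column is illustrative) could look like this:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("predict_perception_xlmr_cause_none_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```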
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|predict_perception_xlmr_cause_none_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|837.6 MB| + +## References + +https://huggingface.co/responsibility-framing/predict-perception-xlmr-cause-none + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-pretrained_distilroberta_on_ireland_tweets_en.md b/docs/_posts/ahmedlone127/2024-09-21-pretrained_distilroberta_on_ireland_tweets_en.md new file mode 100644 index 00000000000000..c9d2e56755bab7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-pretrained_distilroberta_on_ireland_tweets_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English pretrained_distilroberta_on_ireland_tweets RoBertaEmbeddings from mitra-mir +author: John Snow Labs +name: pretrained_distilroberta_on_ireland_tweets +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pretrained_distilroberta_on_ireland_tweets` is a English model originally trained by mitra-mir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pretrained_distilroberta_on_ireland_tweets_en_5.5.0_3.0_1726934056013.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pretrained_distilroberta_on_ireland_tweets_en_5.5.0_3.0_1726934056013.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("pretrained_distilroberta_on_ireland_tweets","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("pretrained_distilroberta_on_ireland_tweets","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pretrained_distilroberta_on_ireland_tweets| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.8 MB| + +## References + +https://huggingface.co/mitra-mir/pretrained-distilroberta-on-ireland-tweets \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-pretrained_distilroberta_on_ireland_tweets_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-pretrained_distilroberta_on_ireland_tweets_pipeline_en.md new file mode 100644 index 00000000000000..44b70ef05fb4cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-pretrained_distilroberta_on_ireland_tweets_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English pretrained_distilroberta_on_ireland_tweets_pipeline pipeline RoBertaEmbeddings from mitra-mir +author: John Snow Labs +name: pretrained_distilroberta_on_ireland_tweets_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pretrained_distilroberta_on_ireland_tweets_pipeline` is a English model originally trained by mitra-mir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pretrained_distilroberta_on_ireland_tweets_pipeline_en_5.5.0_3.0_1726934070122.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pretrained_distilroberta_on_ireland_tweets_pipeline_en_5.5.0_3.0_1726934070122.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("pretrained_distilroberta_on_ireland_tweets_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("pretrained_distilroberta_on_ireland_tweets_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
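The snippet above assumes an existing Spark session and an input DataFrame `df`. A minimal, self-contained Python sketch (the sample sentence in the `text` column is illustrative) could look like this:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("pretrained_distilroberta_on_ireland_tweets_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```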
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pretrained_distilroberta_on_ireland_tweets_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.8 MB| + +## References + +https://huggingface.co/mitra-mir/pretrained-distilroberta-on-ireland-tweets + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-ptcrawl_base_v1_5__checkpoint1_en.md b/docs/_posts/ahmedlone127/2024-09-21-ptcrawl_base_v1_5__checkpoint1_en.md new file mode 100644 index 00000000000000..dcc9c899f2ceff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-ptcrawl_base_v1_5__checkpoint1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ptcrawl_base_v1_5__checkpoint1 RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: ptcrawl_base_v1_5__checkpoint1 +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ptcrawl_base_v1_5__checkpoint1` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ptcrawl_base_v1_5__checkpoint1_en_5.5.0_3.0_1726882120200.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ptcrawl_base_v1_5__checkpoint1_en_5.5.0_3.0_1726882120200.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("ptcrawl_base_v1_5__checkpoint1","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("ptcrawl_base_v1_5__checkpoint1","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ptcrawl_base_v1_5__checkpoint1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|296.9 MB| + +## References + +https://huggingface.co/eduagarcia-temp/ptcrawl_base_v1_5__checkpoint1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-ptcrawl_base_v1_5__checkpoint1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-ptcrawl_base_v1_5__checkpoint1_pipeline_en.md new file mode 100644 index 00000000000000..07048f598ca35f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-ptcrawl_base_v1_5__checkpoint1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ptcrawl_base_v1_5__checkpoint1_pipeline pipeline RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: ptcrawl_base_v1_5__checkpoint1_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ptcrawl_base_v1_5__checkpoint1_pipeline` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ptcrawl_base_v1_5__checkpoint1_pipeline_en_5.5.0_3.0_1726882204847.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ptcrawl_base_v1_5__checkpoint1_pipeline_en_5.5.0_3.0_1726882204847.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ptcrawl_base_v1_5__checkpoint1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ptcrawl_base_v1_5__checkpoint1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
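The snippet above assumes an existing Spark session and an input DataFrame `df`. A minimal, self-contained Python sketch (the sample sentence in the `text` column is illustrative) could look like this:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("ptcrawl_base_v1_5__checkpoint1_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```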
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ptcrawl_base_v1_5__checkpoint1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|296.9 MB| + +## References + +https://huggingface.co/eduagarcia-temp/ptcrawl_base_v1_5__checkpoint1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-racism_finetuned_detests_wandb_en.md b/docs/_posts/ahmedlone127/2024-09-21-racism_finetuned_detests_wandb_en.md new file mode 100644 index 00000000000000..7aab138a6af585 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-racism_finetuned_detests_wandb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English racism_finetuned_detests_wandb RoBertaForSequenceClassification from Pablo94 +author: John Snow Labs +name: racism_finetuned_detests_wandb +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`racism_finetuned_detests_wandb` is a English model originally trained by Pablo94. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/racism_finetuned_detests_wandb_en_5.5.0_3.0_1726940903940.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/racism_finetuned_detests_wandb_en_5.5.0_3.0_1726940903940.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("racism_finetuned_detests_wandb","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("racism_finetuned_detests_wandb", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|racism_finetuned_detests_wandb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|449.6 MB| + +## References + +https://huggingface.co/Pablo94/racism-finetuned-detests-wandb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-racism_finetuned_detests_wandb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-racism_finetuned_detests_wandb_pipeline_en.md new file mode 100644 index 00000000000000..42298787742899 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-racism_finetuned_detests_wandb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English racism_finetuned_detests_wandb_pipeline pipeline RoBertaForSequenceClassification from Pablo94 +author: John Snow Labs +name: racism_finetuned_detests_wandb_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`racism_finetuned_detests_wandb_pipeline` is a English model originally trained by Pablo94. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/racism_finetuned_detests_wandb_pipeline_en_5.5.0_3.0_1726940926547.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/racism_finetuned_detests_wandb_pipeline_en_5.5.0_3.0_1726940926547.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("racism_finetuned_detests_wandb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("racism_finetuned_detests_wandb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
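The snippet above assumes an existing Spark session and an input DataFrame `df`. A minimal, self-contained Python sketch (the sample sentence in the `text` column is illustrative) could look like this:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("racism_finetuned_detests_wandb_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.show(truncate = False)
```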
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|racism_finetuned_detests_wandb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|449.7 MB| + +## References + +https://huggingface.co/Pablo94/racism-finetuned-detests-wandb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-reanker_en.md b/docs/_posts/ahmedlone127/2024-09-21-reanker_en.md new file mode 100644 index 00000000000000..35b8f6a184b789 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-reanker_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English reanker XlmRoBertaForSequenceClassification from YoungPanda +author: John Snow Labs +name: reanker +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`reanker` is a English model originally trained by YoungPanda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/reanker_en_5.5.0_3.0_1726932810830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/reanker_en_5.5.0_3.0_1726932810830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("reanker","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("reanker", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
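
To inspect what the fitted pipeline predicts, the `pipelineDF` from the Python example above can be queried directly. This is a minimal sketch (it assumes the variables defined above are still in scope; the available labels and metadata keys depend on the model):

```python
# Predicted label per row, plus the per-label scores kept in the annotation metadata.
pipelineDF.selectExpr(
    "text",
    "class.result as prediction",
    "class.metadata as scores"
).show(truncate=False)
```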
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|reanker| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|993.9 MB| + +## References + +https://huggingface.co/YoungPanda/Reanker \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-reanker_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-reanker_pipeline_en.md new file mode 100644 index 00000000000000..c8777728335582 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-reanker_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English reanker_pipeline pipeline XlmRoBertaForSequenceClassification from YoungPanda +author: John Snow Labs +name: reanker_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`reanker_pipeline` is a English model originally trained by YoungPanda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/reanker_pipeline_en_5.5.0_3.0_1726932860857.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/reanker_pipeline_en_5.5.0_3.0_1726932860857.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("reanker_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("reanker_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|reanker_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|993.9 MB| + +## References + +https://huggingface.co/YoungPanda/Reanker + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-recipes_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-21-recipes_roberta_base_en.md new file mode 100644 index 00000000000000..b02b1190a89a59 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-recipes_roberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English recipes_roberta_base RoBertaEmbeddings from AnonymousSub +author: John Snow Labs +name: recipes_roberta_base +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`recipes_roberta_base` is a English model originally trained by AnonymousSub. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/recipes_roberta_base_en_5.5.0_3.0_1726934281815.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/recipes_roberta_base_en_5.5.0_3.0_1726934281815.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("recipes_roberta_base","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("recipes_roberta_base","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
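
The `embeddings` column produced above holds one annotation per token, each carrying its vector in the standard Spark NLP annotation schema. A short follow-up sketch (reusing `pipelineDF` from the Python example above; the column and field names are the defaults set there):

```python
from pyspark.sql.functions import col, explode

# One row per token: the token text and its embedding vector
# (an array of floats, typically 768 values for a roberta-base model).
token_vectors = pipelineDF.select(explode(col("embeddings")).alias("ann")) \
    .select(col("ann.result").alias("token"), col("ann.embeddings").alias("vector"))

token_vectors.show(5, truncate=80)
```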
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|recipes_roberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/AnonymousSub/recipes-roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-recipes_roberta_base_norwegian_ingr_en.md b/docs/_posts/ahmedlone127/2024-09-21-recipes_roberta_base_norwegian_ingr_en.md new file mode 100644 index 00000000000000..046c99728d2154 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-recipes_roberta_base_norwegian_ingr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English recipes_roberta_base_norwegian_ingr RoBertaEmbeddings from AnonymousSub +author: John Snow Labs +name: recipes_roberta_base_norwegian_ingr +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`recipes_roberta_base_norwegian_ingr` is a English model originally trained by AnonymousSub. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/recipes_roberta_base_norwegian_ingr_en_5.5.0_3.0_1726942458579.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/recipes_roberta_base_norwegian_ingr_en_5.5.0_3.0_1726942458579.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("recipes_roberta_base_norwegian_ingr","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("recipes_roberta_base_norwegian_ingr","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|recipes_roberta_base_norwegian_ingr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/AnonymousSub/recipes-roberta-base-no-ingr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-recipes_roberta_base_norwegian_ingr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-recipes_roberta_base_norwegian_ingr_pipeline_en.md new file mode 100644 index 00000000000000..65b69a0bfe71cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-recipes_roberta_base_norwegian_ingr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English recipes_roberta_base_norwegian_ingr_pipeline pipeline RoBertaEmbeddings from AnonymousSub +author: John Snow Labs +name: recipes_roberta_base_norwegian_ingr_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`recipes_roberta_base_norwegian_ingr_pipeline` is a English model originally trained by AnonymousSub. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/recipes_roberta_base_norwegian_ingr_pipeline_en_5.5.0_3.0_1726942480286.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/recipes_roberta_base_norwegian_ingr_pipeline_en_5.5.0_3.0_1726942480286.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("recipes_roberta_base_norwegian_ingr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("recipes_roberta_base_norwegian_ingr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|recipes_roberta_base_norwegian_ingr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/AnonymousSub/recipes-roberta-base-no-ingr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-recipes_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-recipes_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..7365f79bfb7c2e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-recipes_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English recipes_roberta_base_pipeline pipeline RoBertaEmbeddings from AnonymousSub +author: John Snow Labs +name: recipes_roberta_base_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`recipes_roberta_base_pipeline` is a English model originally trained by AnonymousSub. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/recipes_roberta_base_pipeline_en_5.5.0_3.0_1726934303061.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/recipes_roberta_base_pipeline_en_5.5.0_3.0_1726934303061.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("recipes_roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("recipes_roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|recipes_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/AnonymousSub/recipes-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-robbert_2023_dutch_large_nelf_ft_lcn_en.md b/docs/_posts/ahmedlone127/2024-09-21-robbert_2023_dutch_large_nelf_ft_lcn_en.md new file mode 100644 index 00000000000000..c3ef06fc9571fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-robbert_2023_dutch_large_nelf_ft_lcn_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robbert_2023_dutch_large_nelf_ft_lcn RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: robbert_2023_dutch_large_nelf_ft_lcn +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_2023_dutch_large_nelf_ft_lcn` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_2023_dutch_large_nelf_ft_lcn_en_5.5.0_3.0_1726943979755.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_2023_dutch_large_nelf_ft_lcn_en_5.5.0_3.0_1726943979755.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("robbert_2023_dutch_large_nelf_ft_lcn","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("robbert_2023_dutch_large_nelf_ft_lcn","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
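
Because this checkpoint is about 1.3 GB, it can be worth persisting the fitted pipeline locally so later jobs skip the download. A brief sketch (the output path is only an example, and `pipelineModel`/`data` refer to the variables from the Python snippet above):

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline (including the downloaded model) to local or cluster storage.
pipelineModel.write().overwrite().save("/tmp/robbert_2023_dutch_large_nelf_ft_lcn_pipeline")

# In a later job, reload it and transform new data without re-downloading the model.
restored = PipelineModel.load("/tmp/robbert_2023_dutch_large_nelf_ft_lcn_pipeline")
restored.transform(data).select("embeddings.result").show(5, truncate=False)
```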
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_2023_dutch_large_nelf_ft_lcn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/btamm12/robbert-2023-dutch-large-nelf-ft-lcn \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-robbert_2023_dutch_large_nelf_ft_lcn_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-robbert_2023_dutch_large_nelf_ft_lcn_pipeline_en.md new file mode 100644 index 00000000000000..a9f9ac57dff1a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-robbert_2023_dutch_large_nelf_ft_lcn_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robbert_2023_dutch_large_nelf_ft_lcn_pipeline pipeline RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: robbert_2023_dutch_large_nelf_ft_lcn_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_2023_dutch_large_nelf_ft_lcn_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_2023_dutch_large_nelf_ft_lcn_pipeline_en_5.5.0_3.0_1726944041391.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_2023_dutch_large_nelf_ft_lcn_pipeline_en_5.5.0_3.0_1726944041391.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robbert_2023_dutch_large_nelf_ft_lcn_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robbert_2023_dutch_large_nelf_ft_lcn_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_2023_dutch_large_nelf_ft_lcn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/btamm12/robbert-2023-dutch-large-nelf-ft-lcn + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-robbert_cosmetic_v2_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-21-robbert_cosmetic_v2_finetuned_en.md new file mode 100644 index 00000000000000..50c26726e4f5d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-robbert_cosmetic_v2_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robbert_cosmetic_v2_finetuned RoBertaEmbeddings from ymelka +author: John Snow Labs +name: robbert_cosmetic_v2_finetuned +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_cosmetic_v2_finetuned` is a English model originally trained by ymelka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_cosmetic_v2_finetuned_en_5.5.0_3.0_1726934426072.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_cosmetic_v2_finetuned_en_5.5.0_3.0_1726934426072.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("robbert_cosmetic_v2_finetuned","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("robbert_cosmetic_v2_finetuned","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_cosmetic_v2_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|464.8 MB| + +## References + +https://huggingface.co/ymelka/robbert-cosmetic-v2-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-robbert_cosmetic_v2_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-robbert_cosmetic_v2_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..705f562add0e65 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-robbert_cosmetic_v2_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robbert_cosmetic_v2_finetuned_pipeline pipeline RoBertaEmbeddings from ymelka +author: John Snow Labs +name: robbert_cosmetic_v2_finetuned_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_cosmetic_v2_finetuned_pipeline` is a English model originally trained by ymelka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_cosmetic_v2_finetuned_pipeline_en_5.5.0_3.0_1726934446697.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_cosmetic_v2_finetuned_pipeline_en_5.5.0_3.0_1726934446697.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robbert_cosmetic_v2_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robbert_cosmetic_v2_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_cosmetic_v2_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.8 MB| + +## References + +https://huggingface.co/ymelka/robbert-cosmetic-v2-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberdou_100k_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberdou_100k_en.md new file mode 100644 index 00000000000000..256b0e36581421 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberdou_100k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberdou_100k RoBertaEmbeddings from flavio-nakasato +author: John Snow Labs +name: roberdou_100k +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberdou_100k` is a English model originally trained by flavio-nakasato. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberdou_100k_en_5.5.0_3.0_1726882094775.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberdou_100k_en_5.5.0_3.0_1726882094775.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberdou_100k","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberdou_100k","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberdou_100k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|634.4 MB| + +## References + +https://huggingface.co/flavio-nakasato/roberdou_100k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberdou_100k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberdou_100k_pipeline_en.md new file mode 100644 index 00000000000000..f23be44ffc9b90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberdou_100k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberdou_100k_pipeline pipeline RoBertaEmbeddings from flavio-nakasato +author: John Snow Labs +name: roberdou_100k_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberdou_100k_pipeline` is a English model originally trained by flavio-nakasato. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberdou_100k_pipeline_en_5.5.0_3.0_1726882127665.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberdou_100k_pipeline_en_5.5.0_3.0_1726882127665.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberdou_100k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberdou_100k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberdou_100k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|634.4 MB| + +## References + +https://huggingface.co/flavio-nakasato/roberdou_100k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-robert_bpe_zinc100k_en.md b/docs/_posts/ahmedlone127/2024-09-21-robert_bpe_zinc100k_en.md new file mode 100644 index 00000000000000..a41fb5485a0771 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-robert_bpe_zinc100k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robert_bpe_zinc100k RoBertaEmbeddings from rifkat +author: John Snow Labs +name: robert_bpe_zinc100k +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robert_bpe_zinc100k` is a English model originally trained by rifkat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robert_bpe_zinc100k_en_5.5.0_3.0_1726958158477.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robert_bpe_zinc100k_en_5.5.0_3.0_1726958158477.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("robert_bpe_zinc100k","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("robert_bpe_zinc100k","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robert_bpe_zinc100k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|310.5 MB| + +## References + +https://huggingface.co/rifkat/robert_BPE_zinc100k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-robert_bpe_zinc100k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-robert_bpe_zinc100k_pipeline_en.md new file mode 100644 index 00000000000000..9dccf23c3bf7d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-robert_bpe_zinc100k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robert_bpe_zinc100k_pipeline pipeline RoBertaEmbeddings from rifkat +author: John Snow Labs +name: robert_bpe_zinc100k_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robert_bpe_zinc100k_pipeline` is a English model originally trained by rifkat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robert_bpe_zinc100k_pipeline_en_5.5.0_3.0_1726958173503.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robert_bpe_zinc100k_pipeline_en_5.5.0_3.0_1726958173503.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robert_bpe_zinc100k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robert_bpe_zinc100k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robert_bpe_zinc100k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|310.5 MB| + +## References + +https://huggingface.co/rifkat/robert_BPE_zinc100k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_arabic_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_arabic_en.md new file mode 100644 index 00000000000000..fc624cfc01a041 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_arabic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_arabic RoBertaEmbeddings from gagan3012 +author: John Snow Labs +name: roberta_arabic +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_arabic` is a English model originally trained by gagan3012. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_arabic_en_5.5.0_3.0_1726943844113.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_arabic_en_5.5.0_3.0_1726943844113.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_arabic","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_arabic","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_arabic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/gagan3012/roberta-ar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_arabic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_arabic_pipeline_en.md new file mode 100644 index 00000000000000..b8681b4cefa326 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_arabic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_arabic_pipeline pipeline RoBertaEmbeddings from gagan3012 +author: John Snow Labs +name: roberta_arabic_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_arabic_pipeline` is a English model originally trained by gagan3012. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_arabic_pipeline_en_5.5.0_3.0_1726943865679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_arabic_pipeline_en_5.5.0_3.0_1726943865679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_arabic_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_arabic_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_arabic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/gagan3012/roberta-ar + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_atomic_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_atomic_en.md new file mode 100644 index 00000000000000..ed112ae69342ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_atomic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_atomic RoBertaEmbeddings from ClovenDoug +author: John Snow Labs +name: roberta_atomic +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_atomic` is a English model originally trained by ClovenDoug. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_atomic_en_5.5.0_3.0_1726943609207.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_atomic_en_5.5.0_3.0_1726943609207.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_atomic","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_atomic","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_atomic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|8.7 MB| + +## References + +https://huggingface.co/ClovenDoug/roberta-atomic \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_atomic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_atomic_pipeline_en.md new file mode 100644 index 00000000000000..d59f92c2b21d7a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_atomic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_atomic_pipeline pipeline RoBertaEmbeddings from ClovenDoug +author: John Snow Labs +name: roberta_atomic_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_atomic_pipeline` is a English model originally trained by ClovenDoug. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_atomic_pipeline_en_5.5.0_3.0_1726943610176.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_atomic_pipeline_en_5.5.0_3.0_1726943610176.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_atomic_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_atomic_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_atomic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|8.7 MB| + +## References + +https://huggingface.co/ClovenDoug/roberta-atomic + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_bne_finetuned_tass2020_rogelioplatt_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_bne_finetuned_tass2020_rogelioplatt_en.md new file mode 100644 index 00000000000000..4853878b11053c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_bne_finetuned_tass2020_rogelioplatt_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_tass2020_rogelioplatt RoBertaEmbeddings from rogelioplatt +author: John Snow Labs +name: roberta_base_bne_finetuned_tass2020_rogelioplatt +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_tass2020_rogelioplatt` is a English model originally trained by rogelioplatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_tass2020_rogelioplatt_en_5.5.0_3.0_1726958099483.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_tass2020_rogelioplatt_en_5.5.0_3.0_1726958099483.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_bne_finetuned_tass2020_rogelioplatt","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_bne_finetuned_tass2020_rogelioplatt","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_tass2020_rogelioplatt| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|463.7 MB| + +## References + +https://huggingface.co/rogelioplatt/roberta-base-bne-finetuned-Tass2020 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_pais_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_pais_pipeline_en.md new file mode 100644 index 00000000000000..62b7ebb9ddfc93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_pais_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_pais_pipeline pipeline RoBertaForSequenceClassification from vg055 +author: John Snow Labs +name: roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_pais_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_pais_pipeline` is a English model originally trained by vg055. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_pais_pipeline_en_5.5.0_3.0_1726940368627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_pais_pipeline_en_5.5.0_3.0_1726940368627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_pais_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_pais_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_pais_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/vg055/roberta-base-bne-finetuned-TripAdvisorDomainAdaptation-finetuned-e2-RestMex2023-pais + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_chatgpt_and_reddit_qa_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_chatgpt_and_reddit_qa_en.md new file mode 100644 index 00000000000000..0ba2c360ce47ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_chatgpt_and_reddit_qa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_chatgpt_and_reddit_qa RoBertaForSequenceClassification from fahrialfiansyah +author: John Snow Labs +name: roberta_base_chatgpt_and_reddit_qa +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_chatgpt_and_reddit_qa` is a English model originally trained by fahrialfiansyah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_chatgpt_and_reddit_qa_en_5.5.0_3.0_1726899933427.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_chatgpt_and_reddit_qa_en_5.5.0_3.0_1726899933427.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_chatgpt_and_reddit_qa","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_chatgpt_and_reddit_qa", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
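
For ad-hoc or low-latency scoring, the fitted model can also be wrapped in a `LightPipeline`, which runs the same stages on plain Python strings without launching Spark jobs. A minimal sketch, assuming the `pipelineModel` from the Python example above (the input sentence is only a placeholder):

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# annotate() returns a dict mapping each output column to its results,
# so the predicted label(s) appear under the "class" key set above.
result = light.annotate("Sample question to classify")
print(result["class"])
```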
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_chatgpt_and_reddit_qa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|436.6 MB| + +## References + +https://huggingface.co/fahrialfiansyah/roberta-base_chatgpt_and_reddit_qa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_chatgpt_and_reddit_qa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_chatgpt_and_reddit_qa_pipeline_en.md new file mode 100644 index 00000000000000..292bb3a81c8dc7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_chatgpt_and_reddit_qa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_chatgpt_and_reddit_qa_pipeline pipeline RoBertaForSequenceClassification from fahrialfiansyah +author: John Snow Labs +name: roberta_base_chatgpt_and_reddit_qa_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_chatgpt_and_reddit_qa_pipeline` is a English model originally trained by fahrialfiansyah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_chatgpt_and_reddit_qa_pipeline_en_5.5.0_3.0_1726899958502.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_chatgpt_and_reddit_qa_pipeline_en_5.5.0_3.0_1726899958502.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_chatgpt_and_reddit_qa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_chatgpt_and_reddit_qa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
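
The snippet above assumes an existing DataFrame `df` with a `text` column and that `PretrainedPipeline` is already imported. A sketch that fills in those pieces; the example sentence is made up, and the classifier stage is assumed to write to a `class` column:

```python
from sparknlp.pretrained import PretrainedPipeline

# Hypothetical input; any DataFrame with a "text" column works.
df = spark.createDataFrame([["Is this reply written by ChatGPT or a Reddit user?"]]).toDF("text")

pipeline = PretrainedPipeline("roberta_base_chatgpt_and_reddit_qa_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.select("class.result").show(truncate=False)  # assumed output column name
```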
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_chatgpt_and_reddit_qa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|436.6 MB| + +## References + +https://huggingface.co/fahrialfiansyah/roberta-base_chatgpt_and_reddit_qa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_disaster_tweets_hail_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_disaster_tweets_hail_en.md new file mode 100644 index 00000000000000..8b228f77992bee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_disaster_tweets_hail_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_disaster_tweets_hail RoBertaForSequenceClassification from maxschlake +author: John Snow Labs +name: roberta_base_disaster_tweets_hail +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_disaster_tweets_hail` is a English model originally trained by maxschlake. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_disaster_tweets_hail_en_5.5.0_3.0_1726900869751.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_disaster_tweets_hail_en_5.5.0_3.0_1726900869751.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier consumes the DocumentAssembler ("document") and Tokenizer ("token") outputs
sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_disaster_tweets_hail","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_disaster_tweets_hail", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
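
For ad-hoc predictions on a handful of strings, the fitted model can be wrapped in a `LightPipeline`, which runs on the driver without building a DataFrame. A sketch reusing `pipelineModel` from above (the example tweet is invented):

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# Returns a dict of annotator outputs (document, token, class) per input string.
print(light.annotate("Huge hailstorm just hit town, windshields smashed everywhere"))
```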
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_disaster_tweets_hail| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|448.6 MB| + +## References + +https://huggingface.co/maxschlake/roberta-base_disaster_tweets_hail \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_disaster_tweets_hail_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_disaster_tweets_hail_pipeline_en.md new file mode 100644 index 00000000000000..9da8d7a2ba5584 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_disaster_tweets_hail_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_disaster_tweets_hail_pipeline pipeline RoBertaForSequenceClassification from maxschlake +author: John Snow Labs +name: roberta_base_disaster_tweets_hail_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_disaster_tweets_hail_pipeline` is a English model originally trained by maxschlake. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_disaster_tweets_hail_pipeline_en_5.5.0_3.0_1726900893078.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_disaster_tweets_hail_pipeline_en_5.5.0_3.0_1726900893078.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_disaster_tweets_hail_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_disaster_tweets_hail_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
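
Besides `transform()` on a DataFrame, `PretrainedPipeline` exposes `annotate()` for quick driver-side calls on raw strings; the pipeline is downloaded and cached on first use. A small sketch (example text invented):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_base_disaster_tweets_hail_pipeline", lang="en")
# annotate() accepts a single string or a list of strings.
print(pipeline.annotate("Golf-ball sized hail reported across the county tonight"))
```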
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_disaster_tweets_hail_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|448.6 MB| + +## References + +https://huggingface.co/maxschlake/roberta-base_disaster_tweets_hail + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_epoch_47_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_epoch_47_en.md new file mode 100644 index 00000000000000..f1c8e66fb3799b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_epoch_47_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_47 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_47 +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_47` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_47_en_5.5.0_3.0_1726957889283.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_47_en_5.5.0_3.0_1726957889283.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_47","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_47","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
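
Each token in the input receives one vector in the `embeddings` annotation column. A sketch that explodes that column so every row shows a token next to its vector (the column aliases are arbitrary):

```python
from pyspark.sql import functions as F

pipelineDF.select(F.explode("embeddings").alias("emb")) \
    .select(F.col("emb.result").alias("token"),
            F.col("emb.embeddings").alias("vector")) \
    .show(truncate=80)
```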
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_47| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_47 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_epoch_47_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_epoch_47_pipeline_en.md new file mode 100644 index 00000000000000..e41116d8d60a7e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_epoch_47_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_47_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_47_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_47_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_47_pipeline_en_5.5.0_3.0_1726957977606.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_47_pipeline_en_5.5.0_3.0_1726957977606.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_47_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_47_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_47_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_47 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_epoch_67_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_epoch_67_en.md new file mode 100644 index 00000000000000..dd8ea2ee32c15e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_epoch_67_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_67 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_67 +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_67` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_67_en_5.5.0_3.0_1726934455815.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_67_en_5.5.0_3.0_1726934455815.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_67","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_67","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
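
If the vectors are needed as plain Spark ML features rather than annotations, an `EmbeddingsFinisher` stage can be appended. The sketch below assumes the column names used in the example above:

```python
from sparknlp.base import EmbeddingsFinisher

finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)  # emit Spark ML vectors instead of plain arrays

finisher.transform(pipelineDF).select("finished_embeddings").show(truncate=80)
```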
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_67| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_67 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_epoch_67_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_epoch_67_pipeline_en.md new file mode 100644 index 00000000000000..d8b5873ba52233 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_epoch_67_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_67_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_67_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_67_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_67_pipeline_en_5.5.0_3.0_1726934537960.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_67_pipeline_en_5.5.0_3.0_1726934537960.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_67_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_67_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_67_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_67 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_epoch_77_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_epoch_77_en.md new file mode 100644 index 00000000000000..3cdf714f354c33 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_epoch_77_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_77 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_77 +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_77` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_77_en_5.5.0_3.0_1726942419570.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_77_en_5.5.0_3.0_1726942419570.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_77","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_77","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_77| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_77 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_epoch_77_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_epoch_77_pipeline_en.md new file mode 100644 index 00000000000000..6c25d3e7e841b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_epoch_77_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_77_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_77_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_77_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_77_pipeline_en_5.5.0_3.0_1726942506753.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_77_pipeline_en_5.5.0_3.0_1726942506753.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_77_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_77_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_77_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_77 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_finetuned_abs_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_finetuned_abs_en.md new file mode 100644 index 00000000000000..4c8f4f5142bf74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_finetuned_abs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_abs RoBertaEmbeddings from Transabrar +author: John Snow Labs +name: roberta_base_finetuned_abs +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_abs` is a English model originally trained by Transabrar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_abs_en_5.5.0_3.0_1726957887138.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_abs_en_5.5.0_3.0_1726957887138.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_abs","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_abs","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
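
Token vectors are commonly pooled into a single vector per document before classification or similarity search; Spark NLP's `SentenceEmbeddings` annotator does this by averaging. A sketch that extends the stages defined above:

```python
from pyspark.ml import Pipeline
from sparknlp.annotator import SentenceEmbeddings

sentenceEmbeddings = SentenceEmbeddings() \
    .setInputCols(["document", "embeddings"]) \
    .setOutputCol("sentence_embeddings") \
    .setPoolingStrategy("AVERAGE")

pooled = Pipeline().setStages(
    [documentAssembler, tokenizer, embeddings, sentenceEmbeddings]
).fit(data).transform(data)
pooled.select("sentence_embeddings.embeddings").show(truncate=80)
```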
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_abs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|464.7 MB| + +## References + +https://huggingface.co/Transabrar/roberta-base-finetuned-abs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_finetuned_abs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_finetuned_abs_pipeline_en.md new file mode 100644 index 00000000000000..047e2e798a9fa9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_finetuned_abs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_abs_pipeline pipeline RoBertaEmbeddings from Transabrar +author: John Snow Labs +name: roberta_base_finetuned_abs_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_abs_pipeline` is a English model originally trained by Transabrar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_abs_pipeline_en_5.5.0_3.0_1726957910586.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_abs_pipeline_en_5.5.0_3.0_1726957910586.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_abs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_abs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_abs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.7 MB| + +## References + +https://huggingface.co/Transabrar/roberta-base-finetuned-abs + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_finetuned_wallisian_manual_10ep_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_finetuned_wallisian_manual_10ep_en.md new file mode 100644 index 00000000000000..4ab14228eef6f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_finetuned_wallisian_manual_10ep_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_manual_10ep RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_manual_10ep +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_manual_10ep` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_10ep_en_5.5.0_3.0_1726882372636.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_10ep_en_5.5.0_3.0_1726882372636.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_wallisian_manual_10ep","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_wallisian_manual_10ep","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_manual_10ep| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.2 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-manual-10ep \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_finetuned_wallisian_manual_10ep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_finetuned_wallisian_manual_10ep_pipeline_en.md new file mode 100644 index 00000000000000..ea152da4ef027e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_finetuned_wallisian_manual_10ep_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_manual_10ep_pipeline pipeline RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_manual_10ep_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_manual_10ep_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_10ep_pipeline_en_5.5.0_3.0_1726882397361.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_10ep_pipeline_en_5.5.0_3.0_1726882397361.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_wallisian_manual_10ep_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_wallisian_manual_10ep_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_manual_10ep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.2 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-manual-10ep + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_finetuned_wallisian_whisper_5ep_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_finetuned_wallisian_whisper_5ep_en.md new file mode 100644 index 00000000000000..9cc07d0af152a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_finetuned_wallisian_whisper_5ep_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_whisper_5ep RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_whisper_5ep +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_whisper_5ep` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_whisper_5ep_en_5.5.0_3.0_1726943981506.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_whisper_5ep_en_5.5.0_3.0_1726943981506.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_wallisian_whisper_5ep","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_wallisian_whisper_5ep","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_whisper_5ep| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-whisper-5ep \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_finetuned_wallisian_whisper_5ep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_finetuned_wallisian_whisper_5ep_pipeline_en.md new file mode 100644 index 00000000000000..6c638a932bca85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_finetuned_wallisian_whisper_5ep_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_whisper_5ep_pipeline pipeline RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_whisper_5ep_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_whisper_5ep_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_whisper_5ep_pipeline_en_5.5.0_3.0_1726944004572.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_whisper_5ep_pipeline_en_5.5.0_3.0_1726944004572.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_wallisian_whisper_5ep_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_wallisian_whisper_5ep_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_whisper_5ep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-whisper-5ep + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_full_finetuned_ner_multi_label_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_full_finetuned_ner_multi_label_en.md new file mode 100644 index 00000000000000..80eeee65c774cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_full_finetuned_ner_multi_label_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_full_finetuned_ner_multi_label RoBertaForTokenClassification from DDDacc +author: John Snow Labs +name: roberta_base_full_finetuned_ner_multi_label +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_full_finetuned_ner_multi_label` is a English model originally trained by DDDacc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_full_finetuned_ner_multi_label_en_5.5.0_3.0_1726926399537.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_full_finetuned_ner_multi_label_en_5.5.0_3.0_1726926399537.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The token classifier consumes the DocumentAssembler ("document") and Tokenizer ("token") outputs
tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_full_finetuned_ner_multi_label","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_full_finetuned_ner_multi_label", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
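
The `ner` column holds one IOB-style tag per token; to group consecutive tags into entity chunks, a `NerConverter` can be applied to the transformed DataFrame. A sketch using the column names from the example above:

```python
from sparknlp.annotator import NerConverter

nerConverter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")  # chunk column name is arbitrary

nerConverter.transform(pipelineDF).select("ner_chunk.result").show(truncate=False)
```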
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_full_finetuned_ner_multi_label| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|427.3 MB| + +## References + +https://huggingface.co/DDDacc/RoBERTa-Base-full-finetuned-ner-multi-label \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_full_finetuned_ner_multi_label_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_full_finetuned_ner_multi_label_pipeline_en.md new file mode 100644 index 00000000000000..bad3c165597c2d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_full_finetuned_ner_multi_label_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_full_finetuned_ner_multi_label_pipeline pipeline RoBertaForTokenClassification from DDDacc +author: John Snow Labs +name: roberta_base_full_finetuned_ner_multi_label_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_full_finetuned_ner_multi_label_pipeline` is a English model originally trained by DDDacc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_full_finetuned_ner_multi_label_pipeline_en_5.5.0_3.0_1726926428036.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_full_finetuned_ner_multi_label_pipeline_en_5.5.0_3.0_1726926428036.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_full_finetuned_ner_multi_label_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_full_finetuned_ner_multi_label_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_full_finetuned_ner_multi_label_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|427.3 MB| + +## References + +https://huggingface.co/DDDacc/RoBERTa-Base-full-finetuned-ner-multi-label + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_last_2_chars_acl2023_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_last_2_chars_acl2023_en.md new file mode 100644 index 00000000000000..ac79f4df6bf38e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_last_2_chars_acl2023_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_last_2_chars_acl2023 RoBertaEmbeddings from hitachi-nlp +author: John Snow Labs +name: roberta_base_last_2_chars_acl2023 +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_last_2_chars_acl2023` is a English model originally trained by hitachi-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_last_2_chars_acl2023_en_5.5.0_3.0_1726934695970.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_last_2_chars_acl2023_en_5.5.0_3.0_1726934695970.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_last_2_chars_acl2023","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_last_2_chars_acl2023","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_last_2_chars_acl2023| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.3 MB| + +## References + +https://huggingface.co/hitachi-nlp/roberta-base_last-2-chars_acl2023 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_last_2_chars_acl2023_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_last_2_chars_acl2023_pipeline_en.md new file mode 100644 index 00000000000000..5d4f649b2a2581 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_last_2_chars_acl2023_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_last_2_chars_acl2023_pipeline pipeline RoBertaEmbeddings from hitachi-nlp +author: John Snow Labs +name: roberta_base_last_2_chars_acl2023_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_last_2_chars_acl2023_pipeline` is a English model originally trained by hitachi-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_last_2_chars_acl2023_pipeline_en_5.5.0_3.0_1726934716722.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_last_2_chars_acl2023_pipeline_en_5.5.0_3.0_1726934716722.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_last_2_chars_acl2023_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_last_2_chars_acl2023_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_last_2_chars_acl2023_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.3 MB| + +## References + +https://huggingface.co/hitachi-nlp/roberta-base_last-2-chars_acl2023 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_wechsel_french_pipeline_fr.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_wechsel_french_pipeline_fr.md new file mode 100644 index 00000000000000..c9b30118549fad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_wechsel_french_pipeline_fr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: French roberta_base_wechsel_french_pipeline pipeline RoBertaEmbeddings from benjamin +author: John Snow Labs +name: roberta_base_wechsel_french_pipeline +date: 2024-09-21 +tags: [fr, open_source, pipeline, onnx] +task: Embeddings +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_wechsel_french_pipeline` is a French model originally trained by benjamin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_wechsel_french_pipeline_fr_5.5.0_3.0_1726882597336.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_wechsel_french_pipeline_fr_5.5.0_3.0_1726882597336.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_wechsel_french_pipeline", lang = "fr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_wechsel_french_pipeline", lang = "fr") +val annotations = pipeline.transform(df) + +``` +
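
The call pattern is the same as for the English pipelines; only the input text and `lang` change. A quick sketch with an invented French sentence, printing the schema to see which annotation columns the included stages produce:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_base_wechsel_french_pipeline", lang="fr")
df = spark.createDataFrame([["J'adore utiliser Spark NLP pour analyser des textes."]]).toDF("text")
annotations = pipeline.transform(df)
annotations.printSchema()  # lists the document, token and embedding columns added by the stages
```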
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_wechsel_french_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fr| +|Size:|465.8 MB| + +## References + +https://huggingface.co/benjamin/roberta-base-wechsel-french + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_cws_msr_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_cws_msr_en.md new file mode 100644 index 00000000000000..d0e3df4b6f88f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_cws_msr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_cws_msr BertForTokenClassification from tjspross +author: John Snow Labs +name: roberta_cws_msr +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_cws_msr` is a English model originally trained by tjspross. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_cws_msr_en_5.5.0_3.0_1726890246005.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_cws_msr_en_5.5.0_3.0_1726890246005.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The token classifier consumes the DocumentAssembler ("document") and Tokenizer ("token") outputs
tokenClassifier = BertForTokenClassification.pretrained("roberta_cws_msr","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("roberta_cws_msr", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
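
To check the tag assigned to each token, the `token` and `ner` result arrays can be shown side by side; a minimal sketch on `pipelineDF`:

```python
from pyspark.sql import functions as F

pipelineDF.select(
    F.col("token.result").alias("tokens"),
    F.col("ner.result").alias("tags")
).show(truncate=False)
```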
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_cws_msr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/tjspross/roberta_cws_msr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_cws_msr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_cws_msr_pipeline_en.md new file mode 100644 index 00000000000000..ea5038d1928461 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_cws_msr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_cws_msr_pipeline pipeline BertForTokenClassification from tjspross +author: John Snow Labs +name: roberta_cws_msr_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_cws_msr_pipeline` is a English model originally trained by tjspross. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_cws_msr_pipeline_en_5.5.0_3.0_1726890301229.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_cws_msr_pipeline_en_5.5.0_3.0_1726890301229.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is any DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_cws_msr_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is any DataFrame with a "text" column
val pipeline = new PretrainedPipeline("roberta_cws_msr_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
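
The snippet above assumes an existing DataFrame `df` with a `text` column. A minimal self-contained sketch in Python, assuming Spark NLP has been started and the pipeline name resolves as shown:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_cws_msr_pipeline", lang="en")

# DataFrame input: any DataFrame with a "text" column works.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

# For quick experiments, annotate() runs the same stages on a plain string.
print(pipeline.annotate("I love spark-nlp"))
```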
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_cws_msr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/tjspross/roberta_cws_msr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_en.md new file mode 100644 index 00000000000000..f4836c3026ad34 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_en.md @@ -0,0 +1,96 @@ +--- +layout: model +title: English roberta RoBertaForTokenClassification from autosyrup +author: John Snow Labs +name: roberta +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta` is a English model originally trained by autosyrup. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_en_5.5.0_3.0_1726952857788.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_en_5.5.0_3.0_1726952857788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +References + +https://huggingface.co/autosyrup/roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_kubhist2_pipeline_sv.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_kubhist2_pipeline_sv.md new file mode 100644 index 00000000000000..d7f4f99ae124ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_kubhist2_pipeline_sv.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Swedish roberta_kubhist2_pipeline pipeline RoBertaEmbeddings from ChangeIsKey +author: John Snow Labs +name: roberta_kubhist2_pipeline +date: 2024-09-21 +tags: [sv, open_source, pipeline, onnx] +task: Embeddings +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_kubhist2_pipeline` is a Swedish model originally trained by ChangeIsKey. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_kubhist2_pipeline_sv_5.5.0_3.0_1726882548133.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_kubhist2_pipeline_sv_5.5.0_3.0_1726882548133.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_kubhist2_pipeline", lang = "sv") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_kubhist2_pipeline", lang = "sv") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_kubhist2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sv| +|Size:|289.0 MB| + +## References + +https://huggingface.co/ChangeIsKey/roberta-kubhist2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_kubhist2_sv.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_kubhist2_sv.md new file mode 100644 index 00000000000000..dfb6d3a1bba0c8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_kubhist2_sv.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Swedish roberta_kubhist2 RoBertaEmbeddings from ChangeIsKey +author: John Snow Labs +name: roberta_kubhist2 +date: 2024-09-21 +tags: [sv, open_source, onnx, embeddings, roberta] +task: Embeddings +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_kubhist2` is a Swedish model originally trained by ChangeIsKey. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_kubhist2_sv_5.5.0_3.0_1726882534683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_kubhist2_sv_5.5.0_3.0_1726882534683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("roberta_kubhist2","sv") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("roberta_kubhist2","sv")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
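
One way to look at the resulting vectors is to unpack the annotation structs in the `embeddings` column. A minimal sketch in Python, assuming the `pipelineDF` from the example above:

```python
# Each annotation in "embeddings" carries the token text (result) and its vector.
pipelineDF.selectExpr("explode(embeddings) as embedding") \
    .selectExpr("embedding.result as token", "embedding.embeddings as vector") \
    .show(5, truncate=80)
```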
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_kubhist2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|sv| +|Size:|289.0 MB| + +## References + +https://huggingface.co/ChangeIsKey/roberta-kubhist2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_large_bne_finetunedemoevent_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_bne_finetunedemoevent_en.md new file mode 100644 index 00000000000000..1010a410c2089c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_bne_finetunedemoevent_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_bne_finetunedemoevent RoBertaForSequenceClassification from joancipria +author: John Snow Labs +name: roberta_large_bne_finetunedemoevent +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_bne_finetunedemoevent` is a English model originally trained by joancipria. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_bne_finetunedemoevent_en_5.5.0_3.0_1726900018210.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_bne_finetunedemoevent_en_5.5.0_3.0_1726900018210.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_bne_finetunedemoevent","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_bne_finetunedemoevent", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
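
The predicted label, and the per-class scores kept in the annotation metadata, can be read back from the `class` column. A minimal sketch in Python, assuming the `pipelineDF` from the example above:

```python
# "class.result" holds the predicted label; "class.metadata" holds the class scores.
pipelineDF.select("text", "class.result", "class.metadata").show(truncate=False)
```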
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_bne_finetunedemoevent| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/joancipria/roberta-large-bne-FineTunedEmoEvent \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_large_bne_finetunedemoevent_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_bne_finetunedemoevent_pipeline_en.md new file mode 100644 index 00000000000000..67e0db58636d76 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_bne_finetunedemoevent_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_bne_finetunedemoevent_pipeline pipeline RoBertaForSequenceClassification from joancipria +author: John Snow Labs +name: roberta_large_bne_finetunedemoevent_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_bne_finetunedemoevent_pipeline` is a English model originally trained by joancipria. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_bne_finetunedemoevent_pipeline_en_5.5.0_3.0_1726900092510.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_bne_finetunedemoevent_pipeline_en_5.5.0_3.0_1726900092510.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_bne_finetunedemoevent_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_bne_finetunedemoevent_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_bne_finetunedemoevent_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/joancipria/roberta-large-bne-FineTunedEmoEvent + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_chunking_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_chunking_en.md new file mode 100644 index 00000000000000..95efbd6226a208 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_chunking_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_finetuned_chunking BertForTokenClassification from mariolinml +author: John Snow Labs +name: roberta_large_finetuned_chunking +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_chunking` is a English model originally trained by mariolinml. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_chunking_en_5.5.0_3.0_1726889549745.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_chunking_en_5.5.0_3.0_1726889549745.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("roberta_large_finetuned_chunking","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("roberta_large_finetuned_chunking", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_chunking| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/mariolinml/roberta-large-finetuned-chunking \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_chunking_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_chunking_pipeline_en.md new file mode 100644 index 00000000000000..f592baadcc022c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_chunking_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_finetuned_chunking_pipeline pipeline BertForTokenClassification from mariolinml +author: John Snow Labs +name: roberta_large_finetuned_chunking_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_chunking_pipeline` is a English model originally trained by mariolinml. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_chunking_pipeline_en_5.5.0_3.0_1726889568755.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_chunking_pipeline_en_5.5.0_3.0_1726889568755.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_finetuned_chunking_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_finetuned_chunking_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_chunking_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/mariolinml/roberta-large-finetuned-chunking + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_disaster_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_disaster_en.md new file mode 100644 index 00000000000000..62af4b2d448f76 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_disaster_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_finetuned_disaster RoBertaForSequenceClassification from tiansz +author: John Snow Labs +name: roberta_large_finetuned_disaster +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_disaster` is a English model originally trained by tiansz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_disaster_en_5.5.0_3.0_1726900603445.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_disaster_en_5.5.0_3.0_1726900603445.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_finetuned_disaster","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_finetuned_disaster", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_disaster| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tiansz/roberta-large-finetuned-disaster \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_disaster_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_disaster_pipeline_en.md new file mode 100644 index 00000000000000..c6299e6014a898 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_disaster_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_finetuned_disaster_pipeline pipeline RoBertaForSequenceClassification from tiansz +author: John Snow Labs +name: roberta_large_finetuned_disaster_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_disaster_pipeline` is a English model originally trained by tiansz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_disaster_pipeline_en_5.5.0_3.0_1726900670587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_disaster_pipeline_en_5.5.0_3.0_1726900670587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_finetuned_disaster_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_finetuned_disaster_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_disaster_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tiansz/roberta-large-finetuned-disaster + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_m_express_emo_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_m_express_emo_en.md new file mode 100644 index 00000000000000..24104cca7e7a78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_m_express_emo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_finetuned_m_express_emo RoBertaForSequenceClassification from Gregorig +author: John Snow Labs +name: roberta_large_finetuned_m_express_emo +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_m_express_emo` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_m_express_emo_en_5.5.0_3.0_1726900421013.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_m_express_emo_en_5.5.0_3.0_1726900421013.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_finetuned_m_express_emo","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_finetuned_m_express_emo", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_m_express_emo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Gregorig/roberta-large-finetuned-m_express_emo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_m_express_emo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_m_express_emo_pipeline_en.md new file mode 100644 index 00000000000000..35d94cde50e205 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_m_express_emo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_finetuned_m_express_emo_pipeline pipeline RoBertaForSequenceClassification from Gregorig +author: John Snow Labs +name: roberta_large_finetuned_m_express_emo_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_m_express_emo_pipeline` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_m_express_emo_pipeline_en_5.5.0_3.0_1726900497664.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_m_express_emo_pipeline_en_5.5.0_3.0_1726900497664.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_finetuned_m_express_emo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_finetuned_m_express_emo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_m_express_emo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Gregorig/roberta-large-finetuned-m_express_emo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_large_fp_sick_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_fp_sick_en.md new file mode 100644 index 00000000000000..e518f971b06f95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_fp_sick_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_fp_sick RoBertaForSequenceClassification from varun-v-rao +author: John Snow Labs +name: roberta_large_fp_sick +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_fp_sick` is a English model originally trained by varun-v-rao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_fp_sick_en_5.5.0_3.0_1726940734034.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_fp_sick_en_5.5.0_3.0_1726940734034.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_fp_sick","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_fp_sick", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_fp_sick| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/varun-v-rao/roberta-large-fp-sick \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_large_fp_sick_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_fp_sick_pipeline_en.md new file mode 100644 index 00000000000000..891488b72a41c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_fp_sick_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_fp_sick_pipeline pipeline RoBertaForSequenceClassification from varun-v-rao +author: John Snow Labs +name: roberta_large_fp_sick_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_fp_sick_pipeline` is a English model originally trained by varun-v-rao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_fp_sick_pipeline_en_5.5.0_3.0_1726940814436.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_fp_sick_pipeline_en_5.5.0_3.0_1726940814436.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_fp_sick_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_fp_sick_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_fp_sick_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/varun-v-rao/roberta-large-fp-sick + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456_en.md new file mode 100644 index 00000000000000..2c543d6d8fe86f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456 RoBertaForTokenClassification from yokesh456 +author: John Snow Labs +name: roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456 +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456` is a English model originally trained by yokesh456. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456_en_5.5.0_3.0_1726926826311.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456_en_5.5.0_3.0_1726926826311.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/yokesh456/RoBERTa-large-PM-M3-Voc-hf-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456_pipeline_en.md new file mode 100644 index 00000000000000..be746f6869ca26 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456_pipeline pipeline RoBertaForTokenClassification from yokesh456 +author: John Snow Labs +name: roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456_pipeline` is a English model originally trained by yokesh456. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456_pipeline_en_5.5.0_3.0_1726926903454.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456_pipeline_en_5.5.0_3.0_1726926903454.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/yokesh456/RoBERTa-large-PM-M3-Voc-hf-finetuned-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_large_tweet_topic_multi_2020_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_tweet_topic_multi_2020_en.md new file mode 100644 index 00000000000000..b840a672133704 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_tweet_topic_multi_2020_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_tweet_topic_multi_2020 RoBertaForSequenceClassification from cardiffnlp +author: John Snow Labs +name: roberta_large_tweet_topic_multi_2020 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_tweet_topic_multi_2020` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_tweet_topic_multi_2020_en_5.5.0_3.0_1726940968585.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_tweet_topic_multi_2020_en_5.5.0_3.0_1726940968585.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_tweet_topic_multi_2020","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_tweet_topic_multi_2020", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_tweet_topic_multi_2020| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/cardiffnlp/roberta-large-tweet-topic-multi-2020 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_large_tweet_topic_multi_2020_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_tweet_topic_multi_2020_pipeline_en.md new file mode 100644 index 00000000000000..6ef70f3e3a8267 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_tweet_topic_multi_2020_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_tweet_topic_multi_2020_pipeline pipeline RoBertaForSequenceClassification from cardiffnlp +author: John Snow Labs +name: roberta_large_tweet_topic_multi_2020_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_tweet_topic_multi_2020_pipeline` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_tweet_topic_multi_2020_pipeline_en_5.5.0_3.0_1726941042196.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_tweet_topic_multi_2020_pipeline_en_5.5.0_3.0_1726941042196.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_tweet_topic_multi_2020_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_tweet_topic_multi_2020_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_tweet_topic_multi_2020_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/cardiffnlp/roberta-large-tweet-topic-multi-2020 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_legal_experiment_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_legal_experiment_en.md new file mode 100644 index 00000000000000..0d2cfe05b533d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_legal_experiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_legal_experiment RoBertaForSequenceClassification from rkotcher +author: John Snow Labs +name: roberta_legal_experiment +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_legal_experiment` is a English model originally trained by rkotcher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_legal_experiment_en_5.5.0_3.0_1726900838710.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_legal_experiment_en_5.5.0_3.0_1726900838710.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_legal_experiment","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_legal_experiment", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_legal_experiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|422.1 MB| + +## References + +https://huggingface.co/rkotcher/roberta_legal_experiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_legal_experiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_legal_experiment_pipeline_en.md new file mode 100644 index 00000000000000..d425bf2341270d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_legal_experiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_legal_experiment_pipeline pipeline RoBertaForSequenceClassification from rkotcher +author: John Snow Labs +name: roberta_legal_experiment_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_legal_experiment_pipeline` is a English model originally trained by rkotcher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_legal_experiment_pipeline_en_5.5.0_3.0_1726900874078.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_legal_experiment_pipeline_en_5.5.0_3.0_1726900874078.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_legal_experiment_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_legal_experiment_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_legal_experiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|422.1 MB| + +## References + +https://huggingface.co/rkotcher/roberta_legal_experiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_news_cnn_dailymail_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_news_cnn_dailymail_en.md new file mode 100644 index 00000000000000..f56d652c79d774 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_news_cnn_dailymail_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_news_cnn_dailymail RoBertaEmbeddings from isarth +author: John Snow Labs +name: roberta_news_cnn_dailymail +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_news_cnn_dailymail` is a English model originally trained by isarth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_news_cnn_dailymail_en_5.5.0_3.0_1726943452670.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_news_cnn_dailymail_en_5.5.0_3.0_1726943452670.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_news_cnn_dailymail","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_news_cnn_dailymail","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_news_cnn_dailymail| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/isarth/roberta-news-cnn_dailymail \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_news_cnn_dailymail_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_news_cnn_dailymail_pipeline_en.md new file mode 100644 index 00000000000000..d8118bd9dfa950 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_news_cnn_dailymail_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_news_cnn_dailymail_pipeline pipeline RoBertaEmbeddings from isarth +author: John Snow Labs +name: roberta_news_cnn_dailymail_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_news_cnn_dailymail_pipeline` is a English model originally trained by isarth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_news_cnn_dailymail_pipeline_en_5.5.0_3.0_1726943474597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_news_cnn_dailymail_pipeline_en_5.5.0_3.0_1726943474597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_news_cnn_dailymail_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_news_cnn_dailymail_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_news_cnn_dailymail_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/isarth/roberta-news-cnn_dailymail + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_retrained_250k_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_retrained_250k_en.md new file mode 100644 index 00000000000000..f0bceac3c712a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_retrained_250k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_retrained_250k RoBertaEmbeddings from bitsanlp +author: John Snow Labs +name: roberta_retrained_250k +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_retrained_250k` is a English model originally trained by bitsanlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_retrained_250k_en_5.5.0_3.0_1726934064605.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_retrained_250k_en_5.5.0_3.0_1726934064605.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_retrained_250k","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_retrained_250k","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_retrained_250k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.8 MB| + +## References + +https://huggingface.co/bitsanlp/roberta-retrained-250k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_retrained_250k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_retrained_250k_pipeline_en.md new file mode 100644 index 00000000000000..d043c5c14a5fab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_retrained_250k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_retrained_250k_pipeline pipeline RoBertaEmbeddings from bitsanlp +author: John Snow Labs +name: roberta_retrained_250k_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_retrained_250k_pipeline` is a English model originally trained by bitsanlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_retrained_250k_pipeline_en_5.5.0_3.0_1726934085300.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_retrained_250k_pipeline_en_5.5.0_3.0_1726934085300.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_retrained_250k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_retrained_250k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_retrained_250k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.8 MB| + +## References + +https://huggingface.co/bitsanlp/roberta-retrained-250k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_retrained_kunalr63_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_retrained_kunalr63_en.md new file mode 100644 index 00000000000000..f28be9ffda0e3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_retrained_kunalr63_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_retrained_kunalr63 RoBertaEmbeddings from kunalr63 +author: John Snow Labs +name: roberta_retrained_kunalr63 +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_retrained_kunalr63` is a English model originally trained by kunalr63. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_retrained_kunalr63_en_5.5.0_3.0_1726882347322.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_retrained_kunalr63_en_5.5.0_3.0_1726882347322.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_retrained_kunalr63","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_retrained_kunalr63","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_retrained_kunalr63| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/kunalr63/roberta-retrained \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_retrained_kunalr63_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_retrained_kunalr63_pipeline_en.md new file mode 100644 index 00000000000000..f38f3b6399ea27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_retrained_kunalr63_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_retrained_kunalr63_pipeline pipeline RoBertaEmbeddings from kunalr63 +author: John Snow Labs +name: roberta_retrained_kunalr63_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_retrained_kunalr63_pipeline` is a English model originally trained by kunalr63. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_retrained_kunalr63_pipeline_en_5.5.0_3.0_1726882369436.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_retrained_kunalr63_pipeline_en_5.5.0_3.0_1726882369436.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_retrained_kunalr63_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_retrained_kunalr63_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_retrained_kunalr63_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/kunalr63/roberta-retrained + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_tagalog_base_ft_udpos213_top5lang_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_tagalog_base_ft_udpos213_top5lang_en.md new file mode 100644 index 00000000000000..48d82c2f1dafa7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_tagalog_base_ft_udpos213_top5lang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_tagalog_base_ft_udpos213_top5lang RoBertaForTokenClassification from katrinatan +author: John Snow Labs +name: roberta_tagalog_base_ft_udpos213_top5lang +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tagalog_base_ft_udpos213_top5lang` is a English model originally trained by katrinatan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_top5lang_en_5.5.0_3.0_1726927034010.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_top5lang_en_5.5.0_3.0_1726927034010.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_tagalog_base_ft_udpos213_top5lang","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_tagalog_base_ft_udpos213_top5lang", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```

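
To read the output as token/tag pairs, the parallel `token` and `ner` annotation columns of `pipelineDF` can be selected side by side. A minimal sketch, assuming the example above was run unchanged:

```python
from pyspark.sql import functions as F

# Each row carries parallel arrays of tokens and predicted tags
pipelineDF.select(
    F.col("token.result").alias("tokens"),
    F.col("ner.result").alias("predicted_tags")
).show(truncate = False)
```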
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tagalog_base_ft_udpos213_top5lang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/katrinatan/roberta-tagalog-base-ft-udpos213-top5lang \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_tagalog_base_ft_udpos213_top5lang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_tagalog_base_ft_udpos213_top5lang_pipeline_en.md new file mode 100644 index 00000000000000..a9fc1e93cf1ab9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_tagalog_base_ft_udpos213_top5lang_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_tagalog_base_ft_udpos213_top5lang_pipeline pipeline RoBertaForTokenClassification from katrinatan +author: John Snow Labs +name: roberta_tagalog_base_ft_udpos213_top5lang_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tagalog_base_ft_udpos213_top5lang_pipeline` is a English model originally trained by katrinatan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_top5lang_pipeline_en_5.5.0_3.0_1726927051947.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_top5lang_pipeline_en_5.5.0_3.0_1726927051947.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_tagalog_base_ft_udpos213_top5lang_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_tagalog_base_ft_udpos213_top5lang_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tagalog_base_ft_udpos213_top5lang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/katrinatan/roberta-tagalog-base-ft-udpos213-top5lang + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_tagalog_base_ft_udpos213_vietnamese_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_tagalog_base_ft_udpos213_vietnamese_en.md new file mode 100644 index 00000000000000..fd802811090f33 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_tagalog_base_ft_udpos213_vietnamese_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_tagalog_base_ft_udpos213_vietnamese RoBertaForTokenClassification from hellojimson +author: John Snow Labs +name: roberta_tagalog_base_ft_udpos213_vietnamese +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tagalog_base_ft_udpos213_vietnamese` is a English model originally trained by hellojimson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_vietnamese_en_5.5.0_3.0_1726926747818.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_vietnamese_en_5.5.0_3.0_1726926747818.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_tagalog_base_ft_udpos213_vietnamese","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_tagalog_base_ft_udpos213_vietnamese", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tagalog_base_ft_udpos213_vietnamese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/hellojimson/roberta-tagalog-base-ft-udpos213-vi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_tagalog_base_ft_udpos213_vietnamese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_tagalog_base_ft_udpos213_vietnamese_pipeline_en.md new file mode 100644 index 00000000000000..840792ff958615 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_tagalog_base_ft_udpos213_vietnamese_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_tagalog_base_ft_udpos213_vietnamese_pipeline pipeline RoBertaForTokenClassification from hellojimson +author: John Snow Labs +name: roberta_tagalog_base_ft_udpos213_vietnamese_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tagalog_base_ft_udpos213_vietnamese_pipeline` is a English model originally trained by hellojimson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_vietnamese_pipeline_en_5.5.0_3.0_1726926765626.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_vietnamese_pipeline_en_5.5.0_3.0_1726926765626.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_tagalog_base_ft_udpos213_vietnamese_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_tagalog_base_ft_udpos213_vietnamese_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tagalog_base_ft_udpos213_vietnamese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/hellojimson/roberta-tagalog-base-ft-udpos213-vi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_v2_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_v2_en.md new file mode 100644 index 00000000000000..913411eb008f9f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_v2 RoBertaForTokenClassification from token-classifier +author: John Snow Labs +name: roberta_v2 +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_v2` is a English model originally trained by token-classifier. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_v2_en_5.5.0_3.0_1726926649407.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_v2_en_5.5.0_3.0_1726926649407.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_v2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_v2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/token-classifier/roBERTa-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_v2_pipeline_en.md new file mode 100644 index 00000000000000..cbac4a2f22c086 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_v2_pipeline pipeline RoBertaForTokenClassification from token-classifier +author: John Snow Labs +name: roberta_v2_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_v2_pipeline` is a English model originally trained by token-classifier. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_v2_pipeline_en_5.5.0_3.0_1726926724927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_v2_pipeline_en_5.5.0_3.0_1726926724927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/token-classifier/roBERTa-v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-robertabaseallenai_ppt_occitan_en.md b/docs/_posts/ahmedlone127/2024-09-21-robertabaseallenai_ppt_occitan_en.md new file mode 100644 index 00000000000000..17f15ae77dd2bf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-robertabaseallenai_ppt_occitan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robertabaseallenai_ppt_occitan RoBertaEmbeddings from mehrshadk +author: John Snow Labs +name: robertabaseallenai_ppt_occitan +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertabaseallenai_ppt_occitan` is a English model originally trained by mehrshadk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertabaseallenai_ppt_occitan_en_5.5.0_3.0_1726882217317.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertabaseallenai_ppt_occitan_en_5.5.0_3.0_1726882217317.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("robertabaseallenai_ppt_occitan","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("robertabaseallenai_ppt_occitan","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertabaseallenai_ppt_occitan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/mehrshadk/robertaBaseAllenAI_ppt_OC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-robertabaseallenai_ppt_occitan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-robertabaseallenai_ppt_occitan_pipeline_en.md new file mode 100644 index 00000000000000..0e4b9c0e27d6fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-robertabaseallenai_ppt_occitan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robertabaseallenai_ppt_occitan_pipeline pipeline RoBertaEmbeddings from mehrshadk +author: John Snow Labs +name: robertabaseallenai_ppt_occitan_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertabaseallenai_ppt_occitan_pipeline` is a English model originally trained by mehrshadk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertabaseallenai_ppt_occitan_pipeline_en_5.5.0_3.0_1726882239465.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertabaseallenai_ppt_occitan_pipeline_en_5.5.0_3.0_1726882239465.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robertabaseallenai_ppt_occitan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robertabaseallenai_ppt_occitan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertabaseallenai_ppt_occitan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/mehrshadk/robertaBaseAllenAI_ppt_OC + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-robit_roberta_base_italian_aida_upm_en.md b/docs/_posts/ahmedlone127/2024-09-21-robit_roberta_base_italian_aida_upm_en.md new file mode 100644 index 00000000000000..d5de39062f7e53 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-robit_roberta_base_italian_aida_upm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robit_roberta_base_italian_aida_upm RoBertaEmbeddings from AIDA-UPM +author: John Snow Labs +name: robit_roberta_base_italian_aida_upm +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robit_roberta_base_italian_aida_upm` is a English model originally trained by AIDA-UPM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robit_roberta_base_italian_aida_upm_en_5.5.0_3.0_1726882369761.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robit_roberta_base_italian_aida_upm_en_5.5.0_3.0_1726882369761.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("robit_roberta_base_italian_aida_upm","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("robit_roberta_base_italian_aida_upm","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robit_roberta_base_italian_aida_upm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.7 MB| + +## References + +https://huggingface.co/AIDA-UPM/robit-roberta-base-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-robit_roberta_base_italian_aida_upm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-robit_roberta_base_italian_aida_upm_pipeline_en.md new file mode 100644 index 00000000000000..1f3b01d192a27f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-robit_roberta_base_italian_aida_upm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robit_roberta_base_italian_aida_upm_pipeline pipeline RoBertaEmbeddings from AIDA-UPM +author: John Snow Labs +name: robit_roberta_base_italian_aida_upm_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robit_roberta_base_italian_aida_upm_pipeline` is a English model originally trained by AIDA-UPM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robit_roberta_base_italian_aida_upm_pipeline_en_5.5.0_3.0_1726882398728.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robit_roberta_base_italian_aida_upm_pipeline_en_5.5.0_3.0_1726882398728.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robit_roberta_base_italian_aida_upm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robit_roberta_base_italian_aida_upm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robit_roberta_base_italian_aida_upm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.7 MB| + +## References + +https://huggingface.co/AIDA-UPM/robit-roberta-base-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-rubert_tiny2_cedr_russian_emotion_pipeline_ru.md b/docs/_posts/ahmedlone127/2024-09-21-rubert_tiny2_cedr_russian_emotion_pipeline_ru.md new file mode 100644 index 00000000000000..621dcdfb3f75ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-rubert_tiny2_cedr_russian_emotion_pipeline_ru.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Russian rubert_tiny2_cedr_russian_emotion_pipeline pipeline BertForSequenceClassification from seara +author: John Snow Labs +name: rubert_tiny2_cedr_russian_emotion_pipeline +date: 2024-09-21 +tags: [ru, open_source, pipeline, onnx] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_tiny2_cedr_russian_emotion_pipeline` is a Russian model originally trained by seara. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_tiny2_cedr_russian_emotion_pipeline_ru_5.5.0_3.0_1726955077704.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_tiny2_cedr_russian_emotion_pipeline_ru_5.5.0_3.0_1726955077704.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rubert_tiny2_cedr_russian_emotion_pipeline", lang = "ru") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rubert_tiny2_cedr_russian_emotion_pipeline", lang = "ru") +val annotations = pipeline.transform(df) + +``` +
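
For ad-hoc scoring of a single string, `PretrainedPipeline.annotate` avoids building a DataFrame. A hedged sketch; the Russian sample sentence ("Today is a great day!") and the `class` output key are assumptions about this particular pipeline.

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("rubert_tiny2_cedr_russian_emotion_pipeline", lang = "ru")

# annotate() returns a dict mapping output-column names to lists of results
result = pipeline.annotate("Сегодня отличный день!")
print(result.get("class"))  # predicted emotion label(s), assuming a "class" output column
```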
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_tiny2_cedr_russian_emotion_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ru| +|Size:|109.5 MB| + +## References + +https://huggingface.co/seara/rubert-tiny2-cedr-russian-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-rubert_tiny2_cedr_russian_emotion_ru.md b/docs/_posts/ahmedlone127/2024-09-21-rubert_tiny2_cedr_russian_emotion_ru.md new file mode 100644 index 00000000000000..b7cca0898ef6fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-rubert_tiny2_cedr_russian_emotion_ru.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Russian rubert_tiny2_cedr_russian_emotion BertForSequenceClassification from seara +author: John Snow Labs +name: rubert_tiny2_cedr_russian_emotion +date: 2024-09-21 +tags: [ru, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_tiny2_cedr_russian_emotion` is a Russian model originally trained by seara. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_tiny2_cedr_russian_emotion_ru_5.5.0_3.0_1726955071913.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_tiny2_cedr_russian_emotion_ru_5.5.0_3.0_1726955071913.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("rubert_tiny2_cedr_russian_emotion","ru") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("rubert_tiny2_cedr_russian_emotion", "ru")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```

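
The predicted emotion label ends up in the `class` column of `pipelineDF`. A minimal sketch for reading it back, assuming the example above was run unchanged:

```python
from pyspark.sql import functions as F

pipelineDF.select(
    F.col("text"),
    F.col("class.result").alias("predicted_emotion")
).show(truncate = False)
```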
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_tiny2_cedr_russian_emotion| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|ru| +|Size:|109.5 MB| + +## References + +https://huggingface.co/seara/rubert-tiny2-cedr-russian-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-saved_model_en.md b/docs/_posts/ahmedlone127/2024-09-21-saved_model_en.md new file mode 100644 index 00000000000000..6028522afbe64b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-saved_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English saved_model DistilBertForSequenceClassification from hanyp +author: John Snow Labs +name: saved_model +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`saved_model` is a English model originally trained by hanyp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/saved_model_en_5.5.0_3.0_1726888953733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/saved_model_en_5.5.0_3.0_1726888953733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("saved_model","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("saved_model", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|saved_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hanyp/saved_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-saved_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-saved_model_pipeline_en.md new file mode 100644 index 00000000000000..f7474c5a37acc9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-saved_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English saved_model_pipeline pipeline DistilBertForSequenceClassification from hanyp +author: John Snow Labs +name: saved_model_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`saved_model_pipeline` is a English model originally trained by hanyp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/saved_model_pipeline_en_5.5.0_3.0_1726888965838.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/saved_model_pipeline_en_5.5.0_3.0_1726888965838.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("saved_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("saved_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|saved_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hanyp/saved_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-scibert_ner_drugname_en.md b/docs/_posts/ahmedlone127/2024-09-21-scibert_ner_drugname_en.md new file mode 100644 index 00000000000000..080dc8669fedaa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-scibert_ner_drugname_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English scibert_ner_drugname BertForTokenClassification from duytu +author: John Snow Labs +name: scibert_ner_drugname +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scibert_ner_drugname` is a English model originally trained by duytu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scibert_ner_drugname_en_5.5.0_3.0_1726889563283.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scibert_ner_drugname_en_5.5.0_3.0_1726889563283.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("scibert_ner_drugname","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("scibert_ner_drugname", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```

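
Token-level predictions can be grouped into drug-name chunks with a `NerConverter` stage. This is a sketch that assumes the model emits IOB-style labels and that `pipelineDF` from the example above is available.

```python
from sparknlp.annotator import NerConverter

converter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

# Each chunk result is a contiguous span predicted as a drug name
converter.transform(pipelineDF) \
    .selectExpr("explode(ner_chunk.result) as drug_name") \
    .show(truncate = False)
```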
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scibert_ner_drugname| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|409.9 MB| + +## References + +https://huggingface.co/duytu/scibert_ner_drugname \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-scibert_ner_drugname_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-scibert_ner_drugname_pipeline_en.md new file mode 100644 index 00000000000000..3a9789e3b10360 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-scibert_ner_drugname_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English scibert_ner_drugname_pipeline pipeline BertForTokenClassification from duytu +author: John Snow Labs +name: scibert_ner_drugname_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scibert_ner_drugname_pipeline` is a English model originally trained by duytu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scibert_ner_drugname_pipeline_en_5.5.0_3.0_1726889582104.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scibert_ner_drugname_pipeline_en_5.5.0_3.0_1726889582104.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("scibert_ner_drugname_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("scibert_ner_drugname_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scibert_ner_drugname_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/duytu/scibert_ner_drugname + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_aethiqs_base_bertje_data_rotterdam_epochs_30_epoch_30_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_aethiqs_base_bertje_data_rotterdam_epochs_30_epoch_30_en.md new file mode 100644 index 00000000000000..b27980c7fc3f7c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_aethiqs_base_bertje_data_rotterdam_epochs_30_epoch_30_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_aethiqs_base_bertje_data_rotterdam_epochs_30_epoch_30 BertSentenceEmbeddings from AethiQs-Max +author: John Snow Labs +name: sent_aethiqs_base_bertje_data_rotterdam_epochs_30_epoch_30 +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_aethiqs_base_bertje_data_rotterdam_epochs_30_epoch_30` is a English model originally trained by AethiQs-Max. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_aethiqs_base_bertje_data_rotterdam_epochs_30_epoch_30_en_5.5.0_3.0_1726914062554.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_aethiqs_base_bertje_data_rotterdam_epochs_30_epoch_30_en_5.5.0_3.0_1726914062554.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_aethiqs_base_bertje_data_rotterdam_epochs_30_epoch_30","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_aethiqs_base_bertje_data_rotterdam_epochs_30_epoch_30","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
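Once `pipelineDF` has been computed as above, the sentence vectors are nested inside the `embeddings` output column. A small sketch for unpacking them, assuming the standard annotation layout produced by `BertSentenceEmbeddings` (one annotation per detected sentence, with the float vector in its `embeddings` field):

```python
# One row per detected sentence: the original text in "result", the vector in "embeddings"
# (typically 768 floats for a BERT-base checkpoint).
vectors = pipelineDF.selectExpr("explode(embeddings) as sent") \
    .selectExpr("sent.result as sentence", "sent.embeddings as vector")

vectors.show(truncate=60)
```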
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_aethiqs_base_bertje_data_rotterdam_epochs_30_epoch_30| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/AethiQs-Max/aethiqs-base_bertje-data_rotterdam-epochs_30-epoch_30 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_aethiqs_base_bertje_data_rotterdam_epochs_30_epoch_30_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_aethiqs_base_bertje_data_rotterdam_epochs_30_epoch_30_pipeline_en.md new file mode 100644 index 00000000000000..bc1c51f9de5d71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_aethiqs_base_bertje_data_rotterdam_epochs_30_epoch_30_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_aethiqs_base_bertje_data_rotterdam_epochs_30_epoch_30_pipeline pipeline BertSentenceEmbeddings from AethiQs-Max +author: John Snow Labs +name: sent_aethiqs_base_bertje_data_rotterdam_epochs_30_epoch_30_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_aethiqs_base_bertje_data_rotterdam_epochs_30_epoch_30_pipeline` is a English model originally trained by AethiQs-Max. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_aethiqs_base_bertje_data_rotterdam_epochs_30_epoch_30_pipeline_en_5.5.0_3.0_1726914081062.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_aethiqs_base_bertje_data_rotterdam_epochs_30_epoch_30_pipeline_en_5.5.0_3.0_1726914081062.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# Example input: any DataFrame with a string column named "text" works here.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_aethiqs_base_bertje_data_rotterdam_epochs_30_epoch_30_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input: any DataFrame with a string column named "text" works here.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_aethiqs_base_bertje_data_rotterdam_epochs_30_epoch_30_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_aethiqs_base_bertje_data_rotterdam_epochs_30_epoch_30_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.4 MB| + +## References + +https://huggingface.co/AethiQs-Max/aethiqs-base_bertje-data_rotterdam-epochs_30-epoch_30 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v1_ar.md b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v1_ar.md new file mode 100644 index 00000000000000..be6e576b63bea5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v1_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic sent_arabertmo_base_v1 BertSentenceEmbeddings from Ebtihal +author: John Snow Labs +name: sent_arabertmo_base_v1 +date: 2024-09-21 +tags: [ar, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_arabertmo_base_v1` is a Arabic model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v1_ar_5.5.0_3.0_1726913685204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v1_ar_5.5.0_3.0_1726913685204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_arabertmo_base_v1","ar") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_arabertmo_base_v1","ar") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_arabertmo_base_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ar| +|Size:|407.8 MB| + +## References + +https://huggingface.co/Ebtihal/AraBertMo_base_V1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v1_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v1_pipeline_ar.md new file mode 100644 index 00000000000000..0c7d4d6041c8bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v1_pipeline_ar.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Arabic sent_arabertmo_base_v1_pipeline pipeline BertSentenceEmbeddings from Ebtihal +author: John Snow Labs +name: sent_arabertmo_base_v1_pipeline +date: 2024-09-21 +tags: [ar, open_source, pipeline, onnx] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_arabertmo_base_v1_pipeline` is a Arabic model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v1_pipeline_ar_5.5.0_3.0_1726913704077.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v1_pipeline_ar_5.5.0_3.0_1726913704077.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# Example input: any DataFrame with a string column named "text" works here.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_arabertmo_base_v1_pipeline", lang = "ar")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input: any DataFrame with a string column named "text" works here.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_arabertmo_base_v1_pipeline", lang = "ar")
val annotations = pipeline.transform(df)

```
</div>
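For quick checks on individual strings, the same pretrained pipeline can be used without building a DataFrame first. A minimal sketch assuming the `annotate` convenience method of `PretrainedPipeline` (the Arabic sample sentence is only a placeholder):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("sent_arabertmo_base_v1_pipeline", lang = "ar")

# annotate() maps each output column name to its list of results for a single string.
result = pipeline.annotate("أحب معالجة اللغات الطبيعية")  # placeholder Arabic text
print(result.keys())
```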
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_arabertmo_base_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|408.4 MB| + +## References + +https://huggingface.co/Ebtihal/AraBertMo_base_V1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v2_ar.md b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v2_ar.md new file mode 100644 index 00000000000000..a6f791f66f665f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v2_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic sent_arabertmo_base_v2 BertSentenceEmbeddings from Ebtihal +author: John Snow Labs +name: sent_arabertmo_base_v2 +date: 2024-09-21 +tags: [ar, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_arabertmo_base_v2` is a Arabic model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v2_ar_5.5.0_3.0_1726941884078.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v2_ar_5.5.0_3.0_1726941884078.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_arabertmo_base_v2","ar") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_arabertmo_base_v2","ar") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_arabertmo_base_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ar| +|Size:|407.9 MB| + +## References + +https://huggingface.co/Ebtihal/AraBertMo_base_V2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v2_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v2_pipeline_ar.md new file mode 100644 index 00000000000000..f20f7e8fa2cb91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v2_pipeline_ar.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Arabic sent_arabertmo_base_v2_pipeline pipeline BertSentenceEmbeddings from Ebtihal +author: John Snow Labs +name: sent_arabertmo_base_v2_pipeline +date: 2024-09-21 +tags: [ar, open_source, pipeline, onnx] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_arabertmo_base_v2_pipeline` is a Arabic model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v2_pipeline_ar_5.5.0_3.0_1726941903084.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v2_pipeline_ar_5.5.0_3.0_1726941903084.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# Example input: any DataFrame with a string column named "text" works here.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_arabertmo_base_v2_pipeline", lang = "ar")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input: any DataFrame with a string column named "text" works here.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_arabertmo_base_v2_pipeline", lang = "ar")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_arabertmo_base_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|408.4 MB| + +## References + +https://huggingface.co/Ebtihal/AraBertMo_base_V2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v4_ar.md b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v4_ar.md new file mode 100644 index 00000000000000..2efff000864798 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v4_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic sent_arabertmo_base_v4 BertSentenceEmbeddings from Ebtihal +author: John Snow Labs +name: sent_arabertmo_base_v4 +date: 2024-09-21 +tags: [ar, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_arabertmo_base_v4` is a Arabic model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v4_ar_5.5.0_3.0_1726913753700.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v4_ar_5.5.0_3.0_1726913753700.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_arabertmo_base_v4","ar") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_arabertmo_base_v4","ar") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
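Because the fitted `pipelineModel` above is a regular Spark ML `PipelineModel`, it can be saved once and reloaded later without re-downloading the pretrained stages. A hedged sketch (the output path is purely illustrative):

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline to an illustrative local path.
pipelineModel.write().overwrite().save("/tmp/sent_arabertmo_base_v4_pipeline")

# Reload it in a later session and apply it to the same (or new) data.
restored = PipelineModel.load("/tmp/sent_arabertmo_base_v4_pipeline")
restored.transform(data).select("embeddings").show(truncate=False)
```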
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_arabertmo_base_v4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ar| +|Size:|407.9 MB| + +## References + +https://huggingface.co/Ebtihal/AraBertMo_base_V4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v4_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v4_pipeline_ar.md new file mode 100644 index 00000000000000..b65ceb2ef5ff4d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v4_pipeline_ar.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Arabic sent_arabertmo_base_v4_pipeline pipeline BertSentenceEmbeddings from Ebtihal +author: John Snow Labs +name: sent_arabertmo_base_v4_pipeline +date: 2024-09-21 +tags: [ar, open_source, pipeline, onnx] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_arabertmo_base_v4_pipeline` is a Arabic model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v4_pipeline_ar_5.5.0_3.0_1726913772927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v4_pipeline_ar_5.5.0_3.0_1726913772927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# Example input: any DataFrame with a string column named "text" works here.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_arabertmo_base_v4_pipeline", lang = "ar")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input: any DataFrame with a string column named "text" works here.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_arabertmo_base_v4_pipeline", lang = "ar")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_arabertmo_base_v4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|408.4 MB| + +## References + +https://huggingface.co/Ebtihal/AraBertMo_base_V4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v7_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v7_en.md new file mode 100644 index 00000000000000..412ba59ac74799 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v7_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_arabertmo_base_v7 BertSentenceEmbeddings from Ebtihal +author: John Snow Labs +name: sent_arabertmo_base_v7 +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_arabertmo_base_v7` is a English model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v7_en_5.5.0_3.0_1726913831502.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v7_en_5.5.0_3.0_1726913831502.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_arabertmo_base_v7","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_arabertmo_base_v7","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
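Sentence embeddings are usually consumed by comparing vectors, for example with cosine similarity. A hedged sketch that reuses the `pipelineModel` fitted above on two placeholder sentences and compares the first sentence vector of each row:

```python
import numpy as np

pairs = spark.createDataFrame([["I love spark-nlp"], ["Spark NLP is great"]]).toDF("text")
rows = pipelineModel.transform(pairs) \
    .selectExpr("embeddings[0].embeddings as vector") \
    .collect()

v1, v2 = np.array(rows[0].vector), np.array(rows[1].vector)
similarity = float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
print(f"cosine similarity: {similarity:.4f}")
```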
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_arabertmo_base_v7| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.9 MB| + +## References + +https://huggingface.co/Ebtihal/AraBertMo_base_V7 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v7_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v7_pipeline_en.md new file mode 100644 index 00000000000000..ca18a4380deab2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v7_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_arabertmo_base_v7_pipeline pipeline BertSentenceEmbeddings from Ebtihal +author: John Snow Labs +name: sent_arabertmo_base_v7_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_arabertmo_base_v7_pipeline` is a English model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v7_pipeline_en_5.5.0_3.0_1726913851033.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v7_pipeline_en_5.5.0_3.0_1726913851033.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# Example input: any DataFrame with a string column named "text" works here.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_arabertmo_base_v7_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input: any DataFrame with a string column named "text" works here.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_arabertmo_base_v7_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_arabertmo_base_v7_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.4 MB| + +## References + +https://huggingface.co/Ebtihal/AraBertMo_base_V7 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_aristoberto_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_aristoberto_en.md new file mode 100644 index 00000000000000..e7659207be3399 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_aristoberto_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_aristoberto BertSentenceEmbeddings from Jacobo +author: John Snow Labs +name: sent_aristoberto +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_aristoberto` is a English model originally trained by Jacobo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_aristoberto_en_5.5.0_3.0_1726941679489.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_aristoberto_en_5.5.0_3.0_1726941679489.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_aristoberto","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_aristoberto","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_aristoberto| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|420.1 MB| + +## References + +https://huggingface.co/Jacobo/aristoBERTo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_aristoberto_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_aristoberto_pipeline_en.md new file mode 100644 index 00000000000000..6675c94790a25e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_aristoberto_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_aristoberto_pipeline pipeline BertSentenceEmbeddings from Jacobo +author: John Snow Labs +name: sent_aristoberto_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_aristoberto_pipeline` is a English model originally trained by Jacobo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_aristoberto_pipeline_en_5.5.0_3.0_1726941699256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_aristoberto_pipeline_en_5.5.0_3.0_1726941699256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# Example input: any DataFrame with a string column named "text" works here.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_aristoberto_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input: any DataFrame with a string column named "text" works here.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_aristoberto_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_aristoberto_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|420.6 MB| + +## References + +https://huggingface.co/Jacobo/aristoBERTo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_french_spanish_cased_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_french_spanish_cased_en.md new file mode 100644 index 00000000000000..59cf4d4bffb29d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_french_spanish_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_french_spanish_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_french_spanish_cased +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_french_spanish_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_french_spanish_cased_en_5.5.0_3.0_1726941760423.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_french_spanish_cased_en_5.5.0_3.0_1726941760423.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_french_spanish_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_french_spanish_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
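The model name (`bert-base-en-fr-es-cased`) indicates English, French and Spanish coverage, so the same fitted pipeline should be able to embed text in any of those languages; only the input rows change. A hedged sketch reusing `pipelineModel` from above (the sample sentences are placeholders):

```python
# Placeholder sentences in the three languages the checkpoint name advertises (en/fr/es).
multilingual_data = spark.createDataFrame([
    ["I love spark-nlp"],
    ["J'adore le traitement automatique des langues"],
    ["Me encanta el procesamiento del lenguaje natural"],
]).toDF("text")

pipelineModel.transform(multilingual_data) \
    .selectExpr("explode(embeddings) as sent") \
    .selectExpr("sent.result as sentence", "sent.embeddings as vector") \
    .show(truncate=60)
```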
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_french_spanish_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|433.1 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-fr-es-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_french_spanish_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_french_spanish_cased_pipeline_en.md new file mode 100644 index 00000000000000..9a28024847d2f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_french_spanish_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_french_spanish_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_french_spanish_cased_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_french_spanish_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_french_spanish_cased_pipeline_en_5.5.0_3.0_1726941780454.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_french_spanish_cased_pipeline_en_5.5.0_3.0_1726941780454.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# Example input: any DataFrame with a string column named "text" works here.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_english_french_spanish_cased_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input: any DataFrame with a string column named "text" works here.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_english_french_spanish_cased_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_french_spanish_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|433.7 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-fr-es-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_greek_modern_russian_cased_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_greek_modern_russian_cased_en.md new file mode 100644 index 00000000000000..7dab268918d828 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_greek_modern_russian_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_greek_modern_russian_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_greek_modern_russian_cased +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_greek_modern_russian_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_greek_modern_russian_cased_en_5.5.0_3.0_1726941359004.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_greek_modern_russian_cased_en_5.5.0_3.0_1726941359004.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_greek_modern_russian_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_greek_modern_russian_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_greek_modern_russian_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|433.2 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-el-ru-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_greek_modern_russian_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_greek_modern_russian_cased_pipeline_en.md new file mode 100644 index 00000000000000..1d0c4a4fa5642d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_greek_modern_russian_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_greek_modern_russian_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_greek_modern_russian_cased_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_greek_modern_russian_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_greek_modern_russian_cased_pipeline_en_5.5.0_3.0_1726941379148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_greek_modern_russian_cased_pipeline_en_5.5.0_3.0_1726941379148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# Example input: any DataFrame with a string column named "text" works here.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_english_greek_modern_russian_cased_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input: any DataFrame with a string column named "text" works here.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_english_greek_modern_russian_cased_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_greek_modern_russian_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|433.7 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-el-ru-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_hindi_cased_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_hindi_cased_en.md new file mode 100644 index 00000000000000..a4a40a17463d69 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_hindi_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_hindi_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_hindi_cased +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_hindi_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_hindi_cased_en_5.5.0_3.0_1726931761231.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_hindi_cased_en_5.5.0_3.0_1726931761231.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_hindi_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_hindi_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_hindi_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-hi-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_hindi_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_hindi_cased_pipeline_en.md new file mode 100644 index 00000000000000..4d4c7d0725e174 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_hindi_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_hindi_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_hindi_cased_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_hindi_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_hindi_cased_pipeline_en_5.5.0_3.0_1726931782459.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_hindi_cased_pipeline_en_5.5.0_3.0_1726931782459.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# Example input: any DataFrame with a string column named "text" works here.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_english_hindi_cased_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input: any DataFrame with a string column named "text" works here.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_english_hindi_cased_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_hindi_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.5 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-hi-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_spanish_italian_cased_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_spanish_italian_cased_en.md new file mode 100644 index 00000000000000..9bbcf5890d12cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_spanish_italian_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_spanish_italian_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_spanish_italian_cased +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_spanish_italian_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_spanish_italian_cased_en_5.5.0_3.0_1726941501400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_spanish_italian_cased_en_5.5.0_3.0_1726941501400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_spanish_italian_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_spanish_italian_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_spanish_italian_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|431.6 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-es-it-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_spanish_italian_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_spanish_italian_cased_pipeline_en.md new file mode 100644 index 00000000000000..138dd2acfd07ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_spanish_italian_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_spanish_italian_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_spanish_italian_cased_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_spanish_italian_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_spanish_italian_cased_pipeline_en_5.5.0_3.0_1726941521969.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_spanish_italian_cased_pipeline_en_5.5.0_3.0_1726941521969.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# Example input: any DataFrame with a string column named "text" works here.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_english_spanish_italian_cased_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input: any DataFrame with a string column named "text" works here.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_english_spanish_italian_cased_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_spanish_italian_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|432.1 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-es-it-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_italian_cased_osiria_it.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_italian_cased_osiria_it.md new file mode 100644 index 00000000000000..602c47dd63b3c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_italian_cased_osiria_it.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Italian sent_bert_base_italian_cased_osiria BertSentenceEmbeddings from osiria +author: John Snow Labs +name: sent_bert_base_italian_cased_osiria +date: 2024-09-21 +tags: [it, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_italian_cased_osiria` is a Italian model originally trained by osiria. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_italian_cased_osiria_it_5.5.0_3.0_1726898297759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_italian_cased_osiria_it_5.5.0_3.0_1726898297759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_italian_cased_osiria","it") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_italian_cased_osiria","it") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_italian_cased_osiria| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|it| +|Size:|409.0 MB| + +## References + +https://huggingface.co/osiria/bert-base-italian-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_italian_cased_osiria_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_italian_cased_osiria_pipeline_it.md new file mode 100644 index 00000000000000..8b28866da63fd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_italian_cased_osiria_pipeline_it.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Italian sent_bert_base_italian_cased_osiria_pipeline pipeline BertSentenceEmbeddings from osiria +author: John Snow Labs +name: sent_bert_base_italian_cased_osiria_pipeline +date: 2024-09-21 +tags: [it, open_source, pipeline, onnx] +task: Embeddings +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_italian_cased_osiria_pipeline` is a Italian model originally trained by osiria. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_italian_cased_osiria_pipeline_it_5.5.0_3.0_1726898316037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_italian_cased_osiria_pipeline_it_5.5.0_3.0_1726898316037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("sent_bert_base_italian_cased_osiria_pipeline", lang = "it")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("sent_bert_base_italian_cased_osiria_pipeline", lang = "it")
+val annotations = pipeline.transform(df)
+
+```
+</div>
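+
+For a quick check outside of a DataFrame, `PretrainedPipeline` also exposes a lightweight `annotate` helper that runs on a plain string on the driver. A small sketch, assuming the pipeline above has been downloaded successfully:
+
+```python
+# Returns a dict keyed by the pipeline's output columns.
+light_result = pipeline.annotate("Amo Spark NLP")
+print(light_result.keys())
+```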
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_italian_cased_osiria_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|409.5 MB| + +## References + +https://huggingface.co/osiria/bert-base-italian-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_multilingual_cased_finetuned_igbo_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_multilingual_cased_finetuned_igbo_pipeline_xx.md new file mode 100644 index 00000000000000..4266b73b79bdcb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_multilingual_cased_finetuned_igbo_pipeline_xx.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Multilingual sent_bert_base_multilingual_cased_finetuned_igbo_pipeline pipeline BertSentenceEmbeddings from Davlan +author: John Snow Labs +name: sent_bert_base_multilingual_cased_finetuned_igbo_pipeline +date: 2024-09-21 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_multilingual_cased_finetuned_igbo_pipeline` is a Multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_igbo_pipeline_xx_5.5.0_3.0_1726898402847.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_igbo_pipeline_xx_5.5.0_3.0_1726898402847.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("sent_bert_base_multilingual_cased_finetuned_igbo_pipeline", lang = "xx")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("sent_bert_base_multilingual_cased_finetuned_igbo_pipeline", lang = "xx")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_multilingual_cased_finetuned_igbo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.6 MB| + +## References + +https://huggingface.co/Davlan/bert-base-multilingual-cased-finetuned-igbo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_multilingual_cased_finetuned_igbo_xx.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_multilingual_cased_finetuned_igbo_xx.md new file mode 100644 index 00000000000000..5e886926f7480a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_multilingual_cased_finetuned_igbo_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual sent_bert_base_multilingual_cased_finetuned_igbo BertSentenceEmbeddings from Davlan +author: John Snow Labs +name: sent_bert_base_multilingual_cased_finetuned_igbo +date: 2024-09-21 +tags: [xx, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_multilingual_cased_finetuned_igbo` is a Multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_igbo_xx_5.5.0_3.0_1726898372527.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_igbo_xx_5.5.0_3.0_1726898372527.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_multilingual_cased_finetuned_igbo", "xx") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_multilingual_cased_finetuned_igbo", "xx")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_multilingual_cased_finetuned_igbo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|xx| +|Size:|665.0 MB| + +## References + +https://huggingface.co/Davlan/bert-base-multilingual-cased-finetuned-igbo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_spanish_wwm_cased_finetuned_tweets_es.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_spanish_wwm_cased_finetuned_tweets_es.md new file mode 100644 index 00000000000000..18715288ec612e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_spanish_wwm_cased_finetuned_tweets_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish sent_bert_base_spanish_wwm_cased_finetuned_tweets BertSentenceEmbeddings from mariav +author: John Snow Labs +name: sent_bert_base_spanish_wwm_cased_finetuned_tweets +date: 2024-09-21 +tags: [es, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_spanish_wwm_cased_finetuned_tweets` is a Castilian, Spanish model originally trained by mariav. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_spanish_wwm_cased_finetuned_tweets_es_5.5.0_3.0_1726941870072.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_spanish_wwm_cased_finetuned_tweets_es_5.5.0_3.0_1726941870072.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_spanish_wwm_cased_finetuned_tweets", "es") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_spanish_wwm_cased_finetuned_tweets", "es")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_spanish_wwm_cased_finetuned_tweets| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|es| +|Size:|409.4 MB| + +## References + +https://huggingface.co/mariav/bert-base-spanish-wwm-cased-finetuned-tweets \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_spanish_wwm_cased_finetuned_tweets_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_spanish_wwm_cased_finetuned_tweets_pipeline_es.md new file mode 100644 index 00000000000000..1aad616864dd14 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_spanish_wwm_cased_finetuned_tweets_pipeline_es.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Castilian, Spanish sent_bert_base_spanish_wwm_cased_finetuned_tweets_pipeline pipeline BertSentenceEmbeddings from mariav +author: John Snow Labs +name: sent_bert_base_spanish_wwm_cased_finetuned_tweets_pipeline +date: 2024-09-21 +tags: [es, open_source, pipeline, onnx] +task: Embeddings +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_spanish_wwm_cased_finetuned_tweets_pipeline` is a Castilian, Spanish model originally trained by mariav. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_spanish_wwm_cased_finetuned_tweets_pipeline_es_5.5.0_3.0_1726941889129.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_spanish_wwm_cased_finetuned_tweets_pipeline_es_5.5.0_3.0_1726941889129.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("sent_bert_base_spanish_wwm_cased_finetuned_tweets_pipeline", lang = "es")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("sent_bert_base_spanish_wwm_cased_finetuned_tweets_pipeline", lang = "es")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_spanish_wwm_cased_finetuned_tweets_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|410.0 MB| + +## References + +https://huggingface.co/mariav/bert-base-spanish-wwm-cased-finetuned-tweets + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_copy_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_copy_en.md new file mode 100644 index 00000000000000..1403973f5689e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_copy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_copy BertSentenceEmbeddings from osanseviero +author: John Snow Labs +name: sent_bert_base_uncased_copy +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_copy` is a English model originally trained by osanseviero. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_copy_en_5.5.0_3.0_1726941900598.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_copy_en_5.5.0_3.0_1726941900598.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_copy", "en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_copy", "en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_copy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/osanseviero/bert-base-uncased-copy \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_copy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_copy_pipeline_en.md new file mode 100644 index 00000000000000..e7891dfe394c61 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_copy_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_copy_pipeline pipeline BertSentenceEmbeddings from osanseviero +author: John Snow Labs +name: sent_bert_base_uncased_copy_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_copy_pipeline` is a English model originally trained by osanseviero. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_copy_pipeline_en_5.5.0_3.0_1726941920470.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_copy_pipeline_en_5.5.0_3.0_1726941920470.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("sent_bert_base_uncased_copy_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("sent_bert_base_uncased_copy_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_copy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/osanseviero/bert-base-uncased-copy + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_finetuned_wallisian_manual_10ep_lower_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_finetuned_wallisian_manual_10ep_lower_en.md new file mode 100644 index 00000000000000..231eaea661a4fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_finetuned_wallisian_manual_10ep_lower_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_wallisian_manual_10ep_lower BertSentenceEmbeddings from btamm12 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_wallisian_manual_10ep_lower +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_wallisian_manual_10ep_lower` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_10ep_lower_en_5.5.0_3.0_1726941522945.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_10ep_lower_en_5.5.0_3.0_1726941522945.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wallisian_manual_10ep_lower", "en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wallisian_manual_10ep_lower", "en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_wallisian_manual_10ep_lower| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/btamm12/bert-base-uncased-finetuned-wls-manual-10ep-lower \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_finetuned_wallisian_manual_10ep_lower_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_finetuned_wallisian_manual_10ep_lower_pipeline_en.md new file mode 100644 index 00000000000000..67dd2ec59d86fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_finetuned_wallisian_manual_10ep_lower_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_wallisian_manual_10ep_lower_pipeline pipeline BertSentenceEmbeddings from btamm12 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_wallisian_manual_10ep_lower_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_wallisian_manual_10ep_lower_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_10ep_lower_pipeline_en_5.5.0_3.0_1726941542654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_10ep_lower_pipeline_en_5.5.0_3.0_1726941542654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_wallisian_manual_10ep_lower_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_wallisian_manual_10ep_lower_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_wallisian_manual_10ep_lower_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/btamm12/bert-base-uncased-finetuned-wls-manual-10ep-lower + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_finetuned_wallisian_manual_3ep_lower_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_finetuned_wallisian_manual_3ep_lower_en.md new file mode 100644 index 00000000000000..71bcb81d59f735 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_finetuned_wallisian_manual_3ep_lower_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_wallisian_manual_3ep_lower BertSentenceEmbeddings from btamm12 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_wallisian_manual_3ep_lower +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_wallisian_manual_3ep_lower` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_3ep_lower_en_5.5.0_3.0_1726914260503.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_3ep_lower_en_5.5.0_3.0_1726914260503.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wallisian_manual_3ep_lower", "en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wallisian_manual_3ep_lower", "en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_wallisian_manual_3ep_lower| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/btamm12/bert-base-uncased-finetuned-wls-manual-3ep-lower \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_finetuned_wallisian_manual_3ep_lower_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_finetuned_wallisian_manual_3ep_lower_pipeline_en.md new file mode 100644 index 00000000000000..76e54761969b6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_finetuned_wallisian_manual_3ep_lower_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_wallisian_manual_3ep_lower_pipeline pipeline BertSentenceEmbeddings from btamm12 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_wallisian_manual_3ep_lower_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_wallisian_manual_3ep_lower_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_3ep_lower_pipeline_en_5.5.0_3.0_1726914278351.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_3ep_lower_pipeline_en_5.5.0_3.0_1726914278351.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_wallisian_manual_3ep_lower_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_wallisian_manual_3ep_lower_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_wallisian_manual_3ep_lower_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/btamm12/bert-base-uncased-finetuned-wls-manual-3ep-lower + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_issues_128_lijingxin_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_issues_128_lijingxin_en.md new file mode 100644 index 00000000000000..59d1df9c7b46f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_issues_128_lijingxin_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_lijingxin BertSentenceEmbeddings from lijingxin +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_lijingxin +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_lijingxin` is a English model originally trained by lijingxin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_lijingxin_en_5.5.0_3.0_1726941432089.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_lijingxin_en_5.5.0_3.0_1726941432089.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_lijingxin", "en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_lijingxin", "en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_lijingxin| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/lijingxin/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_issues_128_lijingxin_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_issues_128_lijingxin_pipeline_en.md new file mode 100644 index 00000000000000..45cbdeee945934 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_issues_128_lijingxin_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_lijingxin_pipeline pipeline BertSentenceEmbeddings from lijingxin +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_lijingxin_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_lijingxin_pipeline` is a English model originally trained by lijingxin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_lijingxin_pipeline_en_5.5.0_3.0_1726941450852.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_lijingxin_pipeline_en_5.5.0_3.0_1726941450852.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_lijingxin_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_lijingxin_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_lijingxin_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/lijingxin/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_vn_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_vn_en.md new file mode 100644 index 00000000000000..2d6d036214be21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_vn_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_vn BertSentenceEmbeddings from NlpHUST +author: John Snow Labs +name: sent_bert_base_vn +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_vn` is a English model originally trained by NlpHUST. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_vn_en_5.5.0_3.0_1726931737145.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_vn_en_5.5.0_3.0_1726931737145.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_vn", "en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_vn", "en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_vn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|498.8 MB| + +## References + +https://huggingface.co/NlpHUST/bert-base-vn \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_vn_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_vn_pipeline_en.md new file mode 100644 index 00000000000000..739f824d2b09de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_vn_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_vn_pipeline pipeline BertSentenceEmbeddings from NlpHUST +author: John Snow Labs +name: sent_bert_base_vn_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_vn_pipeline` is a English model originally trained by NlpHUST. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_vn_pipeline_en_5.5.0_3.0_1726931759911.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_vn_pipeline_en_5.5.0_3.0_1726931759911.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("sent_bert_base_vn_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("sent_bert_base_vn_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_vn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|499.3 MB| + +## References + +https://huggingface.co/NlpHUST/bert-base-vn + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_e_base_mlm_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_e_base_mlm_en.md new file mode 100644 index 00000000000000..75724c5d160b05 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_e_base_mlm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_e_base_mlm BertSentenceEmbeddings from nasa-impact +author: John Snow Labs +name: sent_bert_e_base_mlm +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_e_base_mlm` is a English model originally trained by nasa-impact. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_e_base_mlm_en_5.5.0_3.0_1726913950367.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_e_base_mlm_en_5.5.0_3.0_1726913950367.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_e_base_mlm", "en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_e_base_mlm", "en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_e_base_mlm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|409.9 MB| + +## References + +https://huggingface.co/nasa-impact/bert-e-base-mlm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_e_base_mlm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_e_base_mlm_pipeline_en.md new file mode 100644 index 00000000000000..dec7616b9511d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_e_base_mlm_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_e_base_mlm_pipeline pipeline BertSentenceEmbeddings from nasa-impact +author: John Snow Labs +name: sent_bert_e_base_mlm_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_e_base_mlm_pipeline` is a English model originally trained by nasa-impact. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_e_base_mlm_pipeline_en_5.5.0_3.0_1726913969047.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_e_base_mlm_pipeline_en_5.5.0_3.0_1726913969047.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("sent_bert_e_base_mlm_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("sent_bert_e_base_mlm_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_e_base_mlm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|410.4 MB| + +## References + +https://huggingface.co/nasa-impact/bert-e-base-mlm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_large_contrastive_self_supervised_acl2020_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_large_contrastive_self_supervised_acl2020_en.md new file mode 100644 index 00000000000000..95d983ee954f31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_large_contrastive_self_supervised_acl2020_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_large_contrastive_self_supervised_acl2020 BertSentenceEmbeddings from sap-ai-research +author: John Snow Labs +name: sent_bert_large_contrastive_self_supervised_acl2020 +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_contrastive_self_supervised_acl2020` is a English model originally trained by sap-ai-research. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_contrastive_self_supervised_acl2020_en_5.5.0_3.0_1726931442096.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_contrastive_self_supervised_acl2020_en_5.5.0_3.0_1726931442096.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_contrastive_self_supervised_acl2020", "en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_contrastive_self_supervised_acl2020", "en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_contrastive_self_supervised_acl2020| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/sap-ai-research/BERT-Large-Contrastive-Self-Supervised-ACL2020 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_large_contrastive_self_supervised_acl2020_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_large_contrastive_self_supervised_acl2020_pipeline_en.md new file mode 100644 index 00000000000000..c042341977d776 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_large_contrastive_self_supervised_acl2020_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_large_contrastive_self_supervised_acl2020_pipeline pipeline BertSentenceEmbeddings from sap-ai-research +author: John Snow Labs +name: sent_bert_large_contrastive_self_supervised_acl2020_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_contrastive_self_supervised_acl2020_pipeline` is a English model originally trained by sap-ai-research. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_contrastive_self_supervised_acl2020_pipeline_en_5.5.0_3.0_1726931498893.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_contrastive_self_supervised_acl2020_pipeline_en_5.5.0_3.0_1726931498893.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("sent_bert_large_contrastive_self_supervised_acl2020_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("sent_bert_large_contrastive_self_supervised_acl2020_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_contrastive_self_supervised_acl2020_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/sap-ai-research/BERT-Large-Contrastive-Self-Supervised-ACL2020 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_mlm_armas_inga_estrella_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_mlm_armas_inga_estrella_en.md new file mode 100644 index 00000000000000..05a15f0c18ef7b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_mlm_armas_inga_estrella_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_mlm_armas_inga_estrella BertSentenceEmbeddings from JFernandoGRE +author: John Snow Labs +name: sent_bert_mlm_armas_inga_estrella +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_mlm_armas_inga_estrella` is a English model originally trained by JFernandoGRE. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_mlm_armas_inga_estrella_en_5.5.0_3.0_1726913862288.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_mlm_armas_inga_estrella_en_5.5.0_3.0_1726913862288.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_mlm_armas_inga_estrella","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_mlm_armas_inga_estrella","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
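+
+To check what the pipeline actually produced, you can unpack the annotation structs in the `embeddings` column. The sketch below uses only standard Spark SQL functions and the column names from the example above.
+
+```python
+from pyspark.sql import functions as F
+
+# One row per detected sentence: the sentence text and its embedding dimensionality
+pipelineDF.select(F.explode("embeddings").alias("emb")) \
+    .select(F.col("emb.result").alias("sentence"),
+            F.size("emb.embeddings").alias("dimensions")) \
+    .show(truncate=False)
+```
+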
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_mlm_armas_inga_estrella| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|409.7 MB| + +## References + +https://huggingface.co/JFernandoGRE/bert_mlm_ARMAS_INGA_ESTRELLA \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_mlm_armas_inga_estrella_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_mlm_armas_inga_estrella_pipeline_en.md new file mode 100644 index 00000000000000..2a6aef20f57869 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_mlm_armas_inga_estrella_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_mlm_armas_inga_estrella_pipeline pipeline BertSentenceEmbeddings from JFernandoGRE +author: John Snow Labs +name: sent_bert_mlm_armas_inga_estrella_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_mlm_armas_inga_estrella_pipeline` is a English model originally trained by JFernandoGRE. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_mlm_armas_inga_estrella_pipeline_en_5.5.0_3.0_1726913881158.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_mlm_armas_inga_estrella_pipeline_en_5.5.0_3.0_1726913881158.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_mlm_armas_inga_estrella_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_mlm_armas_inga_estrella_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_mlm_armas_inga_estrella_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|410.2 MB| + +## References + +https://huggingface.co/JFernandoGRE/bert_mlm_ARMAS_INGA_ESTRELLA + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bertimbau_large_fine_tuned_sindhi_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bertimbau_large_fine_tuned_sindhi_en.md new file mode 100644 index 00000000000000..7d056d5b05cdf6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bertimbau_large_fine_tuned_sindhi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bertimbau_large_fine_tuned_sindhi BertSentenceEmbeddings from AVSilva +author: John Snow Labs +name: sent_bertimbau_large_fine_tuned_sindhi +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertimbau_large_fine_tuned_sindhi` is a English model originally trained by AVSilva. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertimbau_large_fine_tuned_sindhi_en_5.5.0_3.0_1726913695133.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertimbau_large_fine_tuned_sindhi_en_5.5.0_3.0_1726913695133.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bertimbau_large_fine_tuned_sindhi","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bertimbau_large_fine_tuned_sindhi","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
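+
+Once fitted, the pipeline can be persisted and reloaded like any other Spark ML `PipelineModel`, which avoids re-downloading the embeddings on every run. The path below is a placeholder; point it at any writable location.
+
+```python
+from pyspark.ml import PipelineModel
+
+# Save the fitted pipeline (placeholder path), then restore and reuse it
+pipelineModel.write().overwrite().save("/tmp/sent_bertimbau_large_fine_tuned_sindhi_pipeline")
+restored = PipelineModel.load("/tmp/sent_bertimbau_large_fine_tuned_sindhi_pipeline")
+restored.transform(data).select("embeddings").show(truncate=60)
+```
+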
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertimbau_large_fine_tuned_sindhi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/AVSilva/bertimbau-large-fine-tuned-sd \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bertimbau_large_fine_tuned_sindhi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bertimbau_large_fine_tuned_sindhi_pipeline_en.md new file mode 100644 index 00000000000000..0d1f561199303f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bertimbau_large_fine_tuned_sindhi_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bertimbau_large_fine_tuned_sindhi_pipeline pipeline BertSentenceEmbeddings from AVSilva +author: John Snow Labs +name: sent_bertimbau_large_fine_tuned_sindhi_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertimbau_large_fine_tuned_sindhi_pipeline` is a English model originally trained by AVSilva. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertimbau_large_fine_tuned_sindhi_pipeline_en_5.5.0_3.0_1726913757007.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertimbau_large_fine_tuned_sindhi_pipeline_en_5.5.0_3.0_1726913757007.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bertimbau_large_fine_tuned_sindhi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bertimbau_large_fine_tuned_sindhi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertimbau_large_fine_tuned_sindhi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/AVSilva/bertimbau-large-fine-tuned-sd + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_biomedical_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_biomedical_en.md new file mode 100644 index 00000000000000..a5e9f79a5b7f9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_biomedical_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_biomedical BertSentenceEmbeddings from ajitrajasekharan +author: John Snow Labs +name: sent_biomedical +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_biomedical` is a English model originally trained by ajitrajasekharan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_biomedical_en_5.5.0_3.0_1726931483563.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_biomedical_en_5.5.0_3.0_1726931483563.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_biomedical","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_biomedical","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
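+
+If the vectors are meant to feed a downstream Spark ML estimator (clustering, classification, similarity search), an `EmbeddingsFinisher` stage can convert the annotation structs into plain vector columns. This is a sketch assuming the `EmbeddingsFinisher` transformer shipped with recent Spark NLP releases; adjust column names to your own pipeline.
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+# Turn the embeddings annotations into Spark ML vectors
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["sentence_vector"]) \
+    .setOutputAsVector(True)
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings, finisher])
+result = pipeline.fit(data).transform(data)
+result.select("sentence_vector").show(truncate=60)
+```
+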
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_biomedical| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/ajitrajasekharan/biomedical \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_biomedical_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_biomedical_pipeline_en.md new file mode 100644 index 00000000000000..91023f32c7fbda --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_biomedical_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_biomedical_pipeline pipeline BertSentenceEmbeddings from ajitrajasekharan +author: John Snow Labs +name: sent_biomedical_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_biomedical_pipeline` is a English model originally trained by ajitrajasekharan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_biomedical_pipeline_en_5.5.0_3.0_1726931540826.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_biomedical_pipeline_en_5.5.0_3.0_1726931540826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_biomedical_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_biomedical_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
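+
+For a quick interactive check without building a DataFrame, the pretrained pipeline also exposes an `annotate` helper that takes a single string. The example sentence and the printed keys below are illustrative; the exact output columns depend on the stages bundled in the pipeline.
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("sent_biomedical_pipeline", lang="en")
+result = pipeline.annotate("The patient was prescribed metformin for type 2 diabetes.")
+print(list(result.keys()))  # inspect which output columns the pipeline produces
+```
+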
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_biomedical_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/ajitrajasekharan/biomedical + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_defsent_bert_base_uncased_mean_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_defsent_bert_base_uncased_mean_en.md new file mode 100644 index 00000000000000..2d73c9f996c0a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_defsent_bert_base_uncased_mean_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_defsent_bert_base_uncased_mean BertSentenceEmbeddings from cl-nagoya +author: John Snow Labs +name: sent_defsent_bert_base_uncased_mean +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_defsent_bert_base_uncased_mean` is a English model originally trained by cl-nagoya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_defsent_bert_base_uncased_mean_en_5.5.0_3.0_1726941302233.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_defsent_bert_base_uncased_mean_en_5.5.0_3.0_1726941302233.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_defsent_bert_base_uncased_mean","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_defsent_bert_base_uncased_mean","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
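+
+Throughput and memory use can be tuned on the embeddings stage itself. The setters below are standard `BertSentenceEmbeddings` parameters in Spark NLP, though the chosen values are only illustrative; since this model is uncased, case sensitivity is switched off.
+
+```python
+# Illustrative tuning of the embeddings stage from the example above
+embeddings = BertSentenceEmbeddings.pretrained("sent_defsent_bert_base_uncased_mean", "en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings") \
+    .setCaseSensitive(False) \
+    .setBatchSize(8) \
+    .setMaxSentenceLength(128)
+```
+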
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_defsent_bert_base_uncased_mean| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/cl-nagoya/defsent-bert-base-uncased-mean \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_defsent_bert_base_uncased_mean_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_defsent_bert_base_uncased_mean_pipeline_en.md new file mode 100644 index 00000000000000..00c35f6fdbe4ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_defsent_bert_base_uncased_mean_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_defsent_bert_base_uncased_mean_pipeline pipeline BertSentenceEmbeddings from cl-nagoya +author: John Snow Labs +name: sent_defsent_bert_base_uncased_mean_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_defsent_bert_base_uncased_mean_pipeline` is a English model originally trained by cl-nagoya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_defsent_bert_base_uncased_mean_pipeline_en_5.5.0_3.0_1726941325000.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_defsent_bert_base_uncased_mean_pipeline_en_5.5.0_3.0_1726941325000.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_defsent_bert_base_uncased_mean_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_defsent_bert_base_uncased_mean_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_defsent_bert_base_uncased_mean_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/cl-nagoya/defsent-bert-base-uncased-mean + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_defsent_bert_large_uncased_cls_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_defsent_bert_large_uncased_cls_en.md new file mode 100644 index 00000000000000..aee330b76ae731 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_defsent_bert_large_uncased_cls_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_defsent_bert_large_uncased_cls BertSentenceEmbeddings from cl-nagoya +author: John Snow Labs +name: sent_defsent_bert_large_uncased_cls +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_defsent_bert_large_uncased_cls` is a English model originally trained by cl-nagoya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_defsent_bert_large_uncased_cls_en_5.5.0_3.0_1726941575535.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_defsent_bert_large_uncased_cls_en_5.5.0_3.0_1726941575535.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_defsent_bert_large_uncased_cls","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_defsent_bert_large_uncased_cls","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
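+
+A quick sanity check after `transform` is to confirm the embedding dimensionality; a BERT-large encoder such as this one is expected to produce 1024-dimensional sentence vectors. A minimal sketch:
+
+```python
+from pyspark.sql import functions as F
+
+# Distinct vector sizes across all detected sentences (expected: a single value)
+pipelineDF.select(F.explode("embeddings").alias("emb")) \
+    .select(F.size("emb.embeddings").alias("dim")) \
+    .distinct() \
+    .show()
+```
+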
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_defsent_bert_large_uncased_cls| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/cl-nagoya/defsent-bert-large-uncased-cls \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_defsent_bert_large_uncased_cls_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_defsent_bert_large_uncased_cls_pipeline_en.md new file mode 100644 index 00000000000000..cae4cd22abf562 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_defsent_bert_large_uncased_cls_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_defsent_bert_large_uncased_cls_pipeline pipeline BertSentenceEmbeddings from cl-nagoya +author: John Snow Labs +name: sent_defsent_bert_large_uncased_cls_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_defsent_bert_large_uncased_cls_pipeline` is a English model originally trained by cl-nagoya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_defsent_bert_large_uncased_cls_pipeline_en_5.5.0_3.0_1726941634187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_defsent_bert_large_uncased_cls_pipeline_en_5.5.0_3.0_1726941634187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_defsent_bert_large_uncased_cls_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_defsent_bert_large_uncased_cls_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_defsent_bert_large_uncased_cls_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/cl-nagoya/defsent-bert-large-uncased-cls + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_distilbertu_base_cased_0_0_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_distilbertu_base_cased_0_0_en.md new file mode 100644 index 00000000000000..e16c54ca605458 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_distilbertu_base_cased_0_0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_distilbertu_base_cased_0_0 BertSentenceEmbeddings from amitness +author: John Snow Labs +name: sent_distilbertu_base_cased_0_0 +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_distilbertu_base_cased_0_0` is a English model originally trained by amitness. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_distilbertu_base_cased_0_0_en_5.5.0_3.0_1726931753590.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_distilbertu_base_cased_0_0_en_5.5.0_3.0_1726931753590.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_distilbertu_base_cased_0_0","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_distilbertu_base_cased_0_0","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
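+
+The fitted model works on arbitrarily many rows, and each row may contain several sentences; every detected sentence receives its own vector. A small sketch with illustrative texts:
+
+```python
+texts = [
+    ["Spark NLP ships hundreds of pretrained embedding models."],
+    ["Each input row can hold several sentences. They are embedded one by one."],
+]
+more_data = spark.createDataFrame(texts).toDF("text")
+
+# Count how many sentence-level vectors each input row produced
+pipelineModel.transform(more_data) \
+    .selectExpr("size(embeddings) as sentences_per_row") \
+    .show()
+```
+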
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_distilbertu_base_cased_0_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|470.4 MB| + +## References + +https://huggingface.co/amitness/distilbertu-base-cased-0.0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_distilbertu_base_cased_0_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_distilbertu_base_cased_0_0_pipeline_en.md new file mode 100644 index 00000000000000..1068a27b7267e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_distilbertu_base_cased_0_0_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_distilbertu_base_cased_0_0_pipeline pipeline BertSentenceEmbeddings from amitness +author: John Snow Labs +name: sent_distilbertu_base_cased_0_0_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_distilbertu_base_cased_0_0_pipeline` is a English model originally trained by amitness. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_distilbertu_base_cased_0_0_pipeline_en_5.5.0_3.0_1726931774954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_distilbertu_base_cased_0_0_pipeline_en_5.5.0_3.0_1726931774954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_distilbertu_base_cased_0_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_distilbertu_base_cased_0_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_distilbertu_base_cased_0_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|471.0 MB| + +## References + +https://huggingface.co/amitness/distilbertu-base-cased-0.0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_hatebertimbau_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-21-sent_hatebertimbau_pipeline_pt.md new file mode 100644 index 00000000000000..126fc17e6f0df6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_hatebertimbau_pipeline_pt.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Portuguese sent_hatebertimbau_pipeline pipeline BertSentenceEmbeddings from knowhate +author: John Snow Labs +name: sent_hatebertimbau_pipeline +date: 2024-09-21 +tags: [pt, open_source, pipeline, onnx] +task: Embeddings +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hatebertimbau_pipeline` is a Portuguese model originally trained by knowhate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hatebertimbau_pipeline_pt_5.5.0_3.0_1726913827278.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hatebertimbau_pipeline_pt_5.5.0_3.0_1726913827278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_hatebertimbau_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_hatebertimbau_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
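+
+Since this pipeline targets Portuguese, the input DataFrame should contain Portuguese text. A minimal, illustrative setup (the sample sentence is arbitrary):
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+df = spark.createDataFrame([["Eu adoro processamento de linguagem natural."]]).toDF("text")
+
+pipeline = PretrainedPipeline("sent_hatebertimbau_pipeline", lang="pt")
+annotations = pipeline.transform(df)
+annotations.printSchema()  # inspect the output columns produced by the pipeline
+```
+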
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hatebertimbau_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|406.1 MB| + +## References + +https://huggingface.co/knowhate/HateBERTimbau + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_hatebertimbau_pt.md b/docs/_posts/ahmedlone127/2024-09-21-sent_hatebertimbau_pt.md new file mode 100644 index 00000000000000..6ac84170a6eb3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_hatebertimbau_pt.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Portuguese sent_hatebertimbau BertSentenceEmbeddings from knowhate +author: John Snow Labs +name: sent_hatebertimbau +date: 2024-09-21 +tags: [pt, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hatebertimbau` is a Portuguese model originally trained by knowhate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hatebertimbau_pt_5.5.0_3.0_1726913808299.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hatebertimbau_pt_5.5.0_3.0_1726913808299.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_hatebertimbau","pt") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_hatebertimbau","pt") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
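+
+The example above embeds an English sentence for consistency with the other model cards, but this encoder was trained on Portuguese. A sketch with a Portuguese input (the text is illustrative only):
+
+```python
+pt_data = spark.createDataFrame([["O modelo gera um vetor para cada sentença detectada."]]).toDF("text")
+
+# One row per detected sentence, with its embedding dimensionality
+pipelineModel.transform(pt_data) \
+    .selectExpr("explode(embeddings) as emb") \
+    .selectExpr("emb.result as sentence", "size(emb.embeddings) as dim") \
+    .show(truncate=False)
+```
+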
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hatebertimbau| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|pt| +|Size:|405.5 MB| + +## References + +https://huggingface.co/knowhate/HateBERTimbau \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_hindi_bert_v1_hi.md b/docs/_posts/ahmedlone127/2024-09-21-sent_hindi_bert_v1_hi.md new file mode 100644 index 00000000000000..0146f37f92571e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_hindi_bert_v1_hi.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Hindi sent_hindi_bert_v1 BertSentenceEmbeddings from l3cube-pune +author: John Snow Labs +name: sent_hindi_bert_v1 +date: 2024-09-21 +tags: [hi, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hindi_bert_v1` is a Hindi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hindi_bert_v1_hi_5.5.0_3.0_1726941623089.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hindi_bert_v1_hi_5.5.0_3.0_1726941623089.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_hindi_bert_v1","hi") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_hindi_bert_v1","hi") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
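+
+As this model is trained for Hindi, Devanagari input is the intended use case. A minimal sketch with an illustrative sentence:
+
+```python
+hi_data = spark.createDataFrame([["मुझे प्राकृतिक भाषा प्रसंस्करण पसंद है।"]]).toDF("text")
+
+# Show each detected sentence alongside its embedding size
+pipelineModel.transform(hi_data) \
+    .selectExpr("explode(embeddings) as emb") \
+    .selectExpr("emb.result as sentence", "size(emb.embeddings) as dim") \
+    .show(truncate=False)
+```
+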
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hindi_bert_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|hi| +|Size:|663.8 MB| + +## References + +https://huggingface.co/l3cube-pune/hindi-bert-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_hindi_bert_v1_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-21-sent_hindi_bert_v1_pipeline_hi.md new file mode 100644 index 00000000000000..8750c34b1e3a77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_hindi_bert_v1_pipeline_hi.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Hindi sent_hindi_bert_v1_pipeline pipeline BertSentenceEmbeddings from l3cube-pune +author: John Snow Labs +name: sent_hindi_bert_v1_pipeline +date: 2024-09-21 +tags: [hi, open_source, pipeline, onnx] +task: Embeddings +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hindi_bert_v1_pipeline` is a Hindi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hindi_bert_v1_pipeline_hi_5.5.0_3.0_1726941654573.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hindi_bert_v1_pipeline_hi_5.5.0_3.0_1726941654573.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_hindi_bert_v1_pipeline", lang = "hi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_hindi_bert_v1_pipeline", lang = "hi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hindi_bert_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|664.4 MB| + +## References + +https://huggingface.co/l3cube-pune/hindi-bert-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_korean_albert_base_v1_ko.md b/docs/_posts/ahmedlone127/2024-09-21-sent_korean_albert_base_v1_ko.md new file mode 100644 index 00000000000000..e832afa2fcea8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_korean_albert_base_v1_ko.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Korean sent_korean_albert_base_v1 BertSentenceEmbeddings from lots-o +author: John Snow Labs +name: sent_korean_albert_base_v1 +date: 2024-09-21 +tags: [ko, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_korean_albert_base_v1` is a Korean model originally trained by lots-o. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_korean_albert_base_v1_ko_5.5.0_3.0_1726941863347.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_korean_albert_base_v1_ko_5.5.0_3.0_1726941863347.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_korean_albert_base_v1","ko") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_korean_albert_base_v1","ko") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
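+
+The encoder targets Korean, so Korean input is the intended use case. A minimal sketch with an illustrative sentence that also peeks at the first few vector values:
+
+```python
+ko_data = spark.createDataFrame([["자연어 처리는 정말 재미있습니다."]]).toDF("text")
+
+# Inspect the first five values of each sentence vector
+pipelineModel.transform(ko_data) \
+    .selectExpr("explode(embeddings) as emb") \
+    .selectExpr("slice(emb.embeddings, 1, 5) as first_values") \
+    .show(truncate=False)
+```
+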
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_korean_albert_base_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ko| +|Size:|47.7 MB| + +## References + +https://huggingface.co/lots-o/ko-albert-base-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_korean_albert_base_v1_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-21-sent_korean_albert_base_v1_pipeline_ko.md new file mode 100644 index 00000000000000..8639f945c12dd3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_korean_albert_base_v1_pipeline_ko.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Korean sent_korean_albert_base_v1_pipeline pipeline BertSentenceEmbeddings from lots-o +author: John Snow Labs +name: sent_korean_albert_base_v1_pipeline +date: 2024-09-21 +tags: [ko, open_source, pipeline, onnx] +task: Embeddings +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_korean_albert_base_v1_pipeline` is a Korean model originally trained by lots-o. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_korean_albert_base_v1_pipeline_ko_5.5.0_3.0_1726941866082.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_korean_albert_base_v1_pipeline_ko_5.5.0_3.0_1726941866082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_korean_albert_base_v1_pipeline", lang = "ko") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_korean_albert_base_v1_pipeline", lang = "ko") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_korean_albert_base_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|48.3 MB| + +## References + +https://huggingface.co/lots-o/ko-albert-base-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_logion_base_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_logion_base_en.md new file mode 100644 index 00000000000000..04a6f87beb8c5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_logion_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_logion_base BertSentenceEmbeddings from cabrooks +author: John Snow Labs +name: sent_logion_base +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_logion_base` is a English model originally trained by cabrooks. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_logion_base_en_5.5.0_3.0_1726941302826.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_logion_base_en_5.5.0_3.0_1726941302826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_logion_base","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_logion_base","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_logion_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|420.8 MB| + +## References + +https://huggingface.co/cabrooks/LOGION-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_logion_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_logion_base_pipeline_en.md new file mode 100644 index 00000000000000..c98cdc34db4e01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_logion_base_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_logion_base_pipeline pipeline BertSentenceEmbeddings from cabrooks +author: John Snow Labs +name: sent_logion_base_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_logion_base_pipeline` is a English model originally trained by cabrooks. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_logion_base_pipeline_en_5.5.0_3.0_1726941325565.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_logion_base_pipeline_en_5.5.0_3.0_1726941325565.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_logion_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_logion_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_logion_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|421.4 MB| + +## References + +https://huggingface.co/cabrooks/LOGION-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_minilmv2_l6_h768_distilled_from_bert_base_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_minilmv2_l6_h768_distilled_from_bert_base_en.md new file mode 100644 index 00000000000000..52159f972b8da9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_minilmv2_l6_h768_distilled_from_bert_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_minilmv2_l6_h768_distilled_from_bert_base BertSentenceEmbeddings from nreimers +author: John Snow Labs +name: sent_minilmv2_l6_h768_distilled_from_bert_base +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_minilmv2_l6_h768_distilled_from_bert_base` is a English model originally trained by nreimers. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_minilmv2_l6_h768_distilled_from_bert_base_en_5.5.0_3.0_1726941672164.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_minilmv2_l6_h768_distilled_from_bert_base_en_5.5.0_3.0_1726941672164.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_minilmv2_l6_h768_distilled_from_bert_base","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_minilmv2_l6_h768_distilled_from_bert_base","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
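+
+A distilled sentence encoder like this one is often used for lightweight semantic similarity. The sketch below collects the vectors for two illustrative sentences and computes their cosine similarity with NumPy; for large datasets you would keep this computation inside Spark rather than collecting to the driver.
+
+```python
+import numpy as np
+from pyspark.sql import functions as F
+
+pairs = spark.createDataFrame([
+    ["A quick brown fox jumps over the lazy dog."],
+    ["A fast brown fox leaps over a lazy dog."],
+]).toDF("text")
+
+# One embedding per sentence; each input row here is a single sentence
+rows = pipelineModel.transform(pairs) \
+    .select(F.explode("embeddings").alias("emb")) \
+    .select("emb.embeddings") \
+    .collect()
+
+a, b = (np.array(r["embeddings"]) for r in rows)
+cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
+print(f"cosine similarity: {cosine:.3f}")
+```
+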
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_minilmv2_l6_h768_distilled_from_bert_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|158.7 MB| + +## References + +https://huggingface.co/nreimers/MiniLMv2-L6-H768-distilled-from-BERT-Base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_minilmv2_l6_h768_distilled_from_bert_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_minilmv2_l6_h768_distilled_from_bert_base_pipeline_en.md new file mode 100644 index 00000000000000..5100e16ef1d36a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_minilmv2_l6_h768_distilled_from_bert_base_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_minilmv2_l6_h768_distilled_from_bert_base_pipeline pipeline BertSentenceEmbeddings from nreimers +author: John Snow Labs +name: sent_minilmv2_l6_h768_distilled_from_bert_base_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_minilmv2_l6_h768_distilled_from_bert_base_pipeline` is a English model originally trained by nreimers. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_minilmv2_l6_h768_distilled_from_bert_base_pipeline_en_5.5.0_3.0_1726941718978.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_minilmv2_l6_h768_distilled_from_bert_base_pipeline_en_5.5.0_3.0_1726941718978.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_minilmv2_l6_h768_distilled_from_bert_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_minilmv2_l6_h768_distilled_from_bert_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
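For completeness, a minimal end-to-end sketch of the snippet above, assuming the session is created with `sparknlp.start()` and that `df` is a DataFrame with a `text` column:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_minilmv2_l6_h768_distilled_from_bert_base_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.printSchema()  # inspect the output columns produced by the pipeline stages
```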
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_minilmv2_l6_h768_distilled_from_bert_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|159.3 MB| + +## References + +https://huggingface.co/nreimers/MiniLMv2-L6-H768-distilled-from-BERT-Base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_opticalbert_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_opticalbert_uncased_en.md new file mode 100644 index 00000000000000..c3f39f8f8b1df5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_opticalbert_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_opticalbert_uncased BertSentenceEmbeddings from opticalmaterials +author: John Snow Labs +name: sent_opticalbert_uncased +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_opticalbert_uncased` is a English model originally trained by opticalmaterials. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_opticalbert_uncased_en_5.5.0_3.0_1726913612953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_opticalbert_uncased_en_5.5.0_3.0_1726913612953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_opticalbert_uncased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_opticalbert_uncased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
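If the sentence vectors are needed as plain Spark ML vectors (for clustering, similarity search, or a downstream classifier), Spark NLP's `EmbeddingsFinisher` can be appended to the pipeline above. A brief sketch, reusing the `pipelineDF` from this card:

```python
from sparknlp.base import EmbeddingsFinisher

finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

finisher.transform(pipelineDF).select("finished_embeddings").show(truncate=80)
```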
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_opticalbert_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/opticalmaterials/opticalbert_uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_opticalbert_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_opticalbert_uncased_pipeline_en.md new file mode 100644 index 00000000000000..ac9e2a3cd65b77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_opticalbert_uncased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_opticalbert_uncased_pipeline pipeline BertSentenceEmbeddings from opticalmaterials +author: John Snow Labs +name: sent_opticalbert_uncased_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_opticalbert_uncased_pipeline` is a English model originally trained by opticalmaterials. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_opticalbert_uncased_pipeline_en_5.5.0_3.0_1726913632176.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_opticalbert_uncased_pipeline_en_5.5.0_3.0_1726913632176.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_opticalbert_uncased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_opticalbert_uncased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_opticalbert_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.6 MB| + +## References + +https://huggingface.co/opticalmaterials/opticalbert_uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_protaugment_lm_hwu64_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_protaugment_lm_hwu64_en.md new file mode 100644 index 00000000000000..0c241125810b96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_protaugment_lm_hwu64_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_protaugment_lm_hwu64 BertSentenceEmbeddings from tdopierre +author: John Snow Labs +name: sent_protaugment_lm_hwu64 +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_protaugment_lm_hwu64` is a English model originally trained by tdopierre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_protaugment_lm_hwu64_en_5.5.0_3.0_1726898733210.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_protaugment_lm_hwu64_en_5.5.0_3.0_1726898733210.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_protaugment_lm_hwu64","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_protaugment_lm_hwu64","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
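As an illustration only, the resulting vectors can be compared directly, for example with a cosine similarity computed in NumPy (an extra dependency assumed here, not required by the model itself); the two example sentences are placeholders:

```python
import numpy as np

pairs = spark.createDataFrame([["book a table for tonight"], ["set an alarm for 7 am"]]).toDF("text")
rows = pipeline.fit(pairs).transform(pairs).selectExpr("embeddings.embeddings[0] as vec").collect()

v1, v2 = np.array(rows[0].vec), np.array(rows[1].vec)
print(float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))))
```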
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_protaugment_lm_hwu64| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|403.5 MB| + +## References + +https://huggingface.co/tdopierre/ProtAugment-LM-HWU64 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_protaugment_lm_hwu64_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_protaugment_lm_hwu64_pipeline_en.md new file mode 100644 index 00000000000000..803b6ba8e8c739 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_protaugment_lm_hwu64_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_protaugment_lm_hwu64_pipeline pipeline BertSentenceEmbeddings from tdopierre +author: John Snow Labs +name: sent_protaugment_lm_hwu64_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_protaugment_lm_hwu64_pipeline` is a English model originally trained by tdopierre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_protaugment_lm_hwu64_pipeline_en_5.5.0_3.0_1726898750811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_protaugment_lm_hwu64_pipeline_en_5.5.0_3.0_1726898750811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_protaugment_lm_hwu64_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_protaugment_lm_hwu64_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_protaugment_lm_hwu64_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.1 MB| + +## References + +https://huggingface.co/tdopierre/ProtAugment-LM-HWU64 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sentiment_roberta_twitter_imdb_10_en.md b/docs/_posts/ahmedlone127/2024-09-21-sentiment_roberta_twitter_imdb_10_en.md new file mode 100644 index 00000000000000..e44f9418480705 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sentiment_roberta_twitter_imdb_10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_roberta_twitter_imdb_10 RoBertaForSequenceClassification from pachequinho +author: John Snow Labs +name: sentiment_roberta_twitter_imdb_10 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_roberta_twitter_imdb_10` is a English model originally trained by pachequinho. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_roberta_twitter_imdb_10_en_5.5.0_3.0_1726900643986.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_roberta_twitter_imdb_10_en_5.5.0_3.0_1726900643986.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_roberta_twitter_imdb_10","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_roberta_twitter_imdb_10", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
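Once the pipeline has run, the predicted label and the per-class scores kept in the annotation metadata can be read back out of `pipelineDF`, for example:

```python
# "class" is the output column set on the classifier above.
pipelineDF.selectExpr("text", "class[0].result as label", "class[0].metadata as scores") \
    .show(truncate=False)
```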
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_roberta_twitter_imdb_10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/pachequinho/sentiment_roberta_twitter_imdb_10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sentiment_roberta_twitter_imdb_10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sentiment_roberta_twitter_imdb_10_pipeline_en.md new file mode 100644 index 00000000000000..22d9ef047ee330 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sentiment_roberta_twitter_imdb_10_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_roberta_twitter_imdb_10_pipeline pipeline RoBertaForSequenceClassification from pachequinho +author: John Snow Labs +name: sentiment_roberta_twitter_imdb_10_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_roberta_twitter_imdb_10_pipeline` is a English model originally trained by pachequinho. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_roberta_twitter_imdb_10_pipeline_en_5.5.0_3.0_1726900665351.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_roberta_twitter_imdb_10_pipeline_en_5.5.0_3.0_1726900665351.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_roberta_twitter_imdb_10_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_roberta_twitter_imdb_10_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_roberta_twitter_imdb_10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/pachequinho/sentiment_roberta_twitter_imdb_10 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-snli_6_en.md b/docs/_posts/ahmedlone127/2024-09-21-snli_6_en.md new file mode 100644 index 00000000000000..8ab7dc0826a37d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-snli_6_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English snli_6 RoBertaEmbeddings from mahdiyar +author: John Snow Labs +name: snli_6 +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`snli_6` is a English model originally trained by mahdiyar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/snli_6_en_5.5.0_3.0_1726934811509.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/snli_6_en_5.5.0_3.0_1726934811509.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("snli_6","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("snli_6","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
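For token-level inspection, each token can be lined up with its embedding vector; a short sketch reusing the `pipelineDF` built above:

```python
from pyspark.sql.functions import col, explode

tokens = pipelineDF \
    .select(explode(col("embeddings")).alias("ann")) \
    .select(col("ann.result").alias("token"), col("ann.embeddings").alias("vector"))
tokens.show(5, truncate=60)
```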
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|snli_6| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|445.6 MB| + +## References + +https://huggingface.co/mahdiyar/snli-6 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-snli_6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-snli_6_pipeline_en.md new file mode 100644 index 00000000000000..4993e44d366e21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-snli_6_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English snli_6_pipeline pipeline RoBertaEmbeddings from mahdiyar +author: John Snow Labs +name: snli_6_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`snli_6_pipeline` is a English model originally trained by mahdiyar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/snli_6_pipeline_en_5.5.0_3.0_1726934838130.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/snli_6_pipeline_en_5.5.0_3.0_1726934838130.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("snli_6_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("snli_6_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|snli_6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|445.6 MB| + +## References + +https://huggingface.co/mahdiyar/snli-6 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-spam_classifier_qiragg_en.md b/docs/_posts/ahmedlone127/2024-09-21-spam_classifier_qiragg_en.md new file mode 100644 index 00000000000000..254cdcfdaf50c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-spam_classifier_qiragg_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English spam_classifier_qiragg DistilBertForSequenceClassification from qiragg +author: John Snow Labs +name: spam_classifier_qiragg +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spam_classifier_qiragg` is a English model originally trained by qiragg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spam_classifier_qiragg_en_5.5.0_3.0_1726888734132.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spam_classifier_qiragg_en_5.5.0_3.0_1726888734132.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("spam_classifier_qiragg","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("spam_classifier_qiragg", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
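For low-latency scoring of individual texts, the fitted pipeline can be wrapped in a `LightPipeline`, which works on plain Python strings instead of DataFrames; the sample message below is a placeholder:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
print(light.annotate("Congratulations, you won a free prize! Reply now to claim.")["class"])
```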
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spam_classifier_qiragg| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/qiragg/spam-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-spam_classifier_qiragg_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-spam_classifier_qiragg_pipeline_en.md new file mode 100644 index 00000000000000..1dd48d6c99644d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-spam_classifier_qiragg_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English spam_classifier_qiragg_pipeline pipeline DistilBertForSequenceClassification from qiragg +author: John Snow Labs +name: spam_classifier_qiragg_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spam_classifier_qiragg_pipeline` is a English model originally trained by qiragg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spam_classifier_qiragg_pipeline_en_5.5.0_3.0_1726888746056.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spam_classifier_qiragg_pipeline_en_5.5.0_3.0_1726888746056.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("spam_classifier_qiragg_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("spam_classifier_qiragg_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spam_classifier_qiragg_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/qiragg/spam-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-spanish_euph_classifier_final_en.md b/docs/_posts/ahmedlone127/2024-09-21-spanish_euph_classifier_final_en.md new file mode 100644 index 00000000000000..daa4dba09aebff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-spanish_euph_classifier_final_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English spanish_euph_classifier_final DistilBertForSequenceClassification from nhankins +author: John Snow Labs +name: spanish_euph_classifier_final +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spanish_euph_classifier_final` is a English model originally trained by nhankins. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spanish_euph_classifier_final_en_5.5.0_3.0_1726953595347.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spanish_euph_classifier_final_en_5.5.0_3.0_1726953595347.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("spanish_euph_classifier_final","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("spanish_euph_classifier_final", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
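To keep only the rows assigned a particular label (for example when screening a larger corpus), the prediction can be filtered directly on the annotation column; a small sketch, with the label value purely illustrative since the card does not list the class names:

```python
from pyspark.sql.functions import col

predictions = pipelineDF.selectExpr("text", "class[0].result as label")
predictions.filter(col("label") == "1").show(truncate=False)  # "1" is a placeholder label
```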
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spanish_euph_classifier_final| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/nhankins/es_euph_classifier_final \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-spanish_euph_classifier_final_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-spanish_euph_classifier_final_pipeline_en.md new file mode 100644 index 00000000000000..4d3e1531c16f4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-spanish_euph_classifier_final_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English spanish_euph_classifier_final_pipeline pipeline DistilBertForSequenceClassification from nhankins +author: John Snow Labs +name: spanish_euph_classifier_final_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spanish_euph_classifier_final_pipeline` is a English model originally trained by nhankins. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spanish_euph_classifier_final_pipeline_en_5.5.0_3.0_1726953618866.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spanish_euph_classifier_final_pipeline_en_5.5.0_3.0_1726953618866.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("spanish_euph_classifier_final_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("spanish_euph_classifier_final_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spanish_euph_classifier_final_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/nhankins/es_euph_classifier_final + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-spanishroberta_multicardioner_en.md b/docs/_posts/ahmedlone127/2024-09-21-spanishroberta_multicardioner_en.md new file mode 100644 index 00000000000000..19945e3aadd6c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-spanishroberta_multicardioner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English spanishroberta_multicardioner RoBertaForTokenClassification from aaaksenova +author: John Snow Labs +name: spanishroberta_multicardioner +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spanishroberta_multicardioner` is a English model originally trained by aaaksenova. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spanishroberta_multicardioner_en_5.5.0_3.0_1726926467779.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spanishroberta_multicardioner_en_5.5.0_3.0_1726926467779.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("spanishroberta_multicardioner","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("spanishroberta_multicardioner", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
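The token-level tags can be merged into entity chunks with Spark NLP's `NerConverter`; a hedged sketch that reuses the `pipelineDF` and column names defined above (the entity labels themselves are not listed in this card):

```python
from sparknlp.annotator import NerConverter

converter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("entities")

converter.transform(pipelineDF) \
    .selectExpr("explode(entities) as e") \
    .select("e.result", "e.metadata") \
    .show(truncate=False)
```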
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spanishroberta_multicardioner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|440.6 MB| + +## References + +https://huggingface.co/aaaksenova/SpanishRoberta_multicardioner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-spanishroberta_multicardioner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-spanishroberta_multicardioner_pipeline_en.md new file mode 100644 index 00000000000000..816d7c4d3c3e84 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-spanishroberta_multicardioner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English spanishroberta_multicardioner_pipeline pipeline RoBertaForTokenClassification from aaaksenova +author: John Snow Labs +name: spanishroberta_multicardioner_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spanishroberta_multicardioner_pipeline` is a English model originally trained by aaaksenova. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spanishroberta_multicardioner_pipeline_en_5.5.0_3.0_1726926495546.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spanishroberta_multicardioner_pipeline_en_5.5.0_3.0_1726926495546.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("spanishroberta_multicardioner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("spanishroberta_multicardioner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spanishroberta_multicardioner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|440.6 MB| + +## References + +https://huggingface.co/aaaksenova/SpanishRoberta_multicardioner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sst2_padding70model_en.md b/docs/_posts/ahmedlone127/2024-09-21-sst2_padding70model_en.md new file mode 100644 index 00000000000000..3f0b7961b013e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sst2_padding70model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sst2_padding70model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: sst2_padding70model +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sst2_padding70model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sst2_padding70model_en_5.5.0_3.0_1726888915001.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sst2_padding70model_en_5.5.0_3.0_1726888915001.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("sst2_padding70model","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sst2_padding70model", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
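Before scoring data it can be useful to check which labels the checkpoint predicts; sequence-classification annotators in Spark NLP expose them via `getClasses()`:

```python
# Uses the "sequenceClassifier" defined in the example above.
print(sequenceClassifier.getClasses())
```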
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sst2_padding70model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/sst2_padding70model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sst2_padding70model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sst2_padding70model_pipeline_en.md new file mode 100644 index 00000000000000..02c2ebbba94ee8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sst2_padding70model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sst2_padding70model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: sst2_padding70model_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sst2_padding70model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sst2_padding70model_pipeline_en_5.5.0_3.0_1726888926498.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sst2_padding70model_pipeline_en_5.5.0_3.0_1726888926498.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sst2_padding70model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sst2_padding70model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sst2_padding70model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/sst2_padding70model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sungbeom_whisper_small_korean_set1_ko.md b/docs/_posts/ahmedlone127/2024-09-21-sungbeom_whisper_small_korean_set1_ko.md new file mode 100644 index 00000000000000..cc85a26d125415 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sungbeom_whisper_small_korean_set1_ko.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Korean sungbeom_whisper_small_korean_set1 WhisperForCTC from maxseats +author: John Snow Labs +name: sungbeom_whisper_small_korean_set1 +date: 2024-09-21 +tags: [ko, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sungbeom_whisper_small_korean_set1` is a Korean model originally trained by maxseats. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sungbeom_whisper_small_korean_set1_ko_5.5.0_3.0_1726913050200.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sungbeom_whisper_small_korean_set1_ko_5.5.0_3.0_1726913050200.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is a DataFrame with an "audio_content" column holding 16 kHz audio as
# float arrays; a sketch for building it follows after this example.
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("sungbeom_whisper_small_korean_set1","ko") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
// "data" is a DataFrame with an "audio_content" column holding 16 kHz audio as float arrays.
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("sungbeom_whisper_small_korean_set1", "ko")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
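The example above assumes a DataFrame `data` with an `audio_content` column of float arrays. A minimal sketch for building it from a local file, using librosa as an assumed extra dependency and a placeholder path:

```python
import librosa

audio, _ = librosa.load("sample.wav", sr=16000)  # Whisper models expect 16 kHz mono audio
data = spark.createDataFrame([[audio.tolist()]], ["audio_content"])

result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```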
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sungbeom_whisper_small_korean_set1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ko| +|Size:|1.7 GB| + +## References + +https://huggingface.co/maxseats/SungBeom-whisper-small-ko-set1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sungbeom_whisper_small_korean_set1_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-21-sungbeom_whisper_small_korean_set1_pipeline_ko.md new file mode 100644 index 00000000000000..65e459f78de454 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sungbeom_whisper_small_korean_set1_pipeline_ko.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Korean sungbeom_whisper_small_korean_set1_pipeline pipeline WhisperForCTC from maxseats +author: John Snow Labs +name: sungbeom_whisper_small_korean_set1_pipeline +date: 2024-09-21 +tags: [ko, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sungbeom_whisper_small_korean_set1_pipeline` is a Korean model originally trained by maxseats. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sungbeom_whisper_small_korean_set1_pipeline_ko_5.5.0_3.0_1726913128073.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sungbeom_whisper_small_korean_set1_pipeline_ko_5.5.0_3.0_1726913128073.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sungbeom_whisper_small_korean_set1_pipeline", lang = "ko") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sungbeom_whisper_small_korean_set1_pipeline", lang = "ko") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sungbeom_whisper_small_korean_set1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|1.7 GB| + +## References + +https://huggingface.co/maxseats/SungBeom-whisper-small-ko-set1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sungbeom_whisper_small_korean_set26_ko.md b/docs/_posts/ahmedlone127/2024-09-21-sungbeom_whisper_small_korean_set26_ko.md new file mode 100644 index 00000000000000..1e364ff7277ea0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sungbeom_whisper_small_korean_set26_ko.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Korean sungbeom_whisper_small_korean_set26 WhisperForCTC from maxseats +author: John Snow Labs +name: sungbeom_whisper_small_korean_set26 +date: 2024-09-21 +tags: [ko, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sungbeom_whisper_small_korean_set26` is a Korean model originally trained by maxseats. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sungbeom_whisper_small_korean_set26_ko_5.5.0_3.0_1726962587177.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sungbeom_whisper_small_korean_set26_ko_5.5.0_3.0_1726962587177.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is a DataFrame with an "audio_content" column holding 16 kHz audio as float arrays.
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("sungbeom_whisper_small_korean_set26","ko") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
// "data" is a DataFrame with an "audio_content" column holding 16 kHz audio as float arrays.
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("sungbeom_whisper_small_korean_set26", "ko")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
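After transcription, the recognized text sits in the `text` annotation column set above and can be read back as plain strings:

```python
# One transcription per input row.
pipelineDF.selectExpr("explode(text) as t").select("t.result").show(truncate=False)
```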
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sungbeom_whisper_small_korean_set26| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ko| +|Size:|1.7 GB| + +## References + +https://huggingface.co/maxseats/SungBeom-whisper-small-ko-set26 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sungbeom_whisper_small_korean_set26_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-21-sungbeom_whisper_small_korean_set26_pipeline_ko.md new file mode 100644 index 00000000000000..17ff5ae687630b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sungbeom_whisper_small_korean_set26_pipeline_ko.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Korean sungbeom_whisper_small_korean_set26_pipeline pipeline WhisperForCTC from maxseats +author: John Snow Labs +name: sungbeom_whisper_small_korean_set26_pipeline +date: 2024-09-21 +tags: [ko, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sungbeom_whisper_small_korean_set26_pipeline` is a Korean model originally trained by maxseats. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sungbeom_whisper_small_korean_set26_pipeline_ko_5.5.0_3.0_1726962664667.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sungbeom_whisper_small_korean_set26_pipeline_ko_5.5.0_3.0_1726962664667.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sungbeom_whisper_small_korean_set26_pipeline", lang = "ko") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sungbeom_whisper_small_korean_set26_pipeline", lang = "ko") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sungbeom_whisper_small_korean_set26_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|1.7 GB| + +## References + +https://huggingface.co/maxseats/SungBeom-whisper-small-ko-set26 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-t_3_en.md b/docs/_posts/ahmedlone127/2024-09-21-t_3_en.md new file mode 100644 index 00000000000000..24ff518f787576 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-t_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English t_3 RoBertaForSequenceClassification from Pablojmed +author: John Snow Labs +name: t_3 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t_3` is a English model originally trained by Pablojmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t_3_en_5.5.0_3.0_1726900545346.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t_3_en_5.5.0_3.0_1726900545346.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("t_3","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("t_3", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
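The fitted model can score many rows at once; a short sketch that reuses `pipelineModel` from the example above with placeholder texts:

```python
batch = spark.createDataFrame(
    [["I love spark-nlp"], ["This library is hard to use"]]
).toDF("text")

pipelineModel.transform(batch).selectExpr("text", "class[0].result as label").show(truncate=False)
```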
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|438.2 MB| + +## References + +https://huggingface.co/Pablojmed/t_3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-t_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-t_3_pipeline_en.md new file mode 100644 index 00000000000000..16e7c767c2868b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-t_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English t_3_pipeline pipeline RoBertaForSequenceClassification from Pablojmed +author: John Snow Labs +name: t_3_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t_3_pipeline` is a English model originally trained by Pablojmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t_3_pipeline_en_5.5.0_3.0_1726900569278.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t_3_pipeline_en_5.5.0_3.0_1726900569278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("t_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("t_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.2 MB| + +## References + +https://huggingface.co/Pablojmed/t_3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-t_6_en.md b/docs/_posts/ahmedlone127/2024-09-21-t_6_en.md new file mode 100644 index 00000000000000..4afd9ce181f91e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-t_6_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English t_6 RoBertaForSequenceClassification from Pablojmed +author: John Snow Labs +name: t_6 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t_6` is a English model originally trained by Pablojmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t_6_en_5.5.0_3.0_1726940671995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t_6_en_5.5.0_3.0_1726940671995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# Spark NLP and Spark ML imports
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into a document column
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split documents into tokens
tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Load the pretrained sequence classifier
sequenceClassifier = RoBertaForSequenceClassification.pretrained("t_6", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into a document column
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split documents into tokens
val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// Load the pretrained sequence classifier
val sequenceClassifier = RoBertaForSequenceClassification.pretrained("t_6", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t_6| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|438.3 MB| + +## References + +https://huggingface.co/Pablojmed/t_6 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-t_6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-t_6_pipeline_en.md new file mode 100644 index 00000000000000..e64e7102d2ab1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-t_6_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English t_6_pipeline pipeline RoBertaForSequenceClassification from Pablojmed +author: John Snow Labs +name: t_6_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t_6_pipeline` is a English model originally trained by Pablojmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t_6_pipeline_en_5.5.0_3.0_1726940696169.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t_6_pipeline_en_5.5.0_3.0_1726940696169.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("t_6_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("t_6_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t_6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.3 MB| + +## References + +https://huggingface.co/Pablojmed/t_6 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-task_1_en.md b/docs/_posts/ahmedlone127/2024-09-21-task_1_en.md new file mode 100644 index 00000000000000..dc8db9954c45aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-task_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English task_1 DistilBertForSequenceClassification from lucando27 +author: John Snow Labs +name: task_1 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`task_1` is a English model originally trained by lucando27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/task_1_en_5.5.0_3.0_1726924071651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/task_1_en_5.5.0_3.0_1726924071651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# Spark NLP and Spark ML imports
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into a document column
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split documents into tokens
tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Load the pretrained sequence classifier
sequenceClassifier = DistilBertForSequenceClassification.pretrained("task_1", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into a document column
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split documents into tokens
val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// Load the pretrained sequence classifier
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("task_1", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|task_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lucando27/Task_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-task_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-task_1_pipeline_en.md new file mode 100644 index 00000000000000..9d790f364afcfd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-task_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English task_1_pipeline pipeline DistilBertForSequenceClassification from lucando27 +author: John Snow Labs +name: task_1_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`task_1_pipeline` is a English model originally trained by lucando27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/task_1_pipeline_en_5.5.0_3.0_1726924083781.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/task_1_pipeline_en_5.5.0_3.0_1726924083781.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("task_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("task_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
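The snippet above assumes an existing DataFrame `df`. A minimal sketch of preparing one and reading the predictions is shown below; the `text` input column and the `class` output column are assumptions based on the DocumentAssembler and DistilBertForSequenceClassification stages included in this pipeline.

```python
# Minimal sketch (assumption: the pipeline's DocumentAssembler reads from a "text" column)
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("task_1_pipeline", lang="en")
pipeline.transform(df).select("class.result").show(truncate=False)
```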
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|task_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lucando27/Task_1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-teamim_tiny_weightdecay_0_05_combined_data_date_17_07_2024_10_10_pipeline_he.md b/docs/_posts/ahmedlone127/2024-09-21-teamim_tiny_weightdecay_0_05_combined_data_date_17_07_2024_10_10_pipeline_he.md new file mode 100644 index 00000000000000..edeb20fe62c2eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-teamim_tiny_weightdecay_0_05_combined_data_date_17_07_2024_10_10_pipeline_he.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hebrew teamim_tiny_weightdecay_0_05_combined_data_date_17_07_2024_10_10_pipeline pipeline WhisperForCTC from cantillation +author: John Snow Labs +name: teamim_tiny_weightdecay_0_05_combined_data_date_17_07_2024_10_10_pipeline +date: 2024-09-21 +tags: [he, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`teamim_tiny_weightdecay_0_05_combined_data_date_17_07_2024_10_10_pipeline` is a Hebrew model originally trained by cantillation. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/teamim_tiny_weightdecay_0_05_combined_data_date_17_07_2024_10_10_pipeline_he_5.5.0_3.0_1726878988531.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/teamim_tiny_weightdecay_0_05_combined_data_date_17_07_2024_10_10_pipeline_he_5.5.0_3.0_1726878988531.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("teamim_tiny_weightdecay_0_05_combined_data_date_17_07_2024_10_10_pipeline", lang = "he") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("teamim_tiny_weightdecay_0_05_combined_data_date_17_07_2024_10_10_pipeline", lang = "he") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|teamim_tiny_weightdecay_0_05_combined_data_date_17_07_2024_10_10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|he| +|Size:|388.7 MB| + +## References + +https://huggingface.co/cantillation/Teamim-tiny_WeightDecay-0.05_Combined-Data_date-17-07-2024_10-10 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-test_filter0416_en.md b/docs/_posts/ahmedlone127/2024-09-21-test_filter0416_en.md new file mode 100644 index 00000000000000..824ef9f5d7c463 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-test_filter0416_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_filter0416 DistilBertForSequenceClassification from Filter0416 +author: John Snow Labs +name: test_filter0416 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_filter0416` is a English model originally trained by Filter0416. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_filter0416_en_5.5.0_3.0_1726924402565.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_filter0416_en_5.5.0_3.0_1726924402565.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# Spark NLP and Spark ML imports
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into a document column
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split documents into tokens
tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Load the pretrained sequence classifier
sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_filter0416", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into a document column
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split documents into tokens
val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// Load the pretrained sequence classifier
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_filter0416", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_filter0416| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Filter0416/test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-test_filter0416_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-test_filter0416_pipeline_en.md new file mode 100644 index 00000000000000..2b2185e054ece2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-test_filter0416_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_filter0416_pipeline pipeline DistilBertForSequenceClassification from Filter0416 +author: John Snow Labs +name: test_filter0416_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_filter0416_pipeline` is a English model originally trained by Filter0416. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_filter0416_pipeline_en_5.5.0_3.0_1726924414204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_filter0416_pipeline_en_5.5.0_3.0_1726924414204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_filter0416_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_filter0416_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_filter0416_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Filter0416/test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-test_reward_model_en.md b/docs/_posts/ahmedlone127/2024-09-21-test_reward_model_en.md new file mode 100644 index 00000000000000..0e39b7b3a4fa56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-test_reward_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_reward_model RoBertaForSequenceClassification from Adzka +author: John Snow Labs +name: test_reward_model +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_reward_model` is a English model originally trained by Adzka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_reward_model_en_5.5.0_3.0_1726940467484.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_reward_model_en_5.5.0_3.0_1726940467484.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# Spark NLP and Spark ML imports
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into a document column
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split documents into tokens
tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Load the pretrained sequence classifier
sequenceClassifier = RoBertaForSequenceClassification.pretrained("test_reward_model", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into a document column
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split documents into tokens
val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// Load the pretrained sequence classifier
val sequenceClassifier = RoBertaForSequenceClassification.pretrained("test_reward_model", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_reward_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|467.7 MB| + +## References + +https://huggingface.co/Adzka/test-reward-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-test_reward_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-test_reward_model_pipeline_en.md new file mode 100644 index 00000000000000..d025028ffa09fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-test_reward_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_reward_model_pipeline pipeline RoBertaForSequenceClassification from Adzka +author: John Snow Labs +name: test_reward_model_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_reward_model_pipeline` is a English model originally trained by Adzka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_reward_model_pipeline_en_5.5.0_3.0_1726940489062.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_reward_model_pipeline_en_5.5.0_3.0_1726940489062.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_reward_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_reward_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_reward_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|467.7 MB| + +## References + +https://huggingface.co/Adzka/test-reward-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-text_classification_10000_en.md b/docs/_posts/ahmedlone127/2024-09-21-text_classification_10000_en.md new file mode 100644 index 00000000000000..a9a10c0a1503b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-text_classification_10000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English text_classification_10000 DistilBertForSequenceClassification from Neroism8422 +author: John Snow Labs +name: text_classification_10000 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_classification_10000` is a English model originally trained by Neroism8422. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_classification_10000_en_5.5.0_3.0_1726884855454.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_classification_10000_en_5.5.0_3.0_1726884855454.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# Spark NLP and Spark ML imports
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into a document column
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split documents into tokens
tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Load the pretrained sequence classifier
sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_classification_10000", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into a document column
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split documents into tokens
val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// Load the pretrained sequence classifier
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_classification_10000", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_classification_10000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Neroism8422/Text_Classification_10000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-text_classification_10000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-text_classification_10000_pipeline_en.md new file mode 100644 index 00000000000000..60472f7b7a17f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-text_classification_10000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English text_classification_10000_pipeline pipeline DistilBertForSequenceClassification from Neroism8422 +author: John Snow Labs +name: text_classification_10000_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_classification_10000_pipeline` is a English model originally trained by Neroism8422. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_classification_10000_pipeline_en_5.5.0_3.0_1726884867126.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_classification_10000_pipeline_en_5.5.0_3.0_1726884867126.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("text_classification_10000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("text_classification_10000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_classification_10000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Neroism8422/Text_Classification_10000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-text_classifier_madanagrawal_en.md b/docs/_posts/ahmedlone127/2024-09-21-text_classifier_madanagrawal_en.md new file mode 100644 index 00000000000000..160c2ddacd9ad0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-text_classifier_madanagrawal_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English text_classifier_madanagrawal DistilBertForSequenceClassification from madanagrawal +author: John Snow Labs +name: text_classifier_madanagrawal +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_classifier_madanagrawal` is a English model originally trained by madanagrawal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_classifier_madanagrawal_en_5.5.0_3.0_1726953563903.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_classifier_madanagrawal_en_5.5.0_3.0_1726953563903.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# Spark NLP and Spark ML imports
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into a document column
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split documents into tokens
tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Load the pretrained sequence classifier
sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_classifier_madanagrawal", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into a document column
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split documents into tokens
val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// Load the pretrained sequence classifier
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_classifier_madanagrawal", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_classifier_madanagrawal| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/madanagrawal/text_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-text_classifier_madanagrawal_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-text_classifier_madanagrawal_pipeline_en.md new file mode 100644 index 00000000000000..b71838bd2cd744 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-text_classifier_madanagrawal_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English text_classifier_madanagrawal_pipeline pipeline DistilBertForSequenceClassification from madanagrawal +author: John Snow Labs +name: text_classifier_madanagrawal_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_classifier_madanagrawal_pipeline` is a English model originally trained by madanagrawal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_classifier_madanagrawal_pipeline_en_5.5.0_3.0_1726953576701.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_classifier_madanagrawal_pipeline_en_5.5.0_3.0_1726953576701.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("text_classifier_madanagrawal_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("text_classifier_madanagrawal_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_classifier_madanagrawal_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/madanagrawal/text_classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-tibetan_roberta_g_v1_918252_en.md b/docs/_posts/ahmedlone127/2024-09-21-tibetan_roberta_g_v1_918252_en.md new file mode 100644 index 00000000000000..8bee11c8b89fd8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-tibetan_roberta_g_v1_918252_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tibetan_roberta_g_v1_918252 RoBertaEmbeddings from spsither +author: John Snow Labs +name: tibetan_roberta_g_v1_918252 +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tibetan_roberta_g_v1_918252` is a English model originally trained by spsither. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tibetan_roberta_g_v1_918252_en_5.5.0_3.0_1726934603465.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tibetan_roberta_g_v1_918252_en_5.5.0_3.0_1726934603465.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# Spark NLP and Spark ML imports
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into a document column
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split documents into tokens
tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Load the pretrained RoBERTa embeddings model
embeddings = RoBertaEmbeddings.pretrained("tibetan_roberta_g_v1_918252", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into a document column
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split documents into tokens
val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// Load the pretrained RoBERTa embeddings model
val embeddings = RoBertaEmbeddings.pretrained("tibetan_roberta_g_v1_918252", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
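The token-level vectors produced by this model end up in the `embeddings` output column. A minimal sketch for inspecting them, assuming the `pipelineDF` variable from the Python example above:

```python
# Each annotation carries the token text in `result` and its vector in `embeddings`
pipelineDF.selectExpr("explode(embeddings) as emb") \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(truncate=80)
```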
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tibetan_roberta_g_v1_918252| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|412.1 MB| + +## References + +https://huggingface.co/spsither/tibetan_RoBERTa_G_v1_918252 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-tibetan_roberta_g_v1_918252_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-tibetan_roberta_g_v1_918252_pipeline_en.md new file mode 100644 index 00000000000000..9ad993169fac8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-tibetan_roberta_g_v1_918252_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tibetan_roberta_g_v1_918252_pipeline pipeline RoBertaEmbeddings from spsither +author: John Snow Labs +name: tibetan_roberta_g_v1_918252_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tibetan_roberta_g_v1_918252_pipeline` is a English model originally trained by spsither. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tibetan_roberta_g_v1_918252_pipeline_en_5.5.0_3.0_1726934621923.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tibetan_roberta_g_v1_918252_pipeline_en_5.5.0_3.0_1726934621923.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tibetan_roberta_g_v1_918252_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tibetan_roberta_g_v1_918252_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tibetan_roberta_g_v1_918252_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.2 MB| + +## References + +https://huggingface.co/spsither/tibetan_RoBERTa_G_v1_918252 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-tiny_english_combined_v4_1_0_32_1e_06_cool_sweep_12_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-tiny_english_combined_v4_1_0_32_1e_06_cool_sweep_12_pipeline_en.md new file mode 100644 index 00000000000000..c02d33547dcccf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-tiny_english_combined_v4_1_0_32_1e_06_cool_sweep_12_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English tiny_english_combined_v4_1_0_32_1e_06_cool_sweep_12_pipeline pipeline WhisperForCTC from saahith +author: John Snow Labs +name: tiny_english_combined_v4_1_0_32_1e_06_cool_sweep_12_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_english_combined_v4_1_0_32_1e_06_cool_sweep_12_pipeline` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_english_combined_v4_1_0_32_1e_06_cool_sweep_12_pipeline_en_5.5.0_3.0_1726908317619.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_english_combined_v4_1_0_32_1e_06_cool_sweep_12_pipeline_en_5.5.0_3.0_1726908317619.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tiny_english_combined_v4_1_0_32_1e_06_cool_sweep_12_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tiny_english_combined_v4_1_0_32_1e_06_cool_sweep_12_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_english_combined_v4_1_0_32_1e_06_cool_sweep_12_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|391.4 MB| + +## References + +https://huggingface.co/saahith/tiny.en-combined_v4-1-0-32-1e-06-cool-sweep-12 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-tiny_english_emsassist_2_10_0_2_16_1e_06_ethereal_sweep_16_en.md b/docs/_posts/ahmedlone127/2024-09-21-tiny_english_emsassist_2_10_0_2_16_1e_06_ethereal_sweep_16_en.md new file mode 100644 index 00000000000000..c8f8593e76fc3d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-tiny_english_emsassist_2_10_0_2_16_1e_06_ethereal_sweep_16_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English tiny_english_emsassist_2_10_0_2_16_1e_06_ethereal_sweep_16 WhisperForCTC from saahith +author: John Snow Labs +name: tiny_english_emsassist_2_10_0_2_16_1e_06_ethereal_sweep_16 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_english_emsassist_2_10_0_2_16_1e_06_ethereal_sweep_16` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_english_emsassist_2_10_0_2_16_1e_06_ethereal_sweep_16_en_5.5.0_3.0_1726908553172.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_english_emsassist_2_10_0_2_16_1e_06_ethereal_sweep_16_en_5.5.0_3.0_1726908553172.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# Spark NLP and Spark ML imports
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw audio samples into an annotation column
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

# Load the pretrained Whisper speech-to-text model
speechToText = WhisperForCTC.pretrained("tiny_english_emsassist_2_10_0_2_16_1e_06_ethereal_sweep_16", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# `data` is a DataFrame with an "audio_content" column holding arrays of raw audio samples (floats)
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// Assemble raw audio samples into an annotation column
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

// Load the pretrained Whisper speech-to-text model
val speechToText = WhisperForCTC.pretrained("tiny_english_emsassist_2_10_0_2_16_1e_06_ethereal_sweep_16", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// `data` is a DataFrame with an "audio_content" column holding arrays of raw audio samples (floats)
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
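The example above assumes a `data` DataFrame that already holds raw audio. A minimal sketch of building one is shown below; the 16 kHz WAV file and the use of `librosa` are assumptions for illustration, not requirements of the model itself.

```python
# Minimal sketch (assumptions: a local 16 kHz mono WAV file and the librosa package installed;
# `spark` is the active SparkSession from the example above)
import librosa

audio, _ = librosa.load("call_recording.wav", sr=16000)  # hypothetical file path
data = spark.createDataFrame([[audio.tolist()]], ["audio_content"])
```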
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_english_emsassist_2_10_0_2_16_1e_06_ethereal_sweep_16| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|394.7 MB| + +## References + +https://huggingface.co/saahith/tiny.en-EMSAssist-2-10-0.2-16-1e-06-ethereal-sweep-16 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-tiny_english_emsassist_2_10_0_2_16_1e_06_ethereal_sweep_16_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-tiny_english_emsassist_2_10_0_2_16_1e_06_ethereal_sweep_16_pipeline_en.md new file mode 100644 index 00000000000000..29bdb6bb712f63 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-tiny_english_emsassist_2_10_0_2_16_1e_06_ethereal_sweep_16_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English tiny_english_emsassist_2_10_0_2_16_1e_06_ethereal_sweep_16_pipeline pipeline WhisperForCTC from saahith +author: John Snow Labs +name: tiny_english_emsassist_2_10_0_2_16_1e_06_ethereal_sweep_16_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_english_emsassist_2_10_0_2_16_1e_06_ethereal_sweep_16_pipeline` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_english_emsassist_2_10_0_2_16_1e_06_ethereal_sweep_16_pipeline_en_5.5.0_3.0_1726908573334.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_english_emsassist_2_10_0_2_16_1e_06_ethereal_sweep_16_pipeline_en_5.5.0_3.0_1726908573334.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tiny_english_emsassist_2_10_0_2_16_1e_06_ethereal_sweep_16_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tiny_english_emsassist_2_10_0_2_16_1e_06_ethereal_sweep_16_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_english_emsassist_2_10_0_2_16_1e_06_ethereal_sweep_16_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|394.7 MB| + +## References + +https://huggingface.co/saahith/tiny.en-EMSAssist-2-10-0.2-16-1e-06-ethereal-sweep-16 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-tiny_english_emsassist_2_1_0_16_1e_05_eager_sweep_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-tiny_english_emsassist_2_1_0_16_1e_05_eager_sweep_4_pipeline_en.md new file mode 100644 index 00000000000000..a25b20d5d4b2f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-tiny_english_emsassist_2_1_0_16_1e_05_eager_sweep_4_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English tiny_english_emsassist_2_1_0_16_1e_05_eager_sweep_4_pipeline pipeline WhisperForCTC from saahith +author: John Snow Labs +name: tiny_english_emsassist_2_1_0_16_1e_05_eager_sweep_4_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_english_emsassist_2_1_0_16_1e_05_eager_sweep_4_pipeline` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_english_emsassist_2_1_0_16_1e_05_eager_sweep_4_pipeline_en_5.5.0_3.0_1726903313740.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_english_emsassist_2_1_0_16_1e_05_eager_sweep_4_pipeline_en_5.5.0_3.0_1726903313740.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tiny_english_emsassist_2_1_0_16_1e_05_eager_sweep_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tiny_english_emsassist_2_1_0_16_1e_05_eager_sweep_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_english_emsassist_2_1_0_16_1e_05_eager_sweep_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|394.3 MB| + +## References + +https://huggingface.co/saahith/tiny.en-EMSAssist-2-1-0-16-1e-05-eager-sweep-4 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_en.md b/docs/_posts/ahmedlone127/2024-09-21-tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_en.md new file mode 100644 index 00000000000000..11ab6a8a715d80 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15 WhisperForCTC from saahith +author: John Snow Labs +name: tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_en_5.5.0_3.0_1726960757339.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_en_5.5.0_3.0_1726960757339.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# Spark NLP and Spark ML imports
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw audio samples into an annotation column
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

# Load the pretrained Whisper speech-to-text model
speechToText = WhisperForCTC.pretrained("tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# `data` is a DataFrame with an "audio_content" column holding arrays of raw audio samples (floats)
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// Assemble raw audio samples into an annotation column
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

// Load the pretrained Whisper speech-to-text model
val speechToText = WhisperForCTC.pretrained("tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// `data` is a DataFrame with an "audio_content" column holding arrays of raw audio samples (floats)
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|394.5 MB| + +## References + +https://huggingface.co/saahith/tiny.en-final-combined-1-0.1-8-1e-06-daily-sweep-15 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_pipeline_en.md new file mode 100644 index 00000000000000..52b77690a79dd0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_pipeline pipeline WhisperForCTC from saahith +author: John Snow Labs +name: tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_pipeline` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_pipeline_en_5.5.0_3.0_1726960775763.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_pipeline_en_5.5.0_3.0_1726960775763.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is a Spark DataFrame holding the input; for this pipeline it is
# typically an "audio_content" column with the raw audio samples to transcribe.
pipeline = PretrainedPipeline("tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is a Spark DataFrame holding the input; for this pipeline it is
// typically an "audio_content" column with the raw audio samples to transcribe.
val pipeline = new PretrainedPipeline("tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|394.5 MB| + +## References + +https://huggingface.co/saahith/tiny.en-final-combined-1-0.1-8-1e-06-daily-sweep-15 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-tiny_random_debertaforquestionanswering_en.md b/docs/_posts/ahmedlone127/2024-09-21-tiny_random_debertaforquestionanswering_en.md new file mode 100644 index 00000000000000..b566650e97d2b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-tiny_random_debertaforquestionanswering_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English tiny_random_debertaforquestionanswering BertForQuestionAnswering from hf-tiny-model-private +author: John Snow Labs +name: tiny_random_debertaforquestionanswering +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_random_debertaforquestionanswering` is a English model originally trained by hf-tiny-model-private. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_random_debertaforquestionanswering_en_5.5.0_3.0_1726946311859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_random_debertaforquestionanswering_en_5.5.0_3.0_1726946311859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("tiny_random_debertaforquestionanswering","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("tiny_random_debertaforquestionanswering", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
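The example stops at `pipelineModel.transform(data)`. A short follow-up sketch for reading the extracted answers is shown below; the column names follow the example above, and the structure is that of standard Spark NLP annotations rather than anything specific to this model.

```python
# Show the predicted answer text for each (question, context) pair.
pipelineDF.select("answer.result").show(truncate=False)

# Each annotation also carries metadata (such as scores, where available),
# which can be inspected by exploding the annotation array.
pipelineDF.selectExpr("explode(answer) AS ann") \
    .selectExpr("ann.result", "ann.metadata") \
    .show(truncate=False)
```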
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_random_debertaforquestionanswering| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|339.7 KB| + +## References + +https://huggingface.co/hf-tiny-model-private/tiny-random-DebertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-tiny_random_debertaforquestionanswering_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-tiny_random_debertaforquestionanswering_pipeline_en.md new file mode 100644 index 00000000000000..c7dc48b62c98ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-tiny_random_debertaforquestionanswering_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English tiny_random_debertaforquestionanswering_pipeline pipeline BertForQuestionAnswering from hf-tiny-model-private +author: John Snow Labs +name: tiny_random_debertaforquestionanswering_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_random_debertaforquestionanswering_pipeline` is a English model originally trained by hf-tiny-model-private. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_random_debertaforquestionanswering_pipeline_en_5.5.0_3.0_1726946312295.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_random_debertaforquestionanswering_pipeline_en_5.5.0_3.0_1726946312295.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is a Spark DataFrame holding the input; for this pipeline it is
# typically the question and context columns expected by the bundled MultiDocumentAssembler.
pipeline = PretrainedPipeline("tiny_random_debertaforquestionanswering_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is a Spark DataFrame holding the input; for this pipeline it is
// typically the question and context columns expected by the bundled MultiDocumentAssembler.
val pipeline = new PretrainedPipeline("tiny_random_debertaforquestionanswering_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_random_debertaforquestionanswering_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|346.3 KB| + +## References + +https://huggingface.co/hf-tiny-model-private/tiny-random-DebertaForQuestionAnswering + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-tinybert_general_4l_312d_natureuniverse_en.md b/docs/_posts/ahmedlone127/2024-09-21-tinybert_general_4l_312d_natureuniverse_en.md new file mode 100644 index 00000000000000..2374a540026bf1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-tinybert_general_4l_312d_natureuniverse_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English tinybert_general_4l_312d_natureuniverse BertForQuestionAnswering from NatureUniverse +author: John Snow Labs +name: tinybert_general_4l_312d_natureuniverse +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tinybert_general_4l_312d_natureuniverse` is a English model originally trained by NatureUniverse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tinybert_general_4l_312d_natureuniverse_en_5.5.0_3.0_1726928611962.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tinybert_general_4l_312d_natureuniverse_en_5.5.0_3.0_1726928611962.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("tinybert_general_4l_312d_natureuniverse","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("tinybert_general_4l_312d_natureuniverse", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tinybert_general_4l_312d_natureuniverse| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|53.8 MB| + +## References + +https://huggingface.co/NatureUniverse/TinyBERT_general_4L_312d \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-tinybert_general_4l_312d_natureuniverse_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-tinybert_general_4l_312d_natureuniverse_pipeline_en.md new file mode 100644 index 00000000000000..1a9ba4f48acf0e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-tinybert_general_4l_312d_natureuniverse_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English tinybert_general_4l_312d_natureuniverse_pipeline pipeline BertForQuestionAnswering from NatureUniverse +author: John Snow Labs +name: tinybert_general_4l_312d_natureuniverse_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tinybert_general_4l_312d_natureuniverse_pipeline` is a English model originally trained by NatureUniverse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tinybert_general_4l_312d_natureuniverse_pipeline_en_5.5.0_3.0_1726928614777.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tinybert_general_4l_312d_natureuniverse_pipeline_en_5.5.0_3.0_1726928614777.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is a Spark DataFrame holding the input; for this pipeline it is
# typically the question and context columns expected by the bundled MultiDocumentAssembler.
pipeline = PretrainedPipeline("tinybert_general_4l_312d_natureuniverse_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is a Spark DataFrame holding the input; for this pipeline it is
// typically the question and context columns expected by the bundled MultiDocumentAssembler.
val pipeline = new PretrainedPipeline("tinybert_general_4l_312d_natureuniverse_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tinybert_general_4l_312d_natureuniverse_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|53.9 MB| + +## References + +https://huggingface.co/NatureUniverse/TinyBERT_general_4L_312d + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-topic_topic_random0_seed1_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-21-topic_topic_random0_seed1_bernice_en.md new file mode 100644 index 00000000000000..5082416658a95b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-topic_topic_random0_seed1_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English topic_topic_random0_seed1_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random0_seed1_bernice +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random0_seed1_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random0_seed1_bernice_en_5.5.0_3.0_1726932285428.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random0_seed1_bernice_en_5.5.0_3.0_1726932285428.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("topic_topic_random0_seed1_bernice","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("topic_topic_random0_seed1_bernice", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
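If plain string labels are preferred over annotation structs, a `Finisher` stage can be appended to the pipeline above. This is a standard Spark NLP component and is shown here as an optional extension rather than part of the original card; the output column name is an illustrative choice.

```python
from sparknlp.base import Finisher

# Optional extra stage: flatten the "class" annotations into plain strings.
finisher = Finisher() \
    .setInputCols(["class"]) \
    .setOutputCols(["predicted_topic"])

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier, finisher])
pipeline.fit(data).transform(data).select("text", "predicted_topic").show(truncate=False)
```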
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random0_seed1_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|805.7 MB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random0_seed1-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-topic_topic_random0_seed1_bernice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-topic_topic_random0_seed1_bernice_pipeline_en.md new file mode 100644 index 00000000000000..19b8094237e983 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-topic_topic_random0_seed1_bernice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English topic_topic_random0_seed1_bernice_pipeline pipeline XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random0_seed1_bernice_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random0_seed1_bernice_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random0_seed1_bernice_pipeline_en_5.5.0_3.0_1726932418538.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random0_seed1_bernice_pipeline_en_5.5.0_3.0_1726932418538.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is a Spark DataFrame holding the input; for this pipeline it is
# typically a single "text" column with the documents to classify.
pipeline = PretrainedPipeline("topic_topic_random0_seed1_bernice_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is a Spark DataFrame holding the input; for this pipeline it is
// typically a single "text" column with the documents to classify.
val pipeline = new PretrainedPipeline("topic_topic_random0_seed1_bernice_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
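The snippet above leaves `df` undefined. A minimal end-to-end sketch is shown below; it assumes the bundled DocumentAssembler reads a `text` column, which is the usual convention for these generated pipelines.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Assumed input: one document per row in a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("topic_topic_random0_seed1_bernice_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.select("class.result").show(truncate=False)
```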
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random0_seed1_bernice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|805.7 MB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random0_seed1-bernice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-toulmin_classifier8_distilbert_base_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-21-toulmin_classifier8_distilbert_base_uncased_en.md new file mode 100644 index 00000000000000..b731934b386b77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-toulmin_classifier8_distilbert_base_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English toulmin_classifier8_distilbert_base_uncased DistilBertForSequenceClassification from againeureka +author: John Snow Labs +name: toulmin_classifier8_distilbert_base_uncased +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`toulmin_classifier8_distilbert_base_uncased` is a English model originally trained by againeureka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/toulmin_classifier8_distilbert_base_uncased_en_5.5.0_3.0_1726889042315.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/toulmin_classifier8_distilbert_base_uncased_en_5.5.0_3.0_1726889042315.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("toulmin_classifier8_distilbert_base_uncased","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("toulmin_classifier8_distilbert_base_uncased", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
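For quick single-document inference without building a DataFrame, the fitted model from the example above can be wrapped in a `LightPipeline`. This is a general Spark NLP facility rather than something specific to this model; the sample sentence below is made up.

```python
from sparknlp.base import LightPipeline

# Wrap the fitted PipelineModel for fast in-memory annotation.
light = LightPipeline(pipelineModel)

# annotate() returns plain Python lists keyed by output column name.
result = light.annotate("The evidence cited here clearly supports the author's claim.")
print(result["class"])
```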
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|toulmin_classifier8_distilbert_base_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|394.6 MB| + +## References + +https://huggingface.co/againeureka/toulmin_classifier8_distilbert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-toulmin_classifier8_distilbert_base_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-toulmin_classifier8_distilbert_base_uncased_pipeline_en.md new file mode 100644 index 00000000000000..443e04fb245b57 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-toulmin_classifier8_distilbert_base_uncased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English toulmin_classifier8_distilbert_base_uncased_pipeline pipeline DistilBertForSequenceClassification from againeureka +author: John Snow Labs +name: toulmin_classifier8_distilbert_base_uncased_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`toulmin_classifier8_distilbert_base_uncased_pipeline` is a English model originally trained by againeureka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/toulmin_classifier8_distilbert_base_uncased_pipeline_en_5.5.0_3.0_1726889062813.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/toulmin_classifier8_distilbert_base_uncased_pipeline_en_5.5.0_3.0_1726889062813.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is a Spark DataFrame holding the input; for this pipeline it is
# typically a single "text" column with the documents to classify.
pipeline = PretrainedPipeline("toulmin_classifier8_distilbert_base_uncased_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is a Spark DataFrame holding the input; for this pipeline it is
// typically a single "text" column with the documents to classify.
val pipeline = new PretrainedPipeline("toulmin_classifier8_distilbert_base_uncased_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|toulmin_classifier8_distilbert_base_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|394.6 MB| + +## References + +https://huggingface.co/againeureka/toulmin_classifier8_distilbert-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-toxic_comment_model_toxicity_ft_en.md b/docs/_posts/ahmedlone127/2024-09-21-toxic_comment_model_toxicity_ft_en.md new file mode 100644 index 00000000000000..49267e31fa3372 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-toxic_comment_model_toxicity_ft_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English toxic_comment_model_toxicity_ft DistilBertForSequenceClassification from fatmhd1995 +author: John Snow Labs +name: toxic_comment_model_toxicity_ft +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`toxic_comment_model_toxicity_ft` is a English model originally trained by fatmhd1995. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/toxic_comment_model_toxicity_ft_en_5.5.0_3.0_1726953456329.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/toxic_comment_model_toxicity_ft_en_5.5.0_3.0_1726953456329.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("toxic_comment_model_toxicity_ft","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("toxic_comment_model_toxicity_ft", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|toxic_comment_model_toxicity_ft| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/fatmhd1995/toxic-comment-model-TOXICITY-FT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-toxic_comment_model_toxicity_ft_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-toxic_comment_model_toxicity_ft_pipeline_en.md new file mode 100644 index 00000000000000..21fceb423bf0a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-toxic_comment_model_toxicity_ft_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English toxic_comment_model_toxicity_ft_pipeline pipeline DistilBertForSequenceClassification from fatmhd1995 +author: John Snow Labs +name: toxic_comment_model_toxicity_ft_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`toxic_comment_model_toxicity_ft_pipeline` is a English model originally trained by fatmhd1995. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/toxic_comment_model_toxicity_ft_pipeline_en_5.5.0_3.0_1726953469297.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/toxic_comment_model_toxicity_ft_pipeline_en_5.5.0_3.0_1726953469297.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is a Spark DataFrame holding the input; for this pipeline it is
# typically a single "text" column with the documents to classify.
pipeline = PretrainedPipeline("toxic_comment_model_toxicity_ft_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is a Spark DataFrame holding the input; for this pipeline it is
// typically a single "text" column with the documents to classify.
val pipeline = new PretrainedPipeline("toxic_comment_model_toxicity_ft_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|toxic_comment_model_toxicity_ft_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/fatmhd1995/toxic-comment-model-TOXICITY-FT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-toxicity_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-21-toxicity_classifier_en.md new file mode 100644 index 00000000000000..b8e5ae78ac4953 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-toxicity_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English toxicity_classifier DistilBertForSequenceClassification from richterleo +author: John Snow Labs +name: toxicity_classifier +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`toxicity_classifier` is a English model originally trained by richterleo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/toxicity_classifier_en_5.5.0_3.0_1726888596796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/toxicity_classifier_en_5.5.0_3.0_1726888596796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("toxicity_classifier","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("toxicity_classifier", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|toxicity_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/richterleo/toxicity_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-toxicity_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-toxicity_classifier_pipeline_en.md new file mode 100644 index 00000000000000..11d83553399cb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-toxicity_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English toxicity_classifier_pipeline pipeline DistilBertForSequenceClassification from richterleo +author: John Snow Labs +name: toxicity_classifier_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`toxicity_classifier_pipeline` is a English model originally trained by richterleo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/toxicity_classifier_pipeline_en_5.5.0_3.0_1726888609200.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/toxicity_classifier_pipeline_en_5.5.0_3.0_1726888609200.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is a Spark DataFrame holding the input; for this pipeline it is
# typically a single "text" column with the documents to classify.
pipeline = PretrainedPipeline("toxicity_classifier_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is a Spark DataFrame holding the input; for this pipeline it is
// typically a single "text" column with the documents to classify.
val pipeline = new PretrainedPipeline("toxicity_classifier_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|toxicity_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/richterleo/toxicity_classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-trainere_en.md b/docs/_posts/ahmedlone127/2024-09-21-trainere_en.md new file mode 100644 index 00000000000000..15892b2dfad7af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-trainere_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trainere DistilBertForSequenceClassification from SimoneJLaudani +author: John Snow Labs +name: trainere +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trainere` is a English model originally trained by SimoneJLaudani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trainere_en_5.5.0_3.0_1726888980736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trainere_en_5.5.0_3.0_1726888980736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("trainere","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("trainere", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trainere| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/SimoneJLaudani/trainerE \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-trainere_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-trainere_pipeline_en.md new file mode 100644 index 00000000000000..c1614727f411a1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-trainere_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trainere_pipeline pipeline DistilBertForSequenceClassification from SimoneJLaudani +author: John Snow Labs +name: trainere_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trainere_pipeline` is a English model originally trained by SimoneJLaudani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trainere_pipeline_en_5.5.0_3.0_1726888993006.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trainere_pipeline_en_5.5.0_3.0_1726888993006.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is a Spark DataFrame holding the input; for this pipeline it is
# typically a single "text" column with the documents to classify.
pipeline = PretrainedPipeline("trainere_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is a Spark DataFrame holding the input; for this pipeline it is
// typically a single "text" column with the documents to classify.
val pipeline = new PretrainedPipeline("trainere_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trainere_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/SimoneJLaudani/trainerE + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-tunlangmodel_test1_10_ar.md b/docs/_posts/ahmedlone127/2024-09-21-tunlangmodel_test1_10_ar.md new file mode 100644 index 00000000000000..1efc8ca87f239e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-tunlangmodel_test1_10_ar.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Arabic tunlangmodel_test1_10 WhisperForCTC from Arbi-Houssem +author: John Snow Labs +name: tunlangmodel_test1_10 +date: 2024-09-21 +tags: [ar, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tunlangmodel_test1_10` is a Arabic model originally trained by Arbi-Houssem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tunlangmodel_test1_10_ar_5.5.0_3.0_1726950281584.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tunlangmodel_test1_10_ar_5.5.0_3.0_1726950281584.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is expected to be a DataFrame with an "audio_content" column
# containing the raw audio samples (an array of floats).
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("tunlangmodel_test1_10","ar") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// "data" is expected to be a DataFrame with an "audio_content" column
// containing the raw audio samples.
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("tunlangmodel_test1_10", "ar")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tunlangmodel_test1_10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Arbi-Houssem/TunLangModel_test1.10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-tunlangmodel_test1_10_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-21-tunlangmodel_test1_10_pipeline_ar.md new file mode 100644 index 00000000000000..7182c1766d1530 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-tunlangmodel_test1_10_pipeline_ar.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Arabic tunlangmodel_test1_10_pipeline pipeline WhisperForCTC from Arbi-Houssem +author: John Snow Labs +name: tunlangmodel_test1_10_pipeline +date: 2024-09-21 +tags: [ar, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tunlangmodel_test1_10_pipeline` is a Arabic model originally trained by Arbi-Houssem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tunlangmodel_test1_10_pipeline_ar_5.5.0_3.0_1726950365600.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tunlangmodel_test1_10_pipeline_ar_5.5.0_3.0_1726950365600.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is a Spark DataFrame holding the input; for this pipeline it is
# typically an "audio_content" column with the raw audio samples to transcribe.
pipeline = PretrainedPipeline("tunlangmodel_test1_10_pipeline", lang = "ar")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is a Spark DataFrame holding the input; for this pipeline it is
// typically an "audio_content" column with the raw audio samples to transcribe.
val pipeline = new PretrainedPipeline("tunlangmodel_test1_10_pipeline", lang = "ar")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tunlangmodel_test1_10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Arbi-Houssem/TunLangModel_test1.10 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-turkish_lyric_tonga_tonga_islands_genre_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-21-turkish_lyric_tonga_tonga_islands_genre_pipeline_tr.md new file mode 100644 index 00000000000000..5fa9bc2feddb1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-turkish_lyric_tonga_tonga_islands_genre_pipeline_tr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Turkish turkish_lyric_tonga_tonga_islands_genre_pipeline pipeline BertForSequenceClassification from Veucci +author: John Snow Labs +name: turkish_lyric_tonga_tonga_islands_genre_pipeline +date: 2024-09-21 +tags: [tr, open_source, pipeline, onnx] +task: Text Classification +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`turkish_lyric_tonga_tonga_islands_genre_pipeline` is a Turkish model originally trained by Veucci. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/turkish_lyric_tonga_tonga_islands_genre_pipeline_tr_5.5.0_3.0_1726902765867.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/turkish_lyric_tonga_tonga_islands_genre_pipeline_tr_5.5.0_3.0_1726902765867.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is a Spark DataFrame holding the input; for this pipeline it is
# typically a single "text" column with the documents to classify.
pipeline = PretrainedPipeline("turkish_lyric_tonga_tonga_islands_genre_pipeline", lang = "tr")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is a Spark DataFrame holding the input; for this pipeline it is
// typically a single "text" column with the documents to classify.
val pipeline = new PretrainedPipeline("turkish_lyric_tonga_tonga_islands_genre_pipeline", lang = "tr")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|turkish_lyric_tonga_tonga_islands_genre_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Veucci/turkish-lyric-to-genre + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-twitter_roberta_base_dec2020_tweet_topic_single_all_en.md b/docs/_posts/ahmedlone127/2024-09-21-twitter_roberta_base_dec2020_tweet_topic_single_all_en.md new file mode 100644 index 00000000000000..89335a9bb3c196 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-twitter_roberta_base_dec2020_tweet_topic_single_all_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twitter_roberta_base_dec2020_tweet_topic_single_all RoBertaForSequenceClassification from cardiffnlp +author: John Snow Labs +name: twitter_roberta_base_dec2020_tweet_topic_single_all +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_dec2020_tweet_topic_single_all` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_dec2020_tweet_topic_single_all_en_5.5.0_3.0_1726900487324.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_dec2020_tweet_topic_single_all_en_5.5.0_3.0_1726900487324.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("twitter_roberta_base_dec2020_tweet_topic_single_all","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("twitter_roberta_base_dec2020_tweet_topic_single_all", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_dec2020_tweet_topic_single_all| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/cardiffnlp/twitter-roberta-base-dec2020-tweet-topic-single-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-twitter_roberta_base_dec2020_tweet_topic_single_all_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-twitter_roberta_base_dec2020_tweet_topic_single_all_pipeline_en.md new file mode 100644 index 00000000000000..047fbc6fedd08d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-twitter_roberta_base_dec2020_tweet_topic_single_all_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitter_roberta_base_dec2020_tweet_topic_single_all_pipeline pipeline RoBertaForSequenceClassification from cardiffnlp +author: John Snow Labs +name: twitter_roberta_base_dec2020_tweet_topic_single_all_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_dec2020_tweet_topic_single_all_pipeline` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_dec2020_tweet_topic_single_all_pipeline_en_5.5.0_3.0_1726900508641.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_dec2020_tweet_topic_single_all_pipeline_en_5.5.0_3.0_1726900508641.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is a Spark DataFrame holding the input; for this pipeline it is
# typically a single "text" column with the documents to classify.
pipeline = PretrainedPipeline("twitter_roberta_base_dec2020_tweet_topic_single_all_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is a Spark DataFrame holding the input; for this pipeline it is
// typically a single "text" column with the documents to classify.
val pipeline = new PretrainedPipeline("twitter_roberta_base_dec2020_tweet_topic_single_all_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_dec2020_tweet_topic_single_all_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/cardiffnlp/twitter-roberta-base-dec2020-tweet-topic-single-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-twitter_roberta_base_ner7_latest_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-21-twitter_roberta_base_ner7_latest_finetuned_en.md new file mode 100644 index 00000000000000..dcff94056ec99e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-twitter_roberta_base_ner7_latest_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twitter_roberta_base_ner7_latest_finetuned RoBertaForTokenClassification from alban12 +author: John Snow Labs +name: twitter_roberta_base_ner7_latest_finetuned +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_ner7_latest_finetuned` is a English model originally trained by alban12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_ner7_latest_finetuned_en_5.5.0_3.0_1726887426399.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_ner7_latest_finetuned_en_5.5.0_3.0_1726887426399.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("twitter_roberta_base_ner7_latest_finetuned","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("twitter_roberta_base_ner7_latest_finetuned", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
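The token classifier emits one tag per token. To group those tags into entity chunks, a `NerConverter` stage can be appended to the pipeline from the example above; this is a standard Spark NLP component shown as an optional extension, not part of the original card.

```python
from sparknlp.annotator import NerConverter

# Optional extra stage: merge token-level NER tags into entity chunks.
nerConverter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("entities")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, nerConverter])
pipelineModel = pipeline.fit(data)
pipelineModel.transform(data).select("entities.result").show(truncate=False)
```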
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_ner7_latest_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|454.6 MB| + +## References + +https://huggingface.co/alban12/twitter-roberta-base-ner7-latest-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-twitter_roberta_base_ner7_latest_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-twitter_roberta_base_ner7_latest_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..fead4aa70c77d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-twitter_roberta_base_ner7_latest_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitter_roberta_base_ner7_latest_finetuned_pipeline pipeline RoBertaForTokenClassification from alban12 +author: John Snow Labs +name: twitter_roberta_base_ner7_latest_finetuned_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_ner7_latest_finetuned_pipeline` is a English model originally trained by alban12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_ner7_latest_finetuned_pipeline_en_5.5.0_3.0_1726887452783.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_ner7_latest_finetuned_pipeline_en_5.5.0_3.0_1726887452783.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("twitter_roberta_base_ner7_latest_finetuned_pipeline", lang = "en")
+# `df` is assumed to be a Spark DataFrame with a "text" column to annotate
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("twitter_roberta_base_ner7_latest_finetuned_pipeline", lang = "en")
+// `df` is assumed to be a Spark DataFrame with a "text" column to annotate
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_ner7_latest_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|454.6 MB| + +## References + +https://huggingface.co/alban12/twitter-roberta-base-ner7-latest-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-twitterfin_padding100model_en.md b/docs/_posts/ahmedlone127/2024-09-21-twitterfin_padding100model_en.md new file mode 100644 index 00000000000000..6ad8fcc54791e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-twitterfin_padding100model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twitterfin_padding100model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: twitterfin_padding100model +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitterfin_padding100model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitterfin_padding100model_en_5.5.0_3.0_1726888870325.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitterfin_padding100model_en_5.5.0_3.0_1726888870325.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("twitterfin_padding100model","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("twitterfin_padding100model", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
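+
+A quick way to read off the predicted label from the example above; the column names follow the pipeline defined in this card (`class` holds the classifier output):
+
+```python
+# One predicted label per input document
+pipelineDF.select("text", "class.result").show(truncate = False)
+```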
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitterfin_padding100model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/twitterfin_padding100model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-twitterfin_padding100model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-twitterfin_padding100model_pipeline_en.md new file mode 100644 index 00000000000000..b7b7bbadf21d00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-twitterfin_padding100model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitterfin_padding100model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: twitterfin_padding100model_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitterfin_padding100model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitterfin_padding100model_pipeline_en_5.5.0_3.0_1726888882316.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitterfin_padding100model_pipeline_en_5.5.0_3.0_1726888882316.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("twitterfin_padding100model_pipeline", lang = "en")
+# `df` is assumed to be a Spark DataFrame with a "text" column to annotate
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("twitterfin_padding100model_pipeline", lang = "en")
+// `df` is assumed to be a Spark DataFrame with a "text" column to annotate
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitterfin_padding100model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/twitterfin_padding100model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-unbias_roberta_ner_en.md b/docs/_posts/ahmedlone127/2024-09-21-unbias_roberta_ner_en.md new file mode 100644 index 00000000000000..ed756824aa5452 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-unbias_roberta_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English unbias_roberta_ner RoBertaForTokenClassification from newsmediabias +author: John Snow Labs +name: unbias_roberta_ner +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`unbias_roberta_ner` is a English model originally trained by newsmediabias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/unbias_roberta_ner_en_5.5.0_3.0_1726927084819.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/unbias_roberta_ner_en_5.5.0_3.0_1726927084819.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("unbias_roberta_ner","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("unbias_roberta_ner", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|unbias_roberta_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/newsmediabias/UnBIAS-RoBERTa-NER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-unbias_roberta_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-unbias_roberta_ner_pipeline_en.md new file mode 100644 index 00000000000000..3af788b2dd7667 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-unbias_roberta_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English unbias_roberta_ner_pipeline pipeline RoBertaForTokenClassification from newsmediabias +author: John Snow Labs +name: unbias_roberta_ner_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`unbias_roberta_ner_pipeline` is a English model originally trained by newsmediabias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/unbias_roberta_ner_pipeline_en_5.5.0_3.0_1726927153254.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/unbias_roberta_ner_pipeline_en_5.5.0_3.0_1726927153254.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("unbias_roberta_ner_pipeline", lang = "en")
+# `df` is assumed to be a Spark DataFrame with a "text" column to annotate
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("unbias_roberta_ner_pipeline", lang = "en")
+// `df` is assumed to be a Spark DataFrame with a "text" column to annotate
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|unbias_roberta_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/newsmediabias/UnBIAS-RoBERTa-NER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-unibert_roberta_2_en.md b/docs/_posts/ahmedlone127/2024-09-21-unibert_roberta_2_en.md new file mode 100644 index 00000000000000..2a1faf0ec10e40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-unibert_roberta_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English unibert_roberta_2 RoBertaForTokenClassification from dbala02 +author: John Snow Labs +name: unibert_roberta_2 +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`unibert_roberta_2` is a English model originally trained by dbala02. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/unibert_roberta_2_en_5.5.0_3.0_1726926893606.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/unibert_roberta_2_en_5.5.0_3.0_1726926893606.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("unibert_roberta_2","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("unibert_roberta_2", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|unibert_roberta_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|422.7 MB| + +## References + +https://huggingface.co/dbala02/uniBERT.RoBERTa.2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-unibert_roberta_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-unibert_roberta_2_pipeline_en.md new file mode 100644 index 00000000000000..52bf89b02ebdd7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-unibert_roberta_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English unibert_roberta_2_pipeline pipeline RoBertaForTokenClassification from dbala02 +author: John Snow Labs +name: unibert_roberta_2_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`unibert_roberta_2_pipeline` is a English model originally trained by dbala02. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/unibert_roberta_2_pipeline_en_5.5.0_3.0_1726926926026.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/unibert_roberta_2_pipeline_en_5.5.0_3.0_1726926926026.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("unibert_roberta_2_pipeline", lang = "en")
+# `df` is assumed to be a Spark DataFrame with a "text" column to annotate
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("unibert_roberta_2_pipeline", lang = "en")
+// `df` is assumed to be a Spark DataFrame with a "text" column to annotate
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|unibert_roberta_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|422.8 MB| + +## References + +https://huggingface.co/dbala02/uniBERT.RoBERTa.2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_12_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_12_en.md new file mode 100644 index 00000000000000..cf55b1768cff48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_12_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_12 WhisperForCTC from namkyeong +author: John Snow Labs +name: whisper_12 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_12` is a English model originally trained by namkyeong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_12_en_5.5.0_3.0_1726961916608.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_12_en_5.5.0_3.0_1726961916608.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_12","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# `data` is assumed to be a DataFrame of pre-processed audio floats with an "audio_content" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_12", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// `data` is assumed to be a DataFrame of pre-processed audio floats with an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
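+
+A sketch of preparing the `data` DataFrame referenced above; the parquet path is hypothetical, and any DataFrame whose "audio_content" column holds arrays of audio floats (16 kHz mono is the usual Whisper input) should work:
+
+```python
+# Hypothetical source of pre-processed audio floats; must contain an "audio_content" column
+data = spark.read.parquet("/path/to/audio_floats.parquet")
+
+result = pipeline.fit(data).transform(data)
+result.select("text.result").show(truncate = False)  # transcriptions
+```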
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_12| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/namkyeong/whisper_12 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_12_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_12_pipeline_en.md new file mode 100644 index 00000000000000..d0801ef107bd43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_12_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_12_pipeline pipeline WhisperForCTC from namkyeong +author: John Snow Labs +name: whisper_12_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_12_pipeline` is a English model originally trained by namkyeong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_12_pipeline_en_5.5.0_3.0_1726961998483.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_12_pipeline_en_5.5.0_3.0_1726961998483.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("whisper_12_pipeline", lang = "en")
+# `df` is assumed to be a Spark DataFrame of pre-processed audio floats with an "audio_content" column
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("whisper_12_pipeline", lang = "en")
+// `df` is assumed to be a Spark DataFrame of pre-processed audio floats with an "audio_content" column
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_12_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/namkyeong/whisper_12 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_3_namkyeong_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_3_namkyeong_en.md new file mode 100644 index 00000000000000..65198a3fb73612 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_3_namkyeong_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_3_namkyeong WhisperForCTC from namkyeong +author: John Snow Labs +name: whisper_3_namkyeong +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_3_namkyeong` is a English model originally trained by namkyeong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_3_namkyeong_en_5.5.0_3.0_1726935611765.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_3_namkyeong_en_5.5.0_3.0_1726935611765.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_3_namkyeong","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# `data` is assumed to be a DataFrame of pre-processed audio floats with an "audio_content" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_3_namkyeong", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// `data` is assumed to be a DataFrame of pre-processed audio floats with an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_3_namkyeong| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|642.3 MB| + +## References + +https://huggingface.co/namkyeong/whisper_3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_3_namkyeong_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_3_namkyeong_pipeline_en.md new file mode 100644 index 00000000000000..76e5693c44d280 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_3_namkyeong_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_3_namkyeong_pipeline pipeline WhisperForCTC from namkyeong +author: John Snow Labs +name: whisper_3_namkyeong_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_3_namkyeong_pipeline` is a English model originally trained by namkyeong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_3_namkyeong_pipeline_en_5.5.0_3.0_1726935642399.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_3_namkyeong_pipeline_en_5.5.0_3.0_1726935642399.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("whisper_3_namkyeong_pipeline", lang = "en")
+# `df` is assumed to be a Spark DataFrame of pre-processed audio floats with an "audio_content" column
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("whisper_3_namkyeong_pipeline", lang = "en")
+// `df` is assumed to be a Spark DataFrame of pre-processed audio floats with an "audio_content" column
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_3_namkyeong_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|642.4 MB| + +## References + +https://huggingface.co/namkyeong/whisper_3 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_4_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_4_en.md new file mode 100644 index 00000000000000..2c9d9be5fe9817 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_4_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_4 WhisperForCTC from namkyeong +author: John Snow Labs +name: whisper_4 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_4` is a English model originally trained by namkyeong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_4_en_5.5.0_3.0_1726908311102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_4_en_5.5.0_3.0_1726908311102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_4","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# `data` is assumed to be a DataFrame of pre-processed audio floats with an "audio_content" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_4", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// `data` is assumed to be a DataFrame of pre-processed audio floats with an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|642.6 MB| + +## References + +https://huggingface.co/namkyeong/whisper_4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_4_pipeline_en.md new file mode 100644 index 00000000000000..def1bd19b1e593 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_4_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_4_pipeline pipeline WhisperForCTC from namkyeong +author: John Snow Labs +name: whisper_4_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_4_pipeline` is a English model originally trained by namkyeong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_4_pipeline_en_5.5.0_3.0_1726908341717.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_4_pipeline_en_5.5.0_3.0_1726908341717.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("whisper_4_pipeline", lang = "en")
+# `df` is assumed to be a Spark DataFrame of pre-processed audio floats with an "audio_content" column
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("whisper_4_pipeline", lang = "en")
+// `df` is assumed to be a Spark DataFrame of pre-processed audio floats with an "audio_content" column
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|642.6 MB| + +## References + +https://huggingface.co/namkyeong/whisper_4 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_6e_4_clean_legion_fleurs_v2_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_6e_4_clean_legion_fleurs_v2_en.md new file mode 100644 index 00000000000000..900982c869c0ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_6e_4_clean_legion_fleurs_v2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_6e_4_clean_legion_fleurs_v2 WhisperForCTC from yusufagung29 +author: John Snow Labs +name: whisper_6e_4_clean_legion_fleurs_v2 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_6e_4_clean_legion_fleurs_v2` is a English model originally trained by yusufagung29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_6e_4_clean_legion_fleurs_v2_en_5.5.0_3.0_1726906003719.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_6e_4_clean_legion_fleurs_v2_en_5.5.0_3.0_1726906003719.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_6e_4_clean_legion_fleurs_v2","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# `data` is assumed to be a DataFrame of pre-processed audio floats with an "audio_content" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_6e_4_clean_legion_fleurs_v2", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// `data` is assumed to be a DataFrame of pre-processed audio floats with an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_6e_4_clean_legion_fleurs_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|387.3 MB| + +## References + +https://huggingface.co/yusufagung29/whisper_6e-4_clean_legion_fleurs_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_6e_4_clean_legion_fleurs_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_6e_4_clean_legion_fleurs_v2_pipeline_en.md new file mode 100644 index 00000000000000..e76c89a5fd950e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_6e_4_clean_legion_fleurs_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_6e_4_clean_legion_fleurs_v2_pipeline pipeline WhisperForCTC from yusufagung29 +author: John Snow Labs +name: whisper_6e_4_clean_legion_fleurs_v2_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_6e_4_clean_legion_fleurs_v2_pipeline` is a English model originally trained by yusufagung29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_6e_4_clean_legion_fleurs_v2_pipeline_en_5.5.0_3.0_1726906024535.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_6e_4_clean_legion_fleurs_v2_pipeline_en_5.5.0_3.0_1726906024535.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("whisper_6e_4_clean_legion_fleurs_v2_pipeline", lang = "en")
+# `df` is assumed to be a Spark DataFrame of pre-processed audio floats with an "audio_content" column
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("whisper_6e_4_clean_legion_fleurs_v2_pipeline", lang = "en")
+// `df` is assumed to be a Spark DataFrame of pre-processed audio floats with an "audio_content" column
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_6e_4_clean_legion_fleurs_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.3 MB| + +## References + +https://huggingface.co/yusufagung29/whisper_6e-4_clean_legion_fleurs_v2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_catalan_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_catalan_en.md new file mode 100644 index 00000000000000..059ff1711b80e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_catalan_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_catalan WhisperForCTC from softcatala +author: John Snow Labs +name: whisper_base_catalan +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_catalan` is a English model originally trained by softcatala. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_catalan_en_5.5.0_3.0_1726878181232.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_catalan_en_5.5.0_3.0_1726878181232.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_catalan","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# `data` is assumed to be a DataFrame of pre-processed audio floats with an "audio_content" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_catalan", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// `data` is assumed to be a DataFrame of pre-processed audio floats with an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_catalan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|641.5 MB| + +## References + +https://huggingface.co/softcatala/whisper-base-ca \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_catalan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_catalan_pipeline_en.md new file mode 100644 index 00000000000000..c6c42f1060ca0c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_catalan_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_catalan_pipeline pipeline WhisperForCTC from softcatala +author: John Snow Labs +name: whisper_base_catalan_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_catalan_pipeline` is a English model originally trained by softcatala. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_catalan_pipeline_en_5.5.0_3.0_1726878214512.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_catalan_pipeline_en_5.5.0_3.0_1726878214512.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("whisper_base_catalan_pipeline", lang = "en")
+# `df` is assumed to be a Spark DataFrame of pre-processed audio floats with an "audio_content" column
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("whisper_base_catalan_pipeline", lang = "en")
+// `df` is assumed to be a Spark DataFrame of pre-processed audio floats with an "audio_content" column
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_catalan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|641.5 MB| + +## References + +https://huggingface.co/softcatala/whisper-base-ca + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_chinese_cn_cv9_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_chinese_cn_cv9_en.md new file mode 100644 index 00000000000000..8fbb4005a958a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_chinese_cn_cv9_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_chinese_cn_cv9 WhisperForCTC from Hydrodynamical +author: John Snow Labs +name: whisper_base_chinese_cn_cv9 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_chinese_cn_cv9` is a English model originally trained by Hydrodynamical. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_chinese_cn_cv9_en_5.5.0_3.0_1726891248883.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_chinese_cn_cv9_en_5.5.0_3.0_1726891248883.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_chinese_cn_cv9","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# `data` is assumed to be a DataFrame of pre-processed audio floats with an "audio_content" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_chinese_cn_cv9", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// `data` is assumed to be a DataFrame of pre-processed audio floats with an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_chinese_cn_cv9| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|643.6 MB| + +## References + +https://huggingface.co/Hydrodynamical/whisper-base-zh-CN-cv9 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_chinese_cn_cv9_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_chinese_cn_cv9_pipeline_en.md new file mode 100644 index 00000000000000..28b13e46e8d7d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_chinese_cn_cv9_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_chinese_cn_cv9_pipeline pipeline WhisperForCTC from Hydrodynamical +author: John Snow Labs +name: whisper_base_chinese_cn_cv9_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_chinese_cn_cv9_pipeline` is a English model originally trained by Hydrodynamical. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_chinese_cn_cv9_pipeline_en_5.5.0_3.0_1726891280738.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_chinese_cn_cv9_pipeline_en_5.5.0_3.0_1726891280738.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("whisper_base_chinese_cn_cv9_pipeline", lang = "en")
+# `df` is assumed to be a Spark DataFrame of pre-processed audio floats with an "audio_content" column
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("whisper_base_chinese_cn_cv9_pipeline", lang = "en")
+// `df` is assumed to be a Spark DataFrame of pre-processed audio floats with an "audio_content" column
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_chinese_cn_cv9_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|643.6 MB| + +## References + +https://huggingface.co/Hydrodynamical/whisper-base-zh-CN-cv9 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_common_voice_16_portuguese_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_common_voice_16_portuguese_en.md new file mode 100644 index 00000000000000..55fc0129651d09 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_common_voice_16_portuguese_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_common_voice_16_portuguese WhisperForCTC from thiagobarbosa +author: John Snow Labs +name: whisper_base_common_voice_16_portuguese +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_common_voice_16_portuguese` is a English model originally trained by thiagobarbosa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_common_voice_16_portuguese_en_5.5.0_3.0_1726911263889.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_common_voice_16_portuguese_en_5.5.0_3.0_1726911263889.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_common_voice_16_portuguese","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# `data` is assumed to be a DataFrame of pre-processed audio floats with an "audio_content" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_common_voice_16_portuguese", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// `data` is assumed to be a DataFrame of pre-processed audio floats with an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_common_voice_16_portuguese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|641.9 MB| + +## References + +https://huggingface.co/thiagobarbosa/whisper-base-common-voice-16-pt \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_common_voice_16_portuguese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_common_voice_16_portuguese_pipeline_en.md new file mode 100644 index 00000000000000..4d4eddc42b3255 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_common_voice_16_portuguese_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_common_voice_16_portuguese_pipeline pipeline WhisperForCTC from thiagobarbosa +author: John Snow Labs +name: whisper_base_common_voice_16_portuguese_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_common_voice_16_portuguese_pipeline` is a English model originally trained by thiagobarbosa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_common_voice_16_portuguese_pipeline_en_5.5.0_3.0_1726911300347.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_common_voice_16_portuguese_pipeline_en_5.5.0_3.0_1726911300347.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("whisper_base_common_voice_16_portuguese_pipeline", lang = "en")
+# `df` is assumed to be a Spark DataFrame of pre-processed audio floats with an "audio_content" column
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("whisper_base_common_voice_16_portuguese_pipeline", lang = "en")
+// `df` is assumed to be a Spark DataFrame of pre-processed audio floats with an "audio_content" column
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_common_voice_16_portuguese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|641.9 MB| + +## References + +https://huggingface.co/thiagobarbosa/whisper-base-common-voice-16-pt + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_full_data_v2_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_full_data_v2_en.md new file mode 100644 index 00000000000000..4cbe2f36649fbf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_full_data_v2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_full_data_v2 WhisperForCTC from pphuc25 +author: John Snow Labs +name: whisper_base_full_data_v2 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_full_data_v2` is a English model originally trained by pphuc25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_full_data_v2_en_5.5.0_3.0_1726960098733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_full_data_v2_en_5.5.0_3.0_1726960098733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_full_data_v2","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# `data` is assumed to be a DataFrame of pre-processed audio floats with an "audio_content" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_full_data_v2", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// `data` is assumed to be a DataFrame of pre-processed audio floats with an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_full_data_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|642.1 MB| + +## References + +https://huggingface.co/pphuc25/whisper-base-full-data-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_full_data_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_full_data_v2_pipeline_en.md new file mode 100644 index 00000000000000..ef9d702a33cce4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_full_data_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_full_data_v2_pipeline pipeline WhisperForCTC from pphuc25 +author: John Snow Labs +name: whisper_base_full_data_v2_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_full_data_v2_pipeline` is a English model originally trained by pphuc25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_full_data_v2_pipeline_en_5.5.0_3.0_1726960129474.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_full_data_v2_pipeline_en_5.5.0_3.0_1726960129474.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+pipeline = PretrainedPipeline("whisper_base_full_data_v2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+val pipeline = new PretrainedPipeline("whisper_base_full_data_v2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
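+
+The pipeline expects a DataFrame with raw audio floats in an "audio_content" column. A minimal sketch of building such a `df`, assuming `raw_floats` is a Python list of 16 kHz mono samples (an illustrative name, not part of the original card):
+
+```python
+# Hypothetical input: wrap the raw float samples in the column the pipeline reads from
+df = spark.createDataFrame([[raw_floats]]).toDF("audio_content")
+```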
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_full_data_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|642.1 MB| + +## References + +https://huggingface.co/pphuc25/whisper-base-full-data-v2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_german_cv15_v1_de.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_german_cv15_v1_de.md new file mode 100644 index 00000000000000..11b0e71b3a4c82 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_german_cv15_v1_de.md @@ -0,0 +1,84 @@ +--- +layout: model +title: German whisper_base_german_cv15_v1 WhisperForCTC from flozi00 +author: John Snow Labs +name: whisper_base_german_cv15_v1 +date: 2024-09-21 +tags: [de, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_german_cv15_v1` is a German model originally trained by flozi00. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_german_cv15_v1_de_5.5.0_3.0_1726907361355.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_german_cv15_v1_de_5.5.0_3.0_1726907361355.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_german_cv15_v1", "de") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_german_cv15_v1", "de")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_german_cv15_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|de| +|Size:|398.3 MB| + +## References + +https://huggingface.co/flozi00/whisper-base-german-cv15-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_german_cv15_v1_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_german_cv15_v1_pipeline_de.md new file mode 100644 index 00000000000000..8ca58221660be9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_german_cv15_v1_pipeline_de.md @@ -0,0 +1,69 @@ +--- +layout: model +title: German whisper_base_german_cv15_v1_pipeline pipeline WhisperForCTC from flozi00 +author: John Snow Labs +name: whisper_base_german_cv15_v1_pipeline +date: 2024-09-21 +tags: [de, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_german_cv15_v1_pipeline` is a German model originally trained by flozi00. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_german_cv15_v1_pipeline_de_5.5.0_3.0_1726907468424.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_german_cv15_v1_pipeline_de_5.5.0_3.0_1726907468424.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+pipeline = PretrainedPipeline("whisper_base_german_cv15_v1_pipeline", lang = "de")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+val pipeline = new PretrainedPipeline("whisper_base_german_cv15_v1_pipeline", lang = "de")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_german_cv15_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|398.4 MB| + +## References + +https://huggingface.co/flozi00/whisper-base-german-cv15-v1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_hungarian_cleaned_hu.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_hungarian_cleaned_hu.md new file mode 100644 index 00000000000000..dd8c572869f039 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_hungarian_cleaned_hu.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hungarian whisper_base_hungarian_cleaned WhisperForCTC from Hungarians +author: John Snow Labs +name: whisper_base_hungarian_cleaned +date: 2024-09-21 +tags: [hu, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_hungarian_cleaned` is a Hungarian model originally trained by Hungarians. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_hungarian_cleaned_hu_5.5.0_3.0_1726893011171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_hungarian_cleaned_hu_5.5.0_3.0_1726893011171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_hungarian_cleaned", "hu") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_hungarian_cleaned", "hu")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_hungarian_cleaned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hu| +|Size:|640.6 MB| + +## References + +https://huggingface.co/Hungarians/whisper-base-hu-cleaned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_hungarian_cleaned_pipeline_hu.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_hungarian_cleaned_pipeline_hu.md new file mode 100644 index 00000000000000..89c7927107605c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_hungarian_cleaned_pipeline_hu.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hungarian whisper_base_hungarian_cleaned_pipeline pipeline WhisperForCTC from Hungarians +author: John Snow Labs +name: whisper_base_hungarian_cleaned_pipeline +date: 2024-09-21 +tags: [hu, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_hungarian_cleaned_pipeline` is a Hungarian model originally trained by Hungarians. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_hungarian_cleaned_pipeline_hu_5.5.0_3.0_1726893043743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_hungarian_cleaned_pipeline_hu_5.5.0_3.0_1726893043743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+pipeline = PretrainedPipeline("whisper_base_hungarian_cleaned_pipeline", lang = "hu")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+val pipeline = new PretrainedPipeline("whisper_base_hungarian_cleaned_pipeline", lang = "hu")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_hungarian_cleaned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hu| +|Size:|640.6 MB| + +## References + +https://huggingface.co/Hungarians/whisper-base-hu-cleaned + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_korean_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_korean_en.md new file mode 100644 index 00000000000000..b520141cd69933 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_korean_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_korean WhisperForCTC from SungBeom +author: John Snow Labs +name: whisper_base_korean +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_korean` is a English model originally trained by SungBeom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_korean_en_5.5.0_3.0_1726894831556.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_korean_en_5.5.0_3.0_1726894831556.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_korean", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_korean", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_korean| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|642.2 MB| + +## References + +https://huggingface.co/SungBeom/whisper-base-ko \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_korean_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_korean_pipeline_en.md new file mode 100644 index 00000000000000..fdd522a9a2c1fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_korean_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_korean_pipeline pipeline WhisperForCTC from SungBeom +author: John Snow Labs +name: whisper_base_korean_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_korean_pipeline` is a English model originally trained by SungBeom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_korean_pipeline_en_5.5.0_3.0_1726894864246.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_korean_pipeline_en_5.5.0_3.0_1726894864246.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+pipeline = PretrainedPipeline("whisper_base_korean_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+val pipeline = new PretrainedPipeline("whisper_base_korean_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_korean_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|642.2 MB| + +## References + +https://huggingface.co/SungBeom/whisper-base-ko + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_lyric_transcription_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_lyric_transcription_en.md new file mode 100644 index 00000000000000..07195a530713c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_lyric_transcription_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_lyric_transcription WhisperForCTC from peterjwms +author: John Snow Labs +name: whisper_base_lyric_transcription +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_lyric_transcription` is a English model originally trained by peterjwms. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_lyric_transcription_en_5.5.0_3.0_1726893745216.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_lyric_transcription_en_5.5.0_3.0_1726893745216.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_lyric_transcription", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_lyric_transcription", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_lyric_transcription| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|646.7 MB| + +## References + +https://huggingface.co/peterjwms/whisper-base-lyric-transcription \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_lyric_transcription_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_lyric_transcription_pipeline_en.md new file mode 100644 index 00000000000000..f9e539105d552c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_lyric_transcription_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_lyric_transcription_pipeline pipeline WhisperForCTC from peterjwms +author: John Snow Labs +name: whisper_base_lyric_transcription_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_lyric_transcription_pipeline` is a English model originally trained by peterjwms. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_lyric_transcription_pipeline_en_5.5.0_3.0_1726893779769.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_lyric_transcription_pipeline_en_5.5.0_3.0_1726893779769.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+pipeline = PretrainedPipeline("whisper_base_lyric_transcription_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+val pipeline = new PretrainedPipeline("whisper_base_lyric_transcription_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_lyric_transcription_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|646.7 MB| + +## References + +https://huggingface.co/peterjwms/whisper-base-lyric-transcription + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_malayalam_redw0rm_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_malayalam_redw0rm_en.md new file mode 100644 index 00000000000000..cb2d1f048df103 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_malayalam_redw0rm_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_malayalam_redw0rm WhisperForCTC from redw0rm +author: John Snow Labs +name: whisper_base_malayalam_redw0rm +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_malayalam_redw0rm` is a English model originally trained by redw0rm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_malayalam_redw0rm_en_5.5.0_3.0_1726950627435.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_malayalam_redw0rm_en_5.5.0_3.0_1726950627435.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_malayalam_redw0rm", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_malayalam_redw0rm", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_malayalam_redw0rm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|644.1 MB| + +## References + +https://huggingface.co/redw0rm/whisper-base-ml \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_superb_3_epochs_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_superb_3_epochs_en.md new file mode 100644 index 00000000000000..4deedd28ddbb3e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_superb_3_epochs_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_superb_3_epochs WhisperForCTC from deepnet +author: John Snow Labs +name: whisper_base_superb_3_epochs +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_superb_3_epochs` is a English model originally trained by deepnet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_superb_3_epochs_en_5.5.0_3.0_1726910023443.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_superb_3_epochs_en_5.5.0_3.0_1726910023443.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_superb_3_epochs", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_superb_3_epochs", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_superb_3_epochs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|643.8 MB| + +## References + +https://huggingface.co/deepnet/whisper-base-Superb-3-Epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_superb_3_epochs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_superb_3_epochs_pipeline_en.md new file mode 100644 index 00000000000000..c615019e5e7bc9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_superb_3_epochs_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_superb_3_epochs_pipeline pipeline WhisperForCTC from deepnet +author: John Snow Labs +name: whisper_base_superb_3_epochs_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_superb_3_epochs_pipeline` is a English model originally trained by deepnet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_superb_3_epochs_pipeline_en_5.5.0_3.0_1726910057441.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_superb_3_epochs_pipeline_en_5.5.0_3.0_1726910057441.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+pipeline = PretrainedPipeline("whisper_base_superb_3_epochs_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+val pipeline = new PretrainedPipeline("whisper_base_superb_3_epochs_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_superb_3_epochs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|643.8 MB| + +## References + +https://huggingface.co/deepnet/whisper-base-Superb-3-Epochs + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_tagalog_1_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_tagalog_1_en.md new file mode 100644 index 00000000000000..2160aa58e14239 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_tagalog_1_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_tagalog_1 WhisperForCTC from arun100 +author: John Snow Labs +name: whisper_base_tagalog_1 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_tagalog_1` is a English model originally trained by arun100. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_tagalog_1_en_5.5.0_3.0_1726909327492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_tagalog_1_en_5.5.0_3.0_1726909327492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_tagalog_1", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_tagalog_1", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_tagalog_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|642.4 MB| + +## References + +https://huggingface.co/arun100/whisper-base-tl-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_tagalog_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_tagalog_1_pipeline_en.md new file mode 100644 index 00000000000000..4145bae0070b49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_tagalog_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_tagalog_1_pipeline pipeline WhisperForCTC from arun100 +author: John Snow Labs +name: whisper_base_tagalog_1_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_tagalog_1_pipeline` is a English model originally trained by arun100. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_tagalog_1_pipeline_en_5.5.0_3.0_1726909360036.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_tagalog_1_pipeline_en_5.5.0_3.0_1726909360036.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+pipeline = PretrainedPipeline("whisper_base_tagalog_1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+val pipeline = new PretrainedPipeline("whisper_base_tagalog_1_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_tagalog_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|642.4 MB| + +## References + +https://huggingface.co/arun100/whisper-base-tl-1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_calls_small_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_calls_small_en.md new file mode 100644 index 00000000000000..47361bc43851bf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_calls_small_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_calls_small WhisperForCTC from SteffenSeiffarth +author: John Snow Labs +name: whisper_calls_small +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_calls_small` is a English model originally trained by SteffenSeiffarth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_calls_small_en_5.5.0_3.0_1726908549156.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_calls_small_en_5.5.0_3.0_1726908549156.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_calls_small", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_calls_small", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_calls_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/SteffenSeiffarth/whisper-calls-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_calls_small_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_calls_small_pipeline_en.md new file mode 100644 index 00000000000000..858dcefbb7236c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_calls_small_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_calls_small_pipeline pipeline WhisperForCTC from SteffenSeiffarth +author: John Snow Labs +name: whisper_calls_small_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_calls_small_pipeline` is a English model originally trained by SteffenSeiffarth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_calls_small_pipeline_en_5.5.0_3.0_1726908632304.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_calls_small_pipeline_en_5.5.0_3.0_1726908632304.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+pipeline = PretrainedPipeline("whisper_calls_small_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+val pipeline = new PretrainedPipeline("whisper_calls_small_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_calls_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/SteffenSeiffarth/whisper-calls-small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_clean_3_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_clean_3_en.md new file mode 100644 index 00000000000000..7108b970b0fe60 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_clean_3_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_clean_3 WhisperForCTC from lyhourt +author: John Snow Labs +name: whisper_clean_3 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_clean_3` is a English model originally trained by lyhourt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_clean_3_en_5.5.0_3.0_1726960961311.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_clean_3_en_5.5.0_3.0_1726960961311.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_clean_3", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_clean_3", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_clean_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/lyhourt/whisper-clean_3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_clean_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_clean_3_pipeline_en.md new file mode 100644 index 00000000000000..d9315a0ba220d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_clean_3_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_clean_3_pipeline pipeline WhisperForCTC from lyhourt +author: John Snow Labs +name: whisper_clean_3_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_clean_3_pipeline` is a English model originally trained by lyhourt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_clean_3_pipeline_en_5.5.0_3.0_1726961043791.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_clean_3_pipeline_en_5.5.0_3.0_1726961043791.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+pipeline = PretrainedPipeline("whisper_clean_3_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+val pipeline = new PretrainedPipeline("whisper_clean_3_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_clean_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/lyhourt/whisper-clean_3 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_common_voice_small_english_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_common_voice_small_english_en.md new file mode 100644 index 00000000000000..f4fd81fbb5955d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_common_voice_small_english_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_common_voice_small_english WhisperForCTC from pnandhini +author: John Snow Labs +name: whisper_common_voice_small_english +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_common_voice_small_english` is a English model originally trained by pnandhini. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_common_voice_small_english_en_5.5.0_3.0_1726950968572.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_common_voice_small_english_en_5.5.0_3.0_1726950968572.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_common_voice_small_english", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_common_voice_small_english", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_common_voice_small_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/pnandhini/whisper_common_voice_small_en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_common_voice_small_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_common_voice_small_english_pipeline_en.md new file mode 100644 index 00000000000000..518ebb0f46b6c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_common_voice_small_english_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_common_voice_small_english_pipeline pipeline WhisperForCTC from pnandhini +author: John Snow Labs +name: whisper_common_voice_small_english_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_common_voice_small_english_pipeline` is a English model originally trained by pnandhini. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_common_voice_small_english_pipeline_en_5.5.0_3.0_1726951052089.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_common_voice_small_english_pipeline_en_5.5.0_3.0_1726951052089.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+pipeline = PretrainedPipeline("whisper_common_voice_small_english_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+val pipeline = new PretrainedPipeline("whisper_common_voice_small_english_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_common_voice_small_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/pnandhini/whisper_common_voice_small_en + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_dpv_finetuned_with_augmentation_lower_lr_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_dpv_finetuned_with_augmentation_lower_lr_en.md new file mode 100644 index 00000000000000..7814cadbca115a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_dpv_finetuned_with_augmentation_lower_lr_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_dpv_finetuned_with_augmentation_lower_lr WhisperForCTC from aherzberg +author: John Snow Labs +name: whisper_dpv_finetuned_with_augmentation_lower_lr +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_dpv_finetuned_with_augmentation_lower_lr` is a English model originally trained by aherzberg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_dpv_finetuned_with_augmentation_lower_lr_en_5.5.0_3.0_1726910687315.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_dpv_finetuned_with_augmentation_lower_lr_en_5.5.0_3.0_1726910687315.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_dpv_finetuned_with_augmentation_lower_lr", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_dpv_finetuned_with_augmentation_lower_lr", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_dpv_finetuned_with_augmentation_lower_lr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/aherzberg/whisper-dpv-finetuned-WITH-AUGMENTATION-LOWER-LR \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_fine_tuned_base_company_earnings_call_v0_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_fine_tuned_base_company_earnings_call_v0_en.md new file mode 100644 index 00000000000000..fb8cca3d5dbae1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_fine_tuned_base_company_earnings_call_v0_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_fine_tuned_base_company_earnings_call_v0 WhisperForCTC from MasatoShima1618 +author: John Snow Labs +name: whisper_fine_tuned_base_company_earnings_call_v0 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_fine_tuned_base_company_earnings_call_v0` is a English model originally trained by MasatoShima1618. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_fine_tuned_base_company_earnings_call_v0_en_5.5.0_3.0_1726962066021.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_fine_tuned_base_company_earnings_call_v0_en_5.5.0_3.0_1726962066021.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_fine_tuned_base_company_earnings_call_v0", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// `data` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_fine_tuned_base_company_earnings_call_v0", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_fine_tuned_base_company_earnings_call_v0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|642.5 MB| + +## References + +https://huggingface.co/MasatoShima1618/Whisper-fine-tuned-base-company-earnings-call-v0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_fine_tuned_base_company_earnings_call_v0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_fine_tuned_base_company_earnings_call_v0_pipeline_en.md new file mode 100644 index 00000000000000..cefeb8d192dd45 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_fine_tuned_base_company_earnings_call_v0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_fine_tuned_base_company_earnings_call_v0_pipeline pipeline WhisperForCTC from MasatoShima1618 +author: John Snow Labs +name: whisper_fine_tuned_base_company_earnings_call_v0_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_fine_tuned_base_company_earnings_call_v0_pipeline` is a English model originally trained by MasatoShima1618. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_fine_tuned_base_company_earnings_call_v0_pipeline_en_5.5.0_3.0_1726962096819.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_fine_tuned_base_company_earnings_call_v0_pipeline_en_5.5.0_3.0_1726962096819.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+pipeline = PretrainedPipeline("whisper_fine_tuned_base_company_earnings_call_v0_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is assumed to be a DataFrame holding raw audio floats in a column named "audio_content"
+val pipeline = new PretrainedPipeline("whisper_fine_tuned_base_company_earnings_call_v0_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_fine_tuned_base_company_earnings_call_v0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|642.5 MB| + +## References + +https://huggingface.co/MasatoShima1618/Whisper-fine-tuned-base-company-earnings-call-v0 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_finetuning_phoneme_transcription_g2p_large_dataset_space_seperated_phonemes_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_finetuning_phoneme_transcription_g2p_large_dataset_space_seperated_phonemes_en.md new file mode 100644 index 00000000000000..5bd8b835247dd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_finetuning_phoneme_transcription_g2p_large_dataset_space_seperated_phonemes_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_finetuning_phoneme_transcription_g2p_large_dataset_space_seperated_phonemes WhisperForCTC from dg96 +author: John Snow Labs +name: whisper_finetuning_phoneme_transcription_g2p_large_dataset_space_seperated_phonemes +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_finetuning_phoneme_transcription_g2p_large_dataset_space_seperated_phonemes` is a English model originally trained by dg96. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_finetuning_phoneme_transcription_g2p_large_dataset_space_seperated_phonemes_en_5.5.0_3.0_1726938683056.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_finetuning_phoneme_transcription_g2p_large_dataset_space_seperated_phonemes_en_5.5.0_3.0_1726938683056.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_finetuning_phoneme_transcription_g2p_large_dataset_space_seperated_phonemes","en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_finetuning_phoneme_transcription_g2p_large_dataset_space_seperated_phonemes", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
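
The snippets above leave `data` undefined. A minimal sketch of preparing it and reading the transcription is shown below; the JSON file name and its layout (one array of audio floats per record under `audio_content`) are assumptions for illustration, not part of the original card.

```python
# Build the input DataFrame from a JSON file of raw audio float arrays,
# then run the fitted pipeline and print the recognized text.
data = spark.read.option("inferSchema", "true") \
    .json("audio_float_arrays.json") \
    .select("audio_content")

result = pipelineModel.transform(data)
result.select("text.result").show(truncate=False)
```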
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_finetuning_phoneme_transcription_g2p_large_dataset_space_seperated_phonemes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|643.7 MB| + +## References + +https://huggingface.co/dg96/whisper-finetuning-phoneme-transcription-g2p-large-dataset-space-seperated-phonemes \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_finetuning_phoneme_transcription_g2p_large_dataset_space_seperated_phonemes_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_finetuning_phoneme_transcription_g2p_large_dataset_space_seperated_phonemes_pipeline_en.md new file mode 100644 index 00000000000000..c405c1a581ec6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_finetuning_phoneme_transcription_g2p_large_dataset_space_seperated_phonemes_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_finetuning_phoneme_transcription_g2p_large_dataset_space_seperated_phonemes_pipeline pipeline WhisperForCTC from dg96 +author: John Snow Labs +name: whisper_finetuning_phoneme_transcription_g2p_large_dataset_space_seperated_phonemes_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_finetuning_phoneme_transcription_g2p_large_dataset_space_seperated_phonemes_pipeline` is a English model originally trained by dg96. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_finetuning_phoneme_transcription_g2p_large_dataset_space_seperated_phonemes_pipeline_en_5.5.0_3.0_1726938713782.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_finetuning_phoneme_transcription_g2p_large_dataset_space_seperated_phonemes_pipeline_en_5.5.0_3.0_1726938713782.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame holding raw audio floats in an "audio_content" column.
pipeline = PretrainedPipeline("whisper_finetuning_phoneme_transcription_g2p_large_dataset_space_seperated_phonemes_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame holding raw audio floats in an "audio_content" column.
val pipeline = new PretrainedPipeline("whisper_finetuning_phoneme_transcription_g2p_large_dataset_space_seperated_phonemes_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_finetuning_phoneme_transcription_g2p_large_dataset_space_seperated_phonemes_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|643.7 MB| + +## References + +https://huggingface.co/dg96/whisper-finetuning-phoneme-transcription-g2p-large-dataset-space-seperated-phonemes + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_hungarian_small_augmented_hu.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_hungarian_small_augmented_hu.md new file mode 100644 index 00000000000000..92f86dc49f4630 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_hungarian_small_augmented_hu.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hungarian whisper_hungarian_small_augmented WhisperForCTC from ALM +author: John Snow Labs +name: whisper_hungarian_small_augmented +date: 2024-09-21 +tags: [hu, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_hungarian_small_augmented` is a Hungarian model originally trained by ALM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_hungarian_small_augmented_hu_5.5.0_3.0_1726891217968.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_hungarian_small_augmented_hu_5.5.0_3.0_1726891217968.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_hungarian_small_augmented","hu") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_hungarian_small_augmented", "hu")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
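
The example above assumes an active Spark session with Spark NLP on the classpath. A minimal way to start one from Python is sketched below; the GPU flag is optional and shown only as an illustration.

```python
import sparknlp

# Start (or attach to) a Spark session with Spark NLP loaded; pass gpu=True for GPU builds.
spark = sparknlp.start()
print("Spark NLP version:", sparknlp.version())
```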
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_hungarian_small_augmented| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hu| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ALM/whisper-hu-small-augmented \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_hungarian_small_augmented_pipeline_hu.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_hungarian_small_augmented_pipeline_hu.md new file mode 100644 index 00000000000000..fb698739cf0af1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_hungarian_small_augmented_pipeline_hu.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hungarian whisper_hungarian_small_augmented_pipeline pipeline WhisperForCTC from ALM +author: John Snow Labs +name: whisper_hungarian_small_augmented_pipeline +date: 2024-09-21 +tags: [hu, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_hungarian_small_augmented_pipeline` is a Hungarian model originally trained by ALM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_hungarian_small_augmented_pipeline_hu_5.5.0_3.0_1726891298005.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_hungarian_small_augmented_pipeline_hu_5.5.0_3.0_1726891298005.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame holding raw audio floats in an "audio_content" column.
pipeline = PretrainedPipeline("whisper_hungarian_small_augmented_pipeline", lang = "hu")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame holding raw audio floats in an "audio_content" column.
val pipeline = new PretrainedPipeline("whisper_hungarian_small_augmented_pipeline", lang = "hu")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_hungarian_small_augmented_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hu| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ALM/whisper-hu-small-augmented + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_md_greek_modern_intlv_xs_el.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_md_greek_modern_intlv_xs_el.md new file mode 100644 index 00000000000000..d8a02cbd7d409e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_md_greek_modern_intlv_xs_el.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Modern Greek (1453-) whisper_md_greek_modern_intlv_xs WhisperForCTC from farsipal +author: John Snow Labs +name: whisper_md_greek_modern_intlv_xs +date: 2024-09-21 +tags: [el, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: el +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_md_greek_modern_intlv_xs` is a Modern Greek (1453-) model originally trained by farsipal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_md_greek_modern_intlv_xs_el_5.5.0_3.0_1726962187330.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_md_greek_modern_intlv_xs_el_5.5.0_3.0_1726962187330.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_md_greek_modern_intlv_xs","el") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_md_greek_modern_intlv_xs", "el")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_md_greek_modern_intlv_xs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|el| +|Size:|4.8 GB| + +## References + +https://huggingface.co/farsipal/whisper-md-el-intlv-xs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_medium_english_tonga_tonga_islands_myst_pf_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_medium_english_tonga_tonga_islands_myst_pf_en.md new file mode 100644 index 00000000000000..568f02084a4ec4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_medium_english_tonga_tonga_islands_myst_pf_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_medium_english_tonga_tonga_islands_myst_pf WhisperForCTC from rishabhjain16 +author: John Snow Labs +name: whisper_medium_english_tonga_tonga_islands_myst_pf +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_english_tonga_tonga_islands_myst_pf` is a English model originally trained by rishabhjain16. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_english_tonga_tonga_islands_myst_pf_en_5.5.0_3.0_1726950255043.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_english_tonga_tonga_islands_myst_pf_en_5.5.0_3.0_1726950255043.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_medium_english_tonga_tonga_islands_myst_pf","en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_medium_english_tonga_tonga_islands_myst_pf", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_english_tonga_tonga_islands_myst_pf| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/rishabhjain16/whisper_medium_en_to_myst_pf \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_medium_hindi_shripadbhat_hi.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_medium_hindi_shripadbhat_hi.md new file mode 100644 index 00000000000000..f458862fc255cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_medium_hindi_shripadbhat_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_medium_hindi_shripadbhat WhisperForCTC from shripadbhat +author: John Snow Labs +name: whisper_medium_hindi_shripadbhat +date: 2024-09-21 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_hindi_shripadbhat` is a Hindi model originally trained by shripadbhat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_hindi_shripadbhat_hi_5.5.0_3.0_1726907822555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_hindi_shripadbhat_hi_5.5.0_3.0_1726907822555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_medium_hindi_shripadbhat","hi") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_medium_hindi_shripadbhat", "hi")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_hindi_shripadbhat| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|4.8 GB| + +## References + +https://huggingface.co/shripadbhat/whisper-medium-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_medium_hindi_shripadbhat_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_medium_hindi_shripadbhat_pipeline_hi.md new file mode 100644 index 00000000000000..4c4c557a160be7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_medium_hindi_shripadbhat_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_medium_hindi_shripadbhat_pipeline pipeline WhisperForCTC from shripadbhat +author: John Snow Labs +name: whisper_medium_hindi_shripadbhat_pipeline +date: 2024-09-21 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_hindi_shripadbhat_pipeline` is a Hindi model originally trained by shripadbhat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_hindi_shripadbhat_pipeline_hi_5.5.0_3.0_1726908022139.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_hindi_shripadbhat_pipeline_hi_5.5.0_3.0_1726908022139.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame holding raw audio floats in an "audio_content" column.
pipeline = PretrainedPipeline("whisper_medium_hindi_shripadbhat_pipeline", lang = "hi")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame holding raw audio floats in an "audio_content" column.
val pipeline = new PretrainedPipeline("whisper_medium_hindi_shripadbhat_pipeline", lang = "hi")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_hindi_shripadbhat_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|4.8 GB| + +## References + +https://huggingface.co/shripadbhat/whisper-medium-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_afrikaans_za_abhinay45_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_afrikaans_za_abhinay45_en.md new file mode 100644 index 00000000000000..a56fec95c734f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_afrikaans_za_abhinay45_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_afrikaans_za_abhinay45 WhisperForCTC from Abhinay45 +author: John Snow Labs +name: whisper_small_afrikaans_za_abhinay45 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_afrikaans_za_abhinay45` is a English model originally trained by Abhinay45. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_afrikaans_za_abhinay45_en_5.5.0_3.0_1726949271124.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_afrikaans_za_abhinay45_en_5.5.0_3.0_1726949271124.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_afrikaans_za_abhinay45","en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_afrikaans_za_abhinay45", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_afrikaans_za_abhinay45| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Abhinay45/whisper-small-af-ZA \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_afrikaans_za_abhinay45_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_afrikaans_za_abhinay45_pipeline_en.md new file mode 100644 index 00000000000000..94bb13e314310a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_afrikaans_za_abhinay45_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_afrikaans_za_abhinay45_pipeline pipeline WhisperForCTC from Abhinay45 +author: John Snow Labs +name: whisper_small_afrikaans_za_abhinay45_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_afrikaans_za_abhinay45_pipeline` is a English model originally trained by Abhinay45. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_afrikaans_za_abhinay45_pipeline_en_5.5.0_3.0_1726949358786.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_afrikaans_za_abhinay45_pipeline_en_5.5.0_3.0_1726949358786.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame holding raw audio floats in an "audio_content" column.
pipeline = PretrainedPipeline("whisper_small_afrikaans_za_abhinay45_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame holding raw audio floats in an "audio_content" column.
val pipeline = new PretrainedPipeline("whisper_small_afrikaans_za_abhinay45_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_afrikaans_za_abhinay45_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Abhinay45/whisper-small-af-ZA + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_arabic_huzaifatahir_ar.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_arabic_huzaifatahir_ar.md new file mode 100644 index 00000000000000..c5d6df6b3a748c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_arabic_huzaifatahir_ar.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Arabic whisper_small_arabic_huzaifatahir WhisperForCTC from Huzaifatahir +author: John Snow Labs +name: whisper_small_arabic_huzaifatahir +date: 2024-09-21 +tags: [ar, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_huzaifatahir` is a Arabic model originally trained by Huzaifatahir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_huzaifatahir_ar_5.5.0_3.0_1726893614773.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_huzaifatahir_ar_5.5.0_3.0_1726893614773.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_arabic_huzaifatahir","ar") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_arabic_huzaifatahir", "ar")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_huzaifatahir| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Huzaifatahir/whisper-small-ar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_arabic_huzaifatahir_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_arabic_huzaifatahir_pipeline_ar.md new file mode 100644 index 00000000000000..d5de735a37f70d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_arabic_huzaifatahir_pipeline_ar.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Arabic whisper_small_arabic_huzaifatahir_pipeline pipeline WhisperForCTC from Huzaifatahir +author: John Snow Labs +name: whisper_small_arabic_huzaifatahir_pipeline +date: 2024-09-21 +tags: [ar, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_huzaifatahir_pipeline` is a Arabic model originally trained by Huzaifatahir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_huzaifatahir_pipeline_ar_5.5.0_3.0_1726893699736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_huzaifatahir_pipeline_ar_5.5.0_3.0_1726893699736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame holding raw audio floats in an "audio_content" column.
pipeline = PretrainedPipeline("whisper_small_arabic_huzaifatahir_pipeline", lang = "ar")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame holding raw audio floats in an "audio_content" column.
val pipeline = new PretrainedPipeline("whisper_small_arabic_huzaifatahir_pipeline", lang = "ar")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_huzaifatahir_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Huzaifatahir/whisper-small-ar + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_arabic_test_draligomaa_dataset_ar.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_arabic_test_draligomaa_dataset_ar.md new file mode 100644 index 00000000000000..1e2f91cd5564fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_arabic_test_draligomaa_dataset_ar.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Arabic whisper_small_arabic_test_draligomaa_dataset WhisperForCTC from DrAliGomaa +author: John Snow Labs +name: whisper_small_arabic_test_draligomaa_dataset +date: 2024-09-21 +tags: [ar, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_test_draligomaa_dataset` is a Arabic model originally trained by DrAliGomaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_test_draligomaa_dataset_ar_5.5.0_3.0_1726936405461.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_test_draligomaa_dataset_ar_5.5.0_3.0_1726936405461.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_arabic_test_draligomaa_dataset","ar") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_arabic_test_draligomaa_dataset", "ar")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_test_draligomaa_dataset| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/DrAliGomaa/whisper-small-ar-test-draligomaa-dataset \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_arabic_test_draligomaa_dataset_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_arabic_test_draligomaa_dataset_pipeline_ar.md new file mode 100644 index 00000000000000..a7faf88f1707b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_arabic_test_draligomaa_dataset_pipeline_ar.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Arabic whisper_small_arabic_test_draligomaa_dataset_pipeline pipeline WhisperForCTC from DrAliGomaa +author: John Snow Labs +name: whisper_small_arabic_test_draligomaa_dataset_pipeline +date: 2024-09-21 +tags: [ar, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_test_draligomaa_dataset_pipeline` is a Arabic model originally trained by DrAliGomaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_test_draligomaa_dataset_pipeline_ar_5.5.0_3.0_1726936493009.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_test_draligomaa_dataset_pipeline_ar_5.5.0_3.0_1726936493009.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame holding raw audio floats in an "audio_content" column.
pipeline = PretrainedPipeline("whisper_small_arabic_test_draligomaa_dataset_pipeline", lang = "ar")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame holding raw audio floats in an "audio_content" column.
val pipeline = new PretrainedPipeline("whisper_small_arabic_test_draligomaa_dataset_pipeline", lang = "ar")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_test_draligomaa_dataset_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/DrAliGomaa/whisper-small-ar-test-draligomaa-dataset + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_atco2_asr_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_atco2_asr_en.md new file mode 100644 index 00000000000000..38e3f8d77042ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_atco2_asr_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_atco2_asr WhisperForCTC from jlvdoorn +author: John Snow Labs +name: whisper_small_atco2_asr +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_atco2_asr` is a English model originally trained by jlvdoorn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_atco2_asr_en_5.5.0_3.0_1726936074461.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_atco2_asr_en_5.5.0_3.0_1726936074461.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_atco2_asr","en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_atco2_asr", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_atco2_asr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/jlvdoorn/whisper-small-atco2-asr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_atco2_asr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_atco2_asr_pipeline_en.md new file mode 100644 index 00000000000000..5ab531a69cdc76 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_atco2_asr_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_atco2_asr_pipeline pipeline WhisperForCTC from jlvdoorn +author: John Snow Labs +name: whisper_small_atco2_asr_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_atco2_asr_pipeline` is a English model originally trained by jlvdoorn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_atco2_asr_pipeline_en_5.5.0_3.0_1726936153815.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_atco2_asr_pipeline_en_5.5.0_3.0_1726936153815.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame holding raw audio floats in an "audio_content" column.
pipeline = PretrainedPipeline("whisper_small_atco2_asr_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame holding raw audio floats in an "audio_content" column.
val pipeline = new PretrainedPipeline("whisper_small_atco2_asr_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_atco2_asr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/jlvdoorn/whisper-small-atco2-asr + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_bengali_our_dataset_grapheme_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_bengali_our_dataset_grapheme_en.md new file mode 100644 index 00000000000000..5528b3ec911e4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_bengali_our_dataset_grapheme_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_bengali_our_dataset_grapheme WhisperForCTC from AIFahim +author: John Snow Labs +name: whisper_small_bengali_our_dataset_grapheme +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_bengali_our_dataset_grapheme` is a English model originally trained by AIFahim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_bengali_our_dataset_grapheme_en_5.5.0_3.0_1726939473804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_bengali_our_dataset_grapheme_en_5.5.0_3.0_1726939473804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_bengali_our_dataset_grapheme","en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_bengali_our_dataset_grapheme", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_bengali_our_dataset_grapheme| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/AIFahim/whisper-small-bn_our_dataset_grapheme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_bengali_shamik_bn.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_bengali_shamik_bn.md new file mode 100644 index 00000000000000..30389fa33e77e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_bengali_shamik_bn.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Bengali whisper_small_bengali_shamik WhisperForCTC from Shamik +author: John Snow Labs +name: whisper_small_bengali_shamik +date: 2024-09-21 +tags: [bn, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_bengali_shamik` is a Bengali model originally trained by Shamik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_bengali_shamik_bn_5.5.0_3.0_1726905515902.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_bengali_shamik_bn_5.5.0_3.0_1726905515902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_bengali_shamik","bn") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_bengali_shamik", "bn")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_bengali_shamik| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|bn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Shamik/whisper-small-bn \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_bengali_shamik_pipeline_bn.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_bengali_shamik_pipeline_bn.md new file mode 100644 index 00000000000000..e73dc04e578579 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_bengali_shamik_pipeline_bn.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Bengali whisper_small_bengali_shamik_pipeline pipeline WhisperForCTC from Shamik +author: John Snow Labs +name: whisper_small_bengali_shamik_pipeline +date: 2024-09-21 +tags: [bn, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_bengali_shamik_pipeline` is a Bengali model originally trained by Shamik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_bengali_shamik_pipeline_bn_5.5.0_3.0_1726905601199.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_bengali_shamik_pipeline_bn_5.5.0_3.0_1726905601199.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame holding raw audio floats in an "audio_content" column.
pipeline = PretrainedPipeline("whisper_small_bengali_shamik_pipeline", lang = "bn")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame holding raw audio floats in an "audio_content" column.
val pipeline = new PretrainedPipeline("whisper_small_bengali_shamik_pipeline", lang = "bn")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_bengali_shamik_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Shamik/whisper-small-bn + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_best_de.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_best_de.md new file mode 100644 index 00000000000000..d036b0f528c347 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_best_de.md @@ -0,0 +1,84 @@ +--- +layout: model +title: German whisper_small_best WhisperForCTC from marccgrau +author: John Snow Labs +name: whisper_small_best +date: 2024-09-21 +tags: [de, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_best` is a German model originally trained by marccgrau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_best_de_5.5.0_3.0_1726904911476.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_best_de_5.5.0_3.0_1726904911476.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_best","de") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_best", "de")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_best| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|de| +|Size:|1.7 GB| + +## References + +https://huggingface.co/marccgrau/whisper-small-best \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_best_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_best_pipeline_de.md new file mode 100644 index 00000000000000..48425277491691 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_best_pipeline_de.md @@ -0,0 +1,69 @@ +--- +layout: model +title: German whisper_small_best_pipeline pipeline WhisperForCTC from marccgrau +author: John Snow Labs +name: whisper_small_best_pipeline +date: 2024-09-21 +tags: [de, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_best_pipeline` is a German model originally trained by marccgrau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_best_pipeline_de_5.5.0_3.0_1726904999911.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_best_pipeline_de_5.5.0_3.0_1726904999911.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame holding raw audio floats in an "audio_content" column.
pipeline = PretrainedPipeline("whisper_small_best_pipeline", lang = "de")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame holding raw audio floats in an "audio_content" column.
val pipeline = new PretrainedPipeline("whisper_small_best_pipeline", lang = "de")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_best_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|1.7 GB| + +## References + +https://huggingface.co/marccgrau/whisper-small-best + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_chinese_hk_jason1i_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_chinese_hk_jason1i_en.md new file mode 100644 index 00000000000000..350870d2eb564b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_chinese_hk_jason1i_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_chinese_hk_jason1i WhisperForCTC from jason1i +author: John Snow Labs +name: whisper_small_chinese_hk_jason1i +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_chinese_hk_jason1i` is a English model originally trained by jason1i. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_chinese_hk_jason1i_en_5.5.0_3.0_1726877886740.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_chinese_hk_jason1i_en_5.5.0_3.0_1726877886740.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_chinese_hk_jason1i","en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame holding raw audio floats (16 kHz) in an "audio_content" column.
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_chinese_hk_jason1i", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_chinese_hk_jason1i| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/jason1i/whisper-small-zh-HK \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_chinese_hk_jason1i_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_chinese_hk_jason1i_pipeline_en.md new file mode 100644 index 00000000000000..7b622b8df4c057 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_chinese_hk_jason1i_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_chinese_hk_jason1i_pipeline pipeline WhisperForCTC from jason1i +author: John Snow Labs +name: whisper_small_chinese_hk_jason1i_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_chinese_hk_jason1i_pipeline` is a English model originally trained by jason1i. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_chinese_hk_jason1i_pipeline_en_5.5.0_3.0_1726877978548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_chinese_hk_jason1i_pipeline_en_5.5.0_3.0_1726877978548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is a DataFrame with an "audio_content" column holding the raw audio
+# as an array of floats (16 kHz, mono).
+pipeline = PretrainedPipeline("whisper_small_chinese_hk_jason1i_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is a DataFrame with an "audio_content" column holding the raw audio
+// as an array of floats (16 kHz, mono).
+val pipeline = new PretrainedPipeline("whisper_small_chinese_hk_jason1i_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
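+
+The pretrained pipeline is applied with `transform` on a DataFrame that already contains the raw audio. The sketch below shows one way to pull the transcriptions out afterwards; it assumes `df` has an `audio_content` float-array column and that the pipeline exposes the recognised text in a `text` output column, as the standalone whisper_small_chinese_hk_jason1i model does.
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("whisper_small_chinese_hk_jason1i_pipeline", lang="en")
+
+# `df` is assumed to hold the raw audio in an "audio_content" float-array column.
+annotations = pipeline.transform(df)
+transcripts = annotations.selectExpr("explode(text.result) as transcript")
+transcripts.show(truncate=False)
+```
+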
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_chinese_hk_jason1i_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/jason1i/whisper-small-zh-HK + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_cn_patrickml_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_cn_patrickml_en.md new file mode 100644 index 00000000000000..4a10f75a9addd4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_cn_patrickml_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_cn_patrickml WhisperForCTC from PatrickML +author: John Snow Labs +name: whisper_small_cn_patrickml +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_cn_patrickml` is a English model originally trained by PatrickML. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_cn_patrickml_en_5.5.0_3.0_1726910991376.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_cn_patrickml_en_5.5.0_3.0_1726910991376.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is a DataFrame with an "audio_content" column holding the raw audio
+# as an array of floats (16 kHz, mono).
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_cn_patrickml","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+pipelineDF.select("text.result").show(truncate = False)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// `data` is a DataFrame with an "audio_content" column holding the raw audio
+// as an array of floats (16 kHz, mono).
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_cn_patrickml", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+pipelineDF.select("text.result").show(truncate = false)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_cn_patrickml| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/PatrickML/whisper-small-CN \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_cn_patrickml_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_cn_patrickml_pipeline_en.md new file mode 100644 index 00000000000000..ecf9cb43be57ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_cn_patrickml_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_cn_patrickml_pipeline pipeline WhisperForCTC from PatrickML +author: John Snow Labs +name: whisper_small_cn_patrickml_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_cn_patrickml_pipeline` is a English model originally trained by PatrickML. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_cn_patrickml_pipeline_en_5.5.0_3.0_1726911080110.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_cn_patrickml_pipeline_en_5.5.0_3.0_1726911080110.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_cn_patrickml_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_cn_patrickml_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_cn_patrickml_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/PatrickML/whisper-small-CN + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ctmtrained_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ctmtrained_en.md new file mode 100644 index 00000000000000..3a6293a309e0d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ctmtrained_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_ctmtrained WhisperForCTC from ctm446 +author: John Snow Labs +name: whisper_small_ctmtrained +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_ctmtrained` is a English model originally trained by ctm446. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_ctmtrained_en_5.5.0_3.0_1726905695025.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_ctmtrained_en_5.5.0_3.0_1726905695025.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is a DataFrame with an "audio_content" column holding the raw audio
+# as an array of floats (16 kHz, mono).
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_ctmtrained","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+pipelineDF.select("text.result").show(truncate = False)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// `data` is a DataFrame with an "audio_content" column holding the raw audio
+// as an array of floats (16 kHz, mono).
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_ctmtrained", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+pipelineDF.select("text.result").show(truncate = false)
+```
+</div>
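+
+Because this is an ordinary Spark ML pipeline, a whole folder of recordings can be transcribed in one pass by building a multi-row DataFrame first. A rough sketch, assuming 16 kHz-compatible WAV files under an illustrative `recordings/` directory and the external `librosa` library for decoding:
+
+```python
+import glob
+
+import sparknlp
+from sparknlp.base import AudioAssembler
+from sparknlp.annotator import WhisperForCTC
+from pyspark.ml import Pipeline
+import librosa  # assumed external dependency for decoding and resampling
+
+spark = sparknlp.start()
+
+# One row per file: (path, float array). The directory name is illustrative.
+rows = []
+for path in sorted(glob.glob("recordings/*.wav")):
+    floats, _ = librosa.load(path, sr=16000, mono=True)
+    rows.append((path, floats.tolist()))
+
+data = spark.createDataFrame(rows, ["path", "audio_content"])
+
+pipeline = Pipeline().setStages([
+    AudioAssembler().setInputCol("audio_content").setOutputCol("audio_assembler"),
+    WhisperForCTC.pretrained("whisper_small_ctmtrained", "en")
+        .setInputCols(["audio_assembler"])
+        .setOutputCol("text"),
+])
+
+result = pipeline.fit(data).transform(data)
+result.select("path", "text.result").show(truncate=False)
+```
+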
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_ctmtrained| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ctm446/whisper-small-ctmtrained \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ctmtrained_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ctmtrained_pipeline_en.md new file mode 100644 index 00000000000000..8e5c82f692e18d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ctmtrained_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_ctmtrained_pipeline pipeline WhisperForCTC from ctm446 +author: John Snow Labs +name: whisper_small_ctmtrained_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_ctmtrained_pipeline` is a English model originally trained by ctm446. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_ctmtrained_pipeline_en_5.5.0_3.0_1726905779785.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_ctmtrained_pipeline_en_5.5.0_3.0_1726905779785.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_ctmtrained_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_ctmtrained_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_ctmtrained_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ctm446/whisper-small-ctmtrained + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_arch4ngel_pipeline_dv.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_arch4ngel_pipeline_dv.md new file mode 100644 index 00000000000000..648111f12312da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_arch4ngel_pipeline_dv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_arch4ngel_pipeline pipeline WhisperForCTC from Arch4ngel +author: John Snow Labs +name: whisper_small_divehi_arch4ngel_pipeline +date: 2024-09-21 +tags: [dv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_arch4ngel_pipeline` is a Dhivehi, Divehi, Maldivian model originally trained by Arch4ngel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_arch4ngel_pipeline_dv_5.5.0_3.0_1726906317656.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_arch4ngel_pipeline_dv_5.5.0_3.0_1726906317656.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_divehi_arch4ngel_pipeline", lang = "dv") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_divehi_arch4ngel_pipeline", lang = "dv") +val annotations = pipeline.transform(df) + +``` +
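+
+When reference transcripts are available, the pipeline output can be scored with word error rate. A small sketch using the external `jiwer` package; the `reference` column and the DataFrame layout are assumptions for illustration only:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+from jiwer import wer  # assumed external dependency for word error rate
+
+pipeline = PretrainedPipeline("whisper_small_divehi_arch4ngel_pipeline", lang="dv")
+
+# `df` is assumed to hold an "audio_content" float-array column plus a
+# "reference" column with the ground-truth transcript of each clip.
+scored = pipeline.transform(df) \
+    .selectExpr("concat_ws(' ', text.result) as hypothesis", "reference") \
+    .collect()
+
+error_rate = wer(
+    [row["reference"] for row in scored],
+    [row["hypothesis"] for row in scored],
+)
+print(f"WER: {error_rate:.3f}")
+```
+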
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_arch4ngel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Arch4ngel/whisper-small-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_nafishzaldinanda_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_nafishzaldinanda_en.md new file mode 100644 index 00000000000000..1ba34266e44aaf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_nafishzaldinanda_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_divehi_nafishzaldinanda WhisperForCTC from NafishZaldinanda +author: John Snow Labs +name: whisper_small_divehi_nafishzaldinanda +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_nafishzaldinanda` is a English model originally trained by NafishZaldinanda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_nafishzaldinanda_en_5.5.0_3.0_1726907455591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_nafishzaldinanda_en_5.5.0_3.0_1726907455591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_divehi_nafishzaldinanda","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_divehi_nafishzaldinanda", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_nafishzaldinanda| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/NafishZaldinanda/whisper-small-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_nafishzaldinanda_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_nafishzaldinanda_pipeline_en.md new file mode 100644 index 00000000000000..fd4639871fc769 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_nafishzaldinanda_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_divehi_nafishzaldinanda_pipeline pipeline WhisperForCTC from NafishZaldinanda +author: John Snow Labs +name: whisper_small_divehi_nafishzaldinanda_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_nafishzaldinanda_pipeline` is a English model originally trained by NafishZaldinanda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_nafishzaldinanda_pipeline_en_5.5.0_3.0_1726907532398.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_nafishzaldinanda_pipeline_en_5.5.0_3.0_1726907532398.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_divehi_nafishzaldinanda_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_divehi_nafishzaldinanda_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_nafishzaldinanda_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/NafishZaldinanda/whisper-small-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_ptah23_dv.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_ptah23_dv.md new file mode 100644 index 00000000000000..482d87e39aa47e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_ptah23_dv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_ptah23 WhisperForCTC from ptah23 +author: John Snow Labs +name: whisper_small_divehi_ptah23 +date: 2024-09-21 +tags: [dv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_ptah23` is a Dhivehi, Divehi, Maldivian model originally trained by ptah23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_ptah23_dv_5.5.0_3.0_1726890817757.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_ptah23_dv_5.5.0_3.0_1726890817757.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# `data` is a DataFrame with an "audio_content" column holding the raw audio
+# as an array of floats (16 kHz, mono).
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_divehi_ptah23","dv") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+pipelineDF.select("text.result").show(truncate = False)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// `data` is a DataFrame with an "audio_content" column holding the raw audio
+// as an array of floats (16 kHz, mono).
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_divehi_ptah23", "dv")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+pipelineDF.select("text.result").show(truncate = false)
+```
+</div>
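+
+Whisper-family models are trained on 30-second audio windows, so very long recordings are often split into shorter segments before transcription. A rough preprocessing sketch, assuming an already running Spark NLP session and the external `librosa` library; whether pre-chunking is needed depends on your audio lengths and Spark NLP version:
+
+```python
+import librosa  # assumed external dependency for decoding and resampling
+
+CHUNK_SECONDS = 30
+SAMPLE_RATE = 16000
+
+# `spark` is assumed to be an already running Spark NLP session (sparknlp.start()).
+audio, _ = librosa.load("long_recording.wav", sr=SAMPLE_RATE, mono=True)
+
+# Split into consecutive windows of at most 30 seconds; one DataFrame row each.
+chunk_size = CHUNK_SECONDS * SAMPLE_RATE
+chunks = [audio[i:i + chunk_size] for i in range(0, len(audio), chunk_size)]
+
+rows = [(idx, chunk.tolist()) for idx, chunk in enumerate(chunks)]
+data = spark.createDataFrame(rows, ["segment", "audio_content"])
+# `data` can now be passed to the AudioAssembler + WhisperForCTC pipeline shown
+# above, and the per-segment transcripts reassembled in `segment` order.
+```
+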
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_ptah23| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ptah23/whisper-small-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_ptah23_pipeline_dv.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_ptah23_pipeline_dv.md new file mode 100644 index 00000000000000..bad0d1bc1c53cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_ptah23_pipeline_dv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_ptah23_pipeline pipeline WhisperForCTC from ptah23 +author: John Snow Labs +name: whisper_small_divehi_ptah23_pipeline +date: 2024-09-21 +tags: [dv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_ptah23_pipeline` is a Dhivehi, Divehi, Maldivian model originally trained by ptah23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_ptah23_pipeline_dv_5.5.0_3.0_1726890904394.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_ptah23_pipeline_dv_5.5.0_3.0_1726890904394.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_divehi_ptah23_pipeline", lang = "dv") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_divehi_ptah23_pipeline", lang = "dv") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_ptah23_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ptah23/whisper-small-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_tashi58_dv.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_tashi58_dv.md new file mode 100644 index 00000000000000..be04345a08f184 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_tashi58_dv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_tashi58 WhisperForCTC from Tashi58 +author: John Snow Labs +name: whisper_small_divehi_tashi58 +date: 2024-09-21 +tags: [dv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_tashi58` is a Dhivehi, Divehi, Maldivian model originally trained by Tashi58. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_tashi58_dv_5.5.0_3.0_1726936234191.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_tashi58_dv_5.5.0_3.0_1726936234191.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_divehi_tashi58","dv") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_divehi_tashi58", "dv") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_tashi58| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Tashi58/whisper-small-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_tashi58_pipeline_dv.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_tashi58_pipeline_dv.md new file mode 100644 index 00000000000000..18add79dd69c82 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_tashi58_pipeline_dv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_tashi58_pipeline pipeline WhisperForCTC from Tashi58 +author: John Snow Labs +name: whisper_small_divehi_tashi58_pipeline +date: 2024-09-21 +tags: [dv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_tashi58_pipeline` is a Dhivehi, Divehi, Maldivian model originally trained by Tashi58. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_tashi58_pipeline_dv_5.5.0_3.0_1726936315841.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_tashi58_pipeline_dv_5.5.0_3.0_1726936315841.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_divehi_tashi58_pipeline", lang = "dv") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_divehi_tashi58_pipeline", lang = "dv") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_tashi58_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Tashi58/whisper-small-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_vinayakp_dv.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_vinayakp_dv.md new file mode 100644 index 00000000000000..42d9271b58a1a1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_vinayakp_dv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_vinayakp WhisperForCTC from VinayakP +author: John Snow Labs +name: whisper_small_divehi_vinayakp +date: 2024-09-21 +tags: [dv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_vinayakp` is a Dhivehi, Divehi, Maldivian model originally trained by VinayakP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_vinayakp_dv_5.5.0_3.0_1726962778205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_vinayakp_dv_5.5.0_3.0_1726962778205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_divehi_vinayakp","dv") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_divehi_vinayakp", "dv") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_vinayakp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/VinayakP/whisper-small-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_vinayakp_pipeline_dv.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_vinayakp_pipeline_dv.md new file mode 100644 index 00000000000000..8cabee7647f005 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_vinayakp_pipeline_dv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_vinayakp_pipeline pipeline WhisperForCTC from VinayakP +author: John Snow Labs +name: whisper_small_divehi_vinayakp_pipeline +date: 2024-09-21 +tags: [dv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_vinayakp_pipeline` is a Dhivehi, Divehi, Maldivian model originally trained by VinayakP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_vinayakp_pipeline_dv_5.5.0_3.0_1726962853037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_vinayakp_pipeline_dv_5.5.0_3.0_1726962853037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_divehi_vinayakp_pipeline", lang = "dv") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_divehi_vinayakp_pipeline", lang = "dv") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_vinayakp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/VinayakP/whisper-small-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_winmodel_dv.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_winmodel_dv.md new file mode 100644 index 00000000000000..36ffb1e8ae5fcd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_winmodel_dv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_winmodel WhisperForCTC from Winmodel +author: John Snow Labs +name: whisper_small_divehi_winmodel +date: 2024-09-21 +tags: [dv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_winmodel` is a Dhivehi, Divehi, Maldivian model originally trained by Winmodel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_winmodel_dv_5.5.0_3.0_1726935914727.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_winmodel_dv_5.5.0_3.0_1726935914727.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_divehi_winmodel","dv") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_divehi_winmodel", "dv") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_winmodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Winmodel/whisper-small-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_winmodel_pipeline_dv.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_winmodel_pipeline_dv.md new file mode 100644 index 00000000000000..7a7d2d620d75a1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_winmodel_pipeline_dv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_winmodel_pipeline pipeline WhisperForCTC from Winmodel +author: John Snow Labs +name: whisper_small_divehi_winmodel_pipeline +date: 2024-09-21 +tags: [dv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_winmodel_pipeline` is a Dhivehi, Divehi, Maldivian model originally trained by Winmodel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_winmodel_pipeline_dv_5.5.0_3.0_1726935991273.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_winmodel_pipeline_dv_5.5.0_3.0_1726935991273.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_divehi_winmodel_pipeline", lang = "dv") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_divehi_winmodel_pipeline", lang = "dv") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_winmodel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Winmodel/whisper-small-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_dv.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_dv.md new file mode 100644 index 00000000000000..569b2d46bd8502 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_dv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small WhisperForCTC from ClementXie +author: John Snow Labs +name: whisper_small +date: 2024-09-21 +tags: [dv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small` is a Dhivehi, Divehi, Maldivian model originally trained by ClementXie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_dv_5.5.0_3.0_1726950785426.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_dv_5.5.0_3.0_1726950785426.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small","dv") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small", "dv") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
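+
+On a machine with a supported GPU, the Spark NLP session can be started with GPU support and the annotator's batch size raised for better throughput. A short sketch; the batch size shown is only a starting point, not a value taken from this model card:
+
+```python
+import sparknlp
+from sparknlp.base import AudioAssembler
+from sparknlp.annotator import WhisperForCTC
+from pyspark.ml import Pipeline
+
+# Requires the GPU build of Spark NLP to be available on the cluster.
+spark = sparknlp.start(gpu=True)
+
+speech_to_text = WhisperForCTC.pretrained("whisper_small", "dv") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text") \
+    .setBatchSize(8)  # transcribe several clips per forward pass
+
+pipeline = Pipeline().setStages([
+    AudioAssembler().setInputCol("audio_content").setOutputCol("audio_assembler"),
+    speech_to_text,
+])
+```
+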
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ClementXie/whisper-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_egy_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_egy_en.md new file mode 100644 index 00000000000000..dfae756ac8278f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_egy_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_egy WhisperForCTC from HuggingPanda +author: John Snow Labs +name: whisper_small_egy +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_egy` is a English model originally trained by HuggingPanda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_egy_en_5.5.0_3.0_1726939650502.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_egy_en_5.5.0_3.0_1726939650502.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_egy","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_egy", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_egy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/HuggingPanda/whisper-small-egy \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_egy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_egy_pipeline_en.md new file mode 100644 index 00000000000000..b633b196c0cfff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_egy_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_egy_pipeline pipeline WhisperForCTC from HuggingPanda +author: John Snow Labs +name: whisper_small_egy_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_egy_pipeline` is a English model originally trained by HuggingPanda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_egy_pipeline_en_5.5.0_3.0_1726939727362.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_egy_pipeline_en_5.5.0_3.0_1726939727362.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_egy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_egy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_egy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/HuggingPanda/whisper-small-egy + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_english_mskov_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_english_mskov_en.md new file mode 100644 index 00000000000000..2c5d6a90f71fd0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_english_mskov_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_english_mskov WhisperForCTC from mskov +author: John Snow Labs +name: whisper_small_english_mskov +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_english_mskov` is a English model originally trained by mskov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_english_mskov_en_5.5.0_3.0_1726949929598.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_english_mskov_en_5.5.0_3.0_1726949929598.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_english_mskov","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_english_mskov", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
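+
+Since the pretrained weights are roughly 1.1 GB, it can be convenient to download the model once, save it locally, and load it from disk in later jobs instead of re-downloading. A sketch with an illustrative local path:
+
+```python
+from sparknlp.annotator import WhisperForCTC
+
+# Download once and persist to a local or shared path (path is illustrative).
+speech_to_text = WhisperForCTC.pretrained("whisper_small_english_mskov", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+speech_to_text.write().overwrite().save("models/whisper_small_english_mskov")
+
+# In later jobs, load the saved annotator directly from disk.
+speech_to_text = WhisperForCTC.load("models/whisper_small_english_mskov") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+```
+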
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_english_mskov| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.1 GB| + +## References + +https://huggingface.co/mskov/whisper-small.en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_english_mskov_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_english_mskov_pipeline_en.md new file mode 100644 index 00000000000000..ce5f251e40a532 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_english_mskov_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_english_mskov_pipeline pipeline WhisperForCTC from mskov +author: John Snow Labs +name: whisper_small_english_mskov_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_english_mskov_pipeline` is a English model originally trained by mskov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_english_mskov_pipeline_en_5.5.0_3.0_1726950224671.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_english_mskov_pipeline_en_5.5.0_3.0_1726950224671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_english_mskov_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_english_mskov_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_english_mskov_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.1 GB| + +## References + +https://huggingface.co/mskov/whisper-small.en + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_english_tonga_tonga_islands_myst55h_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_english_tonga_tonga_islands_myst55h_en.md new file mode 100644 index 00000000000000..3988d8a8d77b83 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_english_tonga_tonga_islands_myst55h_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_english_tonga_tonga_islands_myst55h WhisperForCTC from rishabhjain16 +author: John Snow Labs +name: whisper_small_english_tonga_tonga_islands_myst55h +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_english_tonga_tonga_islands_myst55h` is a English model originally trained by rishabhjain16. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_english_tonga_tonga_islands_myst55h_en_5.5.0_3.0_1726911883945.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_english_tonga_tonga_islands_myst55h_en_5.5.0_3.0_1726911883945.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_english_tonga_tonga_islands_myst55h","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_english_tonga_tonga_islands_myst55h", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_english_tonga_tonga_islands_myst55h| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/rishabhjain16/whisper_small_en_to_myst55h \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_english_tonga_tonga_islands_myst55h_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_english_tonga_tonga_islands_myst55h_pipeline_en.md new file mode 100644 index 00000000000000..e1484cce58038c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_english_tonga_tonga_islands_myst55h_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_english_tonga_tonga_islands_myst55h_pipeline pipeline WhisperForCTC from rishabhjain16 +author: John Snow Labs +name: whisper_small_english_tonga_tonga_islands_myst55h_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_english_tonga_tonga_islands_myst55h_pipeline` is a English model originally trained by rishabhjain16. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_english_tonga_tonga_islands_myst55h_pipeline_en_5.5.0_3.0_1726911965461.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_english_tonga_tonga_islands_myst55h_pipeline_en_5.5.0_3.0_1726911965461.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_english_tonga_tonga_islands_myst55h_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_english_tonga_tonga_islands_myst55h_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_english_tonga_tonga_islands_myst55h_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/rishabhjain16/whisper_small_en_to_myst55h + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_european_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_european_en.md new file mode 100644 index 00000000000000..34cb7bf1b5fc99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_european_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_european WhisperForCTC from aware-ai +author: John Snow Labs +name: whisper_small_european +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_european` is a English model originally trained by aware-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_european_en_5.5.0_3.0_1726935937730.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_european_en_5.5.0_3.0_1726935937730.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_european", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_european", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
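The `text` output column above holds Spark NLP annotations rather than plain strings. A short sketch for pulling the decoded transcriptions out of it, assuming the column names set above:

```python
# Each annotation's `result` field carries the decoded transcription string.
pipelineDF.selectExpr("explode(text.result) AS transcription").show(truncate=False)
```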
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_european| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/aware-ai/whisper-small-european \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_european_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_european_pipeline_en.md new file mode 100644 index 00000000000000..76bb0f148b65d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_european_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_european_pipeline pipeline WhisperForCTC from aware-ai +author: John Snow Labs +name: whisper_small_european_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_european_pipeline` is a English model originally trained by aware-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_european_pipeline_en_5.5.0_3.0_1726936014455.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_european_pipeline_en_5.5.0_3.0_1726936014455.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
pipeline = PretrainedPipeline("whisper_small_european_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_european_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_european_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/aware-ai/whisper-small-european + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_fine_tuned_with_patient_conversations_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_fine_tuned_with_patient_conversations_en.md new file mode 100644 index 00000000000000..1f1e3adb29800c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_fine_tuned_with_patient_conversations_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_fine_tuned_with_patient_conversations WhisperForCTC from ilyyyyy +author: John Snow Labs +name: whisper_small_fine_tuned_with_patient_conversations +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_fine_tuned_with_patient_conversations` is a English model originally trained by ilyyyyy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_fine_tuned_with_patient_conversations_en_5.5.0_3.0_1726906001662.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_fine_tuned_with_patient_conversations_en_5.5.0_3.0_1726906001662.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_fine_tuned_with_patient_conversations", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_fine_tuned_with_patient_conversations", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
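Fitting is cheap here because the stages are pretrained, but persisting the fitted pipeline still avoids re-downloading the model on every run. A sketch using the standard Spark ML writer; the path is only an example.

```python
from pyspark.ml import PipelineModel

# Example path; any writable filesystem or object-store URI works.
pipelineModel.write().overwrite().save("/tmp/whisper_small_fine_tuned_pipeline")
restored = PipelineModel.load("/tmp/whisper_small_fine_tuned_pipeline")
```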
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_fine_tuned_with_patient_conversations| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ilyyyyy/whisper-small-fine-tuned-with-patient-conversations \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_fine_tuned_with_patient_conversations_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_fine_tuned_with_patient_conversations_pipeline_en.md new file mode 100644 index 00000000000000..9c65e74ee7ccb1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_fine_tuned_with_patient_conversations_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_fine_tuned_with_patient_conversations_pipeline pipeline WhisperForCTC from ilyyyyy +author: John Snow Labs +name: whisper_small_fine_tuned_with_patient_conversations_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_fine_tuned_with_patient_conversations_pipeline` is a English model originally trained by ilyyyyy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_fine_tuned_with_patient_conversations_pipeline_en_5.5.0_3.0_1726906087124.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_fine_tuned_with_patient_conversations_pipeline_en_5.5.0_3.0_1726906087124.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
pipeline = PretrainedPipeline("whisper_small_fine_tuned_with_patient_conversations_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_fine_tuned_with_patient_conversations_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_fine_tuned_with_patient_conversations_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ilyyyyy/whisper-small-fine-tuned-with-patient-conversations + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ga2en_v1_2_r_ga.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ga2en_v1_2_r_ga.md new file mode 100644 index 00000000000000..842b47a2bb40b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ga2en_v1_2_r_ga.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Irish whisper_small_ga2en_v1_2_r WhisperForCTC from ymoslem +author: John Snow Labs +name: whisper_small_ga2en_v1_2_r +date: 2024-09-21 +tags: [ga, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ga +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_ga2en_v1_2_r` is a Irish model originally trained by ymoslem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_ga2en_v1_2_r_ga_5.5.0_3.0_1726937466070.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_ga2en_v1_2_r_ga_5.5.0_3.0_1726937466070.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_ga2en_v1_2_r", "ga") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_ga2en_v1_2_r", "ga")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_ga2en_v1_2_r| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ga| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ymoslem/whisper-small-ga2en-v1.2-r \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ga2en_v5_2_1_r_ga.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ga2en_v5_2_1_r_ga.md new file mode 100644 index 00000000000000..a2586aaadc25b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ga2en_v5_2_1_r_ga.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Irish whisper_small_ga2en_v5_2_1_r WhisperForCTC from ymoslem +author: John Snow Labs +name: whisper_small_ga2en_v5_2_1_r +date: 2024-09-21 +tags: [ga, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ga +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_ga2en_v5_2_1_r` is a Irish model originally trained by ymoslem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_ga2en_v5_2_1_r_ga_5.5.0_3.0_1726876888340.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_ga2en_v5_2_1_r_ga_5.5.0_3.0_1726876888340.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_ga2en_v5_2_1_r", "ga") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_ga2en_v5_2_1_r", "ga")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_ga2en_v5_2_1_r| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ga| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ymoslem/whisper-small-ga2en-v5.2.1-r \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ga2en_v5_2_1_r_pipeline_ga.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ga2en_v5_2_1_r_pipeline_ga.md new file mode 100644 index 00000000000000..c03ec1cecb5ebc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ga2en_v5_2_1_r_pipeline_ga.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Irish whisper_small_ga2en_v5_2_1_r_pipeline pipeline WhisperForCTC from ymoslem +author: John Snow Labs +name: whisper_small_ga2en_v5_2_1_r_pipeline +date: 2024-09-21 +tags: [ga, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ga +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_ga2en_v5_2_1_r_pipeline` is a Irish model originally trained by ymoslem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_ga2en_v5_2_1_r_pipeline_ga_5.5.0_3.0_1726876975780.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_ga2en_v5_2_1_r_pipeline_ga_5.5.0_3.0_1726876975780.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
pipeline = PretrainedPipeline("whisper_small_ga2en_v5_2_1_r_pipeline", lang = "ga")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_ga2en_v5_2_1_r_pipeline", lang = "ga")
val annotations = pipeline.transform(df)
```
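The downloaded object wraps an ordinary Spark ML `PipelineModel`, so its stages (an AudioAssembler followed by WhisperForCTC, per the list at the end of this page) can be inspected or reused directly. A short sketch, assuming the `model` attribute exposed by `PretrainedPipeline`:

```python
# Print the class name of each stage in the downloaded pipeline.
for stage in pipeline.model.stages:
    print(type(stage).__name__)
```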
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_ga2en_v5_2_1_r_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ga| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ymoslem/whisper-small-ga2en-v5.2.1-r + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ger_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ger_en.md new file mode 100644 index 00000000000000..71dc2cdc04f3c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ger_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_ger WhisperForCTC from daniel123321 +author: John Snow Labs +name: whisper_small_ger +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_ger` is a English model originally trained by daniel123321. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_ger_en_5.5.0_3.0_1726894944274.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_ger_en_5.5.0_3.0_1726894944274.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_ger", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_ger", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_ger| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/daniel123321/whisper-small-ger \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ger_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ger_pipeline_en.md new file mode 100644 index 00000000000000..4737f61a65f885 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ger_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_ger_pipeline pipeline WhisperForCTC from daniel123321 +author: John Snow Labs +name: whisper_small_ger_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_ger_pipeline` is a English model originally trained by daniel123321. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_ger_pipeline_en_5.5.0_3.0_1726895024495.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_ger_pipeline_en_5.5.0_3.0_1726895024495.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
pipeline = PretrainedPipeline("whisper_small_ger_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_ger_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_ger_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/daniel123321/whisper-small-ger + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hausa_phaeeza_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hausa_phaeeza_en.md new file mode 100644 index 00000000000000..81f7b2d2aa3664 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hausa_phaeeza_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_hausa_phaeeza WhisperForCTC from phaeeza +author: John Snow Labs +name: whisper_small_hausa_phaeeza +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hausa_phaeeza` is a English model originally trained by phaeeza. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hausa_phaeeza_en_5.5.0_3.0_1726947967917.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hausa_phaeeza_en_5.5.0_3.0_1726947967917.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_hausa_phaeeza", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_hausa_phaeeza", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
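Throughput on GPU is usually governed by the annotator's batch size. A hedged sketch, assuming `WhisperForCTC` exposes `setBatchSize` as other batched Spark NLP annotators do:

```python
# Larger batches generally improve GPU utilisation; tune to available memory (assumed parameter).
speechToText.setBatchSize(8)
```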
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hausa_phaeeza| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/phaeeza/whisper-small-ha \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hausa_phaeeza_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hausa_phaeeza_pipeline_en.md new file mode 100644 index 00000000000000..e9422cf52bef08 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hausa_phaeeza_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_hausa_phaeeza_pipeline pipeline WhisperForCTC from phaeeza +author: John Snow Labs +name: whisper_small_hausa_phaeeza_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hausa_phaeeza_pipeline` is a English model originally trained by phaeeza. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hausa_phaeeza_pipeline_en_5.5.0_3.0_1726948066875.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hausa_phaeeza_pipeline_en_5.5.0_3.0_1726948066875.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
pipeline = PretrainedPipeline("whisper_small_hausa_phaeeza_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_hausa_phaeeza_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hausa_phaeeza_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/phaeeza/whisper-small-ha + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hebrew_modern_3_he.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hebrew_modern_3_he.md new file mode 100644 index 00000000000000..1eeebd9f9f0b00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hebrew_modern_3_he.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hebrew whisper_small_hebrew_modern_3 WhisperForCTC from mike249 +author: John Snow Labs +name: whisper_small_hebrew_modern_3 +date: 2024-09-21 +tags: [he, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hebrew_modern_3` is a Hebrew model originally trained by mike249. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hebrew_modern_3_he_5.5.0_3.0_1726879173933.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hebrew_modern_3_he_5.5.0_3.0_1726879173933.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_hebrew_modern_3", "he") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_hebrew_modern_3", "he")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hebrew_modern_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|he| +|Size:|1.7 GB| + +## References + +https://huggingface.co/mike249/whisper-small-he-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hebrew_modern_3_pipeline_he.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hebrew_modern_3_pipeline_he.md new file mode 100644 index 00000000000000..da82cb43f51d17 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hebrew_modern_3_pipeline_he.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hebrew whisper_small_hebrew_modern_3_pipeline pipeline WhisperForCTC from mike249 +author: John Snow Labs +name: whisper_small_hebrew_modern_3_pipeline +date: 2024-09-21 +tags: [he, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hebrew_modern_3_pipeline` is a Hebrew model originally trained by mike249. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hebrew_modern_3_pipeline_he_5.5.0_3.0_1726879271547.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hebrew_modern_3_pipeline_he_5.5.0_3.0_1726879271547.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
pipeline = PretrainedPipeline("whisper_small_hebrew_modern_3_pipeline", lang = "he")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_hebrew_modern_3_pipeline", lang = "he")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hebrew_modern_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|he| +|Size:|1.7 GB| + +## References + +https://huggingface.co/mike249/whisper-small-he-3 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_02_liangc40_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_02_liangc40_en.md new file mode 100644 index 00000000000000..9a64d1f9ae81a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_02_liangc40_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_hindi_02_liangc40 WhisperForCTC from liangc40 +author: John Snow Labs +name: whisper_small_hindi_02_liangc40 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_02_liangc40` is a English model originally trained by liangc40. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_02_liangc40_en_5.5.0_3.0_1726961626329.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_02_liangc40_en_5.5.0_3.0_1726961626329.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_hindi_02_liangc40", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_02_liangc40", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_02_liangc40| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/liangc40/whisper-small-hi_02 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_02_liangc40_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_02_liangc40_pipeline_en.md new file mode 100644 index 00000000000000..6f208874c5ec35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_02_liangc40_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_hindi_02_liangc40_pipeline pipeline WhisperForCTC from liangc40 +author: John Snow Labs +name: whisper_small_hindi_02_liangc40_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_02_liangc40_pipeline` is a English model originally trained by liangc40. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_02_liangc40_pipeline_en_5.5.0_3.0_1726961709016.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_02_liangc40_pipeline_en_5.5.0_3.0_1726961709016.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
pipeline = PretrainedPipeline("whisper_small_hindi_02_liangc40_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_hindi_02_liangc40_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_02_liangc40_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/liangc40/whisper-small-hi_02 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_anuragshas_hi.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_anuragshas_hi.md new file mode 100644 index 00000000000000..652ebfddf078f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_anuragshas_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_small_hindi_anuragshas WhisperForCTC from anuragshas +author: John Snow Labs +name: whisper_small_hindi_anuragshas +date: 2024-09-21 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_anuragshas` is a Hindi model originally trained by anuragshas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_anuragshas_hi_5.5.0_3.0_1726892582397.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_anuragshas_hi_5.5.0_3.0_1726892582397.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_hindi_anuragshas", "hi") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_anuragshas", "hi")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_anuragshas| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/anuragshas/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_anuragshas_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_anuragshas_pipeline_hi.md new file mode 100644 index 00000000000000..445bc0b528ac7b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_anuragshas_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_small_hindi_anuragshas_pipeline pipeline WhisperForCTC from anuragshas +author: John Snow Labs +name: whisper_small_hindi_anuragshas_pipeline +date: 2024-09-21 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_anuragshas_pipeline` is a Hindi model originally trained by anuragshas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_anuragshas_pipeline_hi_5.5.0_3.0_1726892665928.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_anuragshas_pipeline_hi_5.5.0_3.0_1726892665928.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
pipeline = PretrainedPipeline("whisper_small_hindi_anuragshas_pipeline", lang = "hi")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_hindi_anuragshas_pipeline", lang = "hi")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_anuragshas_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/anuragshas/whisper-small-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_jamin20_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_jamin20_en.md new file mode 100644 index 00000000000000..3a0078c85827b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_jamin20_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_hindi_jamin20 WhisperForCTC from Jamin20 +author: John Snow Labs +name: whisper_small_hindi_jamin20 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_jamin20` is a English model originally trained by Jamin20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_jamin20_en_5.5.0_3.0_1726911717269.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_jamin20_en_5.5.0_3.0_1726911717269.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_hindi_jamin20", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_jamin20", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_jamin20| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Jamin20/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_jamin20_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_jamin20_pipeline_en.md new file mode 100644 index 00000000000000..3da3958795cbca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_jamin20_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_hindi_jamin20_pipeline pipeline WhisperForCTC from Jamin20 +author: John Snow Labs +name: whisper_small_hindi_jamin20_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_jamin20_pipeline` is a English model originally trained by Jamin20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_jamin20_pipeline_en_5.5.0_3.0_1726911798803.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_jamin20_pipeline_en_5.5.0_3.0_1726911798803.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
pipeline = PretrainedPipeline("whisper_small_hindi_jamin20_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_hindi_jamin20_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_jamin20_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Jamin20/whisper-small-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_lmh16_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_lmh16_en.md new file mode 100644 index 00000000000000..c4081309af6000 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_lmh16_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_hindi_lmh16 WhisperForCTC from lmh16 +author: John Snow Labs +name: whisper_small_hindi_lmh16 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_lmh16` is a English model originally trained by lmh16. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_lmh16_en_5.5.0_3.0_1726912886963.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_lmh16_en_5.5.0_3.0_1726912886963.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_hindi_lmh16", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_lmh16", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_lmh16| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/lmh16/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_lmh16_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_lmh16_pipeline_en.md new file mode 100644 index 00000000000000..de033298cc25c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_lmh16_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_hindi_lmh16_pipeline pipeline WhisperForCTC from lmh16 +author: John Snow Labs +name: whisper_small_hindi_lmh16_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_lmh16_pipeline` is a English model originally trained by lmh16. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_lmh16_pipeline_en_5.5.0_3.0_1726912968052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_lmh16_pipeline_en_5.5.0_3.0_1726912968052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
pipeline = PretrainedPipeline("whisper_small_hindi_lmh16_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_hindi_lmh16_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_lmh16_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/lmh16/whisper-small-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_me2140733_hi.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_me2140733_hi.md new file mode 100644 index 00000000000000..8a2ed9b4b3400b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_me2140733_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_small_hindi_me2140733 WhisperForCTC from me2140733 +author: John Snow Labs +name: whisper_small_hindi_me2140733 +date: 2024-09-21 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_me2140733` is a Hindi model originally trained by me2140733. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_me2140733_hi_5.5.0_3.0_1726909994177.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_me2140733_hi_5.5.0_3.0_1726909994177.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_hindi_me2140733", "hi") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples (array of floats)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_me2140733", "hi")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_me2140733| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/me2140733/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_me2140733_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_me2140733_pipeline_hi.md new file mode 100644 index 00000000000000..4c8d07cfe69600 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_me2140733_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_small_hindi_me2140733_pipeline pipeline WhisperForCTC from me2140733 +author: John Snow Labs +name: whisper_small_hindi_me2140733_pipeline +date: 2024-09-21 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_me2140733_pipeline` is a Hindi model originally trained by me2140733. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_me2140733_pipeline_hi_5.5.0_3.0_1726910081512.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_me2140733_pipeline_hi_5.5.0_3.0_1726910081512.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
pipeline = PretrainedPipeline("whisper_small_hindi_me2140733_pipeline", lang = "hi")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_hindi_me2140733_pipeline", lang = "hi")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_me2140733_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/me2140733/whisper-small-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_qisan_pipeline_sv.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_qisan_pipeline_sv.md new file mode 100644 index 00000000000000..c9fbf4b9c69f81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_qisan_pipeline_sv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Swedish whisper_small_hindi_qisan_pipeline pipeline WhisperForCTC from qisan +author: John Snow Labs +name: whisper_small_hindi_qisan_pipeline +date: 2024-09-21 +tags: [sv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_qisan_pipeline` is a Swedish model originally trained by qisan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_qisan_pipeline_sv_5.5.0_3.0_1726951230753.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_qisan_pipeline_sv_5.5.0_3.0_1726951230753.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_small_hindi_qisan_pipeline", lang = "sv")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_small_hindi_qisan_pipeline", lang = "sv")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_qisan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/qisan/whisper-small-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_qisan_sv.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_qisan_sv.md new file mode 100644 index 00000000000000..238eb1d358401a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_qisan_sv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Swedish whisper_small_hindi_qisan WhisperForCTC from qisan +author: John Snow Labs +name: whisper_small_hindi_qisan +date: 2024-09-21 +tags: [sv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_qisan` is a Swedish model originally trained by qisan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_qisan_sv_5.5.0_3.0_1726951140976.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_qisan_sv_5.5.0_3.0_1726951140976.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_hindi_qisan","sv") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// "data" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_qisan", "sv")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
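+
+The `data` DataFrame referenced above is not defined in this card. A sketch of one way to build it with an explicit schema is shown below; the `soundfile` dependency and the file path are illustrative assumptions, and the audio is assumed to already be 16 kHz mono.
+
+```python
+import soundfile as sf  # assumed audio reader, not bundled with Spark NLP
+from pyspark.sql.types import ArrayType, FloatType, StructField, StructType
+
+# `spark` is the active SparkSession, e.g. from sparknlp.start()
+raw, sampling_rate = sf.read("swedish_sample.wav")  # hypothetical 16 kHz mono file
+schema = StructType([StructField("audio_content", ArrayType(FloatType()))])
+data = spark.createDataFrame([([float(x) for x in raw],)], schema=schema)
+
+# After running the pipeline from the snippet above, the transcription can be read with:
+pipelineDF.select("text.result").show(truncate=False)
+```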
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_qisan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|sv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/qisan/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_rishita25_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_rishita25_en.md new file mode 100644 index 00000000000000..ce00b1b0c32ba0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_rishita25_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_hindi_rishita25 WhisperForCTC from rishita25 +author: John Snow Labs +name: whisper_small_hindi_rishita25 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_rishita25` is a English model originally trained by rishita25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_rishita25_en_5.5.0_3.0_1726892655268.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_rishita25_en_5.5.0_3.0_1726892655268.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_hindi_rishita25","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// "data" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_rishita25", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_rishita25| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/rishita25/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_rishita25_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_rishita25_pipeline_en.md new file mode 100644 index 00000000000000..97199d078bf54a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_rishita25_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_hindi_rishita25_pipeline pipeline WhisperForCTC from rishita25 +author: John Snow Labs +name: whisper_small_hindi_rishita25_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_rishita25_pipeline` is a English model originally trained by rishita25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_rishita25_pipeline_en_5.5.0_3.0_1726892737210.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_rishita25_pipeline_en_5.5.0_3.0_1726892737210.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_small_hindi_rishita25_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_small_hindi_rishita25_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_rishita25_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/rishita25/whisper-small-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_riteshkr_hi.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_riteshkr_hi.md new file mode 100644 index 00000000000000..42335f5b48371b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_riteshkr_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_small_hindi_riteshkr WhisperForCTC from riteshkr +author: John Snow Labs +name: whisper_small_hindi_riteshkr +date: 2024-09-21 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_riteshkr` is a Hindi model originally trained by riteshkr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_riteshkr_hi_5.5.0_3.0_1726951320112.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_riteshkr_hi_5.5.0_3.0_1726951320112.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_hindi_riteshkr","hi") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// "data" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_riteshkr", "hi")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_riteshkr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/riteshkr/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_riteshkr_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_riteshkr_pipeline_hi.md new file mode 100644 index 00000000000000..b58948df50e053 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_riteshkr_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_small_hindi_riteshkr_pipeline pipeline WhisperForCTC from riteshkr +author: John Snow Labs +name: whisper_small_hindi_riteshkr_pipeline +date: 2024-09-21 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_riteshkr_pipeline` is a Hindi model originally trained by riteshkr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_riteshkr_pipeline_hi_5.5.0_3.0_1726951408597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_riteshkr_pipeline_hi_5.5.0_3.0_1726951408597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_small_hindi_riteshkr_pipeline", lang = "hi")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_small_hindi_riteshkr_pipeline", lang = "hi")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_riteshkr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/riteshkr/whisper-small-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_wx971025_hi.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_wx971025_hi.md new file mode 100644 index 00000000000000..029ce706cfb848 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_wx971025_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_small_hindi_wx971025 WhisperForCTC from Wx971025 +author: John Snow Labs +name: whisper_small_hindi_wx971025 +date: 2024-09-21 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_wx971025` is a Hindi model originally trained by Wx971025. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_wx971025_hi_5.5.0_3.0_1726892759940.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_wx971025_hi_5.5.0_3.0_1726892759940.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_hindi_wx971025","hi") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// "data" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_wx971025", "hi")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_wx971025| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.1 GB| + +## References + +https://huggingface.co/Wx971025/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_wx971025_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_wx971025_pipeline_hi.md new file mode 100644 index 00000000000000..7ac2b6ed1984d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_wx971025_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_small_hindi_wx971025_pipeline pipeline WhisperForCTC from Wx971025 +author: John Snow Labs +name: whisper_small_hindi_wx971025_pipeline +date: 2024-09-21 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_wx971025_pipeline` is a Hindi model originally trained by Wx971025. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_wx971025_pipeline_hi_5.5.0_3.0_1726893059799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_wx971025_pipeline_hi_5.5.0_3.0_1726893059799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_small_hindi_wx971025_pipeline", lang = "hi")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_small_hindi_wx971025_pipeline", lang = "hi")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_wx971025_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.1 GB| + +## References + +https://huggingface.co/Wx971025/whisper-small-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_yash_04_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_yash_04_en.md new file mode 100644 index 00000000000000..ce70383f420b12 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_yash_04_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_hindi_yash_04 WhisperForCTC from yash-04 +author: John Snow Labs +name: whisper_small_hindi_yash_04 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_yash_04` is a English model originally trained by yash-04. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_yash_04_en_5.5.0_3.0_1726905869656.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_yash_04_en_5.5.0_3.0_1726905869656.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_hindi_yash_04","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// "data" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_yash_04", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_yash_04| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/yash-04/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_yash_04_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_yash_04_pipeline_en.md new file mode 100644 index 00000000000000..8c7ca68586eee4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_yash_04_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_hindi_yash_04_pipeline pipeline WhisperForCTC from yash-04 +author: John Snow Labs +name: whisper_small_hindi_yash_04_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_yash_04_pipeline` is a English model originally trained by yash-04. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_yash_04_pipeline_en_5.5.0_3.0_1726905951712.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_yash_04_pipeline_en_5.5.0_3.0_1726905951712.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_small_hindi_yash_04_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_small_hindi_yash_04_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_yash_04_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/yash-04/whisper-small-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hre6_nu_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hre6_nu_en.md new file mode 100644 index 00000000000000..ca1129552d45a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hre6_nu_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_hre6_nu WhisperForCTC from ntviet +author: John Snow Labs +name: whisper_small_hre6_nu +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hre6_nu` is a English model originally trained by ntviet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hre6_nu_en_5.5.0_3.0_1726910973262.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hre6_nu_en_5.5.0_3.0_1726910973262.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_hre6_nu","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// "data" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_hre6_nu", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hre6_nu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ntviet/whisper-small-hre6-nu \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hre6_nu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hre6_nu_pipeline_en.md new file mode 100644 index 00000000000000..48a60a4aa8f50c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hre6_nu_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_hre6_nu_pipeline pipeline WhisperForCTC from ntviet +author: John Snow Labs +name: whisper_small_hre6_nu_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hre6_nu_pipeline` is a English model originally trained by ntviet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hre6_nu_pipeline_en_5.5.0_3.0_1726911061967.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hre6_nu_pipeline_en_5.5.0_3.0_1726911061967.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_small_hre6_nu_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_small_hre6_nu_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hre6_nu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ntviet/whisper-small-hre6-nu + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_igbo_jamese360_pipeline_ig.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_igbo_jamese360_pipeline_ig.md new file mode 100644 index 00000000000000..1b1971295d746a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_igbo_jamese360_pipeline_ig.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Igbo whisper_small_igbo_jamese360_pipeline pipeline WhisperForCTC from jamese360 +author: John Snow Labs +name: whisper_small_igbo_jamese360_pipeline +date: 2024-09-21 +tags: [ig, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ig +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_igbo_jamese360_pipeline` is a Igbo model originally trained by jamese360. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_igbo_jamese360_pipeline_ig_5.5.0_3.0_1726892402949.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_igbo_jamese360_pipeline_ig_5.5.0_3.0_1726892402949.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_small_igbo_jamese360_pipeline", lang = "ig")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_small_igbo_jamese360_pipeline", lang = "ig")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_igbo_jamese360_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ig| +|Size:|1.7 GB| + +## References + +https://huggingface.co/jamese360/whisper-small-ig + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_indonesian_nafishzaldinanda_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_indonesian_nafishzaldinanda_en.md new file mode 100644 index 00000000000000..c5a6daf9a7a5e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_indonesian_nafishzaldinanda_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_indonesian_nafishzaldinanda WhisperForCTC from NafishZaldinanda +author: John Snow Labs +name: whisper_small_indonesian_nafishzaldinanda +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_indonesian_nafishzaldinanda` is a English model originally trained by NafishZaldinanda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_nafishzaldinanda_en_5.5.0_3.0_1726908994012.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_nafishzaldinanda_en_5.5.0_3.0_1726908994012.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_indonesian_nafishzaldinanda","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// "data" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_indonesian_nafishzaldinanda", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_indonesian_nafishzaldinanda| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/NafishZaldinanda/whisper-small-id \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_indonesian_nafishzaldinanda_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_indonesian_nafishzaldinanda_pipeline_en.md new file mode 100644 index 00000000000000..b39d3a83cce378 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_indonesian_nafishzaldinanda_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_indonesian_nafishzaldinanda_pipeline pipeline WhisperForCTC from NafishZaldinanda +author: John Snow Labs +name: whisper_small_indonesian_nafishzaldinanda_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_indonesian_nafishzaldinanda_pipeline` is a English model originally trained by NafishZaldinanda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_nafishzaldinanda_pipeline_en_5.5.0_3.0_1726909073128.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_nafishzaldinanda_pipeline_en_5.5.0_3.0_1726909073128.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_small_indonesian_nafishzaldinanda_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_small_indonesian_nafishzaldinanda_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_indonesian_nafishzaldinanda_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/NafishZaldinanda/whisper-small-id + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_indonesian_therains_id.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_indonesian_therains_id.md new file mode 100644 index 00000000000000..872499b7768d07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_indonesian_therains_id.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Indonesian whisper_small_indonesian_therains WhisperForCTC from TheRains +author: John Snow Labs +name: whisper_small_indonesian_therains +date: 2024-09-21 +tags: [id, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_indonesian_therains` is a Indonesian model originally trained by TheRains. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_therains_id_5.5.0_3.0_1726962419833.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_therains_id_5.5.0_3.0_1726962419833.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_indonesian_therains","id") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// "data" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_indonesian_therains", "id")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_indonesian_therains| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|id| +|Size:|1.7 GB| + +## References + +https://huggingface.co/TheRains/whisper-small-id \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_indonesian_therains_pipeline_id.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_indonesian_therains_pipeline_id.md new file mode 100644 index 00000000000000..08f86794c3931e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_indonesian_therains_pipeline_id.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Indonesian whisper_small_indonesian_therains_pipeline pipeline WhisperForCTC from TheRains +author: John Snow Labs +name: whisper_small_indonesian_therains_pipeline +date: 2024-09-21 +tags: [id, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_indonesian_therains_pipeline` is a Indonesian model originally trained by TheRains. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_therains_pipeline_id_5.5.0_3.0_1726962504391.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_therains_pipeline_id_5.5.0_3.0_1726962504391.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_small_indonesian_therains_pipeline", lang = "id")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_small_indonesian_therains_pipeline", lang = "id")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_indonesian_therains_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|id| +|Size:|1.7 GB| + +## References + +https://huggingface.co/TheRains/whisper-small-id + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_italian_luigisaetta_it.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_italian_luigisaetta_it.md new file mode 100644 index 00000000000000..14a5bb3b1ac888 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_italian_luigisaetta_it.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Italian whisper_small_italian_luigisaetta WhisperForCTC from luigisaetta +author: John Snow Labs +name: whisper_small_italian_luigisaetta +date: 2024-09-21 +tags: [it, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_italian_luigisaetta` is a Italian model originally trained by luigisaetta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_italian_luigisaetta_it_5.5.0_3.0_1726909270918.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_italian_luigisaetta_it_5.5.0_3.0_1726909270918.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_italian_luigisaetta","it") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// "data" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_italian_luigisaetta", "it")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_italian_luigisaetta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|it| +|Size:|1.7 GB| + +## References + +https://huggingface.co/luigisaetta/whisper-small-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_italian_luigisaetta_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_italian_luigisaetta_pipeline_it.md new file mode 100644 index 00000000000000..eef6be513b454a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_italian_luigisaetta_pipeline_it.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Italian whisper_small_italian_luigisaetta_pipeline pipeline WhisperForCTC from luigisaetta +author: John Snow Labs +name: whisper_small_italian_luigisaetta_pipeline +date: 2024-09-21 +tags: [it, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_italian_luigisaetta_pipeline` is a Italian model originally trained by luigisaetta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_italian_luigisaetta_pipeline_it_5.5.0_3.0_1726909351045.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_italian_luigisaetta_pipeline_it_5.5.0_3.0_1726909351045.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_small_italian_luigisaetta_pipeline", lang = "it")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_small_italian_luigisaetta_pipeline", lang = "it")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_italian_luigisaetta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|1.7 GB| + +## References + +https://huggingface.co/luigisaetta/whisper-small-it + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_japanese_test2_ja.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_japanese_test2_ja.md new file mode 100644 index 00000000000000..f54c790a75d07c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_japanese_test2_ja.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Japanese whisper_small_japanese_test2 WhisperForCTC from Slothful2024 +author: John Snow Labs +name: whisper_small_japanese_test2 +date: 2024-09-21 +tags: [ja, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ja +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_japanese_test2` is a Japanese model originally trained by Slothful2024. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_japanese_test2_ja_5.5.0_3.0_1726893595560.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_japanese_test2_ja_5.5.0_3.0_1726893595560.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_japanese_test2","ja") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// "data" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_japanese_test2", "ja")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_japanese_test2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ja| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Slothful2024/whisper-small-ja-test2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_japanese_test2_pipeline_ja.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_japanese_test2_pipeline_ja.md new file mode 100644 index 00000000000000..031ac58253a76e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_japanese_test2_pipeline_ja.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Japanese whisper_small_japanese_test2_pipeline pipeline WhisperForCTC from Slothful2024 +author: John Snow Labs +name: whisper_small_japanese_test2_pipeline +date: 2024-09-21 +tags: [ja, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ja +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_japanese_test2_pipeline` is a Japanese model originally trained by Slothful2024. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_japanese_test2_pipeline_ja_5.5.0_3.0_1726893677244.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_japanese_test2_pipeline_ja_5.5.0_3.0_1726893677244.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_small_japanese_test2_pipeline", lang = "ja")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_small_japanese_test2_pipeline", lang = "ja")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_japanese_test2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ja| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Slothful2024/whisper-small-ja-test2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_javanese_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_javanese_en.md new file mode 100644 index 00000000000000..b1b26044405ccb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_javanese_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_javanese WhisperForCTC from Rizka +author: John Snow Labs +name: whisper_small_javanese +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_javanese` is a English model originally trained by Rizka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_javanese_en_5.5.0_3.0_1726936800542.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_javanese_en_5.5.0_3.0_1726936800542.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_javanese","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// "data" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_javanese", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
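+
+Because fitting this pipeline only wires the stages together (no training happens), the fitted model can be persisted once with the standard Spark ML persistence API and reloaded later without downloading the pretrained weights again. A minimal sketch, with an assumed local path:
+
+```python
+from pyspark.ml import PipelineModel
+
+# Save the fitted pipeline (AudioAssembler + WhisperForCTC) to local storage
+pipelineModel.write().overwrite().save("/tmp/whisper_small_javanese_pipeline")
+
+# Later: reload it and transcribe new audio without fetching the model again
+restored = PipelineModel.load("/tmp/whisper_small_javanese_pipeline")
+restored.transform(data).select("text.result").show(truncate=False)
+```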
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_javanese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Rizka/whisper-small-jv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_korean_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_korean_test_pipeline_en.md new file mode 100644 index 00000000000000..1df7e27b76c8c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_korean_test_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_korean_test_pipeline pipeline WhisperForCTC from Rifky +author: John Snow Labs +name: whisper_small_korean_test_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_korean_test_pipeline` is a English model originally trained by Rifky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_korean_test_pipeline_en_5.5.0_3.0_1726949662917.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_korean_test_pipeline_en_5.5.0_3.0_1726949662917.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+pipeline = PretrainedPipeline("whisper_small_korean_test_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is assumed to be a DataFrame with the raw audio samples (floats) in an "audio_content" column
+val pipeline = new PretrainedPipeline("whisper_small_korean_test_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_korean_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Rifky/whisper-small-ko-test + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_korean_yfreq_hi.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_korean_yfreq_hi.md new file mode 100644 index 00000000000000..92db72bfb8ac1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_korean_yfreq_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_small_korean_yfreq WhisperForCTC from Gummybear05 +author: John Snow Labs +name: whisper_small_korean_yfreq +date: 2024-09-21 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_korean_yfreq` is a Hindi model originally trained by Gummybear05. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_korean_yfreq_hi_5.5.0_3.0_1726905254568.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_korean_yfreq_hi_5.5.0_3.0_1726905254568.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be a DataFrame holding the raw audio samples (floats)
# in a column named "audio_content".
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_korean_yfreq","hi") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is assumed to be a DataFrame holding the raw audio samples (floats)
// in a column named "audio_content".
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_korean_yfreq", "hi")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
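The example above leaves the `data` DataFrame undefined. A minimal sketch of building one from a local recording is shown below; it assumes `librosa` is available for decoding, resamples the clip to the 16 kHz rate Whisper models expect, and uses a placeholder file name.

```python
import librosa

# Placeholder path; decode and resample the clip to 16 kHz mono.
waveform, _ = librosa.load("sample.wav", sr=16000)

# One row, one column of raw float samples, matching AudioAssembler's input column.
data = spark.createDataFrame([[waveform.tolist()]]).toDF("audio_content")
```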
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_korean_yfreq| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Gummybear05/whisper-small-ko-Yfreq \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_korean_yfreq_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_korean_yfreq_pipeline_hi.md new file mode 100644 index 00000000000000..9be9662eb05848 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_korean_yfreq_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_small_korean_yfreq_pipeline pipeline WhisperForCTC from Gummybear05 +author: John Snow Labs +name: whisper_small_korean_yfreq_pipeline +date: 2024-09-21 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_korean_yfreq_pipeline` is a Hindi model originally trained by Gummybear05. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_korean_yfreq_pipeline_hi_5.5.0_3.0_1726905335726.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_korean_yfreq_pipeline_hi_5.5.0_3.0_1726905335726.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a DataFrame holding the raw audio samples (floats)
# in a column named "audio_content".
pipeline = PretrainedPipeline("whisper_small_korean_yfreq_pipeline", lang = "hi")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a DataFrame holding the raw audio samples (floats)
// in a column named "audio_content".
val pipeline = new PretrainedPipeline("whisper_small_korean_yfreq_pipeline", lang = "hi")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_korean_yfreq_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Gummybear05/whisper-small-ko-Yfreq + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_korean_young_sp_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_korean_young_sp_en.md new file mode 100644 index 00000000000000..39c22a497fa25b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_korean_young_sp_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_korean_young_sp WhisperForCTC from syp1229 +author: John Snow Labs +name: whisper_small_korean_young_sp +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_korean_young_sp` is a English model originally trained by syp1229. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_korean_young_sp_en_5.5.0_3.0_1726961757008.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_korean_young_sp_en_5.5.0_3.0_1726961757008.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be a DataFrame holding the raw audio samples (floats)
# in a column named "audio_content".
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_korean_young_sp","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is assumed to be a DataFrame holding the raw audio samples (floats)
// in a column named "audio_content".
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_korean_young_sp", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_korean_young_sp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/syp1229/whisper-small-ko-young-sp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_korean_young_sp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_korean_young_sp_pipeline_en.md new file mode 100644 index 00000000000000..8c6c854e0adc10 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_korean_young_sp_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_korean_young_sp_pipeline pipeline WhisperForCTC from syp1229 +author: John Snow Labs +name: whisper_small_korean_young_sp_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_korean_young_sp_pipeline` is a English model originally trained by syp1229. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_korean_young_sp_pipeline_en_5.5.0_3.0_1726961834834.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_korean_young_sp_pipeline_en_5.5.0_3.0_1726961834834.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a DataFrame holding the raw audio samples (floats)
# in a column named "audio_content".
pipeline = PretrainedPipeline("whisper_small_korean_young_sp_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a DataFrame holding the raw audio samples (floats)
// in a column named "audio_content".
val pipeline = new PretrainedPipeline("whisper_small_korean_young_sp_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_korean_young_sp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/syp1229/whisper-small-ko-young-sp + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_kurdish_ckb_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_kurdish_ckb_en.md new file mode 100644 index 00000000000000..ce722fffd48f92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_kurdish_ckb_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_kurdish_ckb WhisperForCTC from roshna-omer +author: John Snow Labs +name: whisper_small_kurdish_ckb +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_kurdish_ckb` is a English model originally trained by roshna-omer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_kurdish_ckb_en_5.5.0_3.0_1726950134496.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_kurdish_ckb_en_5.5.0_3.0_1726950134496.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be a DataFrame holding the raw audio samples (floats)
# in a column named "audio_content".
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_kurdish_ckb","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is assumed to be a DataFrame holding the raw audio samples (floats)
// in a column named "audio_content".
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_kurdish_ckb", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_kurdish_ckb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/roshna-omer/whisper-small-ku-ckb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_kurdish_ckb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_kurdish_ckb_pipeline_en.md new file mode 100644 index 00000000000000..4073a5537b2f32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_kurdish_ckb_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_kurdish_ckb_pipeline pipeline WhisperForCTC from roshna-omer +author: John Snow Labs +name: whisper_small_kurdish_ckb_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_kurdish_ckb_pipeline` is a English model originally trained by roshna-omer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_kurdish_ckb_pipeline_en_5.5.0_3.0_1726950220134.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_kurdish_ckb_pipeline_en_5.5.0_3.0_1726950220134.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a DataFrame holding the raw audio samples (floats)
# in a column named "audio_content".
pipeline = PretrainedPipeline("whisper_small_kurdish_ckb_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a DataFrame holding the raw audio samples (floats)
// in a column named "audio_content".
val pipeline = new PretrainedPipeline("whisper_small_kurdish_ckb_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_kurdish_ckb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/roshna-omer/whisper-small-ku-ckb + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_llm_lingo_o_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_llm_lingo_o_en.md new file mode 100644 index 00000000000000..cc73d92d21d9ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_llm_lingo_o_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_llm_lingo_o WhisperForCTC from Enagamirzayev +author: John Snow Labs +name: whisper_small_llm_lingo_o +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_llm_lingo_o` is a English model originally trained by Enagamirzayev. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_llm_lingo_o_en_5.5.0_3.0_1726891467906.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_llm_lingo_o_en_5.5.0_3.0_1726891467906.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be a DataFrame holding the raw audio samples (floats)
# in a column named "audio_content".
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_llm_lingo_o","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is assumed to be a DataFrame holding the raw audio samples (floats)
// in a column named "audio_content".
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_llm_lingo_o", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_llm_lingo_o| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Enagamirzayev/whisper-small-llm-lingo_o \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_llm_lingo_o_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_llm_lingo_o_pipeline_en.md new file mode 100644 index 00000000000000..96024e4b780b69 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_llm_lingo_o_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_llm_lingo_o_pipeline pipeline WhisperForCTC from Enagamirzayev +author: John Snow Labs +name: whisper_small_llm_lingo_o_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_llm_lingo_o_pipeline` is a English model originally trained by Enagamirzayev. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_llm_lingo_o_pipeline_en_5.5.0_3.0_1726891735513.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_llm_lingo_o_pipeline_en_5.5.0_3.0_1726891735513.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a DataFrame holding the raw audio samples (floats)
# in a column named "audio_content".
pipeline = PretrainedPipeline("whisper_small_llm_lingo_o_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a DataFrame holding the raw audio samples (floats)
// in a column named "audio_content".
val pipeline = new PretrainedPipeline("whisper_small_llm_lingo_o_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_llm_lingo_o_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Enagamirzayev/whisper-small-llm-lingo_o + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_oriya_v1_3_or.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_oriya_v1_3_or.md new file mode 100644 index 00000000000000..348d8625de6f92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_oriya_v1_3_or.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Oriya (macrolanguage) whisper_small_oriya_v1_3 WhisperForCTC from chandan3007 +author: John Snow Labs +name: whisper_small_oriya_v1_3 +date: 2024-09-21 +tags: [or, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: or +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_oriya_v1_3` is a Oriya (macrolanguage) model originally trained by chandan3007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_oriya_v1_3_or_5.5.0_3.0_1726937284746.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_oriya_v1_3_or_5.5.0_3.0_1726937284746.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be a DataFrame holding the raw audio samples (floats)
# in a column named "audio_content".
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_oriya_v1_3","or") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is assumed to be a DataFrame holding the raw audio samples (floats)
// in a column named "audio_content".
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_oriya_v1_3", "or")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_oriya_v1_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|or| +|Size:|1.7 GB| + +## References + +https://huggingface.co/chandan3007/whisper-small-oriya-v1.3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_oriya_v1_3_pipeline_or.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_oriya_v1_3_pipeline_or.md new file mode 100644 index 00000000000000..7a64dbf604e75d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_oriya_v1_3_pipeline_or.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Oriya (macrolanguage) whisper_small_oriya_v1_3_pipeline pipeline WhisperForCTC from chandan3007 +author: John Snow Labs +name: whisper_small_oriya_v1_3_pipeline +date: 2024-09-21 +tags: [or, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: or +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_oriya_v1_3_pipeline` is a Oriya (macrolanguage) model originally trained by chandan3007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_oriya_v1_3_pipeline_or_5.5.0_3.0_1726937373371.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_oriya_v1_3_pipeline_or_5.5.0_3.0_1726937373371.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a DataFrame holding the raw audio samples (floats)
# in a column named "audio_content".
pipeline = PretrainedPipeline("whisper_small_oriya_v1_3_pipeline", lang = "or")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a DataFrame holding the raw audio samples (floats)
// in a column named "audio_content".
val pipeline = new PretrainedPipeline("whisper_small_oriya_v1_3_pipeline", lang = "or")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_oriya_v1_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|or| +|Size:|1.7 GB| + +## References + +https://huggingface.co/chandan3007/whisper-small-oriya-v1.3 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_pashto_pipeline_ps.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_pashto_pipeline_ps.md new file mode 100644 index 00000000000000..9eb08520b1eb9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_pashto_pipeline_ps.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Pashto, Pushto whisper_small_pashto_pipeline pipeline WhisperForCTC from ihanif +author: John Snow Labs +name: whisper_small_pashto_pipeline +date: 2024-09-21 +tags: [ps, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ps +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_pashto_pipeline` is a Pashto, Pushto model originally trained by ihanif. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_pashto_pipeline_ps_5.5.0_3.0_1726878658350.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_pashto_pipeline_ps_5.5.0_3.0_1726878658350.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a DataFrame holding the raw audio samples (floats)
# in a column named "audio_content".
pipeline = PretrainedPipeline("whisper_small_pashto_pipeline", lang = "ps")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a DataFrame holding the raw audio samples (floats)
// in a column named "audio_content".
val pipeline = new PretrainedPipeline("whisper_small_pashto_pipeline", lang = "ps")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_pashto_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ps| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ihanif/whisper-small-ps + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_pashto_ps.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_pashto_ps.md new file mode 100644 index 00000000000000..6068830568ef69 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_pashto_ps.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Pashto, Pushto whisper_small_pashto WhisperForCTC from ihanif +author: John Snow Labs +name: whisper_small_pashto +date: 2024-09-21 +tags: [ps, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ps +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_pashto` is a Pashto, Pushto model originally trained by ihanif. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_pashto_ps_5.5.0_3.0_1726878577342.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_pashto_ps_5.5.0_3.0_1726878577342.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be a DataFrame holding the raw audio samples (floats)
# in a column named "audio_content".
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_pashto","ps") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is assumed to be a DataFrame holding the raw audio samples (floats)
// in a column named "audio_content".
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_pashto", "ps")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
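If the archive behind the Download button has already been fetched and unpacked, the annotator can usually be restored from disk instead of calling `pretrained()`; a minimal sketch follows, where the local path is only a placeholder for the extracted folder.

```python
from sparknlp.annotator import WhisperForCTC

# Placeholder path to the extracted model archive downloaded above.
speechToText = WhisperForCTC.load("/models/whisper_small_pashto_ps") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")
```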
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_pashto| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ps| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ihanif/whisper-small-ps \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_persian_farsi_1k_steps_fa.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_persian_farsi_1k_steps_fa.md new file mode 100644 index 00000000000000..c7c8c3fa49d9dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_persian_farsi_1k_steps_fa.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Persian whisper_small_persian_farsi_1k_steps WhisperForCTC from sanchit-gandhi +author: John Snow Labs +name: whisper_small_persian_farsi_1k_steps +date: 2024-09-21 +tags: [fa, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_persian_farsi_1k_steps` is a Persian model originally trained by sanchit-gandhi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_persian_farsi_1k_steps_fa_5.5.0_3.0_1726937078203.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_persian_farsi_1k_steps_fa_5.5.0_3.0_1726937078203.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be a DataFrame holding the raw audio samples (floats)
# in a column named "audio_content".
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_persian_farsi_1k_steps","fa") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is assumed to be a DataFrame holding the raw audio samples (floats)
// in a column named "audio_content".
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_persian_farsi_1k_steps", "fa")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_persian_farsi_1k_steps| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|fa| +|Size:|1.7 GB| + +## References + +https://huggingface.co/sanchit-gandhi/whisper-small-fa-1k-steps \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_persian_farsi_1k_steps_pipeline_fa.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_persian_farsi_1k_steps_pipeline_fa.md new file mode 100644 index 00000000000000..5d32ae2e096a12 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_persian_farsi_1k_steps_pipeline_fa.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Persian whisper_small_persian_farsi_1k_steps_pipeline pipeline WhisperForCTC from sanchit-gandhi +author: John Snow Labs +name: whisper_small_persian_farsi_1k_steps_pipeline +date: 2024-09-21 +tags: [fa, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_persian_farsi_1k_steps_pipeline` is a Persian model originally trained by sanchit-gandhi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_persian_farsi_1k_steps_pipeline_fa_5.5.0_3.0_1726937185115.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_persian_farsi_1k_steps_pipeline_fa_5.5.0_3.0_1726937185115.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a DataFrame holding the raw audio samples (floats)
# in a column named "audio_content".
pipeline = PretrainedPipeline("whisper_small_persian_farsi_1k_steps_pipeline", lang = "fa")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a DataFrame holding the raw audio samples (floats)
// in a column named "audio_content".
val pipeline = new PretrainedPipeline("whisper_small_persian_farsi_1k_steps_pipeline", lang = "fa")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_persian_farsi_1k_steps_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fa| +|Size:|1.7 GB| + +## References + +https://huggingface.co/sanchit-gandhi/whisper-small-fa-1k-steps + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_persian_farsi_hi.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_persian_farsi_hi.md new file mode 100644 index 00000000000000..9ef65c0326492a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_persian_farsi_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_small_persian_farsi WhisperForCTC from hubare +author: John Snow Labs +name: whisper_small_persian_farsi +date: 2024-09-21 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_persian_farsi` is a Hindi model originally trained by hubare. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_persian_farsi_hi_5.5.0_3.0_1726909994877.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_persian_farsi_hi_5.5.0_3.0_1726909994877.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be a DataFrame holding the raw audio samples (floats)
# in a column named "audio_content".
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_persian_farsi","hi") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is assumed to be a DataFrame holding the raw audio samples (floats)
// in a column named "audio_content".
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_persian_farsi", "hi")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_persian_farsi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/hubare/whisper-small-fa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_persian_farsi_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_persian_farsi_pipeline_hi.md new file mode 100644 index 00000000000000..f30efa49cc937d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_persian_farsi_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_small_persian_farsi_pipeline pipeline WhisperForCTC from hubare +author: John Snow Labs +name: whisper_small_persian_farsi_pipeline +date: 2024-09-21 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_persian_farsi_pipeline` is a Hindi model originally trained by hubare. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_persian_farsi_pipeline_hi_5.5.0_3.0_1726910084390.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_persian_farsi_pipeline_hi_5.5.0_3.0_1726910084390.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a DataFrame holding the raw audio samples (floats)
# in a column named "audio_content".
pipeline = PretrainedPipeline("whisper_small_persian_farsi_pipeline", lang = "hi")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a DataFrame holding the raw audio samples (floats)
// in a column named "audio_content".
val pipeline = new PretrainedPipeline("whisper_small_persian_farsi_pipeline", lang = "hi")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_persian_farsi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/hubare/whisper-small-fa + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_pipeline_dv.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_pipeline_dv.md new file mode 100644 index 00000000000000..fb1efb04f4f85d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_pipeline_dv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_pipeline pipeline WhisperForCTC from ClementXie +author: John Snow Labs +name: whisper_small_pipeline +date: 2024-09-21 +tags: [dv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_pipeline` is a Dhivehi, Divehi, Maldivian model originally trained by ClementXie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_pipeline_dv_5.5.0_3.0_1726950865208.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_pipeline_dv_5.5.0_3.0_1726950865208.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a DataFrame holding the raw audio samples (floats)
# in a column named "audio_content".
pipeline = PretrainedPipeline("whisper_small_pipeline", lang = "dv")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a DataFrame holding the raw audio samples (floats)
// in a column named "audio_content".
val pipeline = new PretrainedPipeline("whisper_small_pipeline", lang = "dv")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ClementXie/whisper-small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_portuguese_1_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_portuguese_1_pipeline_pt.md new file mode 100644 index 00000000000000..c18d62f711f8fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_portuguese_1_pipeline_pt.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Portuguese whisper_small_portuguese_1_pipeline pipeline WhisperForCTC from Berly00 +author: John Snow Labs +name: whisper_small_portuguese_1_pipeline +date: 2024-09-21 +tags: [pt, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_portuguese_1_pipeline` is a Portuguese model originally trained by Berly00. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_1_pipeline_pt_5.5.0_3.0_1726939156825.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_1_pipeline_pt_5.5.0_3.0_1726939156825.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a DataFrame holding the raw audio samples (floats)
# in a column named "audio_content".
pipeline = PretrainedPipeline("whisper_small_portuguese_1_pipeline", lang = "pt")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a DataFrame holding the raw audio samples (floats)
// in a column named "audio_content".
val pipeline = new PretrainedPipeline("whisper_small_portuguese_1_pipeline", lang = "pt")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_portuguese_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Berly00/whisper-small-portuguese-1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_portuguese_1_pt.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_portuguese_1_pt.md new file mode 100644 index 00000000000000..c46eddca2eab8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_portuguese_1_pt.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Portuguese whisper_small_portuguese_1 WhisperForCTC from Berly00 +author: John Snow Labs +name: whisper_small_portuguese_1 +date: 2024-09-21 +tags: [pt, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_portuguese_1` is a Portuguese model originally trained by Berly00. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_1_pt_5.5.0_3.0_1726939074785.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_1_pt_5.5.0_3.0_1726939074785.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be a DataFrame holding the raw audio samples (floats)
# in a column named "audio_content".
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_portuguese_1","pt") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is assumed to be a DataFrame holding the raw audio samples (floats)
// in a column named "audio_content".
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_portuguese_1", "pt")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
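Once fitted, the pipeline is a regular Spark ML `PipelineModel`, so it can be persisted and reloaded in later jobs without re-downloading the pretrained weights. The sketch below assumes a writable placeholder path.

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline, then restore it elsewhere.
pipelineModel.write().overwrite().save("/tmp/whisper_small_portuguese_1_model")
restored = PipelineModel.load("/tmp/whisper_small_portuguese_1_model")
restored.transform(data).select("text.result").show(truncate=False)
```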
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_portuguese_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|pt| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Berly00/whisper-small-portuguese-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_portuguese_m2laborg_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_portuguese_m2laborg_pipeline_pt.md new file mode 100644 index 00000000000000..574eb248552c9f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_portuguese_m2laborg_pipeline_pt.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Portuguese whisper_small_portuguese_m2laborg_pipeline pipeline WhisperForCTC from M2LabOrg +author: John Snow Labs +name: whisper_small_portuguese_m2laborg_pipeline +date: 2024-09-21 +tags: [pt, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_portuguese_m2laborg_pipeline` is a Portuguese model originally trained by M2LabOrg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_m2laborg_pipeline_pt_5.5.0_3.0_1726912590327.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_m2laborg_pipeline_pt_5.5.0_3.0_1726912590327.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a DataFrame holding the raw audio samples (floats)
# in a column named "audio_content".
pipeline = PretrainedPipeline("whisper_small_portuguese_m2laborg_pipeline", lang = "pt")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a DataFrame holding the raw audio samples (floats)
// in a column named "audio_content".
val pipeline = new PretrainedPipeline("whisper_small_portuguese_m2laborg_pipeline", lang = "pt")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_portuguese_m2laborg_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|1.7 GB| + +## References + +https://huggingface.co/M2LabOrg/whisper-small-pt + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_portuguese_m2laborg_pt.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_portuguese_m2laborg_pt.md new file mode 100644 index 00000000000000..0a0532ebd3ec9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_portuguese_m2laborg_pt.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Portuguese whisper_small_portuguese_m2laborg WhisperForCTC from M2LabOrg +author: John Snow Labs +name: whisper_small_portuguese_m2laborg +date: 2024-09-21 +tags: [pt, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_portuguese_m2laborg` is a Portuguese model originally trained by M2LabOrg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_m2laborg_pt_5.5.0_3.0_1726912506253.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_m2laborg_pt_5.5.0_3.0_1726912506253.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be a DataFrame holding the raw audio samples (floats)
# in a column named "audio_content".
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_portuguese_m2laborg","pt") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is assumed to be a DataFrame holding the raw audio samples (floats)
// in a column named "audio_content".
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_portuguese_m2laborg", "pt")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_portuguese_m2laborg| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|pt| +|Size:|1.7 GB| + +## References + +https://huggingface.co/M2LabOrg/whisper-small-pt \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_rixvox_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_rixvox_en.md new file mode 100644 index 00000000000000..dd94973679657c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_rixvox_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_rixvox WhisperForCTC from KBLab +author: John Snow Labs +name: whisper_small_rixvox +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_rixvox` is a English model originally trained by KBLab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_rixvox_en_5.5.0_3.0_1726893207906.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_rixvox_en_5.5.0_3.0_1726893207906.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be a DataFrame holding the raw audio samples (floats)
# in a column named "audio_content".
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_rixvox","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is assumed to be a DataFrame holding the raw audio samples (floats)
// in a column named "audio_content".
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_rixvox", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
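When many clips are stacked as rows of `data`, throughput can often be tuned through the annotator's batch size. The value below is purely illustrative, and the setter is assumed from the batched-annotate interface shared by Spark NLP transformer annotators.

```python
# Illustrative batch size; larger values trade memory for throughput.
speechToText = WhisperForCTC.pretrained("whisper_small_rixvox","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text") \
    .setBatchSize(4)
```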
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_rixvox| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.6 GB| + +## References + +https://huggingface.co/KBLab/whisper-small-rixvox \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_rixvox_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_rixvox_pipeline_en.md new file mode 100644 index 00000000000000..522b2a853e11fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_rixvox_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_rixvox_pipeline pipeline WhisperForCTC from KBLab +author: John Snow Labs +name: whisper_small_rixvox_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_rixvox_pipeline` is a English model originally trained by KBLab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_rixvox_pipeline_en_5.5.0_3.0_1726893336587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_rixvox_pipeline_en_5.5.0_3.0_1726893336587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a DataFrame holding the raw audio samples (floats)
# in a column named "audio_content".
pipeline = PretrainedPipeline("whisper_small_rixvox_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a DataFrame holding the raw audio samples (floats)
// in a column named "audio_content".
val pipeline = new PretrainedPipeline("whisper_small_rixvox_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
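The stages listed under Included Models below can also be inspected programmatically. This sketch assumes the `PretrainedPipeline` wrapper exposes the underlying Spark ML model through its `model` attribute.

```python
# Print the class names of the wrapped stages
# (expected: AudioAssembler followed by WhisperForCTC).
for stage in pipeline.model.stages:
    print(type(stage).__name__)
```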
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_rixvox_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.6 GB| + +## References + +https://huggingface.co/KBLab/whisper-small-rixvox + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_romanian_cv11_pipeline_ro.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_romanian_cv11_pipeline_ro.md new file mode 100644 index 00000000000000..70ba7ec08eac1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_romanian_cv11_pipeline_ro.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Moldavian, Moldovan, Romanian whisper_small_romanian_cv11_pipeline pipeline WhisperForCTC from mikr +author: John Snow Labs +name: whisper_small_romanian_cv11_pipeline +date: 2024-09-21 +tags: [ro, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ro +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_romanian_cv11_pipeline` is a Moldavian, Moldovan, Romanian model originally trained by mikr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_romanian_cv11_pipeline_ro_5.5.0_3.0_1726962856404.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_romanian_cv11_pipeline_ro_5.5.0_3.0_1726962856404.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_romanian_cv11_pipeline", lang = "ro")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_romanian_cv11_pipeline", lang = "ro")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_romanian_cv11_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ro| +|Size:|1.1 GB| + +## References + +https://huggingface.co/mikr/whisper-small-ro-cv11 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_romanian_cv11_ro.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_romanian_cv11_ro.md new file mode 100644 index 00000000000000..0d89e05c471c8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_romanian_cv11_ro.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Moldavian, Moldovan, Romanian whisper_small_romanian_cv11 WhisperForCTC from mikr +author: John Snow Labs +name: whisper_small_romanian_cv11 +date: 2024-09-21 +tags: [ro, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ro +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_romanian_cv11` is a Moldavian, Moldovan, Romanian model originally trained by mikr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_romanian_cv11_ro_5.5.0_3.0_1726962570615.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_romanian_cv11_ro_5.5.0_3.0_1726962570615.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_romanian_cv11","ro") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_romanian_cv11", "ro")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
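
The code above refers to a `data` DataFrame that is not defined in the snippet. One possible way to build it, sketched here under the assumption of a 16 kHz mono WAV at a hypothetical path `sample.wav` and an available `librosa` installation:

```python
import librosa

# Read the recording and resample to the 16 kHz mono format Whisper expects
audio, sample_rate = librosa.load("sample.wav", sr=16000)

# Spark NLP's AudioAssembler consumes a column of plain float values
raw_floats = [float(x) for x in audio]
data = spark.createDataFrame([[raw_floats]]).toDF("audio_content")
```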
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_romanian_cv11| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ro| +|Size:|1.1 GB| + +## References + +https://huggingface.co/mikr/whisper-small-ro-cv11 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_rus_kainet_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_rus_kainet_en.md new file mode 100644 index 00000000000000..a76d64d210cfc2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_rus_kainet_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_rus_kainet WhisperForCTC from Kainet +author: John Snow Labs +name: whisper_small_rus_kainet +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_rus_kainet` is a English model originally trained by Kainet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_rus_kainet_en_5.5.0_3.0_1726891742330.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_rus_kainet_en_5.5.0_3.0_1726891742330.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_rus_kainet","en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_rus_kainet", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_rus_kainet| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Kainet/whisper-small-rus \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_rus_kainet_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_rus_kainet_pipeline_en.md new file mode 100644 index 00000000000000..1384ecdc12a09d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_rus_kainet_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_rus_kainet_pipeline pipeline WhisperForCTC from Kainet +author: John Snow Labs +name: whisper_small_rus_kainet_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_rus_kainet_pipeline` is a English model originally trained by Kainet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_rus_kainet_pipeline_en_5.5.0_3.0_1726891828145.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_rus_kainet_pipeline_en_5.5.0_3.0_1726891828145.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_rus_kainet_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_rus_kainet_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_rus_kainet_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Kainet/whisper-small-rus + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_russian_lorenzoncina_ru.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_russian_lorenzoncina_ru.md new file mode 100644 index 00000000000000..b96038ca06d2c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_russian_lorenzoncina_ru.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Russian whisper_small_russian_lorenzoncina WhisperForCTC from lorenzoncina +author: John Snow Labs +name: whisper_small_russian_lorenzoncina +date: 2024-09-21 +tags: [ru, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_russian_lorenzoncina` is a Russian model originally trained by lorenzoncina. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_russian_lorenzoncina_ru_5.5.0_3.0_1726939249065.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_russian_lorenzoncina_ru_5.5.0_3.0_1726939249065.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_russian_lorenzoncina","ru") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_russian_lorenzoncina", "ru")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_russian_lorenzoncina| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ru| +|Size:|1.7 GB| + +## References + +https://huggingface.co/lorenzoncina/whisper-small-ru \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_russian_v4_pipeline_ru.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_russian_v4_pipeline_ru.md new file mode 100644 index 00000000000000..086af5493bd7e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_russian_v4_pipeline_ru.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Russian whisper_small_russian_v4_pipeline pipeline WhisperForCTC from sam-alavardo-1980 +author: John Snow Labs +name: whisper_small_russian_v4_pipeline +date: 2024-09-21 +tags: [ru, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_russian_v4_pipeline` is a Russian model originally trained by sam-alavardo-1980. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_russian_v4_pipeline_ru_5.5.0_3.0_1726893019344.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_russian_v4_pipeline_ru_5.5.0_3.0_1726893019344.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_russian_v4_pipeline", lang = "ru")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_russian_v4_pipeline", lang = "ru")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_russian_v4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ru| +|Size:|1.7 GB| + +## References + +https://huggingface.co/sam-alavardo-1980/whisper-small-ru-v4 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_russian_v4_ru.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_russian_v4_ru.md new file mode 100644 index 00000000000000..dd1f36e5075ece --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_russian_v4_ru.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Russian whisper_small_russian_v4 WhisperForCTC from sam-alavardo-1980 +author: John Snow Labs +name: whisper_small_russian_v4 +date: 2024-09-21 +tags: [ru, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_russian_v4` is a Russian model originally trained by sam-alavardo-1980. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_russian_v4_ru_5.5.0_3.0_1726892939866.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_russian_v4_ru_5.5.0_3.0_1726892939866.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_russian_v4","ru") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_russian_v4", "ru")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
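
Because the snippet above sets the Whisper output column to `text`, the transcriptions can be pulled out of the resulting annotations with a plain Spark select, for example:

```python
# Each row of "text" is an array of annotations; "result" holds the transcribed string
pipelineDF.selectExpr("text.result").show(truncate=False)
```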
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_russian_v4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ru| +|Size:|1.7 GB| + +## References + +https://huggingface.co/sam-alavardo-1980/whisper-small-ru-v4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_slovenian_pipeline_sl.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_slovenian_pipeline_sl.md new file mode 100644 index 00000000000000..1763d0090bf604 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_slovenian_pipeline_sl.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Slovenian whisper_small_slovenian_pipeline pipeline WhisperForCTC from samolego +author: John Snow Labs +name: whisper_small_slovenian_pipeline +date: 2024-09-21 +tags: [sl, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: sl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_slovenian_pipeline` is a Slovenian model originally trained by samolego. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_slovenian_pipeline_sl_5.5.0_3.0_1726905372850.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_slovenian_pipeline_sl_5.5.0_3.0_1726905372850.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_slovenian_pipeline", lang = "sl")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_slovenian_pipeline", lang = "sl")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_slovenian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sl| +|Size:|1.7 GB| + +## References + +https://huggingface.co/samolego/whisper-small-slovenian + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_slovenian_sl.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_slovenian_sl.md new file mode 100644 index 00000000000000..d410b8204450be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_slovenian_sl.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Slovenian whisper_small_slovenian WhisperForCTC from samolego +author: John Snow Labs +name: whisper_small_slovenian +date: 2024-09-21 +tags: [sl, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: sl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_slovenian` is a Slovenian model originally trained by samolego. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_slovenian_sl_5.5.0_3.0_1726905289601.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_slovenian_sl_5.5.0_3.0_1726905289601.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_slovenian","sl") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_slovenian", "sl")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_slovenian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|sl| +|Size:|1.7 GB| + +## References + +https://huggingface.co/samolego/whisper-small-slovenian \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_spanish_gonznm_es.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_spanish_gonznm_es.md new file mode 100644 index 00000000000000..27ae2f6cec10cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_spanish_gonznm_es.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Castilian, Spanish whisper_small_spanish_gonznm WhisperForCTC from gonznm +author: John Snow Labs +name: whisper_small_spanish_gonznm +date: 2024-09-21 +tags: [es, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_spanish_gonznm` is a Castilian, Spanish model originally trained by gonznm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_spanish_gonznm_es_5.5.0_3.0_1726935716689.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_spanish_gonznm_es_5.5.0_3.0_1726935716689.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_spanish_gonznm","es") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_spanish_gonznm", "es")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_spanish_gonznm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|es| +|Size:|1.7 GB| + +## References + +https://huggingface.co/gonznm/whisper-small-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_spanish_gonznm_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_spanish_gonznm_pipeline_es.md new file mode 100644 index 00000000000000..30d0d673b288fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_spanish_gonznm_pipeline_es.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Castilian, Spanish whisper_small_spanish_gonznm_pipeline pipeline WhisperForCTC from gonznm +author: John Snow Labs +name: whisper_small_spanish_gonznm_pipeline +date: 2024-09-21 +tags: [es, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_spanish_gonznm_pipeline` is a Castilian, Spanish model originally trained by gonznm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_spanish_gonznm_pipeline_es_5.5.0_3.0_1726935806907.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_spanish_gonznm_pipeline_es_5.5.0_3.0_1726935806907.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_spanish_gonznm_pipeline", lang = "es")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_spanish_gonznm_pipeline", lang = "es")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_spanish_gonznm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|1.7 GB| + +## References + +https://huggingface.co/gonznm/whisper-small-es + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swe_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swe_en.md new file mode 100644 index 00000000000000..75ef18ae704e3d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swe_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_swe WhisperForCTC from Alexao +author: John Snow Labs +name: whisper_small_swe +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_swe` is a English model originally trained by Alexao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_swe_en_5.5.0_3.0_1726891325497.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_swe_en_5.5.0_3.0_1726891325497.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_swe","en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_swe", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
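
The fitted pipeline is a standard Spark ML `PipelineModel`, so it can be persisted and reloaded with the usual Spark ML mechanisms. A brief sketch (the path below is only a placeholder):

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline once, then reload it later without refitting
pipelineModel.write().overwrite().save("/tmp/whisper_small_swe_pipeline_model")
restored = PipelineModel.load("/tmp/whisper_small_swe_pipeline_model")
restoredDF = restored.transform(data)
```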
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_swe| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Alexao/whisper-small-swe \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swe_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swe_pipeline_en.md new file mode 100644 index 00000000000000..b150310dfd5c4a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swe_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_swe_pipeline pipeline WhisperForCTC from Alexao +author: John Snow Labs +name: whisper_small_swe_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_swe_pipeline` is a English model originally trained by Alexao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_swe_pipeline_en_5.5.0_3.0_1726891405702.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_swe_pipeline_en_5.5.0_3.0_1726891405702.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_swe_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_swe_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_swe_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Alexao/whisper-small-swe + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swedish_bambara_sv.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swedish_bambara_sv.md new file mode 100644 index 00000000000000..30163795135f92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swedish_bambara_sv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Swedish whisper_small_swedish_bambara WhisperForCTC from birgermoell +author: John Snow Labs +name: whisper_small_swedish_bambara +date: 2024-09-21 +tags: [sv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_swedish_bambara` is a Swedish model originally trained by birgermoell. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_swedish_bambara_sv_5.5.0_3.0_1726912048541.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_swedish_bambara_sv_5.5.0_3.0_1726912048541.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_swedish_bambara","sv") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_swedish_bambara", "sv")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_swedish_bambara| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|sv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/birgermoell/whisper-small-sv-bm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swedish_rscolati_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swedish_rscolati_en.md new file mode 100644 index 00000000000000..5ad26d7ddaea7c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swedish_rscolati_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_swedish_rscolati WhisperForCTC from rscolati +author: John Snow Labs +name: whisper_small_swedish_rscolati +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_swedish_rscolati` is a English model originally trained by rscolati. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_swedish_rscolati_en_5.5.0_3.0_1726893185066.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_swedish_rscolati_en_5.5.0_3.0_1726893185066.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_swedish_rscolati","en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_swedish_rscolati", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_swedish_rscolati| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/rscolati/whisper-small-sv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swedish_rscolati_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swedish_rscolati_pipeline_en.md new file mode 100644 index 00000000000000..e1e36433d208d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swedish_rscolati_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_swedish_rscolati_pipeline pipeline WhisperForCTC from rscolati +author: John Snow Labs +name: whisper_small_swedish_rscolati_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_swedish_rscolati_pipeline` is a English model originally trained by rscolati. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_swedish_rscolati_pipeline_en_5.5.0_3.0_1726893273793.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_swedish_rscolati_pipeline_en_5.5.0_3.0_1726893273793.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_swedish_rscolati_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_swedish_rscolati_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_swedish_rscolati_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/rscolati/whisper-small-sv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swedish_se2_pipeline_sv.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swedish_se2_pipeline_sv.md new file mode 100644 index 00000000000000..96ca322b2fc2ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swedish_se2_pipeline_sv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Swedish whisper_small_swedish_se2_pipeline pipeline WhisperForCTC from woberg +author: John Snow Labs +name: whisper_small_swedish_se2_pipeline +date: 2024-09-21 +tags: [sv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_swedish_se2_pipeline` is a Swedish model originally trained by woberg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_swedish_se2_pipeline_sv_5.5.0_3.0_1726895005301.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_swedish_se2_pipeline_sv_5.5.0_3.0_1726895005301.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_swedish_se2_pipeline", lang = "sv")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_swedish_se2_pipeline", lang = "sv")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_swedish_se2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/woberg/whisper-small-sv-SE2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swedish_se2_sv.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swedish_se2_sv.md new file mode 100644 index 00000000000000..34d17ff0bf3c48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swedish_se2_sv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Swedish whisper_small_swedish_se2 WhisperForCTC from woberg +author: John Snow Labs +name: whisper_small_swedish_se2 +date: 2024-09-21 +tags: [sv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_swedish_se2` is a Swedish model originally trained by woberg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_swedish_se2_sv_5.5.0_3.0_1726894924797.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_swedish_se2_sv_5.5.0_3.0_1726894924797.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_swedish_se2","sv") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_swedish_se2", "sv")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_swedish_se2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|sv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/woberg/whisper-small-sv-SE2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_telugu_hi.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_telugu_hi.md new file mode 100644 index 00000000000000..81e55ee1942df0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_telugu_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_small_telugu WhisperForCTC from Mukund017 +author: John Snow Labs +name: whisper_small_telugu +date: 2024-09-21 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_telugu` is a Hindi model originally trained by Mukund017. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_telugu_hi_5.5.0_3.0_1726948967575.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_telugu_hi_5.5.0_3.0_1726948967575.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_telugu","hi") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_telugu", "hi")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_telugu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Mukund017/whisper-small-telugu \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_telugu_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_telugu_pipeline_hi.md new file mode 100644 index 00000000000000..2c07de19d1f2eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_telugu_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_small_telugu_pipeline pipeline WhisperForCTC from Mukund017 +author: John Snow Labs +name: whisper_small_telugu_pipeline +date: 2024-09-21 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_telugu_pipeline` is a Hindi model originally trained by Mukund017. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_telugu_pipeline_hi_5.5.0_3.0_1726949049377.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_telugu_pipeline_hi_5.5.0_3.0_1726949049377.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_telugu_pipeline", lang = "hi")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_telugu_pipeline", lang = "hi")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_telugu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Mukund017/whisper-small-telugu + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_turkish_sgangireddy_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_turkish_sgangireddy_pipeline_tr.md new file mode 100644 index 00000000000000..59e9b4282dd1b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_turkish_sgangireddy_pipeline_tr.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Turkish whisper_small_turkish_sgangireddy_pipeline pipeline WhisperForCTC from sgangireddy +author: John Snow Labs +name: whisper_small_turkish_sgangireddy_pipeline +date: 2024-09-21 +tags: [tr, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_turkish_sgangireddy_pipeline` is a Turkish model originally trained by sgangireddy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_sgangireddy_pipeline_tr_5.5.0_3.0_1726891695394.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_sgangireddy_pipeline_tr_5.5.0_3.0_1726891695394.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_turkish_sgangireddy_pipeline", lang = "tr")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_turkish_sgangireddy_pipeline", lang = "tr")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_turkish_sgangireddy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|1.7 GB| + +## References + +https://huggingface.co/sgangireddy/whisper-small-tr + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_turkish_sgangireddy_tr.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_turkish_sgangireddy_tr.md new file mode 100644 index 00000000000000..a779232c794317 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_turkish_sgangireddy_tr.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Turkish whisper_small_turkish_sgangireddy WhisperForCTC from sgangireddy +author: John Snow Labs +name: whisper_small_turkish_sgangireddy +date: 2024-09-21 +tags: [tr, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_turkish_sgangireddy` is a Turkish model originally trained by sgangireddy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_sgangireddy_tr_5.5.0_3.0_1726891613492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_sgangireddy_tr_5.5.0_3.0_1726891613492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_turkish_sgangireddy","tr") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_turkish_sgangireddy", "tr")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_turkish_sgangireddy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|tr| +|Size:|1.7 GB| + +## References + +https://huggingface.co/sgangireddy/whisper-small-tr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_uzbek_gitnazarov_pipeline_uz.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_uzbek_gitnazarov_pipeline_uz.md new file mode 100644 index 00000000000000..4eb4d434931966 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_uzbek_gitnazarov_pipeline_uz.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Uzbek whisper_small_uzbek_gitnazarov_pipeline pipeline WhisperForCTC from GitNazarov +author: John Snow Labs +name: whisper_small_uzbek_gitnazarov_pipeline +date: 2024-09-21 +tags: [uz, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: uz +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_uzbek_gitnazarov_pipeline` is a Uzbek model originally trained by GitNazarov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_uzbek_gitnazarov_pipeline_uz_5.5.0_3.0_1726951142080.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_uzbek_gitnazarov_pipeline_uz_5.5.0_3.0_1726951142080.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_uzbek_gitnazarov_pipeline", lang = "uz")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with raw audio floats in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_uzbek_gitnazarov_pipeline", lang = "uz")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_uzbek_gitnazarov_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|uz| +|Size:|1.1 GB| + +## References + +https://huggingface.co/GitNazarov/whisper-small-uz + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_uzbek_gitnazarov_uz.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_uzbek_gitnazarov_uz.md new file mode 100644 index 00000000000000..009a0b8624e863 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_uzbek_gitnazarov_uz.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Uzbek whisper_small_uzbek_gitnazarov WhisperForCTC from GitNazarov +author: John Snow Labs +name: whisper_small_uzbek_gitnazarov +date: 2024-09-21 +tags: [uz, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: uz +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_uzbek_gitnazarov` is a Uzbek model originally trained by GitNazarov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_uzbek_gitnazarov_uz_5.5.0_3.0_1726950829188.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_uzbek_gitnazarov_uz_5.5.0_3.0_1726950829188.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_uzbek_gitnazarov","uz") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_uzbek_gitnazarov", "uz")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
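
Both snippets above expect a `data` DataFrame of raw audio samples and leave its creation to the reader. A minimal sketch is shown below; `librosa` and the `sample.wav` path are assumptions made only for this example, and the transcription can be read from the `text` column.

```python
import sparknlp
import librosa  # assumption: used here only to decode audio into 16 kHz float samples

spark = sparknlp.start()

# Placeholder path; resampled to the 16 kHz mono input Whisper models expect
audio, _ = librosa.load("sample.wav", sr=16000, mono=True)
data = spark.createDataFrame([(audio.tolist(),)], ["audio_content"])

# Reusing the pipeline defined in the snippet above
pipelineDF = pipeline.fit(data).transform(data)
pipelineDF.select("text.result").show(truncate = False)
```
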
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_uzbek_gitnazarov| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|uz| +|Size:|1.1 GB| + +## References + +https://huggingface.co/GitNazarov/whisper-small-uz \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_vasi001_hi.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_vasi001_hi.md new file mode 100644 index 00000000000000..e12901fbaaf9e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_vasi001_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_small_vasi001 WhisperForCTC from Vasi001 +author: John Snow Labs +name: whisper_small_vasi001 +date: 2024-09-21 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_vasi001` is a Hindi model originally trained by Vasi001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_vasi001_hi_5.5.0_3.0_1726937653820.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_vasi001_hi_5.5.0_3.0_1726937653820.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_vasi001","hi") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_vasi001", "hi")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_vasi001| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Vasi001/whisper-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_vietnamese_v4_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_vietnamese_v4_en.md new file mode 100644 index 00000000000000..09477aae8d8251 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_vietnamese_v4_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_vietnamese_v4 WhisperForCTC from thanhduycao +author: John Snow Labs +name: whisper_small_vietnamese_v4 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_vietnamese_v4` is a English model originally trained by thanhduycao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_vietnamese_v4_en_5.5.0_3.0_1726893675526.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_vietnamese_v4_en_5.5.0_3.0_1726893675526.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_vietnamese_v4","en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_vietnamese_v4", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
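
Because `pipeline.fit(data)` above returns a plain Spark ML `PipelineModel`, it can be saved once and reloaded later so the pretrained weights are not fetched again. This is standard Spark ML behaviour rather than anything specific to this model; the path below is only a placeholder.

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline from the snippet above (placeholder path)
pipelineModel.write().overwrite().save("/tmp/whisper_small_vietnamese_v4_pipeline")

# Later: reload and transcribe new audio without re-downloading the model
reloaded = PipelineModel.load("/tmp/whisper_small_vietnamese_v4_pipeline")
reloaded.transform(data).select("text.result").show(truncate = False)
```
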
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_vietnamese_v4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/thanhduycao/whisper-small-vi-v4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_vietnamese_v4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_vietnamese_v4_pipeline_en.md new file mode 100644 index 00000000000000..828b119c573e0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_vietnamese_v4_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_vietnamese_v4_pipeline pipeline WhisperForCTC from thanhduycao +author: John Snow Labs +name: whisper_small_vietnamese_v4_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_vietnamese_v4_pipeline` is a English model originally trained by thanhduycao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_vietnamese_v4_pipeline_en_5.5.0_3.0_1726893767941.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_vietnamese_v4_pipeline_en_5.5.0_3.0_1726893767941.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
pipeline = PretrainedPipeline("whisper_small_vietnamese_v4_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
val pipeline = new PretrainedPipeline("whisper_small_vietnamese_v4_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_vietnamese_v4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/thanhduycao/whisper-small-vi-v4 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_withaq_v1_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_withaq_v1_en.md new file mode 100644 index 00000000000000..3f62c0f6fce62e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_withaq_v1_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_withaq_v1 WhisperForCTC from naiftamia +author: John Snow Labs +name: whisper_small_withaq_v1 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_withaq_v1` is a English model originally trained by naiftamia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_withaq_v1_en_5.5.0_3.0_1726905338968.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_withaq_v1_en_5.5.0_3.0_1726905338968.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_withaq_v1","en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_withaq_v1", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_withaq_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/naiftamia/whisper-small-Withaq-V1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_withaq_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_withaq_v1_pipeline_en.md new file mode 100644 index 00000000000000..e8b2cc421c6a65 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_withaq_v1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_withaq_v1_pipeline pipeline WhisperForCTC from naiftamia +author: John Snow Labs +name: whisper_small_withaq_v1_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_withaq_v1_pipeline` is a English model originally trained by naiftamia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_withaq_v1_pipeline_en_5.5.0_3.0_1726905429360.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_withaq_v1_pipeline_en_5.5.0_3.0_1726905429360.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
pipeline = PretrainedPipeline("whisper_small_withaq_v1_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
val pipeline = new PretrainedPipeline("whisper_small_withaq_v1_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_withaq_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/naiftamia/whisper-small-Withaq-V1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_testing_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_testing_en.md new file mode 100644 index 00000000000000..8bc3f710a3bd28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_testing_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_testing WhisperForCTC from SamagraDataGov +author: John Snow Labs +name: whisper_testing +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_testing` is a English model originally trained by SamagraDataGov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_testing_en_5.5.0_3.0_1726878565680.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_testing_en_5.5.0_3.0_1726878565680.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_testing","en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_testing", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
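
For throughput, the annotator supports the usual batched-annotation setting shared by Spark NLP deep-learning annotators. The value below is only an illustrative guess for GPU-backed clusters, not a recommendation from the model author.

```python
# Sketch: transcribe several audio rows per batch (illustrative value, tune for your hardware)
speechToText = WhisperForCTC.pretrained("whisper_testing", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text") \
    .setBatchSize(8)
```
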
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_testing| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|242.8 MB| + +## References + +https://huggingface.co/SamagraDataGov/whisper-testing \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_testing_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_testing_pipeline_en.md new file mode 100644 index 00000000000000..0bfbc283c62be1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_testing_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_testing_pipeline pipeline WhisperForCTC from SamagraDataGov +author: John Snow Labs +name: whisper_testing_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_testing_pipeline` is a English model originally trained by SamagraDataGov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_testing_pipeline_en_5.5.0_3.0_1726878634881.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_testing_pipeline_en_5.5.0_3.0_1726878634881.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
pipeline = PretrainedPipeline("whisper_testing_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
val pipeline = new PretrainedPipeline("whisper_testing_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_testing_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|242.9 MB| + +## References + +https://huggingface.co/SamagraDataGov/whisper-testing + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny2_italian_it.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny2_italian_it.md new file mode 100644 index 00000000000000..6ebf3305c49ccc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny2_italian_it.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Italian whisper_tiny2_italian WhisperForCTC from luigisaetta +author: John Snow Labs +name: whisper_tiny2_italian +date: 2024-09-21 +tags: [it, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny2_italian` is a Italian model originally trained by luigisaetta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny2_italian_it_5.5.0_3.0_1726892729595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny2_italian_it_5.5.0_3.0_1726892729595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny2_italian","it") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny2_italian", "it")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny2_italian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|it| +|Size:|390.7 MB| + +## References + +https://huggingface.co/luigisaetta/whisper-tiny2-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny2_italian_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny2_italian_pipeline_it.md new file mode 100644 index 00000000000000..a44c7efd40bd44 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny2_italian_pipeline_it.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Italian whisper_tiny2_italian_pipeline pipeline WhisperForCTC from luigisaetta +author: John Snow Labs +name: whisper_tiny2_italian_pipeline +date: 2024-09-21 +tags: [it, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny2_italian_pipeline` is a Italian model originally trained by luigisaetta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny2_italian_pipeline_it_5.5.0_3.0_1726892749195.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny2_italian_pipeline_it_5.5.0_3.0_1726892749195.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
pipeline = PretrainedPipeline("whisper_tiny2_italian_pipeline", lang = "it")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
val pipeline = new PretrainedPipeline("whisper_tiny2_italian_pipeline", lang = "it")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny2_italian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|390.8 MB| + +## References + +https://huggingface.co/luigisaetta/whisper-tiny2-it + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_chinese_hk_youtube_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_chinese_hk_youtube_en.md new file mode 100644 index 00000000000000..2053b1e49d8233 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_chinese_hk_youtube_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_chinese_hk_youtube WhisperForCTC from alex-tecky +author: John Snow Labs +name: whisper_tiny_chinese_hk_youtube +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_chinese_hk_youtube` is a English model originally trained by alex-tecky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_chinese_hk_youtube_en_5.5.0_3.0_1726936202633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_chinese_hk_youtube_en_5.5.0_3.0_1726936202633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_chinese_hk_youtube","en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_chinese_hk_youtube", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_chinese_hk_youtube| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.8 MB| + +## References + +https://huggingface.co/alex-tecky/whisper-tiny-zh-HK-youtube \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_chinese_hk_youtube_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_chinese_hk_youtube_pipeline_en.md new file mode 100644 index 00000000000000..0f905e35535067 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_chinese_hk_youtube_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_chinese_hk_youtube_pipeline pipeline WhisperForCTC from alex-tecky +author: John Snow Labs +name: whisper_tiny_chinese_hk_youtube_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_chinese_hk_youtube_pipeline` is a English model originally trained by alex-tecky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_chinese_hk_youtube_pipeline_en_5.5.0_3.0_1726936221354.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_chinese_hk_youtube_pipeline_en_5.5.0_3.0_1726936221354.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
pipeline = PretrainedPipeline("whisper_tiny_chinese_hk_youtube_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
val pipeline = new PretrainedPipeline("whisper_tiny_chinese_hk_youtube_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_chinese_hk_youtube_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/alex-tecky/whisper-tiny-zh-HK-youtube + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_cv16_hungarian_final_hu.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_cv16_hungarian_final_hu.md new file mode 100644 index 00000000000000..2a6d1be949cbf9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_cv16_hungarian_final_hu.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hungarian whisper_tiny_cv16_hungarian_final WhisperForCTC from Hungarians +author: John Snow Labs +name: whisper_tiny_cv16_hungarian_final +date: 2024-09-21 +tags: [hu, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_cv16_hungarian_final` is a Hungarian model originally trained by Hungarians. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_cv16_hungarian_final_hu_5.5.0_3.0_1726892591598.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_cv16_hungarian_final_hu_5.5.0_3.0_1726892591598.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_cv16_hungarian_final","hu") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_cv16_hungarian_final", "hu")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_cv16_hungarian_final| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hu| +|Size:|385.4 MB| + +## References + +https://huggingface.co/Hungarians/whisper-tiny-cv16-hu-final \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_cv16_hungarian_final_pipeline_hu.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_cv16_hungarian_final_pipeline_hu.md new file mode 100644 index 00000000000000..28861ff9c7ca4d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_cv16_hungarian_final_pipeline_hu.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hungarian whisper_tiny_cv16_hungarian_final_pipeline pipeline WhisperForCTC from Hungarians +author: John Snow Labs +name: whisper_tiny_cv16_hungarian_final_pipeline +date: 2024-09-21 +tags: [hu, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_cv16_hungarian_final_pipeline` is a Hungarian model originally trained by Hungarians. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_cv16_hungarian_final_pipeline_hu_5.5.0_3.0_1726892615254.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_cv16_hungarian_final_pipeline_hu_5.5.0_3.0_1726892615254.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
pipeline = PretrainedPipeline("whisper_tiny_cv16_hungarian_final_pipeline", lang = "hu")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
val pipeline = new PretrainedPipeline("whisper_tiny_cv16_hungarian_final_pipeline", lang = "hu")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_cv16_hungarian_final_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hu| +|Size:|385.4 MB| + +## References + +https://huggingface.co/Hungarians/whisper-tiny-cv16-hu-final + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_divehi_leksa_pramheda_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_divehi_leksa_pramheda_en.md new file mode 100644 index 00000000000000..f587da23b8dbf7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_divehi_leksa_pramheda_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_divehi_leksa_pramheda WhisperForCTC from leksa-pramheda +author: John Snow Labs +name: whisper_tiny_divehi_leksa_pramheda +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_divehi_leksa_pramheda` is a English model originally trained by leksa-pramheda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_divehi_leksa_pramheda_en_5.5.0_3.0_1726947495289.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_divehi_leksa_pramheda_en_5.5.0_3.0_1726947495289.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_divehi_leksa_pramheda","en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_divehi_leksa_pramheda", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_divehi_leksa_pramheda| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.8 MB| + +## References + +https://huggingface.co/leksa-pramheda/whisper-tiny-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_divehi_leksa_pramheda_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_divehi_leksa_pramheda_pipeline_en.md new file mode 100644 index 00000000000000..b9cc2e73fc1053 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_divehi_leksa_pramheda_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_divehi_leksa_pramheda_pipeline pipeline WhisperForCTC from leksa-pramheda +author: John Snow Labs +name: whisper_tiny_divehi_leksa_pramheda_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_divehi_leksa_pramheda_pipeline` is a English model originally trained by leksa-pramheda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_divehi_leksa_pramheda_pipeline_en_5.5.0_3.0_1726947518821.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_divehi_leksa_pramheda_pipeline_en_5.5.0_3.0_1726947518821.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
pipeline = PretrainedPipeline("whisper_tiny_divehi_leksa_pramheda_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
val pipeline = new PretrainedPipeline("whisper_tiny_divehi_leksa_pramheda_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_divehi_leksa_pramheda_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.8 MB| + +## References + +https://huggingface.co/leksa-pramheda/whisper-tiny-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_divehi_mcamara_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_divehi_mcamara_en.md new file mode 100644 index 00000000000000..056862f47b991b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_divehi_mcamara_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_divehi_mcamara WhisperForCTC from mcamara +author: John Snow Labs +name: whisper_tiny_divehi_mcamara +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_divehi_mcamara` is a English model originally trained by mcamara. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_divehi_mcamara_en_5.5.0_3.0_1726878242269.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_divehi_mcamara_en_5.5.0_3.0_1726878242269.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_divehi_mcamara","en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_divehi_mcamara", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_divehi_mcamara| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/mcamara/whisper-tiny-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_divehi_mcamara_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_divehi_mcamara_pipeline_en.md new file mode 100644 index 00000000000000..356ec731307b47 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_divehi_mcamara_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_divehi_mcamara_pipeline pipeline WhisperForCTC from mcamara +author: John Snow Labs +name: whisper_tiny_divehi_mcamara_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_divehi_mcamara_pipeline` is a English model originally trained by mcamara. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_divehi_mcamara_pipeline_en_5.5.0_3.0_1726878262061.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_divehi_mcamara_pipeline_en_5.5.0_3.0_1726878262061.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
pipeline = PretrainedPipeline("whisper_tiny_divehi_mcamara_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
val pipeline = new PretrainedPipeline("whisper_tiny_divehi_mcamara_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_divehi_mcamara_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/mcamara/whisper-tiny-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_dutch_25_nl.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_dutch_25_nl.md new file mode 100644 index 00000000000000..b346e45c3ee43a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_dutch_25_nl.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dutch, Flemish whisper_tiny_dutch_25 WhisperForCTC from renesteeman +author: John Snow Labs +name: whisper_tiny_dutch_25 +date: 2024-09-21 +tags: [nl, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_dutch_25` is a Dutch, Flemish model originally trained by renesteeman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_dutch_25_nl_5.5.0_3.0_1726936375811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_dutch_25_nl_5.5.0_3.0_1726936375811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_dutch_25","nl") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_dutch_25", "nl")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_dutch_25| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|nl| +|Size:|390.8 MB| + +## References + +https://huggingface.co/renesteeman/whisper-tiny-dutch-25 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_dutch_25_pipeline_nl.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_dutch_25_pipeline_nl.md new file mode 100644 index 00000000000000..18e3ec1ea77413 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_dutch_25_pipeline_nl.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Dutch, Flemish whisper_tiny_dutch_25_pipeline pipeline WhisperForCTC from renesteeman +author: John Snow Labs +name: whisper_tiny_dutch_25_pipeline +date: 2024-09-21 +tags: [nl, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_dutch_25_pipeline` is a Dutch, Flemish model originally trained by renesteeman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_dutch_25_pipeline_nl_5.5.0_3.0_1726936396239.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_dutch_25_pipeline_nl_5.5.0_3.0_1726936396239.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
pipeline = PretrainedPipeline("whisper_tiny_dutch_25_pipeline", lang = "nl")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
val pipeline = new PretrainedPipeline("whisper_tiny_dutch_25_pipeline", lang = "nl")
val annotations = pipeline.transform(df)
```
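
The download bundles the stages listed under "Included Models" further below. As a quick sanity check, the underlying Spark `PipelineModel` can be inspected; the `.model` accessor used here is assumed from common Spark NLP usage rather than stated in this card.

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("whisper_tiny_dutch_25_pipeline", lang = "nl")

# Expected: AudioAssembler followed by WhisperForCTC
for stage in pipeline.model.stages:
    print(type(stage).__name__)
```
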
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_dutch_25_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|nl| +|Size:|390.9 MB| + +## References + +https://huggingface.co/renesteeman/whisper-tiny-dutch-25 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_ellight_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_ellight_en.md new file mode 100644 index 00000000000000..1b566cf8bcdf55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_ellight_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_english_ellight WhisperForCTC from Ellight +author: John Snow Labs +name: whisper_tiny_english_ellight +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_ellight` is a English model originally trained by Ellight. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_ellight_en_5.5.0_3.0_1726960908948.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_ellight_en_5.5.0_3.0_1726960908948.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_english_ellight","en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_english_ellight", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_ellight| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/Ellight/whisper-tiny-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_ellight_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_ellight_pipeline_en.md new file mode 100644 index 00000000000000..bf61623de60dfb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_ellight_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_english_ellight_pipeline pipeline WhisperForCTC from Ellight +author: John Snow Labs +name: whisper_tiny_english_ellight_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_ellight_pipeline` is a English model originally trained by Ellight. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_ellight_pipeline_en_5.5.0_3.0_1726960927466.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_ellight_pipeline_en_5.5.0_3.0_1726960927466.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
pipeline = PretrainedPipeline("whisper_tiny_english_ellight_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
val pipeline = new PretrainedPipeline("whisper_tiny_english_ellight_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_ellight_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.0 MB| + +## References + +https://huggingface.co/Ellight/whisper-tiny-en + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_korif_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_korif_en.md new file mode 100644 index 00000000000000..13618f450a88b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_korif_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_english_korif WhisperForCTC from KoRiF +author: John Snow Labs +name: whisper_tiny_english_korif +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_korif` is a English model originally trained by KoRiF. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_korif_en_5.5.0_3.0_1726962037931.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_korif_en_5.5.0_3.0_1726962037931.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_english_korif","en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is a DataFrame with an "audio_content" column holding arrays of raw audio floats (Whisper expects 16 kHz mono)
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_english_korif", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
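
The `text` output column holds Spark NLP annotation structs rather than plain strings. Flattening them for downstream use is ordinary Spark SQL, as in the sketch below, which assumes the `pipelineDF` produced by the snippet above.

```python
from pyspark.sql import functions as F

# One row per annotation, with the transcription and its character offsets
pipelineDF.select(F.explode("text").alias("ann")) \
    .select("ann.result", "ann.begin", "ann.end") \
    .show(truncate = False)
```
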
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_korif| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/KoRiF/whisper-tiny-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_korif_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_korif_pipeline_en.md new file mode 100644 index 00000000000000..937a27eb5a8970 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_korif_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_english_korif_pipeline pipeline WhisperForCTC from KoRiF +author: John Snow Labs +name: whisper_tiny_english_korif_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_korif_pipeline` is a English model originally trained by KoRiF. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_korif_pipeline_en_5.5.0_3.0_1726962056114.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_korif_pipeline_en_5.5.0_3.0_1726962056114.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_english_korif_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_english_korif_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
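The pipeline example above expects a DataFrame `df` with an `audio_content` column of float samples. One way to build it, assuming a local 16 kHz WAV file (placeholder name) and the `librosa` package, is sketched here.

```python
# Hypothetical input preparation for the pretrained pipeline; "sample.wav" is a placeholder.
import sparknlp
from sparknlp.pretrained import PretrainedPipeline
import librosa

spark = sparknlp.start()
raw_floats, _ = librosa.load("sample.wav", sr=16000)
df = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")

pipeline = PretrainedPipeline("whisper_tiny_english_korif_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.select("text.result").show(truncate=False)
```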
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_korif_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/KoRiF/whisper-tiny-en + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_us_sumet_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_us_sumet_en.md new file mode 100644 index 00000000000000..518625af1930d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_us_sumet_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_english_us_sumet WhisperForCTC from sumet +author: John Snow Labs +name: whisper_tiny_english_us_sumet +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_us_sumet` is a English model originally trained by sumet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_us_sumet_en_5.5.0_3.0_1726905992328.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_us_sumet_en_5.5.0_3.0_1726905992328.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_english_us_sumet","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_english_us_sumet", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
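The example above leaves the `data` DataFrame undefined. A self-contained sketch is shown below; the WAV file name is only a placeholder and `librosa` is an assumed dependency for loading 16 kHz samples.

```python
# Minimal end-to-end sketch for whisper_tiny_english_us_sumet.
# Assumptions: a local 16 kHz WAV file ("sample.wav" is a placeholder) and the librosa package.
import sparknlp
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline
import librosa

spark = sparknlp.start()

# Load the waveform as a list of 16 kHz float samples.
raw_floats, _ = librosa.load("sample.wav", sr=16000)
data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")

audio_assembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speech_to_text = WhisperForCTC.pretrained("whisper_tiny_english_us_sumet", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audio_assembler, speech_to_text])
result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```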
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_us_sumet| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/sumet/whisper-tiny-en-US \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_us_sumet_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_us_sumet_pipeline_en.md new file mode 100644 index 00000000000000..21a0cd9356e4c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_us_sumet_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_english_us_sumet_pipeline pipeline WhisperForCTC from sumet +author: John Snow Labs +name: whisper_tiny_english_us_sumet_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_us_sumet_pipeline` is a English model originally trained by sumet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_us_sumet_pipeline_en_5.5.0_3.0_1726906011183.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_us_sumet_pipeline_en_5.5.0_3.0_1726906011183.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_english_us_sumet_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_english_us_sumet_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_us_sumet_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/sumet/whisper-tiny-en-US + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_enus_tieincred_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_enus_tieincred_en.md new file mode 100644 index 00000000000000..71c7bb47b65342 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_enus_tieincred_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_enus_tieincred WhisperForCTC from TieIncred +author: John Snow Labs +name: whisper_tiny_enus_tieincred +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_enus_tieincred` is a English model originally trained by TieIncred. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_enus_tieincred_en_5.5.0_3.0_1726906233761.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_enus_tieincred_en_5.5.0_3.0_1726906233761.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_enus_tieincred","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_enus_tieincred", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
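A self-contained variant of the example, with the input DataFrame built from a local audio file, is sketched below; the file name and the use of `librosa` are assumptions.

```python
# Minimal end-to-end sketch for whisper_tiny_enus_tieincred.
# Assumptions: a local 16 kHz WAV file ("sample.wav" is a placeholder) and the librosa package.
import sparknlp
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline
import librosa

spark = sparknlp.start()

# Whisper checkpoints expect 16 kHz float samples; librosa resamples on load.
raw_floats, _ = librosa.load("sample.wav", sr=16000)
data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")

audio_assembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speech_to_text = WhisperForCTC.pretrained("whisper_tiny_enus_tieincred", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audio_assembler, speech_to_text])
result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```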
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_enus_tieincred| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/TieIncred/whisper-tiny-enUS \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_enus_tieincred_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_enus_tieincred_pipeline_en.md new file mode 100644 index 00000000000000..6206b95a784cde --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_enus_tieincred_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_enus_tieincred_pipeline pipeline WhisperForCTC from TieIncred +author: John Snow Labs +name: whisper_tiny_enus_tieincred_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_enus_tieincred_pipeline` is a English model originally trained by TieIncred. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_enus_tieincred_pipeline_en_5.5.0_3.0_1726906253424.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_enus_tieincred_pipeline_en_5.5.0_3.0_1726906253424.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_enus_tieincred_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_enus_tieincred_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_enus_tieincred_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/TieIncred/whisper-tiny-enUS + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_faroese_8k_steps_100h_fo.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_faroese_8k_steps_100h_fo.md new file mode 100644 index 00000000000000..a575c6afb95edd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_faroese_8k_steps_100h_fo.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Faroese whisper_tiny_faroese_8k_steps_100h WhisperForCTC from carlosdanielhernandezmena +author: John Snow Labs +name: whisper_tiny_faroese_8k_steps_100h +date: 2024-09-21 +tags: [fo, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: fo +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_faroese_8k_steps_100h` is a Faroese model originally trained by carlosdanielhernandezmena. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_faroese_8k_steps_100h_fo_5.5.0_3.0_1726877866989.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_faroese_8k_steps_100h_fo_5.5.0_3.0_1726877866989.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_faroese_8k_steps_100h","fo") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_faroese_8k_steps_100h", "fo")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
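The snippets above assume an existing DataFrame `data` with an `audio_content` column of float samples. One way to prepare it, assuming a local 16 kHz WAV file (placeholder name) and the `librosa` package, is sketched here.

```python
# Minimal end-to-end sketch for whisper_tiny_faroese_8k_steps_100h.
# Assumptions: a local 16 kHz WAV file ("sample.wav" is a placeholder) and the librosa package.
import sparknlp
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline
import librosa

spark = sparknlp.start()

# Load the waveform as a list of 16 kHz float samples.
raw_floats, _ = librosa.load("sample.wav", sr=16000)
data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")

audio_assembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speech_to_text = WhisperForCTC.pretrained("whisper_tiny_faroese_8k_steps_100h", "fo") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audio_assembler, speech_to_text])
result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```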
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_faroese_8k_steps_100h| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|fo| +|Size:|390.9 MB| + +## References + +https://huggingface.co/carlosdanielhernandezmena/whisper-tiny-faroese-8k-steps-100h \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_faroese_8k_steps_100h_pipeline_fo.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_faroese_8k_steps_100h_pipeline_fo.md new file mode 100644 index 00000000000000..c590913e761ecd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_faroese_8k_steps_100h_pipeline_fo.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Faroese whisper_tiny_faroese_8k_steps_100h_pipeline pipeline WhisperForCTC from carlosdanielhernandezmena +author: John Snow Labs +name: whisper_tiny_faroese_8k_steps_100h_pipeline +date: 2024-09-21 +tags: [fo, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: fo +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_faroese_8k_steps_100h_pipeline` is a Faroese model originally trained by carlosdanielhernandezmena. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_faroese_8k_steps_100h_pipeline_fo_5.5.0_3.0_1726877886786.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_faroese_8k_steps_100h_pipeline_fo_5.5.0_3.0_1726877886786.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_faroese_8k_steps_100h_pipeline", lang = "fo") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_faroese_8k_steps_100h_pipeline", lang = "fo") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_faroese_8k_steps_100h_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fo| +|Size:|390.9 MB| + +## References + +https://huggingface.co/carlosdanielhernandezmena/whisper-tiny-faroese-8k-steps-100h + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_faroese_temp_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_faroese_temp_en.md new file mode 100644 index 00000000000000..39237595c2cd27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_faroese_temp_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_faroese_temp WhisperForCTC from lukespeech +author: John Snow Labs +name: whisper_tiny_faroese_temp +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_faroese_temp` is a English model originally trained by lukespeech. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_faroese_temp_en_5.5.0_3.0_1726908840067.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_faroese_temp_en_5.5.0_3.0_1726908840067.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_faroese_temp","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_faroese_temp", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
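The example above leaves the `data` DataFrame undefined. A self-contained sketch is shown below; the WAV file name is only a placeholder and `librosa` is an assumed dependency.

```python
# Minimal end-to-end sketch for whisper_tiny_faroese_temp.
# Assumptions: a local 16 kHz WAV file ("sample.wav" is a placeholder) and the librosa package.
import sparknlp
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline
import librosa

spark = sparknlp.start()

# Whisper checkpoints expect 16 kHz float samples; librosa resamples on load.
raw_floats, _ = librosa.load("sample.wav", sr=16000)
data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")

audio_assembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speech_to_text = WhisperForCTC.pretrained("whisper_tiny_faroese_temp", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audio_assembler, speech_to_text])
result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```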
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_faroese_temp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/lukespeech/whisper-tiny-fo-temp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_faroese_temp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_faroese_temp_pipeline_en.md new file mode 100644 index 00000000000000..672485ae515508 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_faroese_temp_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_faroese_temp_pipeline pipeline WhisperForCTC from lukespeech +author: John Snow Labs +name: whisper_tiny_faroese_temp_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_faroese_temp_pipeline` is a English model originally trained by lukespeech. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_faroese_temp_pipeline_en_5.5.0_3.0_1726908859633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_faroese_temp_pipeline_en_5.5.0_3.0_1726908859633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_faroese_temp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_faroese_temp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_faroese_temp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/lukespeech/whisper-tiny-fo-temp + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_frenchmed_v1_fr.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_frenchmed_v1_fr.md new file mode 100644 index 00000000000000..d7341edeaa6374 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_frenchmed_v1_fr.md @@ -0,0 +1,84 @@ +--- +layout: model +title: French whisper_tiny_frenchmed_v1 WhisperForCTC from Hanhpt23 +author: John Snow Labs +name: whisper_tiny_frenchmed_v1 +date: 2024-09-21 +tags: [fr, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_frenchmed_v1` is a French model originally trained by Hanhpt23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_frenchmed_v1_fr_5.5.0_3.0_1726939373479.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_frenchmed_v1_fr_5.5.0_3.0_1726939373479.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_frenchmed_v1","fr") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_frenchmed_v1", "fr")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
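A self-contained variant of the example, with the input DataFrame built from a local audio file, is sketched below; the file name and the use of `librosa` are assumptions.

```python
# Minimal end-to-end sketch for whisper_tiny_frenchmed_v1.
# Assumptions: a local 16 kHz WAV file ("sample.wav" is a placeholder) and the librosa package.
import sparknlp
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline
import librosa

spark = sparknlp.start()

# Load the waveform as a list of 16 kHz float samples.
raw_floats, _ = librosa.load("sample.wav", sr=16000)
data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")

audio_assembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speech_to_text = WhisperForCTC.pretrained("whisper_tiny_frenchmed_v1", "fr") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audio_assembler, speech_to_text])
result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```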
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_frenchmed_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|fr| +|Size:|379.2 MB| + +## References + +https://huggingface.co/Hanhpt23/whisper-tiny-frenchmed-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_frenchmed_v1_pipeline_fr.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_frenchmed_v1_pipeline_fr.md new file mode 100644 index 00000000000000..b499b210d38a5a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_frenchmed_v1_pipeline_fr.md @@ -0,0 +1,69 @@ +--- +layout: model +title: French whisper_tiny_frenchmed_v1_pipeline pipeline WhisperForCTC from Hanhpt23 +author: John Snow Labs +name: whisper_tiny_frenchmed_v1_pipeline +date: 2024-09-21 +tags: [fr, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_frenchmed_v1_pipeline` is a French model originally trained by Hanhpt23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_frenchmed_v1_pipeline_fr_5.5.0_3.0_1726939396382.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_frenchmed_v1_pipeline_fr_5.5.0_3.0_1726939396382.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_frenchmed_v1_pipeline", lang = "fr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_frenchmed_v1_pipeline", lang = "fr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_frenchmed_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fr| +|Size:|379.2 MB| + +## References + +https://huggingface.co/Hanhpt23/whisper-tiny-frenchmed-v1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_hindi_common_voice_16_1_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_hindi_common_voice_16_1_en.md new file mode 100644 index 00000000000000..74ba20c8442f80 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_hindi_common_voice_16_1_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_hindi_common_voice_16_1 WhisperForCTC from archit342000 +author: John Snow Labs +name: whisper_tiny_hindi_common_voice_16_1 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_hindi_common_voice_16_1` is a English model originally trained by archit342000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_hindi_common_voice_16_1_en_5.5.0_3.0_1726961919587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_hindi_common_voice_16_1_en_5.5.0_3.0_1726961919587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_hindi_common_voice_16_1","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_hindi_common_voice_16_1", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
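The snippets above assume an existing DataFrame `data` with an `audio_content` column of float samples. A minimal, self-contained sketch is shown below; the WAV file name is only a placeholder and `librosa` is an assumed dependency.

```python
# Minimal end-to-end sketch for whisper_tiny_hindi_common_voice_16_1.
# Assumptions: a local 16 kHz WAV file ("sample.wav" is a placeholder) and the librosa package.
import sparknlp
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline
import librosa

spark = sparknlp.start()

# Whisper checkpoints expect 16 kHz float samples; librosa resamples on load.
raw_floats, _ = librosa.load("sample.wav", sr=16000)
data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")

audio_assembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speech_to_text = WhisperForCTC.pretrained("whisper_tiny_hindi_common_voice_16_1", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audio_assembler, speech_to_text])
result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```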
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_hindi_common_voice_16_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.7 MB| + +## References + +https://huggingface.co/archit342000/whisper_tiny_hi_common_voice_16_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_hindi_common_voice_16_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_hindi_common_voice_16_1_pipeline_en.md new file mode 100644 index 00000000000000..f7bb0c6a70cece --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_hindi_common_voice_16_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_hindi_common_voice_16_1_pipeline pipeline WhisperForCTC from archit342000 +author: John Snow Labs +name: whisper_tiny_hindi_common_voice_16_1_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_hindi_common_voice_16_1_pipeline` is a English model originally trained by archit342000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_hindi_common_voice_16_1_pipeline_en_5.5.0_3.0_1726961939227.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_hindi_common_voice_16_1_pipeline_en_5.5.0_3.0_1726961939227.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_hindi_common_voice_16_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_hindi_common_voice_16_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_hindi_common_voice_16_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.7 MB| + +## References + +https://huggingface.co/archit342000/whisper_tiny_hi_common_voice_16_1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_kannada_kn.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_kannada_kn.md new file mode 100644 index 00000000000000..fe672965cf9f49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_kannada_kn.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Kannada whisper_tiny_kannada WhisperForCTC from parambharat +author: John Snow Labs +name: whisper_tiny_kannada +date: 2024-09-21 +tags: [kn, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: kn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_kannada` is a Kannada model originally trained by parambharat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_kannada_kn_5.5.0_3.0_1726891918307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_kannada_kn_5.5.0_3.0_1726891918307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_kannada","kn") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_kannada", "kn")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
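The example above leaves the `data` DataFrame undefined. One way to prepare it, assuming a local 16 kHz WAV file (placeholder name) and the `librosa` package, is sketched here.

```python
# Minimal end-to-end sketch for whisper_tiny_kannada.
# Assumptions: a local 16 kHz WAV file ("sample.wav" is a placeholder) and the librosa package.
import sparknlp
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline
import librosa

spark = sparknlp.start()

# Load the waveform as a list of 16 kHz float samples.
raw_floats, _ = librosa.load("sample.wav", sr=16000)
data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")

audio_assembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speech_to_text = WhisperForCTC.pretrained("whisper_tiny_kannada", "kn") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audio_assembler, speech_to_text])
result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```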
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_kannada| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|kn| +|Size:|391.1 MB| + +## References + +https://huggingface.co/parambharat/whisper-tiny-kn \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_kannada_pipeline_kn.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_kannada_pipeline_kn.md new file mode 100644 index 00000000000000..2d05ba283b0a85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_kannada_pipeline_kn.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Kannada whisper_tiny_kannada_pipeline pipeline WhisperForCTC from parambharat +author: John Snow Labs +name: whisper_tiny_kannada_pipeline +date: 2024-09-21 +tags: [kn, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: kn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_kannada_pipeline` is a Kannada model originally trained by parambharat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_kannada_pipeline_kn_5.5.0_3.0_1726891937103.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_kannada_pipeline_kn_5.5.0_3.0_1726891937103.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_kannada_pipeline", lang = "kn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_kannada_pipeline", lang = "kn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_kannada_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|kn| +|Size:|391.1 MB| + +## References + +https://huggingface.co/parambharat/whisper-tiny-kn + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_khmer_colab_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_khmer_colab_en.md new file mode 100644 index 00000000000000..29c810537b43c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_khmer_colab_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_khmer_colab WhisperForCTC from seanghay +author: John Snow Labs +name: whisper_tiny_khmer_colab +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_khmer_colab` is a English model originally trained by seanghay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_khmer_colab_en_5.5.0_3.0_1726912661505.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_khmer_colab_en_5.5.0_3.0_1726912661505.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_khmer_colab","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_khmer_colab", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
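A self-contained variant of the example, with the input DataFrame built from a local audio file, is sketched below; the file name and the use of `librosa` are assumptions.

```python
# Minimal end-to-end sketch for whisper_tiny_khmer_colab.
# Assumptions: a local 16 kHz WAV file ("sample.wav" is a placeholder) and the librosa package.
import sparknlp
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline
import librosa

spark = sparknlp.start()

# Whisper checkpoints expect 16 kHz float samples; librosa resamples on load.
raw_floats, _ = librosa.load("sample.wav", sr=16000)
data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")

audio_assembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speech_to_text = WhisperForCTC.pretrained("whisper_tiny_khmer_colab", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audio_assembler, speech_to_text])
result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```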
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_khmer_colab| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|391.3 MB| + +## References + +https://huggingface.co/seanghay/whisper-tiny-km-colab \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_khmer_colab_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_khmer_colab_pipeline_en.md new file mode 100644 index 00000000000000..bcd236e5955a4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_khmer_colab_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_khmer_colab_pipeline pipeline WhisperForCTC from seanghay +author: John Snow Labs +name: whisper_tiny_khmer_colab_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_khmer_colab_pipeline` is a English model originally trained by seanghay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_khmer_colab_pipeline_en_5.5.0_3.0_1726912680861.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_khmer_colab_pipeline_en_5.5.0_3.0_1726912680861.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_khmer_colab_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_khmer_colab_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_khmer_colab_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|391.3 MB| + +## References + +https://huggingface.co/seanghay/whisper-tiny-km-colab + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_malayalam_sid330_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_malayalam_sid330_en.md new file mode 100644 index 00000000000000..937f6825907931 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_malayalam_sid330_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_malayalam_sid330 WhisperForCTC from sid330 +author: John Snow Labs +name: whisper_tiny_malayalam_sid330 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_malayalam_sid330` is a English model originally trained by sid330. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_malayalam_sid330_en_5.5.0_3.0_1726949010013.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_malayalam_sid330_en_5.5.0_3.0_1726949010013.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_malayalam_sid330","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_malayalam_sid330", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
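The snippets above assume an existing DataFrame `data` with an `audio_content` column of float samples. A minimal, self-contained sketch is shown below; the WAV file name is only a placeholder and `librosa` is an assumed dependency.

```python
# Minimal end-to-end sketch for whisper_tiny_malayalam_sid330.
# Assumptions: a local 16 kHz WAV file ("sample.wav" is a placeholder) and the librosa package.
import sparknlp
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline
import librosa

spark = sparknlp.start()

# Load the waveform as a list of 16 kHz float samples.
raw_floats, _ = librosa.load("sample.wav", sr=16000)
data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")

audio_assembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speech_to_text = WhisperForCTC.pretrained("whisper_tiny_malayalam_sid330", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audio_assembler, speech_to_text])
result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```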
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_malayalam_sid330| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.2 MB| + +## References + +https://huggingface.co/sid330/whisper-tiny-ml \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_malayalam_sid330_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_malayalam_sid330_pipeline_en.md new file mode 100644 index 00000000000000..135f11e0ce4a92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_malayalam_sid330_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_malayalam_sid330_pipeline pipeline WhisperForCTC from sid330 +author: John Snow Labs +name: whisper_tiny_malayalam_sid330_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_malayalam_sid330_pipeline` is a English model originally trained by sid330. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_malayalam_sid330_pipeline_en_5.5.0_3.0_1726949029268.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_malayalam_sid330_pipeline_en_5.5.0_3.0_1726949029268.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_malayalam_sid330_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_malayalam_sid330_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_malayalam_sid330_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.2 MB| + +## References + +https://huggingface.co/sid330/whisper-tiny-ml + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_minds14_english_nickprock_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_minds14_english_nickprock_en.md new file mode 100644 index 00000000000000..e90cbede278a29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_minds14_english_nickprock_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_minds14_english_nickprock WhisperForCTC from nickprock +author: John Snow Labs +name: whisper_tiny_minds14_english_nickprock +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_english_nickprock` is a English model originally trained by nickprock. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_nickprock_en_5.5.0_3.0_1726948307657.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_nickprock_en_5.5.0_3.0_1726948307657.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_english_nickprock","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_english_nickprock", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
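The example above leaves the `data` DataFrame undefined. A self-contained sketch is shown below; the WAV file name is only a placeholder and `librosa` is just one way to load 16 kHz samples.

```python
# Minimal end-to-end sketch for whisper_tiny_minds14_english_nickprock.
# Assumptions: a local 16 kHz WAV file ("sample.wav" is a placeholder) and the librosa package.
import sparknlp
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline
import librosa

spark = sparknlp.start()

# Whisper checkpoints expect 16 kHz float samples; librosa resamples on load.
raw_floats, _ = librosa.load("sample.wav", sr=16000)
data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")

audio_assembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speech_to_text = WhisperForCTC.pretrained("whisper_tiny_minds14_english_nickprock", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audio_assembler, speech_to_text])
result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```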
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_english_nickprock| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/nickprock/whisper-tiny-minds14-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_minds14_english_nickprock_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_minds14_english_nickprock_pipeline_en.md new file mode 100644 index 00000000000000..51a2aebd22878f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_minds14_english_nickprock_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_minds14_english_nickprock_pipeline pipeline WhisperForCTC from nickprock +author: John Snow Labs +name: whisper_tiny_minds14_english_nickprock_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_english_nickprock_pipeline` is a English model originally trained by nickprock. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_nickprock_pipeline_en_5.5.0_3.0_1726948326622.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_nickprock_pipeline_en_5.5.0_3.0_1726948326622.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_minds14_english_nickprock_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_minds14_english_nickprock_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_english_nickprock_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/nickprock/whisper-tiny-minds14-en + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_minds14_iammartian0_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_minds14_iammartian0_en.md new file mode 100644 index 00000000000000..d69f8dd26b300e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_minds14_iammartian0_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_minds14_iammartian0 WhisperForCTC from iammartian0 +author: John Snow Labs +name: whisper_tiny_minds14_iammartian0 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_iammartian0` is a English model originally trained by iammartian0. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_iammartian0_en_5.5.0_3.0_1726950885193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_iammartian0_en_5.5.0_3.0_1726950885193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_iammartian0","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_iammartian0", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
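One way to prepare the `data` DataFrame expected by the snippets above, assuming a local 16 kHz WAV file (placeholder name) and the `librosa` package, is sketched here.

```python
# Minimal end-to-end sketch for whisper_tiny_minds14_iammartian0.
# Assumptions: a local 16 kHz WAV file ("sample.wav" is a placeholder) and the librosa package.
import sparknlp
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline
import librosa

spark = sparknlp.start()

# Load the waveform as a list of 16 kHz float samples.
raw_floats, _ = librosa.load("sample.wav", sr=16000)
data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")

audio_assembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speech_to_text = WhisperForCTC.pretrained("whisper_tiny_minds14_iammartian0", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audio_assembler, speech_to_text])
result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```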
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_iammartian0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/iammartian0/whisper-tiny-minds14 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_minds14_iammartian0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_minds14_iammartian0_pipeline_en.md new file mode 100644 index 00000000000000..12182bd656a696 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_minds14_iammartian0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_minds14_iammartian0_pipeline pipeline WhisperForCTC from iammartian0 +author: John Snow Labs +name: whisper_tiny_minds14_iammartian0_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_iammartian0_pipeline` is a English model originally trained by iammartian0. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_iammartian0_pipeline_en_5.5.0_3.0_1726950904627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_iammartian0_pipeline_en_5.5.0_3.0_1726950904627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_minds14_iammartian0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_minds14_iammartian0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_iammartian0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/iammartian0/whisper-tiny-minds14 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_minds_hlumin_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_minds_hlumin_en.md new file mode 100644 index 00000000000000..d23cc82397d906 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_minds_hlumin_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_minds_hlumin WhisperForCTC from hlumin +author: John Snow Labs +name: whisper_tiny_minds_hlumin +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds_hlumin` is a English model originally trained by hlumin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds_hlumin_en_5.5.0_3.0_1726962795653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds_hlumin_en_5.5.0_3.0_1726962795653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is expected to be a DataFrame with the raw audio (array of floats, 16 kHz) in a column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_minds_hlumin","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is expected to be a DataFrame with the raw audio (array of floats, 16 kHz) in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_minds_hlumin", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
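
Once the pipeline above has been fit and applied, the transcriptions can be read straight from the `text` output column; a minimal sketch:

```python
# "text" holds the WhisperForCTC annotations; "result" is the transcribed string for each row
pipelineDF.select("text.result").show(truncate=False)
```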
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds_hlumin| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.8 MB| + +## References + +https://huggingface.co/hlumin/whisper-tiny-minds \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_minds_hlumin_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_minds_hlumin_pipeline_en.md new file mode 100644 index 00000000000000..ca06afc1a21740 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_minds_hlumin_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_minds_hlumin_pipeline pipeline WhisperForCTC from hlumin +author: John Snow Labs +name: whisper_tiny_minds_hlumin_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds_hlumin_pipeline` is a English model originally trained by hlumin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds_hlumin_pipeline_en_5.5.0_3.0_1726962813747.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds_hlumin_pipeline_en_5.5.0_3.0_1726962813747.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to hold the raw audio as an array of floats in the column read by the pipeline's AudioAssembler
pipeline = PretrainedPipeline("whisper_tiny_minds_hlumin_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to hold the raw audio as an array of floats in the column read by the pipeline's AudioAssembler
val pipeline = new PretrainedPipeline("whisper_tiny_minds_hlumin_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds_hlumin_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/hlumin/whisper-tiny-minds + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_train1_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_train1_en.md new file mode 100644 index 00000000000000..36c0f140cc8cbf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_train1_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_train1 WhisperForCTC from christinakyp +author: John Snow Labs +name: whisper_tiny_train1 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_train1` is a English model originally trained by christinakyp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_train1_en_5.5.0_3.0_1726948820287.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_train1_en_5.5.0_3.0_1726948820287.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is expected to be a DataFrame with the raw audio (array of floats, 16 kHz) in a column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_train1","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is expected to be a DataFrame with the raw audio (array of floats, 16 kHz) in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_train1", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_train1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|391.3 MB| + +## References + +https://huggingface.co/christinakyp/whisper-tiny-train1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_train1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_train1_pipeline_en.md new file mode 100644 index 00000000000000..7d25ce16a8c73e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_train1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_train1_pipeline pipeline WhisperForCTC from christinakyp +author: John Snow Labs +name: whisper_tiny_train1_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_train1_pipeline` is a English model originally trained by christinakyp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_train1_pipeline_en_5.5.0_3.0_1726948844261.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_train1_pipeline_en_5.5.0_3.0_1726948844261.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to hold the raw audio as an array of floats in the column read by the pipeline's AudioAssembler
pipeline = PretrainedPipeline("whisper_tiny_train1_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to hold the raw audio as an array of floats in the column read by the pipeline's AudioAssembler
val pipeline = new PretrainedPipeline("whisper_tiny_train1_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_train1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|391.3 MB| + +## References + +https://huggingface.co/christinakyp/whisper-tiny-train1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_vietnamese_doof_ferb_pipeline_vi.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_vietnamese_doof_ferb_pipeline_vi.md new file mode 100644 index 00000000000000..8a06344cebf569 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_vietnamese_doof_ferb_pipeline_vi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Vietnamese whisper_tiny_vietnamese_doof_ferb_pipeline pipeline WhisperForCTC from doof-ferb +author: John Snow Labs +name: whisper_tiny_vietnamese_doof_ferb_pipeline +date: 2024-09-21 +tags: [vi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: vi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_vietnamese_doof_ferb_pipeline` is a Vietnamese model originally trained by doof-ferb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_vietnamese_doof_ferb_pipeline_vi_5.5.0_3.0_1726938005333.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_vietnamese_doof_ferb_pipeline_vi_5.5.0_3.0_1726938005333.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to hold the raw audio as an array of floats in the column read by the pipeline's AudioAssembler
pipeline = PretrainedPipeline("whisper_tiny_vietnamese_doof_ferb_pipeline", lang = "vi")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to hold the raw audio as an array of floats in the column read by the pipeline's AudioAssembler
val pipeline = new PretrainedPipeline("whisper_tiny_vietnamese_doof_ferb_pipeline", lang = "vi")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_vietnamese_doof_ferb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|vi| +|Size:|390.0 MB| + +## References + +https://huggingface.co/doof-ferb/whisper-tiny-vi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_vietnamese_doof_ferb_vi.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_vietnamese_doof_ferb_vi.md new file mode 100644 index 00000000000000..468f84906c02d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_vietnamese_doof_ferb_vi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Vietnamese whisper_tiny_vietnamese_doof_ferb WhisperForCTC from doof-ferb +author: John Snow Labs +name: whisper_tiny_vietnamese_doof_ferb +date: 2024-09-21 +tags: [vi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: vi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_vietnamese_doof_ferb` is a Vietnamese model originally trained by doof-ferb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_vietnamese_doof_ferb_vi_5.5.0_3.0_1726937985505.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_vietnamese_doof_ferb_vi_5.5.0_3.0_1726937985505.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is expected to be a DataFrame with the raw audio (array of floats, 16 kHz) in a column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_vietnamese_doof_ferb","vi") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is expected to be a DataFrame with the raw audio (array of floats, 16 kHz) in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_vietnamese_doof_ferb", "vi")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_vietnamese_doof_ferb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|vi| +|Size:|390.0 MB| + +## References + +https://huggingface.co/doof-ferb/whisper-tiny-vi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_wd_1k_v1_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_wd_1k_v1_en.md new file mode 100644 index 00000000000000..bf3516769355c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_wd_1k_v1_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_wd_1k_v1 WhisperForCTC from devkyle +author: John Snow Labs +name: whisper_tiny_wd_1k_v1 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_wd_1k_v1` is a English model originally trained by devkyle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_wd_1k_v1_en_5.5.0_3.0_1726935875486.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_wd_1k_v1_en_5.5.0_3.0_1726935875486.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is expected to be a DataFrame with the raw audio (array of floats, 16 kHz) in a column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_wd_1k_v1","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is expected to be a DataFrame with the raw audio (array of floats, 16 kHz) in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_wd_1k_v1", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_wd_1k_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.7 MB| + +## References + +https://huggingface.co/devkyle/whisper-tiny-wd-1k-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_wd_1k_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_wd_1k_v1_pipeline_en.md new file mode 100644 index 00000000000000..c354c139eed13a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_wd_1k_v1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_wd_1k_v1_pipeline pipeline WhisperForCTC from devkyle +author: John Snow Labs +name: whisper_tiny_wd_1k_v1_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_wd_1k_v1_pipeline` is a English model originally trained by devkyle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_wd_1k_v1_pipeline_en_5.5.0_3.0_1726935893434.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_wd_1k_v1_pipeline_en_5.5.0_3.0_1726935893434.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to hold the raw audio as an array of floats in the column read by the pipeline's AudioAssembler
pipeline = PretrainedPipeline("whisper_tiny_wd_1k_v1_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to hold the raw audio as an array of floats in the column read by the pipeline's AudioAssembler
val pipeline = new PretrainedPipeline("whisper_tiny_wd_1k_v1_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_wd_1k_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.7 MB| + +## References + +https://huggingface.co/devkyle/whisper-tiny-wd-1k-v1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_yosthingalindo_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_yosthingalindo_en.md new file mode 100644 index 00000000000000..f75b07c0ae2b74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_yosthingalindo_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_yosthingalindo WhisperForCTC from yosthin06 +author: John Snow Labs +name: whisper_tiny_yosthingalindo +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_yosthingalindo` is a English model originally trained by yosthin06. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_yosthingalindo_en_5.5.0_3.0_1726949599493.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_yosthingalindo_en_5.5.0_3.0_1726949599493.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is expected to be a DataFrame with the raw audio (array of floats, 16 kHz) in a column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_yosthingalindo","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is expected to be a DataFrame with the raw audio (array of floats, 16 kHz) in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_yosthingalindo", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_yosthingalindo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/yosthin06/whisper-tiny_yosthingalindo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_yosthingalindo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_yosthingalindo_pipeline_en.md new file mode 100644 index 00000000000000..62e623ae2aa3f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_yosthingalindo_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_yosthingalindo_pipeline pipeline WhisperForCTC from yosthin06 +author: John Snow Labs +name: whisper_tiny_yosthingalindo_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_yosthingalindo_pipeline` is a English model originally trained by yosthin06. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_yosthingalindo_pipeline_en_5.5.0_3.0_1726949618522.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_yosthingalindo_pipeline_en_5.5.0_3.0_1726949618522.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to hold the raw audio as an array of floats in the column read by the pipeline's AudioAssembler
pipeline = PretrainedPipeline("whisper_tiny_yosthingalindo_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to hold the raw audio as an array of floats in the column read by the pipeline's AudioAssembler
val pipeline = new PretrainedPipeline("whisper_tiny_yosthingalindo_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_yosthingalindo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/yosthin06/whisper-tiny_yosthingalindo + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_v4_small_3_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_v4_small_3_en.md new file mode 100644 index 00000000000000..a0feebb78edd5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_v4_small_3_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_v4_small_3 WhisperForCTC from karinthommen +author: John Snow Labs +name: whisper_v4_small_3 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_v4_small_3` is a English model originally trained by karinthommen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_v4_small_3_en_5.5.0_3.0_1726937174062.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_v4_small_3_en_5.5.0_3.0_1726937174062.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is expected to be a DataFrame with the raw audio (array of floats, 16 kHz) in a column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_v4_small_3","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is expected to be a DataFrame with the raw audio (array of floats, 16 kHz) in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_v4_small_3", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_v4_small_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/karinthommen/whisper-V4-small-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_v4_small_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_v4_small_3_pipeline_en.md new file mode 100644 index 00000000000000..804adb158887b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_v4_small_3_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_v4_small_3_pipeline pipeline WhisperForCTC from karinthommen +author: John Snow Labs +name: whisper_v4_small_3_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_v4_small_3_pipeline` is a English model originally trained by karinthommen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_v4_small_3_pipeline_en_5.5.0_3.0_1726937254162.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_v4_small_3_pipeline_en_5.5.0_3.0_1726937254162.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to hold the raw audio as an array of floats in the column read by the pipeline's AudioAssembler
pipeline = PretrainedPipeline("whisper_v4_small_3_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to hold the raw audio as an array of floats in the column read by the pipeline's AudioAssembler
val pipeline = new PretrainedPipeline("whisper_v4_small_3_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_v4_small_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/karinthommen/whisper-V4-small-3 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whispnep_ne.md b/docs/_posts/ahmedlone127/2024-09-21-whispnep_ne.md new file mode 100644 index 00000000000000..1011625c813e16 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whispnep_ne.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Nepali (macrolanguage) whispnep WhisperForCTC from sunilregmi +author: John Snow Labs +name: whispnep +date: 2024-09-21 +tags: [ne, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ne +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whispnep` is a Nepali (macrolanguage) model originally trained by sunilregmi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whispnep_ne_5.5.0_3.0_1726908891720.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whispnep_ne_5.5.0_3.0_1726908891720.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is expected to be a DataFrame with the raw audio (array of floats, 16 kHz) in a column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whispnep","ne") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is expected to be a DataFrame with the raw audio (array of floats, 16 kHz) in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whispnep", "ne")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
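
Fitting the pipeline downloads the model, so it can be worth persisting the fitted PipelineModel and reloading it in later jobs. A minimal sketch using standard Spark ML persistence, with a hypothetical local path:

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline (hypothetical path) so the model is not re-downloaded on every run
pipelineModel.write().overwrite().save("/tmp/whispnep_asr_pipeline")

# Later, or in another job: reload and reuse
restored = PipelineModel.load("/tmp/whispnep_asr_pipeline")
restored.transform(data).select("text.result").show(truncate=False)
```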
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whispnep| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ne| +|Size:|1.7 GB| + +## References + +https://huggingface.co/sunilregmi/whispNEP \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whispnep_pipeline_ne.md b/docs/_posts/ahmedlone127/2024-09-21-whispnep_pipeline_ne.md new file mode 100644 index 00000000000000..d839b82d691ded --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whispnep_pipeline_ne.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Nepali (macrolanguage) whispnep_pipeline pipeline WhisperForCTC from sunilregmi +author: John Snow Labs +name: whispnep_pipeline +date: 2024-09-21 +tags: [ne, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ne +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whispnep_pipeline` is a Nepali (macrolanguage) model originally trained by sunilregmi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whispnep_pipeline_ne_5.5.0_3.0_1726908971604.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whispnep_pipeline_ne_5.5.0_3.0_1726908971604.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to hold the raw audio as an array of floats in the column read by the pipeline's AudioAssembler
pipeline = PretrainedPipeline("whispnep_pipeline", lang = "ne")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to hold the raw audio as an array of floats in the column read by the pipeline's AudioAssembler
val pipeline = new PretrainedPipeline("whispnep_pipeline", lang = "ne")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whispnep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ne| +|Size:|1.7 GB| + +## References + +https://huggingface.co/sunilregmi/whispNEP + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-withinapps_ndd_addressbook_test_tags_cwadj_en.md b/docs/_posts/ahmedlone127/2024-09-21-withinapps_ndd_addressbook_test_tags_cwadj_en.md new file mode 100644 index 00000000000000..2a3665e5b13b3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-withinapps_ndd_addressbook_test_tags_cwadj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English withinapps_ndd_addressbook_test_tags_cwadj DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_addressbook_test_tags_cwadj +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_addressbook_test_tags_cwadj` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_addressbook_test_tags_cwadj_en_5.5.0_3.0_1726953244842.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_addressbook_test_tags_cwadj_en_5.5.0_3.0_1726953244842.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_addressbook_test_tags_cwadj","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_addressbook_test_tags_cwadj", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
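
After the pipeline above has been applied, the predicted label is available in the `class` output column, and the annotation metadata typically carries the per-label confidence scores; a minimal sketch:

```python
# "class.result" is the predicted label per row; the metadata map holds the label scores
pipelineDF.select("text", "class.result").show(truncate=False)
pipelineDF.selectExpr("explode(class.metadata) as scores").show(truncate=False)
```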
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_addressbook_test_tags_cwadj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-addressbook_test-tags-CWAdj \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-withinapps_ndd_addressbook_test_tags_cwadj_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-withinapps_ndd_addressbook_test_tags_cwadj_pipeline_en.md new file mode 100644 index 00000000000000..b6c6c70f7c84b7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-withinapps_ndd_addressbook_test_tags_cwadj_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English withinapps_ndd_addressbook_test_tags_cwadj_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_addressbook_test_tags_cwadj_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_addressbook_test_tags_cwadj_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_addressbook_test_tags_cwadj_pipeline_en_5.5.0_3.0_1726953257286.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_addressbook_test_tags_cwadj_pipeline_en_5.5.0_3.0_1726953257286.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to hold the input text in a column named "text"
pipeline = PretrainedPipeline("withinapps_ndd_addressbook_test_tags_cwadj_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to hold the input text in a column named "text"
val pipeline = new PretrainedPipeline("withinapps_ndd_addressbook_test_tags_cwadj_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_addressbook_test_tags_cwadj_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-addressbook_test-tags-CWAdj + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-withinapps_ndd_mantisbt_test_content_cwadj_en.md b/docs/_posts/ahmedlone127/2024-09-21-withinapps_ndd_mantisbt_test_content_cwadj_en.md new file mode 100644 index 00000000000000..11ac93cfc9e79d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-withinapps_ndd_mantisbt_test_content_cwadj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English withinapps_ndd_mantisbt_test_content_cwadj DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_mantisbt_test_content_cwadj +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_mantisbt_test_content_cwadj` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mantisbt_test_content_cwadj_en_5.5.0_3.0_1726889061605.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mantisbt_test_content_cwadj_en_5.5.0_3.0_1726889061605.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_mantisbt_test_content_cwadj","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_mantisbt_test_content_cwadj", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_mantisbt_test_content_cwadj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-mantisbt_test-content-CWAdj \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-withinapps_ndd_mantisbt_test_content_cwadj_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-withinapps_ndd_mantisbt_test_content_cwadj_pipeline_en.md new file mode 100644 index 00000000000000..95e2cd62ef8888 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-withinapps_ndd_mantisbt_test_content_cwadj_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English withinapps_ndd_mantisbt_test_content_cwadj_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_mantisbt_test_content_cwadj_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_mantisbt_test_content_cwadj_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mantisbt_test_content_cwadj_pipeline_en_5.5.0_3.0_1726889074296.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mantisbt_test_content_cwadj_pipeline_en_5.5.0_3.0_1726889074296.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to hold the input text in a column named "text"
pipeline = PretrainedPipeline("withinapps_ndd_mantisbt_test_content_cwadj_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to hold the input text in a column named "text"
val pipeline = new PretrainedPipeline("withinapps_ndd_mantisbt_test_content_cwadj_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_mantisbt_test_content_cwadj_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-mantisbt_test-content-CWAdj + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_2_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_2_en.md new file mode 100644 index 00000000000000..87f49d90c9cefd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_2 XlmRoBertaForSequenceClassification from alyazharr +author: John Snow Labs +name: xlm_roberta_base_2 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_2` is a English model originally trained by alyazharr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_2_en_5.5.0_3.0_1726918684450.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_2_en_5.5.0_3.0_1726918684450.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_2", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
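
For ad-hoc predictions on a few strings, the fitted pipeline can also be wrapped in a LightPipeline, which avoids building a DataFrame; a minimal sketch:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# annotate returns a dict keyed by output column; "class" holds the predicted label
print(light.annotate("I love spark-nlp")["class"])
```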
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|827.5 MB| + +## References + +https://huggingface.co/alyazharr/xlm_roberta_base_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_2_pipeline_en.md new file mode 100644 index 00000000000000..0acb22e0749805 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_2_pipeline pipeline XlmRoBertaForSequenceClassification from alyazharr +author: John Snow Labs +name: xlm_roberta_base_2_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_2_pipeline` is a English model originally trained by alyazharr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_2_pipeline_en_5.5.0_3.0_1726918774072.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_2_pipeline_en_5.5.0_3.0_1726918774072.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to hold the input text in a column named "text"
pipeline = PretrainedPipeline("xlm_roberta_base_2_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to hold the input text in a column named "text"
val pipeline = new PretrainedPipeline("xlm_roberta_base_2_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.5 MB| + +## References + +https://huggingface.co/alyazharr/xlm_roberta_base_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_mixed_aug_insert_w2v_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_mixed_aug_insert_w2v_en.md new file mode 100644 index 00000000000000..933ce9a2d1fc4c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_mixed_aug_insert_w2v_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_balance_mixed_aug_insert_w2v XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_balance_mixed_aug_insert_w2v +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_balance_mixed_aug_insert_w2v` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_mixed_aug_insert_w2v_en_5.5.0_3.0_1726932368304.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_mixed_aug_insert_w2v_en_5.5.0_3.0_1726932368304.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_balance_mixed_aug_insert_w2v","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_balance_mixed_aug_insert_w2v", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_balance_mixed_aug_insert_w2v| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|795.1 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Balance_Mixed-aug_insert_w2v \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_mixed_aug_insert_w2v_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_mixed_aug_insert_w2v_pipeline_en.md new file mode 100644 index 00000000000000..6a81bdd6151567 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_mixed_aug_insert_w2v_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_balance_mixed_aug_insert_w2v_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_balance_mixed_aug_insert_w2v_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_balance_mixed_aug_insert_w2v_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_mixed_aug_insert_w2v_pipeline_en_5.5.0_3.0_1726932490977.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_mixed_aug_insert_w2v_pipeline_en_5.5.0_3.0_1726932490977.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is expected to hold the input text in a column named "text"
pipeline = PretrainedPipeline("xlm_roberta_base_balance_mixed_aug_insert_w2v_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is expected to hold the input text in a column named "text"
val pipeline = new PretrainedPipeline("xlm_roberta_base_balance_mixed_aug_insert_w2v_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_balance_mixed_aug_insert_w2v_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|795.2 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Balance_Mixed-aug_insert_w2v + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_vietnam_aug_replace_tfidf_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_vietnam_aug_replace_tfidf_en.md new file mode 100644 index 00000000000000..ac5fcc213f0da0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_vietnam_aug_replace_tfidf_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_balance_vietnam_aug_replace_tfidf XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_balance_vietnam_aug_replace_tfidf +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_balance_vietnam_aug_replace_tfidf` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_replace_tfidf_en_5.5.0_3.0_1726918516471.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_replace_tfidf_en_5.5.0_3.0_1726918516471.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier consumes the document and token columns produced above
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_balance_vietnam_aug_replace_tfidf","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// the classifier consumes the document and token columns produced above
val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_balance_vietnam_aug_replace_tfidf", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
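The `class` column written by the classifier holds Spark NLP annotation structs rather than plain strings. A minimal sketch for pulling out just the predicted labels, assuming `pipelineDF` from the Python snippet above:

```python
# "result" holds the predicted label inside each annotation
pipelineDF.select("text", "class.result").show(truncate=False)
```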
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_balance_vietnam_aug_replace_tfidf| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|793.7 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Balance_VietNam-aug_replace_tfidf \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_vietnam_aug_replace_tfidf_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_vietnam_aug_replace_tfidf_pipeline_en.md new file mode 100644 index 00000000000000..a29a2d8af0ea1f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_vietnam_aug_replace_tfidf_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_balance_vietnam_aug_replace_tfidf_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_balance_vietnam_aug_replace_tfidf_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_balance_vietnam_aug_replace_tfidf_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_replace_tfidf_pipeline_en_5.5.0_3.0_1726918646903.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_replace_tfidf_pipeline_en_5.5.0_3.0_1726918646903.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_balance_vietnam_aug_replace_tfidf_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_balance_vietnam_aug_replace_tfidf_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_balance_vietnam_aug_replace_tfidf_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|793.7 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Balance_VietNam-aug_replace_tfidf + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_vietnam_aug_swap_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_vietnam_aug_swap_en.md new file mode 100644 index 00000000000000..a9b12530643a1f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_vietnam_aug_swap_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_balance_vietnam_aug_swap XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_balance_vietnam_aug_swap +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_balance_vietnam_aug_swap` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_swap_en_5.5.0_3.0_1726933119882.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_swap_en_5.5.0_3.0_1726933119882.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier consumes the document and token columns produced above
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_balance_vietnam_aug_swap","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// the classifier consumes the document and token columns produced above
val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_balance_vietnam_aug_swap", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_balance_vietnam_aug_swap| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|795.3 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Balance_VietNam-aug_swap \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_vietnam_aug_swap_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_vietnam_aug_swap_pipeline_en.md new file mode 100644 index 00000000000000..1f2db042022a11 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_vietnam_aug_swap_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_balance_vietnam_aug_swap_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_balance_vietnam_aug_swap_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_balance_vietnam_aug_swap_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_swap_pipeline_en_5.5.0_3.0_1726933239709.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_swap_pipeline_en_5.5.0_3.0_1726933239709.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_balance_vietnam_aug_swap_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_balance_vietnam_aug_swap_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_balance_vietnam_aug_swap_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|795.3 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Balance_VietNam-aug_swap + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_all_sungkwangjoong_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_all_sungkwangjoong_en.md new file mode 100644 index 00000000000000..28a19b0a01b796 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_all_sungkwangjoong_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_sungkwangjoong XlmRoBertaForTokenClassification from sungkwangjoong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_sungkwangjoong +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_sungkwangjoong` is a English model originally trained by sungkwangjoong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_sungkwangjoong_en_5.5.0_3.0_1726896607351.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_sungkwangjoong_en_5.5.0_3.0_1726896607351.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the NER model consumes the document and token columns produced above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_sungkwangjoong","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// the NER model consumes the document and token columns produced above
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_sungkwangjoong", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
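To read the predictions as tokens and tags rather than raw annotation structs, the `result` fields of the `token` and `ner` columns can be projected out. A minimal sketch, assuming `pipelineDF` from the Python snippet above:

```python
# parallel arrays per row: the tokens and their predicted IOB tags
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```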
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_sungkwangjoong| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|850.1 MB| + +## References + +https://huggingface.co/sungkwangjoong/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_all_sungkwangjoong_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_all_sungkwangjoong_pipeline_en.md new file mode 100644 index 00000000000000..3df2947815f768 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_all_sungkwangjoong_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_sungkwangjoong_pipeline pipeline XlmRoBertaForTokenClassification from sungkwangjoong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_sungkwangjoong_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_sungkwangjoong_pipeline` is a English model originally trained by sungkwangjoong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_sungkwangjoong_pipeline_en_5.5.0_3.0_1726896691344.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_sungkwangjoong_pipeline_en_5.5.0_3.0_1726896691344.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_sungkwangjoong_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_sungkwangjoong_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_sungkwangjoong_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|850.2 MB| + +## References + +https://huggingface.co/sungkwangjoong/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_english_lee_soha_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_english_lee_soha_en.md new file mode 100644 index 00000000000000..9d068d606f2d51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_english_lee_soha_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_lee_soha XlmRoBertaForTokenClassification from Lee-soha +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_lee_soha +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_lee_soha` is a English model originally trained by Lee-soha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_lee_soha_en_5.5.0_3.0_1726883615894.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_lee_soha_en_5.5.0_3.0_1726883615894.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the NER model consumes the document and token columns produced above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_lee_soha","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// the NER model consumes the document and token columns produced above
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_lee_soha", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_lee_soha| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/Lee-soha/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_english_reaverlee_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_english_reaverlee_en.md new file mode 100644 index 00000000000000..84378a95da16aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_english_reaverlee_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_reaverlee XlmRoBertaForTokenClassification from reaverlee +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_reaverlee +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_reaverlee` is a English model originally trained by reaverlee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_reaverlee_en_5.5.0_3.0_1726897076979.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_reaverlee_en_5.5.0_3.0_1726897076979.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the NER model consumes the document and token columns produced above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_reaverlee","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// the NER model consumes the document and token columns produced above
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_reaverlee", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_reaverlee| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/reaverlee/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_english_reaverlee_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_english_reaverlee_pipeline_en.md new file mode 100644 index 00000000000000..3d447a7b4f0a16 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_english_reaverlee_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_reaverlee_pipeline pipeline XlmRoBertaForTokenClassification from reaverlee +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_reaverlee_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_reaverlee_pipeline` is a English model originally trained by reaverlee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_reaverlee_pipeline_en_5.5.0_3.0_1726897167038.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_reaverlee_pipeline_en_5.5.0_3.0_1726897167038.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_reaverlee_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_reaverlee_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_reaverlee_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/reaverlee/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_english_sponomary_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_english_sponomary_en.md new file mode 100644 index 00000000000000..8cc665ba8a0c67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_english_sponomary_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_sponomary XlmRoBertaForTokenClassification from sponomary +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_sponomary +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_sponomary` is a English model originally trained by sponomary. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sponomary_en_5.5.0_3.0_1726883494147.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sponomary_en_5.5.0_3.0_1726883494147.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the NER model consumes the document and token columns produced above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_sponomary","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// the NER model consumes the document and token columns produced above
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_sponomary", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_sponomary| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/sponomary/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_english_sponomary_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_english_sponomary_pipeline_en.md new file mode 100644 index 00000000000000..fc179465295eec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_english_sponomary_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_sponomary_pipeline pipeline XlmRoBertaForTokenClassification from sponomary +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_sponomary_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_sponomary_pipeline` is a English model originally trained by sponomary. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sponomary_pipeline_en_5.5.0_3.0_1726883587308.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sponomary_pipeline_en_5.5.0_3.0_1726883587308.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_sponomary_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_sponomary_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_sponomary_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/sponomary/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_jhagege_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_jhagege_en.md new file mode 100644 index 00000000000000..b39335ead090d4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_jhagege_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_jhagege XlmRoBertaForTokenClassification from jhagege +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_jhagege +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_jhagege` is a English model originally trained by jhagege. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jhagege_en_5.5.0_3.0_1726884180627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jhagege_en_5.5.0_3.0_1726884180627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the NER model consumes the document and token columns produced above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_jhagege","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// the NER model consumes the document and token columns produced above
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_jhagege", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_jhagege| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/jhagege/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_jhagege_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_jhagege_pipeline_en.md new file mode 100644 index 00000000000000..2128bf66ad6b2d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_jhagege_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_jhagege_pipeline pipeline XlmRoBertaForTokenClassification from jhagege +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_jhagege_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_jhagege_pipeline` is a English model originally trained by jhagege. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jhagege_pipeline_en_5.5.0_3.0_1726884261312.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jhagege_pipeline_en_5.5.0_3.0_1726884261312.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_jhagege_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_jhagege_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_jhagege_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/jhagege/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_ladoza03_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_ladoza03_en.md new file mode 100644 index 00000000000000..714b9b3b46c67d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_ladoza03_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_ladoza03 XlmRoBertaForTokenClassification from ladoza03 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_ladoza03 +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_ladoza03` is a English model originally trained by ladoza03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ladoza03_en_5.5.0_3.0_1726896783856.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ladoza03_en_5.5.0_3.0_1726896783856.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the NER model consumes the document and token columns produced above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_ladoza03","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// the NER model consumes the document and token columns produced above
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_ladoza03", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_ladoza03| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|842.1 MB| + +## References + +https://huggingface.co/ladoza03/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_ladoza03_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_ladoza03_pipeline_en.md new file mode 100644 index 00000000000000..24bd975a716dc5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_ladoza03_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_ladoza03_pipeline pipeline XlmRoBertaForTokenClassification from ladoza03 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_ladoza03_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_ladoza03_pipeline` is a English model originally trained by ladoza03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ladoza03_pipeline_en_5.5.0_3.0_1726896862439.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ladoza03_pipeline_en_5.5.0_3.0_1726896862439.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_ladoza03_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_ladoza03_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_ladoza03_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|842.1 MB| + +## References + +https://huggingface.co/ladoza03/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_maxnet_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_maxnet_en.md new file mode 100644 index 00000000000000..735783c460e7d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_maxnet_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_maxnet XlmRoBertaForTokenClassification from Maxnet +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_maxnet +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_maxnet` is a English model originally trained by Maxnet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_maxnet_en_5.5.0_3.0_1726883329071.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_maxnet_en_5.5.0_3.0_1726883329071.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the NER model consumes the document and token columns produced above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_maxnet","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// the NER model consumes the document and token columns produced above
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_maxnet", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_maxnet| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/Maxnet/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_maxnet_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_maxnet_pipeline_en.md new file mode 100644 index 00000000000000..e6539e1561dd68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_maxnet_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_maxnet_pipeline pipeline XlmRoBertaForTokenClassification from Maxnet +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_maxnet_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_maxnet_pipeline` is a English model originally trained by Maxnet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_maxnet_pipeline_en_5.5.0_3.0_1726883411129.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_maxnet_pipeline_en_5.5.0_3.0_1726883411129.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_maxnet_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_maxnet_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_maxnet_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/Maxnet/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_skr1125_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_skr1125_en.md new file mode 100644 index 00000000000000..b4c3880cf03684 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_skr1125_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_skr1125 XlmRoBertaForTokenClassification from skr1125 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_skr1125 +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_skr1125` is a English model originally trained by skr1125. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_skr1125_en_5.5.0_3.0_1726896808286.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_skr1125_en_5.5.0_3.0_1726896808286.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the NER model consumes the document and token columns produced above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_skr1125","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// the NER model consumes the document and token columns produced above
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_skr1125", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_skr1125| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/skr1125/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_skr1125_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_skr1125_pipeline_en.md new file mode 100644 index 00000000000000..fc0e083feebb87 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_skr1125_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_skr1125_pipeline pipeline XlmRoBertaForTokenClassification from skr1125 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_skr1125_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_skr1125_pipeline` is a English model originally trained by skr1125. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_skr1125_pipeline_en_5.5.0_3.0_1726896883077.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_skr1125_pipeline_en_5.5.0_3.0_1726896883077.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_skr1125_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_skr1125_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_skr1125_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/skr1125/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_ysige_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_ysige_en.md new file mode 100644 index 00000000000000..c4610d9a74e6f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_ysige_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_ysige XlmRoBertaForTokenClassification from ysige +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_ysige +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_ysige` is a English model originally trained by ysige. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ysige_en_5.5.0_3.0_1726897268118.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ysige_en_5.5.0_3.0_1726897268118.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_ysige","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_ysige", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
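
The snippet above ends at `transform`; as a minimal sketch using the column names configured there, the predicted token tags can be inspected like this:

```python
# "ner" is the output column set in the example above
pipelineDF.select("ner.result").show(truncate = False)
```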
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_ysige| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/ysige/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_ysige_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_ysige_pipeline_en.md new file mode 100644 index 00000000000000..4ecb1d164edac4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_ysige_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_ysige_pipeline pipeline XlmRoBertaForTokenClassification from ysige +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_ysige_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_ysige_pipeline` is a English model originally trained by ysige. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ysige_pipeline_en_5.5.0_3.0_1726897343584.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ysige_pipeline_en_5.5.0_3.0_1726897343584.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline; it expects the input text in a column named "text"
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_ysige_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline; it expects the input text in a column named "text"
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_ysige_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_ysige_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/ysige/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_1_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_1_en.md new file mode 100644 index 00000000000000..c580718a47d1c8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_1 XlmRoBertaForTokenClassification from LeoTungAnh +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_1 +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_1` is a English model originally trained by LeoTungAnh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_1_en_5.5.0_3.0_1726897416419.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_1_en_5.5.0_3.0_1726897416419.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_1", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
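
To quickly look at the predictions produced above, one option (assuming the `ner` output column set in the example) is:

```python
# Display the predicted NER tags per input row
pipelineDF.select("ner.result").show(truncate = False)
```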
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/LeoTungAnh/xlm-roberta-base-finetuned-panx-de-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_1_pipeline_en.md new file mode 100644 index 00000000000000..c2ed2fa6701969 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_1_pipeline pipeline XlmRoBertaForTokenClassification from LeoTungAnh +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_1_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_1_pipeline` is a English model originally trained by LeoTungAnh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_1_pipeline_en_5.5.0_3.0_1726897495849.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_1_pipeline_en_5.5.0_3.0_1726897495849.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline; it expects the input text in a column named "text"
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_1_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline; it expects the input text in a column named "text"
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_1_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/LeoTungAnh/xlm-roberta-base-finetuned-panx-de-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_alex423_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_alex423_en.md new file mode 100644 index 00000000000000..a9af36790f982b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_alex423_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_alex423 XlmRoBertaForTokenClassification from Alex423 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_alex423 +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_alex423` is a English model originally trained by Alex423. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_alex423_en_5.5.0_3.0_1726884065665.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_alex423_en_5.5.0_3.0_1726884065665.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_alex423","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_alex423", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
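
As a minimal follow-up to the example above (column names as configured there), the resulting tags can be shown with:

```python
# "ner" holds the token-level predictions from the pipeline above
pipelineDF.select("ner.result").show(truncate = False)
```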
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_alex423| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Alex423/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_alex423_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_alex423_pipeline_en.md new file mode 100644 index 00000000000000..7ca7e62002221b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_alex423_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_alex423_pipeline pipeline XlmRoBertaForTokenClassification from Alex423 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_alex423_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_alex423_pipeline` is a English model originally trained by Alex423. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_alex423_pipeline_en_5.5.0_3.0_1726884130313.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_alex423_pipeline_en_5.5.0_3.0_1726884130313.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline; it expects the input text in a column named "text"
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_alex423_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline; it expects the input text in a column named "text"
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_alex423_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_alex423_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Alex423/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_arashkhan58_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_arashkhan58_en.md new file mode 100644 index 00000000000000..dd59eb399a457e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_arashkhan58_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_arashkhan58 XlmRoBertaForTokenClassification from arashkhan58 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_arashkhan58 +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_arashkhan58` is a English model originally trained by arashkhan58. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_arashkhan58_en_5.5.0_3.0_1726897298104.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_arashkhan58_en_5.5.0_3.0_1726897298104.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_arashkhan58","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_arashkhan58", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
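
The example stops after `transform`; a minimal sketch for inspecting its output (using the `ner` column set above) is:

```python
# Show the predicted NER tags for the sample input
pipelineDF.select("ner.result").show(truncate = False)
```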
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_arashkhan58| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/arashkhan58/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_arashkhan58_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_arashkhan58_pipeline_en.md new file mode 100644 index 00000000000000..8ebbfdb53189b7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_arashkhan58_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_arashkhan58_pipeline pipeline XlmRoBertaForTokenClassification from arashkhan58 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_arashkhan58_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_arashkhan58_pipeline` is a English model originally trained by arashkhan58. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_arashkhan58_pipeline_en_5.5.0_3.0_1726897362570.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_arashkhan58_pipeline_en_5.5.0_3.0_1726897362570.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline; it expects the input text in a column named "text"
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_arashkhan58_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline; it expects the input text in a column named "text"
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_arashkhan58_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_arashkhan58_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/arashkhan58/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_aiekek_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_aiekek_en.md new file mode 100644 index 00000000000000..3215eb4f832570 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_aiekek_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_aiekek XlmRoBertaForTokenClassification from AIEKEK +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_aiekek +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_aiekek` is a English model originally trained by AIEKEK. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_aiekek_en_5.5.0_3.0_1726883790386.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_aiekek_en_5.5.0_3.0_1726883790386.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_aiekek","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_aiekek", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
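
To see what the pipeline above produced, a minimal sketch (assuming the `ner` output column configured there) is:

```python
# Display the token-level NER predictions
pipelineDF.select("ner.result").show(truncate = False)
```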
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_aiekek| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/AIEKEK/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_aiekek_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_aiekek_pipeline_en.md new file mode 100644 index 00000000000000..06363b9dca3472 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_aiekek_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_aiekek_pipeline pipeline XlmRoBertaForTokenClassification from AIEKEK +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_aiekek_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_aiekek_pipeline` is a English model originally trained by AIEKEK. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_aiekek_pipeline_en_5.5.0_3.0_1726883874508.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_aiekek_pipeline_en_5.5.0_3.0_1726883874508.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline; it expects the input text in a column named "text"
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_aiekek_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline; it expects the input text in a column named "text"
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_aiekek_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_aiekek_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/AIEKEK/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_hravi_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_hravi_en.md new file mode 100644 index 00000000000000..f3384b5e124890 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_hravi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_hravi XlmRoBertaForTokenClassification from hravi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_hravi +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_hravi` is a English model originally trained by hravi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hravi_en_5.5.0_3.0_1726897259026.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hravi_en_5.5.0_3.0_1726897259026.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_hravi","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_hravi", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
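
As a small addition to the example above (using the column names set there), the predicted tags can be listed with:

```python
# "ner" is the output column of the token classifier above
pipelineDF.select("ner.result").show(truncate = False)
```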
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_hravi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/hravi/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_hravi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_hravi_pipeline_en.md new file mode 100644 index 00000000000000..f768cd96e24e03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_hravi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_hravi_pipeline pipeline XlmRoBertaForTokenClassification from hravi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_hravi_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_hravi_pipeline` is a English model originally trained by hravi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hravi_pipeline_en_5.5.0_3.0_1726897340869.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hravi_pipeline_en_5.5.0_3.0_1726897340869.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline; it expects the input text in a column named "text"
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_hravi_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline; it expects the input text in a column named "text"
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_hravi_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_hravi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/hravi/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_neel_jotaniya_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_neel_jotaniya_en.md new file mode 100644 index 00000000000000..b4307325f62c86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_neel_jotaniya_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_neel_jotaniya XlmRoBertaForTokenClassification from neel-jotaniya +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_neel_jotaniya +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_neel_jotaniya` is a English model originally trained by neel-jotaniya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_neel_jotaniya_en_5.5.0_3.0_1726883472789.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_neel_jotaniya_en_5.5.0_3.0_1726883472789.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_neel_jotaniya","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_neel_jotaniya", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
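
A quick way to check the output of the example above (column names as configured there) is:

```python
# Show the NER tags predicted for the sample sentence
pipelineDF.select("ner.result").show(truncate = False)
```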
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_neel_jotaniya| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/neel-jotaniya/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_neel_jotaniya_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_neel_jotaniya_pipeline_en.md new file mode 100644 index 00000000000000..01cb4ae154c75c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_neel_jotaniya_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_neel_jotaniya_pipeline pipeline XlmRoBertaForTokenClassification from neel-jotaniya +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_neel_jotaniya_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_neel_jotaniya_pipeline` is a English model originally trained by neel-jotaniya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_neel_jotaniya_pipeline_en_5.5.0_3.0_1726883556635.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_neel_jotaniya_pipeline_en_5.5.0_3.0_1726883556635.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline; it expects the input text in a column named "text"
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_neel_jotaniya_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline; it expects the input text in a column named "text"
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_neel_jotaniya_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_neel_jotaniya_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/neel-jotaniya/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_nhung_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_nhung_en.md new file mode 100644 index 00000000000000..fedaede861e4b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_nhung_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_nhung XlmRoBertaForTokenClassification from nhung +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_nhung +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_nhung` is a English model originally trained by nhung. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_nhung_en_5.5.0_3.0_1726883343522.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_nhung_en_5.5.0_3.0_1726883343522.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_nhung","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_nhung", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
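
The example above leaves the result in `pipelineDF`; a minimal sketch for viewing it (using the `ner` column set above) is:

```python
# Print the predicted NER tags
pipelineDF.select("ner.result").show(truncate = False)
```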
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_nhung| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.6 MB| + +## References + +https://huggingface.co/nhung/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_nhung_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_nhung_pipeline_en.md new file mode 100644 index 00000000000000..182b67e238ae1d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_french_nhung_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_nhung_pipeline pipeline XlmRoBertaForTokenClassification from nhung +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_nhung_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_nhung_pipeline` is a English model originally trained by nhung. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_nhung_pipeline_en_5.5.0_3.0_1726883412774.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_nhung_pipeline_en_5.5.0_3.0_1726883412774.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline; it expects the input text in a column named "text"
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_nhung_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline; it expects the input text in a column named "text"
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_nhung_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_nhung_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.6 MB| + +## References + +https://huggingface.co/nhung/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_marko_vasic_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_marko_vasic_en.md new file mode 100644 index 00000000000000..da3b8025969851 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_marko_vasic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_marko_vasic XlmRoBertaForTokenClassification from marko-vasic +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_marko_vasic +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_marko_vasic` is a English model originally trained by marko-vasic. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_marko_vasic_en_5.5.0_3.0_1726883911058.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_marko_vasic_en_5.5.0_3.0_1726883911058.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_marko_vasic","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_marko_vasic", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
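
To inspect what the snippet above produced (column names as configured there), one can run:

```python
# Show the token-level NER predictions
pipelineDF.select("ner.result").show(truncate = False)
```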
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_marko_vasic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/marko-vasic/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_marko_vasic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_marko_vasic_pipeline_en.md new file mode 100644 index 00000000000000..2a7ded5ae2882a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_marko_vasic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_marko_vasic_pipeline pipeline XlmRoBertaForTokenClassification from marko-vasic +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_marko_vasic_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_marko_vasic_pipeline` is a English model originally trained by marko-vasic. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_marko_vasic_pipeline_en_5.5.0_3.0_1726883977260.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_marko_vasic_pipeline_en_5.5.0_3.0_1726883977260.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline; it expects the input text in a column named "text"
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_marko_vasic_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline; it expects the input text in a column named "text"
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_marko_vasic_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_marko_vasic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/marko-vasic/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_parksanha_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_parksanha_en.md new file mode 100644 index 00000000000000..57f72bd53fbe68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_parksanha_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_parksanha XlmRoBertaForTokenClassification from parksanha +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_parksanha +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_parksanha` is a English model originally trained by parksanha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_parksanha_en_5.5.0_3.0_1726896609697.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_parksanha_en_5.5.0_3.0_1726896609697.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_parksanha","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_parksanha", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
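
As before, a minimal sketch for viewing the predictions from the example above (assuming the `ner` output column set there) is:

```python
# Display the predicted NER tags
pipelineDF.select("ner.result").show(truncate = False)
```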
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_parksanha| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/parksanha/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_parksanha_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_parksanha_pipeline_en.md new file mode 100644 index 00000000000000..d9430e828872e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_parksanha_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_parksanha_pipeline pipeline XlmRoBertaForTokenClassification from parksanha +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_parksanha_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_parksanha_pipeline` is a English model originally trained by parksanha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_parksanha_pipeline_en_5.5.0_3.0_1726896693644.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_parksanha_pipeline_en_5.5.0_3.0_1726896693644.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_parksanha_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and run it over a DataFrame with a "text" column.
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_parksanha_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
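Besides `transform`, a `PretrainedPipeline` can also be applied to plain strings, which is handy for quick spot checks. The sketch below is illustrative only: it assumes the pipeline's token-classification stage writes to an output column named `ner`, which should be confirmed by inspecting the returned dictionary's keys.

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_parksanha_pipeline", lang = "en")

# annotate() runs every stage on a single string without building a DataFrame.
result = pipeline.annotate("John Snow Labs ist ein Unternehmen aus Delaware")
print(result.keys())      # confirm the actual output column names
print(result.get("ner"))  # "ner" is an assumed column name
```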
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_parksanha_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/parksanha/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_vietnguyen1989_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_vietnguyen1989_en.md new file mode 100644 index 00000000000000..ec2a6c3108404f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_vietnguyen1989_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_vietnguyen1989 XlmRoBertaForTokenClassification from vietnguyen1989 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_vietnguyen1989 +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_vietnguyen1989` is a English model originally trained by vietnguyen1989. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_vietnguyen1989_en_5.5.0_3.0_1726897085747.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_vietnguyen1989_en_5.5.0_3.0_1726897085747.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Turn the raw text column into document annotations.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The pretrained token classifier consumes documents and tokens and writes NER tags.
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_vietnguyen1989","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_vietnguyen1989", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_vietnguyen1989| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/vietnguyen1989/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_vietnguyen1989_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_vietnguyen1989_pipeline_en.md new file mode 100644 index 00000000000000..3a67f0741eb9c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_vietnguyen1989_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_vietnguyen1989_pipeline pipeline XlmRoBertaForTokenClassification from vietnguyen1989 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_vietnguyen1989_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_vietnguyen1989_pipeline` is a English model originally trained by vietnguyen1989. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_vietnguyen1989_pipeline_en_5.5.0_3.0_1726897166487.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_vietnguyen1989_pipeline_en_5.5.0_3.0_1726897166487.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_vietnguyen1989_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and run it over a DataFrame with a "text" column.
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_vietnguyen1989_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_vietnguyen1989_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/vietnguyen1989/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_hindi_5_epochs_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_hindi_5_epochs_en.md new file mode 100644 index 00000000000000..21c14e24ce5f27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_hindi_5_epochs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_hindi_5_epochs XlmRoBertaForTokenClassification from DeepaPeri +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_hindi_5_epochs +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_hindi_5_epochs` is a English model originally trained by DeepaPeri. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_5_epochs_en_5.5.0_3.0_1726896855933.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_5_epochs_en_5.5.0_3.0_1726896855933.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Turn the raw text column into document annotations.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The pretrained token classifier consumes documents and tokens and writes NER tags.
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_hindi_5_epochs","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_hindi_5_epochs", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_hindi_5_epochs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|824.7 MB| + +## References + +https://huggingface.co/DeepaPeri/xlm-roberta-base-finetuned-panx-hi-5-epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_hindi_5_epochs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_hindi_5_epochs_pipeline_en.md new file mode 100644 index 00000000000000..a614a9c0f885e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_hindi_5_epochs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_hindi_5_epochs_pipeline pipeline XlmRoBertaForTokenClassification from DeepaPeri +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_hindi_5_epochs_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_hindi_5_epochs_pipeline` is a English model originally trained by DeepaPeri. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_5_epochs_pipeline_en_5.5.0_3.0_1726896939361.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_5_epochs_pipeline_en_5.5.0_3.0_1726896939361.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_hindi_5_epochs_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and run it over a DataFrame with a "text" column.
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_hindi_5_epochs_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_hindi_5_epochs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|824.8 MB| + +## References + +https://huggingface.co/DeepaPeri/xlm-roberta-base-finetuned-panx-hi-5-epochs + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_italian_abdelkareem_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_italian_abdelkareem_en.md new file mode 100644 index 00000000000000..517ae1e9dbb3f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_italian_abdelkareem_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_abdelkareem XlmRoBertaForTokenClassification from Abdelkareem +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_abdelkareem +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_abdelkareem` is a English model originally trained by Abdelkareem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_abdelkareem_en_5.5.0_3.0_1726883938062.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_abdelkareem_en_5.5.0_3.0_1726883938062.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Turn the raw text column into document annotations.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The pretrained token classifier consumes documents and tokens and writes NER tags.
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_abdelkareem","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_abdelkareem", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_abdelkareem| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|816.7 MB| + +## References + +https://huggingface.co/Abdelkareem/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_italian_abdelkareem_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_italian_abdelkareem_pipeline_en.md new file mode 100644 index 00000000000000..3c89a0bddfb3a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_italian_abdelkareem_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_abdelkareem_pipeline pipeline XlmRoBertaForTokenClassification from Abdelkareem +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_abdelkareem_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_abdelkareem_pipeline` is a English model originally trained by Abdelkareem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_abdelkareem_pipeline_en_5.5.0_3.0_1726884034127.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_abdelkareem_pipeline_en_5.5.0_3.0_1726884034127.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_abdelkareem_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and run it over a DataFrame with a "text" column.
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_abdelkareem_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
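When character offsets are needed, for example to highlight entities in the original text, `fullAnnotate` returns complete annotation objects instead of bare strings. A short sketch, reusing the `pipeline` object from the Python example above and again assuming the NER stage's output column is named `ner`:

```python
# fullAnnotate returns one dictionary of Annotation lists per input string.
result = pipeline.fullAnnotate("Roma è la capitale d'Italia")[0]

for annotation in result["ner"]:  # "ner" is an assumed column name
    print(annotation.begin, annotation.end, annotation.result)
```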
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_abdelkareem_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|816.8 MB| + +## References + +https://huggingface.co/Abdelkareem/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_italian_heerak_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_italian_heerak_en.md new file mode 100644 index 00000000000000..9d1eda142749c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_italian_heerak_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_heerak XlmRoBertaForTokenClassification from Heerak +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_heerak +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_heerak` is a English model originally trained by Heerak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_heerak_en_5.5.0_3.0_1726883291846.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_heerak_en_5.5.0_3.0_1726883291846.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Turn the raw text column into document annotations.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The pretrained token classifier consumes documents and tokens and writes NER tags.
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_heerak","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_heerak", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_heerak| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/Heerak/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_italian_heerak_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_italian_heerak_pipeline_en.md new file mode 100644 index 00000000000000..37a16890e90d75 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_italian_heerak_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_heerak_pipeline pipeline XlmRoBertaForTokenClassification from Heerak +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_heerak_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_heerak_pipeline` is a English model originally trained by Heerak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_heerak_pipeline_en_5.5.0_3.0_1726883376459.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_heerak_pipeline_en_5.5.0_3.0_1726883376459.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_heerak_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and run it over a DataFrame with a "text" column.
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_heerak_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_heerak_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/Heerak/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_lr5e_06_seed42_basic_eng_train_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_lr5e_06_seed42_basic_eng_train_en.md new file mode 100644 index 00000000000000..0e0a8f62599951 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_lr5e_06_seed42_basic_eng_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_lr5e_06_seed42_basic_eng_train XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr5e_06_seed42_basic_eng_train +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr5e_06_seed42_basic_eng_train` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_basic_eng_train_en_5.5.0_3.0_1726932938300.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_basic_eng_train_en_5.5.0_3.0_1726932938300.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Turn the raw text column into document annotations.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The pretrained sequence classifier consumes documents and tokens and writes one label per row.
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr5e_06_seed42_basic_eng_train","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr5e_06_seed42_basic_eng_train", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
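The sequence classifier writes one label per input row into the `class` column. As a minimal follow-up (using the `pipelineDF` from the Python example above), the predicted labels can be shown next to the input text:

```python
# "class.result" holds the predicted label for each row.
pipelineDF.select("text", "class.result").show(truncate=False)
```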
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr5e_06_seed42_basic_eng_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|790.9 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr5e-06_seed42_basic_eng_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_lr5e_06_seed42_basic_eng_train_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_lr5e_06_seed42_basic_eng_train_pipeline_en.md new file mode 100644 index 00000000000000..0c98f80472c649 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_lr5e_06_seed42_basic_eng_train_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_lr5e_06_seed42_basic_eng_train_pipeline pipeline XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr5e_06_seed42_basic_eng_train_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr5e_06_seed42_basic_eng_train_pipeline` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_basic_eng_train_pipeline_en_5.5.0_3.0_1726933069506.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_basic_eng_train_pipeline_en_5.5.0_3.0_1726933069506.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("xlm_roberta_base_lr5e_06_seed42_basic_eng_train_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and run it over a DataFrame with a "text" column.
val pipeline = new PretrainedPipeline("xlm_roberta_base_lr5e_06_seed42_basic_eng_train_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr5e_06_seed42_basic_eng_train_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|790.9 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr5e-06_seed42_basic_eng_train + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_sst2_100_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_sst2_100_en.md new file mode 100644 index 00000000000000..e10803b370f879 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_sst2_100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_sst2_100 XlmRoBertaForSequenceClassification from tmnam20 +author: John Snow Labs +name: xlm_roberta_base_sst2_100 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_sst2_100` is a English model originally trained by tmnam20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_sst2_100_en_5.5.0_3.0_1726933380533.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_sst2_100_en_5.5.0_3.0_1726933380533.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Turn the raw text column into document annotations.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The pretrained sequence classifier consumes documents and tokens and writes one label per row.
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_sst2_100","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_sst2_100", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_sst2_100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|779.4 MB| + +## References + +https://huggingface.co/tmnam20/xlm-roberta-base-sst2-100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_sst2_100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_sst2_100_pipeline_en.md new file mode 100644 index 00000000000000..17e993b8d72560 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_sst2_100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_sst2_100_pipeline pipeline XlmRoBertaForSequenceClassification from tmnam20 +author: John Snow Labs +name: xlm_roberta_base_sst2_100_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_sst2_100_pipeline` is a English model originally trained by tmnam20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_sst2_100_pipeline_en_5.5.0_3.0_1726933512849.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_sst2_100_pipeline_en_5.5.0_3.0_1726933512849.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("xlm_roberta_base_sst2_100_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and run it over a DataFrame with a "text" column.
val pipeline = new PretrainedPipeline("xlm_roberta_base_sst2_100_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_sst2_100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|779.4 MB| + +## References + +https://huggingface.co/tmnam20/xlm-roberta-base-sst2-100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_trimmed_english_30000_xnli_english_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_trimmed_english_30000_xnli_english_en.md new file mode 100644 index 00000000000000..6f8e51cad8dc51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_trimmed_english_30000_xnli_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_english_30000_xnli_english XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_english_30000_xnli_english +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_english_30000_xnli_english` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_30000_xnli_english_en_5.5.0_3.0_1726917949372.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_30000_xnli_english_en_5.5.0_3.0_1726917949372.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Turn the raw text column into document annotations.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The pretrained sequence classifier consumes documents and tokens and writes one label per row.
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_english_30000_xnli_english","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_english_30000_xnli_english", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
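If per-label scores are needed rather than just the winning label, they are usually carried in the annotation metadata. This is a hedged sketch that assumes the `class` output column from the example above and that the exported model populates its metadata with label scores:

```python
from pyspark.sql import functions as F

# Explode the metadata maps to inspect the raw scores attached to each prediction.
pipelineDF.select(F.explode("class.metadata").alias("meta")).show(truncate=False)
```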
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_english_30000_xnli_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-en-30000-xnli-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_trimmed_english_30000_xnli_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_trimmed_english_30000_xnli_english_pipeline_en.md new file mode 100644 index 00000000000000..0f271d7e17f5a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_trimmed_english_30000_xnli_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_english_30000_xnli_english_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_english_30000_xnli_english_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_english_30000_xnli_english_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_30000_xnli_english_pipeline_en_5.5.0_3.0_1726917970616.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_30000_xnli_english_pipeline_en_5.5.0_3.0_1726917970616.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("xlm_roberta_base_trimmed_english_30000_xnli_english_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and run it over a DataFrame with a "text" column.
val pipeline = new PretrainedPipeline("xlm_roberta_base_trimmed_english_30000_xnli_english_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_english_30000_xnli_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-en-30000-xnli-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_vsfc_100_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_vsfc_100_en.md new file mode 100644 index 00000000000000..a9ab826d65d92f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_vsfc_100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_vsfc_100 XlmRoBertaForSequenceClassification from tmnam20 +author: John Snow Labs +name: xlm_roberta_base_vsfc_100 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_vsfc_100` is a English model originally trained by tmnam20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vsfc_100_en_5.5.0_3.0_1726919308873.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vsfc_100_en_5.5.0_3.0_1726919308873.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Turn the raw text column into document annotations.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The pretrained sequence classifier consumes documents and tokens and writes one label per row.
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_vsfc_100","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_vsfc_100", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_vsfc_100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|773.7 MB| + +## References + +https://huggingface.co/tmnam20/xlm-roberta-base-vsfc-100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_vsfc_100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_vsfc_100_pipeline_en.md new file mode 100644 index 00000000000000..cef999f57e6008 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_vsfc_100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_vsfc_100_pipeline pipeline XlmRoBertaForSequenceClassification from tmnam20 +author: John Snow Labs +name: xlm_roberta_base_vsfc_100_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_vsfc_100_pipeline` is a English model originally trained by tmnam20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vsfc_100_pipeline_en_5.5.0_3.0_1726919446713.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vsfc_100_pipeline_en_5.5.0_3.0_1726919446713.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("xlm_roberta_base_vsfc_100_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and run it over a DataFrame with a "text" column.
val pipeline = new PretrainedPipeline("xlm_roberta_base_vsfc_100_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_vsfc_100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|773.7 MB| + +## References + +https://huggingface.co/tmnam20/xlm-roberta-base-vsfc-100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_xnli_arabic_trimmed_arabic_10000_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_xnli_arabic_trimmed_arabic_10000_en.md new file mode 100644 index 00000000000000..a0008d08e11428 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_xnli_arabic_trimmed_arabic_10000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_xnli_arabic_trimmed_arabic_10000 XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_xnli_arabic_trimmed_arabic_10000 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_xnli_arabic_trimmed_arabic_10000` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_arabic_trimmed_arabic_10000_en_5.5.0_3.0_1726918880726.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_arabic_trimmed_arabic_10000_en_5.5.0_3.0_1726918880726.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Turn the raw text column into document annotations.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The pretrained sequence classifier consumes documents and tokens and writes one label per row.
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_arabic_trimmed_arabic_10000","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_arabic_trimmed_arabic_10000", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_xnli_arabic_trimmed_arabic_10000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|352.6 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-xnli-ar-trimmed-ar-10000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_xnli_arabic_trimmed_arabic_10000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_xnli_arabic_trimmed_arabic_10000_pipeline_en.md new file mode 100644 index 00000000000000..6d0b453b3c1d0c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_xnli_arabic_trimmed_arabic_10000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_xnli_arabic_trimmed_arabic_10000_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_xnli_arabic_trimmed_arabic_10000_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_xnli_arabic_trimmed_arabic_10000_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_arabic_trimmed_arabic_10000_pipeline_en_5.5.0_3.0_1726918897880.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_arabic_trimmed_arabic_10000_pipeline_en_5.5.0_3.0_1726918897880.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("xlm_roberta_base_xnli_arabic_trimmed_arabic_10000_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and run it over a DataFrame with a "text" column.
val pipeline = new PretrainedPipeline("xlm_roberta_base_xnli_arabic_trimmed_arabic_10000_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_xnli_arabic_trimmed_arabic_10000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|352.6 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-xnli-ar-trimmed-ar-10000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_xnli_arabic_trimmed_arabic_15000_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_xnli_arabic_trimmed_arabic_15000_en.md new file mode 100644 index 00000000000000..b3231a281a7fd9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_xnli_arabic_trimmed_arabic_15000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_xnli_arabic_trimmed_arabic_15000 XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_xnli_arabic_trimmed_arabic_15000 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_xnli_arabic_trimmed_arabic_15000` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_arabic_trimmed_arabic_15000_en_5.5.0_3.0_1726919005302.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_arabic_trimmed_arabic_15000_en_5.5.0_3.0_1726919005302.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Turn the raw text column into document annotations.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The pretrained sequence classifier consumes documents and tokens and writes one label per row.
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_arabic_trimmed_arabic_15000","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_arabic_trimmed_arabic_15000", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
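Since the model weights are several hundred megabytes, it can be worth persisting the fitted pipeline locally so they are not re-downloaded on every run. The path below is purely illustrative:

```python
from pyspark.ml import PipelineModel

# Save the fitted Spark ML pipeline (including the downloaded model) to local disk.
pipelineModel.write().overwrite().save("/tmp/xlm_roberta_xnli_arabic_15000_local")

# Later, reload it without hitting the model hub again.
restored = PipelineModel.load("/tmp/xlm_roberta_xnli_arabic_15000_local")
```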
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_xnli_arabic_trimmed_arabic_15000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|364.7 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-xnli-ar-trimmed-ar-15000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_xnli_arabic_trimmed_arabic_15000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_xnli_arabic_trimmed_arabic_15000_pipeline_en.md new file mode 100644 index 00000000000000..fc3a863ecef395 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_xnli_arabic_trimmed_arabic_15000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_xnli_arabic_trimmed_arabic_15000_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_xnli_arabic_trimmed_arabic_15000_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_xnli_arabic_trimmed_arabic_15000_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_arabic_trimmed_arabic_15000_pipeline_en_5.5.0_3.0_1726919023452.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_arabic_trimmed_arabic_15000_pipeline_en_5.5.0_3.0_1726919023452.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pipeline reads the raw text from a column named "text" (assumed here)
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_xnli_arabic_trimmed_arabic_15000_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// The pipeline reads the raw text from a column named "text" (assumed here)
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_xnli_arabic_trimmed_arabic_15000_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
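For quick, single-sentence checks the downloaded pipeline can also be used without building a DataFrame. A small sketch, assuming the pipeline name from this card and that the bundled classifier writes to an output column named `class`:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_xnli_arabic_trimmed_arabic_15000_pipeline", lang="en")
# annotate() runs the whole pipeline on one string and returns a dict of outputs
result = pipeline.annotate("I love spark-nlp")
print(result["class"])  # predicted label(s); key assumed to match the classifier's output column
```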
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_xnli_arabic_trimmed_arabic_15000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|364.7 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-xnli-ar-trimmed-ar-15000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_targin_final_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_targin_final_en.md new file mode 100644 index 00000000000000..1e6b931789ef4a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_targin_final_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_targin_final XlmRoBertaForSequenceClassification from SiddharthaM +author: John Snow Labs +name: xlm_roberta_targin_final +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_targin_final` is a English model originally trained by SiddharthaM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_targin_final_en_5.5.0_3.0_1726932543962.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_targin_final_en_5.5.0_3.0_1726932543962.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_targin_final", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_targin_final", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
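When only a few documents need scoring, wrapping the fitted model in a `LightPipeline` avoids launching a full Spark job. A rough sketch, assuming `pipelineModel` from the snippet above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# Returns plain Python objects instead of a DataFrame
print(light.annotate("I love spark-nlp")["class"])
```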
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_targin_final| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|802.9 MB| + +## References + +https://huggingface.co/SiddharthaM/xlm-roberta-targin-final \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_v_base_trimmed_arabic_xnli_arabic_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_v_base_trimmed_arabic_xnli_arabic_en.md new file mode 100644 index 00000000000000..feaac8de76511b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_v_base_trimmed_arabic_xnli_arabic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_v_base_trimmed_arabic_xnli_arabic XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_v_base_trimmed_arabic_xnli_arabic +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_v_base_trimmed_arabic_xnli_arabic` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_v_base_trimmed_arabic_xnli_arabic_en_5.5.0_3.0_1726933105156.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_v_base_trimmed_arabic_xnli_arabic_en_5.5.0_3.0_1726933105156.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_v_base_trimmed_arabic_xnli_arabic", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_v_base_trimmed_arabic_xnli_arabic", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
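After fitting, the whole pipeline can be saved and reloaded so the model is not downloaded on every run. A sketch assuming `pipelineModel` and `data` from the example above; the path is illustrative:

```python
from pyspark.ml import PipelineModel

pipelineModel.write().overwrite().save("/tmp/xlm_v_base_trimmed_arabic_xnli_arabic_pipeline")  # illustrative path
reloaded = PipelineModel.load("/tmp/xlm_v_base_trimmed_arabic_xnli_arabic_pipeline")
reloaded.transform(data).select("class.result").show(truncate=False)
```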
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_v_base_trimmed_arabic_xnli_arabic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|530.8 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-v-base-trimmed-ar-xnli-ar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_v_base_trimmed_arabic_xnli_arabic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_v_base_trimmed_arabic_xnli_arabic_pipeline_en.md new file mode 100644 index 00000000000000..656c27171cb01d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_v_base_trimmed_arabic_xnli_arabic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_v_base_trimmed_arabic_xnli_arabic_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_v_base_trimmed_arabic_xnli_arabic_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_v_base_trimmed_arabic_xnli_arabic_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_v_base_trimmed_arabic_xnli_arabic_pipeline_en_5.5.0_3.0_1726933151960.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_v_base_trimmed_arabic_xnli_arabic_pipeline_en_5.5.0_3.0_1726933151960.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pipeline reads the raw text from a column named "text" (assumed here)
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_v_base_trimmed_arabic_xnli_arabic_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// The pipeline reads the raw text from a column named "text" (assumed here)
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_v_base_trimmed_arabic_xnli_arabic_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_v_base_trimmed_arabic_xnli_arabic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|530.8 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-v-base-trimmed-ar-xnli-ar + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlmr_english_chinese_all_shuffled_2020_test1000_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlmr_english_chinese_all_shuffled_2020_test1000_en.md new file mode 100644 index 00000000000000..7203bb574519b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlmr_english_chinese_all_shuffled_2020_test1000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_english_chinese_all_shuffled_2020_test1000 XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_english_chinese_all_shuffled_2020_test1000 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_english_chinese_all_shuffled_2020_test1000` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_english_chinese_all_shuffled_2020_test1000_en_5.5.0_3.0_1726932748232.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_english_chinese_all_shuffled_2020_test1000_en_5.5.0_3.0_1726932748232.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_english_chinese_all_shuffled_2020_test1000", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_english_chinese_all_shuffled_2020_test1000", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
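Throughput and memory use can be adjusted on the loaded annotator; Spark NLP transformer classifiers expose batch-size and sentence-length setters. A hedged sketch (the values are illustrative, not recommendations from the model author):

```python
from sparknlp.annotator import XlmRoBertaForSequenceClassification

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_english_chinese_all_shuffled_2020_test1000", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class") \
    .setBatchSize(8) \
    .setMaxSentenceLength(128)  # truncate long inputs; illustrative value
```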
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_english_chinese_all_shuffled_2020_test1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|826.9 MB| + +## References + +https://huggingface.co/patpizio/xlmr-en-zh-all_shuffled-2020-test1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlmr_english_chinese_all_shuffled_2020_test1000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlmr_english_chinese_all_shuffled_2020_test1000_pipeline_en.md new file mode 100644 index 00000000000000..c14c2d731c4860 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlmr_english_chinese_all_shuffled_2020_test1000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_english_chinese_all_shuffled_2020_test1000_pipeline pipeline XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_english_chinese_all_shuffled_2020_test1000_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_english_chinese_all_shuffled_2020_test1000_pipeline` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_english_chinese_all_shuffled_2020_test1000_pipeline_en_5.5.0_3.0_1726932859860.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_english_chinese_all_shuffled_2020_test1000_pipeline_en_5.5.0_3.0_1726932859860.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pipeline reads the raw text from a column named "text" (assumed here)
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlmr_english_chinese_all_shuffled_2020_test1000_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// The pipeline reads the raw text from a column named "text" (assumed here)
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlmr_english_chinese_all_shuffled_2020_test1000_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_english_chinese_all_shuffled_2020_test1000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.9 MB| + +## References + +https://huggingface.co/patpizio/xlmr-en-zh-all_shuffled-2020-test1000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xnli_xlm_r_only_russian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xnli_xlm_r_only_russian_pipeline_en.md new file mode 100644 index 00000000000000..b8c55ea8e96c5e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xnli_xlm_r_only_russian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xnli_xlm_r_only_russian_pipeline pipeline XlmRoBertaForSequenceClassification from semindan +author: John Snow Labs +name: xnli_xlm_r_only_russian_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xnli_xlm_r_only_russian_pipeline` is a English model originally trained by semindan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_russian_pipeline_en_5.5.0_3.0_1726933699607.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_russian_pipeline_en_5.5.0_3.0_1726933699607.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pipeline reads the raw text from a column named "text" (assumed here)
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xnli_xlm_r_only_russian_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// The pipeline reads the raw text from a column named "text" (assumed here)
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xnli_xlm_r_only_russian_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xnli_xlm_r_only_russian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|812.7 MB| + +## References + +https://huggingface.co/semindan/xnli_xlm_r_only_ru + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-zeli_category_en.md b/docs/_posts/ahmedlone127/2024-09-21-zeli_category_en.md new file mode 100644 index 00000000000000..1dbcf7b2dc7bf2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-zeli_category_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English zeli_category DistilBertForSequenceClassification from laxman-zelibot +author: John Snow Labs +name: zeli_category +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`zeli_category` is a English model originally trained by laxman-zelibot. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/zeli_category_en_5.5.0_3.0_1726884588829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/zeli_category_en_5.5.0_3.0_1726884588829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("zeli_category", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("zeli_category", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
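Each `class` annotation also carries per-label scores in its metadata, which can help when a confidence threshold is needed. A minimal sketch, assuming `pipelineDF` from the snippet above:

```python
from pyspark.sql import functions as F

pipelineDF.select(F.explode("class").alias("ann")) \
    .select(F.col("ann.result").alias("label"), F.col("ann.metadata").alias("scores")) \
    .show(truncate=False)
```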
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|zeli_category| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/laxman-zelibot/zeli-category \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-zero_shot_learning_en.md b/docs/_posts/ahmedlone127/2024-09-21-zero_shot_learning_en.md new file mode 100644 index 00000000000000..9809fdca4c7054 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-zero_shot_learning_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English zero_shot_learning XlmRoBertaForSequenceClassification from cemilcelik +author: John Snow Labs +name: zero_shot_learning +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`zero_shot_learning` is a English model originally trained by cemilcelik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/zero_shot_learning_en_5.5.0_3.0_1726918960438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/zero_shot_learning_en_5.5.0_3.0_1726918960438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("zero_shot_learning", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("zero_shot_learning", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
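Before interpreting predictions it can be useful to list the labels the checkpoint was trained with; Spark NLP classifiers expose them through `getClasses()`. A short sketch, assuming `sequenceClassifier` from the snippet above:

```python
# Prints the label set embedded in the exported model
print(sequenceClassifier.getClasses())
```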
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|zero_shot_learning| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|812.8 MB| + +## References + +https://huggingface.co/cemilcelik/Zero-shot-learning \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-zero_shot_learning_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-zero_shot_learning_pipeline_en.md new file mode 100644 index 00000000000000..c99479540b0461 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-zero_shot_learning_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English zero_shot_learning_pipeline pipeline XlmRoBertaForSequenceClassification from cemilcelik +author: John Snow Labs +name: zero_shot_learning_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`zero_shot_learning_pipeline` is a English model originally trained by cemilcelik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/zero_shot_learning_pipeline_en_5.5.0_3.0_1726919088349.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/zero_shot_learning_pipeline_en_5.5.0_3.0_1726919088349.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pipeline reads the raw text from a column named "text" (assumed here)
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("zero_shot_learning_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// The pipeline reads the raw text from a column named "text" (assumed here)
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("zero_shot_learning_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|zero_shot_learning_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|812.8 MB| + +## References + +https://huggingface.co/cemilcelik/Zero-shot-learning + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-0_000003_0_9_en.md b/docs/_posts/ahmedlone127/2024-09-22-0_000003_0_9_en.md new file mode 100644 index 00000000000000..e9fedf50dac437 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-0_000003_0_9_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 0_000003_0_9 RoBertaForSequenceClassification from rose-e-wang +author: John Snow Labs +name: 0_000003_0_9 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`0_000003_0_9` is a English model originally trained by rose-e-wang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/0_000003_0_9_en_5.5.0_3.0_1727016737959.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/0_000003_0_9_en_5.5.0_3.0_1727016737959.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("0_000003_0_9", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("0_000003_0_9", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
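For downstream processing the predictions can be flattened and written out as a plain table. A sketch assuming `pipelineDF` from the example above; the output path is illustrative:

```python
from pyspark.sql import functions as F

(pipelineDF
    .select("text", F.element_at("class.result", 1).alias("predicted_label"))
    .write.mode("overwrite")
    .parquet("/tmp/0_000003_0_9_predictions"))  # illustrative path
```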
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|0_000003_0_9| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/rose-e-wang/0.000003_0.9 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-0_000003_0_9_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-0_000003_0_9_pipeline_en.md new file mode 100644 index 00000000000000..909d1a5f421932 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-0_000003_0_9_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 0_000003_0_9_pipeline pipeline RoBertaForSequenceClassification from rose-e-wang +author: John Snow Labs +name: 0_000003_0_9_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`0_000003_0_9_pipeline` is a English model originally trained by rose-e-wang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/0_000003_0_9_pipeline_en_5.5.0_3.0_1727016821291.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/0_000003_0_9_pipeline_en_5.5.0_3.0_1727016821291.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pipeline reads the raw text from a column named "text" (assumed here)
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("0_000003_0_9_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// The pipeline reads the raw text from a column named "text" (assumed here)
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("0_000003_0_9_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|0_000003_0_9_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/rose-e-wang/0.000003_0.9 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-1121223_hw01_en.md b/docs/_posts/ahmedlone127/2024-09-22-1121223_hw01_en.md new file mode 100644 index 00000000000000..f844c8b9a73155 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-1121223_hw01_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 1121223_hw01 DistilBertForSequenceClassification from hcyang0401 +author: John Snow Labs +name: 1121223_hw01 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`1121223_hw01` is a English model originally trained by hcyang0401. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/1121223_hw01_en_5.5.0_3.0_1726980646376.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/1121223_hw01_en_5.5.0_3.0_1726980646376.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("1121223_hw01", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("1121223_hw01", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
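`LightPipeline.fullAnnotate` returns complete annotation objects, including metadata, which is handy for spot-checking a classifier like this one. A sketch, assuming the fitted `pipelineModel` from above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
annotations = light.fullAnnotate("I love spark-nlp")[0]
for ann in annotations["class"]:
    print(ann.result, ann.metadata)  # label plus per-label scores
```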
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|1121223_hw01| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hcyang0401/1121223_HW01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-1121223_hw01_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-1121223_hw01_pipeline_en.md new file mode 100644 index 00000000000000..93fb2dc012900b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-1121223_hw01_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 1121223_hw01_pipeline pipeline DistilBertForSequenceClassification from hcyang0401 +author: John Snow Labs +name: 1121223_hw01_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`1121223_hw01_pipeline` is a English model originally trained by hcyang0401. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/1121223_hw01_pipeline_en_5.5.0_3.0_1726980658613.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/1121223_hw01_pipeline_en_5.5.0_3.0_1726980658613.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pipeline reads the raw text from a column named "text" (assumed here)
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("1121223_hw01_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// The pipeline reads the raw text from a column named "text" (assumed here)
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("1121223_hw01_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|1121223_hw01_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hcyang0401/1121223_HW01 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-20ng_raw_roberta_1e_en.md b/docs/_posts/ahmedlone127/2024-09-22-20ng_raw_roberta_1e_en.md new file mode 100644 index 00000000000000..e273651eb93b0e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-20ng_raw_roberta_1e_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 20ng_raw_roberta_1e RoBertaForSequenceClassification from pig4431 +author: John Snow Labs +name: 20ng_raw_roberta_1e +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`20ng_raw_roberta_1e` is a English model originally trained by pig4431. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/20ng_raw_roberta_1e_en_5.5.0_3.0_1726971595453.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/20ng_raw_roberta_1e_en_5.5.0_3.0_1726971595453.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("20ng_raw_roberta_1e", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("20ng_raw_roberta_1e", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
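The single example sentence is only a placeholder; the same pipeline scales to arbitrarily many rows. A sketch with a few illustrative inputs, assuming the `pipeline` object defined above:

```python
data = spark.createDataFrame(
    [["I love spark-nlp"], ["This release is disappointing"], ["The meeting is at noon"]]
).toDF("text")  # illustrative sentences
pipeline.fit(data).transform(data).select("text", "class.result").show(truncate=False)
```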
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|20ng_raw_roberta_1e| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|460.8 MB| + +## References + +https://huggingface.co/pig4431/20NG_raw_roBERTa_1E \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-20ng_raw_roberta_1e_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-20ng_raw_roberta_1e_pipeline_en.md new file mode 100644 index 00000000000000..d7d250b4e05a49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-20ng_raw_roberta_1e_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 20ng_raw_roberta_1e_pipeline pipeline RoBertaForSequenceClassification from pig4431 +author: John Snow Labs +name: 20ng_raw_roberta_1e_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`20ng_raw_roberta_1e_pipeline` is a English model originally trained by pig4431. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/20ng_raw_roberta_1e_pipeline_en_5.5.0_3.0_1726971619330.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/20ng_raw_roberta_1e_pipeline_en_5.5.0_3.0_1726971619330.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pipeline reads the raw text from a column named "text" (assumed here)
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("20ng_raw_roberta_1e_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// The pipeline reads the raw text from a column named "text" (assumed here)
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("20ng_raw_roberta_1e_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|20ng_raw_roberta_1e_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|460.8 MB| + +## References + +https://huggingface.co/pig4431/20NG_raw_roBERTa_1E + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-2504separado3_en.md b/docs/_posts/ahmedlone127/2024-09-22-2504separado3_en.md new file mode 100644 index 00000000000000..48b05d59a04e84 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-2504separado3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2504separado3 RoBertaForSequenceClassification from adriansanz +author: John Snow Labs +name: 2504separado3 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2504separado3` is a English model originally trained by adriansanz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2504separado3_en_5.5.0_3.0_1726972448562.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2504separado3_en_5.5.0_3.0_1726972448562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("2504separado3", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("2504separado3", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
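A quick distribution of predicted labels over a dataset can be computed directly from the transformed DataFrame. A minimal sketch, assuming `pipelineDF` from the example above:

```python
from pyspark.sql import functions as F

pipelineDF.select(F.explode("class.result").alias("label")) \
    .groupBy("label").count() \
    .show()
```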
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2504separado3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|448.4 MB| + +## References + +https://huggingface.co/adriansanz/2504separado3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-2504separado3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-2504separado3_pipeline_en.md new file mode 100644 index 00000000000000..0d1d36148f0f2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-2504separado3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2504separado3_pipeline pipeline RoBertaForSequenceClassification from adriansanz +author: John Snow Labs +name: 2504separado3_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2504separado3_pipeline` is a English model originally trained by adriansanz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2504separado3_pipeline_en_5.5.0_3.0_1726972479204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2504separado3_pipeline_en_5.5.0_3.0_1726972479204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pipeline reads the raw text from a column named "text" (assumed here)
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("2504separado3_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// The pipeline reads the raw text from a column named "text" (assumed here)
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("2504separado3_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2504separado3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|448.4 MB| + +## References + +https://huggingface.co/adriansanz/2504separado3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-4412_model_final_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-4412_model_final_pipeline_en.md new file mode 100644 index 00000000000000..20fe0d45aff86b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-4412_model_final_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 4412_model_final_pipeline pipeline DistilBertForSequenceClassification from nadim365 +author: John Snow Labs +name: 4412_model_final_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`4412_model_final_pipeline` is a English model originally trained by nadim365. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/4412_model_final_pipeline_en_5.5.0_3.0_1727033400460.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/4412_model_final_pipeline_en_5.5.0_3.0_1727033400460.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pipeline reads the raw text from a column named "text" (assumed here)
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("4412_model_final_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// The pipeline reads the raw text from a column named "text" (assumed here)
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("4412_model_final_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|4412_model_final_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/nadim365/4412-model-final + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-adl_hw1_qa_model_en.md b/docs/_posts/ahmedlone127/2024-09-22-adl_hw1_qa_model_en.md new file mode 100644 index 00000000000000..3d7d5dbc65a6d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-adl_hw1_qa_model_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English adl_hw1_qa_model DistilBertForQuestionAnswering from b09501048 +author: John Snow Labs +name: adl_hw1_qa_model +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`adl_hw1_qa_model` is a English model originally trained by b09501048. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/adl_hw1_qa_model_en_5.5.0_3.0_1726963533278.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/adl_hw1_qa_model_en_5.5.0_3.0_1726963533278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("adl_hw1_qa_model", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("adl_hw1_qa_model", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
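For this question-answering model the extracted span is stored in the `answer` column. A minimal sketch for reading it back, assuming `pipelineDF` from the snippet above:

```python
# Shows each question together with the span extracted from its context
pipelineDF.select("question", "answer.result").show(truncate=False)
```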
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|adl_hw1_qa_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|505.4 MB| + +## References + +https://huggingface.co/b09501048/adl_hw1_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-adl_hw1_qa_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-adl_hw1_qa_model_pipeline_en.md new file mode 100644 index 00000000000000..86df5199715472 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-adl_hw1_qa_model_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English adl_hw1_qa_model_pipeline pipeline DistilBertForQuestionAnswering from b09501048 +author: John Snow Labs +name: adl_hw1_qa_model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`adl_hw1_qa_model_pipeline` is a English model originally trained by b09501048. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/adl_hw1_qa_model_pipeline_en_5.5.0_3.0_1726963555273.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/adl_hw1_qa_model_pipeline_en_5.5.0_3.0_1726963555273.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Column names are an assumption: a question-answering pipeline typically reads a
# question and a context column; adjust them to what this pipeline's stages expect
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipeline = PretrainedPipeline("adl_hw1_qa_model_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Column names are an assumption: a question-answering pipeline typically reads a
// question and a context column; adjust them to what this pipeline's stages expect
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipeline = new PretrainedPipeline("adl_hw1_qa_model_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|adl_hw1_qa_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|505.4 MB| + +## References + +https://huggingface.co/b09501048/adl_hw1_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-adrv2_en.md b/docs/_posts/ahmedlone127/2024-09-22-adrv2_en.md new file mode 100644 index 00000000000000..24ebcea23df9bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-adrv2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English adrv2 RoBertaForSequenceClassification from bqr5tf +author: John Snow Labs +name: adrv2 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`adrv2` is a English model originally trained by bqr5tf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/adrv2_en_5.5.0_3.0_1726971790656.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/adrv2_en_5.5.0_3.0_1726971790656.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("adrv2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("adrv2", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
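
Once the pipeline has run, the predicted label sits inside the `class` annotation column. A small sketch, assuming the `pipelineDF` from the Python example above:

```python
# "class.result" is the array of predicted labels (one per document annotation).
pipelineDF.select("text", "class.result").show(truncate=False)
```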
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|adrv2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/bqr5tf/ADRv2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-adrv2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-adrv2_pipeline_en.md new file mode 100644 index 00000000000000..d2d39395fb09bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-adrv2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English adrv2_pipeline pipeline RoBertaForSequenceClassification from bqr5tf +author: John Snow Labs +name: adrv2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`adrv2_pipeline` is a English model originally trained by bqr5tf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/adrv2_pipeline_en_5.5.0_3.0_1726971864356.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/adrv2_pipeline_en_5.5.0_3.0_1726971864356.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that contains the input column(s) expected by the pipeline
pipeline = PretrainedPipeline("adrv2_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that contains the input column(s) expected by the pipeline
val pipeline = new PretrainedPipeline("adrv2_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|adrv2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/bqr5tf/ADRv2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-adv_ssm_hw1_fullpara_fulldata_1726281318_en.md b/docs/_posts/ahmedlone127/2024-09-22-adv_ssm_hw1_fullpara_fulldata_1726281318_en.md new file mode 100644 index 00000000000000..905be4db568773 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-adv_ssm_hw1_fullpara_fulldata_1726281318_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English adv_ssm_hw1_fullpara_fulldata_1726281318 RoBertaForSequenceClassification from pristinawang +author: John Snow Labs +name: adv_ssm_hw1_fullpara_fulldata_1726281318 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`adv_ssm_hw1_fullpara_fulldata_1726281318` is a English model originally trained by pristinawang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/adv_ssm_hw1_fullpara_fulldata_1726281318_en_5.5.0_3.0_1726967299647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/adv_ssm_hw1_fullpara_fulldata_1726281318_en_5.5.0_3.0_1726967299647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("adv_ssm_hw1_fullpara_fulldata_1726281318","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("adv_ssm_hw1_fullpara_fulldata_1726281318", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|adv_ssm_hw1_fullpara_fulldata_1726281318| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|438.8 MB| + +## References + +https://huggingface.co/pristinawang/adv-ssm-hw1-fullPara-fullData-1726281318 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-adv_ssm_hw1_fullpara_fulldata_1726281318_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-adv_ssm_hw1_fullpara_fulldata_1726281318_pipeline_en.md new file mode 100644 index 00000000000000..bb03a1d8f07774 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-adv_ssm_hw1_fullpara_fulldata_1726281318_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English adv_ssm_hw1_fullpara_fulldata_1726281318_pipeline pipeline RoBertaForSequenceClassification from pristinawang +author: John Snow Labs +name: adv_ssm_hw1_fullpara_fulldata_1726281318_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`adv_ssm_hw1_fullpara_fulldata_1726281318_pipeline` is a English model originally trained by pristinawang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/adv_ssm_hw1_fullpara_fulldata_1726281318_pipeline_en_5.5.0_3.0_1726967331226.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/adv_ssm_hw1_fullpara_fulldata_1726281318_pipeline_en_5.5.0_3.0_1726967331226.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that contains the input column(s) expected by the pipeline
pipeline = PretrainedPipeline("adv_ssm_hw1_fullpara_fulldata_1726281318_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that contains the input column(s) expected by the pipeline
val pipeline = new PretrainedPipeline("adv_ssm_hw1_fullpara_fulldata_1726281318_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|adv_ssm_hw1_fullpara_fulldata_1726281318_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.9 MB| + +## References + +https://huggingface.co/pristinawang/adv-ssm-hw1-fullPara-fullData-1726281318 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ai_text_model_en.md b/docs/_posts/ahmedlone127/2024-09-22-ai_text_model_en.md new file mode 100644 index 00000000000000..1b59c8f18c58d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ai_text_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ai_text_model BertForSequenceClassification from KaranNag +author: John Snow Labs +name: ai_text_model +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ai_text_model` is a English model originally trained by KaranNag. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ai_text_model_en_5.5.0_3.0_1727032334648.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ai_text_model_en_5.5.0_3.0_1727032334648.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("ai_text_model","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("ai_text_model", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ai_text_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/KaranNag/Ai_text_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ai_text_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-ai_text_model_pipeline_en.md new file mode 100644 index 00000000000000..e22144e8b4750f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ai_text_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ai_text_model_pipeline pipeline BertForSequenceClassification from KaranNag +author: John Snow Labs +name: ai_text_model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ai_text_model_pipeline` is a English model originally trained by KaranNag. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ai_text_model_pipeline_en_5.5.0_3.0_1727032354740.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ai_text_model_pipeline_en_5.5.0_3.0_1727032354740.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that contains the input column(s) expected by the pipeline
pipeline = PretrainedPipeline("ai_text_model_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that contains the input column(s) expected by the pipeline
val pipeline = new PretrainedPipeline("ai_text_model_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ai_text_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/KaranNag/Ai_text_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-aigc_detector_zhv2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-aigc_detector_zhv2_pipeline_en.md new file mode 100644 index 00000000000000..24c4ebe612a715 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-aigc_detector_zhv2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English aigc_detector_zhv2_pipeline pipeline BertForSequenceClassification from yuchuantian +author: John Snow Labs +name: aigc_detector_zhv2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`aigc_detector_zhv2_pipeline` is a English model originally trained by yuchuantian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/aigc_detector_zhv2_pipeline_en_5.5.0_3.0_1727034670519.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/aigc_detector_zhv2_pipeline_en_5.5.0_3.0_1727034670519.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that contains the input column(s) expected by the pipeline
pipeline = PretrainedPipeline("aigc_detector_zhv2_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that contains the input column(s) expected by the pipeline
val pipeline = new PretrainedPipeline("aigc_detector_zhv2_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|aigc_detector_zhv2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|383.2 MB| + +## References + +https://huggingface.co/yuchuantian/AIGC_detector_zhv2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-akai_flow_classifier_kmai_dev_test_bot_en.md b/docs/_posts/ahmedlone127/2024-09-22-akai_flow_classifier_kmai_dev_test_bot_en.md new file mode 100644 index 00000000000000..a00f992a6b5ebf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-akai_flow_classifier_kmai_dev_test_bot_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English akai_flow_classifier_kmai_dev_test_bot BertForSequenceClassification from GautamR +author: John Snow Labs +name: akai_flow_classifier_kmai_dev_test_bot +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`akai_flow_classifier_kmai_dev_test_bot` is a English model originally trained by GautamR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/akai_flow_classifier_kmai_dev_test_bot_en_5.5.0_3.0_1727007410438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/akai_flow_classifier_kmai_dev_test_bot_en_5.5.0_3.0_1727007410438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("akai_flow_classifier_kmai_dev_test_bot","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("akai_flow_classifier_kmai_dev_test_bot", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|akai_flow_classifier_kmai_dev_test_bot| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/GautamR/akai_flow_classifier_kmai_dev_test_bot \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-akai_flow_classifier_kmai_dev_test_bot_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-akai_flow_classifier_kmai_dev_test_bot_pipeline_en.md new file mode 100644 index 00000000000000..859ac0793d2495 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-akai_flow_classifier_kmai_dev_test_bot_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English akai_flow_classifier_kmai_dev_test_bot_pipeline pipeline BertForSequenceClassification from GautamR +author: John Snow Labs +name: akai_flow_classifier_kmai_dev_test_bot_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`akai_flow_classifier_kmai_dev_test_bot_pipeline` is a English model originally trained by GautamR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/akai_flow_classifier_kmai_dev_test_bot_pipeline_en_5.5.0_3.0_1727007428815.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/akai_flow_classifier_kmai_dev_test_bot_pipeline_en_5.5.0_3.0_1727007428815.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that contains the input column(s) expected by the pipeline
pipeline = PretrainedPipeline("akai_flow_classifier_kmai_dev_test_bot_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that contains the input column(s) expected by the pipeline
val pipeline = new PretrainedPipeline("akai_flow_classifier_kmai_dev_test_bot_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|akai_flow_classifier_kmai_dev_test_bot_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/GautamR/akai_flow_classifier_kmai_dev_test_bot + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_banking_8_16_5_oos_en.md b/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_banking_8_16_5_oos_en.md new file mode 100644 index 00000000000000..bbd1345f9e0c24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_banking_8_16_5_oos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_banking_8_16_5_oos RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_banking_8_16_5_oos +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_banking_8_16_5_oos` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_banking_8_16_5_oos_en_5.5.0_3.0_1727026800688.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_banking_8_16_5_oos_en_5.5.0_3.0_1727026800688.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_banking_8_16_5_oos","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_banking_8_16_5_oos", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_banking_8_16_5_oos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-banking-8-16-5-oos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_home_1000_16_5_oos_en.md b/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_home_1000_16_5_oos_en.md new file mode 100644 index 00000000000000..7cec4993506a68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_home_1000_16_5_oos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_home_1000_16_5_oos RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_home_1000_16_5_oos +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_home_1000_16_5_oos` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_home_1000_16_5_oos_en_5.5.0_3.0_1727036964269.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_home_1000_16_5_oos_en_5.5.0_3.0_1727036964269.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_home_1000_16_5_oos","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_home_1000_16_5_oos", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_home_1000_16_5_oos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-home-1000-16-5-oos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_home_1_16_5_en.md b/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_home_1_16_5_en.md new file mode 100644 index 00000000000000..26f017bc705c3e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_home_1_16_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_home_1_16_5 RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_home_1_16_5 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_home_1_16_5` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_home_1_16_5_en_5.5.0_3.0_1727027488070.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_home_1_16_5_en_5.5.0_3.0_1727027488070.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_home_1_16_5","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_home_1_16_5", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_home_1_16_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-home-1-16-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_home_1_16_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_home_1_16_5_pipeline_en.md new file mode 100644 index 00000000000000..7bcb547c135ce5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_home_1_16_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English all_roberta_large_v1_home_1_16_5_pipeline pipeline RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_home_1_16_5_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_home_1_16_5_pipeline` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_home_1_16_5_pipeline_en_5.5.0_3.0_1727027563561.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_home_1_16_5_pipeline_en_5.5.0_3.0_1727027563561.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that contains the input column(s) expected by the pipeline
pipeline = PretrainedPipeline("all_roberta_large_v1_home_1_16_5_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that contains the input column(s) expected by the pipeline
val pipeline = new PretrainedPipeline("all_roberta_large_v1_home_1_16_5_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_home_1_16_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-home-1-16-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-amazon_review_sentiment_analysis_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-amazon_review_sentiment_analysis_pipeline_en.md new file mode 100644 index 00000000000000..9264672fe9cf32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-amazon_review_sentiment_analysis_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English amazon_review_sentiment_analysis_pipeline pipeline DistilBertForSequenceClassification from Mekteck +author: John Snow Labs +name: amazon_review_sentiment_analysis_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amazon_review_sentiment_analysis_pipeline` is a English model originally trained by Mekteck. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amazon_review_sentiment_analysis_pipeline_en_5.5.0_3.0_1726980335068.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amazon_review_sentiment_analysis_pipeline_en_5.5.0_3.0_1726980335068.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that contains the input column(s) expected by the pipeline
pipeline = PretrainedPipeline("amazon_review_sentiment_analysis_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that contains the input column(s) expected by the pipeline
val pipeline = new PretrainedPipeline("amazon_review_sentiment_analysis_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
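
A slightly fuller usage sketch for this pipeline, assuming the underlying DistilBERT classifier reads a `text` column and writes its prediction to a `class` column; both names are assumptions based on the sibling model card rather than the pipeline metadata:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start (or attach to) a Spark session with Spark NLP on the classpath.
spark = sparknlp.start()

# Hypothetical review text used only for illustration.
df = spark.createDataFrame([["The product arrived quickly and works great."]]).toDF("text")

pipeline = PretrainedPipeline("amazon_review_sentiment_analysis_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.select("class.result").show(truncate=False)
```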
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amazon_review_sentiment_analysis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Mekteck/amazon-review-sentiment-analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-amt_classifier_roberta_large_en.md b/docs/_posts/ahmedlone127/2024-09-22-amt_classifier_roberta_large_en.md new file mode 100644 index 00000000000000..87d1a4a66dc3b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-amt_classifier_roberta_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English amt_classifier_roberta_large RoBertaForSequenceClassification from hallisky +author: John Snow Labs +name: amt_classifier_roberta_large +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amt_classifier_roberta_large` is a English model originally trained by hallisky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amt_classifier_roberta_large_en_5.5.0_3.0_1726967782516.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amt_classifier_roberta_large_en_5.5.0_3.0_1726967782516.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("amt_classifier_roberta_large","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("amt_classifier_roberta_large", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amt_classifier_roberta_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/hallisky/amt-classifier-roberta-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-amt_classifier_roberta_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-amt_classifier_roberta_large_pipeline_en.md new file mode 100644 index 00000000000000..88bd207df87026 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-amt_classifier_roberta_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English amt_classifier_roberta_large_pipeline pipeline RoBertaForSequenceClassification from hallisky +author: John Snow Labs +name: amt_classifier_roberta_large_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amt_classifier_roberta_large_pipeline` is a English model originally trained by hallisky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amt_classifier_roberta_large_pipeline_en_5.5.0_3.0_1726967862340.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amt_classifier_roberta_large_pipeline_en_5.5.0_3.0_1726967862340.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that contains the input column(s) expected by the pipeline
pipeline = PretrainedPipeline("amt_classifier_roberta_large_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that contains the input column(s) expected by the pipeline
val pipeline = new PretrainedPipeline("amt_classifier_roberta_large_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amt_classifier_roberta_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/hallisky/amt-classifier-roberta-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-anakilang_kelas_ai_en.md b/docs/_posts/ahmedlone127/2024-09-22-anakilang_kelas_ai_en.md new file mode 100644 index 00000000000000..5aac0cd671dd1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-anakilang_kelas_ai_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English anakilang_kelas_ai RoBertaForSequenceClassification from GilarYa +author: John Snow Labs +name: anakilang_kelas_ai +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`anakilang_kelas_ai` is a English model originally trained by GilarYa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/anakilang_kelas_ai_en_5.5.0_3.0_1727017587303.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/anakilang_kelas_ai_en_5.5.0_3.0_1727017587303.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("anakilang_kelas_ai","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("anakilang_kelas_ai", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|anakilang_kelas_ai| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|432.6 MB| + +## References + +https://huggingface.co/GilarYa/anakilang-kelas-ai \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-anakilang_kelas_ai_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-anakilang_kelas_ai_pipeline_en.md new file mode 100644 index 00000000000000..dbaedacd8bca88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-anakilang_kelas_ai_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English anakilang_kelas_ai_pipeline pipeline RoBertaForSequenceClassification from GilarYa +author: John Snow Labs +name: anakilang_kelas_ai_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`anakilang_kelas_ai_pipeline` is a English model originally trained by GilarYa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/anakilang_kelas_ai_pipeline_en_5.5.0_3.0_1727017611434.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/anakilang_kelas_ai_pipeline_en_5.5.0_3.0_1727017611434.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that contains the input column(s) expected by the pipeline
pipeline = PretrainedPipeline("anakilang_kelas_ai_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that contains the input column(s) expected by the pipeline
val pipeline = new PretrainedPipeline("anakilang_kelas_ai_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|anakilang_kelas_ai_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|432.6 MB| + +## References + +https://huggingface.co/GilarYa/anakilang-kelas-ai + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-araroberta_jo_ar.md b/docs/_posts/ahmedlone127/2024-09-22-araroberta_jo_ar.md new file mode 100644 index 00000000000000..4a0f60760f9b9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-araroberta_jo_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic araroberta_jo RoBertaEmbeddings from reemalyami +author: John Snow Labs +name: araroberta_jo +date: 2024-09-22 +tags: [ar, open_source, onnx, embeddings, roberta] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`araroberta_jo` is a Arabic model originally trained by reemalyami. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/araroberta_jo_ar_5.5.0_3.0_1726999512821.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/araroberta_jo_ar_5.5.0_3.0_1726999512821.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("araroberta_jo","ar") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("araroberta_jo","ar")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
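
The `embeddings` column holds one annotation per token, each carrying its vector. A minimal sketch, assuming the `pipelineDF` from the Python snippet above, for inspecting the per-token vectors:

```python
# "result" is the token text, "embeddings" the corresponding vector for that token.
pipelineDF.selectExpr("explode(embeddings) as emb") \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(truncate=80)
```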
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|araroberta_jo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|ar| +|Size:|470.6 MB| + +## References + +https://huggingface.co/reemalyami/AraRoBERTa-JO \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-araroberta_jo_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-22-araroberta_jo_pipeline_ar.md new file mode 100644 index 00000000000000..3ee221e742c253 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-araroberta_jo_pipeline_ar.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Arabic araroberta_jo_pipeline pipeline RoBertaEmbeddings from reemalyami +author: John Snow Labs +name: araroberta_jo_pipeline +date: 2024-09-22 +tags: [ar, open_source, pipeline, onnx] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`araroberta_jo_pipeline` is a Arabic model originally trained by reemalyami. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/araroberta_jo_pipeline_ar_5.5.0_3.0_1726999534008.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/araroberta_jo_pipeline_ar_5.5.0_3.0_1726999534008.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that contains the input column(s) expected by the pipeline
pipeline = PretrainedPipeline("araroberta_jo_pipeline", lang = "ar")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that contains the input column(s) expected by the pipeline
val pipeline = new PretrainedPipeline("araroberta_jo_pipeline", lang = "ar")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|araroberta_jo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|470.7 MB| + +## References + +https://huggingface.co/reemalyami/AraRoBERTa-JO + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-augmented_model_fast_2b_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-augmented_model_fast_2b_pipeline_en.md new file mode 100644 index 00000000000000..21b65570a446c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-augmented_model_fast_2b_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English augmented_model_fast_2b_pipeline pipeline DistilBertForSequenceClassification from LeonardoFettucciari +author: John Snow Labs +name: augmented_model_fast_2b_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`augmented_model_fast_2b_pipeline` is a English model originally trained by LeonardoFettucciari. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/augmented_model_fast_2b_pipeline_en_5.5.0_3.0_1727033153203.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/augmented_model_fast_2b_pipeline_en_5.5.0_3.0_1727033153203.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame that contains the input column(s) expected by the pipeline
pipeline = PretrainedPipeline("augmented_model_fast_2b_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a Spark DataFrame that contains the input column(s) expected by the pipeline
val pipeline = new PretrainedPipeline("augmented_model_fast_2b_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|augmented_model_fast_2b_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LeonardoFettucciari/augmented_model_fast_2b + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-auro_1_en.md b/docs/_posts/ahmedlone127/2024-09-22-auro_1_en.md new file mode 100644 index 00000000000000..850cc0a4ec240e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-auro_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English auro_1 RoBertaForSequenceClassification from BaronSch +author: John Snow Labs +name: auro_1 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`auro_1` is a English model originally trained by BaronSch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/auro_1_en_5.5.0_3.0_1726972010668.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/auro_1_en_5.5.0_3.0_1726972010668.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("auro_1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("auro_1", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
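To read off just the predicted labels, the `result` field of the `class` annotation column can be selected. A minimal sketch, assuming the pipeline above has already been run:

```python
# Show only the predicted label strings for each input row
pipelineDF.select("class.result").show(truncate=False)
```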
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|auro_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/BaronSch/AURO_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-auro_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-auro_1_pipeline_en.md new file mode 100644 index 00000000000000..9da52e866f0c9c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-auro_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English auro_1_pipeline pipeline RoBertaForSequenceClassification from BaronSch +author: John Snow Labs +name: auro_1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`auro_1_pipeline` is a English model originally trained by BaronSch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/auro_1_pipeline_en_5.5.0_3.0_1726972032341.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/auro_1_pipeline_en_5.5.0_3.0_1726972032341.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an existing Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("auro_1_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an existing Spark DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("auro_1_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|auro_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/BaronSch/AURO_1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-authdetect_test_en.md b/docs/_posts/ahmedlone127/2024-09-22-authdetect_test_en.md new file mode 100644 index 00000000000000..2a31badf994be0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-authdetect_test_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English authdetect_test RoBertaForSequenceClassification from mmochtak +author: John Snow Labs +name: authdetect_test +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`authdetect_test` is a English model originally trained by mmochtak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/authdetect_test_en_5.5.0_3.0_1726967649157.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/authdetect_test_en_5.5.0_3.0_1726967649157.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("authdetect_test","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("authdetect_test", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|authdetect_test| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|459.6 MB| + +## References + +https://huggingface.co/mmochtak/authdetect_test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-authdetect_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-authdetect_test_pipeline_en.md new file mode 100644 index 00000000000000..b92d7267d23049 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-authdetect_test_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English authdetect_test_pipeline pipeline RoBertaForSequenceClassification from mmochtak +author: John Snow Labs +name: authdetect_test_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`authdetect_test_pipeline` is a English model originally trained by mmochtak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/authdetect_test_pipeline_en_5.5.0_3.0_1726967672763.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/authdetect_test_pipeline_en_5.5.0_3.0_1726967672763.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an existing Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("authdetect_test_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an existing Spark DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("authdetect_test_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|authdetect_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|459.6 MB| + +## References + +https://huggingface.co/mmochtak/authdetect_test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-autonlp_bp_29016523_en.md b/docs/_posts/ahmedlone127/2024-09-22-autonlp_bp_29016523_en.md new file mode 100644 index 00000000000000..6baa64f3209e21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-autonlp_bp_29016523_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English autonlp_bp_29016523 BertForSequenceClassification from JushBJJ +author: John Snow Labs +name: autonlp_bp_29016523 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autonlp_bp_29016523` is a English model originally trained by JushBJJ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autonlp_bp_29016523_en_5.5.0_3.0_1727007933086.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autonlp_bp_29016523_en_5.5.0_3.0_1727007933086.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("autonlp_bp_29016523","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("autonlp_bp_29016523", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autonlp_bp_29016523| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/JushBJJ/autonlp-bp-29016523 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-autonlp_bp_29016523_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-autonlp_bp_29016523_pipeline_en.md new file mode 100644 index 00000000000000..4cee505e1ceba4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-autonlp_bp_29016523_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autonlp_bp_29016523_pipeline pipeline BertForSequenceClassification from JushBJJ +author: John Snow Labs +name: autonlp_bp_29016523_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autonlp_bp_29016523_pipeline` is a English model originally trained by JushBJJ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autonlp_bp_29016523_pipeline_en_5.5.0_3.0_1727007984994.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autonlp_bp_29016523_pipeline_en_5.5.0_3.0_1727007984994.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an existing Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("autonlp_bp_29016523_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an existing Spark DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("autonlp_bp_29016523_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autonlp_bp_29016523_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/JushBJJ/autonlp-bp-29016523 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-autotrain_2_roberta_r_53890126861_en.md b/docs/_posts/ahmedlone127/2024-09-22-autotrain_2_roberta_r_53890126861_en.md new file mode 100644 index 00000000000000..d21bfbe1d01d7c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-autotrain_2_roberta_r_53890126861_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English autotrain_2_roberta_r_53890126861 BertForTokenClassification from tinyYhorm +author: John Snow Labs +name: autotrain_2_roberta_r_53890126861 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_2_roberta_r_53890126861` is a English model originally trained by tinyYhorm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_2_roberta_r_53890126861_en_5.5.0_3.0_1726977296993.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_2_roberta_r_53890126861_en_5.5.0_3.0_1726977296993.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("autotrain_2_roberta_r_53890126861","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("autotrain_2_roberta_r_53890126861", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
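The predicted tags end up in the `ner` annotation column, aligned one-to-one with the tokens. A minimal sketch of inspecting them, assuming the pipeline above has already been run:

```python
# Show the predicted entity tags for each input row
pipelineDF.select("ner.result").show(truncate=False)
```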
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_2_roberta_r_53890126861| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|381.1 MB| + +## References + +https://huggingface.co/tinyYhorm/autotrain-2-roberta-r-53890126861 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-autotrain_2_roberta_r_53890126861_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-autotrain_2_roberta_r_53890126861_pipeline_en.md new file mode 100644 index 00000000000000..a7cc00696f6cb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-autotrain_2_roberta_r_53890126861_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autotrain_2_roberta_r_53890126861_pipeline pipeline BertForTokenClassification from tinyYhorm +author: John Snow Labs +name: autotrain_2_roberta_r_53890126861_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_2_roberta_r_53890126861_pipeline` is a English model originally trained by tinyYhorm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_2_roberta_r_53890126861_pipeline_en_5.5.0_3.0_1726977313813.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_2_roberta_r_53890126861_pipeline_en_5.5.0_3.0_1726977313813.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an existing Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("autotrain_2_roberta_r_53890126861_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an existing Spark DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("autotrain_2_roberta_r_53890126861_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_2_roberta_r_53890126861_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|381.1 MB| + +## References + +https://huggingface.co/tinyYhorm/autotrain-2-roberta-r-53890126861 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-b706f448_b1e7_4383_912d_79006b0f7393_en.md b/docs/_posts/ahmedlone127/2024-09-22-b706f448_b1e7_4383_912d_79006b0f7393_en.md new file mode 100644 index 00000000000000..dfed2e09c00287 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-b706f448_b1e7_4383_912d_79006b0f7393_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English b706f448_b1e7_4383_912d_79006b0f7393 RoBertaForSequenceClassification from IDQO +author: John Snow Labs +name: b706f448_b1e7_4383_912d_79006b0f7393 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`b706f448_b1e7_4383_912d_79006b0f7393` is a English model originally trained by IDQO. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/b706f448_b1e7_4383_912d_79006b0f7393_en_5.5.0_3.0_1726971982863.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/b706f448_b1e7_4383_912d_79006b0f7393_en_5.5.0_3.0_1726971982863.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("b706f448_b1e7_4383_912d_79006b0f7393","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("b706f448_b1e7_4383_912d_79006b0f7393", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|b706f448_b1e7_4383_912d_79006b0f7393| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|437.9 MB| + +## References + +https://huggingface.co/IDQO/b706f448-b1e7-4383-912d-79006b0f7393 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-b706f448_b1e7_4383_912d_79006b0f7393_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-b706f448_b1e7_4383_912d_79006b0f7393_pipeline_en.md new file mode 100644 index 00000000000000..62388aeb1e7878 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-b706f448_b1e7_4383_912d_79006b0f7393_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English b706f448_b1e7_4383_912d_79006b0f7393_pipeline pipeline RoBertaForSequenceClassification from IDQO +author: John Snow Labs +name: b706f448_b1e7_4383_912d_79006b0f7393_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`b706f448_b1e7_4383_912d_79006b0f7393_pipeline` is a English model originally trained by IDQO. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/b706f448_b1e7_4383_912d_79006b0f7393_pipeline_en_5.5.0_3.0_1726972004243.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/b706f448_b1e7_4383_912d_79006b0f7393_pipeline_en_5.5.0_3.0_1726972004243.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an existing Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("b706f448_b1e7_4383_912d_79006b0f7393_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an existing Spark DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("b706f448_b1e7_4383_912d_79006b0f7393_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|b706f448_b1e7_4383_912d_79006b0f7393_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.0 MB| + +## References + +https://huggingface.co/IDQO/b706f448-b1e7-4383-912d-79006b0f7393 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bae_roberta_base_rte_5_en.md b/docs/_posts/ahmedlone127/2024-09-22-bae_roberta_base_rte_5_en.md new file mode 100644 index 00000000000000..1e48e958f26619 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bae_roberta_base_rte_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bae_roberta_base_rte_5 RoBertaForSequenceClassification from korca +author: John Snow Labs +name: bae_roberta_base_rte_5 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bae_roberta_base_rte_5` is a English model originally trained by korca. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bae_roberta_base_rte_5_en_5.5.0_3.0_1726971796925.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bae_roberta_base_rte_5_en_5.5.0_3.0_1726971796925.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("bae_roberta_base_rte_5","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("bae_roberta_base_rte_5", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bae_roberta_base_rte_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|447.9 MB| + +## References + +https://huggingface.co/korca/bae-roberta-base-rte-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bae_roberta_base_rte_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bae_roberta_base_rte_5_pipeline_en.md new file mode 100644 index 00000000000000..effe1eb06f0e94 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bae_roberta_base_rte_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bae_roberta_base_rte_5_pipeline pipeline RoBertaForSequenceClassification from korca +author: John Snow Labs +name: bae_roberta_base_rte_5_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bae_roberta_base_rte_5_pipeline` is a English model originally trained by korca. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bae_roberta_base_rte_5_pipeline_en_5.5.0_3.0_1726971822147.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bae_roberta_base_rte_5_pipeline_en_5.5.0_3.0_1726971822147.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an existing Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("bae_roberta_base_rte_5_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an existing Spark DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("bae_roberta_base_rte_5_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bae_roberta_base_rte_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|448.0 MB| + +## References + +https://huggingface.co/korca/bae-roberta-base-rte-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_en.md b/docs/_posts/ahmedlone127/2024-09-22-balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_en.md new file mode 100644 index 00000000000000..f70ebb245cf551 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2 BertForTokenClassification from Jsevisal +author: John Snow Labs +name: balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2` is a English model originally trained by Jsevisal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_en_5.5.0_3.0_1727041010402.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_en_5.5.0_3.0_1727041010402.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Jsevisal/balanced-augmented-bert-large-gest-pred-seqeval-partialmatch-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_pipeline_en.md new file mode 100644 index 00000000000000..5ae59ff6ca7c36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_pipeline pipeline BertForTokenClassification from Jsevisal +author: John Snow Labs +name: balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_pipeline` is a English model originally trained by Jsevisal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_pipeline_en_5.5.0_3.0_1727041070860.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_pipeline_en_5.5.0_3.0_1727041070860.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an existing Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an existing Spark DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Jsevisal/balanced-augmented-bert-large-gest-pred-seqeval-partialmatch-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_emotion_ncduy_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_emotion_ncduy_en.md new file mode 100644 index 00000000000000..8aa0498e818777 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_emotion_ncduy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_cased_finetuned_emotion_ncduy BertForSequenceClassification from ncduy +author: John Snow Labs +name: bert_base_cased_finetuned_emotion_ncduy +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_emotion_ncduy` is a English model originally trained by ncduy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_emotion_ncduy_en_5.5.0_3.0_1727007715737.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_emotion_ncduy_en_5.5.0_3.0_1727007715737.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_finetuned_emotion_ncduy","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_finetuned_emotion_ncduy", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_emotion_ncduy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/ncduy/bert-base-cased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_emotion_ncduy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_emotion_ncduy_pipeline_en.md new file mode 100644 index 00000000000000..ee3d2fec1b7e6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_emotion_ncduy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_cased_finetuned_emotion_ncduy_pipeline pipeline BertForSequenceClassification from ncduy +author: John Snow Labs +name: bert_base_cased_finetuned_emotion_ncduy_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_emotion_ncduy_pipeline` is a English model originally trained by ncduy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_emotion_ncduy_pipeline_en_5.5.0_3.0_1727007733497.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_emotion_ncduy_pipeline_en_5.5.0_3.0_1727007733497.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an existing Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("bert_base_cased_finetuned_emotion_ncduy_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an existing Spark DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("bert_base_cased_finetuned_emotion_ncduy_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_emotion_ncduy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/ncduy/bert-base-cased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_runaways_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_runaways_en.md new file mode 100644 index 00000000000000..79650bcb964bc9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_runaways_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_cased_finetuned_runaways BertForQuestionAnswering from Nadav +author: John Snow Labs +name: bert_base_cased_finetuned_runaways +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_runaways` is a English model originally trained by Nadav. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_runaways_en_5.5.0_3.0_1726991667225.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_runaways_en_5.5.0_3.0_1726991667225.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_finetuned_runaways","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols("question", "context")
    .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_finetuned_runaways", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("document_question", "document_context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
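The predicted answer span is stored in the `answer` annotation column. A minimal sketch of reading it back, assuming the pipeline above has already been run:

```python
# Show the extracted answer text for each question/context pair
pipelineDF.select("answer.result").show(truncate=False)
```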
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_runaways| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Nadav/bert-base-cased-finetuned-runaways \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_runaways_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_runaways_pipeline_en.md new file mode 100644 index 00000000000000..b59ac425089ef6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_runaways_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_cased_finetuned_runaways_pipeline pipeline BertForQuestionAnswering from Nadav +author: John Snow Labs +name: bert_base_cased_finetuned_runaways_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_runaways_pipeline` is a English model originally trained by Nadav. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_runaways_pipeline_en_5.5.0_3.0_1726991687595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_runaways_pipeline_en_5.5.0_3.0_1726991687595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an existing Spark DataFrame with the question and context columns to annotate
pipeline = PretrainedPipeline("bert_base_cased_finetuned_runaways_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an existing Spark DataFrame with the question and context columns to annotate
val pipeline = new PretrainedPipeline("bert_base_cased_finetuned_runaways_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_runaways_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Nadav/bert-base-cased-finetuned-runaways + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_bosnian_16_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_bosnian_16_en.md new file mode 100644 index 00000000000000..6a413e512c597f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_bosnian_16_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_cased_finetuned_squad_bosnian_16 BertForQuestionAnswering from lauraparra28 +author: John Snow Labs +name: bert_base_cased_finetuned_squad_bosnian_16 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_squad_bosnian_16` is a English model originally trained by lauraparra28. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_bosnian_16_en_5.5.0_3.0_1727042658834.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_bosnian_16_en_5.5.0_3.0_1727042658834.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_finetuned_squad_bosnian_16","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols("question", "context")
    .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_finetuned_squad_bosnian_16", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("document_question", "document_context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_squad_bosnian_16| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/lauraparra28/bert-base-cased-finetuned-squad-bs_16 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_bosnian_16_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_bosnian_16_pipeline_en.md new file mode 100644 index 00000000000000..40b9c30c23c7de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_bosnian_16_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_cased_finetuned_squad_bosnian_16_pipeline pipeline BertForQuestionAnswering from lauraparra28 +author: John Snow Labs +name: bert_base_cased_finetuned_squad_bosnian_16_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_squad_bosnian_16_pipeline` is a English model originally trained by lauraparra28. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_bosnian_16_pipeline_en_5.5.0_3.0_1727042679168.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_bosnian_16_pipeline_en_5.5.0_3.0_1727042679168.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: an existing Spark DataFrame with the question and context columns to annotate
pipeline = PretrainedPipeline("bert_base_cased_finetuned_squad_bosnian_16_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: an existing Spark DataFrame with the question and context columns to annotate
val pipeline = new PretrainedPipeline("bert_base_cased_finetuned_squad_bosnian_16_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_squad_bosnian_16_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/lauraparra28/bert-base-cased-finetuned-squad-bs_16 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_en.md new file mode 100644 index 00000000000000..39adf53add0597 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_cased_finetuned_squad BertForQuestionAnswering from Arup-Dutta-Bappy +author: John Snow Labs +name: bert_base_cased_finetuned_squad +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_squad` is a English model originally trained by Arup-Dutta-Bappy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_en_5.5.0_3.0_1727049190012.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_en_5.5.0_3.0_1727049190012.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_finetuned_squad","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols("question", "context")
+    .setOutputCols("document_question", "document_context")
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_finetuned_squad", "en")
+    .setInputCols(Array("document_question", "document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
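+
+After the transform above, the predicted answer spans are stored as annotations in the `answer` column; `answer.result` holds the extracted text. For example:
+
+```python
+# Show the answer extracted for each question/context pair
+pipelineDF.select("answer.result").show(truncate = False)
+```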
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Arup-Dutta-Bappy/bert-base-cased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_pipeline_en.md new file mode 100644 index 00000000000000..8c171b7919be8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_cased_finetuned_squad_pipeline pipeline BertForQuestionAnswering from Arup-Dutta-Bappy +author: John Snow Labs +name: bert_base_cased_finetuned_squad_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_squad_pipeline` is a English model originally trained by Arup-Dutta-Bappy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_pipeline_en_5.5.0_3.0_1727049216807.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_pipeline_en_5.5.0_3.0_1727049216807.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_cased_finetuned_squad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_cased_finetuned_squad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Arup-Dutta-Bappy/bert-base-cased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_imdb_sequence_classification_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_imdb_sequence_classification_en.md new file mode 100644 index 00000000000000..7134c420042e18 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_imdb_sequence_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_cased_imdb_sequence_classification BertForSequenceClassification from ykacer +author: John Snow Labs +name: bert_base_cased_imdb_sequence_classification +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_imdb_sequence_classification` is a English model originally trained by ykacer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_imdb_sequence_classification_en_5.5.0_3.0_1726988461475.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_imdb_sequence_classification_en_5.5.0_3.0_1726988461475.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_imdb_sequence_classification","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_imdb_sequence_classification", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
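+
+The predicted label for each input row lands in the `class` annotation column, with the label string in `class.result` and, typically, per-label confidence scores in the annotation metadata. For example:
+
+```python
+# Inspect the predicted sentiment label for each review
+pipelineDF.select("text", "class.result").show(truncate = False)
+```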
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_imdb_sequence_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/ykacer/bert-base-cased-imdb-sequence-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_imdb_sequence_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_imdb_sequence_classification_pipeline_en.md new file mode 100644 index 00000000000000..4793fcf370c4ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_imdb_sequence_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_cased_imdb_sequence_classification_pipeline pipeline BertForSequenceClassification from ykacer +author: John Snow Labs +name: bert_base_cased_imdb_sequence_classification_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_imdb_sequence_classification_pipeline` is a English model originally trained by ykacer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_imdb_sequence_classification_pipeline_en_5.5.0_3.0_1726988479532.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_imdb_sequence_classification_pipeline_en_5.5.0_3.0_1726988479532.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_cased_imdb_sequence_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_cased_imdb_sequence_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
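+
+Here `df` is assumed to be a DataFrame with a `text` column, since the pipeline starts with a DocumentAssembler. A minimal sketch:
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+# Hypothetical review text; replace with your own data
+df = spark.createDataFrame([["This movie was surprisingly good."]]).toDF("text")
+
+pipeline = PretrainedPipeline("bert_base_cased_imdb_sequence_classification_pipeline", lang = "en")
+pipeline.transform(df).select("class.result").show(truncate = False)
+```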
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_imdb_sequence_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/ykacer/bert-base-cased-imdb-sequence-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_plane_ood_2_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_plane_ood_2_en.md new file mode 100644 index 00000000000000..d3156fb191b2d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_plane_ood_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_cased_plane_ood_2 BertForSequenceClassification from lorenzoscottb +author: John Snow Labs +name: bert_base_cased_plane_ood_2 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_plane_ood_2` is a English model originally trained by lorenzoscottb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_plane_ood_2_en_5.5.0_3.0_1726991247597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_plane_ood_2_en_5.5.0_3.0_1726991247597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_plane_ood_2","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_plane_ood_2", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_plane_ood_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/lorenzoscottb/bert-base-cased-PLANE-ood-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_plane_ood_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_plane_ood_2_pipeline_en.md new file mode 100644 index 00000000000000..855e7d4923b956 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_plane_ood_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_cased_plane_ood_2_pipeline pipeline BertForSequenceClassification from lorenzoscottb +author: John Snow Labs +name: bert_base_cased_plane_ood_2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_plane_ood_2_pipeline` is a English model originally trained by lorenzoscottb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_plane_ood_2_pipeline_en_5.5.0_3.0_1726991265734.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_plane_ood_2_pipeline_en_5.5.0_3.0_1726991265734.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_cased_plane_ood_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_cased_plane_ood_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_plane_ood_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/lorenzoscottb/bert-base-cased-PLANE-ood-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_ner_en.md new file mode 100644 index 00000000000000..3f1aaf1147fc07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_chinese_finetuned_ner BertForTokenClassification from zhiguoxu +author: John Snow Labs +name: bert_base_chinese_finetuned_ner +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_chinese_finetuned_ner` is a English model originally trained by zhiguoxu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_ner_en_5.5.0_3.0_1727015818418.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_ner_en_5.5.0_3.0_1727015818418.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_base_chinese_finetuned_ner","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_base_chinese_finetuned_ner", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
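+
+Each token receives an IOB-style tag in the `ner` output column. A quick way to eyeball token/tag pairs (the tag set depends on the data the model was fine-tuned on):
+
+```python
+# Tokens and their predicted NER tags, row by row
+pipelineDF.select("token.result", "ner.result").show(truncate = False)
+```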
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_chinese_finetuned_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|381.1 MB| + +## References + +https://huggingface.co/zhiguoxu/bert-base-chinese-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_ner_pipeline_en.md new file mode 100644 index 00000000000000..e4bb21966f99b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_chinese_finetuned_ner_pipeline pipeline BertForTokenClassification from zhiguoxu +author: John Snow Labs +name: bert_base_chinese_finetuned_ner_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_chinese_finetuned_ner_pipeline` is a English model originally trained by zhiguoxu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_ner_pipeline_en_5.5.0_3.0_1727015834998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_ner_pipeline_en_5.5.0_3.0_1727015834998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_chinese_finetuned_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_chinese_finetuned_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_chinese_finetuned_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|381.2 MB| + +## References + +https://huggingface.co/zhiguoxu/bert-base-chinese-finetuned-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_qa_b32_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_qa_b32_en.md new file mode 100644 index 00000000000000..9258c2f608fcb6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_qa_b32_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_chinese_finetuned_qa_b32 BertForQuestionAnswering from sharkMeow +author: John Snow Labs +name: bert_base_chinese_finetuned_qa_b32 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_chinese_finetuned_qa_b32` is a English model originally trained by sharkMeow. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_qa_b32_en_5.5.0_3.0_1727049451096.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_qa_b32_en_5.5.0_3.0_1727049451096.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_chinese_finetuned_qa_b32","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols("question", "context")
+    .setOutputCols("document_question", "document_context")
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_chinese_finetuned_qa_b32", "en")
+    .setInputCols(Array("document_question", "document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_chinese_finetuned_qa_b32| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|381.0 MB| + +## References + +https://huggingface.co/sharkMeow/bert-base-chinese-finetuned-QA-b32 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_qa_b32_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_qa_b32_pipeline_en.md new file mode 100644 index 00000000000000..2267f1edf952cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_qa_b32_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_chinese_finetuned_qa_b32_pipeline pipeline BertForQuestionAnswering from sharkMeow +author: John Snow Labs +name: bert_base_chinese_finetuned_qa_b32_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_chinese_finetuned_qa_b32_pipeline` is a English model originally trained by sharkMeow. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_qa_b32_pipeline_en_5.5.0_3.0_1727049469914.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_qa_b32_pipeline_en_5.5.0_3.0_1727049469914.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_chinese_finetuned_qa_b32_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_chinese_finetuned_qa_b32_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_chinese_finetuned_qa_b32_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|381.0 MB| + +## References + +https://huggingface.co/sharkMeow/bert-base-chinese-finetuned-QA-b32 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_embedding_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_embedding_en.md new file mode 100644 index 00000000000000..860afdf2fdb082 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_embedding_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_embedding BertEmbeddings from CH3COOK +author: John Snow Labs +name: bert_base_embedding +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_embedding` is a English model originally trained by CH3COOK. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_embedding_en_5.5.0_3.0_1727002932073.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_embedding_en_5.5.0_3.0_1727002932073.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = BertEmbeddings.pretrained("bert_base_embedding","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = BertEmbeddings.pretrained("bert_base_embedding","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
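+
+Each token annotation in the `embeddings` column carries its vector in the `embeddings` field. A short sketch for flattening the result into one row per token:
+
+```python
+from pyspark.sql import functions as F
+
+# Explode the annotation array, then keep the token text and its vector
+pipelineDF.select(F.explode("embeddings").alias("emb")) \
+    .select(F.col("emb.result").alias("token"), F.col("emb.embeddings").alias("vector")) \
+    .show(truncate = 80)
+```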
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_embedding| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.9 MB| + +## References + +https://huggingface.co/CH3COOK/bert-base-embedding \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_embedding_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_embedding_pipeline_en.md new file mode 100644 index 00000000000000..033d229d7d00ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_embedding_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_embedding_pipeline pipeline BertEmbeddings from CH3COOK +author: John Snow Labs +name: bert_base_embedding_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_embedding_pipeline` is a English model originally trained by CH3COOK. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_embedding_pipeline_en_5.5.0_3.0_1727002949825.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_embedding_pipeline_en_5.5.0_3.0_1727002949825.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_embedding_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_embedding_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_embedding_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.0 MB| + +## References + +https://huggingface.co/CH3COOK/bert-base-embedding + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_historic_multilingual_64k_td_cased_squad_french_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_historic_multilingual_64k_td_cased_squad_french_pipeline_xx.md new file mode 100644 index 00000000000000..6e5bf8c198d480 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_historic_multilingual_64k_td_cased_squad_french_pipeline_xx.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Multilingual bert_base_historic_multilingual_64k_td_cased_squad_french_pipeline pipeline BertForQuestionAnswering from Nadav +author: John Snow Labs +name: bert_base_historic_multilingual_64k_td_cased_squad_french_pipeline +date: 2024-09-22 +tags: [xx, open_source, pipeline, onnx] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_historic_multilingual_64k_td_cased_squad_french_pipeline` is a Multilingual model originally trained by Nadav. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_historic_multilingual_64k_td_cased_squad_french_pipeline_xx_5.5.0_3.0_1727049409312.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_historic_multilingual_64k_td_cased_squad_french_pipeline_xx_5.5.0_3.0_1727049409312.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_historic_multilingual_64k_td_cased_squad_french_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_historic_multilingual_64k_td_cased_squad_french_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
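+
+As with the other question-answering pipelines, `df` is assumed to carry `question` and `context` columns for the MultiDocumentAssembler stage. A sketch with a hypothetical French example (the pipeline is registered under the multilingual `xx` code):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical historic-French question/context pair
+df = spark.createDataFrame(
+    [["Qui a écrit cette lettre ?", "La lettre a été écrite à Paris en 1890."]]
+).toDF("question", "context")
+
+pipeline = PretrainedPipeline("bert_base_historic_multilingual_64k_td_cased_squad_french_pipeline", lang = "xx")
+pipeline.transform(df).select("answer.result").show(truncate = False)
+```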
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_historic_multilingual_64k_td_cased_squad_french_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|504.6 MB| + +## References + +https://huggingface.co/Nadav/bert-base-historic-multilingual-64k-td-cased-squad-fr + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_italian_xxl_uncased_italian_finetuned_emotions_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_italian_xxl_uncased_italian_finetuned_emotions_en.md new file mode 100644 index 00000000000000..fe70b4a7d8f027 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_italian_xxl_uncased_italian_finetuned_emotions_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_italian_xxl_uncased_italian_finetuned_emotions BertForSequenceClassification from MelmaGrigia +author: John Snow Labs +name: bert_base_italian_xxl_uncased_italian_finetuned_emotions +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_italian_xxl_uncased_italian_finetuned_emotions` is a English model originally trained by MelmaGrigia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_italian_xxl_uncased_italian_finetuned_emotions_en_5.5.0_3.0_1727030079822.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_italian_xxl_uncased_italian_finetuned_emotions_en_5.5.0_3.0_1727030079822.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_italian_xxl_uncased_italian_finetuned_emotions","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_italian_xxl_uncased_italian_finetuned_emotions", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_italian_xxl_uncased_italian_finetuned_emotions| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|414.8 MB| + +## References + +https://huggingface.co/MelmaGrigia/bert-base-italian-xxl-uncased-italian-finetuned-emotions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_italian_xxl_uncased_italian_finetuned_emotions_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_italian_xxl_uncased_italian_finetuned_emotions_pipeline_en.md new file mode 100644 index 00000000000000..497ba497ee3ba4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_italian_xxl_uncased_italian_finetuned_emotions_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_italian_xxl_uncased_italian_finetuned_emotions_pipeline pipeline BertForSequenceClassification from MelmaGrigia +author: John Snow Labs +name: bert_base_italian_xxl_uncased_italian_finetuned_emotions_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_italian_xxl_uncased_italian_finetuned_emotions_pipeline` is a English model originally trained by MelmaGrigia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_italian_xxl_uncased_italian_finetuned_emotions_pipeline_en_5.5.0_3.0_1727030100823.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_italian_xxl_uncased_italian_finetuned_emotions_pipeline_en_5.5.0_3.0_1727030100823.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_italian_xxl_uncased_italian_finetuned_emotions_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_italian_xxl_uncased_italian_finetuned_emotions_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_italian_xxl_uncased_italian_finetuned_emotions_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.8 MB| + +## References + +https://huggingface.co/MelmaGrigia/bert-base-italian-xxl-uncased-italian-finetuned-emotions + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_local_results_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_local_results_en.md new file mode 100644 index 00000000000000..42aff594170361 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_local_results_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_local_results BertForSequenceClassification from serpapi +author: John Snow Labs +name: bert_base_local_results +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_local_results` is a English model originally trained by serpapi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_local_results_en_5.5.0_3.0_1726976498295.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_local_results_en_5.5.0_3.0_1726976498295.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_local_results","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_local_results", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
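+
+For quick one-off predictions without building a DataFrame, the fitted model can also be wrapped in a LightPipeline. A sketch (the example string is hypothetical; the label set comes from the original fine-tuning data):
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+# Returns a dict keyed by output column; "class" holds the predicted label(s)
+print(light.annotate("Opens at 9 AM, dine-in and takeout available")["class"])
+```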
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_local_results| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/serpapi/bert-base-local-results \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_local_results_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_local_results_pipeline_en.md new file mode 100644 index 00000000000000..d35507c5e8afc7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_local_results_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_local_results_pipeline pipeline BertForSequenceClassification from serpapi +author: John Snow Labs +name: bert_base_local_results_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_local_results_pipeline` is a English model originally trained by serpapi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_local_results_pipeline_en_5.5.0_3.0_1726976516239.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_local_results_pipeline_en_5.5.0_3.0_1726976516239.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_local_results_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_local_results_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_local_results_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/serpapi/bert-base-local-results + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_cased_finetuned_squadbn_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_cased_finetuned_squadbn_pipeline_xx.md new file mode 100644 index 00000000000000..14c036b8d63468 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_cased_finetuned_squadbn_pipeline_xx.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_finetuned_squadbn_pipeline pipeline BertForQuestionAnswering from AsifAbrar6 +author: John Snow Labs +name: bert_base_multilingual_cased_finetuned_squadbn_pipeline +date: 2024-09-22 +tags: [xx, open_source, pipeline, onnx] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_finetuned_squadbn_pipeline` is a Multilingual model originally trained by AsifAbrar6. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_squadbn_pipeline_xx_5.5.0_3.0_1726992164645.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_squadbn_pipeline_xx_5.5.0_3.0_1726992164645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_multilingual_cased_finetuned_squadbn_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_multilingual_cased_finetuned_squadbn_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_finetuned_squadbn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/AsifAbrar6/bert-base-multilingual-cased-finetuned-squadBN + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_cased_finetuned_squadbn_xx.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_cased_finetuned_squadbn_xx.md new file mode 100644 index 00000000000000..8f85b1f5e47d0e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_cased_finetuned_squadbn_xx.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_finetuned_squadbn BertForQuestionAnswering from AsifAbrar6 +author: John Snow Labs +name: bert_base_multilingual_cased_finetuned_squadbn +date: 2024-09-22 +tags: [xx, open_source, onnx, question_answering, bert] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_finetuned_squadbn` is a Multilingual model originally trained by AsifAbrar6. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_squadbn_xx_5.5.0_3.0_1726992135639.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_squadbn_xx_5.5.0_3.0_1726992135639.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_multilingual_cased_finetuned_squadbn","xx") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols("question", "context")
+    .setOutputCols("document_question", "document_context")
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_multilingual_cased_finetuned_squadbn", "xx")
+    .setInputCols(Array("document_question", "document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_finetuned_squadbn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/AsifAbrar6/bert-base-multilingual-cased-finetuned-squadBN \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_cased_squad_ani2857_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_cased_squad_ani2857_pipeline_xx.md new file mode 100644 index 00000000000000..da4c34d00fb427 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_cased_squad_ani2857_pipeline_xx.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_squad_ani2857_pipeline pipeline BertForQuestionAnswering from ani2857 +author: John Snow Labs +name: bert_base_multilingual_cased_squad_ani2857_pipeline +date: 2024-09-22 +tags: [xx, open_source, pipeline, onnx] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_squad_ani2857_pipeline` is a Multilingual model originally trained by ani2857. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_squad_ani2857_pipeline_xx_5.5.0_3.0_1727049523710.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_squad_ani2857_pipeline_xx_5.5.0_3.0_1727049523710.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_multilingual_cased_squad_ani2857_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_multilingual_cased_squad_ani2857_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_squad_ani2857_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/ani2857/bert-base-multilingual-cased-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_cased_squad_ani2857_xx.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_cased_squad_ani2857_xx.md new file mode 100644 index 00000000000000..69f8b8eed6743b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_cased_squad_ani2857_xx.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_squad_ani2857 BertForQuestionAnswering from ani2857 +author: John Snow Labs +name: bert_base_multilingual_cased_squad_ani2857 +date: 2024-09-22 +tags: [xx, open_source, onnx, question_answering, bert] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_squad_ani2857` is a Multilingual model originally trained by ani2857. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_squad_ani2857_xx_5.5.0_3.0_1727049492036.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_squad_ani2857_xx_5.5.0_3.0_1727049492036.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_multilingual_cased_squad_ani2857","xx") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols("question", "context")
+    .setOutputCols("document_question", "document_context")
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_multilingual_cased_squad_ani2857", "xx")
+    .setInputCols(Array("document_question", "document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_squad_ani2857| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/ani2857/bert-base-multilingual-cased-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_uncased_sentiment_navenprasad_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_uncased_sentiment_navenprasad_pipeline_xx.md new file mode 100644 index 00000000000000..0c0da7d706ad51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_uncased_sentiment_navenprasad_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_sentiment_navenprasad_pipeline pipeline BertForSequenceClassification from navenprasad +author: John Snow Labs +name: bert_base_multilingual_uncased_sentiment_navenprasad_pipeline +date: 2024-09-22 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_sentiment_navenprasad_pipeline` is a Multilingual model originally trained by navenprasad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_navenprasad_pipeline_xx_5.5.0_3.0_1727034487420.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_navenprasad_pipeline_xx_5.5.0_3.0_1727034487420.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_multilingual_uncased_sentiment_navenprasad_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_multilingual_uncased_sentiment_navenprasad_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_sentiment_navenprasad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|627.8 MB| + +## References + +https://huggingface.co/navenprasad/bert-base-multilingual-uncased-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_uncased_sentiment_navenprasad_xx.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_uncased_sentiment_navenprasad_xx.md new file mode 100644 index 00000000000000..adbdef9e2d0505 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_uncased_sentiment_navenprasad_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_sentiment_navenprasad BertForSequenceClassification from navenprasad +author: John Snow Labs +name: bert_base_multilingual_uncased_sentiment_navenprasad +date: 2024-09-22 +tags: [xx, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_sentiment_navenprasad` is a Multilingual model originally trained by navenprasad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_navenprasad_xx_5.5.0_3.0_1727034454039.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_navenprasad_xx_5.5.0_3.0_1727034454039.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier reads document and token annotations and emits one label per row
sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_sentiment_navenprasad","xx") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// the classifier reads document and token annotations and emits one label per row
val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_sentiment_navenprasad", "xx")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
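
To see the predicted label next to the input text, select the `result` field of the `class` column; a small follow-up sketch, assuming the pipeline above has been applied:

```python
# "class.result" holds the predicted label for each input row.
pipelineDF.select("text", "class.result").show(truncate=False)
```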
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_sentiment_navenprasad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|627.7 MB| + +## References + +https://huggingface.co/navenprasad/bert-base-multilingual-uncased-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_320240905172321_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_320240905172321_en.md new file mode 100644 index 00000000000000..d1eb062654ad16 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_320240905172321_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_320240905172321 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_320240905172321 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_320240905172321` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_320240905172321_en_5.5.0_3.0_1726991973557.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_320240905172321_en_5.5.0_3.0_1726991973557.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assemble the raw question and context columns into annotated documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_320240905172321","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// assemble the raw question and context columns into annotated documents
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_320240905172321", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_320240905172321| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.320240905172321 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_en.md new file mode 100644 index 00000000000000..f7e83f487c0373 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_en_5.5.0_3.0_1727042449961.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_en_5.5.0_3.0_1727042449961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assemble the raw question and context columns into annotated documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// assemble the raw question and context columns into annotated documents
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915003326 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_pipeline_en.md new file mode 100644 index 00000000000000..10533e9e440592 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_pipeline_en_5.5.0_3.0_1727042470722.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_pipeline_en_5.5.0_3.0_1727042470722.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915003326 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_en.md new file mode 100644 index 00000000000000..87f7605789a05c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_en_5.5.0_3.0_1727039380809.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_en_5.5.0_3.0_1727039380809.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assemble the raw question and context columns into annotated documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// assemble the raw question and context columns into annotated documents
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915121227 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_pipeline_en.md new file mode 100644 index 00000000000000..54218906163d89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_pipeline_en_5.5.0_3.0_1727039402106.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_pipeline_en_5.5.0_3.0_1727039402106.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915121227 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_en.md new file mode 100644 index 00000000000000..e0b7075885580a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_en_5.5.0_3.0_1727039525805.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_en_5.5.0_3.0_1727039525805.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assemble the raw question and context columns into annotated documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// assemble the raw question and context columns into annotated documents
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915122349 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_pipeline_en.md new file mode 100644 index 00000000000000..fd977c7362adfc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_pipeline_en_5.5.0_3.0_1727039546460.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_pipeline_en_5.5.0_3.0_1727039546460.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915122349 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_en.md new file mode 100644 index 00000000000000..47f2a5fcdf053b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_en_5.5.0_3.0_1727039808767.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_en_5.5.0_3.0_1727039808767.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assemble the raw question and context columns into annotated documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// assemble the raw question and context columns into annotated documents
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915123325 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_pipeline_en.md new file mode 100644 index 00000000000000..f7353c2c15facc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_pipeline_en_5.5.0_3.0_1727039829562.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_pipeline_en_5.5.0_3.0_1727039829562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915123325 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_clinical_ner_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_clinical_ner_en.md new file mode 100644 index 00000000000000..8ff9fc78a2a629 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_clinical_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_clinical_ner BertForTokenClassification from sschet +author: John Snow Labs +name: bert_base_uncased_clinical_ner +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_clinical_ner` is a English model originally trained by sschet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_clinical_ner_en_5.5.0_3.0_1726974764669.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_clinical_ner_en_5.5.0_3.0_1726974764669.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the token classifier emits one NER tag per token
tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncased_clinical_ner","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// the token classifier emits one NER tag per token
val tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncased_clinical_ner", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
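
For token-level output it is usually easier to read the token texts and their predicted tags side by side; `token.result` and `ner.result` are parallel arrays, so a compact follow-up sketch is:

```python
# Token texts and NER tags are parallel arrays in the "token" and "ner" output columns.
pipelineDF.selectExpr("token.result as tokens", "ner.result as ner_labels").show(truncate=False)
```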
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_clinical_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sschet/bert-base-uncased_clinical-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_clinical_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_clinical_ner_pipeline_en.md new file mode 100644 index 00000000000000..cf88c1c29951dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_clinical_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_clinical_ner_pipeline pipeline BertForTokenClassification from sschet +author: John Snow Labs +name: bert_base_uncased_clinical_ner_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_clinical_ner_pipeline` is a English model originally trained by sschet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_clinical_ner_pipeline_en_5.5.0_3.0_1726974783488.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_clinical_ner_pipeline_en_5.5.0_3.0_1726974783488.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_clinical_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_clinical_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_clinical_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sschet/bert-base-uncased_clinical-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_coqa_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_coqa_en.md new file mode 100644 index 00000000000000..0e43327853c8d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_coqa_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_coqa BertForQuestionAnswering from rooftopcoder +author: John Snow Labs +name: bert_base_uncased_coqa +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_coqa` is a English model originally trained by rooftopcoder. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_coqa_en_5.5.0_3.0_1726991819132.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_coqa_en_5.5.0_3.0_1726991819132.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assemble the raw question and context columns into annotated documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_coqa","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// assemble the raw question and context columns into annotated documents
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_coqa", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_coqa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/rooftopcoder/bert-base-uncased-coqa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_coqa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_coqa_pipeline_en.md new file mode 100644 index 00000000000000..3e73e09a01c571 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_coqa_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_coqa_pipeline pipeline BertForQuestionAnswering from rooftopcoder +author: John Snow Labs +name: bert_base_uncased_coqa_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_coqa_pipeline` is a English model originally trained by rooftopcoder. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_coqa_pipeline_en_5.5.0_3.0_1726991838445.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_coqa_pipeline_en_5.5.0_3.0_1726991838445.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_coqa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_coqa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_coqa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/rooftopcoder/bert-base-uncased-coqa + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_english_sentweet_profane_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_english_sentweet_profane_en.md new file mode 100644 index 00000000000000..f42a6ec713d41f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_english_sentweet_profane_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_english_sentweet_profane BertForSequenceClassification from jayanta +author: John Snow Labs +name: bert_base_uncased_english_sentweet_profane +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_english_sentweet_profane` is a English model originally trained by jayanta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_english_sentweet_profane_en_5.5.0_3.0_1727030006784.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_english_sentweet_profane_en_5.5.0_3.0_1727030006784.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# the classifier reads document and token annotations and emits one label per row
sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_english_sentweet_profane","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// the classifier reads document and token annotations and emits one label per row
val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_english_sentweet_profane", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_english_sentweet_profane| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/jayanta/bert-base-uncased-english-sentweet-profane \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_english_sentweet_profane_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_english_sentweet_profane_pipeline_en.md new file mode 100644 index 00000000000000..35285e1a9cd778 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_english_sentweet_profane_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_english_sentweet_profane_pipeline pipeline BertForSequenceClassification from jayanta +author: John Snow Labs +name: bert_base_uncased_english_sentweet_profane_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_english_sentweet_profane_pipeline` is a English model originally trained by jayanta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_english_sentweet_profane_pipeline_en_5.5.0_3.0_1727030027707.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_english_sentweet_profane_pipeline_en_5.5.0_3.0_1727030027707.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_english_sentweet_profane_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_english_sentweet_profane_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_english_sentweet_profane_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/jayanta/bert-base-uncased-english-sentweet-profane + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md new file mode 100644 index 00000000000000..0846ee1818c970 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727042598681.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727042598681.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# assemble the raw question and context columns into annotated documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// assemble the raw question and context columns into annotated documents
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-10.0-b-32-lr-4e-07-dp-0.5-ss-0-st-True-fh-False-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..5a95914e34bc7c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727042622649.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727042622649.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame providing the input columns expected by the pipeline's first stage
pipeline = PretrainedPipeline("bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame providing the input columns expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
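
For an end-to-end call in Python, the sketch below builds a small input DataFrame and reads back the predictions. The `question`/`context` input columns and the `answer` output column are assumptions based on the standalone model page for this checkpoint, not guarantees of the saved pipeline:

```python
# Hypothetical input columns; adjust to whatever the bundled MultiDocumentAssembler expects
df = spark.createDataFrame(
    [("What framework do I use?", "I use spark-nlp.")],
    ["question", "context"]
)

annotations = pipeline.transform(df)
# Assumed output column of the question-answering stage
annotations.select("answer.result").show(truncate=False)
```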
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-10.0-b-32-lr-4e-07-dp-0.5-ss-0-st-True-fh-False-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_1_29_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_300_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_1_29_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_300_en.md new file mode 100644 index 00000000000000..a43271e4ba1dd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_1_29_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_300_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_29_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_300 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_29_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_300 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_29_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_300` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_29_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_300_en_5.5.0_3.0_1727042888237.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_29_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_300_en_5.5.0_3.0_1727042888237.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_29_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_300","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_29_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_300", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
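
As a quick check on the example above, the predicted span can be read back from the `answer` column; this is a minimal Python sketch assuming the pipeline was run as shown (the returned text will vary by model):

```python
# "answer" holds the question-answering annotations; "result" is the extracted answer text
pipelineDF.select("answer.result").show(truncate=False)
```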
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_29_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_300| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.29-b-32-lr-8e-07-dp-0.5-ss-0-st-False-fh-False-hs-300 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md new file mode 100644 index 00000000000000..b3310bdcc6d2a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1726991711867.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1726991711867.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
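
As a quick check on the example above, the predicted span can be read back from the `answer` column; this is a minimal Python sketch assuming the pipeline was run as shown (the returned text will vary by model):

```python
# "answer" holds the question-answering annotations; "result" is the extracted answer text
pipelineDF.select("answer.result").show(truncate=False)
```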
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.56-b-32-lr-8e-07-dp-0.5-ss-0-st-True-fh-False-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..57b4b1ba89d9c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1726991729708.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1726991729708.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame providing the input columns expected by the pipeline's first stage
pipeline = PretrainedPipeline("bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame providing the input columns expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
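
For an end-to-end call in Python, the sketch below builds a small input DataFrame and reads back the predictions. The `question`/`context` input columns and the `answer` output column are assumptions based on the standalone model page for this checkpoint, not guarantees of the saved pipeline:

```python
# Hypothetical input columns; adjust to whatever the bundled MultiDocumentAssembler expects
df = spark.createDataFrame(
    [("What framework do I use?", "I use spark-nlp.")],
    ["question", "context"]
)

annotations = pipeline.transform(df)
# Assumed output column of the question-answering stage
annotations.select("answer.result").show(truncate=False)
```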
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.56-b-32-lr-8e-07-dp-0.5-ss-0-st-True-fh-False-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_25_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_25_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline_en.md new file mode 100644 index 00000000000000..d9563946c02d15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_25_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_2_25_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_2_25_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_2_25_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_25_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline_en_5.5.0_3.0_1727042295759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_25_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline_en_5.5.0_3.0_1727042295759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame providing the input columns expected by the pipeline's first stage
pipeline = PretrainedPipeline("bert_base_uncased_ep_2_25_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame providing the input columns expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("bert_base_uncased_ep_2_25_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
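
For an end-to-end call in Python, the sketch below builds a small input DataFrame and reads back the predictions. The `question`/`context` input columns and the `answer` output column are assumptions based on the equivalent standalone model pages, not guarantees of the saved pipeline:

```python
# Hypothetical input columns; adjust to whatever the bundled MultiDocumentAssembler expects
df = spark.createDataFrame(
    [("What framework do I use?", "I use spark-nlp.")],
    ["question", "context"]
)

annotations = pipeline.transform(df)
# Assumed output column of the question-answering stage
annotations.select("answer.result").show(truncate=False)
```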
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_2_25_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-2.25-b-32-lr-1.2e-06-dp-0.3-ss-0-st-False-fh-False-hs-600 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_en.md new file mode 100644 index 00000000000000..2cfa1dfe5e778d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1726992015862.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1726992015862.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
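
As a quick check on the example above, the predicted span can be read back from the `answer` column; this is a minimal Python sketch assuming the pipeline was run as shown (the returned text will vary by model):

```python
# "answer" holds the question-answering annotations; "result" is the extracted answer text
pipelineDF.select("answer.result").show(truncate=False)
```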
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-2.2-b-32-lr-1.2e-06-dp-0.3-ss-500-st-False-fh-True-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..0508d630ecac0e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1726992034386.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1726992034386.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame providing the input columns expected by the pipeline's first stage
pipeline = PretrainedPipeline("bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame providing the input columns expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
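
For an end-to-end call in Python, the sketch below builds a small input DataFrame and reads back the predictions. The `question`/`context` input columns and the `answer` output column are assumptions based on the standalone model page for this checkpoint, not guarantees of the saved pipeline:

```python
# Hypothetical input columns; adjust to whatever the bundled MultiDocumentAssembler expects
df = spark.createDataFrame(
    [("What framework do I use?", "I use spark-nlp.")],
    ["question", "context"]
)

annotations = pipeline.transform(df)
# Assumed output column of the question-answering stage
annotations.select("answer.result").show(truncate=False)
```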
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-2.2-b-32-lr-1.2e-06-dp-0.3-ss-500-st-False-fh-True-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_en.md new file mode 100644 index 00000000000000..84e016310493c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727042393714.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727042393714.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
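
As a quick check on the example above, the predicted span can be read back from the `answer` column; this is a minimal Python sketch assuming the pipeline was run as shown (the returned text will vary by model):

```python
# "answer" holds the question-answering annotations; "result" is the extracted answer text
pipelineDF.select("answer.result").show(truncate=False)
```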
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-2.62-b-32-lr-4e-07-dp-1.0-ss-600-st-False-fh-True-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..109063d72037e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727042414599.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727042414599.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame providing the input columns expected by the pipeline's first stage
pipeline = PretrainedPipeline("bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame providing the input columns expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
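
For an end-to-end call in Python, the sketch below builds a small input DataFrame and reads back the predictions. The `question`/`context` input columns and the `answer` output column are assumptions based on the standalone model page for this checkpoint, not guarantees of the saved pipeline:

```python
# Hypothetical input columns; adjust to whatever the bundled MultiDocumentAssembler expects
df = spark.createDataFrame(
    [("What framework do I use?", "I use spark-nlp.")],
    ["question", "context"]
)

annotations = pipeline.transform(df)
# Assumed output column of the question-answering stage
annotations.select("answer.result").show(truncate=False)
```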
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-2.62-b-32-lr-4e-07-dp-1.0-ss-600-st-False-fh-True-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md new file mode 100644 index 00000000000000..d08670d5675b6a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727049190563.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727049190563.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
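
As a quick check on the example above, the predicted span can be read back from the `answer` column; this is a minimal Python sketch assuming the pipeline was run as shown (the returned text will vary by model):

```python
# "answer" holds the question-answering annotations; "result" is the extracted answer text
pipelineDF.select("answer.result").show(truncate=False)
```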
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-2.69-b-32-lr-8e-07-dp-0.5-ss-0-st-True-fh-False-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..ec9d162e305811 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727049216838.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727049216838.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame providing the input columns expected by the pipeline's first stage
pipeline = PretrainedPipeline("bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame providing the input columns expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
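
For an end-to-end call in Python, the sketch below builds a small input DataFrame and reads back the predictions. The `question`/`context` input columns and the `answer` output column are assumptions based on the standalone model page for this checkpoint, not guarantees of the saved pipeline:

```python
# Hypothetical input columns; adjust to whatever the bundled MultiDocumentAssembler expects
df = spark.createDataFrame(
    [("What framework do I use?", "I use spark-nlp.")],
    ["question", "context"]
)

annotations = pipeline.transform(df)
# Assumed output column of the question-answering stage
annotations.select("answer.result").show(truncate=False)
```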
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-2.69-b-32-lr-8e-07-dp-0.5-ss-0-st-True-fh-False-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_en.md new file mode 100644 index 00000000000000..0dcfd66f76c494 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727049329348.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727049329348.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
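
As a quick check on the example above, the predicted span can be read back from the `answer` column; this is a minimal Python sketch assuming the pipeline was run as shown (the returned text will vary by model):

```python
# "answer" holds the question-answering annotations; "result" is the extracted answer text
pipelineDF.select("answer.result").show(truncate=False)
```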
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-3.11-b-32-lr-4e-06-dp-0.1-ss-700-st-False-fh-True-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..3cc32ba1e4ac6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727049354258.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727049354258.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame providing the input columns expected by the pipeline's first stage
pipeline = PretrainedPipeline("bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame providing the input columns expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
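
For an end-to-end call in Python, the sketch below builds a small input DataFrame and reads back the predictions. The `question`/`context` input columns and the `answer` output column are assumptions based on the standalone model page for this checkpoint, not guarantees of the saved pipeline:

```python
# Hypothetical input columns; adjust to whatever the bundled MultiDocumentAssembler expects
df = spark.createDataFrame(
    [("What framework do I use?", "I use spark-nlp.")],
    ["question", "context"]
)

annotations = pipeline.transform(df)
# Assumed output column of the question-answering stage
annotations.select("answer.result").show(truncate=False)
```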
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-3.11-b-32-lr-4e-06-dp-0.1-ss-700-st-False-fh-True-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_en.md new file mode 100644 index 00000000000000..47aae29b6b3bc5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727042880900.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727042880900.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
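
As a quick check on the example above, the predicted span can be read back from the `answer` column; this is a minimal Python sketch assuming the pipeline was run as shown (the returned text will vary by model):

```python
# "answer" holds the question-answering annotations; "result" is the extracted answer text
pipelineDF.select("answer.result").show(truncate=False)
```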
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-3.44-b-32-lr-1.2e-06-dp-0.3-ss-0-st-True-fh-False-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..0fa3964e77fd11 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727042901943.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727042901943.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame providing the input columns expected by the pipeline's first stage
pipeline = PretrainedPipeline("bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame providing the input columns expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
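
For an end-to-end call in Python, the sketch below builds a small input DataFrame and reads back the predictions. The `question`/`context` input columns and the `answer` output column are assumptions based on the standalone model page for this checkpoint, not guarantees of the saved pipeline:

```python
# Hypothetical input columns; adjust to whatever the bundled MultiDocumentAssembler expects
df = spark.createDataFrame(
    [("What framework do I use?", "I use spark-nlp.")],
    ["question", "context"]
)

annotations = pipeline.transform(df)
# Assumed output column of the question-answering stage
annotations.select("answer.result").show(truncate=False)
```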
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-3.44-b-32-lr-1.2e-06-dp-0.3-ss-0-st-True-fh-False-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000_en.md new file mode 100644 index 00000000000000..53f19cb1f007ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000_en_5.5.0_3.0_1726992065533.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000_en_5.5.0_3.0_1726992065533.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
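
As a quick check on the example above, the predicted span can be read back from the `answer` column; this is a minimal Python sketch assuming the pipeline was run as shown (the returned text will vary by model):

```python
# "answer" holds the question-answering annotations; "result" is the extracted answer text
pipelineDF.select("answer.result").show(truncate=False)
```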
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-0.8-lr-1e-06-wd-0.001-dp-0.99999-ss-20000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000_pipeline_en.md new file mode 100644 index 00000000000000..7b61ab36b6976a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000_pipeline_en_5.5.0_3.0_1726992083748.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000_pipeline_en_5.5.0_3.0_1726992083748.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame providing the input columns expected by the pipeline's first stage
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame providing the input columns expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
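
For an end-to-end call in Python, the sketch below builds a small input DataFrame and reads back the predictions. The `question`/`context` input columns and the `answer` output column are assumptions based on the standalone model page for this checkpoint, not guarantees of the saved pipeline:

```python
# Hypothetical input columns; adjust to whatever the bundled MultiDocumentAssembler expects
df = spark.createDataFrame(
    [("What framework do I use?", "I use spark-nlp.")],
    ["question", "context"]
)

annotations = pipeline.transform(df)
# Assumed output column of the question-answering stage
annotations.select("answer.result").show(truncate=False)
```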
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-0.8-lr-1e-06-wd-0.001-dp-0.99999-ss-20000 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_en.md new file mode 100644 index 00000000000000..49c94a4c5c3d49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_en_5.5.0_3.0_1727042493060.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_en_5.5.0_3.0_1727042493060.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question and context columns into document annotations.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

# Pretrained extractive QA model that predicts the answer span in the context.
spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
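The `transform` call above leaves the prediction in the `answer` column as Spark NLP annotation structs rather than plain strings. Continuing from the Python example, a short follow-up such as this sketch pulls out just the predicted text (column names as defined above):

```python
# `result` inside each annotation holds the predicted answer span as text.
pipelineDF.select("answer.result").show(truncate=False)

# Keep the inputs next to the prediction for easier inspection.
pipelineDF.selectExpr(
    "document_question.result AS question",
    "document_context.result AS context",
    "answer.result AS answer"
).show(truncate=False)
```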
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-0.9-lr-1e-05-wd-0.001-dp-0.99999-ss-70000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_pipeline_en.md new file mode 100644 index 00000000000000..32cb5ed6c3c933 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_pipeline_en_5.5.0_3.0_1727042513733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_pipeline_en_5.5.0_3.0_1727042513733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame holding the input columns this QA pipeline expects
# (typically a question column and a context column).
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame holding the input columns this QA pipeline expects.
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-0.9-lr-1e-05-wd-0.001-dp-0.99999-ss-70000 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_en.md new file mode 100644 index 00000000000000..b1beed2b861767 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_en_5.5.0_3.0_1726992444278.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_en_5.5.0_3.0_1726992444278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question and context columns into document annotations.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

# Pretrained extractive QA model that predicts the answer span in the context.
spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
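For ad-hoc, low-latency queries it is often more convenient to wrap the fitted pipeline in a `LightPipeline` than to go through a DataFrame. The sketch below assumes the two-argument `fullAnnotate(question, context)` form that recent Spark NLP releases expose for multi-document annotators; if your version does not support it, fall back to the DataFrame approach shown above.

```python
from sparknlp.base import LightPipeline

# pipelineModel is the fitted pipeline from the Python example above.
light = LightPipeline(pipelineModel)

# Assumed signature: fullAnnotate(question, context) for multi-document pipelines.
result = light.fullAnnotate(
    "What framework do I use?",
    "I use spark-nlp."
)
print([annotation.result for annotation in result[0]["answer"]])
```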
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-05-wd-0.001-dp-0.99999-ss-300 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_pipeline_en.md new file mode 100644 index 00000000000000..3540bbd99dd887 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_pipeline_en_5.5.0_3.0_1726992461663.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_pipeline_en_5.5.0_3.0_1726992461663.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame holding the input columns this QA pipeline expects
# (typically a question column and a context column).
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame holding the input columns this QA pipeline expects.
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-05-wd-0.001-dp-0.99999-ss-300 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_en.md new file mode 100644 index 00000000000000..5fa86b91ae2145 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_en_5.5.0_3.0_1727042442666.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_en_5.5.0_3.0_1727042442666.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question and context columns into document annotations.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

# Pretrained extractive QA model that predicts the answer span in the context.
spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-06-wd-0.001-dp-0.2-ss-500-st-True-fh-True \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_pipeline_en.md new file mode 100644 index 00000000000000..b7b0838c55361d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_pipeline_en_5.5.0_3.0_1727042463983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_pipeline_en_5.5.0_3.0_1727042463983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame holding the input columns this QA pipeline expects
# (typically a question column and a context column).
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame holding the input columns this QA pipeline expects.
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-06-wd-0.001-dp-0.2-ss-500-st-True-fh-True + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_en.md new file mode 100644 index 00000000000000..27f51a4dbe8bf1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_en_5.5.0_3.0_1727049583991.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_en_5.5.0_3.0_1727049583991.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question and context columns into document annotations.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

# Pretrained extractive QA model that predicts the answer span in the context.
spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
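This page and its neighbours document many checkpoints of the same base model that differ only in the training hyper-parameters encoded in the name. A small helper such as the hypothetical `build_qa_pipeline` below (not part of Spark NLP, just a convenience sketch) makes it easy to swap one checkpoint name for another and compare them side by side; `data` is the example DataFrame built above.

```python
from sparknlp.base import MultiDocumentAssembler
from sparknlp.annotator import BertForQuestionAnswering
from pyspark.ml import Pipeline

def build_qa_pipeline(model_name: str, lang: str = "en") -> Pipeline:
    """Hypothetical helper: builds the same two-stage QA pipeline for any
    of the checkpoints listed on these pages."""
    assembler = MultiDocumentAssembler() \
        .setInputCols(["question", "context"]) \
        .setOutputCols(["document_question", "document_context"])
    qa = BertForQuestionAnswering.pretrained(model_name, lang) \
        .setInputCols(["document_question", "document_context"]) \
        .setOutputCol("answer")
    return Pipeline().setStages([assembler, qa])

# Example: compare two of the checkpoints documented in this series.
for name in [
    "bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100",
    "bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0",
]:
    model = build_qa_pipeline(name).fit(data)
    model.transform(data).select("answer.result").show(truncate=False)
```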
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-4e-07-wd-1e-05-dp-1.0-ss-0-st-False-fh-False-hs-100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0_en.md new file mode 100644 index 00000000000000..0f4558287e2ca0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0_en_5.5.0_3.0_1726992335894.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0_en_5.5.0_3.0_1726992335894.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question and context columns into document annotations.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

# Pretrained extractive QA model that predicts the answer span in the context.
spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-9e-07-wd-0.001-dp-0.999-ss-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0_pipeline_en.md new file mode 100644 index 00000000000000..726c12bb61594b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0_pipeline_en_5.5.0_3.0_1726992353795.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0_pipeline_en_5.5.0_3.0_1726992353795.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame holding the input columns this QA pipeline expects
# (typically a question column and a context column).
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame holding the input columns this QA pipeline expects.
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-9e-07-wd-0.001-dp-0.999-ss-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000_en.md new file mode 100644 index 00000000000000..8e866771059a79 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000_en_5.5.0_3.0_1726991700507.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000_en_5.5.0_3.0_1726991700507.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question and context columns into document annotations.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

# Pretrained extractive QA model that predicts the answer span in the context.
spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.1-lr-1e-06-wd-0.001-dp-0.99999-ss-70000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000_pipeline_en.md new file mode 100644 index 00000000000000..720bdc8b31ce03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000_pipeline_en_5.5.0_3.0_1726991718141.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000_pipeline_en_5.5.0_3.0_1726991718141.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame holding the input columns this QA pipeline expects
# (typically a question column and a context column).
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame holding the input columns this QA pipeline expects.
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
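Downloading a pretrained pipeline on every run can be slow. Because the object behind it is a regular Spark ML `PipelineModel`, it can be saved once and reloaded from local or distributed storage. A minimal sketch, with the target path and the input column names purely illustrative:

```python
from pyspark.ml import PipelineModel
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000_pipeline", lang="en")

# Illustrative path; any location reachable by the Spark cluster works.
save_path = "/tmp/bert_qa_ep_1_1_swati_70000_pipeline"
pipeline.model.write().overwrite().save(save_path)

# Later sessions can skip the download and load the saved PipelineModel directly.
restored = PipelineModel.load(save_path)

# Assumed input column names; confirm against the pipeline's first stage.
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")
annotations = restored.transform(df)
```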
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.1-lr-1e-06-wd-0.001-dp-0.99999-ss-70000 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_en.md new file mode 100644 index 00000000000000..8d439badf1cf9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_en_5.5.0_3.0_1726991667410.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_en_5.5.0_3.0_1726991667410.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question and context columns into document annotations.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

# Pretrained extractive QA model that predicts the answer span in the context.
spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.29-lr-4e-07-wd-1e-05-dp-0.3-ss-0-st-False-fh-False-hs-300 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_pipeline_en.md new file mode 100644 index 00000000000000..cf23be8c9ab0b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_pipeline_en_5.5.0_3.0_1726991686796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_pipeline_en_5.5.0_3.0_1726991686796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame holding the input columns this QA pipeline expects
# (typically a question column and a context column).
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame holding the input columns this QA pipeline expects.
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.29-lr-4e-07-wd-1e-05-dp-0.3-ss-0-st-False-fh-False-hs-300 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_en.md new file mode 100644 index 00000000000000..cb96afd32c4c59 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_en_5.5.0_3.0_1727043004595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_en_5.5.0_3.0_1727043004595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question and context columns into document annotations.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

# Pretrained extractive QA model that predicts the answer span in the context.
spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
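The Download button above points at a zipped copy of this model in Spark NLP's saved format. In air-gapped environments it is common to unzip that archive and load it with the annotator's `load` method instead of `pretrained`. A sketch, assuming the archive has been extracted to the illustrative path below:

```python
from sparknlp.annotator import BertForQuestionAnswering

# Illustrative path to the unzipped model directory from the Download link above.
local_model_dir = "/models/bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_en"

spanClassifier = BertForQuestionAnswering.load(local_model_dir) \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")
```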
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.3-lr-1e-06-wd-0.001-dp-0.99999-ss-120000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_pipeline_en.md new file mode 100644 index 00000000000000..8e6c706bbea257 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_pipeline_en_5.5.0_3.0_1727043024998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_pipeline_en_5.5.0_3.0_1727043024998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame holding the input columns this QA pipeline expects
# (typically a question column and a context column).
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame holding the input columns this QA pipeline expects.
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.3-lr-1e-06-wd-0.001-dp-0.99999-ss-120000 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_en.md new file mode 100644 index 00000000000000..68f0a93dd13744 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_en_5.5.0_3.0_1727042575405.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_en_5.5.0_3.0_1727042575405.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question and context columns into document annotations.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

# Pretrained extractive QA model that predicts the answer span in the context.
spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
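Transformer-based annotators in Spark NLP expose a few inference knobs that can matter at this model size, such as batch size, case sensitivity, and maximum sequence length. The setter names below follow the usual convention for BERT-based annotators; treat them as assumptions and check them against the API reference for your Spark NLP release.

```python
from sparknlp.annotator import BertForQuestionAnswering

# Assumed setters (setCaseSensitive / setMaxSentenceLength / setBatchSize);
# verify availability in your Spark NLP version.
spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer") \
    .setCaseSensitive(False) \
    .setMaxSentenceLength(384) \
    .setBatchSize(8)
```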
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-0.0001-wd-0.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_pipeline_en.md new file mode 100644 index 00000000000000..39ef7fad96c2af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_pipeline_en_5.5.0_3.0_1727042596169.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_pipeline_en_5.5.0_3.0_1727042596169.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame holding the input columns this QA pipeline expects
# (typically a question column and a context column).
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame holding the input columns this QA pipeline expects.
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-0.0001-wd-0.1 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_en.md new file mode 100644 index 00000000000000..ecdc6ffd21280d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_en_5.5.0_3.0_1727042736954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_en_5.5.0_3.0_1727042736954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
# The column names must match the assembler's input columns defined above.
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
// The column names must match the assembler's input columns defined above.
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
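After `transform`, the prediction is stored as an annotation array in the `answer` column of `pipelineDF`. A short sketch for pulling the predicted span out of those annotations:

```python
from pyspark.sql import functions as F

# Each row of "answer" is an array of annotations; `result` holds the predicted span text.
pipelineDF.select(F.explode("answer.result").alias("predicted_answer")).show(truncate = False)

# The full annotation structs (begin/end offsets plus metadata) remain available as well.
pipelineDF.select("answer").show(truncate = False)
```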
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-1e-05-wd-0.001-dp-0.02-ss-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_pipeline_en.md new file mode 100644 index 00000000000000..3cff31b16a0c2e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_pipeline_en_5.5.0_3.0_1727042761630.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_pipeline_en_5.5.0_3.0_1727042761630.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-1e-05-wd-0.001-dp-0.02-ss-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_en.md new file mode 100644 index 00000000000000..9aaf94ba32959b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_en_5.5.0_3.0_1727042269236.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_en_5.5.0_3.0_1727042269236.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
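Fitting the pipeline downloads the pretrained weights, so it can be worth persisting the fitted `pipelineModel` from the example above and reloading it later instead of rebuilding it; the local path below is only illustrative.

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline (illustrative path) so the weights are not fetched again.
pipelineModel.write().overwrite().save("/tmp/bert_qa_pipeline_model")

# Reload it in a later job and apply it to new question/context pairs.
restored = PipelineModel.load("/tmp/bert_qa_pipeline_model")
restored.transform(data).select("answer.result").show(truncate = False)
```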
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-4e-07-wd-0.001-dp-0.999 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_pipeline_en.md new file mode 100644 index 00000000000000..2f93059f06e910 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_pipeline_en_5.5.0_3.0_1727042295683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_pipeline_en_5.5.0_3.0_1727042295683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-4e-07-wd-0.001-dp-0.999 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_en.md new file mode 100644 index 00000000000000..4841ad1ffc4052 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_en_5.5.0_3.0_1727042715022.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_en_5.5.0_3.0_1727042715022.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-3.0-lr-1e-05-wd-0.001-dp-0.99999-ss-800 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_pipeline_en.md new file mode 100644 index 00000000000000..8ac7b56cc4f1ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_pipeline_en_5.5.0_3.0_1727042736259.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_pipeline_en_5.5.0_3.0_1727042736259.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-3.0-lr-1e-05-wd-0.001-dp-0.99999-ss-800 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_en.md new file mode 100644 index 00000000000000..fd6a49b9c7702c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_en_5.5.0_3.0_1727043021678.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_en_5.5.0_3.0_1727043021678.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-4.0-lr-0.0005-wd-0.01-dp-0.41 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_pipeline_en.md new file mode 100644 index 00000000000000..8d809c25218553 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_pipeline_en_5.5.0_3.0_1727043042350.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_pipeline_en_5.5.0_3.0_1727043042350.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-4.0-lr-0.0005-wd-0.01-dp-0.41 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_en.md new file mode 100644 index 00000000000000..e7f7391909f36f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_en_5.5.0_3.0_1726991973615.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_en_5.5.0_3.0_1726991973615.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-4.0-lr-0.0005-wd-0.01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_pipeline_en.md new file mode 100644 index 00000000000000..9551fabbf7ceea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_pipeline_en_5.5.0_3.0_1726991995181.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_pipeline_en_5.5.0_3.0_1726991995181.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-4.0-lr-0.0005-wd-0.01 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_cola_ardaaras99_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_cola_ardaaras99_en.md new file mode 100644 index 00000000000000..5d2b88d503657e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_cola_ardaaras99_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_cola_ardaaras99 BertForSequenceClassification from ardaaras99 +author: John Snow Labs +name: bert_base_uncased_finetuned_cola_ardaaras99 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_cola_ardaaras99` is a English model originally trained by ardaaras99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cola_ardaaras99_en_5.5.0_3.0_1726990987635.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cola_ardaaras99_en_5.5.0_3.0_1726990987635.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_cola_ardaaras99","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_cola_ardaaras99", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
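The predicted label can be read from the `class` annotation column of `pipelineDF`; the annotation metadata usually also carries the per-label scores, which is an assumption worth verifying for this particular model.

```python
# `result` is the predicted label; `metadata` typically holds the per-label confidence scores.
pipelineDF.select("text", "class.result", "class.metadata").show(truncate = False)
```

For low-latency, single-document use outside of a DataFrame, the fitted pipeline can also be wrapped in a `LightPipeline`; the sketch below assumes the `pipelineModel` fitted above.

```python
from sparknlp.base import LightPipeline

# LightPipeline runs the fitted stages directly on Python strings.
light = LightPipeline(pipelineModel)
print(light.annotate("I love spark-nlp")["class"])
```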
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_cola_ardaaras99| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ardaaras99/bert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_cola_ardaaras99_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_cola_ardaaras99_pipeline_en.md new file mode 100644 index 00000000000000..8691f1a9f80a30 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_cola_ardaaras99_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_cola_ardaaras99_pipeline pipeline BertForSequenceClassification from ardaaras99 +author: John Snow Labs +name: bert_base_uncased_finetuned_cola_ardaaras99_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_cola_ardaaras99_pipeline` is a English model originally trained by ardaaras99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cola_ardaaras99_pipeline_en_5.5.0_3.0_1726991005636.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cola_ardaaras99_pipeline_en_5.5.0_3.0_1726991005636.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_cola_ardaaras99_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_cola_ardaaras99_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_cola_ardaaras99_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ardaaras99/bert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_en.md new file mode 100644 index 00000000000000..bc5be16685f9a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetuned BertForQuestionAnswering from PabloGuinea +author: John Snow Labs +name: bert_base_uncased_finetuned +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned` is a English model originally trained by PabloGuinea. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_en_5.5.0_3.0_1727043049082.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_en_5.5.0_3.0_1727043049082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetuned","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetuned", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/PabloGuinea/bert-base-uncased-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..496c97e68896bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_pipeline pipeline BertForQuestionAnswering from PabloGuinea +author: John Snow Labs +name: bert_base_uncased_finetuned_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_pipeline` is a English model originally trained by PabloGuinea. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_pipeline_en_5.5.0_3.0_1727043069170.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_pipeline_en_5.5.0_3.0_1727043069170.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/PabloGuinea/bert-base-uncased-finetuned + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_squad2_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_squad2_en.md new file mode 100644 index 00000000000000..ea8b68831a094f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_squad2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_squad2 BertForQuestionAnswering from thewiz +author: John Snow Labs +name: bert_base_uncased_finetuned_squad2 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_squad2` is a English model originally trained by thewiz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_squad2_en_5.5.0_3.0_1727042301710.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_squad2_en_5.5.0_3.0_1727042301710.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetuned_squad2","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetuned_squad2", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
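The fitted pipeline can also score several question/context pairs in a single pass. The sketch below reuses `pipelineModel` and the column names from the example above; the extra pair is purely illustrative.

```python
# Illustrative batch of question/context pairs, using the same column names as above.
batch = spark.createDataFrame(
    [
        ("What framework do I use?", "I use spark-nlp."),
        ("Where does the pipeline run?", "The pipeline runs on Apache Spark."),
    ],
    ["document_question", "document_context"],
)

pipelineModel.transform(batch).select("document_question", "answer.result").show(truncate = False)
```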
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_squad2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/thewiz/bert-base-uncased-finetuned-squad2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_squad2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_squad2_pipeline_en.md new file mode 100644 index 00000000000000..8d44395f24b6cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_squad2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_squad2_pipeline pipeline BertForQuestionAnswering from thewiz +author: John Snow Labs +name: bert_base_uncased_finetuned_squad2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_squad2_pipeline` is a English model originally trained by thewiz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_squad2_pipeline_en_5.5.0_3.0_1727042325859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_squad2_pipeline_en_5.5.0_3.0_1727042325859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_squad2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_squad2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_squad2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/thewiz/bert-base-uncased-finetuned-squad2 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_squad_v1_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_squad_v1_en.md new file mode 100644 index 00000000000000..ec007aba1d3fef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_squad_v1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_squad_v1 BertForQuestionAnswering from helenai +author: John Snow Labs +name: bert_base_uncased_squad_v1 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_squad_v1` is a English model originally trained by helenai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_squad_v1_en_5.5.0_3.0_1726978429264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_squad_v1_en_5.5.0_3.0_1726978429264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_squad_v1","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_squad_v1", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
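When scoring larger volumes of text, it can help to tune the annotator's inference settings at load time. The sketch below assumes the usual Spark NLP transformer parameters (`batchSize`, `caseSensitive`, `maxSentenceLength`) are exposed on this annotator; adjust the values to the available resources.

```python
# Assumed tuning parameters; values are illustrative.
spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_squad_v1", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer") \
    .setCaseSensitive(False) \
    .setBatchSize(8) \
    .setMaxSentenceLength(512)
```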
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_squad_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/helenai/bert-base-uncased-squad-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_squad_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_squad_v1_pipeline_en.md new file mode 100644 index 00000000000000..ecd06c06b2577a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_squad_v1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_squad_v1_pipeline pipeline BertForQuestionAnswering from helenai +author: John Snow Labs +name: bert_base_uncased_squad_v1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_squad_v1_pipeline` is a English model originally trained by helenai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_squad_v1_pipeline_en_5.5.0_3.0_1726978447809.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_squad_v1_pipeline_en_5.5.0_3.0_1726978447809.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_squad_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_squad_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_squad_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/helenai/bert-base-uncased-squad-v1 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_body_shaming_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_body_shaming_en.md new file mode 100644 index 00000000000000..ce24624073851c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_body_shaming_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_body_shaming DistilBertForSequenceClassification from Christina0824 +author: John Snow Labs +name: bert_body_shaming +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_body_shaming` is a English model originally trained by Christina0824. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_body_shaming_en_5.5.0_3.0_1727012334717.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_body_shaming_en_5.5.0_3.0_1727012334717.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_body_shaming","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_body_shaming", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
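Beyond the single hard-coded sentence above, the same fitted pipeline can classify a whole file of texts; the path below is illustrative, and any DataFrame with a `text` column works.

```python
# Read one text per line (illustrative path) and rename the column to match the pipeline input.
texts = spark.read.text("/tmp/comments.txt").withColumnRenamed("value", "text")

predictions = pipelineModel.transform(texts)
predictions.select("text", "class.result").show(truncate = False)
```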
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_body_shaming| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Christina0824/BERT_body_shaming \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_body_shaming_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_body_shaming_pipeline_en.md new file mode 100644 index 00000000000000..402c7144b9e225 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_body_shaming_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_body_shaming_pipeline pipeline DistilBertForSequenceClassification from Christina0824 +author: John Snow Labs +name: bert_body_shaming_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_body_shaming_pipeline` is a English model originally trained by Christina0824. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_body_shaming_pipeline_en_5.5.0_3.0_1727012346775.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_body_shaming_pipeline_en_5.5.0_3.0_1727012346775.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_body_shaming_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_body_shaming_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
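The `df` referenced above is assumed to be a Spark DataFrame with a `text` column. A minimal, self-contained sketch, assuming an active Spark NLP session and that the pipeline downloads successfully:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_body_shaming_pipeline", lang="en")
# Hypothetical one-row input; the output columns come from the stages listed under "Included Models".
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline.transform(df).show(truncate=False)
```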
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_body_shaming_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Christina0824/BERT_body_shaming + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_classifier_sped_transactions_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_classifier_sped_transactions_en.md new file mode 100644 index 00000000000000..e1f86c2954bd68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_classifier_sped_transactions_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_classifier_sped_transactions BertForSequenceClassification from lcaffreymaffei +author: John Snow Labs +name: bert_classifier_sped_transactions +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_classifier_sped_transactions` is a English model originally trained by lcaffreymaffei. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_classifier_sped_transactions_en_5.5.0_3.0_1727034094533.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_classifier_sped_transactions_en_5.5.0_3.0_1727034094533.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Requires an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_classifier_sped_transactions", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_classifier_sped_transactions", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
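Assuming the example above ran, the predicted labels can be read straight out of the `class` annotations:

```python
# "class.result" extracts the label string from each annotation in the "class" column.
pipelineDF.select("text", "class.result").show(truncate=False)
```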
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_classifier_sped_transactions| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/lcaffreymaffei/bert_classifier_sped_transactions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_classifier_sped_transactions_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_classifier_sped_transactions_pipeline_en.md new file mode 100644 index 00000000000000..c5303663b0c9be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_classifier_sped_transactions_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_classifier_sped_transactions_pipeline pipeline BertForSequenceClassification from lcaffreymaffei +author: John Snow Labs +name: bert_classifier_sped_transactions_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_classifier_sped_transactions_pipeline` is a English model originally trained by lcaffreymaffei. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_classifier_sped_transactions_pipeline_en_5.5.0_3.0_1727034115140.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_classifier_sped_transactions_pipeline_en_5.5.0_3.0_1727034115140.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_classifier_sped_transactions_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_classifier_sped_transactions_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
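Here `df` is assumed to be a DataFrame with a `text` column; a minimal sketch under that assumption:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_classifier_sped_transactions_pipeline", lang="en")
# Hypothetical one-row input for a quick smoke test.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline.transform(df).show(truncate=False)
```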
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_classifier_sped_transactions_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/lcaffreymaffei/bert_classifier_sped_transactions + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_legalentity_ner_accelerate_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_legalentity_ner_accelerate_en.md new file mode 100644 index 00000000000000..2fefb5d499b6c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_legalentity_ner_accelerate_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_legalentity_ner_accelerate BertForTokenClassification from aimlnerd +author: John Snow Labs +name: bert_finetuned_legalentity_ner_accelerate +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_legalentity_ner_accelerate` is a English model originally trained by aimlnerd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_legalentity_ner_accelerate_en_5.5.0_3.0_1727045873689.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_legalentity_ner_accelerate_en_5.5.0_3.0_1727045873689.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Requires an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_legalentity_ner_accelerate", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_legalentity_ner_accelerate", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
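To see the tag predicted for each token, select the `token` and `ner` annotations from the transformed DataFrame (a minimal sketch assuming the `pipelineDF` from the example above):

```python
# "token.result" holds the tokens and "ner.result" the entity tag predicted for each of them.
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```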
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_legalentity_ner_accelerate| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/aimlnerd/bert-finetuned-legalentity-ner-accelerate \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_legalentity_ner_accelerate_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_legalentity_ner_accelerate_pipeline_en.md new file mode 100644 index 00000000000000..f1099c1d3a5eee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_legalentity_ner_accelerate_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_legalentity_ner_accelerate_pipeline pipeline BertForTokenClassification from aimlnerd +author: John Snow Labs +name: bert_finetuned_legalentity_ner_accelerate_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_legalentity_ner_accelerate_pipeline` is a English model originally trained by aimlnerd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_legalentity_ner_accelerate_pipeline_en_5.5.0_3.0_1727045893745.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_legalentity_ner_accelerate_pipeline_en_5.5.0_3.0_1727045893745.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_legalentity_ner_accelerate_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_legalentity_ner_accelerate_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
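The `df` above is assumed to be a DataFrame with a `text` column; a minimal sketch under that assumption:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_finetuned_legalentity_ner_accelerate_pipeline", lang="en")
# Hypothetical one-row input; the NER tags appear in the column produced by the token classifier stage.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline.transform(df).show(truncate=False)
```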
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_legalentity_ner_accelerate_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/aimlnerd/bert-finetuned-legalentity-ner-accelerate + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_ner_koakande_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_ner_koakande_en.md new file mode 100644 index 00000000000000..14109fad0569ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_ner_koakande_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_ner_koakande BertForTokenClassification from koakande +author: John Snow Labs +name: bert_finetuned_ner_koakande +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_koakande` is a English model originally trained by koakande. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_koakande_en_5.5.0_3.0_1727045435304.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_koakande_en_5.5.0_3.0_1727045435304.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Requires an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_koakande", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_koakande", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
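Assuming the example above ran, the per-token predictions can be inspected directly:

```python
# Pairs each token with its predicted entity tag.
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```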
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_koakande| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/koakande/bert-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_ner_koakande_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_ner_koakande_pipeline_en.md new file mode 100644 index 00000000000000..d3f1e40e49e12b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_ner_koakande_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_ner_koakande_pipeline pipeline BertForTokenClassification from koakande +author: John Snow Labs +name: bert_finetuned_ner_koakande_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_koakande_pipeline` is a English model originally trained by koakande. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_koakande_pipeline_en_5.5.0_3.0_1727045459415.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_koakande_pipeline_en_5.5.0_3.0_1727045459415.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_ner_koakande_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_ner_koakande_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
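The `df` above is assumed to be a DataFrame with a `text` column; a minimal sketch under that assumption:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_finetuned_ner_koakande_pipeline", lang="en")
# Hypothetical one-row input for a quick smoke test.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline.transform(df).show(truncate=False)
```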
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_koakande_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/koakande/bert-finetuned-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_sql_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_sql_en.md new file mode 100644 index 00000000000000..56d6e2ad31ab35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_sql_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_finetuned_sql BertForQuestionAnswering from AlexYang33 +author: John Snow Labs +name: bert_finetuned_sql +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_sql` is a English model originally trained by AlexYang33. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_sql_en_5.5.0_3.0_1726978413801.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_sql_en_5.5.0_3.0_1726978413801.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Requires an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_sql", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_sql", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
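The extracted answer can be read from the `answer` column of the transformed DataFrame; a minimal sketch assuming the `pipelineDF` from the example above:

```python
# The "result" field of the "answer" annotations holds the predicted answer span.
pipelineDF.select("answer.result").show(truncate=False)
```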
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_sql| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/AlexYang33/bert-finetuned-sql \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_benroma_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_benroma_en.md new file mode 100644 index 00000000000000..454d34d155232c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_benroma_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_finetuned_squad_benroma BertForQuestionAnswering from benroma +author: John Snow Labs +name: bert_finetuned_squad_benroma +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_squad_benroma` is a English model originally trained by benroma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_benroma_en_5.5.0_3.0_1727049384527.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_benroma_en_5.5.0_3.0_1727049384527.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Requires an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_squad_benroma", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_squad_benroma", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
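Assuming the example above ran, the predicted span can be inspected directly:

```python
# Shows the answer extracted for each question/context pair.
pipelineDF.select("answer.result").show(truncate=False)
```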
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_squad_benroma| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/benroma/bert-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_benroma_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_benroma_pipeline_en.md new file mode 100644 index 00000000000000..a18befb2f3ea58 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_benroma_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_finetuned_squad_benroma_pipeline pipeline BertForQuestionAnswering from benroma +author: John Snow Labs +name: bert_finetuned_squad_benroma_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_squad_benroma_pipeline` is a English model originally trained by benroma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_benroma_pipeline_en_5.5.0_3.0_1727049406644.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_benroma_pipeline_en_5.5.0_3.0_1727049406644.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_squad_benroma_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_squad_benroma_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
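The `df` above is assumed to carry the question and its context. In the sketch below the column names `question` and `context` are assumptions carried over from the standalone model example, not verified against the packaged pipeline:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_finetuned_squad_benroma_pipeline", lang="en")
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")
pipeline.transform(df).select("answer.result").show(truncate=False)
```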
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_squad_benroma_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/benroma/bert-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_robkayinto_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_robkayinto_en.md new file mode 100644 index 00000000000000..554d857f5596f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_robkayinto_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_finetuned_squad_robkayinto BertForQuestionAnswering from robkayinto +author: John Snow Labs +name: bert_finetuned_squad_robkayinto +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_squad_robkayinto` is a English model originally trained by robkayinto. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_robkayinto_en_5.5.0_3.0_1727042792146.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_robkayinto_en_5.5.0_3.0_1727042792146.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Requires an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_squad_robkayinto", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_squad_robkayinto", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
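To check the output of the example above, select the `answer` annotations from the transformed DataFrame:

```python
# "answer.result" contains the span extracted from the context for each question.
pipelineDF.select("answer.result").show(truncate=False)
```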
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_squad_robkayinto| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/robkayinto/bert-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_robkayinto_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_robkayinto_pipeline_en.md new file mode 100644 index 00000000000000..b4a6ae054f3317 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_robkayinto_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_finetuned_squad_robkayinto_pipeline pipeline BertForQuestionAnswering from robkayinto +author: John Snow Labs +name: bert_finetuned_squad_robkayinto_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_squad_robkayinto_pipeline` is a English model originally trained by robkayinto. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_robkayinto_pipeline_en_5.5.0_3.0_1727042812823.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_robkayinto_pipeline_en_5.5.0_3.0_1727042812823.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_squad_robkayinto_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_squad_robkayinto_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
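As above, `df` is assumed to hold a question column and a context column; the names used below are assumptions borrowed from the standalone model example:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_finetuned_squad_robkayinto_pipeline", lang="en")
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")
pipeline.transform(df).select("answer.result").show(truncate=False)
```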
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_squad_robkayinto_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/robkayinto/bert-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_swag_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_swag_en.md new file mode 100644 index 00000000000000..b31b33c899701c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_swag_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_finetuned_swag BertForQuestionAnswering from ashaduzzaman +author: John Snow Labs +name: bert_finetuned_swag +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_swag` is a English model originally trained by ashaduzzaman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_swag_en_5.5.0_3.0_1726992351186.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_swag_en_5.5.0_3.0_1726992351186.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Requires an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_swag", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_swag", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
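Assuming the `pipelineDF` produced above, the predicted answer can be pulled out as follows:

```python
# The extracted span for each input pair lives in the "result" field of "answer".
pipelineDF.select("answer.result").show(truncate=False)
```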
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_swag| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ashaduzzaman/bert-finetuned-swag \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_swag_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_swag_pipeline_en.md new file mode 100644 index 00000000000000..3567687efcec6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_swag_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_finetuned_swag_pipeline pipeline BertForQuestionAnswering from ashaduzzaman +author: John Snow Labs +name: bert_finetuned_swag_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_swag_pipeline` is a English model originally trained by ashaduzzaman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_swag_pipeline_en_5.5.0_3.0_1726992369375.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_swag_pipeline_en_5.5.0_3.0_1726992369375.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_swag_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_swag_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
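Here `df` is assumed to contain the question and its context; the column names below are assumptions taken from the standalone model example:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_finetuned_swag_pipeline", lang="en")
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")
pipeline.transform(df).select("answer.result").show(truncate=False)
```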
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_swag_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ashaduzzaman/bert-finetuned-swag + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_human_label_multiperspective_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_human_label_multiperspective_en.md new file mode 100644 index 00000000000000..bd5c02ce3d8a05 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_human_label_multiperspective_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_human_label_multiperspective BertForSequenceClassification from Multiperspective +author: John Snow Labs +name: bert_human_label_multiperspective +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_human_label_multiperspective` is a English model originally trained by Multiperspective. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_human_label_multiperspective_en_5.5.0_3.0_1727032927670.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_human_label_multiperspective_en_5.5.0_3.0_1727032927670.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Requires an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_human_label_multiperspective", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_human_label_multiperspective", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
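A quick check on the output, assuming the `pipelineDF` produced above:

```python
# The predicted label for each row is stored in the "result" field of the "class" annotations.
pipelineDF.select("text", "class.result").show(truncate=False)
```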
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_human_label_multiperspective| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Multiperspective/bert-human_label \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_human_label_multiperspective_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_human_label_multiperspective_pipeline_en.md new file mode 100644 index 00000000000000..f31c648ea086c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_human_label_multiperspective_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_human_label_multiperspective_pipeline pipeline BertForSequenceClassification from Multiperspective +author: John Snow Labs +name: bert_human_label_multiperspective_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_human_label_multiperspective_pipeline` is a English model originally trained by Multiperspective. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_human_label_multiperspective_pipeline_en_5.5.0_3.0_1727032988381.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_human_label_multiperspective_pipeline_en_5.5.0_3.0_1727032988381.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_human_label_multiperspective_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_human_label_multiperspective_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
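The `df` above is assumed to be a DataFrame with a `text` column; a minimal sketch under that assumption:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_human_label_multiperspective_pipeline", lang="en")
# Hypothetical one-row input for a quick smoke test.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline.transform(df).show(truncate=False)
```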
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_human_label_multiperspective_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Multiperspective/bert-human_label + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_imdb_danielcd99_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_imdb_danielcd99_en.md new file mode 100644 index 00000000000000..1d4107eee7bba6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_imdb_danielcd99_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_imdb_danielcd99 BertForSequenceClassification from danielcd99 +author: John Snow Labs +name: bert_imdb_danielcd99 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_imdb_danielcd99` is a English model originally trained by danielcd99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_imdb_danielcd99_en_5.5.0_3.0_1727032019543.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_imdb_danielcd99_en_5.5.0_3.0_1727032019543.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Requires an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_imdb_danielcd99", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_imdb_danielcd99", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
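Assuming the example above ran, the sentiment-style labels can be inspected directly:

```python
# One predicted label per input row, taken from the "class" annotations.
pipelineDF.select("text", "class.result").show(truncate=False)
```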
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_imdb_danielcd99| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/danielcd99/BERT_imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_imdb_danielcd99_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_imdb_danielcd99_pipeline_en.md new file mode 100644 index 00000000000000..f80d72e6632711 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_imdb_danielcd99_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_imdb_danielcd99_pipeline pipeline BertForSequenceClassification from danielcd99 +author: John Snow Labs +name: bert_imdb_danielcd99_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_imdb_danielcd99_pipeline` is a English model originally trained by danielcd99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_imdb_danielcd99_pipeline_en_5.5.0_3.0_1727032044936.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_imdb_danielcd99_pipeline_en_5.5.0_3.0_1727032044936.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_imdb_danielcd99_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_imdb_danielcd99_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
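Here `df` is assumed to be a DataFrame with a `text` column; a minimal sketch under that assumption:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_imdb_danielcd99_pipeline", lang="en")
# Hypothetical one-row input for a quick smoke test.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline.transform(df).show(truncate=False)
```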
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_imdb_danielcd99_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/danielcd99/BERT_imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_large_portuguese_cased_assin2_entailment_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-22-bert_large_portuguese_cased_assin2_entailment_pipeline_pt.md new file mode 100644 index 00000000000000..d4891bf9723144 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_large_portuguese_cased_assin2_entailment_pipeline_pt.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Portuguese bert_large_portuguese_cased_assin2_entailment_pipeline pipeline BertForSequenceClassification from ruanchaves +author: John Snow Labs +name: bert_large_portuguese_cased_assin2_entailment_pipeline +date: 2024-09-22 +tags: [pt, open_source, pipeline, onnx] +task: Text Classification +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_portuguese_cased_assin2_entailment_pipeline` is a Portuguese model originally trained by ruanchaves. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_portuguese_cased_assin2_entailment_pipeline_pt_5.5.0_3.0_1727032551203.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_portuguese_cased_assin2_entailment_pipeline_pt_5.5.0_3.0_1727032551203.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_large_portuguese_cased_assin2_entailment_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_large_portuguese_cased_assin2_entailment_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
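The `df` above is assumed to be a DataFrame with a `text` column; the Portuguese sentence below is only a placeholder, since an entailment model normally expects a premise/hypothesis pair formatted the way it was trained:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_large_portuguese_cased_assin2_entailment_pipeline", lang="pt")
# Hypothetical one-row input for a quick smoke test.
df = spark.createDataFrame([["Eu adoro o spark-nlp"]]).toDF("text")
pipeline.transform(df).show(truncate=False)
```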
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_portuguese_cased_assin2_entailment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ruanchaves/bert-large-portuguese-cased-assin2-entailment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_large_uncased_whole_word_masking_finetuned_squad_ozlemsenel_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_large_uncased_whole_word_masking_finetuned_squad_ozlemsenel_en.md new file mode 100644 index 00000000000000..316cff87b2d429 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_large_uncased_whole_word_masking_finetuned_squad_ozlemsenel_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_large_uncased_whole_word_masking_finetuned_squad_ozlemsenel BertForQuestionAnswering from ozlemsenel +author: John Snow Labs +name: bert_large_uncased_whole_word_masking_finetuned_squad_ozlemsenel +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_whole_word_masking_finetuned_squad_ozlemsenel` is a English model originally trained by ozlemsenel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_finetuned_squad_ozlemsenel_en_5.5.0_3.0_1727049284300.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_finetuned_squad_ozlemsenel_en_5.5.0_3.0_1727049284300.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Requires an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_whole_word_masking_finetuned_squad_ozlemsenel", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_whole_word_masking_finetuned_squad_ozlemsenel", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
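Assuming the `pipelineDF` produced above, the predicted answer span can be inspected as follows:

```python
# "answer.result" holds the span extracted from the context for each question.
pipelineDF.select("answer.result").show(truncate=False)
```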
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_whole_word_masking_finetuned_squad_ozlemsenel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ozlemsenel/bert-large-uncased-whole-word-masking-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_large_uncased_wikistance_v1_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_large_uncased_wikistance_v1_en.md new file mode 100644 index 00000000000000..83dd6550722e8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_large_uncased_wikistance_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_large_uncased_wikistance_v1 BertForSequenceClassification from research-dump +author: John Snow Labs +name: bert_large_uncased_wikistance_v1 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_wikistance_v1` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_wikistance_v1_en_5.5.0_3.0_1726989229205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_wikistance_v1_en_5.5.0_3.0_1726989229205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Requires an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_large_uncased_wikistance_v1", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_large_uncased_wikistance_v1", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
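To look at the predicted stance labels, query the `class` annotations of the `pipelineDF` produced above:

```python
# One label per input row; the value depends on the model's label set and the input text.
pipelineDF.select("text", "class.result").show(truncate=False)
```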
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_wikistance_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/research-dump/bert-large-uncased_wikistance_v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_large_uncased_wikistance_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_large_uncased_wikistance_v1_pipeline_en.md new file mode 100644 index 00000000000000..edce96cf4bb5f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_large_uncased_wikistance_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_large_uncased_wikistance_v1_pipeline pipeline BertForSequenceClassification from research-dump +author: John Snow Labs +name: bert_large_uncased_wikistance_v1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_wikistance_v1_pipeline` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_wikistance_v1_pipeline_en_5.5.0_3.0_1726989281961.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_wikistance_v1_pipeline_en_5.5.0_3.0_1726989281961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_large_uncased_wikistance_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_large_uncased_wikistance_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
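+
+In the snippet above, `df` stands in for any Spark DataFrame with a `text` column. As a minimal, hedged sketch (the sample sentence and column name are assumptions, not part of the original card):
+
+```python
+# Assumed setup: build a one-row DataFrame with a "text" column, then run the pretrained pipeline on it.
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["Editors should reach consensus before merging the two articles."]]).toDF("text")
+pipeline = PretrainedPipeline("bert_large_uncased_wikistance_v1_pipeline", lang="en")
+annotations = pipeline.transform(df)
+annotations.show(truncate=False)  # exact output columns depend on the pipeline stages listed below
+```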
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_wikistance_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/research-dump/bert-large-uncased_wikistance_v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_racial_bias_model_80_0k_samples_fold_2_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_racial_bias_model_80_0k_samples_fold_2_en.md new file mode 100644 index 00000000000000..728314f278fe7b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_racial_bias_model_80_0k_samples_fold_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_racial_bias_model_80_0k_samples_fold_2 DistilBertForSequenceClassification from jamnik99 +author: John Snow Labs +name: bert_racial_bias_model_80_0k_samples_fold_2 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_racial_bias_model_80_0k_samples_fold_2` is a English model originally trained by jamnik99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_racial_bias_model_80_0k_samples_fold_2_en_5.5.0_3.0_1727020980716.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_racial_bias_model_80_0k_samples_fold_2_en_5.5.0_3.0_1727020980716.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_racial_bias_model_80_0k_samples_fold_2","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_racial_bias_model_80_0k_samples_fold_2", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_racial_bias_model_80_0k_samples_fold_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jamnik99/BERT-racial_bias_model_80.0K_samples_fold_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_racial_bias_model_80_0k_samples_fold_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_racial_bias_model_80_0k_samples_fold_2_pipeline_en.md new file mode 100644 index 00000000000000..fa1c92678248af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_racial_bias_model_80_0k_samples_fold_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_racial_bias_model_80_0k_samples_fold_2_pipeline pipeline DistilBertForSequenceClassification from jamnik99 +author: John Snow Labs +name: bert_racial_bias_model_80_0k_samples_fold_2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_racial_bias_model_80_0k_samples_fold_2_pipeline` is a English model originally trained by jamnik99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_racial_bias_model_80_0k_samples_fold_2_pipeline_en_5.5.0_3.0_1727020992605.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_racial_bias_model_80_0k_samples_fold_2_pipeline_en_5.5.0_3.0_1727020992605.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_racial_bias_model_80_0k_samples_fold_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_racial_bias_model_80_0k_samples_fold_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_racial_bias_model_80_0k_samples_fold_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jamnik99/BERT-racial_bias_model_80.0K_samples_fold_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_en.md new file mode 100644 index 00000000000000..9d1801a0341105 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_en_5.5.0_3.0_1727012953910.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_en_5.5.0_3.0_1727012953910.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_vllm-gemma2b-llmOversight-0.5-noDropSus_6 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_pipeline_en.md new file mode 100644 index 00000000000000..fdf51ca5d345e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_pipeline_en_5.5.0_3.0_1727012966303.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_pipeline_en_5.5.0_3.0_1727012966303.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_vllm-gemma2b-llmOversight-0.5-noDropSus_6 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bertmodel_thestriker117_en.md b/docs/_posts/ahmedlone127/2024-09-22-bertmodel_thestriker117_en.md new file mode 100644 index 00000000000000..28700d63d9f815 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bertmodel_thestriker117_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bertmodel_thestriker117 DistilBertForSequenceClassification from thestriker117 +author: John Snow Labs +name: bertmodel_thestriker117 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertmodel_thestriker117` is a English model originally trained by thestriker117. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertmodel_thestriker117_en_5.5.0_3.0_1727020685521.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertmodel_thestriker117_en_5.5.0_3.0_1727020685521.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("bertmodel_thestriker117","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bertmodel_thestriker117", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertmodel_thestriker117| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thestriker117/bertModel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bertmodel_thestriker117_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bertmodel_thestriker117_pipeline_en.md new file mode 100644 index 00000000000000..d778964da3f8a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bertmodel_thestriker117_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bertmodel_thestriker117_pipeline pipeline DistilBertForSequenceClassification from thestriker117 +author: John Snow Labs +name: bertmodel_thestriker117_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertmodel_thestriker117_pipeline` is a English model originally trained by thestriker117. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertmodel_thestriker117_pipeline_en_5.5.0_3.0_1727020697845.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertmodel_thestriker117_pipeline_en_5.5.0_3.0_1727020697845.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bertmodel_thestriker117_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bertmodel_thestriker117_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertmodel_thestriker117_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thestriker117/bertModel + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bigbio_mtl_en.md b/docs/_posts/ahmedlone127/2024-09-22-bigbio_mtl_en.md new file mode 100644 index 00000000000000..732b132c37d988 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bigbio_mtl_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bigbio_mtl BertForTokenClassification from bigbio +author: John Snow Labs +name: bigbio_mtl +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bigbio_mtl` is a English model originally trained by bigbio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bigbio_mtl_en_5.5.0_3.0_1726974987920.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bigbio_mtl_en_5.5.0_3.0_1726974987920.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("bigbio_mtl","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bigbio_mtl", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
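+
+Each token receives a tag in the `ner` column. As a minimal sketch (assuming the `pipelineDF` produced by the Python example above), tokens and their predicted tags can be displayed side by side:
+
+```python
+# Hypothetical follow-up: show the token and NER tag arrays for each input row.
+pipelineDF.selectExpr("token.result as tokens", "ner.result as ner_tags").show(truncate=False)
+```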
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bigbio_mtl| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.9 MB| + +## References + +https://huggingface.co/bigbio/bigbio-mtl \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bigbio_mtl_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bigbio_mtl_pipeline_en.md new file mode 100644 index 00000000000000..fce8efd0644a2d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bigbio_mtl_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bigbio_mtl_pipeline pipeline BertForTokenClassification from bigbio +author: John Snow Labs +name: bigbio_mtl_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bigbio_mtl_pipeline` is a English model originally trained by bigbio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bigbio_mtl_pipeline_en_5.5.0_3.0_1726975006060.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bigbio_mtl_pipeline_en_5.5.0_3.0_1726975006060.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bigbio_mtl_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bigbio_mtl_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bigbio_mtl_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.0 MB| + +## References + +https://huggingface.co/bigbio/bigbio-mtl + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-binary_stock_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-22-binary_stock_classifier_en.md new file mode 100644 index 00000000000000..1fa84aefede819 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-binary_stock_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English binary_stock_classifier DistilBertForSequenceClassification from hkufyp2024 +author: John Snow Labs +name: binary_stock_classifier +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`binary_stock_classifier` is a English model originally trained by hkufyp2024. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/binary_stock_classifier_en_5.5.0_3.0_1727035396440.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/binary_stock_classifier_en_5.5.0_3.0_1727035396440.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("binary_stock_classifier","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("binary_stock_classifier", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|binary_stock_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hkufyp2024/binary-stock-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-binary_stock_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-binary_stock_classifier_pipeline_en.md new file mode 100644 index 00000000000000..2a4dcdefff4eae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-binary_stock_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English binary_stock_classifier_pipeline pipeline DistilBertForSequenceClassification from hkufyp2024 +author: John Snow Labs +name: binary_stock_classifier_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`binary_stock_classifier_pipeline` is a English model originally trained by hkufyp2024. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/binary_stock_classifier_pipeline_en_5.5.0_3.0_1727035409774.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/binary_stock_classifier_pipeline_en_5.5.0_3.0_1727035409774.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("binary_stock_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("binary_stock_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|binary_stock_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hkufyp2024/binary-stock-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bodo_roberta_base_sentencepiece_mlm_en.md b/docs/_posts/ahmedlone127/2024-09-22-bodo_roberta_base_sentencepiece_mlm_en.md new file mode 100644 index 00000000000000..69b46a13e7eeea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bodo_roberta_base_sentencepiece_mlm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bodo_roberta_base_sentencepiece_mlm RoBertaEmbeddings from alayaran +author: John Snow Labs +name: bodo_roberta_base_sentencepiece_mlm +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bodo_roberta_base_sentencepiece_mlm` is a English model originally trained by alayaran. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bodo_roberta_base_sentencepiece_mlm_en_5.5.0_3.0_1726999903050.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bodo_roberta_base_sentencepiece_mlm_en_5.5.0_3.0_1726999903050.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("bodo_roberta_base_sentencepiece_mlm","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("bodo_roberta_base_sentencepiece_mlm","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
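+
+The `embeddings` column (as set by `setOutputCol("embeddings")` above) holds one vector per token. As a minimal sketch (assuming the `pipelineDF` produced by the Python example above):
+
+```python
+# Hypothetical follow-up: list each token with the dimensionality of its embedding vector.
+pipelineDF.selectExpr("explode(embeddings) as emb") \
+    .selectExpr("emb.result as token", "size(emb.embeddings) as dim") \
+    .show(truncate=False)
+```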
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bodo_roberta_base_sentencepiece_mlm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/alayaran/bodo-roberta-base-sentencepiece-mlm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bodo_roberta_base_sentencepiece_mlm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bodo_roberta_base_sentencepiece_mlm_pipeline_en.md new file mode 100644 index 00000000000000..19b62a51b79176 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bodo_roberta_base_sentencepiece_mlm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bodo_roberta_base_sentencepiece_mlm_pipeline pipeline RoBertaEmbeddings from alayaran +author: John Snow Labs +name: bodo_roberta_base_sentencepiece_mlm_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bodo_roberta_base_sentencepiece_mlm_pipeline` is a English model originally trained by alayaran. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bodo_roberta_base_sentencepiece_mlm_pipeline_en_5.5.0_3.0_1726999924419.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bodo_roberta_base_sentencepiece_mlm_pipeline_en_5.5.0_3.0_1726999924419.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bodo_roberta_base_sentencepiece_mlm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bodo_roberta_base_sentencepiece_mlm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bodo_roberta_base_sentencepiece_mlm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/alayaran/bodo-roberta-base-sentencepiece-mlm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-brand_classification_20240708_model_2_distilbert_0_980011_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-brand_classification_20240708_model_2_distilbert_0_980011_pipeline_en.md new file mode 100644 index 00000000000000..378ab7d77ac32c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-brand_classification_20240708_model_2_distilbert_0_980011_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English brand_classification_20240708_model_2_distilbert_0_980011_pipeline pipeline DistilBertForSequenceClassification from jointriple +author: John Snow Labs +name: brand_classification_20240708_model_2_distilbert_0_980011_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`brand_classification_20240708_model_2_distilbert_0_980011_pipeline` is a English model originally trained by jointriple. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/brand_classification_20240708_model_2_distilbert_0_980011_pipeline_en_5.5.0_3.0_1727012247491.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/brand_classification_20240708_model_2_distilbert_0_980011_pipeline_en_5.5.0_3.0_1727012247491.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("brand_classification_20240708_model_2_distilbert_0_980011_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("brand_classification_20240708_model_2_distilbert_0_980011_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|brand_classification_20240708_model_2_distilbert_0_980011_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|255.5 MB| + +## References + +https://huggingface.co/jointriple/brand_classification_20240708_model_2_distilbert_0_980011 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_afishally_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_afishally_en.md new file mode 100644 index 00000000000000..1cbb1753e2288a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_afishally_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_afishally RoBertaEmbeddings from Afishally +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_afishally +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_afishally` is a English model originally trained by Afishally. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_afishally_en_5.5.0_3.0_1727041765860.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_afishally_en_5.5.0_3.0_1727041765860.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_afishally","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_afishally","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_afishally| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/Afishally/my_awesome_eli5_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_afishally_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_afishally_pipeline_en.md new file mode 100644 index 00000000000000..0dd8cb12dd7c33 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_afishally_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_afishally_pipeline pipeline RoBertaEmbeddings from Afishally +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_afishally_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_afishally_pipeline` is a English model originally trained by Afishally. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_afishally_pipeline_en_5.5.0_3.0_1727041780956.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_afishally_pipeline_en_5.5.0_3.0_1727041780956.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_afishally_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_afishally_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_afishally_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/Afishally/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_ashdev01_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_ashdev01_en.md new file mode 100644 index 00000000000000..1d5ec3d7572868 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_ashdev01_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_ashdev01 RoBertaEmbeddings from ashdev01 +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_ashdev01 +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_ashdev01` is a English model originally trained by ashdev01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_ashdev01_en_5.5.0_3.0_1726999766178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_ashdev01_en_5.5.0_3.0_1726999766178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_ashdev01","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_ashdev01","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_ashdev01| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.4 MB| + +## References + +https://huggingface.co/ashdev01/my_awesome_eli5_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_ashdev01_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_ashdev01_pipeline_en.md new file mode 100644 index 00000000000000..cadaad1651d3be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_ashdev01_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_ashdev01_pipeline pipeline RoBertaEmbeddings from ashdev01 +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_ashdev01_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_ashdev01_pipeline` is a English model originally trained by ashdev01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_ashdev01_pipeline_en_5.5.0_3.0_1726999780012.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_ashdev01_pipeline_en_5.5.0_3.0_1726999780012.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_ashdev01_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_ashdev01_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_ashdev01_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.4 MB| + +## References + +https://huggingface.co/ashdev01/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_cecilia0409_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_cecilia0409_pipeline_en.md new file mode 100644 index 00000000000000..22e57c92fd6c60 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_cecilia0409_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_cecilia0409_pipeline pipeline RoBertaEmbeddings from Cecilia0409 +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_cecilia0409_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_cecilia0409_pipeline` is a English model originally trained by Cecilia0409. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_cecilia0409_pipeline_en_5.5.0_3.0_1727000032780.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_cecilia0409_pipeline_en_5.5.0_3.0_1727000032780.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_cecilia0409_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_cecilia0409_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_cecilia0409_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/Cecilia0409/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_jesslimzhiqi_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_jesslimzhiqi_en.md new file mode 100644 index 00000000000000..ec4b4e6bdf6cc8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_jesslimzhiqi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_jesslimzhiqi RoBertaEmbeddings from JESSLIMZHIQI +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_jesslimzhiqi +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_jesslimzhiqi` is a English model originally trained by JESSLIMZHIQI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_jesslimzhiqi_en_5.5.0_3.0_1726999858489.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_jesslimzhiqi_en_5.5.0_3.0_1726999858489.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_jesslimzhiqi","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_jesslimzhiqi","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_jesslimzhiqi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/JESSLIMZHIQI/my_awesome_eli5_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_jesslimzhiqi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_jesslimzhiqi_pipeline_en.md new file mode 100644 index 00000000000000..7335014be4ec37 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_jesslimzhiqi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_jesslimzhiqi_pipeline pipeline RoBertaEmbeddings from JESSLIMZHIQI +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_jesslimzhiqi_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_jesslimzhiqi_pipeline` is a English model originally trained by JESSLIMZHIQI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_jesslimzhiqi_pipeline_en_5.5.0_3.0_1726999872140.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_jesslimzhiqi_pipeline_en_5.5.0_3.0_1726999872140.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_jesslimzhiqi_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_jesslimzhiqi_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
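For quick interactive checks it can be easier to skip the DataFrame and feed the pipeline a single string. This is a sketch rather than part of the card, assuming the pipeline downloads as shown above; the dictionary keys correspond to the output columns of the stages listed under "Included Models":

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_jesslimzhiqi_pipeline", lang="en")

# annotate() runs the pipeline on a plain string and returns a dict keyed by output column.
annotations = pipeline.annotate("I love spark-nlp")
print(annotations.keys())
```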
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_jesslimzhiqi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/JESSLIMZHIQI/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_philander_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_philander_pipeline_en.md new file mode 100644 index 00000000000000..bd2853788d207b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_philander_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_philander_pipeline pipeline RoBertaEmbeddings from PHILANDER +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_philander_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_philander_pipeline` is a English model originally trained by PHILANDER. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_philander_pipeline_en_5.5.0_3.0_1727041630597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_philander_pipeline_en_5.5.0_3.0_1727041630597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_philander_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_philander_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_philander_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/PHILANDER/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model2_cippppy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model2_cippppy_pipeline_en.md new file mode 100644 index 00000000000000..bce4c51a74ca4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model2_cippppy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model2_cippppy_pipeline pipeline DistilBertForSequenceClassification from Cippppy +author: John Snow Labs +name: burmese_awesome_model2_cippppy_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model2_cippppy_pipeline` is a English model originally trained by Cippppy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model2_cippppy_pipeline_en_5.5.0_3.0_1727020437904.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model2_cippppy_pipeline_en_5.5.0_3.0_1727020437904.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("burmese_awesome_model2_cippppy_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("burmese_awesome_model2_cippppy_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model2_cippppy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|238.3 MB| + +## References + +https://huggingface.co/Cippppy/my_awesome_model2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_augmented_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_augmented_en.md new file mode 100644 index 00000000000000..a47a26b8a459e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_augmented_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_augmented DistilBertForSequenceClassification from Shozi +author: John Snow Labs +name: burmese_awesome_model_augmented +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_augmented` is a English model originally trained by Shozi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_augmented_en_5.5.0_3.0_1727033588312.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_augmented_en_5.5.0_3.0_1727033588312.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_augmented","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_augmented", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
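The snippet ends at `transform` without showing the prediction. A minimal way to read it back (a sketch, assuming `pipelineDF` from the Python example above) is to project the `result` field of the `class` annotation column:

```python
# "class" is an array of annotations; "class.result" extracts the predicted label(s).
pipelineDF.select("text", "class.result").show(truncate=False)
```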
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_augmented| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Shozi/my_awesome_model_augmented \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_augmented_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_augmented_pipeline_en.md new file mode 100644 index 00000000000000..9fe7632bc7a34d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_augmented_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_augmented_pipeline pipeline DistilBertForSequenceClassification from Shozi +author: John Snow Labs +name: burmese_awesome_model_augmented_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_augmented_pipeline` is a English model originally trained by Shozi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_augmented_pipeline_en_5.5.0_3.0_1727033601000.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_augmented_pipeline_en_5.5.0_3.0_1727033601000.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("burmese_awesome_model_augmented_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("burmese_awesome_model_augmented_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_augmented_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Shozi/my_awesome_model_augmented + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_copa_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_copa_en.md new file mode 100644 index 00000000000000..240dc3009598ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_copa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_copa RoBertaForSequenceClassification from TheoLepere +author: John Snow Labs +name: burmese_awesome_model_copa +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_copa` is a English model originally trained by TheoLepere. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_copa_en_5.5.0_3.0_1726967503269.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_copa_en_5.5.0_3.0_1726967503269.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("burmese_awesome_model_copa","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("burmese_awesome_model_copa", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
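The card does not list the labels this classifier predicts. If the label set matters for downstream code, the loaded annotator can usually report it directly; this is a sketch assuming the `sequenceClassifier` object from the Python example above and that the installed Spark NLP version exposes `getClasses`:

```python
# Print the label set the fine-tuned model was exported with.
print(sequenceClassifier.getClasses())
```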
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_copa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|427.0 MB| + +## References + +https://huggingface.co/TheoLepere/my_awesome_model_copa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_copa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_copa_pipeline_en.md new file mode 100644 index 00000000000000..8ec5652ef1b236 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_copa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_copa_pipeline pipeline RoBertaForSequenceClassification from TheoLepere +author: John Snow Labs +name: burmese_awesome_model_copa_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_copa_pipeline` is a English model originally trained by TheoLepere. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_copa_pipeline_en_5.5.0_3.0_1726967529105.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_copa_pipeline_en_5.5.0_3.0_1726967529105.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("burmese_awesome_model_copa_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("burmese_awesome_model_copa_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_copa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|427.0 MB| + +## References + +https://huggingface.co/TheoLepere/my_awesome_model_copa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_fabisor_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_fabisor_en.md new file mode 100644 index 00000000000000..6e3b244fe7ab6a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_fabisor_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_fabisor DistilBertForSequenceClassification from fabisor +author: John Snow Labs +name: burmese_awesome_model_fabisor +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_fabisor` is a English model originally trained by fabisor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_fabisor_en_5.5.0_3.0_1726980113822.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_fabisor_en_5.5.0_3.0_1726980113822.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_fabisor","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_fabisor", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_fabisor| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/fabisor/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_fabisor_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_fabisor_pipeline_en.md new file mode 100644 index 00000000000000..17089f29d82655 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_fabisor_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_fabisor_pipeline pipeline DistilBertForSequenceClassification from fabisor +author: John Snow Labs +name: burmese_awesome_model_fabisor_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_fabisor_pipeline` is a English model originally trained by fabisor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_fabisor_pipeline_en_5.5.0_3.0_1726980125076.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_fabisor_pipeline_en_5.5.0_3.0_1726980125076.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("burmese_awesome_model_fabisor_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("burmese_awesome_model_fabisor_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_fabisor_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/fabisor/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_iamaries_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_iamaries_en.md new file mode 100644 index 00000000000000..8621b3df898b55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_iamaries_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_iamaries DistilBertForSequenceClassification from iamaries +author: John Snow Labs +name: burmese_awesome_model_iamaries +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_iamaries` is a English model originally trained by iamaries. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_iamaries_en_5.5.0_3.0_1726980429985.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_iamaries_en_5.5.0_3.0_1726980429985.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_iamaries","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_iamaries", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_iamaries| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/iamaries/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ian_ailex_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ian_ailex_en.md new file mode 100644 index 00000000000000..3a121b109dd4fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ian_ailex_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_ian_ailex DistilBertForSequenceClassification from Ian-AILex +author: John Snow Labs +name: burmese_awesome_model_ian_ailex +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_ian_ailex` is a English model originally trained by Ian-AILex. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ian_ailex_en_5.5.0_3.0_1727021011983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ian_ailex_en_5.5.0_3.0_1727021011983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_ian_ailex","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_ian_ailex", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_ian_ailex| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Ian-AILex/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ian_ailex_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ian_ailex_pipeline_en.md new file mode 100644 index 00000000000000..5786f86531bc52 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ian_ailex_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_ian_ailex_pipeline pipeline DistilBertForSequenceClassification from Ian-AILex +author: John Snow Labs +name: burmese_awesome_model_ian_ailex_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_ian_ailex_pipeline` is a English model originally trained by Ian-AILex. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ian_ailex_pipeline_en_5.5.0_3.0_1727021023167.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ian_ailex_pipeline_en_5.5.0_3.0_1727021023167.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("burmese_awesome_model_ian_ailex_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("burmese_awesome_model_ian_ailex_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_ian_ailex_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Ian-AILex/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_jimmy77777_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_jimmy77777_en.md new file mode 100644 index 00000000000000..76dbec4d3038f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_jimmy77777_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_jimmy77777 DistilBertForSequenceClassification from Jimmy77777 +author: John Snow Labs +name: burmese_awesome_model_jimmy77777 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_jimmy77777` is a English model originally trained by Jimmy77777. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_jimmy77777_en_5.5.0_3.0_1727033372711.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_jimmy77777_en_5.5.0_3.0_1727033372711.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_jimmy77777","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_jimmy77777", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
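Since these cards emphasise production readiness, it is often worth persisting the fitted pipeline once instead of re-downloading the model on every run. A sketch using the standard Spark ML writer, assuming `pipelineModel` from the Python example above and a writable path of your choice (`/tmp/jimmy77777_classifier_pipeline` is just a placeholder):

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline (including the downloaded classifier weights) to disk ...
pipelineModel.write().overwrite().save("/tmp/jimmy77777_classifier_pipeline")

# ... and load it back later without hitting the model hub again.
restored = PipelineModel.load("/tmp/jimmy77777_classifier_pipeline")
```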
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_jimmy77777| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Jimmy77777/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_jimmy77777_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_jimmy77777_pipeline_en.md new file mode 100644 index 00000000000000..8f0db49481f8c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_jimmy77777_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_jimmy77777_pipeline pipeline DistilBertForSequenceClassification from Jimmy77777 +author: John Snow Labs +name: burmese_awesome_model_jimmy77777_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_jimmy77777_pipeline` is a English model originally trained by Jimmy77777. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_jimmy77777_pipeline_en_5.5.0_3.0_1727033385634.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_jimmy77777_pipeline_en_5.5.0_3.0_1727033385634.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("burmese_awesome_model_jimmy77777_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("burmese_awesome_model_jimmy77777_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_jimmy77777_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Jimmy77777/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_miguelactc27_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_miguelactc27_en.md new file mode 100644 index 00000000000000..6845b0e5dde2e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_miguelactc27_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_miguelactc27 DistilBertForSequenceClassification from miguelactc27 +author: John Snow Labs +name: burmese_awesome_model_miguelactc27 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_miguelactc27` is a English model originally trained by miguelactc27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_miguelactc27_en_5.5.0_3.0_1727012435565.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_miguelactc27_en_5.5.0_3.0_1727012435565.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_miguelactc27","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_miguelactc27", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_miguelactc27| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/miguelactc27/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_miguelactc27_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_miguelactc27_pipeline_en.md new file mode 100644 index 00000000000000..9fb5981e2e5d3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_miguelactc27_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_miguelactc27_pipeline pipeline DistilBertForSequenceClassification from miguelactc27 +author: John Snow Labs +name: burmese_awesome_model_miguelactc27_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_miguelactc27_pipeline` is a English model originally trained by miguelactc27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_miguelactc27_pipeline_en_5.5.0_3.0_1727012448028.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_miguelactc27_pipeline_en_5.5.0_3.0_1727012448028.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("burmese_awesome_model_miguelactc27_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("burmese_awesome_model_miguelactc27_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_miguelactc27_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/miguelactc27/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_mou11209203_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_mou11209203_en.md new file mode 100644 index 00000000000000..dc5a63b7278f2e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_mou11209203_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_mou11209203 DistilBertForSequenceClassification from Mou11209203 +author: John Snow Labs +name: burmese_awesome_model_mou11209203 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_mou11209203` is a English model originally trained by Mou11209203. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_mou11209203_en_5.5.0_3.0_1727012966134.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_mou11209203_en_5.5.0_3.0_1727012966134.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_mou11209203","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_mou11209203", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_mou11209203| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Mou11209203/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_myousfi_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_myousfi_en.md new file mode 100644 index 00000000000000..8fa18063f4eef8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_myousfi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_myousfi DistilBertForSequenceClassification from myousfi +author: John Snow Labs +name: burmese_awesome_model_myousfi +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_myousfi` is a English model originally trained by myousfi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_myousfi_en_5.5.0_3.0_1727013032120.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_myousfi_en_5.5.0_3.0_1727013032120.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_myousfi","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_myousfi", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_myousfi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/myousfi/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_myousfi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_myousfi_pipeline_en.md new file mode 100644 index 00000000000000..4e81315129b579 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_myousfi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_myousfi_pipeline pipeline DistilBertForSequenceClassification from myousfi +author: John Snow Labs +name: burmese_awesome_model_myousfi_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_myousfi_pipeline` is a English model originally trained by myousfi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_myousfi_pipeline_en_5.5.0_3.0_1727013044101.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_myousfi_pipeline_en_5.5.0_3.0_1727013044101.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("burmese_awesome_model_myousfi_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("burmese_awesome_model_myousfi_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_myousfi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/myousfi/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ollamh_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ollamh_en.md new file mode 100644 index 00000000000000..25398b465d40cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ollamh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_ollamh DistilBertForSequenceClassification from ollamh +author: John Snow Labs +name: burmese_awesome_model_ollamh +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_ollamh` is a English model originally trained by ollamh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ollamh_en_5.5.0_3.0_1727012676620.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ollamh_en_5.5.0_3.0_1727012676620.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_ollamh","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_ollamh", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_ollamh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ollamh/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ollamh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ollamh_pipeline_en.md new file mode 100644 index 00000000000000..410cf736377f29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ollamh_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_ollamh_pipeline pipeline DistilBertForSequenceClassification from ollamh +author: John Snow Labs +name: burmese_awesome_model_ollamh_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_ollamh_pipeline` is a English model originally trained by ollamh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ollamh_pipeline_en_5.5.0_3.0_1727012688309.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ollamh_pipeline_en_5.5.0_3.0_1727012688309.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("burmese_awesome_model_ollamh_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("burmese_awesome_model_ollamh_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_ollamh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ollamh/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_soilspoon_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_soilspoon_en.md new file mode 100644 index 00000000000000..0363bac41a3263 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_soilspoon_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_soilspoon DistilBertForSequenceClassification from soilSpoon +author: John Snow Labs +name: burmese_awesome_model_soilspoon +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_soilspoon` is a English model originally trained by soilSpoon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_soilspoon_en_5.5.0_3.0_1727012336661.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_soilspoon_en_5.5.0_3.0_1727012336661.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# document assembly -> tokenization -> sequence classification
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_soilspoon","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_soilspoon", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
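`pipelineDF` keeps the input columns and adds one annotation column per stage; the predicted label for each row is stored under `class.result`. A minimal inspection sketch (not part of the original card):

```python
# Show each input text next to its predicted label(s)
pipelineDF.select("text", "class.result").show(truncate=False)
```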
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_soilspoon| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/soilSpoon/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_soilspoon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_soilspoon_pipeline_en.md new file mode 100644 index 00000000000000..276dd95f819c5e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_soilspoon_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_soilspoon_pipeline pipeline DistilBertForSequenceClassification from soilSpoon +author: John Snow Labs +name: burmese_awesome_model_soilspoon_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_soilspoon_pipeline` is a English model originally trained by soilSpoon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_soilspoon_pipeline_en_5.5.0_3.0_1727012349220.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_soilspoon_pipeline_en_5.5.0_3.0_1727012349220.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# a minimal input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("burmese_awesome_model_soilspoon_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// a minimal input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("burmese_awesome_model_soilspoon_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_soilspoon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/soilSpoon/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_wwwjjj_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_wwwjjj_en.md new file mode 100644 index 00000000000000..fe79bb7f33c9f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_wwwjjj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_wwwjjj DistilBertForSequenceClassification from wwwjjj +author: John Snow Labs +name: burmese_awesome_model_wwwjjj +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_wwwjjj` is a English model originally trained by wwwjjj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_wwwjjj_en_5.5.0_3.0_1727035090275.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_wwwjjj_en_5.5.0_3.0_1727035090275.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# document assembly -> tokenization -> sequence classification
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_wwwjjj","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_wwwjjj", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_wwwjjj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/wwwjjj/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_wwwjjj_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_wwwjjj_pipeline_en.md new file mode 100644 index 00000000000000..f48e07f1f95a55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_wwwjjj_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_wwwjjj_pipeline pipeline DistilBertForSequenceClassification from wwwjjj +author: John Snow Labs +name: burmese_awesome_model_wwwjjj_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_wwwjjj_pipeline` is a English model originally trained by wwwjjj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_wwwjjj_pipeline_en_5.5.0_3.0_1727035103149.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_wwwjjj_pipeline_en_5.5.0_3.0_1727035103149.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# a minimal input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("burmese_awesome_model_wwwjjj_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// a minimal input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("burmese_awesome_model_wwwjjj_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_wwwjjj_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/wwwjjj/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_yjl814_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_yjl814_en.md new file mode 100644 index 00000000000000..1a2e2c6beda46e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_yjl814_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_yjl814 DistilBertForSequenceClassification from YJL814 +author: John Snow Labs +name: burmese_awesome_model_yjl814 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_yjl814` is a English model originally trained by YJL814. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_yjl814_en_5.5.0_3.0_1727020897825.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_yjl814_en_5.5.0_3.0_1727020897825.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# document assembly -> tokenization -> sequence classification
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_yjl814","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_yjl814", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_yjl814| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/YJL814/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_yjl814_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_yjl814_pipeline_en.md new file mode 100644 index 00000000000000..9d9ad283496d3e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_yjl814_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_yjl814_pipeline pipeline DistilBertForSequenceClassification from YJL814 +author: John Snow Labs +name: burmese_awesome_model_yjl814_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_yjl814_pipeline` is a English model originally trained by YJL814. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_yjl814_pipeline_en_5.5.0_3.0_1727020910119.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_yjl814_pipeline_en_5.5.0_3.0_1727020910119.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# a minimal input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("burmese_awesome_model_yjl814_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// a minimal input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("burmese_awesome_model_yjl814_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_yjl814_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/YJL814/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_qa_model_hellfox17_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_qa_model_hellfox17_en.md new file mode 100644 index 00000000000000..fc942a654149db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_qa_model_hellfox17_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_hellfox17 DistilBertForQuestionAnswering from Hellfox17 +author: John Snow Labs +name: burmese_awesome_qa_model_hellfox17 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_hellfox17` is a English model originally trained by Hellfox17. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_hellfox17_en_5.5.0_3.0_1726963741909.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_hellfox17_en_5.5.0_3.0_1726963741909.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# question/context assembly -> extractive question answering
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_hellfox17","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_hellfox17", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
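The extracted answer span is written to the `answer` column declared above. A small sketch for reading it back alongside the question:

```python
# Each row holds the answer text extracted from its context
pipelineDF.select("document_question.result", "answer.result").show(truncate=False)
```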
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_hellfox17| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Hellfox17/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_qa_model_hellfox17_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_qa_model_hellfox17_pipeline_en.md new file mode 100644 index 00000000000000..7491e53bfad285 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_qa_model_hellfox17_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_hellfox17_pipeline pipeline DistilBertForQuestionAnswering from Hellfox17 +author: John Snow Labs +name: burmese_awesome_qa_model_hellfox17_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_hellfox17_pipeline` is a English model originally trained by Hellfox17. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_hellfox17_pipeline_en_5.5.0_3.0_1726963753091.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_hellfox17_pipeline_en_5.5.0_3.0_1726963753091.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# a minimal input DataFrame; the pipeline's assembler is assumed to read "question" and "context" columns
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipeline = PretrainedPipeline("burmese_awesome_qa_model_hellfox17_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// a minimal input DataFrame; the pipeline's assembler is assumed to read "question" and "context" columns
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_hellfox17_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_hellfox17_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Hellfox17/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_bert_question_answering_model5_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_bert_question_answering_model5_en.md new file mode 100644 index 00000000000000..43985cb9bf4404 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_bert_question_answering_model5_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_bert_question_answering_model5 BertForQuestionAnswering from Ashkh0099 +author: John Snow Labs +name: burmese_bert_question_answering_model5 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_bert_question_answering_model5` is a English model originally trained by Ashkh0099. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_bert_question_answering_model5_en_5.5.0_3.0_1727039398511.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_bert_question_answering_model5_en_5.5.0_3.0_1727039398511.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# question/context assembly -> extractive question answering
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("burmese_bert_question_answering_model5","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("burmese_bert_question_answering_model5", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_bert_question_answering_model5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Ashkh0099/my-bert-question-answering-model5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_bert_question_answering_model5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_bert_question_answering_model5_pipeline_en.md new file mode 100644 index 00000000000000..2e41d7cc3f29cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_bert_question_answering_model5_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_bert_question_answering_model5_pipeline pipeline BertForQuestionAnswering from Ashkh0099 +author: John Snow Labs +name: burmese_bert_question_answering_model5_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_bert_question_answering_model5_pipeline` is a English model originally trained by Ashkh0099. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_bert_question_answering_model5_pipeline_en_5.5.0_3.0_1727039419547.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_bert_question_answering_model5_pipeline_en_5.5.0_3.0_1727039419547.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# a minimal input DataFrame; the pipeline's assembler is assumed to read "question" and "context" columns
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipeline = PretrainedPipeline("burmese_bert_question_answering_model5_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// a minimal input DataFrame; the pipeline's assembler is assumed to read "question" and "context" columns
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipeline = new PretrainedPipeline("burmese_bert_question_answering_model5_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_bert_question_answering_model5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Ashkh0099/my-bert-question-answering-model5 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_fine_tuned_distilbert_lr_0_001_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_fine_tuned_distilbert_lr_0_001_en.md new file mode 100644 index 00000000000000..bc844ee657ddfe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_fine_tuned_distilbert_lr_0_001_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_fine_tuned_distilbert_lr_0_001 DistilBertForSequenceClassification from Benuehlinger +author: John Snow Labs +name: burmese_fine_tuned_distilbert_lr_0_001 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_fine_tuned_distilbert_lr_0_001` is a English model originally trained by Benuehlinger. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_fine_tuned_distilbert_lr_0_001_en_5.5.0_3.0_1726979997437.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_fine_tuned_distilbert_lr_0_001_en_5.5.0_3.0_1726979997437.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# document assembly -> tokenization -> sequence classification
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_fine_tuned_distilbert_lr_0_001","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_fine_tuned_distilbert_lr_0_001", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_fine_tuned_distilbert_lr_0_001| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.4 MB| + +## References + +https://huggingface.co/Benuehlinger/my-fine-tuned-distilbert-lr-0.001 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_fine_tuned_distilbert_lr_0_001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_fine_tuned_distilbert_lr_0_001_pipeline_en.md new file mode 100644 index 00000000000000..c654be6a6839ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_fine_tuned_distilbert_lr_0_001_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_fine_tuned_distilbert_lr_0_001_pipeline pipeline DistilBertForSequenceClassification from Benuehlinger +author: John Snow Labs +name: burmese_fine_tuned_distilbert_lr_0_001_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_fine_tuned_distilbert_lr_0_001_pipeline` is a English model originally trained by Benuehlinger. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_fine_tuned_distilbert_lr_0_001_pipeline_en_5.5.0_3.0_1726980013826.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_fine_tuned_distilbert_lr_0_001_pipeline_en_5.5.0_3.0_1726980013826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# a minimal input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("burmese_fine_tuned_distilbert_lr_0_001_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// a minimal input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("burmese_fine_tuned_distilbert_lr_0_001_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_fine_tuned_distilbert_lr_0_001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Benuehlinger/my-fine-tuned-distilbert-lr-0.001 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_fine_tuned_distilbert_lr_1e_05_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_fine_tuned_distilbert_lr_1e_05_en.md new file mode 100644 index 00000000000000..2eff3bfc08a093 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_fine_tuned_distilbert_lr_1e_05_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_fine_tuned_distilbert_lr_1e_05 DistilBertForSequenceClassification from Benuehlinger +author: John Snow Labs +name: burmese_fine_tuned_distilbert_lr_1e_05 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_fine_tuned_distilbert_lr_1e_05` is a English model originally trained by Benuehlinger. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_fine_tuned_distilbert_lr_1e_05_en_5.5.0_3.0_1727020895549.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_fine_tuned_distilbert_lr_1e_05_en_5.5.0_3.0_1727020895549.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# document assembly -> tokenization -> sequence classification
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_fine_tuned_distilbert_lr_1e_05","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_fine_tuned_distilbert_lr_1e_05", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_fine_tuned_distilbert_lr_1e_05| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Benuehlinger/my-fine-tuned-distilbert-lr-1e-05 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_nepal_bhasa_model_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_nepal_bhasa_model_en.md new file mode 100644 index 00000000000000..c3e970cf5ed0d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_nepal_bhasa_model_en.md @@ -0,0 +1,100 @@ +--- +layout: model +title: English burmese_nepal_bhasa_model DistilBertForSequenceClassification from CohleM +author: John Snow Labs +name: burmese_nepal_bhasa_model +date: 2024-09-22 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_nepal_bhasa_model` is a English model originally trained by CohleM. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_nepal_bhasa_model_en_5.5.0_3.0_1727012668739.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_nepal_bhasa_model_en_5.5.0_3.0_1727012668739.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer()\
    .setInputCols("document")\
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_nepal_bhasa_model","en")\
    .setInputCols(["document","token"])\
    .setOutputCol("class")

pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier])

data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")

result = pipeline.fit(data).transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_nepal_bhasa_model","en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, sequenceClassifier))

val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
```
</div>
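For ad-hoc scoring of single strings without building a DataFrame, Spark NLP's `LightPipeline` can wrap the fitted pipeline. A sketch, assuming the fitted model from the snippet above:

```python
from sparknlp.base import LightPipeline

# Annotate a single string in memory; returns a dict keyed by output column
light = LightPipeline(pipeline.fit(data))
print(light.annotate("PUT YOUR STRING HERE")["class"])
```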
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_nepal_bhasa_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +References + +References + +https://huggingface.co/CohleM/my_new_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_nepal_bhasa_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_nepal_bhasa_model_pipeline_en.md new file mode 100644 index 00000000000000..8a1b6c434e6372 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_nepal_bhasa_model_pipeline_en.md @@ -0,0 +1,72 @@ +--- +layout: model +title: English burmese_nepal_bhasa_model_pipeline pipeline RoBertaForQuestionAnswering from steffipriyanka +author: John Snow Labs +name: burmese_nepal_bhasa_model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_nepal_bhasa_model_pipeline` is a English model originally trained by steffipriyanka. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_nepal_bhasa_model_pipeline_en_5.5.0_3.0_1727012681091.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_nepal_bhasa_model_pipeline_en_5.5.0_3.0_1727012681091.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# a minimal input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("burmese_nepal_bhasa_model_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// a minimal input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("burmese_nepal_bhasa_model_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_nepal_bhasa_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +References + +https://huggingface.co/steffipriyanka/my_new_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-cat_ner_iwcg_3_en.md b/docs/_posts/ahmedlone127/2024-09-22-cat_ner_iwcg_3_en.md new file mode 100644 index 00000000000000..f2e80a6147df69 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-cat_ner_iwcg_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cat_ner_iwcg_3 XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_ner_iwcg_3 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_ner_iwcg_3` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_ner_iwcg_3_en_5.5.0_3.0_1727019075841.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_ner_iwcg_3_en_5.5.0_3.0_1727019075841.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# document assembly -> tokenization -> token classification (NER)
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("cat_ner_iwcg_3","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("cat_ner_iwcg_3", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
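Token-level predictions land in the `ner` column, aligned one-to-one with the `token` annotations. A quick way to eyeball the tags (sketch only):

```python
# Tokens and their predicted tags, one array per input row
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```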
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_ner_iwcg_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|430.8 MB| + +## References + +https://huggingface.co/homersimpson/cat-ner-iwcg-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-cat_ner_iwcg_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-cat_ner_iwcg_3_pipeline_en.md new file mode 100644 index 00000000000000..8d5a0ddd61a2ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-cat_ner_iwcg_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cat_ner_iwcg_3_pipeline pipeline XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_ner_iwcg_3_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_ner_iwcg_3_pipeline` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_ner_iwcg_3_pipeline_en_5.5.0_3.0_1727019103263.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_ner_iwcg_3_pipeline_en_5.5.0_3.0_1727019103263.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# a minimal input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("cat_ner_iwcg_3_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// a minimal input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("cat_ner_iwcg_3_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_ner_iwcg_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|430.9 MB| + +## References + +https://huggingface.co/homersimpson/cat-ner-iwcg-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-cat_ner_iwcg_5_en.md b/docs/_posts/ahmedlone127/2024-09-22-cat_ner_iwcg_5_en.md new file mode 100644 index 00000000000000..90a8f50f19df20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-cat_ner_iwcg_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cat_ner_iwcg_5 XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_ner_iwcg_5 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_ner_iwcg_5` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_ner_iwcg_5_en_5.5.0_3.0_1726970024200.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_ner_iwcg_5_en_5.5.0_3.0_1726970024200.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# document assembly -> tokenization -> token classification (NER)
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("cat_ner_iwcg_5","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("cat_ner_iwcg_5", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_ner_iwcg_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|430.8 MB| + +## References + +https://huggingface.co/homersimpson/cat-ner-iwcg-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-cat_ner_iwcg_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-cat_ner_iwcg_5_pipeline_en.md new file mode 100644 index 00000000000000..b8d78058cd9d9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-cat_ner_iwcg_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cat_ner_iwcg_5_pipeline pipeline XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_ner_iwcg_5_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_ner_iwcg_5_pipeline` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_ner_iwcg_5_pipeline_en_5.5.0_3.0_1726970054170.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_ner_iwcg_5_pipeline_en_5.5.0_3.0_1726970054170.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# a minimal input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("cat_ner_iwcg_5_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// a minimal input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("cat_ner_iwcg_5_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_ner_iwcg_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|430.9 MB| + +## References + +https://huggingface.co/homersimpson/cat-ner-iwcg-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-cat_sayula_popoluca_iwcg_5_en.md b/docs/_posts/ahmedlone127/2024-09-22-cat_sayula_popoluca_iwcg_5_en.md new file mode 100644 index 00000000000000..fd37140a26b179 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-cat_sayula_popoluca_iwcg_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cat_sayula_popoluca_iwcg_5 XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_sayula_popoluca_iwcg_5 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_sayula_popoluca_iwcg_5` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_sayula_popoluca_iwcg_5_en_5.5.0_3.0_1726970130136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_sayula_popoluca_iwcg_5_en_5.5.0_3.0_1726970130136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# document assembly -> tokenization -> token classification (POS tagging)
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("cat_sayula_popoluca_iwcg_5","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("cat_sayula_popoluca_iwcg_5", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_sayula_popoluca_iwcg_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|432.1 MB| + +## References + +https://huggingface.co/homersimpson/cat-pos-iwcg-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-cat_sayula_popoluca_iwcg_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-cat_sayula_popoluca_iwcg_5_pipeline_en.md new file mode 100644 index 00000000000000..b61a4fa6dfa61f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-cat_sayula_popoluca_iwcg_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cat_sayula_popoluca_iwcg_5_pipeline pipeline XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_sayula_popoluca_iwcg_5_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_sayula_popoluca_iwcg_5_pipeline` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_sayula_popoluca_iwcg_5_pipeline_en_5.5.0_3.0_1726970158239.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_sayula_popoluca_iwcg_5_pipeline_en_5.5.0_3.0_1726970158239.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# a minimal input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("cat_sayula_popoluca_iwcg_5_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// a minimal input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("cat_sayula_popoluca_iwcg_5_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_sayula_popoluca_iwcg_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|432.1 MB| + +## References + +https://huggingface.co/homersimpson/cat-pos-iwcg-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-cer_model_iiii_en.md b/docs/_posts/ahmedlone127/2024-09-22-cer_model_iiii_en.md new file mode 100644 index 00000000000000..6140f86011ff67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-cer_model_iiii_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cer_model_iiii BertForTokenClassification from urbija +author: John Snow Labs +name: cer_model_iiii +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cer_model_iiii` is a English model originally trained by urbija. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cer_model_iiii_en_5.5.0_3.0_1726974471856.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cer_model_iiii_en_5.5.0_3.0_1726974471856.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("cer_model_iiii","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("cer_model_iiii", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cer_model_iiii| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.3 MB| + +## References + +https://huggingface.co/urbija/cer_model-iiii \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-cer_model_iiii_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-cer_model_iiii_pipeline_en.md new file mode 100644 index 00000000000000..302345412e6024 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-cer_model_iiii_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cer_model_iiii_pipeline pipeline BertForTokenClassification from urbija +author: John Snow Labs +name: cer_model_iiii_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cer_model_iiii_pipeline` is a English model originally trained by urbija. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cer_model_iiii_pipeline_en_5.5.0_3.0_1726974489573.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cer_model_iiii_pipeline_en_5.5.0_3.0_1726974489573.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("cer_model_iiii_pipeline", lang = "en")

# df is a Spark DataFrame with a "text" column holding the documents to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("cer_model_iiii_pipeline", lang = "en")

// df is a Spark DataFrame with a "text" column holding the documents to annotate
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cer_model_iiii_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.3 MB| + +## References + +https://huggingface.co/urbija/cer_model-iiii + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-chatgpt_essay_llms_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-chatgpt_essay_llms_pipeline_en.md new file mode 100644 index 00000000000000..d8f108a2cb4dbc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-chatgpt_essay_llms_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English chatgpt_essay_llms_pipeline pipeline DistilBertForSequenceClassification from huyen89 +author: John Snow Labs +name: chatgpt_essay_llms_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chatgpt_essay_llms_pipeline` is a English model originally trained by huyen89. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chatgpt_essay_llms_pipeline_en_5.5.0_3.0_1726980705031.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chatgpt_essay_llms_pipeline_en_5.5.0_3.0_1726980705031.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("chatgpt_essay_llms_pipeline", lang = "en")

# df is a Spark DataFrame with a "text" column holding the documents to classify
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("chatgpt_essay_llms_pipeline", lang = "en")

// df is a Spark DataFrame with a "text" column holding the documents to classify
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chatgpt_essay_llms_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/huyen89/ChatGPT-Essay_LLMs + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-chemberta_zinc250k_v1_en.md b/docs/_posts/ahmedlone127/2024-09-22-chemberta_zinc250k_v1_en.md new file mode 100644 index 00000000000000..4a27e31106e802 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-chemberta_zinc250k_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English chemberta_zinc250k_v1 RoBertaEmbeddings from seyonec +author: John Snow Labs +name: chemberta_zinc250k_v1 +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chemberta_zinc250k_v1` is a English model originally trained by seyonec. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chemberta_zinc250k_v1_en_5.5.0_3.0_1726999929591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chemberta_zinc250k_v1_en_5.5.0_3.0_1726999929591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("chemberta_zinc250k_v1","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("chemberta_zinc250k_v1","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
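If the embeddings are meant to feed downstream Spark ML stages, they can be converted into Spark vectors. The sketch below is one possible follow-up, assuming the `pipelineDF` from the Python snippet above and the `embeddings` output column set there:

```python
from sparknlp.base import EmbeddingsFinisher

# Convert Spark NLP embedding annotations into Spark ML vectors.
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

finisher.transform(pipelineDF).select("finished_embeddings").show()
```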
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chemberta_zinc250k_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|309.9 MB| + +## References + +https://huggingface.co/seyonec/ChemBERTa-zinc250k-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-chemberta_zinc250k_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-chemberta_zinc250k_v1_pipeline_en.md new file mode 100644 index 00000000000000..627a998bbcc22f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-chemberta_zinc250k_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English chemberta_zinc250k_v1_pipeline pipeline RoBertaEmbeddings from seyonec +author: John Snow Labs +name: chemberta_zinc250k_v1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chemberta_zinc250k_v1_pipeline` is a English model originally trained by seyonec. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chemberta_zinc250k_v1_pipeline_en_5.5.0_3.0_1726999943429.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chemberta_zinc250k_v1_pipeline_en_5.5.0_3.0_1726999943429.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("chemberta_zinc250k_v1_pipeline", lang = "en")

# df is a Spark DataFrame with a "text" column holding the documents to embed
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("chemberta_zinc250k_v1_pipeline", lang = "en")

// df is a Spark DataFrame with a "text" column holding the documents to embed
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chemberta_zinc250k_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.9 MB| + +## References + +https://huggingface.co/seyonec/ChemBERTa-zinc250k-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-chinese_extract_bert_en.md b/docs/_posts/ahmedlone127/2024-09-22-chinese_extract_bert_en.md new file mode 100644 index 00000000000000..cd9b6dd56f6d95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-chinese_extract_bert_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English chinese_extract_bert BertForQuestionAnswering from frett +author: John Snow Labs +name: chinese_extract_bert +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chinese_extract_bert` is a English model originally trained by frett. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chinese_extract_bert_en_5.5.0_3.0_1727039824473.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chinese_extract_bert_en_5.5.0_3.0_1727039824473.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("chinese_extract_bert","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("chinese_extract_bert", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
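The extracted answer span lands in the `answer` output column; a minimal follow-up sketch, assuming the `pipelineDF` produced above, is:

```python
# Show the predicted answer for each question/context pair.
# Assumes `pipelineDF` from the snippet above.
pipelineDF.select("answer.result").show(truncate=False)
```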
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chinese_extract_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|381.0 MB| + +## References + +https://huggingface.co/frett/chinese_extract_bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-chinese_extract_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-chinese_extract_bert_pipeline_en.md new file mode 100644 index 00000000000000..ddc1656be95cef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-chinese_extract_bert_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English chinese_extract_bert_pipeline pipeline BertForQuestionAnswering from frett +author: John Snow Labs +name: chinese_extract_bert_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chinese_extract_bert_pipeline` is a English model originally trained by frett. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chinese_extract_bert_pipeline_en_5.5.0_3.0_1727039843187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chinese_extract_bert_pipeline_en_5.5.0_3.0_1727039843187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("chinese_extract_bert_pipeline", lang = "en")

# df is a Spark DataFrame with the question/context text columns expected by the
# pipeline's MultiDocumentAssembler (as in the corresponding model card: "question", "context")
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("chinese_extract_bert_pipeline", lang = "en")

// df is a Spark DataFrame with the question/context text columns expected by the
// pipeline's MultiDocumentAssembler (as in the corresponding model card: "question", "context")
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chinese_extract_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|381.0 MB| + +## References + +https://huggingface.co/frett/chinese_extract_bert + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-clasificadormotivomora_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-clasificadormotivomora_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..22755ab50b7fff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-clasificadormotivomora_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English clasificadormotivomora_distilbert_pipeline pipeline DistilBertForSequenceClassification from Arodrigo +author: John Snow Labs +name: clasificadormotivomora_distilbert_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clasificadormotivomora_distilbert_pipeline` is a English model originally trained by Arodrigo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clasificadormotivomora_distilbert_pipeline_en_5.5.0_3.0_1727035506642.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clasificadormotivomora_distilbert_pipeline_en_5.5.0_3.0_1727035506642.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("clasificadormotivomora_distilbert_pipeline", lang = "en")

# df is a Spark DataFrame with a "text" column holding the documents to classify
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("clasificadormotivomora_distilbert_pipeline", lang = "en")

// df is a Spark DataFrame with a "text" column holding the documents to classify
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clasificadormotivomora_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Arodrigo/ClasificadorMotivoMora-Distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-classification_en.md b/docs/_posts/ahmedlone127/2024-09-22-classification_en.md new file mode 100644 index 00000000000000..ba4d07657fd8c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English classification DistilBertForSequenceClassification from MaX0214 +author: John Snow Labs +name: classification +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classification` is a English model originally trained by MaX0214. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classification_en_5.5.0_3.0_1727020992972.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classification_en_5.5.0_3.0_1727020992972.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("classification","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("classification", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
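The predicted label for each row ends up in the `class` output column; one way to inspect it, as a minimal sketch assuming the `pipelineDF` created above, is:

```python
# Display the input text next to its predicted class label.
# Assumes `pipelineDF` from the snippet above.
pipelineDF.select("text", "class.result").show(truncate=False)
```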
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MaX0214/classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-classification_model_en.md b/docs/_posts/ahmedlone127/2024-09-22-classification_model_en.md new file mode 100644 index 00000000000000..055e16a26d4d18 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-classification_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English classification_model RoBertaForSequenceClassification from skelley +author: John Snow Labs +name: classification_model +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classification_model` is a English model originally trained by skelley. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classification_model_en_5.5.0_3.0_1727017445980.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classification_model_en_5.5.0_3.0_1727017445980.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("classification_model","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("classification_model", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classification_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|421.5 MB| + +## References + +https://huggingface.co/skelley/classification_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-classification_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-classification_model_pipeline_en.md new file mode 100644 index 00000000000000..d5e1510a9a0ad0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-classification_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English classification_model_pipeline pipeline RoBertaForSequenceClassification from skelley +author: John Snow Labs +name: classification_model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classification_model_pipeline` is a English model originally trained by skelley. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classification_model_pipeline_en_5.5.0_3.0_1727017480089.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classification_model_pipeline_en_5.5.0_3.0_1727017480089.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("classification_model_pipeline", lang = "en")

# df is a Spark DataFrame with a "text" column holding the documents to classify
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("classification_model_pipeline", lang = "en")

// df is a Spark DataFrame with a "text" column holding the documents to classify
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classification_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|421.5 MB| + +## References + +https://huggingface.co/skelley/classification_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-classification_pipeline_en.md new file mode 100644 index 00000000000000..9634399f8e91b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English classification_pipeline pipeline DistilBertForSequenceClassification from MaX0214 +author: John Snow Labs +name: classification_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classification_pipeline` is a English model originally trained by MaX0214. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classification_pipeline_en_5.5.0_3.0_1727021006159.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classification_pipeline_en_5.5.0_3.0_1727021006159.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("classification_pipeline", lang = "en")

# df is a Spark DataFrame with a "text" column holding the documents to classify
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("classification_pipeline", lang = "en")

// df is a Spark DataFrame with a "text" column holding the documents to classify
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MaX0214/classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-classifiereutoplevelroberta_en.md b/docs/_posts/ahmedlone127/2024-09-22-classifiereutoplevelroberta_en.md new file mode 100644 index 00000000000000..3423598e3b1cc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-classifiereutoplevelroberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English classifiereutoplevelroberta RoBertaForSequenceClassification from gianma +author: John Snow Labs +name: classifiereutoplevelroberta +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classifiereutoplevelroberta` is a English model originally trained by gianma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classifiereutoplevelroberta_en_5.5.0_3.0_1727037880863.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classifiereutoplevelroberta_en_5.5.0_3.0_1727037880863.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("classifiereutoplevelroberta","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("classifiereutoplevelroberta", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classifiereutoplevelroberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/gianma/classifierEUtopLevelRoberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-classifiereutoplevelroberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-classifiereutoplevelroberta_pipeline_en.md new file mode 100644 index 00000000000000..1625401dc08b09 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-classifiereutoplevelroberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English classifiereutoplevelroberta_pipeline pipeline RoBertaForSequenceClassification from gianma +author: John Snow Labs +name: classifiereutoplevelroberta_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classifiereutoplevelroberta_pipeline` is a English model originally trained by gianma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classifiereutoplevelroberta_pipeline_en_5.5.0_3.0_1727037950517.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classifiereutoplevelroberta_pipeline_en_5.5.0_3.0_1727037950517.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("classifiereutoplevelroberta_pipeline", lang = "en")

# df is a Spark DataFrame with a "text" column holding the documents to classify
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("classifiereutoplevelroberta_pipeline", lang = "en")

// df is a Spark DataFrame with a "text" column holding the documents to classify
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classifiereutoplevelroberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/gianma/classifierEUtopLevelRoberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-clinicalbertqa_200_en.md b/docs/_posts/ahmedlone127/2024-09-22-clinicalbertqa_200_en.md new file mode 100644 index 00000000000000..fec4c381b4dc15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-clinicalbertqa_200_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English clinicalbertqa_200 BertForQuestionAnswering from lanzv +author: John Snow Labs +name: clinicalbertqa_200 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clinicalbertqa_200` is a English model originally trained by lanzv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinicalbertqa_200_en_5.5.0_3.0_1727049191313.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinicalbertqa_200_en_5.5.0_3.0_1727049191313.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("clinicalbertqa_200","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("clinicalbertqa_200", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinicalbertqa_200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.3 MB| + +## References + +https://huggingface.co/lanzv/ClinicalBERTQA_200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-clinicalbertqa_200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-clinicalbertqa_200_pipeline_en.md new file mode 100644 index 00000000000000..5ed5e55aa56721 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-clinicalbertqa_200_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English clinicalbertqa_200_pipeline pipeline BertForQuestionAnswering from lanzv +author: John Snow Labs +name: clinicalbertqa_200_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clinicalbertqa_200_pipeline` is a English model originally trained by lanzv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinicalbertqa_200_pipeline_en_5.5.0_3.0_1727049214498.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinicalbertqa_200_pipeline_en_5.5.0_3.0_1727049214498.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("clinicalbertqa_200_pipeline", lang = "en")

# df is a Spark DataFrame with the question/context text columns expected by the
# pipeline's MultiDocumentAssembler (as in the corresponding model card: "question", "context")
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("clinicalbertqa_200_pipeline", lang = "en")

// df is a Spark DataFrame with the question/context text columns expected by the
// pipeline's MultiDocumentAssembler (as in the corresponding model card: "question", "context")
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinicalbertqa_200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.3 MB| + +## References + +https://huggingface.co/lanzv/ClinicalBERTQA_200 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-coderbert_finetuned_detect_vulnerability_on_msr_en.md b/docs/_posts/ahmedlone127/2024-09-22-coderbert_finetuned_detect_vulnerability_on_msr_en.md new file mode 100644 index 00000000000000..d13627331f8f44 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-coderbert_finetuned_detect_vulnerability_on_msr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English coderbert_finetuned_detect_vulnerability_on_msr RoBertaForSequenceClassification from starmage520 +author: John Snow Labs +name: coderbert_finetuned_detect_vulnerability_on_msr +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`coderbert_finetuned_detect_vulnerability_on_msr` is a English model originally trained by starmage520. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/coderbert_finetuned_detect_vulnerability_on_msr_en_5.5.0_3.0_1726967388275.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/coderbert_finetuned_detect_vulnerability_on_msr_en_5.5.0_3.0_1726967388275.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("coderbert_finetuned_detect_vulnerability_on_msr","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("coderbert_finetuned_detect_vulnerability_on_msr", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|coderbert_finetuned_detect_vulnerability_on_msr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/starmage520/Coderbert_finetuned_detect_vulnerability_on_MSR \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-coderbert_finetuned_detect_vulnerability_on_msr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-coderbert_finetuned_detect_vulnerability_on_msr_pipeline_en.md new file mode 100644 index 00000000000000..ee47c00f604de6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-coderbert_finetuned_detect_vulnerability_on_msr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English coderbert_finetuned_detect_vulnerability_on_msr_pipeline pipeline RoBertaForSequenceClassification from starmage520 +author: John Snow Labs +name: coderbert_finetuned_detect_vulnerability_on_msr_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`coderbert_finetuned_detect_vulnerability_on_msr_pipeline` is a English model originally trained by starmage520. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/coderbert_finetuned_detect_vulnerability_on_msr_pipeline_en_5.5.0_3.0_1726967409571.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/coderbert_finetuned_detect_vulnerability_on_msr_pipeline_en_5.5.0_3.0_1726967409571.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("coderbert_finetuned_detect_vulnerability_on_msr_pipeline", lang = "en")

# df is a Spark DataFrame with a "text" column holding the documents to classify
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("coderbert_finetuned_detect_vulnerability_on_msr_pipeline", lang = "en")

// df is a Spark DataFrame with a "text" column holding the documents to classify
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|coderbert_finetuned_detect_vulnerability_on_msr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.4 MB| + +## References + +https://huggingface.co/starmage520/Coderbert_finetuned_detect_vulnerability_on_MSR + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-coha1940s_en.md b/docs/_posts/ahmedlone127/2024-09-22-coha1940s_en.md new file mode 100644 index 00000000000000..ff72b220a4a78b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-coha1940s_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English coha1940s RoBertaEmbeddings from simonmun +author: John Snow Labs +name: coha1940s +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`coha1940s` is a English model originally trained by simonmun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/coha1940s_en_5.5.0_3.0_1726999704388.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/coha1940s_en_5.5.0_3.0_1726999704388.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("coha1940s","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("coha1940s","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|coha1940s| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.8 MB| + +## References + +https://huggingface.co/simonmun/COHA1940s \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-coha1940s_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-coha1940s_pipeline_en.md new file mode 100644 index 00000000000000..2e0dd6b43be444 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-coha1940s_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English coha1940s_pipeline pipeline RoBertaEmbeddings from simonmun +author: John Snow Labs +name: coha1940s_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`coha1940s_pipeline` is a English model originally trained by simonmun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/coha1940s_pipeline_en_5.5.0_3.0_1726999720356.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/coha1940s_pipeline_en_5.5.0_3.0_1726999720356.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("coha1940s_pipeline", lang = "en")

# df is a Spark DataFrame with a "text" column holding the documents to embed
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("coha1940s_pipeline", lang = "en")

// df is a Spark DataFrame with a "text" column holding the documents to embed
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|coha1940s_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.8 MB| + +## References + +https://huggingface.co/simonmun/COHA1940s + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-cold_fusion_itr2_seed4_en.md b/docs/_posts/ahmedlone127/2024-09-22-cold_fusion_itr2_seed4_en.md new file mode 100644 index 00000000000000..478c5c915030b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-cold_fusion_itr2_seed4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cold_fusion_itr2_seed4 RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr2_seed4 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr2_seed4` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr2_seed4_en_5.5.0_3.0_1726967628751.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr2_seed4_en_5.5.0_3.0_1726967628751.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr2_seed4","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr2_seed4", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr2_seed4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|467.8 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr2-seed4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_en.md b/docs/_posts/ahmedlone127/2024-09-22-correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_en.md new file mode 100644 index 00000000000000..8446985d2dd722 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21 BertForTokenClassification from ali2066 +author: John Snow Labs +name: correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_en_5.5.0_3.0_1727031327438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_en_5.5.0_3.0_1727031327438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = BertForTokenClassification.pretrained("correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
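
The token classifier emits one tag per token in the `ner` column. A minimal sketch of how tokens and their predicted tags could be inspected side by side, reusing the `pipelineDF` from above:

```python
# "token.result" and "ner.result" are parallel arrays: one entry per token.
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```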
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ali2066/correct_BERT_token_itr0_0.0001_editorials_01_03_2022-15_50_21 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_pipeline_en.md new file mode 100644 index 00000000000000..59344cbc765962 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_pipeline pipeline BertForTokenClassification from ali2066 +author: John Snow Labs +name: correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_pipeline` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_pipeline_en_5.5.0_3.0_1727031347646.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_pipeline_en_5.5.0_3.0_1727031347646.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a Spark DataFrame with a "text" column to annotate.
pipeline = PretrainedPipeline("correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a Spark DataFrame with a "text" column to annotate.
val pipeline = new PretrainedPipeline("correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
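
For quick experiments the same pretrained pipeline can also be run on plain strings instead of a DataFrame. A minimal sketch, reusing the `pipeline` object from above; the example sentence is an arbitrary assumption, not part of the original card:

```python
# annotate() returns a dict keyed by output column (e.g. "token", "ner") with plain Python lists.
result = pipeline.annotate("John Snow Labs is based in Delaware.")
print(list(zip(result["token"], result["ner"])))
```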
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ali2066/correct_BERT_token_itr0_0.0001_editorials_01_03_2022-15_50_21 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-covid_19_vaccination_tweet_stance_en.md b/docs/_posts/ahmedlone127/2024-09-22-covid_19_vaccination_tweet_stance_en.md new file mode 100644 index 00000000000000..4d3b8a1c2fd070 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-covid_19_vaccination_tweet_stance_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English covid_19_vaccination_tweet_stance BertForSequenceClassification from seantw +author: John Snow Labs +name: covid_19_vaccination_tweet_stance +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`covid_19_vaccination_tweet_stance` is a English model originally trained by seantw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/covid_19_vaccination_tweet_stance_en_5.5.0_3.0_1727007264751.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/covid_19_vaccination_tweet_stance_en_5.5.0_3.0_1727007264751.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("covid_19_vaccination_tweet_stance", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("covid_19_vaccination_tweet_stance", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
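
The stance labels this checkpoint can emit are not listed on this card. Assuming a recent Spark NLP release, they can usually be read from the loaded annotator itself; a minimal sketch reusing `sequenceClassifier` from above:

```python
# Print the label set bundled with the pretrained classifier.
print(sequenceClassifier.getClasses())
```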
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|covid_19_vaccination_tweet_stance| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/seantw/covid-19-vaccination-tweet-stance \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-covid_19_vaccination_tweet_stance_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-covid_19_vaccination_tweet_stance_pipeline_en.md new file mode 100644 index 00000000000000..802df926d3ec4c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-covid_19_vaccination_tweet_stance_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English covid_19_vaccination_tweet_stance_pipeline pipeline BertForSequenceClassification from seantw +author: John Snow Labs +name: covid_19_vaccination_tweet_stance_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`covid_19_vaccination_tweet_stance_pipeline` is a English model originally trained by seantw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/covid_19_vaccination_tweet_stance_pipeline_en_5.5.0_3.0_1727007320055.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/covid_19_vaccination_tweet_stance_pipeline_en_5.5.0_3.0_1727007320055.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a Spark DataFrame with a "text" column to annotate.
pipeline = PretrainedPipeline("covid_19_vaccination_tweet_stance_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a Spark DataFrame with a "text" column to annotate.
val pipeline = new PretrainedPipeline("covid_19_vaccination_tweet_stance_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|covid_19_vaccination_tweet_stance_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/seantw/covid-19-vaccination-tweet-stance + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-cross_encoder_roberta_wwm_ext_v1_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-22-cross_encoder_roberta_wwm_ext_v1_pipeline_zh.md new file mode 100644 index 00000000000000..6de659d347088d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-cross_encoder_roberta_wwm_ext_v1_pipeline_zh.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Chinese cross_encoder_roberta_wwm_ext_v1_pipeline pipeline BertForSequenceClassification from tuhailong +author: John Snow Labs +name: cross_encoder_roberta_wwm_ext_v1_pipeline +date: 2024-09-22 +tags: [zh, open_source, pipeline, onnx] +task: Text Classification +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cross_encoder_roberta_wwm_ext_v1_pipeline` is a Chinese model originally trained by tuhailong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cross_encoder_roberta_wwm_ext_v1_pipeline_zh_5.5.0_3.0_1727034555787.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cross_encoder_roberta_wwm_ext_v1_pipeline_zh_5.5.0_3.0_1727034555787.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a Spark DataFrame with a "text" column to annotate.
pipeline = PretrainedPipeline("cross_encoder_roberta_wwm_ext_v1_pipeline", lang = "zh")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a Spark DataFrame with a "text" column to annotate.
val pipeline = new PretrainedPipeline("cross_encoder_roberta_wwm_ext_v1_pipeline", lang = "zh")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cross_encoder_roberta_wwm_ext_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|383.2 MB| + +## References + +https://huggingface.co/tuhailong/cross_encoder_roberta-wwm-ext_v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-cross_encoder_roberta_wwm_ext_v1_zh.md b/docs/_posts/ahmedlone127/2024-09-22-cross_encoder_roberta_wwm_ext_v1_zh.md new file mode 100644 index 00000000000000..15367ebdeceaad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-cross_encoder_roberta_wwm_ext_v1_zh.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Chinese cross_encoder_roberta_wwm_ext_v1 BertForSequenceClassification from tuhailong +author: John Snow Labs +name: cross_encoder_roberta_wwm_ext_v1 +date: 2024-09-22 +tags: [zh, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cross_encoder_roberta_wwm_ext_v1` is a Chinese model originally trained by tuhailong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cross_encoder_roberta_wwm_ext_v1_zh_5.5.0_3.0_1727034536317.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cross_encoder_roberta_wwm_ext_v1_zh_5.5.0_3.0_1727034536317.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("cross_encoder_roberta_wwm_ext_v1", "zh") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("cross_encoder_roberta_wwm_ext_v1", "zh")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cross_encoder_roberta_wwm_ext_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|zh| +|Size:|383.2 MB| + +## References + +https://huggingface.co/tuhailong/cross_encoder_roberta-wwm-ext_v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-cross_encoder_roberta_wwm_ext_v2_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-22-cross_encoder_roberta_wwm_ext_v2_pipeline_zh.md new file mode 100644 index 00000000000000..aed07da37ded6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-cross_encoder_roberta_wwm_ext_v2_pipeline_zh.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Chinese cross_encoder_roberta_wwm_ext_v2_pipeline pipeline BertForSequenceClassification from tuhailong +author: John Snow Labs +name: cross_encoder_roberta_wwm_ext_v2_pipeline +date: 2024-09-22 +tags: [zh, open_source, pipeline, onnx] +task: Text Classification +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cross_encoder_roberta_wwm_ext_v2_pipeline` is a Chinese model originally trained by tuhailong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cross_encoder_roberta_wwm_ext_v2_pipeline_zh_5.5.0_3.0_1726988827784.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cross_encoder_roberta_wwm_ext_v2_pipeline_zh_5.5.0_3.0_1726988827784.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a Spark DataFrame with a "text" column to annotate.
pipeline = PretrainedPipeline("cross_encoder_roberta_wwm_ext_v2_pipeline", lang = "zh")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a Spark DataFrame with a "text" column to annotate.
val pipeline = new PretrainedPipeline("cross_encoder_roberta_wwm_ext_v2_pipeline", lang = "zh")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cross_encoder_roberta_wwm_ext_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|383.2 MB| + +## References + +https://huggingface.co/tuhailong/cross_encoder_roberta-wwm-ext_v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-cross_encoder_roberta_wwm_ext_v2_zh.md b/docs/_posts/ahmedlone127/2024-09-22-cross_encoder_roberta_wwm_ext_v2_zh.md new file mode 100644 index 00000000000000..02046f81444798 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-cross_encoder_roberta_wwm_ext_v2_zh.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Chinese cross_encoder_roberta_wwm_ext_v2 BertForSequenceClassification from tuhailong +author: John Snow Labs +name: cross_encoder_roberta_wwm_ext_v2 +date: 2024-09-22 +tags: [zh, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cross_encoder_roberta_wwm_ext_v2` is a Chinese model originally trained by tuhailong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cross_encoder_roberta_wwm_ext_v2_zh_5.5.0_3.0_1726988810496.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cross_encoder_roberta_wwm_ext_v2_zh_5.5.0_3.0_1726988810496.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("cross_encoder_roberta_wwm_ext_v2", "zh") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("cross_encoder_roberta_wwm_ext_v2", "zh")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cross_encoder_roberta_wwm_ext_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|zh| +|Size:|383.2 MB| + +## References + +https://huggingface.co/tuhailong/cross_encoder_roberta-wwm-ext_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-crossencoder_airline_refine_en.md b/docs/_posts/ahmedlone127/2024-09-22-crossencoder_airline_refine_en.md new file mode 100644 index 00000000000000..ac61301b332383 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-crossencoder_airline_refine_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English crossencoder_airline_refine RoBertaForSequenceClassification from srmishra +author: John Snow Labs +name: crossencoder_airline_refine +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`crossencoder_airline_refine` is a English model originally trained by srmishra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/crossencoder_airline_refine_en_5.5.0_3.0_1727017385562.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/crossencoder_airline_refine_en_5.5.0_3.0_1727017385562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("crossencoder_airline_refine", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("crossencoder_airline_refine", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|crossencoder_airline_refine| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/srmishra/crossencoder-airline-refine \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-custom_model_tweets_sentiment_en.md b/docs/_posts/ahmedlone127/2024-09-22-custom_model_tweets_sentiment_en.md new file mode 100644 index 00000000000000..127b611c8fb4a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-custom_model_tweets_sentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English custom_model_tweets_sentiment DistilBertForSequenceClassification from Doukan +author: John Snow Labs +name: custom_model_tweets_sentiment +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`custom_model_tweets_sentiment` is a English model originally trained by Doukan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/custom_model_tweets_sentiment_en_5.5.0_3.0_1727012342797.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/custom_model_tweets_sentiment_en_5.5.0_3.0_1727012342797.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("custom_model_tweets_sentiment", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("custom_model_tweets_sentiment", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
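
Fitting on the dummy DataFrame only wires the stages together; once fitted, the whole pipeline can be persisted and reloaded like any Spark ML model. A minimal sketch, reusing `pipelineModel` and `data` from above; the save path is an arbitrary example:

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline and load it back later without refitting.
pipelineModel.write().overwrite().save("/tmp/custom_model_tweets_sentiment_pipeline")
restored = PipelineModel.load("/tmp/custom_model_tweets_sentiment_pipeline")
restored.transform(data).select("class.result").show()
```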
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|custom_model_tweets_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Doukan/custom-model-tweets-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-cv9_special_batch12_lr6_small_id.md b/docs/_posts/ahmedlone127/2024-09-22-cv9_special_batch12_lr6_small_id.md new file mode 100644 index 00000000000000..b798a3c8b3f5c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-cv9_special_batch12_lr6_small_id.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Indonesian cv9_special_batch12_lr6_small WhisperForCTC from TheRains +author: John Snow Labs +name: cv9_special_batch12_lr6_small +date: 2024-09-22 +tags: [id, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cv9_special_batch12_lr6_small` is a Indonesian model originally trained by TheRains. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cv9_special_batch12_lr6_small_id_5.5.0_3.0_1727024528786.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cv9_special_batch12_lr6_small_id_5.5.0_3.0_1727024528786.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("cv9_special_batch12_lr6_small", "id") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# "data" is assumed to be a DataFrame with an "audio_content" column of raw audio samples.
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("cv9_special_batch12_lr6_small", "id")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// "data" is assumed to be a DataFrame with an "audio_content" column of raw audio samples.
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
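
The `AudioAssembler` above expects a column of raw audio samples (16 kHz mono floats for Whisper models). A minimal sketch of how such a `data` DataFrame could be prepared beforehand; the use of `librosa` and the file name are assumptions, not part of the original example:

```python
import librosa

# Decode a local WAV file into 16 kHz mono floats (librosa is an external dependency).
samples, _ = librosa.load("sample.wav", sr=16000)

# One row per recording, with the raw samples in the "audio_content" column.
data = spark.createDataFrame([[samples.tolist()]]).toDF("audio_content")
```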
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cv9_special_batch12_lr6_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|id| +|Size:|1.7 GB| + +## References + +https://huggingface.co/TheRains/cv9-special-batch12-lr6-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-cv9_special_batch12_lr6_small_pipeline_id.md b/docs/_posts/ahmedlone127/2024-09-22-cv9_special_batch12_lr6_small_pipeline_id.md new file mode 100644 index 00000000000000..065182c7f8fe56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-cv9_special_batch12_lr6_small_pipeline_id.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Indonesian cv9_special_batch12_lr6_small_pipeline pipeline WhisperForCTC from TheRains +author: John Snow Labs +name: cv9_special_batch12_lr6_small_pipeline +date: 2024-09-22 +tags: [id, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cv9_special_batch12_lr6_small_pipeline` is a Indonesian model originally trained by TheRains. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cv9_special_batch12_lr6_small_pipeline_id_5.5.0_3.0_1727024625699.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cv9_special_batch12_lr6_small_pipeline_id_5.5.0_3.0_1727024625699.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a Spark DataFrame with an "audio_content" column of raw audio samples.
pipeline = PretrainedPipeline("cv9_special_batch12_lr6_small_pipeline", lang = "id")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a Spark DataFrame with an "audio_content" column of raw audio samples.
val pipeline = new PretrainedPipeline("cv9_special_batch12_lr6_small_pipeline", lang = "id")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cv9_special_batch12_lr6_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|id| +|Size:|1.7 GB| + +## References + +https://huggingface.co/TheRains/cv9-special-batch12-lr6-small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-danish_bert_review_sentiment_da.md b/docs/_posts/ahmedlone127/2024-09-22-danish_bert_review_sentiment_da.md new file mode 100644 index 00000000000000..307238a60d8d31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-danish_bert_review_sentiment_da.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Danish danish_bert_review_sentiment BertForSequenceClassification from KennethTM +author: John Snow Labs +name: danish_bert_review_sentiment +date: 2024-09-22 +tags: [da, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: da +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`danish_bert_review_sentiment` is a Danish model originally trained by KennethTM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/danish_bert_review_sentiment_da_5.5.0_3.0_1727029775457.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/danish_bert_review_sentiment_da_5.5.0_3.0_1727029775457.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("danish_bert_review_sentiment", "da") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("danish_bert_review_sentiment", "da")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|danish_bert_review_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|da| +|Size:|414.5 MB| + +## References + +https://huggingface.co/KennethTM/danish-bert-review-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-danish_bert_review_sentiment_pipeline_da.md b/docs/_posts/ahmedlone127/2024-09-22-danish_bert_review_sentiment_pipeline_da.md new file mode 100644 index 00000000000000..25295610a1d411 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-danish_bert_review_sentiment_pipeline_da.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Danish danish_bert_review_sentiment_pipeline pipeline BertForSequenceClassification from KennethTM +author: John Snow Labs +name: danish_bert_review_sentiment_pipeline +date: 2024-09-22 +tags: [da, open_source, pipeline, onnx] +task: Text Classification +language: da +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`danish_bert_review_sentiment_pipeline` is a Danish model originally trained by KennethTM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/danish_bert_review_sentiment_pipeline_da_5.5.0_3.0_1727029800396.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/danish_bert_review_sentiment_pipeline_da_5.5.0_3.0_1727029800396.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a Spark DataFrame with a "text" column to annotate.
pipeline = PretrainedPipeline("danish_bert_review_sentiment_pipeline", lang = "da")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a Spark DataFrame with a "text" column to annotate.
val pipeline = new PretrainedPipeline("danish_bert_review_sentiment_pipeline", lang = "da")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|danish_bert_review_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|da| +|Size:|414.5 MB| + +## References + +https://huggingface.co/KennethTM/danish-bert-review-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-darkstar_bert_ome1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-darkstar_bert_ome1_pipeline_en.md new file mode 100644 index 00000000000000..ce1d033f0a863b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-darkstar_bert_ome1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English darkstar_bert_ome1_pipeline pipeline BertForSequenceClassification from Schmitz005 +author: John Snow Labs +name: darkstar_bert_ome1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`darkstar_bert_ome1_pipeline` is a English model originally trained by Schmitz005. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/darkstar_bert_ome1_pipeline_en_5.5.0_3.0_1726990825320.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/darkstar_bert_ome1_pipeline_en_5.5.0_3.0_1726990825320.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a Spark DataFrame with a "text" column to annotate.
pipeline = PretrainedPipeline("darkstar_bert_ome1_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a Spark DataFrame with a "text" column to annotate.
val pipeline = new PretrainedPipeline("darkstar_bert_ome1_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|darkstar_bert_ome1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Schmitz005/Darkstar-Bert-ome1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-deberta_embeddings_tapt_nbme_v3_base_en.md b/docs/_posts/ahmedlone127/2024-09-22-deberta_embeddings_tapt_nbme_v3_base_en.md new file mode 100644 index 00000000000000..45534b57337d86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-deberta_embeddings_tapt_nbme_v3_base_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English Deberta Embeddings model (from ZZ99) +author: John Snow Labs +name: deberta_embeddings_tapt_nbme_v3_base +date: 2024-09-22 +tags: [deberta, open_source, deberta_embeddings, debertav2formaskedlm, en, onnx, openvino] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: openvino +annotator: DeBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DebertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `tapt_nbme_deberta_v3_base` is a English model originally trained by `ZZ99`. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_embeddings_tapt_nbme_v3_base_en_5.5.0_3.0_1727046746008.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_embeddings_tapt_nbme_v3_base_en_5.5.0_3.0_1727046746008.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + +{:.model-param} + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = DeBertaEmbeddings.pretrained("deberta_embeddings_tapt_nbme_v3_base","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") \ + .setCaseSensitive(True) + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, embeddings]) + +data = spark.createDataFrame([["I love Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val embeddings = DeBertaEmbeddings.pretrained("deberta_embeddings_tapt_nbme_v3_base","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + .setCaseSensitive(true) + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) + +val data = Seq("I love Spark NLP").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
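
To move from annotation structs to plain vectors (for example, to feed downstream Spark ML stages), an `EmbeddingsFinisher` stage can be appended. A minimal sketch under the assumption that the pipeline above is left unchanged and `result` is its output:

```python
from sparknlp.base import EmbeddingsFinisher

# Convert the token-level embedding annotations into Spark ML vectors.
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

finished = finisher.transform(result)
finished.selectExpr("explode(finished_embeddings) as embedding").show(5)
```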
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_embeddings_tapt_nbme_v3_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|720.7 MB| + +## References + +https://huggingface.co/ZZ99/tapt_nbme_deberta_v3_base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-deeppolicytracker_500k_en.md b/docs/_posts/ahmedlone127/2024-09-22-deeppolicytracker_500k_en.md new file mode 100644 index 00000000000000..90ef92a8fd7eaa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-deeppolicytracker_500k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deeppolicytracker_500k RoBertaEmbeddings from flavio-nakasato +author: John Snow Labs +name: deeppolicytracker_500k +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deeppolicytracker_500k` is a English model originally trained by flavio-nakasato. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deeppolicytracker_500k_en_5.5.0_3.0_1726999677211.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deeppolicytracker_500k_en_5.5.0_3.0_1726999677211.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("deeppolicytracker_500k","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("deeppolicytracker_500k","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
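
Each token receives one embedding annotation in the `embeddings` column. A quick way to look at the raw vectors, reusing the `pipelineDF` produced above (a sketch, not part of the original example):

```python
# One row per token, pairing the token text with its embedding vector.
pipelineDF.selectExpr("explode(embeddings) as emb") \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(truncate=80)
```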
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deeppolicytracker_500k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|305.8 MB| + +## References + +https://huggingface.co/flavio-nakasato/deeppolicytracker_500k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-deeppolicytracker_500k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-deeppolicytracker_500k_pipeline_en.md new file mode 100644 index 00000000000000..9386806f62c7e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-deeppolicytracker_500k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English deeppolicytracker_500k_pipeline pipeline RoBertaEmbeddings from flavio-nakasato +author: John Snow Labs +name: deeppolicytracker_500k_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deeppolicytracker_500k_pipeline` is a English model originally trained by flavio-nakasato. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deeppolicytracker_500k_pipeline_en_5.5.0_3.0_1726999692110.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deeppolicytracker_500k_pipeline_en_5.5.0_3.0_1726999692110.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a Spark DataFrame with a "text" column to annotate.
pipeline = PretrainedPipeline("deeppolicytracker_500k_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a Spark DataFrame with a "text" column to annotate.
val pipeline = new PretrainedPipeline("deeppolicytracker_500k_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deeppolicytracker_500k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|305.8 MB| + +## References + +https://huggingface.co/flavio-nakasato/deeppolicytracker_500k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-defsent_roberta_base_mean_en.md b/docs/_posts/ahmedlone127/2024-09-22-defsent_roberta_base_mean_en.md new file mode 100644 index 00000000000000..90a09ba42692bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-defsent_roberta_base_mean_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English defsent_roberta_base_mean RoBertaEmbeddings from cl-nagoya +author: John Snow Labs +name: defsent_roberta_base_mean +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`defsent_roberta_base_mean` is a English model originally trained by cl-nagoya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/defsent_roberta_base_mean_en_5.5.0_3.0_1727041747971.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/defsent_roberta_base_mean_en_5.5.0_3.0_1727041747971.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("defsent_roberta_base_mean","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("defsent_roberta_base_mean","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|defsent_roberta_base_mean| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|413.1 MB| + +## References + +https://huggingface.co/cl-nagoya/defsent-roberta-base-mean \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-defsent_roberta_base_mean_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-defsent_roberta_base_mean_pipeline_en.md new file mode 100644 index 00000000000000..479405408c7f46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-defsent_roberta_base_mean_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English defsent_roberta_base_mean_pipeline pipeline RoBertaEmbeddings from cl-nagoya +author: John Snow Labs +name: defsent_roberta_base_mean_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`defsent_roberta_base_mean_pipeline` is a English model originally trained by cl-nagoya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/defsent_roberta_base_mean_pipeline_en_5.5.0_3.0_1727041791826.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/defsent_roberta_base_mean_pipeline_en_5.5.0_3.0_1727041791826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# "df" is assumed to be a Spark DataFrame with a "text" column to annotate.
pipeline = PretrainedPipeline("defsent_roberta_base_mean_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// "df" is assumed to be a Spark DataFrame with a "text" column to annotate.
val pipeline = new PretrainedPipeline("defsent_roberta_base_mean_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|defsent_roberta_base_mean_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|413.1 MB| + +## References + +https://huggingface.co/cl-nagoya/defsent-roberta-base-mean + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-delete_only_filtering_hausa_v2_en.md b/docs/_posts/ahmedlone127/2024-09-22-delete_only_filtering_hausa_v2_en.md new file mode 100644 index 00000000000000..32fdd2eeb2332d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-delete_only_filtering_hausa_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English delete_only_filtering_hausa_v2 XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: delete_only_filtering_hausa_v2 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`delete_only_filtering_hausa_v2` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/delete_only_filtering_hausa_v2_en_5.5.0_3.0_1727018796486.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/delete_only_filtering_hausa_v2_en_5.5.0_3.0_1727018796486.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("delete_only_filtering_hausa_v2", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("delete_only_filtering_hausa_v2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
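
For low-latency experimentation on single sentences, the fitted pipeline can be wrapped in a `LightPipeline`, which avoids launching a distributed job. A minimal sketch, reusing `pipelineModel` from above; the example sentence is an arbitrary assumption:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# annotate() returns plain Python lists per output column, e.g. tokens and their tags.
annotations = light.annotate("Aliyu ya tafi Kano jiya.")
print(list(zip(annotations["token"], annotations["ner"])))
```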
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|delete_only_filtering_hausa_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/delete_only_filtering_hausa_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-delete_only_filtering_hausa_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-delete_only_filtering_hausa_v2_pipeline_en.md new file mode 100644 index 00000000000000..020cb8bd0de0d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-delete_only_filtering_hausa_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English delete_only_filtering_hausa_v2_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: delete_only_filtering_hausa_v2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`delete_only_filtering_hausa_v2_pipeline` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/delete_only_filtering_hausa_v2_pipeline_en_5.5.0_3.0_1727018842469.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/delete_only_filtering_hausa_v2_pipeline_en_5.5.0_3.0_1727018842469.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+pipeline = PretrainedPipeline("delete_only_filtering_hausa_v2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("delete_only_filtering_hausa_v2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|delete_only_filtering_hausa_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/delete_only_filtering_hausa_v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-demo_mangowly_en.md b/docs/_posts/ahmedlone127/2024-09-22-demo_mangowly_en.md new file mode 100644 index 00000000000000..df7cea8027a88e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-demo_mangowly_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English demo_mangowly DistilBertForSequenceClassification from mangowly +author: John Snow Labs +name: demo_mangowly +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`demo_mangowly` is a English model originally trained by mangowly. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/demo_mangowly_en_5.5.0_3.0_1727033601485.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/demo_mangowly_en_5.5.0_3.0_1727033601485.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("demo_mangowly","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("demo_mangowly", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
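+
+A brief usage note, assuming the `pipelineDF` built in the snippet above: the predicted label for each input row sits in the `result` field of the "class" column.
+
+```python
+# Show the input text next to the predicted label.
+pipelineDF.select("text", "class.result").show(truncate = False)
+```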
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|demo_mangowly| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|250.6 MB| + +## References + +https://huggingface.co/mangowly/demo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-demo_mangowly_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-demo_mangowly_pipeline_en.md new file mode 100644 index 00000000000000..1471ad3990bcfd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-demo_mangowly_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English demo_mangowly_pipeline pipeline DistilBertForSequenceClassification from mangowly +author: John Snow Labs +name: demo_mangowly_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`demo_mangowly_pipeline` is a English model originally trained by mangowly. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/demo_mangowly_pipeline_en_5.5.0_3.0_1727033619846.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/demo_mangowly_pipeline_en_5.5.0_3.0_1727033619846.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+pipeline = PretrainedPipeline("demo_mangowly_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("demo_mangowly_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
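+
+For quick, driver-side checks on a single string, `PretrainedPipeline` also exposes an `annotate` convenience method. The sketch below is illustrative; the example sentence is an assumption and the dictionary keys depend on the pipeline's stages.
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("demo_mangowly_pipeline", lang = "en")
+
+# Returns a plain Python dict keyed by output column name.
+result = pipeline.annotate("I love spark-nlp")
+print(result)
+```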
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|demo_mangowly_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|250.6 MB| + +## References + +https://huggingface.co/mangowly/demo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-disaster_tweet_4_en.md b/docs/_posts/ahmedlone127/2024-09-22-disaster_tweet_4_en.md new file mode 100644 index 00000000000000..ece39d1b9c11d4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-disaster_tweet_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English disaster_tweet_4 RoBertaForSequenceClassification from aellxx +author: John Snow Labs +name: disaster_tweet_4 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`disaster_tweet_4` is a English model originally trained by aellxx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/disaster_tweet_4_en_5.5.0_3.0_1727027012936.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/disaster_tweet_4_en_5.5.0_3.0_1727027012936.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("disaster_tweet_4","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("disaster_tweet_4", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|disaster_tweet_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/aellxx/disaster-tweet-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-disaster_tweet_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-disaster_tweet_4_pipeline_en.md new file mode 100644 index 00000000000000..555ef275c46bfc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-disaster_tweet_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English disaster_tweet_4_pipeline pipeline RoBertaForSequenceClassification from aellxx +author: John Snow Labs +name: disaster_tweet_4_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`disaster_tweet_4_pipeline` is a English model originally trained by aellxx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/disaster_tweet_4_pipeline_en_5.5.0_3.0_1727027036081.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/disaster_tweet_4_pipeline_en_5.5.0_3.0_1727027036081.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+pipeline = PretrainedPipeline("disaster_tweet_4_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("disaster_tweet_4_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|disaster_tweet_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/aellxx/disaster-tweet-4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-dissertation_sahil_bert_en.md b/docs/_posts/ahmedlone127/2024-09-22-dissertation_sahil_bert_en.md new file mode 100644 index 00000000000000..9b1eac749ea74f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-dissertation_sahil_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dissertation_sahil_bert BertForSequenceClassification from mahadev23 +author: John Snow Labs +name: dissertation_sahil_bert +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dissertation_sahil_bert` is a English model originally trained by mahadev23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dissertation_sahil_bert_en_5.5.0_3.0_1727029774986.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dissertation_sahil_bert_en_5.5.0_3.0_1727029774986.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("dissertation_sahil_bert","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("dissertation_sahil_bert", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dissertation_sahil_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/mahadev23/dissertation_sahil_bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-dissertation_sahil_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-dissertation_sahil_bert_pipeline_en.md new file mode 100644 index 00000000000000..b1f7da9eae1c1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-dissertation_sahil_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dissertation_sahil_bert_pipeline pipeline BertForSequenceClassification from mahadev23 +author: John Snow Labs +name: dissertation_sahil_bert_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dissertation_sahil_bert_pipeline` is a English model originally trained by mahadev23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dissertation_sahil_bert_pipeline_en_5.5.0_3.0_1727029799949.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dissertation_sahil_bert_pipeline_en_5.5.0_3.0_1727029799949.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+pipeline = PretrainedPipeline("dissertation_sahil_bert_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("dissertation_sahil_bert_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dissertation_sahil_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/mahadev23/dissertation_sahil_bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distbert_cpcd_en.md b/docs/_posts/ahmedlone127/2024-09-22-distbert_cpcd_en.md new file mode 100644 index 00000000000000..88099c15fcd1f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distbert_cpcd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distbert_cpcd DistilBertForSequenceClassification from jnwnlee +author: John Snow Labs +name: distbert_cpcd +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distbert_cpcd` is a English model originally trained by jnwnlee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distbert_cpcd_en_5.5.0_3.0_1727035507773.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distbert_cpcd_en_5.5.0_3.0_1727035507773.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distbert_cpcd","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distbert_cpcd", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distbert_cpcd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jnwnlee/distbert_cpcd \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distbert_cpcd_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distbert_cpcd_pipeline_en.md new file mode 100644 index 00000000000000..c559977fe8cc9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distbert_cpcd_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distbert_cpcd_pipeline pipeline DistilBertForSequenceClassification from jnwnlee +author: John Snow Labs +name: distbert_cpcd_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distbert_cpcd_pipeline` is a English model originally trained by jnwnlee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distbert_cpcd_pipeline_en_5.5.0_3.0_1727035522113.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distbert_cpcd_pipeline_en_5.5.0_3.0_1727035522113.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+pipeline = PretrainedPipeline("distbert_cpcd_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("distbert_cpcd_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distbert_cpcd_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jnwnlee/distbert_cpcd + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distil_multilingual_cased_fire_classification_silvanus_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-22-distil_multilingual_cased_fire_classification_silvanus_pipeline_xx.md new file mode 100644 index 00000000000000..351be8ce3ba058 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distil_multilingual_cased_fire_classification_silvanus_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual distil_multilingual_cased_fire_classification_silvanus_pipeline pipeline DistilBertForSequenceClassification from rollerhafeezh-amikom +author: John Snow Labs +name: distil_multilingual_cased_fire_classification_silvanus_pipeline +date: 2024-09-22 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distil_multilingual_cased_fire_classification_silvanus_pipeline` is a Multilingual model originally trained by rollerhafeezh-amikom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_multilingual_cased_fire_classification_silvanus_pipeline_xx_5.5.0_3.0_1727020800633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_multilingual_cased_fire_classification_silvanus_pipeline_xx_5.5.0_3.0_1727020800633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+pipeline = PretrainedPipeline("distil_multilingual_cased_fire_classification_silvanus_pipeline", lang = "xx")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("distil_multilingual_cased_fire_classification_silvanus_pipeline", lang = "xx")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distil_multilingual_cased_fire_classification_silvanus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|507.6 MB| + +## References + +https://huggingface.co/rollerhafeezh-amikom/distil-multilingual-cased-fire-classification-silvanus + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distil_multilingual_cased_fire_classification_silvanus_xx.md b/docs/_posts/ahmedlone127/2024-09-22-distil_multilingual_cased_fire_classification_silvanus_xx.md new file mode 100644 index 00000000000000..986dba1210f7e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distil_multilingual_cased_fire_classification_silvanus_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual distil_multilingual_cased_fire_classification_silvanus DistilBertForSequenceClassification from rollerhafeezh-amikom +author: John Snow Labs +name: distil_multilingual_cased_fire_classification_silvanus +date: 2024-09-22 +tags: [xx, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distil_multilingual_cased_fire_classification_silvanus` is a Multilingual model originally trained by rollerhafeezh-amikom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_multilingual_cased_fire_classification_silvanus_xx_5.5.0_3.0_1727020778106.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_multilingual_cased_fire_classification_silvanus_xx_5.5.0_3.0_1727020778106.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distil_multilingual_cased_fire_classification_silvanus","xx") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distil_multilingual_cased_fire_classification_silvanus", "xx")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
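+
+Because this model is published under the multilingual ("xx") tag, the same fitted pipeline can score non-English text. The Indonesian sentence below is purely illustrative and assumes the `pipelineModel` built in the snippet above.
+
+```python
+# Score a non-English example with the already fitted pipeline.
+data_id = spark.createDataFrame([["Kebakaran dilaporkan di dekat pemukiman warga."]]).toDF("text")
+pipelineModel.transform(data_id).select("text", "class.result").show(truncate = False)
+```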
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distil_multilingual_cased_fire_classification_silvanus| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|507.6 MB| + +## References + +https://huggingface.co/rollerhafeezh-amikom/distil-multilingual-cased-fire-classification-silvanus \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert2_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert2_en.md new file mode 100644 index 00000000000000..ae04975b3918d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert2 DistilBertForSequenceClassification from deptage +author: John Snow Labs +name: distilbert2 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert2` is a English model originally trained by deptage. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert2_en_5.5.0_3.0_1727020768980.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert2_en_5.5.0_3.0_1727020768980.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert2","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert2", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/deptage/distilbert2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert2_pipeline_en.md new file mode 100644 index 00000000000000..69e5b1df308b9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert2_pipeline pipeline DistilBertForSequenceClassification from deptage +author: John Snow Labs +name: distilbert2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert2_pipeline` is a English model originally trained by deptage. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert2_pipeline_en_5.5.0_3.0_1727020780976.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert2_pipeline_en_5.5.0_3.0_1727020780976.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+pipeline = PretrainedPipeline("distilbert2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("distilbert2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/deptage/distilbert2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_07_3_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_07_3_en.md new file mode 100644 index 00000000000000..fac85b5c69f336 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_07_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_07_3 DistilBertForSequenceClassification from KalaiselvanD +author: John Snow Labs +name: distilbert_07_3 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_07_3` is a English model originally trained by KalaiselvanD. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_07_3_en_5.5.0_3.0_1727033245241.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_07_3_en_5.5.0_3.0_1727033245241.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_07_3","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_07_3", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_07_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KalaiselvanD/distilbert_07_3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_07_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_07_3_pipeline_en.md new file mode 100644 index 00000000000000..0dd166e1715c99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_07_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_07_3_pipeline pipeline DistilBertForSequenceClassification from KalaiselvanD +author: John Snow Labs +name: distilbert_07_3_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_07_3_pipeline` is a English model originally trained by KalaiselvanD. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_07_3_pipeline_en_5.5.0_3.0_1727033257962.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_07_3_pipeline_en_5.5.0_3.0_1727033257962.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+pipeline = PretrainedPipeline("distilbert_07_3_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("distilbert_07_3_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_07_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KalaiselvanD/distilbert_07_3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_agnews_padding70model_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_agnews_padding70model_en.md new file mode 100644 index 00000000000000..703fff9133d6cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_agnews_padding70model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_agnews_padding70model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_agnews_padding70model +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_agnews_padding70model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding70model_en_5.5.0_3.0_1727020873003.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding70model_en_5.5.0_3.0_1727020873003.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_agnews_padding70model","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_agnews_padding70model", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_agnews_padding70model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_agnews_padding70model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_agnews_padding70model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_agnews_padding70model_pipeline_en.md new file mode 100644 index 00000000000000..eadd7e2ed3d491 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_agnews_padding70model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_agnews_padding70model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_agnews_padding70model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_agnews_padding70model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding70model_pipeline_en_5.5.0_3.0_1727020886459.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding70model_pipeline_en_5.5.0_3.0_1727020886459.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+pipeline = PretrainedPipeline("distilbert_agnews_padding70model_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("distilbert_agnews_padding70model_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_agnews_padding70model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_agnews_padding70model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_cased_sst2_ft_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_cased_sst2_ft_en.md new file mode 100644 index 00000000000000..109e7583cc4e23 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_cased_sst2_ft_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_cased_sst2_ft DistilBertForSequenceClassification from EgehanEralp +author: John Snow Labs +name: distilbert_base_cased_sst2_ft +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_sst2_ft` is a English model originally trained by EgehanEralp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_sst2_ft_en_5.5.0_3.0_1726980633900.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_sst2_ft_en_5.5.0_3.0_1726980633900.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_cased_sst2_ft","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_cased_sst2_ft", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_sst2_ft| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/EgehanEralp/distilbert-base-cased-sst2-ft \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_cased_sst2_ft_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_cased_sst2_ft_pipeline_en.md new file mode 100644 index 00000000000000..2e70b3bf04eb49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_cased_sst2_ft_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_cased_sst2_ft_pipeline pipeline DistilBertForSequenceClassification from EgehanEralp +author: John Snow Labs +name: distilbert_base_cased_sst2_ft_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_sst2_ft_pipeline` is a English model originally trained by EgehanEralp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_sst2_ft_pipeline_en_5.5.0_3.0_1726980645033.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_sst2_ft_pipeline_en_5.5.0_3.0_1726980645033.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+pipeline = PretrainedPipeline("distilbert_base_cased_sst2_ft_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("distilbert_base_cased_sst2_ft_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_sst2_ft_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/EgehanEralp/distilbert-base-cased-sst2-ft + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_dataverse_2023_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_dataverse_2023_en.md new file mode 100644 index 00000000000000..266211444613a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_dataverse_2023_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_dataverse_2023 DistilBertForSequenceClassification from rajendrabaskota +author: John Snow Labs +name: distilbert_base_dataverse_2023 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_dataverse_2023` is a English model originally trained by rajendrabaskota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_dataverse_2023_en_5.5.0_3.0_1727033132297.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_dataverse_2023_en_5.5.0_3.0_1727033132297.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_dataverse_2023","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_dataverse_2023", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_dataverse_2023| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rajendrabaskota/distilbert-base-dataverse-2023 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_dataverse_2023_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_dataverse_2023_pipeline_en.md new file mode 100644 index 00000000000000..042b5c265d6d12 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_dataverse_2023_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_dataverse_2023_pipeline pipeline DistilBertForSequenceClassification from rajendrabaskota +author: John Snow Labs +name: distilbert_base_dataverse_2023_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_dataverse_2023_pipeline` is a English model originally trained by rajendrabaskota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_dataverse_2023_pipeline_en_5.5.0_3.0_1727033153188.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_dataverse_2023_pipeline_en_5.5.0_3.0_1727033153188.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with a "text" column
+pipeline = PretrainedPipeline("distilbert_base_dataverse_2023_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("distilbert_base_dataverse_2023_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_dataverse_2023_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rajendrabaskota/distilbert-base-dataverse-2023 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_en.md new file mode 100644 index 00000000000000..c88da951f58156 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_en.md @@ -0,0 +1,100 @@ +--- +layout: model +title: English distilbert_base DistilBertForSequenceClassification from zonghaoyang +author: John Snow Labs +name: distilbert_base +date: 2024-09-22 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base` is a English model originally trained by zonghaoyang. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_en_5.5.0_3.0_1727012755569.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_en_5.5.0_3.0_1727012755569.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer()\
    .setInputCols("document")\
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base","en")\
    .setInputCols(["document","token"])\
    .setOutputCol("class")

pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier])

data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")

result = pipeline.fit(data).transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base","en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))

val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
```
</div>
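The fitted pipeline is a regular Spark ML `PipelineModel`, so it can be persisted and reloaded instead of being re-fitted on every run. A short sketch reusing `pipeline` and `data` from the snippet above (the save path is a placeholder):

```python
from pyspark.ml import PipelineModel

fitted = pipeline.fit(data)
fitted.write().overwrite().save("/tmp/distilbert_base_clf_pipeline")

# Later, or in another application
reloaded = PipelineModel.load("/tmp/distilbert_base_clf_pipeline")
reloaded.transform(data).select("class.result").show()
```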
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +References + +References + +https://huggingface.co/zonghaoyang/DistilBERT-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_finetuned_imdb_sentiment_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_finetuned_imdb_sentiment_en.md new file mode 100644 index 00000000000000..32249b127177a3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_finetuned_imdb_sentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_finetuned_imdb_sentiment DistilBertForSequenceClassification from lyrisha +author: John Snow Labs +name: distilbert_base_finetuned_imdb_sentiment +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_finetuned_imdb_sentiment` is a English model originally trained by lyrisha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_finetuned_imdb_sentiment_en_5.5.0_3.0_1727033355896.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_finetuned_imdb_sentiment_en_5.5.0_3.0_1727033355896.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_finetuned_imdb_sentiment","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_finetuned_imdb_sentiment", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_finetuned_imdb_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lyrisha/distilbert-base-finetuned-imdb-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_finetuned_imdb_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_finetuned_imdb_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..2f53e452947857 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_finetuned_imdb_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_finetuned_imdb_sentiment_pipeline pipeline DistilBertForSequenceClassification from lyrisha +author: John Snow Labs +name: distilbert_base_finetuned_imdb_sentiment_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_finetuned_imdb_sentiment_pipeline` is a English model originally trained by lyrisha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_finetuned_imdb_sentiment_pipeline_en_5.5.0_3.0_1727033368157.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_finetuned_imdb_sentiment_pipeline_en_5.5.0_3.0_1727033368157.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_finetuned_imdb_sentiment_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_finetuned_imdb_sentiment_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_finetuned_imdb_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lyrisha/distilbert-base-finetuned-imdb-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_multilingual_cased_hate_speech_ben_hin_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_multilingual_cased_hate_speech_ben_hin_pipeline_xx.md new file mode 100644 index 00000000000000..ee3cc0a36fc5ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_multilingual_cased_hate_speech_ben_hin_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual distilbert_base_multilingual_cased_hate_speech_ben_hin_pipeline pipeline DistilBertForSequenceClassification from kingshukroy +author: John Snow Labs +name: distilbert_base_multilingual_cased_hate_speech_ben_hin_pipeline +date: 2024-09-22 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_multilingual_cased_hate_speech_ben_hin_pipeline` is a Multilingual model originally trained by kingshukroy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_hate_speech_ben_hin_pipeline_xx_5.5.0_3.0_1727012389617.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_hate_speech_ben_hin_pipeline_xx_5.5.0_3.0_1727012389617.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_multilingual_cased_hate_speech_ben_hin_pipeline", lang = "xx")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_multilingual_cased_hate_speech_ben_hin_pipeline", lang = "xx")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_multilingual_cased_hate_speech_ben_hin_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|507.6 MB| + +## References + +https://huggingface.co/kingshukroy/distilbert-base-multilingual-cased-hate-speech-ben-hin + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_multilingual_cased_hate_speech_ben_hin_xx.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_multilingual_cased_hate_speech_ben_hin_xx.md new file mode 100644 index 00000000000000..85f4f923cfae02 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_multilingual_cased_hate_speech_ben_hin_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual distilbert_base_multilingual_cased_hate_speech_ben_hin DistilBertForSequenceClassification from kingshukroy +author: John Snow Labs +name: distilbert_base_multilingual_cased_hate_speech_ben_hin +date: 2024-09-22 +tags: [xx, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_multilingual_cased_hate_speech_ben_hin` is a Multilingual model originally trained by kingshukroy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_hate_speech_ben_hin_xx_5.5.0_3.0_1727012366945.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_hate_speech_ben_hin_xx_5.5.0_3.0_1727012366945.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_multilingual_cased_hate_speech_ben_hin","xx") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_multilingual_cased_hate_speech_ben_hin", "xx")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
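Since the model is multilingual, a single pass can score a batch of texts in different languages. A small sketch reusing `pipeline` from above; the sample sentences are placeholders, not taken from the training data:

```python
batch_df = spark.createDataFrame(
    [["This is a harmless example sentence."],
     ["Another neutral example to classify."]]
).toDF("text")

predictions = pipeline.fit(batch_df).transform(batch_df)
predictions.select("text", "class.result").show(truncate=False)
```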
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_multilingual_cased_hate_speech_ben_hin| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|507.6 MB| + +## References + +https://huggingface.co/kingshukroy/distilbert-base-multilingual-cased-hate-speech-ben-hin \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_pipeline_en.md new file mode 100644 index 00000000000000..883af245658092 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_pipeline_en.md @@ -0,0 +1,72 @@ +--- +layout: model +title: English distilbert_base_pipeline pipeline DistilBertForQuestionAnswering from KarthikAlagarsamy +author: John Snow Labs +name: distilbert_base_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_pipeline` is a English model originally trained by KarthikAlagarsamy. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_pipeline_en_5.5.0_3.0_1727012767605.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_pipeline_en_5.5.0_3.0_1727012767605.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame with the input column(s) expected by the pipeline's first stage
pipeline = PretrainedPipeline("distilbert_base_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with the input column(s) expected by the pipeline's first stage
val pipeline = new PretrainedPipeline("distilbert_base_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +References + +https://huggingface.co/KarthikAlagarsamy/distilbert_base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_3epoch5_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_3epoch5_en.md new file mode 100644 index 00000000000000..d3d20921f55bed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_3epoch5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_3epoch5 DistilBertForSequenceClassification from dianamihalache27 +author: John Snow Labs +name: distilbert_base_uncased_3epoch5 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_3epoch5` is a English model originally trained by dianamihalache27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_3epoch5_en_5.5.0_3.0_1727035583979.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_3epoch5_en_5.5.0_3.0_1727035583979.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_3epoch5","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_3epoch5", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
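To score a whole dataset rather than a single example, the same pipeline can be applied to any DataFrame with a `text` column, and the label distribution summarised afterwards. A hedged sketch with a hypothetical input file:

```python
from pyspark.sql import functions as F

# Hypothetical CSV with a "text" column; replace with your own data source
texts = spark.read.csv("/path/to/texts.csv", header=True).select("text")

scored = pipeline.fit(texts).transform(texts)
scored.select(F.explode("class.result").alias("label")) \
    .groupBy("label").count().show()
```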
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_3epoch5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dianamihalache27/distilbert-base-uncased_3epoch5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_3epoch5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_3epoch5_pipeline_en.md new file mode 100644 index 00000000000000..0237b2100c3897 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_3epoch5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_3epoch5_pipeline pipeline DistilBertForSequenceClassification from dianamihalache27 +author: John Snow Labs +name: distilbert_base_uncased_3epoch5_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_3epoch5_pipeline` is a English model originally trained by dianamihalache27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_3epoch5_pipeline_en_5.5.0_3.0_1727035596787.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_3epoch5_pipeline_en_5.5.0_3.0_1727035596787.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_3epoch5_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_3epoch5_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_3epoch5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dianamihalache27/distilbert-base-uncased_3epoch5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_ahc_25000_1683475188_540369_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_ahc_25000_1683475188_540369_en.md new file mode 100644 index 00000000000000..cc7ea63c5f4662 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_ahc_25000_1683475188_540369_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_ahc_25000_1683475188_540369 DistilBertForSequenceClassification from alvingogo +author: John Snow Labs +name: distilbert_base_uncased_ahc_25000_1683475188_540369 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_ahc_25000_1683475188_540369` is a English model originally trained by alvingogo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_ahc_25000_1683475188_540369_en_5.5.0_3.0_1727033492236.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_ahc_25000_1683475188_540369_en_5.5.0_3.0_1727033492236.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_ahc_25000_1683475188_540369","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_ahc_25000_1683475188_540369", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_ahc_25000_1683475188_540369| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/alvingogo/distilbert-base-uncased_AHC_25000_1683475188.540369 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_ahc_25000_1683475188_540369_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_ahc_25000_1683475188_540369_pipeline_en.md new file mode 100644 index 00000000000000..c44839eaa86064 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_ahc_25000_1683475188_540369_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_ahc_25000_1683475188_540369_pipeline pipeline DistilBertForSequenceClassification from alvingogo +author: John Snow Labs +name: distilbert_base_uncased_ahc_25000_1683475188_540369_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_ahc_25000_1683475188_540369_pipeline` is a English model originally trained by alvingogo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_ahc_25000_1683475188_540369_pipeline_en_5.5.0_3.0_1727033507107.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_ahc_25000_1683475188_540369_pipeline_en_5.5.0_3.0_1727033507107.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_ahc_25000_1683475188_540369_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_ahc_25000_1683475188_540369_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_ahc_25000_1683475188_540369_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/alvingogo/distilbert-base-uncased_AHC_25000_1683475188.540369 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_en.md new file mode 100644 index 00000000000000..90b780ecab3320 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_en_5.5.0_3.0_1727033813396.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_en_5.5.0_3.0_1727033813396.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
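The annotator also exposes a few inference-time settings, such as batch size and case sensitivity; whether a given parameter is available should be checked against your Spark NLP version. A tentative sketch:

```python
# Same classifier as above, with explicit inference settings (assumed to be supported)
sequenceClassifier = DistilBertForSequenceClassification.pretrained(
        "distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class") \
    .setBatchSize(8) \
    .setCaseSensitive(False)
```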
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_banking_zphr_0st72_ut72ut1_plain_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_pipeline_en.md new file mode 100644 index 00000000000000..926d8d8e9825df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_pipeline_en_5.5.0_3.0_1727033826204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_pipeline_en_5.5.0_3.0_1727033826204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_banking_zphr_0st72_ut72ut1_plain_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_cext_mypersonality_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_cext_mypersonality_en.md new file mode 100644 index 00000000000000..57249a679298ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_cext_mypersonality_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_cext_mypersonality DistilBertForSequenceClassification from holistic-ai +author: John Snow Labs +name: distilbert_base_uncased_cext_mypersonality +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_cext_mypersonality` is a English model originally trained by holistic-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_cext_mypersonality_en_5.5.0_3.0_1727020952708.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_cext_mypersonality_en_5.5.0_3.0_1727020952708.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_cext_mypersonality","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_cext_mypersonality", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
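For low-latency use on small amounts of text, the fitted pipeline can be wrapped in a `LightPipeline`, which runs in-process rather than as a distributed job. A minimal sketch using `pipelineModel` from the snippet above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
print(light.annotate("I love spark-nlp")["class"])  # predicted label(s)
```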
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_cext_mypersonality| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/holistic-ai/distilbert-base-uncased_cEXT_mypersonality \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_cext_mypersonality_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_cext_mypersonality_pipeline_en.md new file mode 100644 index 00000000000000..10fc08cf415698 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_cext_mypersonality_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_cext_mypersonality_pipeline pipeline DistilBertForSequenceClassification from holistic-ai +author: John Snow Labs +name: distilbert_base_uncased_cext_mypersonality_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_cext_mypersonality_pipeline` is a English model originally trained by holistic-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_cext_mypersonality_pipeline_en_5.5.0_3.0_1727020964404.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_cext_mypersonality_pipeline_en_5.5.0_3.0_1727020964404.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_cext_mypersonality_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_cext_mypersonality_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_cext_mypersonality_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/holistic-ai/distilbert-base-uncased_cEXT_mypersonality + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_copn_mypersonality_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_copn_mypersonality_en.md new file mode 100644 index 00000000000000..2d122d803df1d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_copn_mypersonality_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_copn_mypersonality DistilBertForSequenceClassification from holistic-ai +author: John Snow Labs +name: distilbert_base_uncased_copn_mypersonality +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_copn_mypersonality` is a English model originally trained by holistic-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_copn_mypersonality_en_5.5.0_3.0_1727020504713.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_copn_mypersonality_en_5.5.0_3.0_1727020504713.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_copn_mypersonality","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_copn_mypersonality", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_copn_mypersonality| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/holistic-ai/distilbert-base-uncased_cOPN_mypersonality \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_copn_mypersonality_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_copn_mypersonality_pipeline_en.md new file mode 100644 index 00000000000000..18cb28b8a671ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_copn_mypersonality_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_copn_mypersonality_pipeline pipeline DistilBertForSequenceClassification from holistic-ai +author: John Snow Labs +name: distilbert_base_uncased_copn_mypersonality_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_copn_mypersonality_pipeline` is a English model originally trained by holistic-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_copn_mypersonality_pipeline_en_5.5.0_3.0_1727020517033.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_copn_mypersonality_pipeline_en_5.5.0_3.0_1727020517033.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_copn_mypersonality_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_copn_mypersonality_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_copn_mypersonality_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/holistic-ai/distilbert-base-uncased_cOPN_mypersonality + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_distilled_clinc_dro14_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_distilled_clinc_dro14_en.md new file mode 100644 index 00000000000000..ee42e4b81070ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_distilled_clinc_dro14_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_dro14 DistilBertForSequenceClassification from dro14 +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_dro14 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_dro14` is a English model originally trained by dro14. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_dro14_en_5.5.0_3.0_1726980423303.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_dro14_en_5.5.0_3.0_1726980423303.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_dro14","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_dro14", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
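Sequence classifiers in Spark NLP typically store per-label scores in the annotation metadata alongside the predicted label, though the exact keys depend on the model's label set. A tentative sketch over `pipelineDF` from the code above:

```python
from pyspark.sql import functions as F

pipelineDF.select(
    F.col("text"),
    F.expr("class.result").alias("prediction"),
    F.expr("class.metadata").alias("scores")  # per-label scores, if provided by the model
).show(truncate=False)
```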
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_dro14| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/dro14/distilbert-base-uncased-distilled-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_distilled_clinc_dro14_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_distilled_clinc_dro14_pipeline_en.md new file mode 100644 index 00000000000000..709f8497cf29e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_distilled_clinc_dro14_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_dro14_pipeline pipeline DistilBertForSequenceClassification from dro14 +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_dro14_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_dro14_pipeline` is a English model originally trained by dro14. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_dro14_pipeline_en_5.5.0_3.0_1726980434719.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_dro14_pipeline_en_5.5.0_3.0_1726980434719.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_distilled_clinc_dro14_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_distilled_clinc_dro14_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_dro14_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/dro14/distilbert-base-uncased-distilled-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_emotion_ft_0416_qingh001_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_emotion_ft_0416_qingh001_en.md new file mode 100644 index 00000000000000..1f2d625d0a3e5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_emotion_ft_0416_qingh001_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_emotion_ft_0416_qingh001 DistilBertForSequenceClassification from qingh001 +author: John Snow Labs +name: distilbert_base_uncased_emotion_ft_0416_qingh001 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_emotion_ft_0416_qingh001` is a English model originally trained by qingh001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_ft_0416_qingh001_en_5.5.0_3.0_1727033582713.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_ft_0416_qingh001_en_5.5.0_3.0_1727033582713.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_emotion_ft_0416_qingh001","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_emotion_ft_0416_qingh001", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_emotion_ft_0416_qingh001| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/qingh001/distilbert-base-uncased_emotion_ft_0416 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_emotion_ft_0416_qingh001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_emotion_ft_0416_qingh001_pipeline_en.md new file mode 100644 index 00000000000000..13e5e8c602a33d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_emotion_ft_0416_qingh001_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_emotion_ft_0416_qingh001_pipeline pipeline DistilBertForSequenceClassification from qingh001 +author: John Snow Labs +name: distilbert_base_uncased_emotion_ft_0416_qingh001_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_emotion_ft_0416_qingh001_pipeline` is a English model originally trained by qingh001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_ft_0416_qingh001_pipeline_en_5.5.0_3.0_1727033595627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_ft_0416_qingh001_pipeline_en_5.5.0_3.0_1727033595627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_emotion_ft_0416_qingh001_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_emotion_ft_0416_qingh001_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_emotion_ft_0416_qingh001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/qingh001/distilbert-base-uncased_emotion_ft_0416 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_adl_hw1_b10401015_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_adl_hw1_b10401015_pipeline_en.md new file mode 100644 index 00000000000000..070ceefca4e3e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_adl_hw1_b10401015_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_adl_hw1_b10401015_pipeline pipeline DistilBertForSequenceClassification from b10401015 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_adl_hw1_b10401015_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_adl_hw1_b10401015_pipeline` is a English model originally trained by b10401015. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw1_b10401015_pipeline_en_5.5.0_3.0_1726980399346.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw1_b10401015_pipeline_en_5.5.0_3.0_1726980399346.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_adl_hw1_b10401015_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_adl_hw1_b10401015_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_adl_hw1_b10401015_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/b10401015/distilbert-base-uncased-finetuned-adl_hw1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_jnwulff_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_jnwulff_en.md new file mode 100644 index 00000000000000..46d9ebc7ea6405 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_jnwulff_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_jnwulff DistilBertForSequenceClassification from jnwulff +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_jnwulff +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_jnwulff` is a English model originally trained by jnwulff. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_jnwulff_en_5.5.0_3.0_1727033365345.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_jnwulff_en_5.5.0_3.0_1727033365345.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_jnwulff","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_jnwulff", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_jnwulff| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/jnwulff/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_jnwulff_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_jnwulff_pipeline_en.md new file mode 100644 index 00000000000000..a301feacfd5fe6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_jnwulff_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_jnwulff_pipeline pipeline DistilBertForSequenceClassification from jnwulff +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_jnwulff_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_jnwulff_pipeline` is a English model originally trained by jnwulff. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_jnwulff_pipeline_en_5.5.0_3.0_1727033378583.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_jnwulff_pipeline_en_5.5.0_3.0_1727033378583.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_jnwulff_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_jnwulff_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_jnwulff_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/jnwulff/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_en.md new file mode 100644 index 00000000000000..73184bb96d9b79 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark DistilBertForSequenceClassification from kwkwkwkwpark +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark` is a English model originally trained by kwkwkwkwpark. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_en_5.5.0_3.0_1727033870540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_en_5.5.0_3.0_1727033870540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/kwkwkwkwpark/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_pipeline_en.md new file mode 100644 index 00000000000000..baaf287141ac2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_pipeline pipeline DistilBertForSequenceClassification from kwkwkwkwpark +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_pipeline` is a English model originally trained by kwkwkwkwpark. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_pipeline_en_5.5.0_3.0_1727033883520.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_pipeline_en_5.5.0_3.0_1727033883520.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/kwkwkwkwpark/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_maybehesham_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_maybehesham_en.md new file mode 100644 index 00000000000000..7d7436753f5527 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_maybehesham_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_maybehesham DistilBertForSequenceClassification from MayBeHesham +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_maybehesham +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_maybehesham` is a English model originally trained by MayBeHesham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_maybehesham_en_5.5.0_3.0_1727012537981.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_maybehesham_en_5.5.0_3.0_1727012537981.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_maybehesham","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_maybehesham", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_maybehesham| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/MayBeHesham/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_maybehesham_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_maybehesham_pipeline_en.md new file mode 100644 index 00000000000000..bea8723e06ab6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_maybehesham_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_maybehesham_pipeline pipeline DistilBertForSequenceClassification from MayBeHesham +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_maybehesham_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_maybehesham_pipeline` is a English model originally trained by MayBeHesham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_maybehesham_pipeline_en_5.5.0_3.0_1727012551607.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_maybehesham_pipeline_en_5.5.0_3.0_1727012551607.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_maybehesham_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_maybehesham_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_maybehesham_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/MayBeHesham/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_sudaheng_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_sudaheng_en.md new file mode 100644 index 00000000000000..d817c70515f481 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_sudaheng_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_sudaheng DistilBertForSequenceClassification from sudaheng +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_sudaheng +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_sudaheng` is a English model originally trained by sudaheng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_sudaheng_en_5.5.0_3.0_1727012230175.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_sudaheng_en_5.5.0_3.0_1727012230175.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_sudaheng","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_sudaheng", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_sudaheng| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/sudaheng/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_sudaheng_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_sudaheng_pipeline_en.md new file mode 100644 index 00000000000000..5337bdba93f8b7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_sudaheng_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_sudaheng_pipeline pipeline DistilBertForSequenceClassification from sudaheng +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_sudaheng_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_sudaheng_pipeline` is a English model originally trained by sudaheng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_sudaheng_pipeline_en_5.5.0_3.0_1727012242828.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_sudaheng_pipeline_en_5.5.0_3.0_1727012242828.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_sudaheng_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_sudaheng_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_sudaheng_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/sudaheng/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_artoriasxv_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_artoriasxv_en.md new file mode 100644 index 00000000000000..e7664996611301 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_artoriasxv_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_artoriasxv DistilBertForSequenceClassification from ArtoriasXV +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_artoriasxv +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_artoriasxv` is a English model originally trained by ArtoriasXV. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_artoriasxv_en_5.5.0_3.0_1727012452925.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_artoriasxv_en_5.5.0_3.0_1727012452925.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_artoriasxv","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_artoriasxv", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_artoriasxv| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ArtoriasXV/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_bensonhugging_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_bensonhugging_en.md new file mode 100644 index 00000000000000..f2c6ff9b0f5406 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_bensonhugging_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_bensonhugging DistilBertForSequenceClassification from BensonHugging +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_bensonhugging +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_bensonhugging` is a English model originally trained by BensonHugging. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_bensonhugging_en_5.5.0_3.0_1727033258488.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_bensonhugging_en_5.5.0_3.0_1727033258488.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_bensonhugging","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_bensonhugging", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_bensonhugging| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BensonHugging/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_bensonhugging_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_bensonhugging_pipeline_en.md new file mode 100644 index 00000000000000..2125118e638801 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_bensonhugging_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_bensonhugging_pipeline pipeline DistilBertForSequenceClassification from BensonHugging +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_bensonhugging_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_bensonhugging_pipeline` is a English model originally trained by BensonHugging. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_bensonhugging_pipeline_en_5.5.0_3.0_1727033274639.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_bensonhugging_pipeline_en_5.5.0_3.0_1727033274639.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_bensonhugging_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_bensonhugging_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_bensonhugging_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BensonHugging/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_panzy0524_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_panzy0524_en.md new file mode 100644 index 00000000000000..bb6f3acc53a3ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_panzy0524_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_panzy0524 DistilBertForSequenceClassification from panzy0524 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_panzy0524 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_panzy0524` is a English model originally trained by panzy0524. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_panzy0524_en_5.5.0_3.0_1727012729722.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_panzy0524_en_5.5.0_3.0_1727012729722.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_panzy0524","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_panzy0524", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_panzy0524| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/panzy0524/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_panzy0524_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_panzy0524_pipeline_en.md new file mode 100644 index 00000000000000..025e86a1d5fb5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_panzy0524_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_panzy0524_pipeline pipeline DistilBertForSequenceClassification from panzy0524 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_panzy0524_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_panzy0524_pipeline` is a English model originally trained by panzy0524. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_panzy0524_pipeline_en_5.5.0_3.0_1727012742037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_panzy0524_pipeline_en_5.5.0_3.0_1727012742037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_panzy0524_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_panzy0524_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_panzy0524_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/panzy0524/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_ravikant22_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_ravikant22_en.md new file mode 100644 index 00000000000000..db3d10133dc609 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_ravikant22_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_ravikant22 DistilBertForSequenceClassification from ravikant22 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_ravikant22 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_ravikant22` is a English model originally trained by ravikant22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_ravikant22_en_5.5.0_3.0_1727020664780.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_ravikant22_en_5.5.0_3.0_1727020664780.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_ravikant22","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_ravikant22", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_ravikant22| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ravikant22/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_ravikant22_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_ravikant22_pipeline_en.md new file mode 100644 index 00000000000000..e82be3cf26026e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_ravikant22_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_ravikant22_pipeline pipeline DistilBertForSequenceClassification from ravikant22 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_ravikant22_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_ravikant22_pipeline` is a English model originally trained by ravikant22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_ravikant22_pipeline_en_5.5.0_3.0_1727020677398.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_ravikant22_pipeline_en_5.5.0_3.0_1727020677398.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_ravikant22_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_ravikant22_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_ravikant22_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ravikant22/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_anamelchor_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_anamelchor_en.md new file mode 100644 index 00000000000000..7d622c98762498 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_anamelchor_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_anamelchor DistilBertForSequenceClassification from anamelchor +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_anamelchor +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_anamelchor` is a English model originally trained by anamelchor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_anamelchor_en_5.5.0_3.0_1727035401399.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_anamelchor_en_5.5.0_3.0_1727035401399.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_anamelchor","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_anamelchor", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_anamelchor| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/anamelchor/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_anamelchor_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_anamelchor_pipeline_en.md new file mode 100644 index 00000000000000..b810348675ac07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_anamelchor_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_anamelchor_pipeline pipeline DistilBertForSequenceClassification from anamelchor +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_anamelchor_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_anamelchor_pipeline` is a English model originally trained by anamelchor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_anamelchor_pipeline_en_5.5.0_3.0_1727035413745.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_anamelchor_pipeline_en_5.5.0_3.0_1727035413745.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# download the pretrained pipeline and run it over a DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_anamelchor_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// download the pretrained pipeline and run it over a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_anamelchor_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
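+
+For quick checks on plain strings, the pipeline can also be called without building a DataFrame first. A minimal sketch, assuming the packaged classifier writes its predictions to a `class` output column (as the standalone model card indicates):
+
+```python
+# annotate a single string and print the predicted label
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])
+```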
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_anamelchor_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/anamelchor/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_beijaflor2024_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_beijaflor2024_en.md new file mode 100644 index 00000000000000..466671cee0455c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_beijaflor2024_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_beijaflor2024 DistilBertForSequenceClassification from Beijaflor2024 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_beijaflor2024 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_beijaflor2024` is a English model originally trained by Beijaflor2024. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_beijaflor2024_en_5.5.0_3.0_1727012538121.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_beijaflor2024_en_5.5.0_3.0_1727012538121.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_beijaflor2024","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_beijaflor2024", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
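+
+The `class` column of `pipelineDF` holds the predicted emotion label for each input row. A minimal way to inspect it (a sketch, assuming the pipeline above was run in the same session):
+
+```python
+# show each input text next to its predicted label
+pipelineDF.select("text", "class.result").show(truncate=False)
+```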
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_beijaflor2024| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Beijaflor2024/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_beijaflor2024_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_beijaflor2024_pipeline_en.md new file mode 100644 index 00000000000000..06fa2113da5ccd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_beijaflor2024_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_beijaflor2024_pipeline pipeline DistilBertForSequenceClassification from Beijaflor2024 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_beijaflor2024_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_beijaflor2024_pipeline` is a English model originally trained by Beijaflor2024. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_beijaflor2024_pipeline_en_5.5.0_3.0_1727012551851.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_beijaflor2024_pipeline_en_5.5.0_3.0_1727012551851.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# download the pretrained pipeline and run it over a DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_beijaflor2024_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// download the pretrained pipeline and run it over a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_beijaflor2024_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
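+
+For quick checks on plain strings, the pipeline can also be called without building a DataFrame first. A minimal sketch, assuming the packaged classifier writes its predictions to a `class` output column (as the standalone model card indicates):
+
+```python
+# annotate a single string and print the predicted label
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])
+```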
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_beijaflor2024_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Beijaflor2024/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_evernight017_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_evernight017_en.md new file mode 100644 index 00000000000000..cf91fe2418698c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_evernight017_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_evernight017 DistilBertForSequenceClassification from evernight017 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_evernight017 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_evernight017` is a English model originally trained by evernight017. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_evernight017_en_5.5.0_3.0_1727035059104.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_evernight017_en_5.5.0_3.0_1727035059104.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_evernight017","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_evernight017", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
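+
+The `class` column of `pipelineDF` holds the predicted emotion label for each input row. A minimal way to inspect it (a sketch, assuming the pipeline above was run in the same session):
+
+```python
+# show each input text next to its predicted label
+pipelineDF.select("text", "class.result").show(truncate=False)
+```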
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_evernight017| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/evernight017/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_evernight017_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_evernight017_pipeline_en.md new file mode 100644 index 00000000000000..5bc32c715c35d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_evernight017_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_evernight017_pipeline pipeline DistilBertForSequenceClassification from evernight017 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_evernight017_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_evernight017_pipeline` is a English model originally trained by evernight017. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_evernight017_pipeline_en_5.5.0_3.0_1727035071619.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_evernight017_pipeline_en_5.5.0_3.0_1727035071619.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# download the pretrained pipeline and run it over a DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_evernight017_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// download the pretrained pipeline and run it over a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_evernight017_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
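+
+For quick checks on plain strings, the pipeline can also be called without building a DataFrame first. A minimal sketch, assuming the packaged classifier writes its predictions to a `class` output column (as the standalone model card indicates):
+
+```python
+# annotate a single string and print the predicted label
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])
+```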
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_evernight017_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/evernight017/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_gossk_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_gossk_en.md new file mode 100644 index 00000000000000..a0dbe86e43f8c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_gossk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_gossk DistilBertForSequenceClassification from Gossk +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_gossk +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_gossk` is a English model originally trained by Gossk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_gossk_en_5.5.0_3.0_1726980616063.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_gossk_en_5.5.0_3.0_1726980616063.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_gossk","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_gossk", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
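+
+The `class` column of `pipelineDF` holds the predicted emotion label for each input row. A minimal way to inspect it (a sketch, assuming the pipeline above was run in the same session):
+
+```python
+# show each input text next to its predicted label
+pipelineDF.select("text", "class.result").show(truncate=False)
+```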
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_gossk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gossk/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_gossk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_gossk_pipeline_en.md new file mode 100644 index 00000000000000..7f6addfe0e0763 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_gossk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_gossk_pipeline pipeline DistilBertForSequenceClassification from Gossk +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_gossk_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_gossk_pipeline` is a English model originally trained by Gossk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_gossk_pipeline_en_5.5.0_3.0_1726980628189.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_gossk_pipeline_en_5.5.0_3.0_1726980628189.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# download the pretrained pipeline and run it over a DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_gossk_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// download the pretrained pipeline and run it over a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_gossk_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
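+
+For quick checks on plain strings, the pipeline can also be called without building a DataFrame first. A minimal sketch, assuming the packaged classifier writes its predictions to a `class` output column (as the standalone model card indicates):
+
+```python
+# annotate a single string and print the predicted label
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])
+```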
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_gossk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gossk/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jason_oh_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jason_oh_en.md new file mode 100644 index 00000000000000..b7b5f9bf80c7e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jason_oh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jason_oh DistilBertForSequenceClassification from Jason-Oh +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jason_oh +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jason_oh` is a English model originally trained by Jason-Oh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jason_oh_en_5.5.0_3.0_1727020892721.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jason_oh_en_5.5.0_3.0_1727020892721.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jason_oh","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jason_oh", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
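+
+The `class` column of `pipelineDF` holds the predicted emotion label for each input row. A minimal way to inspect it (a sketch, assuming the pipeline above was run in the same session):
+
+```python
+# show each input text next to its predicted label
+pipelineDF.select("text", "class.result").show(truncate=False)
+```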
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jason_oh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Jason-Oh/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jason_oh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jason_oh_pipeline_en.md new file mode 100644 index 00000000000000..a2f111fc87fad9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jason_oh_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jason_oh_pipeline pipeline DistilBertForSequenceClassification from Jason-Oh +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jason_oh_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jason_oh_pipeline` is a English model originally trained by Jason-Oh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jason_oh_pipeline_en_5.5.0_3.0_1727020904370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jason_oh_pipeline_en_5.5.0_3.0_1727020904370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# download the pretrained pipeline and run it over a DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jason_oh_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// download the pretrained pipeline and run it over a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jason_oh_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
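+
+For quick checks on plain strings, the pipeline can also be called without building a DataFrame first. A minimal sketch, assuming the packaged classifier writes its predictions to a `class` output column (as the standalone model card indicates):
+
+```python
+# annotate a single string and print the predicted label
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])
+```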
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jason_oh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Jason-Oh/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jhtop88_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jhtop88_en.md new file mode 100644 index 00000000000000..35d8fdc0392fe3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jhtop88_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jhtop88 DistilBertForSequenceClassification from jhtop88 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jhtop88 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jhtop88` is a English model originally trained by jhtop88. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jhtop88_en_5.5.0_3.0_1727035268264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jhtop88_en_5.5.0_3.0_1727035268264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jhtop88","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jhtop88", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
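+
+The `class` column of `pipelineDF` holds the predicted emotion label for each input row. A minimal way to inspect it (a sketch, assuming the pipeline above was run in the same session):
+
+```python
+# show each input text next to its predicted label
+pipelineDF.select("text", "class.result").show(truncate=False)
+```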
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jhtop88| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jhtop88/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jhtop88_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jhtop88_pipeline_en.md new file mode 100644 index 00000000000000..24b8c0fe5cf522 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jhtop88_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jhtop88_pipeline pipeline DistilBertForSequenceClassification from jhtop88 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jhtop88_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jhtop88_pipeline` is a English model originally trained by jhtop88. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jhtop88_pipeline_en_5.5.0_3.0_1727035281992.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jhtop88_pipeline_en_5.5.0_3.0_1727035281992.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# download the pretrained pipeline and run it over a DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jhtop88_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// download the pretrained pipeline and run it over a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jhtop88_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
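+
+For quick checks on plain strings, the pipeline can also be called without building a DataFrame first. A minimal sketch, assuming the packaged classifier writes its predictions to a `class` output column (as the standalone model card indicates):
+
+```python
+# annotate a single string and print the predicted label
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])
+```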
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jhtop88_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jhtop88/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_kaebams_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_kaebams_en.md new file mode 100644 index 00000000000000..1a646ab09c196a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_kaebams_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_kaebams DistilBertForSequenceClassification from kaebaMS +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_kaebams +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_kaebams` is a English model originally trained by kaebaMS. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_kaebams_en_5.5.0_3.0_1726980110407.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_kaebams_en_5.5.0_3.0_1726980110407.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_kaebams","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_kaebams", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
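+
+The `class` column of `pipelineDF` holds the predicted emotion label for each input row. A minimal way to inspect it (a sketch, assuming the pipeline above was run in the same session):
+
+```python
+# show each input text next to its predicted label
+pipelineDF.select("text", "class.result").show(truncate=False)
+```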
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_kaebams| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kaebaMS/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_kaebams_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_kaebams_pipeline_en.md new file mode 100644 index 00000000000000..22bb0d36c2d1f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_kaebams_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_kaebams_pipeline pipeline DistilBertForSequenceClassification from kaebaMS +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_kaebams_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_kaebams_pipeline` is a English model originally trained by kaebaMS. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_kaebams_pipeline_en_5.5.0_3.0_1726980124056.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_kaebams_pipeline_en_5.5.0_3.0_1726980124056.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# download the pretrained pipeline and run it over a DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_kaebams_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// download the pretrained pipeline and run it over a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_kaebams_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
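+
+For quick checks on plain strings, the pipeline can also be called without building a DataFrame first. A minimal sketch, assuming the packaged classifier writes its predictions to a `class` output column (as the standalone model card indicates):
+
+```python
+# annotate a single string and print the predicted label
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])
+```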
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_kaebams_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kaebaMS/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_onelock_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_onelock_en.md new file mode 100644 index 00000000000000..e1ce2c4db81735 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_onelock_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_onelock DistilBertForSequenceClassification from onelock +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_onelock +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_onelock` is a English model originally trained by onelock. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_onelock_en_5.5.0_3.0_1727035118067.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_onelock_en_5.5.0_3.0_1727035118067.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_onelock","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_onelock", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
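+
+The `class` column of `pipelineDF` holds the predicted emotion label for each input row. A minimal way to inspect it (a sketch, assuming the pipeline above was run in the same session):
+
+```python
+# show each input text next to its predicted label
+pipelineDF.select("text", "class.result").show(truncate=False)
+```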
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_onelock| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/onelock/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_onelock_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_onelock_pipeline_en.md new file mode 100644 index 00000000000000..c9516cdfdeecfc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_onelock_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_onelock_pipeline pipeline DistilBertForSequenceClassification from onelock +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_onelock_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_onelock_pipeline` is a English model originally trained by onelock. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_onelock_pipeline_en_5.5.0_3.0_1727035130478.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_onelock_pipeline_en_5.5.0_3.0_1727035130478.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# download the pretrained pipeline and run it over a DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_onelock_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// download the pretrained pipeline and run it over a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_onelock_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
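+
+For quick checks on plain strings, the pipeline can also be called without building a DataFrame first. A minimal sketch, assuming the packaged classifier writes its predictions to a `class` output column (as the standalone model card indicates):
+
+```python
+# annotate a single string and print the predicted label
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])
+```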
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_onelock_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/onelock/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_ouba_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_ouba_en.md new file mode 100644 index 00000000000000..aad4291bf7227a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_ouba_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ouba DistilBertForSequenceClassification from ouba +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ouba +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ouba` is a English model originally trained by ouba. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ouba_en_5.5.0_3.0_1727033817866.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ouba_en_5.5.0_3.0_1727033817866.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ouba","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ouba", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
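+
+The `class` column of `pipelineDF` holds the predicted emotion label for each input row. A minimal way to inspect it (a sketch, assuming the pipeline above was run in the same session):
+
+```python
+# show each input text next to its predicted label
+pipelineDF.select("text", "class.result").show(truncate=False)
+```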
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ouba| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ouba/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_ouba_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_ouba_pipeline_en.md new file mode 100644 index 00000000000000..75fdb55f17fe32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_ouba_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ouba_pipeline pipeline DistilBertForSequenceClassification from ouba +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ouba_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ouba_pipeline` is a English model originally trained by ouba. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ouba_pipeline_en_5.5.0_3.0_1727033830279.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ouba_pipeline_en_5.5.0_3.0_1727033830279.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# download the pretrained pipeline and run it over a DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ouba_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// download the pretrained pipeline and run it over a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ouba_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
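+
+For quick checks on plain strings, the pipeline can also be called without building a DataFrame first. A minimal sketch, assuming the packaged classifier writes its predictions to a `class` output column (as the standalone model card indicates):
+
+```python
+# annotate a single string and print the predicted label
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])
+```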
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ouba_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ouba/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_overall_4_0_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_overall_4_0_en.md new file mode 100644 index 00000000000000..eb8f378fe5c607 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_overall_4_0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_overall_4_0 DistilBertForSequenceClassification from LeBruse +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_overall_4_0 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_overall_4_0` is a English model originally trained by LeBruse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_overall_4_0_en_5.5.0_3.0_1727020578253.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_overall_4_0_en_5.5.0_3.0_1727020578253.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_overall_4_0","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// assemble raw text into documents, tokenize, then classify with the fine-tuned DistilBERT model
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_overall_4_0", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
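+
+The `class` column of `pipelineDF` holds the predicted emotion label for each input row. A minimal way to inspect it (a sketch, assuming the pipeline above was run in the same session):
+
+```python
+# show each input text next to its predicted label
+pipelineDF.select("text", "class.result").show(truncate=False)
+```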
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_overall_4_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LeBruse/distilbert-base-uncased-finetuned-emotion-overall-4.0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_overall_4_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_overall_4_0_pipeline_en.md new file mode 100644 index 00000000000000..5051a44bddca0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_overall_4_0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_overall_4_0_pipeline pipeline DistilBertForSequenceClassification from LeBruse +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_overall_4_0_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_overall_4_0_pipeline` is a English model originally trained by LeBruse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_overall_4_0_pipeline_en_5.5.0_3.0_1727020590064.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_overall_4_0_pipeline_en_5.5.0_3.0_1727020590064.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# download the pretrained pipeline and run it over a DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_overall_4_0_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// download the pretrained pipeline and run it over a DataFrame with a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_overall_4_0_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
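+
+For quick checks on plain strings, the pipeline can also be called without building a DataFrame first. A minimal sketch, assuming the packaged classifier writes its predictions to a `class` output column (as the standalone model card indicates):
+
+```python
+# annotate a single string and print the predicted label
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])
+```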
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_overall_4_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LeBruse/distilbert-base-uncased-finetuned-emotion-overall-4.0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_souling_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_souling_en.md new file mode 100644 index 00000000000000..cce134d34ff926 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_souling_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_souling DistilBertForSequenceClassification from souling +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_souling +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_souling` is a English model originally trained by souling. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_souling_en_5.5.0_3.0_1726980716990.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_souling_en_5.5.0_3.0_1726980716990.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_souling","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_souling", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
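
As a follow-up to the Python snippet above, one way to read the predictions out of `pipelineDF` is to explode the annotation column. The column names (`text`, `class`) come from that snippet; the exact metadata fields returned depend on the model.

```python
# `pipelineDF` is the transformed DataFrame produced by the snippet above.
# Each value in `class` is an array of annotations; `result` holds the label
# and `metadata` typically carries the per-label scores.
pipelineDF.selectExpr("text", "explode(class) as prediction") \
    .selectExpr("text", "prediction.result as label", "prediction.metadata as scores") \
    .show(truncate=False)
```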
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_souling| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/souling/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_souling_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_souling_pipeline_en.md new file mode 100644 index 00000000000000..849c52f24f5d56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_souling_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_souling_pipeline pipeline DistilBertForSequenceClassification from souling +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_souling_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_souling_pipeline` is a English model originally trained by souling. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_souling_pipeline_en_5.5.0_3.0_1726980728475.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_souling_pipeline_en_5.5.0_3.0_1726980728475.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_souling_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_souling_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_souling_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/souling/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_thiago2608santana_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_thiago2608santana_en.md new file mode 100644 index 00000000000000..ff34994668758f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_thiago2608santana_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_thiago2608santana DistilBertForSequenceClassification from thiago2608santana +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_thiago2608santana +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_thiago2608santana` is a English model originally trained by thiago2608santana. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_thiago2608santana_en_5.5.0_3.0_1727012426126.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_thiago2608santana_en_5.5.0_3.0_1727012426126.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_thiago2608santana","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_thiago2608santana", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
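
For quick, single-document inference it can be convenient to wrap the fitted pipeline from the snippet above in a `LightPipeline`, which runs on the driver without building a DataFrame. A brief sketch follows; the example sentences are placeholders.

```python
from sparknlp.base import LightPipeline

# `pipelineModel` is the fitted PipelineModel from the snippet above.
light = LightPipeline(pipelineModel)

# annotate() returns a dict keyed by output column name.
print(light.annotate("I love spark-nlp"))

# fullAnnotate() returns complete Annotation objects, including metadata.
print(light.fullAnnotate(["This library is great", "This is disappointing"]))
```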
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_thiago2608santana| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thiago2608santana/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_thiago2608santana_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_thiago2608santana_pipeline_en.md new file mode 100644 index 00000000000000..c232639e32aaea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_thiago2608santana_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_thiago2608santana_pipeline pipeline DistilBertForSequenceClassification from thiago2608santana +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_thiago2608santana_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_thiago2608santana_pipeline` is a English model originally trained by thiago2608santana. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_thiago2608santana_pipeline_en_5.5.0_3.0_1727012438354.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_thiago2608santana_pipeline_en_5.5.0_3.0_1727012438354.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_thiago2608santana_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_thiago2608santana_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_thiago2608santana_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thiago2608santana/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_en.md new file mode 100644 index 00000000000000..648bef82210cd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_yamaguchi_kota DistilBertForSequenceClassification from yamaguchi-kota +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_yamaguchi_kota +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_yamaguchi_kota` is a English model originally trained by yamaguchi-kota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_en_5.5.0_3.0_1727012343958.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_en_5.5.0_3.0_1727012343958.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_yamaguchi_kota","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_yamaguchi_kota", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
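
Fitting the pipeline above downloads the model weights, so persisting the fitted `pipelineModel` and reloading it later avoids repeated downloads. A sketch using the standard Spark ML persistence API; the path is a placeholder.

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline (including the downloaded DistilBERT weights).
pipelineModel.write().overwrite().save("/tmp/emotion_distilbert_pipeline")

# Later, or in another job, reload and reuse it without re-downloading.
restored = PipelineModel.load("/tmp/emotion_distilbert_pipeline")
restored.transform(data).select("class.result").show(truncate=False)
```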
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_yamaguchi_kota| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yamaguchi-kota/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_pipeline_en.md new file mode 100644 index 00000000000000..ec283c84d03f14 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_pipeline pipeline DistilBertForSequenceClassification from yamaguchi-kota +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_pipeline` is a English model originally trained by yamaguchi-kota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_pipeline_en_5.5.0_3.0_1727012355775.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_pipeline_en_5.5.0_3.0_1727012355775.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yamaguchi-kota/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yashcfc_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yashcfc_en.md new file mode 100644 index 00000000000000..e3e0b63da1ae24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yashcfc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_yashcfc DistilBertForSequenceClassification from yashcfc +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_yashcfc +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_yashcfc` is a English model originally trained by yashcfc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yashcfc_en_5.5.0_3.0_1727033132643.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yashcfc_en_5.5.0_3.0_1727033132643.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_yashcfc","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_yashcfc", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
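
The fitted pipeline from the snippet above scores arbitrary batches as long as the input column is named `text`. A small sketch with made-up example sentences, assuming `spark` is the active SparkSession and `pipelineModel` comes from that snippet:

```python
# Any DataFrame with a "text" column can be scored by the fitted pipeline.
texts = [
    ("I can't stop smiling today",),
    ("This is the worst day ever",),
    ("I'm not sure how I feel about this",),
]
batch_df = spark.createDataFrame(texts, ["text"])

scored = pipelineModel.transform(batch_df)
scored.select("text", "class.result").show(truncate=False)
```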
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_yashcfc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yashcfc/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yashcfc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yashcfc_pipeline_en.md new file mode 100644 index 00000000000000..82f9062de5798b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yashcfc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_yashcfc_pipeline pipeline DistilBertForSequenceClassification from yashcfc +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_yashcfc_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_yashcfc_pipeline` is a English model originally trained by yashcfc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yashcfc_pipeline_en_5.5.0_3.0_1727033153389.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yashcfc_pipeline_en_5.5.0_3.0_1727033153389.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_yashcfc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_yashcfc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_yashcfc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yashcfc/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotions_fibleep_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotions_fibleep_en.md new file mode 100644 index 00000000000000..69bb18c0a72a68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotions_fibleep_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotions_fibleep DistilBertForSequenceClassification from fibleep +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotions_fibleep +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotions_fibleep` is a English model originally trained by fibleep. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_fibleep_en_5.5.0_3.0_1727035254547.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_fibleep_en_5.5.0_3.0_1727035254547.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotions_fibleep","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotions_fibleep", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
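
If the archive from the Download button above has already been fetched and unzipped, the classifier can also be loaded from disk instead of calling `pretrained()`. This is a sketch under the assumption that the unzipped folder is a standard Spark NLP model directory; the local path is a placeholder.

```python
from sparknlp.annotator import DistilBertForSequenceClassification

# Load the classifier from a local (or HDFS) copy of the unzipped model folder.
sequenceClassifier = DistilBertForSequenceClassification.load(
    "/models/distilbert_base_uncased_finetuned_emotions_fibleep_en"
) \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")
```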
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotions_fibleep| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/fibleep/distilbert-base-uncased-finetuned-emotions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotions_fibleep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotions_fibleep_pipeline_en.md new file mode 100644 index 00000000000000..7dda42573b11dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotions_fibleep_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotions_fibleep_pipeline pipeline DistilBertForSequenceClassification from fibleep +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotions_fibleep_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotions_fibleep_pipeline` is a English model originally trained by fibleep. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_fibleep_pipeline_en_5.5.0_3.0_1727035267304.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_fibleep_pipeline_en_5.5.0_3.0_1727035267304.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotions_fibleep_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotions_fibleep_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotions_fibleep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/fibleep/distilbert-base-uncased-finetuned-emotions + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotions_riukix_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotions_riukix_en.md new file mode 100644 index 00000000000000..2f570e236b669f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotions_riukix_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotions_riukix DistilBertForSequenceClassification from Riukix +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotions_riukix +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotions_riukix` is a English model originally trained by Riukix. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_riukix_en_5.5.0_3.0_1726980229438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_riukix_en_5.5.0_3.0_1726980229438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotions_riukix","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotions_riukix", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
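
Downstream jobs often want the predicted label as a plain string column rather than an annotation array. One way to flatten the output of the snippet above; the column names (`pipelineDF`, `text`, `class`) are taken from that snippet.

```python
from pyspark.sql import functions as F

# `class.result` is an array of label strings; take its first element.
flat = pipelineDF.withColumn("predicted_label", F.element_at(F.col("class.result"), 1))
flat.select("text", "predicted_label").show(truncate=False)
```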
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotions_riukix| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Riukix/distilbert-base-uncased-finetuned-emotions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotions_riukix_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotions_riukix_pipeline_en.md new file mode 100644 index 00000000000000..ff64c87a76b825 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotions_riukix_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotions_riukix_pipeline pipeline DistilBertForSequenceClassification from Riukix +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotions_riukix_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotions_riukix_pipeline` is a English model originally trained by Riukix. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_riukix_pipeline_en_5.5.0_3.0_1726980241955.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_riukix_pipeline_en_5.5.0_3.0_1726980241955.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotions_riukix_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotions_riukix_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotions_riukix_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Riukix/distilbert-base-uncased-finetuned-emotions + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_m_avoid_harm_seler_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_m_avoid_harm_seler_en.md new file mode 100644 index 00000000000000..0af343d7f4d0c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_m_avoid_harm_seler_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_m_avoid_harm_seler DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_m_avoid_harm_seler +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_m_avoid_harm_seler` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_m_avoid_harm_seler_en_5.5.0_3.0_1726979997943.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_m_avoid_harm_seler_en_5.5.0_3.0_1726979997943.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_m_avoid_harm_seler","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_m_avoid_harm_seler", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
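
When scoring a whole corpus with the fitted pipeline above, a quick label distribution is a useful sanity check. The sketch below assumes a hypothetical `corpus_df` DataFrame with a `text` column and the `pipelineModel` from the snippet above.

```python
from pyspark.sql import functions as F

# `corpus_df` is assumed to be a DataFrame of documents with a "text" column.
scored = pipelineModel.transform(corpus_df)

scored.select(F.explode("class.result").alias("label")) \
    .groupBy("label") \
    .count() \
    .orderBy(F.desc("count")) \
    .show()
```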
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_m_avoid_harm_seler| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-m_avoid_harm_seler \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_m_avoid_harm_seler_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_m_avoid_harm_seler_pipeline_en.md new file mode 100644 index 00000000000000..e57f0e18909b89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_m_avoid_harm_seler_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_m_avoid_harm_seler_pipeline pipeline DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_m_avoid_harm_seler_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_m_avoid_harm_seler_pipeline` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_m_avoid_harm_seler_pipeline_en_5.5.0_3.0_1726980013874.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_m_avoid_harm_seler_pipeline_en_5.5.0_3.0_1726980013874.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_m_avoid_harm_seler_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_m_avoid_harm_seler_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_m_avoid_harm_seler_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-m_avoid_harm_seler + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_class_weighted_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_class_weighted_pipeline_en.md new file mode 100644 index 00000000000000..af0622b47782b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_class_weighted_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_class_weighted_pipeline pipeline DistilBertForSequenceClassification from ben-yu +author: John Snow Labs +name: distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_class_weighted_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_class_weighted_pipeline` is a English model originally trained by ben-yu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_class_weighted_pipeline_en_5.5.0_3.0_1727020582269.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_class_weighted_pipeline_en_5.5.0_3.0_1727020582269.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_class_weighted_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_class_weighted_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
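
Besides `transform()` on a DataFrame, the pretrained pipeline can annotate plain strings directly, which is handy for quick checks. A minimal sketch; the example sentence is a placeholder.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline(
    "distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_class_weighted_pipeline",
    lang="en",
)

# Returns a dict keyed by the pipeline's output column names.
print(pipeline.annotate("I love spark-nlp"))
```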
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_class_weighted_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ben-yu/distilbert-base-uncased-finetuned-nlp-letters-s1-s2-degendered-class-weighted + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_feedback_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_feedback_en.md new file mode 100644 index 00000000000000..6de9583f112de6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_feedback_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_t_feedback DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_t_feedback +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_t_feedback` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_feedback_en_5.5.0_3.0_1727012913300.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_feedback_en_5.5.0_3.0_1727012913300.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_t_feedback","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_t_feedback", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_t_feedback| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-t_feedback \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_feedback_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_feedback_pipeline_en.md new file mode 100644 index 00000000000000..76a1c733ea491b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_feedback_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_t_feedback_pipeline pipeline DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_t_feedback_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_t_feedback_pipeline` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_feedback_pipeline_en_5.5.0_3.0_1727012926017.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_feedback_pipeline_en_5.5.0_3.0_1727012926017.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_t_feedback_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_t_feedback_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
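
To see not only the predicted label but also the classifier's scores, `fullAnnotate` exposes the complete annotations. The sketch below assumes the classification output column of this pipeline is named `class`; check the returned keys if it differs.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_t_feedback_pipeline", lang="en")

for doc in pipeline.fullAnnotate("I love spark-nlp"):
    # `doc` maps output column names to lists of Annotation objects.
    for annotation in doc.get("class", []):
        print(annotation.result, annotation.metadata)
```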
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_t_feedback_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-t_feedback + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_product_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_product_en.md new file mode 100644 index 00000000000000..ff3b0a250f6a8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_product_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_t_product DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_t_product +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_t_product` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_product_en_5.5.0_3.0_1727035146952.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_product_en_5.5.0_3.0_1727035146952.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_t_product","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_t_product", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_t_product| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-t_product \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_product_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_product_pipeline_en.md new file mode 100644 index 00000000000000..fa16bd17668083 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_product_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_t_product_pipeline pipeline DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_t_product_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_t_product_pipeline` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_product_pipeline_en_5.5.0_3.0_1727035159928.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_product_pipeline_en_5.5.0_3.0_1727035159928.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_t_product_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_t_product_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_t_product_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-t_product + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_en.md new file mode 100644 index 00000000000000..7ec8f892e3be62 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_en_5.5.0_3.0_1727020693294.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_en_5.5.0_3.0_1727020693294.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_idm_zphr_0st52sd_ut52ut1_PLPrefix0stlarge_simsp100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_pipeline_en.md new file mode 100644 index 00000000000000..facbac303aac0e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_pipeline_en_5.5.0_3.0_1727020705248.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_pipeline_en_5.5.0_3.0_1727020705248.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_idm_zphr_0st52sd_ut52ut1_PLPrefix0stlarge_simsp100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_linasaba_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_linasaba_pipeline_en.md new file mode 100644 index 00000000000000..efd1fa893cdbc6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_linasaba_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_linasaba_pipeline pipeline BertForSequenceClassification from LinaSaba +author: John Snow Labs +name: distilbert_base_uncased_linasaba_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_linasaba_pipeline` is a English model originally trained by LinaSaba. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_linasaba_pipeline_en_5.5.0_3.0_1727007552846.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_linasaba_pipeline_en_5.5.0_3.0_1727007552846.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
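For quick, single-document inference the pretrained pipeline can also be called directly on a string instead of a DataFrame. A minimal sketch, assuming the pipeline name used in the example below:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_linasaba_pipeline", lang="en")

# Returns a dict mapping each output column to its results for this one text
result = pipeline.annotate("I love spark-nlp")
print(result)
```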
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("distilbert_base_uncased_linasaba_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("distilbert_base_uncased_linasaba_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_linasaba_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/LinaSaba/distilbert-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_massive_v1_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_massive_v1_en.md new file mode 100644 index 00000000000000..b0d15129052aae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_massive_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_massive_v1 DistilBertForSequenceClassification from benayas +author: John Snow Labs +name: distilbert_base_uncased_massive_v1 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_massive_v1` is a English model originally trained by benayas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_massive_v1_en_5.5.0_3.0_1726980748700.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_massive_v1_en_5.5.0_3.0_1726980748700.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
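When latency matters more than throughput, the fitted pipeline from the example below can be wrapped in a LightPipeline, which annotates plain Python strings without going through a DataFrame. A minimal sketch, assuming `pipelineModel` from the example below:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# Annotate a single string; returns a dict of output column -> results
print(light.annotate("I love spark-nlp"))
```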
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_massive_v1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_massive_v1", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_massive_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|258.0 MB| + +## References + +https://huggingface.co/benayas/distilbert-base-uncased-massive-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_massive_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_massive_v1_pipeline_en.md new file mode 100644 index 00000000000000..22b5c23e00de0d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_massive_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_massive_v1_pipeline pipeline DistilBertForSequenceClassification from benayas +author: John Snow Labs +name: distilbert_base_uncased_massive_v1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_massive_v1_pipeline` is a English model originally trained by benayas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_massive_v1_pipeline_en_5.5.0_3.0_1726980760518.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_massive_v1_pipeline_en_5.5.0_3.0_1726980760518.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("distilbert_base_uncased_massive_v1_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("distilbert_base_uncased_massive_v1_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_massive_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|258.0 MB| + +## References + +https://huggingface.co/benayas/distilbert-base-uncased-massive-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en.md new file mode 100644 index 00000000000000..7fc6acf4afc86c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en_5.5.0_3.0_1727033455305.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en_5.5.0_3.0_1727033455305.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
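The fitted pipeline produced by the example below is a regular Spark ML PipelineModel, so it can be persisted and reloaded without refitting. A minimal sketch; the path is illustrative:

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline and load it back later
pipelineModel.write().overwrite().save("/tmp/distilbert_seq_cls_pipeline")
restored = PipelineModel.load("/tmp/distilbert_seq_cls_pipeline")
restoredDF = restored.transform(data)
```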
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st13sd_ut72ut5_PLPrefix0stlarge_simsp100_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en.md new file mode 100644 index 00000000000000..4e431e07531d89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en_5.5.0_3.0_1727033467897.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en_5.5.0_3.0_1727033467897.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st13sd_ut72ut5_PLPrefix0stlarge_simsp100_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_en.md new file mode 100644 index 00000000000000..7f177881234008 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_en_5.5.0_3.0_1727033389982.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_en_5.5.0_3.0_1727033389982.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
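Besides the predicted label, each `class` annotation produced by the example below carries a metadata map, which for sequence classifiers typically includes per-label confidence scores (the exact keys depend on the annotator version). A minimal sketch for inspecting it:

```python
from pyspark.sql.functions import explode

# Inspect label and metadata for each prediction
pipelineDF.select(explode("class").alias("prediction")) \
    .select("prediction.result", "prediction.metadata") \
    .show(truncate=False)
```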
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st14sd_ut12ut1_PLPrefix0stlarge_simsp100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_pipeline_en.md new file mode 100644 index 00000000000000..e790cb9733e186 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_pipeline_en_5.5.0_3.0_1727033403391.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_pipeline_en_5.5.0_3.0_1727033403391.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st14sd_ut12ut1_PLPrefix0stlarge_simsp100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_en.md new file mode 100644 index 00000000000000..b8ea872740fd78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_en_5.5.0_3.0_1726979997426.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_en_5.5.0_3.0_1726979997426.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st16sd_ut72ut1largePfxNf_simsp300_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_pipeline_en.md new file mode 100644 index 00000000000000..0b12fd1f8b1896 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_pipeline_en_5.5.0_3.0_1726980011775.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_pipeline_en_5.5.0_3.0_1726980011775.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st16sd_ut72ut1largePfxNf_simsp300_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md new file mode 100644 index 00000000000000..0a880ca42ad15a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en_5.5.0_3.0_1727033131932.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en_5.5.0_3.0_1727033131932.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st1sd_ut72ut5_PLPrefix0stlarge_simsp100_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en.md new file mode 100644 index 00000000000000..23624fe213ce90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en_5.5.0_3.0_1727033153168.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en_5.5.0_3.0_1727033153168.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st1sd_ut72ut5_PLPrefix0stlarge_simsp100_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_en.md new file mode 100644 index 00000000000000..ca549859903a76 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_en_5.5.0_3.0_1727035624423.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_en_5.5.0_3.0_1727035624423.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st3sd_ut52ut1_PLPrefix0stlarge3_simsp100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_pipeline_en.md new file mode 100644 index 00000000000000..9cab9c8ab1d257 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_pipeline_en_5.5.0_3.0_1727035636666.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_pipeline_en_5.5.0_3.0_1727035636666.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st3sd_ut52ut1_PLPrefix0stlarge3_simsp100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge71_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge71_simsp_pipeline_en.md new file mode 100644 index 00000000000000..518427dcf295f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge71_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge71_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge71_simsp_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge71_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge71_simsp_pipeline_en_5.5.0_3.0_1727033312225.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge71_simsp_pipeline_en_5.5.0_3.0_1727033312225.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge71_simsp_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge71_simsp_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge71_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1_PLPrefix0stlarge71_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_en.md new file mode 100644 index 00000000000000..b1f490b4c7aa13 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_en_5.5.0_3.0_1726980326234.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_en_5.5.0_3.0_1726980326234.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st5sd_ut72ut1largePfxNf_simsp300_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md new file mode 100644 index 00000000000000..356988e65e1e70 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en_5.5.0_3.0_1726980337774.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en_5.5.0_3.0_1726980337774.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a Spark DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st5sd_ut72ut1largePfxNf_simsp300_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_en.md new file mode 100644 index 00000000000000..965434465058bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_en_5.5.0_3.0_1726980472831.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_en_5.5.0_3.0_1726980472831.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
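
Once the pipeline has run, the predicted label sits in the `class` annotation column and can be inspected with ordinary DataFrame operations. A short continuation of the snippet above (it assumes the `pipelineDF` produced there):

```python
# each row of "class" is an array of annotations; "result" holds the predicted label
pipelineDF.select("text", "class.result").show(truncate=False)
```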
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st5sd_ut72ut5_PLPrefix0stlarge_simsp100_clean300 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en.md new file mode 100644 index 00000000000000..73546597f36051 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en_5.5.0_3.0_1726980484367.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en_5.5.0_3.0_1726980484367.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st5sd_ut72ut5_PLPrefix0stlarge_simsp100_clean300 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_sst2_v0_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_sst2_v0_en.md new file mode 100644 index 00000000000000..45b845ad8ad0d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_sst2_v0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_sst2_v0 DistilBertForSequenceClassification from benayas +author: John Snow Labs +name: distilbert_base_uncased_sst2_v0 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_sst2_v0` is a English model originally trained by benayas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_sst2_v0_en_5.5.0_3.0_1727012888751.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_sst2_v0_en_5.5.0_3.0_1727012888751.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_sst2_v0","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_sst2_v0", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
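
For low-latency scoring of individual strings (for example behind an API endpoint), the fitted pipeline can be wrapped in a `LightPipeline`, which avoids the overhead of a distributed DataFrame. A minimal sketch continuing from the fitted `pipelineModel` above:

```python
from sparknlp.base import LightPipeline

# LightPipeline runs the same stages directly on plain Python strings
light = LightPipeline(pipelineModel)
print(light.annotate("I love spark-nlp"))
```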
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_sst2_v0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|252.0 MB| + +## References + +https://huggingface.co/benayas/distilbert-base-uncased-sst2-v0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_sst2_v0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_sst2_v0_pipeline_en.md new file mode 100644 index 00000000000000..56e50b758ee246 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_sst2_v0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_sst2_v0_pipeline pipeline DistilBertForSequenceClassification from benayas +author: John Snow Labs +name: distilbert_base_uncased_sst2_v0_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_sst2_v0_pipeline` is a English model originally trained by benayas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_sst2_v0_pipeline_en_5.5.0_3.0_1727012901729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_sst2_v0_pipeline_en_5.5.0_3.0_1727012901729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_sst2_v0_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_sst2_v0_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_sst2_v0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|252.0 MB| + +## References + +https://huggingface.co/benayas/distilbert-base-uncased-sst2-v0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_en.md new file mode 100644 index 00000000000000..ed3495a13e17cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_en_5.5.0_3.0_1727021064570.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_en_5.5.0_3.0_1727021064570.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut72ut1_plain_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_pipeline_en.md new file mode 100644 index 00000000000000..5264ea709df924 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_pipeline_en_5.5.0_3.0_1727021076209.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_pipeline_en_5.5.0_3.0_1727021076209.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut72ut1_plain_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_batch_size_64_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_batch_size_64_en.md new file mode 100644 index 00000000000000..5fdc359a18d1fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_batch_size_64_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_batch_size_64 DistilBertForSequenceClassification from K-kiron +author: John Snow Labs +name: distilbert_batch_size_64 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_batch_size_64` is a English model originally trained by K-kiron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_batch_size_64_en_5.5.0_3.0_1727012230226.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_batch_size_64_en_5.5.0_3.0_1727012230226.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_batch_size_64","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_batch_size_64", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
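
The fitted pipeline can also be persisted with the standard Spark ML writer and reloaded later without refitting. A minimal sketch continuing from the snippet above; the save path is only a placeholder:

```python
from pyspark.ml import PipelineModel

# save the fitted pipeline (placeholder path) and load it back for reuse
pipelineModel.write().overwrite().save("/tmp/distilbert_batch_size_64_pipeline_model")
reloaded = PipelineModel.load("/tmp/distilbert_batch_size_64_pipeline_model")
reloaded.transform(data).select("class.result").show(truncate=False)
```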
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_batch_size_64| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/K-kiron/distilbert-batch-size-64 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_batch_size_64_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_batch_size_64_pipeline_en.md new file mode 100644 index 00000000000000..7d6cd8859aaa06 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_batch_size_64_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_batch_size_64_pipeline pipeline DistilBertForSequenceClassification from K-kiron +author: John Snow Labs +name: distilbert_batch_size_64_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_batch_size_64_pipeline` is a English model originally trained by K-kiron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_batch_size_64_pipeline_en_5.5.0_3.0_1727012247031.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_batch_size_64_pipeline_en_5.5.0_3.0_1727012247031.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_batch_size_64_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_batch_size_64_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_batch_size_64_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/K-kiron/distilbert-batch-size-64 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_earningspersharediluted_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_earningspersharediluted_en.md new file mode 100644 index 00000000000000..d2e6a6fa0dec26 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_earningspersharediluted_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_earningspersharediluted DistilBertForSequenceClassification from lenguyen +author: John Snow Labs +name: distilbert_earningspersharediluted +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_earningspersharediluted` is a English model originally trained by lenguyen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_earningspersharediluted_en_5.5.0_3.0_1727012565786.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_earningspersharediluted_en_5.5.0_3.0_1727012565786.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_earningspersharediluted","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_earningspersharediluted", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_earningspersharediluted| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|411.0 MB| + +## References + +https://huggingface.co/lenguyen/distilbert_EarningsPerShareDiluted \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_earningspersharediluted_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_earningspersharediluted_pipeline_en.md new file mode 100644 index 00000000000000..8ea6307803a66e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_earningspersharediluted_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_earningspersharediluted_pipeline pipeline DistilBertForSequenceClassification from lenguyen +author: John Snow Labs +name: distilbert_earningspersharediluted_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_earningspersharediluted_pipeline` is a English model originally trained by lenguyen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_earningspersharediluted_pipeline_en_5.5.0_3.0_1727012585255.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_earningspersharediluted_pipeline_en_5.5.0_3.0_1727012585255.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_earningspersharediluted_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_earningspersharediluted_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_earningspersharediluted_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.0 MB| + +## References + +https://huggingface.co/lenguyen/distilbert_EarningsPerShareDiluted + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_emindurmus80_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_emindurmus80_en.md new file mode 100644 index 00000000000000..f978d9fbec7fdc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_emindurmus80_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotion_emindurmus80 DistilBertForSequenceClassification from EminDurmus80 +author: John Snow Labs +name: distilbert_emotion_emindurmus80 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_emindurmus80` is a English model originally trained by EminDurmus80. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_emindurmus80_en_5.5.0_3.0_1726980329243.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_emindurmus80_en_5.5.0_3.0_1726980329243.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_emindurmus80","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_emindurmus80", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_emindurmus80| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/EminDurmus80/distilbert-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_emindurmus80_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_emindurmus80_pipeline_en.md new file mode 100644 index 00000000000000..396f604c441a85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_emindurmus80_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_emotion_emindurmus80_pipeline pipeline DistilBertForSequenceClassification from EminDurmus80 +author: John Snow Labs +name: distilbert_emotion_emindurmus80_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_emindurmus80_pipeline` is a English model originally trained by EminDurmus80. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_emindurmus80_pipeline_en_5.5.0_3.0_1726980340867.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_emindurmus80_pipeline_en_5.5.0_3.0_1726980340867.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_emotion_emindurmus80_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_emotion_emindurmus80_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_emindurmus80_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/EminDurmus80/distilbert-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_ucuncubayram_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_ucuncubayram_en.md new file mode 100644 index 00000000000000..db8c4b38273899 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_ucuncubayram_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotion_ucuncubayram DistilBertForSequenceClassification from ucuncubayram +author: John Snow Labs +name: distilbert_emotion_ucuncubayram +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_ucuncubayram` is a English model originally trained by ucuncubayram. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_ucuncubayram_en_5.5.0_3.0_1727033132781.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_ucuncubayram_en_5.5.0_3.0_1727033132781.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_ucuncubayram","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_ucuncubayram", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_ucuncubayram| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ucuncubayram/distilbert-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_ucuncubayram_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_ucuncubayram_pipeline_en.md new file mode 100644 index 00000000000000..080422d3624832 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_ucuncubayram_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_emotion_ucuncubayram_pipeline pipeline DistilBertForSequenceClassification from ucuncubayram +author: John Snow Labs +name: distilbert_emotion_ucuncubayram_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_ucuncubayram_pipeline` is a English model originally trained by ucuncubayram. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_ucuncubayram_pipeline_en_5.5.0_3.0_1727033155933.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_ucuncubayram_pipeline_en_5.5.0_3.0_1727033155933.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_emotion_ucuncubayram_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_emotion_ucuncubayram_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_ucuncubayram_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ucuncubayram/distilbert-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_finetuned_imdb_sentiment_omalve_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_finetuned_imdb_sentiment_omalve_en.md new file mode 100644 index 00000000000000..37e6e758d8db76 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_finetuned_imdb_sentiment_omalve_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_finetuned_imdb_sentiment_omalve DistilBertForSequenceClassification from OmAlve +author: John Snow Labs +name: distilbert_finetuned_imdb_sentiment_omalve +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_imdb_sentiment_omalve` is a English model originally trained by OmAlve. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_sentiment_omalve_en_5.5.0_3.0_1727035180915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_sentiment_omalve_en_5.5.0_3.0_1727035180915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_finetuned_imdb_sentiment_omalve","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_finetuned_imdb_sentiment_omalve", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_imdb_sentiment_omalve| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/OmAlve/distilbert-finetuned-imdb-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_finetuned_imdb_sentiment_omalve_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_finetuned_imdb_sentiment_omalve_pipeline_en.md new file mode 100644 index 00000000000000..7ee51c1c97f87f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_finetuned_imdb_sentiment_omalve_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_finetuned_imdb_sentiment_omalve_pipeline pipeline DistilBertForSequenceClassification from OmAlve +author: John Snow Labs +name: distilbert_finetuned_imdb_sentiment_omalve_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_imdb_sentiment_omalve_pipeline` is a English model originally trained by OmAlve. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_sentiment_omalve_pipeline_en_5.5.0_3.0_1727035193447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_sentiment_omalve_pipeline_en_5.5.0_3.0_1727035193447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_finetuned_imdb_sentiment_omalve_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_finetuned_imdb_sentiment_omalve_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_imdb_sentiment_omalve_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/OmAlve/distilbert-finetuned-imdb-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_finetuned_ner_haydenbspence_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_finetuned_ner_haydenbspence_en.md new file mode 100644 index 00000000000000..ecde4f32e2e89b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_finetuned_ner_haydenbspence_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_finetuned_ner_haydenbspence BertForTokenClassification from haydenbspence +author: John Snow Labs +name: distilbert_finetuned_ner_haydenbspence +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_ner_haydenbspence` is a English model originally trained by haydenbspence. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_ner_haydenbspence_en_5.5.0_3.0_1727030910144.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_ner_haydenbspence_en_5.5.0_3.0_1727030910144.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("distilbert_finetuned_ner_haydenbspence","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("distilbert_finetuned_ner_haydenbspence", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
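
Because this is a token-level model, each token receives its own tag in the `ner` column. A short continuation of the snippet above that pairs tokens with their predicted labels (the column aliases are illustrative):

```python
from pyspark.sql.functions import col

# "token.result" holds the tokens, "ner.result" the tag predicted for each token
pipelineDF.select(
    col("token.result").alias("tokens"),
    col("ner.result").alias("ner_tags")
).show(truncate=False)
```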
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_ner_haydenbspence| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.8 MB| + +## References + +https://huggingface.co/haydenbspence/distilbert-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_food_hightensan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_food_hightensan_pipeline_en.md new file mode 100644 index 00000000000000..ce5c946019d0d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_food_hightensan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_food_hightensan_pipeline pipeline DistilBertForSequenceClassification from hightensan +author: John Snow Labs +name: distilbert_food_hightensan_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_food_hightensan_pipeline` is a English model originally trained by hightensan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_food_hightensan_pipeline_en_5.5.0_3.0_1727020625387.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_food_hightensan_pipeline_en_5.5.0_3.0_1727020625387.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_food_hightensan_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_food_hightensan_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_food_hightensan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hightensan/distilbert-food + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_foundation_category_c5_finetune_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_foundation_category_c5_finetune_en.md new file mode 100644 index 00000000000000..414df34d8c791a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_foundation_category_c5_finetune_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_foundation_category_c5_finetune DistilBertForSequenceClassification from eric-mc2 +author: John Snow Labs +name: distilbert_foundation_category_c5_finetune +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_foundation_category_c5_finetune` is a English model originally trained by eric-mc2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_foundation_category_c5_finetune_en_5.5.0_3.0_1727033489081.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_foundation_category_c5_finetune_en_5.5.0_3.0_1727033489081.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_foundation_category_c5_finetune","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_foundation_category_c5_finetune", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_foundation_category_c5_finetune| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/eric-mc2/distilbert-foundation-category-c5-finetune \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_foundation_category_c5_finetune_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_foundation_category_c5_finetune_pipeline_en.md new file mode 100644 index 00000000000000..464d1d6e8795f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_foundation_category_c5_finetune_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_foundation_category_c5_finetune_pipeline pipeline DistilBertForSequenceClassification from eric-mc2 +author: John Snow Labs +name: distilbert_foundation_category_c5_finetune_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_foundation_category_c5_finetune_pipeline` is a English model originally trained by eric-mc2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_foundation_category_c5_finetune_pipeline_en_5.5.0_3.0_1727033501218.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_foundation_category_c5_finetune_pipeline_en_5.5.0_3.0_1727033501218.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_foundation_category_c5_finetune_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_foundation_category_c5_finetune_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_foundation_category_c5_finetune_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/eric-mc2/distilbert-foundation-category-c5-finetune + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_lora_false_adapted_augment_false_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_lora_false_adapted_augment_false_en.md new file mode 100644 index 00000000000000..1908f116c71620 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_lora_false_adapted_augment_false_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_lora_false_adapted_augment_false DistilBertForSequenceClassification from EmiMule +author: John Snow Labs +name: distilbert_lora_false_adapted_augment_false +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_lora_false_adapted_augment_false` is a English model originally trained by EmiMule. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_lora_false_adapted_augment_false_en_5.5.0_3.0_1727012552082.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_lora_false_adapted_augment_false_en_5.5.0_3.0_1727012552082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_lora_false_adapted_augment_false","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_lora_false_adapted_augment_false", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
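
After the pipeline has been fitted and applied as above, the predicted label can be read from the classifier's output column. The following minimal sketch shows that last step; the column names follow the Input/Output Labels in the Model Information table below, and the printed values depend on the model.

```python
# Continuation of the example above: one predicted label per input row
pipelineDF.select("text", "class.result").show(truncate = False)
```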
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_lora_false_adapted_augment_false| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/EmiMule/distilbert-LoRA-False-adapted-augment-False \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_lora_false_adapted_augment_false_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_lora_false_adapted_augment_false_pipeline_en.md new file mode 100644 index 00000000000000..2a208d91b4c831 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_lora_false_adapted_augment_false_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_lora_false_adapted_augment_false_pipeline pipeline DistilBertForSequenceClassification from EmiMule +author: John Snow Labs +name: distilbert_lora_false_adapted_augment_false_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_lora_false_adapted_augment_false_pipeline` is a English model originally trained by EmiMule. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_lora_false_adapted_augment_false_pipeline_en_5.5.0_3.0_1727012565402.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_lora_false_adapted_augment_false_pipeline_en_5.5.0_3.0_1727012565402.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_lora_false_adapted_augment_false_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_lora_false_adapted_augment_false_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_lora_false_adapted_augment_false_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/EmiMule/distilbert-LoRA-False-adapted-augment-False + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_en.md new file mode 100644 index 00000000000000..491c9ca4624886 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_en_5.5.0_3.0_1727033792143.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_en_5.5.0_3.0_1727033792143.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|52.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_data_aug_mnli_192 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_pipeline_en.md new file mode 100644 index 00000000000000..a110826ba36b48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_pipeline_en_5.5.0_3.0_1727033794920.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_pipeline_en_5.5.0_3.0_1727033794920.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|52.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_data_aug_mnli_192 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_en.md new file mode 100644 index 00000000000000..9a26cf333a73e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_en_5.5.0_3.0_1726980508595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_en_5.5.0_3.0_1726980508595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|71.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_data_aug_mnli_256 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_pipeline_en.md new file mode 100644 index 00000000000000..b1ed0dc06c7dad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_pipeline_en_5.5.0_3.0_1726980512398.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_pipeline_en_5.5.0_3.0_1726980512398.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|71.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_data_aug_mnli_256 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_en.md new file mode 100644 index 00000000000000..9d91f2d5adecff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_en_5.5.0_3.0_1726980639932.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_en_5.5.0_3.0_1726980639932.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_pretrain_wnli \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_pipeline_en.md new file mode 100644 index 00000000000000..6b2003e2fe764e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_pipeline_en_5.5.0_3.0_1726980651348.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_pipeline_en_5.5.0_3.0_1726980651348.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_pretrain_wnli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_en.md new file mode 100644 index 00000000000000..4c385b7afc0b9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_en_5.5.0_3.0_1727020578387.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_en_5.5.0_3.0_1727020578387.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|71.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_qqp_256 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_pipeline_en.md new file mode 100644 index 00000000000000..896a332b21dad0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_pipeline_en_5.5.0_3.0_1727020582154.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_pipeline_en_5.5.0_3.0_1727020582154.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|71.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_qqp_256 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_en.md new file mode 100644 index 00000000000000..75b5ec20e7035e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_en_5.5.0_3.0_1727033691889.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_en_5.5.0_3.0_1727033691889.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|111.9 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_qqp_384 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_pipeline_en.md new file mode 100644 index 00000000000000..c6727fcef6edf4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_pipeline_en_5.5.0_3.0_1727033698301.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_pipeline_en_5.5.0_3.0_1727033698301.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|111.9 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_qqp_384 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_sst2_padding20model_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sst2_padding20model_en.md new file mode 100644 index 00000000000000..96cc084e3a5e6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sst2_padding20model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sst2_padding20model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_sst2_padding20model +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sst2_padding20model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sst2_padding20model_en_5.5.0_3.0_1726980724911.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sst2_padding20model_en_5.5.0_3.0_1726980724911.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sst2_padding20model","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sst2_padding20model", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
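
For quick, in-memory inference on plain strings, the fitted model from the example above can also be wrapped in a `LightPipeline`. This is a minimal sketch; the input sentence is an invented example.

```python
from sparknlp.base import LightPipeline

# Annotate a single string without building a DataFrame
light = LightPipeline(pipelineModel)
result = light.annotate("an uplifting, beautifully shot film")
print(result["class"])  # predicted label(s) for the input
```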
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sst2_padding20model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_sst2_padding20model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_sst2_padding20model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sst2_padding20model_pipeline_en.md new file mode 100644 index 00000000000000..adaedf6039e97f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sst2_padding20model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sst2_padding20model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_sst2_padding20model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sst2_padding20model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sst2_padding20model_pipeline_en_5.5.0_3.0_1726980736718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sst2_padding20model_pipeline_en_5.5.0_3.0_1726980736718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_sst2_padding20model_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_sst2_padding20model_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sst2_padding20model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_sst2_padding20model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_turkish_turkish_spam_email_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_turkish_turkish_spam_email_pipeline_tr.md new file mode 100644 index 00000000000000..bfc2b4c039a512 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_turkish_turkish_spam_email_pipeline_tr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Turkish distilbert_turkish_turkish_spam_email_pipeline pipeline DistilBertForSequenceClassification from anilguven +author: John Snow Labs +name: distilbert_turkish_turkish_spam_email_pipeline +date: 2024-09-22 +tags: [tr, open_source, pipeline, onnx] +task: Text Classification +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_turkish_turkish_spam_email_pipeline` is a Turkish model originally trained by anilguven. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_turkish_turkish_spam_email_pipeline_tr_5.5.0_3.0_1727020409409.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_turkish_turkish_spam_email_pipeline_tr_5.5.0_3.0_1727020409409.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column of Turkish text (see the sketch below)
pipeline = PretrainedPipeline("distilbert_turkish_turkish_spam_email_pipeline", lang = "tr")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column of Turkish text
val pipeline = new PretrainedPipeline("distilbert_turkish_turkish_spam_email_pipeline", lang = "tr")
val annotations = pipeline.transform(df)
```
</div>
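
As with the other pipeline cards, `df` is assumed to be a Spark DataFrame with a `text` column; for this model the text should be Turkish. A minimal sketch with an invented example message:

```python
from sparknlp.pretrained import PretrainedPipeline

# Hypothetical Turkish e-mail text to classify
df = spark.createDataFrame([["Tebrikler! Büyük ödülü kazandınız, hemen tıklayın."]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_turkish_turkish_spam_email_pipeline", lang = "tr")
pipeline.transform(df).select("class.result").show(truncate = False)
```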
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_turkish_turkish_spam_email_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|254.1 MB| + +## References + +https://huggingface.co/anilguven/distilbert_tr_turkish_spam_email + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_twitterfin_padding60model_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_twitterfin_padding60model_en.md new file mode 100644 index 00000000000000..d0f8f6ddf2277a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_twitterfin_padding60model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_twitterfin_padding60model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_twitterfin_padding60model +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_twitterfin_padding60model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding60model_en_5.5.0_3.0_1727035498696.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding60model_en_5.5.0_3.0_1727035498696.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_twitterfin_padding60model","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_twitterfin_padding60model", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_twitterfin_padding60model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_twitterfin_padding60model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_twitterfin_padding60model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_twitterfin_padding60model_pipeline_en.md new file mode 100644 index 00000000000000..fdae303e0839ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_twitterfin_padding60model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_twitterfin_padding60model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_twitterfin_padding60model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_twitterfin_padding60model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding60model_pipeline_en_5.5.0_3.0_1727035511041.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding60model_pipeline_en_5.5.0_3.0_1727035511041.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_twitterfin_padding60model_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a Spark DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_twitterfin_padding60model_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_twitterfin_padding60model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_twitterfin_padding60model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_twitterfin_padding80model_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_twitterfin_padding80model_en.md new file mode 100644 index 00000000000000..fe49c45a43287c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_twitterfin_padding80model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_twitterfin_padding80model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_twitterfin_padding80model +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_twitterfin_padding80model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding80model_en_5.5.0_3.0_1727033948183.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding80model_en_5.5.0_3.0_1727033948183.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_twitterfin_padding80model","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_twitterfin_padding80model", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_twitterfin_padding80model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_twitterfin_padding80model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distill_whisper_jargon_btemirov_en.md b/docs/_posts/ahmedlone127/2024-09-22-distill_whisper_jargon_btemirov_en.md new file mode 100644 index 00000000000000..c48cd8e30ab7bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distill_whisper_jargon_btemirov_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English distill_whisper_jargon_btemirov WhisperForCTC from btemirov +author: John Snow Labs +name: distill_whisper_jargon_btemirov +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distill_whisper_jargon_btemirov` is a English model originally trained by btemirov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distill_whisper_jargon_btemirov_en_5.5.0_3.0_1726994219867.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distill_whisper_jargon_btemirov_en_5.5.0_3.0_1726994219867.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("distill_whisper_jargon_btemirov","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
# data is assumed to be a Spark DataFrame with an "audio_content" column of raw audio samples
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("distill_whisper_jargon_btemirov", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
// data is assumed to be a Spark DataFrame with an "audio_content" column of raw audio samples
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
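
The example above assumes a DataFrame named `data` with an `audio_content` column holding raw audio samples. A minimal sketch of preparing such a frame with librosa follows; librosa and the 16 kHz mono resampling are assumptions of this sketch rather than requirements stated in the card.

```python
import librosa

# Load a local recording as a float array, resampled to 16 kHz mono
audio, sr = librosa.load("sample.wav", sr=16000)

# One row per recording; AudioAssembler reads the float array from "audio_content"
data = spark.createDataFrame([[audio.tolist()]]).toDF("audio_content")
```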
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distill_whisper_jargon_btemirov| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/btemirov/distill-whisper-jargon \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distill_whisper_jargon_btemirov_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distill_whisper_jargon_btemirov_pipeline_en.md new file mode 100644 index 00000000000000..bea58f51dda7aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distill_whisper_jargon_btemirov_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distill_whisper_jargon_btemirov_pipeline pipeline WhisperForCTC from btemirov +author: John Snow Labs +name: distill_whisper_jargon_btemirov_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distill_whisper_jargon_btemirov_pipeline` is a English model originally trained by btemirov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distill_whisper_jargon_btemirov_pipeline_en_5.5.0_3.0_1726994275570.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distill_whisper_jargon_btemirov_pipeline_en_5.5.0_3.0_1726994275570.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is expected to contain an "audio_content" column with the raw audio
# as an array of floats
pipeline = PretrainedPipeline("distill_whisper_jargon_btemirov_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is expected to contain an "audio_content" column with the raw audio
// as an array of floats
val pipeline = new PretrainedPipeline("distill_whisper_jargon_btemirov_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distill_whisper_jargon_btemirov_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/btemirov/distill-whisper-jargon + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_finetuned_condition_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_finetuned_condition_classifier_en.md new file mode 100644 index 00000000000000..95db676c196916 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_finetuned_condition_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_finetuned_condition_classifier RoBertaForSequenceClassification from BanUrsus +author: John Snow Labs +name: distilroberta_base_finetuned_condition_classifier +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_condition_classifier` is a English model originally trained by BanUrsus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_condition_classifier_en_5.5.0_3.0_1727026411869.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_condition_classifier_en_5.5.0_3.0_1727026411869.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_base_finetuned_condition_classifier","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_base_finetuned_condition_classifier", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
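For ad-hoc checks on individual sentences it can be convenient to wrap the fitted pipeline in a `LightPipeline`. The sketch below reuses the variables from the Python example above; the input sentence is only a placeholder.

```python
from sparknlp.base import LightPipeline

light_pipeline = LightPipeline(pipelineModel)

# annotate() returns a dict keyed by output column names; "class" holds the label
print(light_pipeline.annotate("The device stopped working after a week")["class"])

# On a DataFrame, the predicted labels live in the "class" column
pipelineDF.select("class.result").show(truncate=False)
```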
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_condition_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|311.3 MB| + +## References + +https://huggingface.co/BanUrsus/distilroberta-base-finetuned-condition-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_finetuned_condition_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_finetuned_condition_classifier_pipeline_en.md new file mode 100644 index 00000000000000..5ba744db53ed24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_finetuned_condition_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_finetuned_condition_classifier_pipeline pipeline RoBertaForSequenceClassification from BanUrsus +author: John Snow Labs +name: distilroberta_base_finetuned_condition_classifier_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_condition_classifier_pipeline` is a English model originally trained by BanUrsus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_condition_classifier_pipeline_en_5.5.0_3.0_1727026427044.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_condition_classifier_pipeline_en_5.5.0_3.0_1727026427044.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilroberta_base_finetuned_condition_classifier_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilroberta_base_finetuned_condition_classifier_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
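As a lighter-weight alternative to `transform`, a downloaded `PretrainedPipeline` also exposes `annotate` for plain strings. A minimal sketch; the input sentence is only a placeholder, and the `class` key assumes the classifier stage writes to a column named `class`.

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilroberta_base_finetuned_condition_classifier_pipeline", lang="en")

# Returns a dict keyed by the pipeline's output columns
result = pipeline.annotate("I love spark-nlp")
print(result.get("class"))
```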
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_condition_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.4 MB| + +## References + +https://huggingface.co/BanUrsus/distilroberta-base-finetuned-condition-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_finetuned_leftarticles_mlm_epochier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_finetuned_leftarticles_mlm_epochier_pipeline_en.md new file mode 100644 index 00000000000000..e6f719c17bcf86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_finetuned_leftarticles_mlm_epochier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_finetuned_leftarticles_mlm_epochier_pipeline pipeline RoBertaEmbeddings from happybusinessperson +author: John Snow Labs +name: distilroberta_base_finetuned_leftarticles_mlm_epochier_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_leftarticles_mlm_epochier_pipeline` is a English model originally trained by happybusinessperson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_leftarticles_mlm_epochier_pipeline_en_5.5.0_3.0_1726999940632.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_leftarticles_mlm_epochier_pipeline_en_5.5.0_3.0_1726999940632.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilroberta_base_finetuned_leftarticles_mlm_epochier_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilroberta_base_finetuned_leftarticles_mlm_epochier_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
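Because this pipeline produces token embeddings rather than labels, the vectors have to be unpacked from the annotation column after `transform`. A rough sketch, assuming the embeddings stage writes to a column named `embeddings` (not stated in this card) and reusing `annotations` from the Python example above.

```python
from pyspark.sql.functions import col, explode

# One row per token: the token text and its embedding vector
token_vectors = annotations \
    .select(explode(col("embeddings")).alias("ann")) \
    .select(col("ann.result").alias("token"), col("ann.embeddings").alias("vector"))

token_vectors.show(5, truncate=80)
```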
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_leftarticles_mlm_epochier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/happybusinessperson/distilroberta-base-finetuned-leftarticles-mlm-epochier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_mrpc_glue_kevinvelez18_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_mrpc_glue_kevinvelez18_pipeline_en.md new file mode 100644 index 00000000000000..ba2241fd278a33 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_mrpc_glue_kevinvelez18_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_mrpc_glue_kevinvelez18_pipeline pipeline RoBertaForSequenceClassification from kevinvelez18 +author: John Snow Labs +name: distilroberta_base_mrpc_glue_kevinvelez18_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_mrpc_glue_kevinvelez18_pipeline` is a English model originally trained by kevinvelez18. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_mrpc_glue_kevinvelez18_pipeline_en_5.5.0_3.0_1726967983954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_mrpc_glue_kevinvelez18_pipeline_en_5.5.0_3.0_1726967983954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilroberta_base_mrpc_glue_kevinvelez18_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilroberta_base_mrpc_glue_kevinvelez18_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_mrpc_glue_kevinvelez18_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/kevinvelez18/distilroberta-base-mrpc-glue + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_sst2_distilled_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_sst2_distilled_en.md new file mode 100644 index 00000000000000..dd68954baa963e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_sst2_distilled_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_sst2_distilled RoBertaForSequenceClassification from aal2015 +author: John Snow Labs +name: distilroberta_base_sst2_distilled +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_sst2_distilled` is a English model originally trained by aal2015. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_sst2_distilled_en_5.5.0_3.0_1726972131591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_sst2_distilled_en_5.5.0_3.0_1726972131591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_base_sst2_distilled","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_base_sst2_distilled", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_sst2_distilled| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/aal2015/distilroberta-base-sst2-distilled \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_sst2_distilled_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_sst2_distilled_pipeline_en.md new file mode 100644 index 00000000000000..282feba8a7a697 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_sst2_distilled_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_sst2_distilled_pipeline pipeline RoBertaForSequenceClassification from aal2015 +author: John Snow Labs +name: distilroberta_base_sst2_distilled_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_sst2_distilled_pipeline` is a English model originally trained by aal2015. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_sst2_distilled_pipeline_en_5.5.0_3.0_1726972146442.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_sst2_distilled_pipeline_en_5.5.0_3.0_1726972146442.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilroberta_base_sst2_distilled_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilroberta_base_sst2_distilled_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_sst2_distilled_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/aal2015/distilroberta-base-sst2-distilled + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilroberta_finetuned_financial_text_regression_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_finetuned_financial_text_regression_en.md new file mode 100644 index 00000000000000..97b4974bab91dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_finetuned_financial_text_regression_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_finetuned_financial_text_regression RoBertaForSequenceClassification from lwat64 +author: John Snow Labs +name: distilroberta_finetuned_financial_text_regression +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_finetuned_financial_text_regression` is a English model originally trained by lwat64. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_finetuned_financial_text_regression_en_5.5.0_3.0_1726972135237.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_finetuned_financial_text_regression_en_5.5.0_3.0_1726972135237.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_finetuned_financial_text_regression","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_finetuned_financial_text_regression", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_finetuned_financial_text_regression| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/lwat64/distilroberta-finetuned-financial-text-regression \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-dopamin_post_training_en.md b/docs/_posts/ahmedlone127/2024-09-22-dopamin_post_training_en.md new file mode 100644 index 00000000000000..baae52ab720a53 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-dopamin_post_training_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dopamin_post_training RoBertaForSequenceClassification from Fsoft-AIC +author: John Snow Labs +name: dopamin_post_training +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dopamin_post_training` is a English model originally trained by Fsoft-AIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dopamin_post_training_en_5.5.0_3.0_1726967737536.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dopamin_post_training_en_5.5.0_3.0_1726967737536.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("dopamin_post_training","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("dopamin_post_training", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dopamin_post_training| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/Fsoft-AIC/dopamin-post-training \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-dopamin_post_training_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-dopamin_post_training_pipeline_en.md new file mode 100644 index 00000000000000..543b2218a27837 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-dopamin_post_training_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dopamin_post_training_pipeline pipeline RoBertaForSequenceClassification from Fsoft-AIC +author: John Snow Labs +name: dopamin_post_training_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dopamin_post_training_pipeline` is a English model originally trained by Fsoft-AIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dopamin_post_training_pipeline_en_5.5.0_3.0_1726967758843.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dopamin_post_training_pipeline_en_5.5.0_3.0_1726967758843.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("dopamin_post_training_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("dopamin_post_training_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dopamin_post_training_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.4 MB| + +## References + +https://huggingface.co/Fsoft-AIC/dopamin-post-training + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-drug_ner_cat_v1_ca.md b/docs/_posts/ahmedlone127/2024-09-22-drug_ner_cat_v1_ca.md new file mode 100644 index 00000000000000..ae1af2d356094b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-drug_ner_cat_v1_ca.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Catalan, Valencian drug_ner_cat_v1 RoBertaForTokenClassification from BSC-NLP4BIA +author: John Snow Labs +name: drug_ner_cat_v1 +date: 2024-09-22 +tags: [ca, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: ca +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`drug_ner_cat_v1` is a Catalan, Valencian model originally trained by BSC-NLP4BIA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/drug_ner_cat_v1_ca_5.5.0_3.0_1727048485082.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/drug_ner_cat_v1_ca_5.5.0_3.0_1727048485082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("drug_ner_cat_v1","ca") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("drug_ner_cat_v1", "ca")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
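The token classifier emits IOB tags per token; to recover complete drug mentions it is common to append a `NerConverter` stage. A brief sketch extending the Python example above (variable names reuse that example).

```python
from sparknlp.annotator import NerConverter

# Groups IOB-tagged tokens into full entity chunks
ner_converter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

ner_pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, ner_converter])
ner_pipeline.fit(data).transform(data).select("ner_chunk.result").show(truncate=False)
```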
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|drug_ner_cat_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|ca| +|Size:|436.0 MB| + +## References + +https://huggingface.co/BSC-NLP4BIA/drug-ner-cat-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-drug_ner_cat_v1_pipeline_ca.md b/docs/_posts/ahmedlone127/2024-09-22-drug_ner_cat_v1_pipeline_ca.md new file mode 100644 index 00000000000000..f5d56a94f5d6b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-drug_ner_cat_v1_pipeline_ca.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Catalan, Valencian drug_ner_cat_v1_pipeline pipeline RoBertaForTokenClassification from BSC-NLP4BIA +author: John Snow Labs +name: drug_ner_cat_v1_pipeline +date: 2024-09-22 +tags: [ca, open_source, pipeline, onnx] +task: Named Entity Recognition +language: ca +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`drug_ner_cat_v1_pipeline` is a Catalan, Valencian model originally trained by BSC-NLP4BIA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/drug_ner_cat_v1_pipeline_ca_5.5.0_3.0_1727048508541.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/drug_ner_cat_v1_pipeline_ca_5.5.0_3.0_1727048508541.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Placeholder input; replace with Catalan clinical text for meaningful output
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("drug_ner_cat_v1_pipeline", lang = "ca")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Placeholder input; replace with Catalan clinical text for meaningful output
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("drug_ner_cat_v1_pipeline", lang = "ca")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|drug_ner_cat_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ca| +|Size:|436.0 MB| + +## References + +https://huggingface.co/BSC-NLP4BIA/drug-ner-cat-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-echocardiogram_latvian_dilation_reduced_nl.md b/docs/_posts/ahmedlone127/2024-09-22-echocardiogram_latvian_dilation_reduced_nl.md new file mode 100644 index 00000000000000..1fc0ffdcf22487 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-echocardiogram_latvian_dilation_reduced_nl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Dutch, Flemish echocardiogram_latvian_dilation_reduced RoBertaForSequenceClassification from UMCU +author: John Snow Labs +name: echocardiogram_latvian_dilation_reduced +date: 2024-09-22 +tags: [nl, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`echocardiogram_latvian_dilation_reduced` is a Dutch, Flemish model originally trained by UMCU. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/echocardiogram_latvian_dilation_reduced_nl_5.5.0_3.0_1727017594069.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/echocardiogram_latvian_dilation_reduced_nl_5.5.0_3.0_1727017594069.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("echocardiogram_latvian_dilation_reduced","nl") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("echocardiogram_latvian_dilation_reduced", "nl")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|echocardiogram_latvian_dilation_reduced| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|nl| +|Size:|472.0 MB| + +## References + +https://huggingface.co/UMCU/Echocardiogram_LV_dilation_reduced \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-echocardiogram_latvian_dilation_reduced_pipeline_nl.md b/docs/_posts/ahmedlone127/2024-09-22-echocardiogram_latvian_dilation_reduced_pipeline_nl.md new file mode 100644 index 00000000000000..46f297d18913e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-echocardiogram_latvian_dilation_reduced_pipeline_nl.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Dutch, Flemish echocardiogram_latvian_dilation_reduced_pipeline pipeline RoBertaForSequenceClassification from UMCU +author: John Snow Labs +name: echocardiogram_latvian_dilation_reduced_pipeline +date: 2024-09-22 +tags: [nl, open_source, pipeline, onnx] +task: Text Classification +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`echocardiogram_latvian_dilation_reduced_pipeline` is a Dutch, Flemish model originally trained by UMCU. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/echocardiogram_latvian_dilation_reduced_pipeline_nl_5.5.0_3.0_1727017617462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/echocardiogram_latvian_dilation_reduced_pipeline_nl_5.5.0_3.0_1727017617462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Placeholder input; replace with Dutch echocardiogram report text for meaningful output
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("echocardiogram_latvian_dilation_reduced_pipeline", lang = "nl")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Placeholder input; replace with Dutch echocardiogram report text for meaningful output
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("echocardiogram_latvian_dilation_reduced_pipeline", lang = "nl")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|echocardiogram_latvian_dilation_reduced_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|nl| +|Size:|472.0 MB| + +## References + +https://huggingface.co/UMCU/Echocardiogram_LV_dilation_reduced + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-eee_en.md b/docs/_posts/ahmedlone127/2024-09-22-eee_en.md new file mode 100644 index 00000000000000..3c53cf7e317f4a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-eee_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English eee RoBertaForSequenceClassification from weicap +author: John Snow Labs +name: eee +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`eee` is a English model originally trained by weicap. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/eee_en_5.5.0_3.0_1727017509815.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/eee_en_5.5.0_3.0_1727017509815.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("eee","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("eee", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|eee| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|435.9 MB| + +## References + +https://huggingface.co/weicap/eee \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-eee_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-eee_pipeline_en.md new file mode 100644 index 00000000000000..bf9169fe7193aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-eee_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English eee_pipeline pipeline RoBertaForSequenceClassification from weicap +author: John Snow Labs +name: eee_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`eee_pipeline` is a English model originally trained by weicap. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/eee_pipeline_en_5.5.0_3.0_1727017535371.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/eee_pipeline_en_5.5.0_3.0_1727017535371.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("eee_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("eee_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|eee_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|435.9 MB| + +## References + +https://huggingface.co/weicap/eee + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-elatable_lp_en.md b/docs/_posts/ahmedlone127/2024-09-22-elatable_lp_en.md new file mode 100644 index 00000000000000..0e7e273154ceae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-elatable_lp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English elatable_lp DistilBertForSequenceClassification from gaborcselle +author: John Snow Labs +name: elatable_lp +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`elatable_lp` is a English model originally trained by gaborcselle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/elatable_lp_en_5.5.0_3.0_1726980585585.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/elatable_lp_en_5.5.0_3.0_1726980585585.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("elatable_lp","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("elatable_lp", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|elatable_lp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.4 MB| + +## References + +https://huggingface.co/gaborcselle/elatable-lp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-elatable_lp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-elatable_lp_pipeline_en.md new file mode 100644 index 00000000000000..488014362f2d1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-elatable_lp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English elatable_lp_pipeline pipeline DistilBertForSequenceClassification from gaborcselle +author: John Snow Labs +name: elatable_lp_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`elatable_lp_pipeline` is a English model originally trained by gaborcselle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/elatable_lp_pipeline_en_5.5.0_3.0_1726980596925.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/elatable_lp_pipeline_en_5.5.0_3.0_1726980596925.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("elatable_lp_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("elatable_lp_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|elatable_lp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gaborcselle/elatable-lp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random0_seed0_bertweet_large_en.md b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random0_seed0_bertweet_large_en.md new file mode 100644 index 00000000000000..c93613cb8d1c9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random0_seed0_bertweet_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emoji_emoji_random0_seed0_bertweet_large RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random0_seed0_bertweet_large +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random0_seed0_bertweet_large` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed0_bertweet_large_en_5.5.0_3.0_1727027210427.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed0_bertweet_large_en_5.5.0_3.0_1727027210427.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("emoji_emoji_random0_seed0_bertweet_large","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("emoji_emoji_random0_seed0_bertweet_large", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random0_seed0_bertweet_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random0_seed0-bertweet-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random0_seed0_bertweet_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random0_seed0_bertweet_large_pipeline_en.md new file mode 100644 index 00000000000000..4e101b515b9caa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random0_seed0_bertweet_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emoji_emoji_random0_seed0_bertweet_large_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random0_seed0_bertweet_large_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random0_seed0_bertweet_large_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed0_bertweet_large_pipeline_en_5.5.0_3.0_1727027282592.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed0_bertweet_large_pipeline_en_5.5.0_3.0_1727027282592.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("emoji_emoji_random0_seed0_bertweet_large_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("emoji_emoji_random0_seed0_bertweet_large_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random0_seed0_bertweet_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random0_seed0-bertweet-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_en.md b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_en.md new file mode 100644 index 00000000000000..b77d76f6d73d2e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_en_5.5.0_3.0_1727027531103.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_en_5.5.0_3.0_1727027531103.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.6 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random3_seed0-twitter-roberta-base-2021-124m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_pipeline_en.md new file mode 100644 index 00000000000000..1069df8b07aaf0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_pipeline_en_5.5.0_3.0_1727027554029.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_pipeline_en_5.5.0_3.0_1727027554029.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
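+
+As a minimal sketch (assumptions: a plain-text input column named `text`, and that the classifier stage of this pretrained pipeline writes to a `class` column as in the companion model card), the `df` referenced above could be built and the output inspected like this:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame with a single "text" column.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_pipeline", lang="en")
+annotations = pipeline.transform(df)
+annotations.select("class.result").show(truncate=False)
+```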
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.6 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random3_seed0-twitter-roberta-base-2021-124m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_en.md b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_en.md new file mode 100644 index 00000000000000..d4879658d6aa39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_en_5.5.0_3.0_1727037360173.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_en_5.5.0_3.0_1727037360173.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.6 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random3_seed1-twitter-roberta-base-2019-90m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_pipeline_en.md new file mode 100644 index 00000000000000..ad279cac9a48b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_pipeline_en_5.5.0_3.0_1727037385150.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_pipeline_en_5.5.0_3.0_1727037385150.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.6 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random3_seed1-twitter-roberta-base-2019-90m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-emotion_model_en.md b/docs/_posts/ahmedlone127/2024-09-22-emotion_model_en.md new file mode 100644 index 00000000000000..01561989e16bed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-emotion_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emotion_model DistilBertForSequenceClassification from naamalia23 +author: John Snow Labs +name: emotion_model +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_model` is a English model originally trained by naamalia23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_model_en_5.5.0_3.0_1727035394131.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_model_en_5.5.0_3.0_1727035394131.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("emotion_model","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("emotion_model", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
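+
+As a small follow-up sketch (assuming the `pipelineDF` and `class` column from the Python example above), note that besides the winning label in `class.result`, the annotation metadata usually carries the per-label confidence scores as strings:
+
+```python
+from pyspark.sql.functions import col
+
+# "class.result" is the predicted label; "class.metadata" holds the scores reported by the classifier.
+pipelineDF.select(col("class.result").alias("label"), col("class.metadata").alias("scores")).show(truncate=False)
+```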
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/naamalia23/emotion_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-emotion_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-emotion_model_pipeline_en.md new file mode 100644 index 00000000000000..c9fb62915141dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-emotion_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emotion_model_pipeline pipeline DistilBertForSequenceClassification from naamalia23 +author: John Snow Labs +name: emotion_model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_model_pipeline` is a English model originally trained by naamalia23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_model_pipeline_en_5.5.0_3.0_1727035406491.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_model_pipeline_en_5.5.0_3.0_1727035406491.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("emotion_model_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("emotion_model_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/naamalia23/emotion_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-eq_bert_v1_1_en.md b/docs/_posts/ahmedlone127/2024-09-22-eq_bert_v1_1_en.md new file mode 100644 index 00000000000000..99135ffb56a034 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-eq_bert_v1_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English eq_bert_v1_1 BertEmbeddings from RyotaroOKabe +author: John Snow Labs +name: eq_bert_v1_1 +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`eq_bert_v1_1` is a English model originally trained by RyotaroOKabe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/eq_bert_v1_1_en_5.5.0_3.0_1726973566678.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/eq_bert_v1_1_en_5.5.0_3.0_1726973566678.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = BertEmbeddings.pretrained("eq_bert_v1_1","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = BertEmbeddings.pretrained("eq_bert_v1_1","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
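+
+As a minimal sketch (assuming the `pipelineDF` and `embeddings` output column from the Python example above), the token-level vectors can be unpacked from the annotation structs, whose `embeddings` field holds the float arrays:
+
+```python
+from pyspark.sql.functions import explode, col
+
+# One row per token: the token text and its BERT embedding vector.
+pipelineDF.select(explode(col("embeddings")).alias("ann")) \
+    .select(col("ann.result").alias("token"), col("ann.embeddings").alias("vector")) \
+    .show(truncate=80)
+```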
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|eq_bert_v1_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/RyotaroOKabe/eq_bert_v1.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-eq_bert_v1_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-eq_bert_v1_1_pipeline_en.md new file mode 100644 index 00000000000000..96e77a7e8ecddb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-eq_bert_v1_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English eq_bert_v1_1_pipeline pipeline BertEmbeddings from RyotaroOKabe +author: John Snow Labs +name: eq_bert_v1_1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`eq_bert_v1_1_pipeline` is a English model originally trained by RyotaroOKabe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/eq_bert_v1_1_pipeline_en_5.5.0_3.0_1726973584645.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/eq_bert_v1_1_pipeline_en_5.5.0_3.0_1726973584645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("eq_bert_v1_1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("eq_bert_v1_1_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|eq_bert_v1_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/RyotaroOKabe/eq_bert_v1.1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-fakenews_roberta_large_grad_en.md b/docs/_posts/ahmedlone127/2024-09-22-fakenews_roberta_large_grad_en.md new file mode 100644 index 00000000000000..695cfb565c820c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-fakenews_roberta_large_grad_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fakenews_roberta_large_grad RoBertaForSequenceClassification from Denyol +author: John Snow Labs +name: fakenews_roberta_large_grad +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fakenews_roberta_large_grad` is a English model originally trained by Denyol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fakenews_roberta_large_grad_en_5.5.0_3.0_1727037590807.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fakenews_roberta_large_grad_en_5.5.0_3.0_1727037590807.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("fakenews_roberta_large_grad","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("fakenews_roberta_large_grad", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fakenews_roberta_large_grad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Denyol/FakeNews-roberta-large-grad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-fakenews_roberta_large_grad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-fakenews_roberta_large_grad_pipeline_en.md new file mode 100644 index 00000000000000..22cfdabc02a535 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-fakenews_roberta_large_grad_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fakenews_roberta_large_grad_pipeline pipeline RoBertaForSequenceClassification from Denyol +author: John Snow Labs +name: fakenews_roberta_large_grad_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fakenews_roberta_large_grad_pipeline` is a English model originally trained by Denyol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fakenews_roberta_large_grad_pipeline_en_5.5.0_3.0_1727037681607.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fakenews_roberta_large_grad_pipeline_en_5.5.0_3.0_1727037681607.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("fakenews_roberta_large_grad_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("fakenews_roberta_large_grad_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fakenews_roberta_large_grad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Denyol/FakeNews-roberta-large-grad + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-films_hate_offensive_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-22-films_hate_offensive_roberta_en.md new file mode 100644 index 00000000000000..00dbbec0ee7b46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-films_hate_offensive_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English films_hate_offensive_roberta RoBertaForSequenceClassification from esmarquez17 +author: John Snow Labs +name: films_hate_offensive_roberta +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`films_hate_offensive_roberta` is a English model originally trained by esmarquez17. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/films_hate_offensive_roberta_en_5.5.0_3.0_1726972212863.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/films_hate_offensive_roberta_en_5.5.0_3.0_1726972212863.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("films_hate_offensive_roberta","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("films_hate_offensive_roberta", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|films_hate_offensive_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|448.5 MB| + +## References + +https://huggingface.co/esmarquez17/films-hate-offensive-roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-films_hate_offensive_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-films_hate_offensive_roberta_pipeline_en.md new file mode 100644 index 00000000000000..842c2a3b8e8fb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-films_hate_offensive_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English films_hate_offensive_roberta_pipeline pipeline RoBertaForSequenceClassification from esmarquez17 +author: John Snow Labs +name: films_hate_offensive_roberta_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`films_hate_offensive_roberta_pipeline` is a English model originally trained by esmarquez17. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/films_hate_offensive_roberta_pipeline_en_5.5.0_3.0_1726972235315.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/films_hate_offensive_roberta_pipeline_en_5.5.0_3.0_1726972235315.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("films_hate_offensive_roberta_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("films_hate_offensive_roberta_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|films_hate_offensive_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|448.5 MB| + +## References + +https://huggingface.co/esmarquez17/films-hate-offensive-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_en.md b/docs/_posts/ahmedlone127/2024-09-22-final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_en.md new file mode 100644 index 00000000000000..527d59cd95cd4a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English final_ft__roberta_clinical_wl_spanish__70k_ultrasounds RoBertaEmbeddings from manucos +author: John Snow Labs +name: final_ft__roberta_clinical_wl_spanish__70k_ultrasounds +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_ft__roberta_clinical_wl_spanish__70k_ultrasounds` is a English model originally trained by manucos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_en_5.5.0_3.0_1726999500606.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_en_5.5.0_3.0_1726999500606.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("final_ft__roberta_clinical_wl_spanish__70k_ultrasounds","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("final_ft__roberta_clinical_wl_spanish__70k_ultrasounds","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_ft__roberta_clinical_wl_spanish__70k_ultrasounds| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|469.7 MB| + +## References + +https://huggingface.co/manucos/final-ft__roberta-clinical-wl-es__70k-ultrasounds \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_pipeline_en.md new file mode 100644 index 00000000000000..d98359a84519b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_pipeline pipeline RoBertaEmbeddings from manucos +author: John Snow Labs +name: final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_pipeline` is a English model originally trained by manucos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_pipeline_en_5.5.0_3.0_1726999522571.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_pipeline_en_5.5.0_3.0_1726999522571.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|469.7 MB| + +## References + +https://huggingface.co/manucos/final-ft__roberta-clinical-wl-es__70k-ultrasounds + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-final_model_mkbackup_en.md b/docs/_posts/ahmedlone127/2024-09-22-final_model_mkbackup_en.md new file mode 100644 index 00000000000000..146822dc15f2d4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-final_model_mkbackup_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English final_model_mkbackup WhisperForCTC from mkbackup +author: John Snow Labs +name: final_model_mkbackup +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_model_mkbackup` is a English model originally trained by mkbackup. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_model_mkbackup_en_5.5.0_3.0_1726985377035.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_model_mkbackup_en_5.5.0_3.0_1726985377035.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("final_model_mkbackup","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("final_model_mkbackup", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
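+
+The `data` DataFrame used above is assumed to already contain raw audio samples. As a minimal sketch (assumptions: a local 16 kHz mono WAV file named `sample.wav` and the external `librosa` library), it could be built like this:
+
+```python
+import librosa
+
+# Whisper expects 16 kHz mono audio; load the waveform as a float array.
+waveform, sample_rate = librosa.load("sample.wav", sr=16000)
+
+# AudioAssembler reads the float samples from the "audio_content" column set above.
+data = spark.createDataFrame([[waveform.tolist()]], ["audio_content"])
+```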
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_model_mkbackup| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/mkbackup/final_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-final_model_mkbackup_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-final_model_mkbackup_pipeline_en.md new file mode 100644 index 00000000000000..1915f0aee91d04 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-final_model_mkbackup_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English final_model_mkbackup_pipeline pipeline WhisperForCTC from mkbackup +author: John Snow Labs +name: final_model_mkbackup_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_model_mkbackup_pipeline` is a English model originally trained by mkbackup. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_model_mkbackup_pipeline_en_5.5.0_3.0_1726985458502.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_model_mkbackup_pipeline_en_5.5.0_3.0_1726985458502.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("final_model_mkbackup_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("final_model_mkbackup_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_model_mkbackup_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/mkbackup/final_model + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-final_model_thebisso09_en.md b/docs/_posts/ahmedlone127/2024-09-22-final_model_thebisso09_en.md new file mode 100644 index 00000000000000..10f2a20641daf3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-final_model_thebisso09_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English final_model_thebisso09 DistilBertForSequenceClassification from Thebisso09 +author: John Snow Labs +name: final_model_thebisso09 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_model_thebisso09` is a English model originally trained by Thebisso09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_model_thebisso09_en_5.5.0_3.0_1727033708824.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_model_thebisso09_en_5.5.0_3.0_1727033708824.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("final_model_thebisso09","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("final_model_thebisso09", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_model_thebisso09| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Thebisso09/final_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-final_model_thebisso09_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-final_model_thebisso09_pipeline_en.md new file mode 100644 index 00000000000000..8a93b37891b754 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-final_model_thebisso09_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English final_model_thebisso09_pipeline pipeline DistilBertForSequenceClassification from Thebisso09 +author: John Snow Labs +name: final_model_thebisso09_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_model_thebisso09_pipeline` is a English model originally trained by Thebisso09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_model_thebisso09_pipeline_en_5.5.0_3.0_1727033721385.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_model_thebisso09_pipeline_en_5.5.0_3.0_1727033721385.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("final_model_thebisso09_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("final_model_thebisso09_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_model_thebisso09_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Thebisso09/final_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-final_whisper_for_initial_publish_en.md b/docs/_posts/ahmedlone127/2024-09-22-final_whisper_for_initial_publish_en.md new file mode 100644 index 00000000000000..b729c5845455de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-final_whisper_for_initial_publish_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English final_whisper_for_initial_publish WhisperForCTC from AsemBadr +author: John Snow Labs +name: final_whisper_for_initial_publish +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_whisper_for_initial_publish` is a English model originally trained by AsemBadr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_whisper_for_initial_publish_en_5.5.0_3.0_1726986065765.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_whisper_for_initial_publish_en_5.5.0_3.0_1726986065765.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("final_whisper_for_initial_publish","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("final_whisper_for_initial_publish", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_whisper_for_initial_publish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/AsemBadr/final-whisper-for-initial-publish \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-final_whisper_for_initial_publish_v2_en.md b/docs/_posts/ahmedlone127/2024-09-22-final_whisper_for_initial_publish_v2_en.md new file mode 100644 index 00000000000000..98e5b916cda19c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-final_whisper_for_initial_publish_v2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English final_whisper_for_initial_publish_v2 WhisperForCTC from AsemBadr +author: John Snow Labs +name: final_whisper_for_initial_publish_v2 +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_whisper_for_initial_publish_v2` is a English model originally trained by AsemBadr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_whisper_for_initial_publish_v2_en_5.5.0_3.0_1727023247076.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_whisper_for_initial_publish_v2_en_5.5.0_3.0_1727023247076.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("final_whisper_for_initial_publish_v2","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("final_whisper_for_initial_publish_v2", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_whisper_for_initial_publish_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/AsemBadr/final-whisper-for-initial-publish-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-final_whisper_for_initial_publish_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-final_whisper_for_initial_publish_v2_pipeline_en.md new file mode 100644 index 00000000000000..64925be6693dc0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-final_whisper_for_initial_publish_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English final_whisper_for_initial_publish_v2_pipeline pipeline WhisperForCTC from AsemBadr +author: John Snow Labs +name: final_whisper_for_initial_publish_v2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_whisper_for_initial_publish_v2_pipeline` is a English model originally trained by AsemBadr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_whisper_for_initial_publish_v2_pipeline_en_5.5.0_3.0_1727023335601.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_whisper_for_initial_publish_v2_pipeline_en_5.5.0_3.0_1727023335601.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("final_whisper_for_initial_publish_v2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("final_whisper_for_initial_publish_v2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_whisper_for_initial_publish_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/AsemBadr/final-whisper-for-initial-publish-v2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-fine_tune_bert_base_cased_en.md b/docs/_posts/ahmedlone127/2024-09-22-fine_tune_bert_base_cased_en.md new file mode 100644 index 00000000000000..338245ae1af36d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-fine_tune_bert_base_cased_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English fine_tune_bert_base_cased BertForQuestionAnswering from Chessmen +author: John Snow Labs +name: fine_tune_bert_base_cased +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tune_bert_base_cased` is a English model originally trained by Chessmen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tune_bert_base_cased_en_5.5.0_3.0_1726991954461.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tune_bert_base_cased_en_5.5.0_3.0_1726991954461.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("fine_tune_bert_base_cased","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.MultiDocumentAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("fine_tune_bert_base_cased", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDS.toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
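+
+As a brief follow-up sketch (assuming the `pipelineDF` and `answer` output column from the Python example above), the extracted answer span can be read back from the result DataFrame:
+
+```python
+# "answer.result" holds the predicted answer text for each question/context pair.
+pipelineDF.select("answer.result").show(truncate=False)
+```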
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tune_bert_base_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Chessmen/fine_tune_bert-base-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-fine_tune_bert_base_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-fine_tune_bert_base_cased_pipeline_en.md new file mode 100644 index 00000000000000..884c8a8458d670 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-fine_tune_bert_base_cased_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English fine_tune_bert_base_cased_pipeline pipeline BertForQuestionAnswering from Chessmen +author: John Snow Labs +name: fine_tune_bert_base_cased_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tune_bert_base_cased_pipeline` is a English model originally trained by Chessmen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tune_bert_base_cased_pipeline_en_5.5.0_3.0_1726991972125.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tune_bert_base_cased_pipeline_en_5.5.0_3.0_1726991972125.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fine_tune_bert_base_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fine_tune_bert_base_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tune_bert_base_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Chessmen/fine_tune_bert-base-cased + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-fine_tuned_roberta_nosql_injection_en.md b/docs/_posts/ahmedlone127/2024-09-22-fine_tuned_roberta_nosql_injection_en.md new file mode 100644 index 00000000000000..4343cd4aa349a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-fine_tuned_roberta_nosql_injection_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fine_tuned_roberta_nosql_injection RoBertaEmbeddings from ankush-003 +author: John Snow Labs +name: fine_tuned_roberta_nosql_injection +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_roberta_nosql_injection` is a English model originally trained by ankush-003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_roberta_nosql_injection_en_5.5.0_3.0_1727041555348.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_roberta_nosql_injection_en_5.5.0_3.0_1727041555348.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("fine_tuned_roberta_nosql_injection","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("fine_tuned_roberta_nosql_injection","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
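
The pipeline above leaves the vectors inside the `embeddings` annotation column. A small, hedged sketch of how the per-token vectors might be inspected, assuming `pipelineDF` from the Python example above and the standard Spark NLP annotation schema:

```python
from pyspark.sql import functions as F

# Each element of "embeddings" is one annotation per token; the dense vector
# is stored in the annotation's nested `embeddings` field.
pipelineDF.select(F.explode("embeddings").alias("emb")) \
    .select(F.col("emb.result").alias("token"), F.col("emb.embeddings").alias("vector")) \
    .show(5, truncate=80)
```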
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_roberta_nosql_injection| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/ankush-003/fine-tuned-roberta-nosql-injection \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-fine_tuned_roberta_nosql_injection_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-fine_tuned_roberta_nosql_injection_pipeline_en.md new file mode 100644 index 00000000000000..f76f3d5b714439 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-fine_tuned_roberta_nosql_injection_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fine_tuned_roberta_nosql_injection_pipeline pipeline RoBertaEmbeddings from ankush-003 +author: John Snow Labs +name: fine_tuned_roberta_nosql_injection_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_roberta_nosql_injection_pipeline` is a English model originally trained by ankush-003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_roberta_nosql_injection_pipeline_en_5.5.0_3.0_1727041579383.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_roberta_nosql_injection_pipeline_en_5.5.0_3.0_1727041579383.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fine_tuned_roberta_nosql_injection_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fine_tuned_roberta_nosql_injection_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_roberta_nosql_injection_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/ankush-003/fine-tuned-roberta-nosql-injection + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetune_distilbert_sst_avalinguo_fluency_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetune_distilbert_sst_avalinguo_fluency_en.md new file mode 100644 index 00000000000000..ca781911b6f0be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetune_distilbert_sst_avalinguo_fluency_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetune_distilbert_sst_avalinguo_fluency DistilBertForSequenceClassification from papasega +author: John Snow Labs +name: finetune_distilbert_sst_avalinguo_fluency +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetune_distilbert_sst_avalinguo_fluency` is a English model originally trained by papasega. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetune_distilbert_sst_avalinguo_fluency_en_5.5.0_3.0_1726979997659.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetune_distilbert_sst_avalinguo_fluency_en_5.5.0_3.0_1726979997659.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetune_distilbert_sst_avalinguo_fluency","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetune_distilbert_sst_avalinguo_fluency", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
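
To read the prediction back, the label string sits in the `result` field of the `class` annotation column. A minimal sketch, assuming `pipelineDF` from the Python example above:

```python
# Hedged sketch: one annotation per input row; `class.result` carries the predicted label.
pipelineDF.select("text", "class.result").show(truncate=False)
```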
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetune_distilbert_sst_avalinguo_fluency| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/papasega/finetune_Distilbert_SST_Avalinguo_Fluency \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetune_distilbert_sst_avalinguo_fluency_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetune_distilbert_sst_avalinguo_fluency_pipeline_en.md new file mode 100644 index 00000000000000..2b6a0fe2b6aec7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetune_distilbert_sst_avalinguo_fluency_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetune_distilbert_sst_avalinguo_fluency_pipeline pipeline DistilBertForSequenceClassification from papasega +author: John Snow Labs +name: finetune_distilbert_sst_avalinguo_fluency_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetune_distilbert_sst_avalinguo_fluency_pipeline` is a English model originally trained by papasega. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetune_distilbert_sst_avalinguo_fluency_pipeline_en_5.5.0_3.0_1726980011756.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetune_distilbert_sst_avalinguo_fluency_pipeline_en_5.5.0_3.0_1726980011756.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetune_distilbert_sst_avalinguo_fluency_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetune_distilbert_sst_avalinguo_fluency_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
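
The snippet above references a DataFrame `df` that is never created and omits the import. A self-contained sketch is shown below; the input column name `text` is an assumption (the usual convention for these classification pipelines), not something stated on the card.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Assumed input column name: "text" (not specified on the card itself).
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("finetune_distilbert_sst_avalinguo_fluency_pipeline", lang="en")
annotations = pipeline.transform(df)

# "class" is assumed to be the classifier's output column inside the exported pipeline.
annotations.select("class.result").show(truncate=False)
```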
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetune_distilbert_sst_avalinguo_fluency_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/papasega/finetune_Distilbert_SST_Avalinguo_Fluency + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuned_demo_2x_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuned_demo_2x_en.md new file mode 100644 index 00000000000000..94d4b728b6ddf4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuned_demo_2x_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_demo_2x DistilBertForSequenceClassification from nardellu +author: John Snow Labs +name: finetuned_demo_2x +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_demo_2x` is a English model originally trained by nardellu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_demo_2x_en_5.5.0_3.0_1727020657927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_demo_2x_en_5.5.0_3.0_1727020657927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_demo_2x","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_demo_2x", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_demo_2x| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/nardellu/finetuned_demo_2X \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuned_demo_2x_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuned_demo_2x_pipeline_en.md new file mode 100644 index 00000000000000..34cd4be0125601 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuned_demo_2x_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_demo_2x_pipeline pipeline DistilBertForSequenceClassification from nardellu +author: John Snow Labs +name: finetuned_demo_2x_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_demo_2x_pipeline` is a English model originally trained by nardellu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_demo_2x_pipeline_en_5.5.0_3.0_1727020669354.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_demo_2x_pipeline_en_5.5.0_3.0_1727020669354.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_demo_2x_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_demo_2x_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_demo_2x_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/nardellu/finetuned_demo_2X + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuned_sentiment_model_imdb_distilbert_2_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuned_sentiment_model_imdb_distilbert_2_en.md new file mode 100644 index 00000000000000..07ec3d15cbfb7d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuned_sentiment_model_imdb_distilbert_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_sentiment_model_imdb_distilbert_2 DistilBertForSequenceClassification from Tzimon +author: John Snow Labs +name: finetuned_sentiment_model_imdb_distilbert_2 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_sentiment_model_imdb_distilbert_2` is a English model originally trained by Tzimon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_sentiment_model_imdb_distilbert_2_en_5.5.0_3.0_1727020969574.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_sentiment_model_imdb_distilbert_2_en_5.5.0_3.0_1727020969574.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_sentiment_model_imdb_distilbert_2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_sentiment_model_imdb_distilbert_2", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_sentiment_model_imdb_distilbert_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Tzimon/finetuned_sentiment_model_imdb_distilbert_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuned_sentiment_model_imdb_distilbert_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuned_sentiment_model_imdb_distilbert_2_pipeline_en.md new file mode 100644 index 00000000000000..a6cd3deb1bd756 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuned_sentiment_model_imdb_distilbert_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_sentiment_model_imdb_distilbert_2_pipeline pipeline DistilBertForSequenceClassification from Tzimon +author: John Snow Labs +name: finetuned_sentiment_model_imdb_distilbert_2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_sentiment_model_imdb_distilbert_2_pipeline` is a English model originally trained by Tzimon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_sentiment_model_imdb_distilbert_2_pipeline_en_5.5.0_3.0_1727020981741.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_sentiment_model_imdb_distilbert_2_pipeline_en_5.5.0_3.0_1727020981741.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_sentiment_model_imdb_distilbert_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_sentiment_model_imdb_distilbert_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_sentiment_model_imdb_distilbert_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Tzimon/finetuned_sentiment_model_imdb_distilbert_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_11000_samples_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_11000_samples_en.md new file mode 100644 index 00000000000000..e96b1ce78d1fc1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_11000_samples_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_11000_samples DistilBertForSequenceClassification from ZainabNac +author: John Snow Labs +name: finetuning_sentiment_model_11000_samples +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_11000_samples` is a English model originally trained by ZainabNac. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_11000_samples_en_5.5.0_3.0_1726980522450.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_11000_samples_en_5.5.0_3.0_1726980522450.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_11000_samples","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_11000_samples", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_11000_samples| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ZainabNac/finetuning-sentiment-model-11000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_11000_samples_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_11000_samples_pipeline_en.md new file mode 100644 index 00000000000000..48c873ec6f5106 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_11000_samples_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_11000_samples_pipeline pipeline DistilBertForSequenceClassification from ZainabNac +author: John Snow Labs +name: finetuning_sentiment_model_11000_samples_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_11000_samples_pipeline` is a English model originally trained by ZainabNac. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_11000_samples_pipeline_en_5.5.0_3.0_1726980534579.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_11000_samples_pipeline_en_5.5.0_3.0_1726980534579.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_11000_samples_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_11000_samples_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_11000_samples_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ZainabNac/finetuning-sentiment-model-11000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_ammarasmro_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_ammarasmro_en.md new file mode 100644 index 00000000000000..9e2ff3dc02fc97 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_ammarasmro_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ammarasmro DistilBertForSequenceClassification from ammarasmro +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ammarasmro +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ammarasmro` is a English model originally trained by ammarasmro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ammarasmro_en_5.5.0_3.0_1727012797675.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ammarasmro_en_5.5.0_3.0_1727012797675.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ammarasmro","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ammarasmro", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ammarasmro| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ammarasmro/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_ammarasmro_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_ammarasmro_pipeline_en.md new file mode 100644 index 00000000000000..9e471f0e9512fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_ammarasmro_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ammarasmro_pipeline pipeline DistilBertForSequenceClassification from ammarasmro +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ammarasmro_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ammarasmro_pipeline` is a English model originally trained by ammarasmro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ammarasmro_pipeline_en_5.5.0_3.0_1727012809317.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ammarasmro_pipeline_en_5.5.0_3.0_1727012809317.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_ammarasmro_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_ammarasmro_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ammarasmro_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ammarasmro/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_bianchidev_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_bianchidev_en.md new file mode 100644 index 00000000000000..00dc82ba2110d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_bianchidev_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_bianchidev DistilBertForSequenceClassification from BianchiDev +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_bianchidev +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_bianchidev` is a English model originally trained by BianchiDev. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_bianchidev_en_5.5.0_3.0_1727020687554.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_bianchidev_en_5.5.0_3.0_1727020687554.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_bianchidev","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_bianchidev", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_bianchidev| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BianchiDev/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_bianchidev_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_bianchidev_pipeline_en.md new file mode 100644 index 00000000000000..93d2d6ef1b69e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_bianchidev_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_bianchidev_pipeline pipeline DistilBertForSequenceClassification from BianchiDev +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_bianchidev_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_bianchidev_pipeline` is a English model originally trained by BianchiDev. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_bianchidev_pipeline_en_5.5.0_3.0_1727020698997.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_bianchidev_pipeline_en_5.5.0_3.0_1727020698997.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_bianchidev_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_bianchidev_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_bianchidev_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BianchiDev/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_jimbo4794_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_jimbo4794_en.md new file mode 100644 index 00000000000000..61eb30a2f1a2e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_jimbo4794_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_jimbo4794 DistilBertForSequenceClassification from Jimbo4794 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_jimbo4794 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_jimbo4794` is a English model originally trained by Jimbo4794. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_jimbo4794_en_5.5.0_3.0_1727021134459.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_jimbo4794_en_5.5.0_3.0_1727021134459.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_jimbo4794","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_jimbo4794", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_jimbo4794| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Jimbo4794/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_liujiajiaee_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_liujiajiaee_en.md new file mode 100644 index 00000000000000..21ac8b2ea6b127 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_liujiajiaee_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_liujiajiaee DistilBertForSequenceClassification from liujiajiaee +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_liujiajiaee +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_liujiajiaee` is a English model originally trained by liujiajiaee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_liujiajiaee_en_5.5.0_3.0_1726980295891.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_liujiajiaee_en_5.5.0_3.0_1726980295891.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_liujiajiaee","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_liujiajiaee", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_liujiajiaee| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/liujiajiaee/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_liujiajiaee_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_liujiajiaee_pipeline_en.md new file mode 100644 index 00000000000000..9f5e66bc79d42a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_liujiajiaee_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_liujiajiaee_pipeline pipeline DistilBertForSequenceClassification from liujiajiaee +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_liujiajiaee_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_liujiajiaee_pipeline` is a English model originally trained by liujiajiaee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_liujiajiaee_pipeline_en_5.5.0_3.0_1726980307608.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_liujiajiaee_pipeline_en_5.5.0_3.0_1726980307608.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_liujiajiaee_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_liujiajiaee_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_liujiajiaee_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/liujiajiaee/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_lwhite_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_lwhite_en.md new file mode 100644 index 00000000000000..65b99b79a47e71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_lwhite_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_lwhite DistilBertForSequenceClassification from lwhite +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_lwhite +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_lwhite` is a English model originally trained by lwhite. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_lwhite_en_5.5.0_3.0_1726980305501.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_lwhite_en_5.5.0_3.0_1726980305501.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_lwhite","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_lwhite", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_lwhite| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lwhite/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_lwhite_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_lwhite_pipeline_en.md new file mode 100644 index 00000000000000..09612af17adcad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_lwhite_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_lwhite_pipeline pipeline DistilBertForSequenceClassification from lwhite +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_lwhite_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_lwhite_pipeline` is a English model originally trained by lwhite. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_lwhite_pipeline_en_5.5.0_3.0_1726980318052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_lwhite_pipeline_en_5.5.0_3.0_1726980318052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_lwhite_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_lwhite_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_lwhite_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lwhite/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_neo111x_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_neo111x_en.md new file mode 100644 index 00000000000000..69399fa0f89f4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_neo111x_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_neo111x DistilBertForSequenceClassification from Neo111x +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_neo111x +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_neo111x` is a English model originally trained by Neo111x. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_neo111x_en_5.5.0_3.0_1727020393301.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_neo111x_en_5.5.0_3.0_1727020393301.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_neo111x","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_neo111x", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
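After the pipeline has run, the predictions are stored in the `class` column configured above as Spark NLP annotations; a quick way to look at them:

```python
# Show each input text next to its predicted label
pipelineDF.select("text", "class.result").show(truncate=False)
```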
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_neo111x| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Neo111x/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_5000_samples_yvillamil_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_5000_samples_yvillamil_en.md new file mode 100644 index 00000000000000..3a0fddf865e761 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_5000_samples_yvillamil_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_5000_samples_yvillamil DistilBertForSequenceClassification from yvillamil +author: John Snow Labs +name: finetuning_sentiment_model_5000_samples_yvillamil +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_5000_samples_yvillamil` is a English model originally trained by yvillamil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_samples_yvillamil_en_5.5.0_3.0_1727035271767.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_samples_yvillamil_en_5.5.0_3.0_1727035271767.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_5000_samples_yvillamil","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_5000_samples_yvillamil", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_5000_samples_yvillamil| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yvillamil/finetuning-sentiment-model-5000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_5000_samples_yvillamil_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_5000_samples_yvillamil_pipeline_en.md new file mode 100644 index 00000000000000..804318d441d5d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_5000_samples_yvillamil_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_5000_samples_yvillamil_pipeline pipeline DistilBertForSequenceClassification from yvillamil +author: John Snow Labs +name: finetuning_sentiment_model_5000_samples_yvillamil_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_5000_samples_yvillamil_pipeline` is a English model originally trained by yvillamil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_samples_yvillamil_pipeline_en_5.5.0_3.0_1727035284169.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_samples_yvillamil_pipeline_en_5.5.0_3.0_1727035284169.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("finetuning_sentiment_model_5000_samples_yvillamil_pipeline", lang = "en")
# df: a DataFrame with a "text" column holding the documents to classify
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_5000_samples_yvillamil_pipeline", lang = "en")
// df: a DataFrame with a "text" column holding the documents to classify
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_5000_samples_yvillamil_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yvillamil/finetuning-sentiment-model-5000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_nerproject7_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_nerproject7_en.md new file mode 100644 index 00000000000000..30451bbc02aff0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_nerproject7_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_nerproject7 DistilBertForSequenceClassification from nerproject7 +author: John Snow Labs +name: finetuning_sentiment_model_nerproject7 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_nerproject7` is a English model originally trained by nerproject7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_nerproject7_en_5.5.0_3.0_1726980204184.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_nerproject7_en_5.5.0_3.0_1726980204184.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_nerproject7","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_nerproject7", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_nerproject7| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/nerproject7/finetuning-sentiment-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_nerproject7_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_nerproject7_pipeline_en.md new file mode 100644 index 00000000000000..f3060157850681 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_nerproject7_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_nerproject7_pipeline pipeline DistilBertForSequenceClassification from nerproject7 +author: John Snow Labs +name: finetuning_sentiment_model_nerproject7_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_nerproject7_pipeline` is a English model originally trained by nerproject7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_nerproject7_pipeline_en_5.5.0_3.0_1726980216523.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_nerproject7_pipeline_en_5.5.0_3.0_1726980216523.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("finetuning_sentiment_model_nerproject7_pipeline", lang = "en")
# df: a DataFrame with a "text" column holding the documents to classify
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_nerproject7_pipeline", lang = "en")
// df: a DataFrame with a "text" column holding the documents to classify
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_nerproject7_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/nerproject7/finetuning-sentiment-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_qiuxuan_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_qiuxuan_en.md new file mode 100644 index 00000000000000..868b6638e5cbb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_qiuxuan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_qiuxuan DistilBertForSequenceClassification from Qiuxuan +author: John Snow Labs +name: finetuning_sentiment_model_qiuxuan +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_qiuxuan` is a English model originally trained by Qiuxuan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_qiuxuan_en_5.5.0_3.0_1726980555617.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_qiuxuan_en_5.5.0_3.0_1726980555617.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_qiuxuan","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_qiuxuan", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_qiuxuan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Qiuxuan/finetuning-sentiment-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_qiuxuan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_qiuxuan_pipeline_en.md new file mode 100644 index 00000000000000..a2e42faff1aae5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_qiuxuan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_qiuxuan_pipeline pipeline DistilBertForSequenceClassification from Qiuxuan +author: John Snow Labs +name: finetuning_sentiment_model_qiuxuan_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_qiuxuan_pipeline` is a English model originally trained by Qiuxuan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_qiuxuan_pipeline_en_5.5.0_3.0_1726980567325.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_qiuxuan_pipeline_en_5.5.0_3.0_1726980567325.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("finetuning_sentiment_model_qiuxuan_pipeline", lang = "en")
# df: a DataFrame with a "text" column holding the documents to classify
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_qiuxuan_pipeline", lang = "en")
// df: a DataFrame with a "text" column holding the documents to classify
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_qiuxuan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Qiuxuan/finetuning-sentiment-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-frugalscore_medium_deberta_bert_score_en.md b/docs/_posts/ahmedlone127/2024-09-22-frugalscore_medium_deberta_bert_score_en.md new file mode 100644 index 00000000000000..8142016044be9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-frugalscore_medium_deberta_bert_score_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English frugalscore_medium_deberta_bert_score BertForSequenceClassification from moussaKam +author: John Snow Labs +name: frugalscore_medium_deberta_bert_score +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`frugalscore_medium_deberta_bert_score` is a English model originally trained by moussaKam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frugalscore_medium_deberta_bert_score_en_5.5.0_3.0_1727031985606.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frugalscore_medium_deberta_bert_score_en_5.5.0_3.0_1727031985606.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("frugalscore_medium_deberta_bert_score","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("frugalscore_medium_deberta_bert_score", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
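Each annotation in the `class` column configured above typically also carries the per-label confidence scores in its metadata; a small sketch for inspecting both:

```python
from pyspark.sql.functions import explode

# One row per prediction, showing the label and its score metadata
pipelineDF.select(explode("class").alias("pred")) \
    .select("pred.result", "pred.metadata") \
    .show(truncate=False)
```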
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frugalscore_medium_deberta_bert_score| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|155.2 MB| + +## References + +https://huggingface.co/moussaKam/frugalscore_medium_deberta_bert-score \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-frugalscore_medium_roberta_bert_score_en.md b/docs/_posts/ahmedlone127/2024-09-22-frugalscore_medium_roberta_bert_score_en.md new file mode 100644 index 00000000000000..93486b94dd309b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-frugalscore_medium_roberta_bert_score_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English frugalscore_medium_roberta_bert_score BertForSequenceClassification from moussaKam +author: John Snow Labs +name: frugalscore_medium_roberta_bert_score +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`frugalscore_medium_roberta_bert_score` is a English model originally trained by moussaKam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frugalscore_medium_roberta_bert_score_en_5.5.0_3.0_1727034480985.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frugalscore_medium_roberta_bert_score_en_5.5.0_3.0_1727034480985.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("frugalscore_medium_roberta_bert_score","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("frugalscore_medium_roberta_bert_score", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frugalscore_medium_roberta_bert_score| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|155.2 MB| + +## References + +https://huggingface.co/moussaKam/frugalscore_medium_roberta_bert-score \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-frugalscore_medium_roberta_bert_score_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-frugalscore_medium_roberta_bert_score_pipeline_en.md new file mode 100644 index 00000000000000..bc3519dd126394 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-frugalscore_medium_roberta_bert_score_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English frugalscore_medium_roberta_bert_score_pipeline pipeline BertForSequenceClassification from moussaKam +author: John Snow Labs +name: frugalscore_medium_roberta_bert_score_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`frugalscore_medium_roberta_bert_score_pipeline` is a English model originally trained by moussaKam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frugalscore_medium_roberta_bert_score_pipeline_en_5.5.0_3.0_1727034489087.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frugalscore_medium_roberta_bert_score_pipeline_en_5.5.0_3.0_1727034489087.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("frugalscore_medium_roberta_bert_score_pipeline", lang = "en")
# df: a DataFrame with a "text" column holding the documents to classify
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("frugalscore_medium_roberta_bert_score_pipeline", lang = "en")
// df: a DataFrame with a "text" column holding the documents to classify
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frugalscore_medium_roberta_bert_score_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|155.2 MB| + +## References + +https://huggingface.co/moussaKam/frugalscore_medium_roberta_bert-score + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ft_distilbert_base_uncased_nlp_feup_en.md b/docs/_posts/ahmedlone127/2024-09-22-ft_distilbert_base_uncased_nlp_feup_en.md new file mode 100644 index 00000000000000..628db2ceb9a5fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ft_distilbert_base_uncased_nlp_feup_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ft_distilbert_base_uncased_nlp_feup DistilBertForSequenceClassification from NLP-FEUP +author: John Snow Labs +name: ft_distilbert_base_uncased_nlp_feup +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ft_distilbert_base_uncased_nlp_feup` is a English model originally trained by NLP-FEUP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ft_distilbert_base_uncased_nlp_feup_en_5.5.0_3.0_1727035506995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ft_distilbert_base_uncased_nlp_feup_en_5.5.0_3.0_1727035506995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("ft_distilbert_base_uncased_nlp_feup","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("ft_distilbert_base_uncased_nlp_feup", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ft_distilbert_base_uncased_nlp_feup| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/NLP-FEUP/FT-distilbert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-fyp_en.md b/docs/_posts/ahmedlone127/2024-09-22-fyp_en.md new file mode 100644 index 00000000000000..1cf759c53f520b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-fyp_en.md @@ -0,0 +1,88 @@ +--- +layout: model +title: English fyp T5Transformer from yaashwardhan +author: John Snow Labs +name: fyp +date: 2024-09-22 +tags: [en, open_source, onnx, t5, question_answering, summarization, translation, text_generation] +task: [Question Answering, Summarization, Translation, Text Generation] +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained T5Transformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fyp` is a English model originally trained by yaashwardhan. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fyp_en_5.5.0_3.0_1727034850196.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fyp_en_5.5.0_3.0_1727034850196.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

t5 = T5Transformer.pretrained("fyp","en") \
    .setInputCols(["document"]) \
    .setOutputCol("output")

pipeline = Pipeline().setStages([documentAssembler, t5])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val t5 = T5Transformer.pretrained("fyp", "en")
    .setInputCols(Array("document"))
    .setOutputCol("output")

val pipeline = new Pipeline().setStages(Array(documentAssembler, t5))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
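The generated text lands in the `output` column set on the T5 stage above; for example:

```python
# Print the raw generated sequences
pipelineDF.select("output.result").show(truncate=False)
```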
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fyp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +References + +https://huggingface.co/yaashwardhan/fyp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-fyp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-fyp_pipeline_en.md new file mode 100644 index 00000000000000..e2eebabe3dbeaf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-fyp_pipeline_en.md @@ -0,0 +1,72 @@ +--- +layout: model +title: English fyp_pipeline pipeline T5Transformer from yaashwardhan +author: John Snow Labs +name: fyp_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: [Question Answering, Summarization, Translation, Text Generation] +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained T5Transformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fyp_pipeline` is a English model originally trained by yaashwardhan. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fyp_pipeline_en_5.5.0_3.0_1727034870402.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fyp_pipeline_en_5.5.0_3.0_1727034870402.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("fyp_pipeline", lang = "en")
# df: a DataFrame with a "text" column holding the input documents
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("fyp_pipeline", lang = "en")
// df: a DataFrame with a "text" column holding the input documents
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fyp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.0 MB| + +## References + +References + +https://huggingface.co/yaashwardhan/fyp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-genztranscribe_base_hindi_en.md b/docs/_posts/ahmedlone127/2024-09-22-genztranscribe_base_hindi_en.md new file mode 100644 index 00000000000000..51e1f412eeab38 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-genztranscribe_base_hindi_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English genztranscribe_base_hindi WhisperForCTC from KshitizPandya +author: John Snow Labs +name: genztranscribe_base_hindi +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`genztranscribe_base_hindi` is a English model originally trained by KshitizPandya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/genztranscribe_base_hindi_en_5.5.0_3.0_1726996369581.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/genztranscribe_base_hindi_en_5.5.0_3.0_1726996369581.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("genztranscribe_base_hindi","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
# data: a DataFrame with an "audio_content" column holding the raw audio as an array of floats
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("genztranscribe_base_hindi", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
// data: a DataFrame with an "audio_content" column holding the raw audio as an array of floats
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
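Assuming `data` holds the raw audio as described in the comments above, the transcriptions end up in the `text` column configured on the Whisper stage:

```python
# Show the recognized transcript for each audio row
pipelineDF.select("text.result").show(truncate=False)
```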
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|genztranscribe_base_hindi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|643.6 MB| + +## References + +https://huggingface.co/KshitizPandya/GenzTranscribe-base-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-genztranscribe_base_hindi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-genztranscribe_base_hindi_pipeline_en.md new file mode 100644 index 00000000000000..5b288edc521098 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-genztranscribe_base_hindi_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English genztranscribe_base_hindi_pipeline pipeline WhisperForCTC from KshitizPandya +author: John Snow Labs +name: genztranscribe_base_hindi_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`genztranscribe_base_hindi_pipeline` is a English model originally trained by KshitizPandya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/genztranscribe_base_hindi_pipeline_en_5.5.0_3.0_1726996400983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/genztranscribe_base_hindi_pipeline_en_5.5.0_3.0_1726996400983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("genztranscribe_base_hindi_pipeline", lang = "en")
# df: a DataFrame with an "audio_content" column holding the raw audio as an array of floats
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("genztranscribe_base_hindi_pipeline", lang = "en")
// df: a DataFrame with an "audio_content" column holding the raw audio as an array of floats
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|genztranscribe_base_hindi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|643.6 MB| + +## References + +https://huggingface.co/KshitizPandya/GenzTranscribe-base-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-gs3n_roberta_model_es.md b/docs/_posts/ahmedlone127/2024-09-22-gs3n_roberta_model_es.md new file mode 100644 index 00000000000000..38ea5ae5107bfd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-gs3n_roberta_model_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish gs3n_roberta_model RoBertaForSequenceClassification from erickdp +author: John Snow Labs +name: gs3n_roberta_model +date: 2024-09-22 +tags: [es, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gs3n_roberta_model` is a Castilian, Spanish model originally trained by erickdp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gs3n_roberta_model_es_5.5.0_3.0_1727017134506.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gs3n_roberta_model_es_5.5.0_3.0_1727017134506.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("gs3n_roberta_model","es") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("gs3n_roberta_model", "es")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
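Since this is a Spanish model, a Spanish input is the more natural smoke test; a hypothetical variation of the example above (the sentence is illustrative only):

```python
# Any DataFrame with a "text" column works; here with a Spanish example sentence
data = spark.createDataFrame([["Me encanta Spark NLP"]]).toDF("text")
pipeline.fit(data).transform(data).select("text", "class.result").show(truncate=False)
```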
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gs3n_roberta_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|es| +|Size:|408.3 MB| + +## References + +https://huggingface.co/erickdp/gs3n-roberta-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-gs3n_roberta_model_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-22-gs3n_roberta_model_pipeline_es.md new file mode 100644 index 00000000000000..5ebd95fd36b51f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-gs3n_roberta_model_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish gs3n_roberta_model_pipeline pipeline RoBertaForSequenceClassification from erickdp +author: John Snow Labs +name: gs3n_roberta_model_pipeline +date: 2024-09-22 +tags: [es, open_source, pipeline, onnx] +task: Text Classification +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gs3n_roberta_model_pipeline` is a Castilian, Spanish model originally trained by erickdp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gs3n_roberta_model_pipeline_es_5.5.0_3.0_1727017153221.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gs3n_roberta_model_pipeline_es_5.5.0_3.0_1727017153221.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("gs3n_roberta_model_pipeline", lang = "es")
# df: a DataFrame with a "text" column holding the documents to classify
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("gs3n_roberta_model_pipeline", lang = "es")
// df: a DataFrame with a "text" column holding the documents to classify
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gs3n_roberta_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|408.3 MB| + +## References + +https://huggingface.co/erickdp/gs3n-roberta-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m_en.md b/docs/_posts/ahmedlone127/2024-09-22-hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m_en.md new file mode 100644 index 00000000000000..9f2e5ab7550120 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1726972267089.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1726972267089.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random0_seed1-twitter-roberta-base-2022-154m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m_pipeline_en.md new file mode 100644 index 00000000000000..33536110c9345e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1726972289185.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1726972289185.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m_pipeline", lang = "en")
# df: a DataFrame with a "text" column holding the documents to classify
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m_pipeline", lang = "en")
// df: a DataFrame with a "text" column holding the documents to classify
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random0_seed1-twitter-roberta-base-2022-154m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hate_hate_balance_random3_seed1_bertweet_large_en.md b/docs/_posts/ahmedlone127/2024-09-22-hate_hate_balance_random3_seed1_bertweet_large_en.md new file mode 100644 index 00000000000000..510d88f606a683 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hate_hate_balance_random3_seed1_bertweet_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_balance_random3_seed1_bertweet_large RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random3_seed1_bertweet_large +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random3_seed1_bertweet_large` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed1_bertweet_large_en_5.5.0_3.0_1727027488301.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed1_bertweet_large_en_5.5.0_3.0_1727027488301.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random3_seed1_bertweet_large","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random3_seed1_bertweet_large", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random3_seed1_bertweet_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random3_seed1-bertweet-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hate_hate_balance_random3_seed1_bertweet_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-hate_hate_balance_random3_seed1_bertweet_large_pipeline_en.md new file mode 100644 index 00000000000000..cce7329fcb97ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hate_hate_balance_random3_seed1_bertweet_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_balance_random3_seed1_bertweet_large_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random3_seed1_bertweet_large_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random3_seed1_bertweet_large_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed1_bertweet_large_pipeline_en_5.5.0_3.0_1727027584939.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed1_bertweet_large_pipeline_en_5.5.0_3.0_1727027584939.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Load the pretrained pipeline and run it on a DataFrame with a "text" column
+pipeline = PretrainedPipeline("hate_hate_balance_random3_seed1_bertweet_large_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Load the pretrained pipeline and run it on a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("hate_hate_balance_random3_seed1_bertweet_large_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
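+
+The pipeline can also be applied to a single string without building a DataFrame. This is a minimal sketch, assuming `annotate()` is called on the loaded `PretrainedPipeline` and that the classifier writes to the `class` output column:
+
+```python
+# annotate() returns a plain Python dict keyed by output column name
+single = pipeline.annotate("I love spark-nlp")
+print(single["class"])
+```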
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random3_seed1_bertweet_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random3_seed1-bertweet-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hate_hate_random2_seed2_twitter_roberta_large_2022_154m_en.md b/docs/_posts/ahmedlone127/2024-09-22-hate_hate_random2_seed2_twitter_roberta_large_2022_154m_en.md new file mode 100644 index 00000000000000..6981e8b1e9d146 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hate_hate_random2_seed2_twitter_roberta_large_2022_154m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_random2_seed2_twitter_roberta_large_2022_154m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_random2_seed2_twitter_roberta_large_2022_154m +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_random2_seed2_twitter_roberta_large_2022_154m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_random2_seed2_twitter_roberta_large_2022_154m_en_5.5.0_3.0_1727027417241.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_random2_seed2_twitter_roberta_large_2022_154m_en_5.5.0_3.0_1727027417241.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assemble raw text into document annotations
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# Load the pretrained classifier on the document and token columns
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_random2_seed2_twitter_roberta_large_2022_154m","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_random2_seed2_twitter_roberta_large_2022_154m", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_random2_seed2_twitter_roberta_large_2022_154m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_random2_seed2-twitter-roberta-large-2022-154m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hate_hate_random2_seed2_twitter_roberta_large_2022_154m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-hate_hate_random2_seed2_twitter_roberta_large_2022_154m_pipeline_en.md new file mode 100644 index 00000000000000..3cd79ed0f0bfa5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hate_hate_random2_seed2_twitter_roberta_large_2022_154m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_random2_seed2_twitter_roberta_large_2022_154m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_random2_seed2_twitter_roberta_large_2022_154m_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_random2_seed2_twitter_roberta_large_2022_154m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_random2_seed2_twitter_roberta_large_2022_154m_pipeline_en_5.5.0_3.0_1727027483050.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_random2_seed2_twitter_roberta_large_2022_154m_pipeline_en_5.5.0_3.0_1727027483050.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Load the pretrained pipeline and run it on a DataFrame with a "text" column
+pipeline = PretrainedPipeline("hate_hate_random2_seed2_twitter_roberta_large_2022_154m_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Load the pretrained pipeline and run it on a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("hate_hate_random2_seed2_twitter_roberta_large_2022_154m_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_random2_seed2_twitter_roberta_large_2022_154m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_random2_seed2-twitter-roberta-large-2022-154m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hello_classification_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-hello_classification_model_pipeline_en.md new file mode 100644 index 00000000000000..9351f8c1cce6ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hello_classification_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hello_classification_model_pipeline pipeline DistilBertForSequenceClassification from krishnareddy +author: John Snow Labs +name: hello_classification_model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hello_classification_model_pipeline` is a English model originally trained by krishnareddy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hello_classification_model_pipeline_en_5.5.0_3.0_1727012467443.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hello_classification_model_pipeline_en_5.5.0_3.0_1727012467443.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Load the pretrained pipeline and run it on a DataFrame with a "text" column
+pipeline = PretrainedPipeline("hello_classification_model_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Load the pretrained pipeline and run it on a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("hello_classification_model_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hello_classification_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/krishnareddy/hello_classification_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hin_trac1_fin_en.md b/docs/_posts/ahmedlone127/2024-09-22-hin_trac1_fin_en.md new file mode 100644 index 00000000000000..5f4c081f304a21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hin_trac1_fin_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hin_trac1_fin BertForSequenceClassification from Maha +author: John Snow Labs +name: hin_trac1_fin +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hin_trac1_fin` is a English model originally trained by Maha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hin_trac1_fin_en_5.5.0_3.0_1726988626300.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hin_trac1_fin_en_5.5.0_3.0_1726988626300.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assemble raw text into document annotations
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# Load the pretrained classifier on the document and token columns
+sequenceClassifier = BertForSequenceClassification.pretrained("hin_trac1_fin","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("hin_trac1_fin", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hin_trac1_fin| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|667.3 MB| + +## References + +https://huggingface.co/Maha/hin-trac1_fin \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hin_trac1_fin_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-hin_trac1_fin_pipeline_en.md new file mode 100644 index 00000000000000..1fb78c218565f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hin_trac1_fin_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hin_trac1_fin_pipeline pipeline BertForSequenceClassification from Maha +author: John Snow Labs +name: hin_trac1_fin_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hin_trac1_fin_pipeline` is a English model originally trained by Maha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hin_trac1_fin_pipeline_en_5.5.0_3.0_1726988655446.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hin_trac1_fin_pipeline_en_5.5.0_3.0_1726988655446.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Load the pretrained pipeline and run it on a DataFrame with a "text" column
+pipeline = PretrainedPipeline("hin_trac1_fin_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Load the pretrained pipeline and run it on a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("hin_trac1_fin_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hin_trac1_fin_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|667.3 MB| + +## References + +https://huggingface.co/Maha/hin-trac1_fin + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hindi_codemixed_abusive_muril_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-hindi_codemixed_abusive_muril_pipeline_en.md new file mode 100644 index 00000000000000..a1ce0ef69e3e18 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hindi_codemixed_abusive_muril_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hindi_codemixed_abusive_muril_pipeline pipeline BertForSequenceClassification from Hate-speech-CNERG +author: John Snow Labs +name: hindi_codemixed_abusive_muril_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hindi_codemixed_abusive_muril_pipeline` is a English model originally trained by Hate-speech-CNERG. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hindi_codemixed_abusive_muril_pipeline_en_5.5.0_3.0_1727032243043.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hindi_codemixed_abusive_muril_pipeline_en_5.5.0_3.0_1727032243043.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Load the pretrained pipeline and run it on a DataFrame with a "text" column
+pipeline = PretrainedPipeline("hindi_codemixed_abusive_muril_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Load the pretrained pipeline and run it on a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("hindi_codemixed_abusive_muril_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hindi_codemixed_abusive_muril_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|892.7 MB| + +## References + +https://huggingface.co/Hate-speech-CNERG/hindi-codemixed-abusive-MuRIL + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hindi_wordpiece_bert_test_2m_en.md b/docs/_posts/ahmedlone127/2024-09-22-hindi_wordpiece_bert_test_2m_en.md new file mode 100644 index 00000000000000..8b90382d0f7fa1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hindi_wordpiece_bert_test_2m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hindi_wordpiece_bert_test_2m BertEmbeddings from rg1683 +author: John Snow Labs +name: hindi_wordpiece_bert_test_2m +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hindi_wordpiece_bert_test_2m` is a English model originally trained by rg1683. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hindi_wordpiece_bert_test_2m_en_5.5.0_3.0_1727008149400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hindi_wordpiece_bert_test_2m_en_5.5.0_3.0_1727008149400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assemble raw text into document annotations
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+# Load the pretrained embeddings on the document and token columns
+embeddings = BertEmbeddings.pretrained("hindi_wordpiece_bert_test_2m","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = BertEmbeddings.pretrained("hindi_wordpiece_bert_test_2m","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
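+
+To work with the vectors themselves, each token's embedding can be unpacked from the `embeddings` column. A minimal sketch, assuming the `pipelineDF` DataFrame produced above:
+
+```python
+from pyspark.sql.functions import explode
+
+# One row per token, with the token text and its embedding vector
+pipelineDF.select(explode("embeddings").alias("emb")) \
+    .selectExpr("emb.result as token", "emb.embeddings as vector") \
+    .show(truncate=80)
+```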
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hindi_wordpiece_bert_test_2m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|377.7 MB| + +## References + +https://huggingface.co/rg1683/hindi_wordpiece_bert_test_2m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hindi_wordpiece_bert_test_2m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-hindi_wordpiece_bert_test_2m_pipeline_en.md new file mode 100644 index 00000000000000..bd91ff4be01a8e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hindi_wordpiece_bert_test_2m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hindi_wordpiece_bert_test_2m_pipeline pipeline BertEmbeddings from rg1683 +author: John Snow Labs +name: hindi_wordpiece_bert_test_2m_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hindi_wordpiece_bert_test_2m_pipeline` is a English model originally trained by rg1683. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hindi_wordpiece_bert_test_2m_pipeline_en_5.5.0_3.0_1727008166350.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hindi_wordpiece_bert_test_2m_pipeline_en_5.5.0_3.0_1727008166350.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Load the pretrained pipeline and run it on a DataFrame with a "text" column
+pipeline = PretrainedPipeline("hindi_wordpiece_bert_test_2m_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Load the pretrained pipeline and run it on a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("hindi_wordpiece_bert_test_2m_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hindi_wordpiece_bert_test_2m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|377.7 MB| + +## References + +https://huggingface.co/rg1683/hindi_wordpiece_bert_test_2m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hp_search_deberta_en.md b/docs/_posts/ahmedlone127/2024-09-22-hp_search_deberta_en.md new file mode 100644 index 00000000000000..7c1385a152f48c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hp_search_deberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hp_search_deberta BertForTokenClassification from cynthiachan +author: John Snow Labs +name: hp_search_deberta +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hp_search_deberta` is a English model originally trained by cynthiachan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hp_search_deberta_en_5.5.0_3.0_1726977667543.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hp_search_deberta_en_5.5.0_3.0_1726977667543.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assemble raw text into document annotations
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# Load the pretrained token classifier on the document and token columns
+tokenClassifier = BertForTokenClassification.pretrained("hp_search_deberta","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("hp_search_deberta", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
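+
+Once the pipeline has been run, the token-level predictions line up one-to-one with the tokens. A minimal sketch, assuming the `pipelineDF` DataFrame produced above:
+
+```python
+# Tokens and their predicted tags, one array per input row (illustrative only)
+pipelineDF.selectExpr("token.result as tokens", "ner.result as tags").show(truncate=False)
+```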
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hp_search_deberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.8 MB| + +## References + +https://huggingface.co/cynthiachan/hp-search-deberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hp_search_deberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-hp_search_deberta_pipeline_en.md new file mode 100644 index 00000000000000..301ded326761c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hp_search_deberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hp_search_deberta_pipeline pipeline BertForTokenClassification from cynthiachan +author: John Snow Labs +name: hp_search_deberta_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hp_search_deberta_pipeline` is a English model originally trained by cynthiachan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hp_search_deberta_pipeline_en_5.5.0_3.0_1726977685632.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hp_search_deberta_pipeline_en_5.5.0_3.0_1726977685632.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Load the pretrained pipeline and run it on a DataFrame with a "text" column
+pipeline = PretrainedPipeline("hp_search_deberta_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Load the pretrained pipeline and run it on a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("hp_search_deberta_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hp_search_deberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.8 MB| + +## References + +https://huggingface.co/cynthiachan/hp-search-deberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hw01_acezkevinz_en.md b/docs/_posts/ahmedlone127/2024-09-22-hw01_acezkevinz_en.md new file mode 100644 index 00000000000000..58fd80985a94ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hw01_acezkevinz_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hw01_acezkevinz DistilBertForSequenceClassification from AcEzKeViNz +author: John Snow Labs +name: hw01_acezkevinz +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hw01_acezkevinz` is a English model originally trained by AcEzKeViNz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hw01_acezkevinz_en_5.5.0_3.0_1727033596874.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hw01_acezkevinz_en_5.5.0_3.0_1727033596874.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assemble raw text into document annotations
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# Load the pretrained classifier on the document and token columns
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("hw01_acezkevinz","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("hw01_acezkevinz", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
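+
+For low-latency inference on individual documents, the fitted pipeline can be wrapped in a `LightPipeline`, which skips the DataFrame machinery. A minimal sketch, assuming the `pipelineModel` fitted above:
+
+```python
+from sparknlp.base import LightPipeline
+
+# Annotate a single string without going through a Spark DataFrame
+light = LightPipeline(pipelineModel)
+print(light.annotate("I love spark-nlp")["class"])
+```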
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hw01_acezkevinz| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AcEzKeViNz/HW01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hw01_acezkevinz_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-hw01_acezkevinz_pipeline_en.md new file mode 100644 index 00000000000000..0d7433a3c83e33 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hw01_acezkevinz_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hw01_acezkevinz_pipeline pipeline DistilBertForSequenceClassification from AcEzKeViNz +author: John Snow Labs +name: hw01_acezkevinz_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hw01_acezkevinz_pipeline` is a English model originally trained by AcEzKeViNz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hw01_acezkevinz_pipeline_en_5.5.0_3.0_1727033618897.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hw01_acezkevinz_pipeline_en_5.5.0_3.0_1727033618897.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Load the pretrained pipeline and run it on a DataFrame with a "text" column
+pipeline = PretrainedPipeline("hw01_acezkevinz_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Load the pretrained pipeline and run it on a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("hw01_acezkevinz_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hw01_acezkevinz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AcEzKeViNz/HW01 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-imbalanced_model_title_9_en.md b/docs/_posts/ahmedlone127/2024-09-22-imbalanced_model_title_9_en.md new file mode 100644 index 00000000000000..0f6204b7890de3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-imbalanced_model_title_9_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English imbalanced_model_title_9 RoBertaForSequenceClassification from amishshah +author: John Snow Labs +name: imbalanced_model_title_9 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imbalanced_model_title_9` is a English model originally trained by amishshah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imbalanced_model_title_9_en_5.5.0_3.0_1727037143868.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imbalanced_model_title_9_en_5.5.0_3.0_1727037143868.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assemble raw text into document annotations
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# Load the pretrained classifier on the document and token columns
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("imbalanced_model_title_9","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("imbalanced_model_title_9", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
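+
+Beyond the top label, each classification annotation also carries a metadata map that typically holds the per-label scores. This is a sketch under that assumption (the exact metadata keys depend on the annotator), using the `pipelineDF` DataFrame produced above:
+
+```python
+from pyspark.sql.functions import explode
+
+# Inspect the label together with the raw metadata map attached to each annotation
+pipelineDF.select(explode("class").alias("c")) \
+    .selectExpr("c.result as label", "c.metadata as metadata") \
+    .show(truncate=False)
+```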
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imbalanced_model_title_9| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|465.8 MB| + +## References + +https://huggingface.co/amishshah/imbalanced_model_title_9 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-imbalanced_model_title_9_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-imbalanced_model_title_9_pipeline_en.md new file mode 100644 index 00000000000000..42ae3b936e4cc7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-imbalanced_model_title_9_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English imbalanced_model_title_9_pipeline pipeline RoBertaForSequenceClassification from amishshah +author: John Snow Labs +name: imbalanced_model_title_9_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imbalanced_model_title_9_pipeline` is a English model originally trained by amishshah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imbalanced_model_title_9_pipeline_en_5.5.0_3.0_1727037169832.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imbalanced_model_title_9_pipeline_en_5.5.0_3.0_1727037169832.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Load the pretrained pipeline and run it on a DataFrame with a "text" column
+pipeline = PretrainedPipeline("imbalanced_model_title_9_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Load the pretrained pipeline and run it on a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("imbalanced_model_title_9_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imbalanced_model_title_9_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.8 MB| + +## References + +https://huggingface.co/amishshah/imbalanced_model_title_9 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-imdb2_en.md b/docs/_posts/ahmedlone127/2024-09-22-imdb2_en.md new file mode 100644 index 00000000000000..905d1a67ca6814 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-imdb2_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English imdb2 DistilBertForSequenceClassification from Joestars +author: John Snow Labs +name: imdb2 +date: 2024-09-22 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdb2` is a English model originally trained by Joestars. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdb2_en_5.5.0_3.0_1726990825082.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdb2_en_5.5.0_3.0_1726990825082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+document_assembler = DocumentAssembler()\
+    .setInputCol("text")\
+    .setOutputCol("document")
+
+tokenizer = Tokenizer()\
+    .setInputCols("document")\
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("imdb2","en")\
+    .setInputCols(["document","token"])\
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val document_assembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("imdb2","en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, sequenceClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdb2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +References + +https://huggingface.co/Joestars/imdb2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-imdb2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-imdb2_pipeline_en.md new file mode 100644 index 00000000000000..3b21e5772c8c94 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-imdb2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English imdb2_pipeline pipeline BertForSequenceClassification from Lumos +author: John Snow Labs +name: imdb2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdb2_pipeline` is a English model originally trained by Lumos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdb2_pipeline_en_5.5.0_3.0_1726990845184.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdb2_pipeline_en_5.5.0_3.0_1726990845184.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Load the pretrained pipeline and run it on a DataFrame with a "text" column
+pipeline = PretrainedPipeline("imdb2_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Load the pretrained pipeline and run it on a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("imdb2_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdb2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Lumos/imdb2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-imdb4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-imdb4_pipeline_en.md new file mode 100644 index 00000000000000..f974fb4a112239 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-imdb4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English imdb4_pipeline pipeline BertForSequenceClassification from Lumos +author: John Snow Labs +name: imdb4_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdb4_pipeline` is a English model originally trained by Lumos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdb4_pipeline_en_5.5.0_3.0_1727032523653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdb4_pipeline_en_5.5.0_3.0_1727032523653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Load the pretrained pipeline and run it on a DataFrame with a "text" column
+pipeline = PretrainedPipeline("imdb4_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Load the pretrained pipeline and run it on a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("imdb4_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdb4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Lumos/imdb4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-imdb_5_en.md b/docs/_posts/ahmedlone127/2024-09-22-imdb_5_en.md new file mode 100644 index 00000000000000..b25f9e34e098c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-imdb_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English imdb_5 DistilBertForSequenceClassification from draghicivlad +author: John Snow Labs +name: imdb_5 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdb_5` is a English model originally trained by draghicivlad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdb_5_en_5.5.0_3.0_1726980734662.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdb_5_en_5.5.0_3.0_1726980734662.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assemble raw text into document annotations
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# Load the pretrained classifier on the document and token columns
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("imdb_5","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("imdb_5", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdb_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/draghicivlad/imdb_5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-imdb_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-imdb_5_pipeline_en.md new file mode 100644 index 00000000000000..c60a5ce86066b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-imdb_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English imdb_5_pipeline pipeline DistilBertForSequenceClassification from draghicivlad +author: John Snow Labs +name: imdb_5_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdb_5_pipeline` is a English model originally trained by draghicivlad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdb_5_pipeline_en_5.5.0_3.0_1726980746279.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdb_5_pipeline_en_5.5.0_3.0_1726980746279.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Load the pretrained pipeline and run it on a DataFrame with a "text" column
+pipeline = PretrainedPipeline("imdb_5_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Load the pretrained pipeline and run it on a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("imdb_5_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdb_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/draghicivlad/imdb_5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-imdbreviews_classification_roberta_v02_clf_finetuning_en.md b/docs/_posts/ahmedlone127/2024-09-22-imdbreviews_classification_roberta_v02_clf_finetuning_en.md new file mode 100644 index 00000000000000..3a4a5b2dfba5b7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-imdbreviews_classification_roberta_v02_clf_finetuning_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English imdbreviews_classification_roberta_v02_clf_finetuning RoBertaForSequenceClassification from darmendarizp +author: John Snow Labs +name: imdbreviews_classification_roberta_v02_clf_finetuning +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdbreviews_classification_roberta_v02_clf_finetuning` is a English model originally trained by darmendarizp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_roberta_v02_clf_finetuning_en_5.5.0_3.0_1727017298269.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_roberta_v02_clf_finetuning_en_5.5.0_3.0_1727017298269.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assemble raw text into document annotations
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# Load the pretrained classifier on the document and token columns
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("imdbreviews_classification_roberta_v02_clf_finetuning","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("imdbreviews_classification_roberta_v02_clf_finetuning", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdbreviews_classification_roberta_v02_clf_finetuning| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/darmendarizp/imdbreviews_classification_roberta_v02_clf_finetuning \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-insta_sentiment_distill_roberta_custom_data_en.md b/docs/_posts/ahmedlone127/2024-09-22-insta_sentiment_distill_roberta_custom_data_en.md new file mode 100644 index 00000000000000..038f1cc1ddd5a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-insta_sentiment_distill_roberta_custom_data_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English insta_sentiment_distill_roberta_custom_data RoBertaForSequenceClassification from davin45 +author: John Snow Labs +name: insta_sentiment_distill_roberta_custom_data +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`insta_sentiment_distill_roberta_custom_data` is a English model originally trained by davin45. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/insta_sentiment_distill_roberta_custom_data_en_5.5.0_3.0_1727037296932.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/insta_sentiment_distill_roberta_custom_data_en_5.5.0_3.0_1727037296932.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assemble raw text into document annotations
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# Load the pretrained classifier on the document and token columns
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("insta_sentiment_distill_roberta_custom_data","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("insta_sentiment_distill_roberta_custom_data", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
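+
+Once fitted, the pipeline can be persisted with the standard Spark ML writer so the model does not have to be downloaded again on the next run. A minimal sketch; the path is only an example:
+
+```python
+from pyspark.ml import PipelineModel
+
+# Save the fitted pipeline and load it back later
+pipelineModel.write().overwrite().save("/tmp/insta_sentiment_pipeline")
+restored = PipelineModel.load("/tmp/insta_sentiment_pipeline")
+```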
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|insta_sentiment_distill_roberta_custom_data| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/davin45/insta-sentiment-distill-roberta-custom_data \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-insta_sentiment_distill_roberta_custom_data_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-insta_sentiment_distill_roberta_custom_data_pipeline_en.md new file mode 100644 index 00000000000000..f3ed73a8ca83ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-insta_sentiment_distill_roberta_custom_data_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English insta_sentiment_distill_roberta_custom_data_pipeline pipeline RoBertaForSequenceClassification from davin45 +author: John Snow Labs +name: insta_sentiment_distill_roberta_custom_data_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`insta_sentiment_distill_roberta_custom_data_pipeline` is a English model originally trained by davin45. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/insta_sentiment_distill_roberta_custom_data_pipeline_en_5.5.0_3.0_1727037316796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/insta_sentiment_distill_roberta_custom_data_pipeline_en_5.5.0_3.0_1727037316796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("insta_sentiment_distill_roberta_custom_data_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("insta_sentiment_distill_roberta_custom_data_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
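
The snippet above assumes an existing DataFrame `df`. A more complete sketch that builds the input and also covers single-string inference (the example sentence is only a placeholder):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("insta_sentiment_distill_roberta_custom_data_pipeline", lang="en")

# DataFrame-based inference: the pipeline reads from a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline.transform(df).select("class.result").show(truncate=False)

# For a single string, annotate() returns a plain Python dict of results
pipeline.annotate("I love spark-nlp")
```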
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|insta_sentiment_distill_roberta_custom_data_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/davin45/insta-sentiment-distill-roberta-custom_data + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-irony_italian_it.md b/docs/_posts/ahmedlone127/2024-09-22-irony_italian_it.md new file mode 100644 index 00000000000000..ae1ace73586a59 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-irony_italian_it.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Italian irony_italian BertForSequenceClassification from aequa-tech +author: John Snow Labs +name: irony_italian +date: 2024-09-22 +tags: [it, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`irony_italian` is a Italian model originally trained by aequa-tech. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/irony_italian_it_5.5.0_3.0_1726976873858.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/irony_italian_it_5.5.0_3.0_1726976873858.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("irony_italian","it") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("irony_italian", "it")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
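
For low-latency inference on individual strings, the fitted pipeline can be wrapped in a `LightPipeline`. A minimal sketch, assuming the `pipelineModel` from the snippet above (the Italian example sentence is only a placeholder):

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# annotate() returns a dict mapping each output column to its results
light.annotate("Che bella giornata, davvero...")
```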
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|irony_italian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|it| +|Size:|691.9 MB| + +## References + +https://huggingface.co/aequa-tech/irony-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-irony_italian_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-22-irony_italian_pipeline_it.md new file mode 100644 index 00000000000000..86448aae8736f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-irony_italian_pipeline_it.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Italian irony_italian_pipeline pipeline BertForSequenceClassification from aequa-tech +author: John Snow Labs +name: irony_italian_pipeline +date: 2024-09-22 +tags: [it, open_source, pipeline, onnx] +task: Text Classification +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`irony_italian_pipeline` is a Italian model originally trained by aequa-tech. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/irony_italian_pipeline_it_5.5.0_3.0_1726976904272.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/irony_italian_pipeline_it_5.5.0_3.0_1726976904272.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("irony_italian_pipeline", lang = "it") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("irony_italian_pipeline", lang = "it") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|irony_italian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|691.9 MB| + +## References + +https://huggingface.co/aequa-tech/irony-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-it2_robertuito_d_en.md b/docs/_posts/ahmedlone127/2024-09-22-it2_robertuito_d_en.md new file mode 100644 index 00000000000000..f9b9cf5854052a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-it2_robertuito_d_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English it2_robertuito_d RoBertaForSequenceClassification from PEzquerra +author: John Snow Labs +name: it2_robertuito_d +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`it2_robertuito_d` is a English model originally trained by PEzquerra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/it2_robertuito_d_en_5.5.0_3.0_1726971747568.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/it2_robertuito_d_en_5.5.0_3.0_1726971747568.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("it2_robertuito_d","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("it2_robertuito_d", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|it2_robertuito_d| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|408.3 MB| + +## References + +https://huggingface.co/PEzquerra/it2_robertuito_D \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-it2_robertuito_d_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-it2_robertuito_d_pipeline_en.md new file mode 100644 index 00000000000000..6e7223bc94bdf0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-it2_robertuito_d_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English it2_robertuito_d_pipeline pipeline RoBertaForSequenceClassification from PEzquerra +author: John Snow Labs +name: it2_robertuito_d_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`it2_robertuito_d_pipeline` is a English model originally trained by PEzquerra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/it2_robertuito_d_pipeline_en_5.5.0_3.0_1726971769796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/it2_robertuito_d_pipeline_en_5.5.0_3.0_1726971769796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("it2_robertuito_d_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("it2_robertuito_d_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|it2_robertuito_d_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.3 MB| + +## References + +https://huggingface.co/PEzquerra/it2_robertuito_D + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-javanese_bert_small_imdb_classifier_jv.md b/docs/_posts/ahmedlone127/2024-09-22-javanese_bert_small_imdb_classifier_jv.md new file mode 100644 index 00000000000000..458c4d3895d5eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-javanese_bert_small_imdb_classifier_jv.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Javanese javanese_bert_small_imdb_classifier BertForSequenceClassification from w11wo +author: John Snow Labs +name: javanese_bert_small_imdb_classifier +date: 2024-09-22 +tags: [jv, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: jv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`javanese_bert_small_imdb_classifier` is a Javanese model originally trained by w11wo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/javanese_bert_small_imdb_classifier_jv_5.5.0_3.0_1727032157896.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/javanese_bert_small_imdb_classifier_jv_5.5.0_3.0_1727032157896.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("javanese_bert_small_imdb_classifier","jv") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("javanese_bert_small_imdb_classifier", "jv")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|javanese_bert_small_imdb_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|jv| +|Size:|409.5 MB| + +## References + +https://huggingface.co/w11wo/javanese-bert-small-imdb-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-javanese_bert_small_imdb_classifier_pipeline_jv.md b/docs/_posts/ahmedlone127/2024-09-22-javanese_bert_small_imdb_classifier_pipeline_jv.md new file mode 100644 index 00000000000000..e9b581cb1aa115 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-javanese_bert_small_imdb_classifier_pipeline_jv.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Javanese javanese_bert_small_imdb_classifier_pipeline pipeline BertForSequenceClassification from w11wo +author: John Snow Labs +name: javanese_bert_small_imdb_classifier_pipeline +date: 2024-09-22 +tags: [jv, open_source, pipeline, onnx] +task: Text Classification +language: jv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`javanese_bert_small_imdb_classifier_pipeline` is a Javanese model originally trained by w11wo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/javanese_bert_small_imdb_classifier_pipeline_jv_5.5.0_3.0_1727032178801.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/javanese_bert_small_imdb_classifier_pipeline_jv_5.5.0_3.0_1727032178801.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("javanese_bert_small_imdb_classifier_pipeline", lang = "jv") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("javanese_bert_small_imdb_classifier_pipeline", lang = "jv") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|javanese_bert_small_imdb_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|jv| +|Size:|409.5 MB| + +## References + +https://huggingface.co/w11wo/javanese-bert-small-imdb-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos4_en.md b/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos4_en.md new file mode 100644 index 00000000000000..228a40a27d0628 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English jerteh355sentpos4 RoBertaForSequenceClassification from Tanor +author: John Snow Labs +name: jerteh355sentpos4 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jerteh355sentpos4` is a English model originally trained by Tanor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jerteh355sentpos4_en_5.5.0_3.0_1727026659452.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jerteh355sentpos4_en_5.5.0_3.0_1727026659452.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("jerteh355sentpos4","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("jerteh355sentpos4", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jerteh355sentpos4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Tanor/Jerteh355SENTPOS4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos4_pipeline_en.md new file mode 100644 index 00000000000000..304a1f44aedb1f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English jerteh355sentpos4_pipeline pipeline RoBertaForSequenceClassification from Tanor +author: John Snow Labs +name: jerteh355sentpos4_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jerteh355sentpos4_pipeline` is a English model originally trained by Tanor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jerteh355sentpos4_pipeline_en_5.5.0_3.0_1727026730786.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jerteh355sentpos4_pipeline_en_5.5.0_3.0_1727026730786.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("jerteh355sentpos4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("jerteh355sentpos4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jerteh355sentpos4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Tanor/Jerteh355SENTPOS4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos6_en.md b/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos6_en.md new file mode 100644 index 00000000000000..a0ca5372e1b2a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos6_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English jerteh355sentpos6 RoBertaForSequenceClassification from Tanor +author: John Snow Labs +name: jerteh355sentpos6 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jerteh355sentpos6` is a English model originally trained by Tanor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jerteh355sentpos6_en_5.5.0_3.0_1727017179028.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jerteh355sentpos6_en_5.5.0_3.0_1727017179028.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("jerteh355sentpos6","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("jerteh355sentpos6", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jerteh355sentpos6| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Tanor/Jerteh355SENTPOS6 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos6_pipeline_en.md new file mode 100644 index 00000000000000..2846894566d200 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos6_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English jerteh355sentpos6_pipeline pipeline RoBertaForSequenceClassification from Tanor +author: John Snow Labs +name: jerteh355sentpos6_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jerteh355sentpos6_pipeline` is a English model originally trained by Tanor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jerteh355sentpos6_pipeline_en_5.5.0_3.0_1727017241673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jerteh355sentpos6_pipeline_en_5.5.0_3.0_1727017241673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("jerteh355sentpos6_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("jerteh355sentpos6_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jerteh355sentpos6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Tanor/Jerteh355SENTPOS6 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-jobclassifier_v2_en.md b/docs/_posts/ahmedlone127/2024-09-22-jobclassifier_v2_en.md new file mode 100644 index 00000000000000..da2ee5a441db8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-jobclassifier_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English jobclassifier_v2 BertForSequenceClassification from CleveGreen +author: John Snow Labs +name: jobclassifier_v2 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jobclassifier_v2` is a English model originally trained by CleveGreen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jobclassifier_v2_en_5.5.0_3.0_1727030597140.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jobclassifier_v2_en_5.5.0_3.0_1727030597140.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("jobclassifier_v2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("jobclassifier_v2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jobclassifier_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|406.3 MB| + +## References + +https://huggingface.co/CleveGreen/JobClassifier_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-jobclassifier_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-jobclassifier_v2_pipeline_en.md new file mode 100644 index 00000000000000..a144b16e361d36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-jobclassifier_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English jobclassifier_v2_pipeline pipeline BertForSequenceClassification from CleveGreen +author: John Snow Labs +name: jobclassifier_v2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jobclassifier_v2_pipeline` is a English model originally trained by CleveGreen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jobclassifier_v2_pipeline_en_5.5.0_3.0_1727030617677.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jobclassifier_v2_pipeline_en_5.5.0_3.0_1727030617677.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("jobclassifier_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("jobclassifier_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jobclassifier_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.3 MB| + +## References + +https://huggingface.co/CleveGreen/JobClassifier_v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-kgi_en.md b/docs/_posts/ahmedlone127/2024-09-22-kgi_en.md new file mode 100644 index 00000000000000..9120c1073a16ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-kgi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English kgi DistilBertForSequenceClassification from shrikant11 +author: John Snow Labs +name: kgi +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kgi` is a English model originally trained by shrikant11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kgi_en_5.5.0_3.0_1727033256079.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kgi_en_5.5.0_3.0_1727033256079.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("kgi","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("kgi", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
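
Besides the predicted label in `class.result`, the annotation metadata typically carries per-class confidence scores, which is useful when thresholding predictions. A minimal sketch, assuming the `pipelineDF` from the snippet above (the exact metadata keys depend on the model's labels):

```python
# Inspect each class annotation together with its metadata map
pipelineDF.selectExpr("explode(class) as annotation") \
    .select("annotation.result", "annotation.metadata") \
    .show(truncate=False)
```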
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kgi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/shrikant11/KGI \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-kgi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-kgi_pipeline_en.md new file mode 100644 index 00000000000000..dc6a5b8a33f1de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-kgi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English kgi_pipeline pipeline DistilBertForSequenceClassification from shrikant11 +author: John Snow Labs +name: kgi_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kgi_pipeline` is a English model originally trained by shrikant11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kgi_pipeline_en_5.5.0_3.0_1727033271387.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kgi_pipeline_en_5.5.0_3.0_1727033271387.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("kgi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("kgi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kgi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/shrikant11/KGI + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-kitchen_applinces_bert_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-22-kitchen_applinces_bert_classifier_en.md new file mode 100644 index 00000000000000..14d88f6cef24aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-kitchen_applinces_bert_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English kitchen_applinces_bert_classifier DistilBertForSequenceClassification from decepticonsIsAllYouNeed +author: John Snow Labs +name: kitchen_applinces_bert_classifier +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kitchen_applinces_bert_classifier` is a English model originally trained by decepticonsIsAllYouNeed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kitchen_applinces_bert_classifier_en_5.5.0_3.0_1727033567569.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kitchen_applinces_bert_classifier_en_5.5.0_3.0_1727033567569.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("kitchen_applinces_bert_classifier","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("kitchen_applinces_bert_classifier", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kitchen_applinces_bert_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.8 MB| + +## References + +https://huggingface.co/decepticonsIsAllYouNeed/kitchen_applinces_bert_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-kitchen_applinces_bert_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-kitchen_applinces_bert_classifier_pipeline_en.md new file mode 100644 index 00000000000000..bfee439b1edf3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-kitchen_applinces_bert_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English kitchen_applinces_bert_classifier_pipeline pipeline DistilBertForSequenceClassification from decepticonsIsAllYouNeed +author: John Snow Labs +name: kitchen_applinces_bert_classifier_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kitchen_applinces_bert_classifier_pipeline` is a English model originally trained by decepticonsIsAllYouNeed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kitchen_applinces_bert_classifier_pipeline_en_5.5.0_3.0_1727033594640.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kitchen_applinces_bert_classifier_pipeline_en_5.5.0_3.0_1727033594640.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("kitchen_applinces_bert_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("kitchen_applinces_bert_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kitchen_applinces_bert_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.8 MB| + +## References + +https://huggingface.co/decepticonsIsAllYouNeed/kitchen_applinces_bert_classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-korscm_mbert_en.md b/docs/_posts/ahmedlone127/2024-09-22-korscm_mbert_en.md new file mode 100644 index 00000000000000..9dcdbdef438a51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-korscm_mbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English korscm_mbert BertForSequenceClassification from DeadBeast +author: John Snow Labs +name: korscm_mbert +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`korscm_mbert` is a English model originally trained by DeadBeast. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/korscm_mbert_en_5.5.0_3.0_1726988644716.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/korscm_mbert_en_5.5.0_3.0_1726988644716.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("korscm_mbert","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("korscm_mbert", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
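
Fitting the pipeline downloads the pretrained model; persisting the fitted `PipelineModel` avoids re-downloading it on every run. A minimal sketch, assuming the `pipelineModel` and `data` from the snippet above (the target path is hypothetical):

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline once, then reload it in later sessions
pipelineModel.write().overwrite().save("/tmp/korscm_mbert_pipeline_model")

restored = PipelineModel.load("/tmp/korscm_mbert_pipeline_model")
restored.transform(data).select("class.result").show(truncate=False)
```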
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|korscm_mbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|667.3 MB| + +## References + +https://huggingface.co/DeadBeast/korscm-mBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-korscm_mbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-korscm_mbert_pipeline_en.md new file mode 100644 index 00000000000000..fcee1eff756252 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-korscm_mbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English korscm_mbert_pipeline pipeline BertForSequenceClassification from DeadBeast +author: John Snow Labs +name: korscm_mbert_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`korscm_mbert_pipeline` is a English model originally trained by DeadBeast. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/korscm_mbert_pipeline_en_5.5.0_3.0_1726988673586.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/korscm_mbert_pipeline_en_5.5.0_3.0_1726988673586.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("korscm_mbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("korscm_mbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|korscm_mbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|667.3 MB| + +## References + +https://huggingface.co/DeadBeast/korscm-mBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-lab2_id2223_pipeline_sv.md b/docs/_posts/ahmedlone127/2024-09-22-lab2_id2223_pipeline_sv.md new file mode 100644 index 00000000000000..4f37dd8f3f9f81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-lab2_id2223_pipeline_sv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Swedish lab2_id2223_pipeline pipeline WhisperForCTC from humeur +author: John Snow Labs +name: lab2_id2223_pipeline +date: 2024-09-22 +tags: [sv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab2_id2223_pipeline` is a Swedish model originally trained by humeur. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab2_id2223_pipeline_sv_5.5.0_3.0_1727025215707.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab2_id2223_pipeline_sv_5.5.0_3.0_1727025215707.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lab2_id2223_pipeline", lang = "sv") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lab2_id2223_pipeline", lang = "sv") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab2_id2223_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/humeur/lab2_id2223 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-lab2_id2223_sv.md b/docs/_posts/ahmedlone127/2024-09-22-lab2_id2223_sv.md new file mode 100644 index 00000000000000..7683614acbe2d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-lab2_id2223_sv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Swedish lab2_id2223 WhisperForCTC from humeur +author: John Snow Labs +name: lab2_id2223 +date: 2024-09-22 +tags: [sv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab2_id2223` is a Swedish model originally trained by humeur. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab2_id2223_sv_5.5.0_3.0_1727025133638.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab2_id2223_sv_5.5.0_3.0_1727025133638.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("lab2_id2223","sv") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
# `data` must be a DataFrame with an "audio_content" column of raw float audio samples
# (see the sketch after this block for one way to build it)
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("lab2_id2223", "sv")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
// `data` must be a DataFrame with an "audio_content" column of raw float audio samples
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
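
The `data` DataFrame above needs an `audio_content` column holding the raw waveform as an array of floats (16 kHz mono). A hypothetical sketch using librosa (not part of Spark NLP) and an assumed file path, given the `pipeline` defined above:

```python
import librosa

# Load a 16 kHz mono recording; the path is a placeholder
raw_floats, _ = librosa.load("audio/example.wav", sr=16000)
data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")

result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```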
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab2_id2223| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|sv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/humeur/lab2_id2223 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-legal_base_v1_5__checkpoint2_en.md b/docs/_posts/ahmedlone127/2024-09-22-legal_base_v1_5__checkpoint2_en.md new file mode 100644 index 00000000000000..fbd55f79e9292b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-legal_base_v1_5__checkpoint2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English legal_base_v1_5__checkpoint2 RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: legal_base_v1_5__checkpoint2 +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_base_v1_5__checkpoint2` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_base_v1_5__checkpoint2_en_5.5.0_3.0_1727041968560.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_base_v1_5__checkpoint2_en_5.5.0_3.0_1727041968560.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("legal_base_v1_5__checkpoint2","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("legal_base_v1_5__checkpoint2","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
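
After the pipeline runs, each token carries one annotation in the `embeddings` column. A quick way to inspect the vectors produced by the example above (a sketch, not part of the original card) is:

```python
from pyspark.sql.functions import explode

# Explode the token-level annotations; "result" is the token text, "embeddings" its vector.
pipelineDF.select(explode("embeddings").alias("emb")) \
    .selectExpr("emb.result AS token", "emb.embeddings AS vector") \
    .show(5, truncate=80)
```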
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_base_v1_5__checkpoint2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|296.5 MB| + +## References + +https://huggingface.co/eduagarcia-temp/legal_base_v1_5__checkpoint2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-legal_base_v1_5__checkpoint2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-legal_base_v1_5__checkpoint2_pipeline_en.md new file mode 100644 index 00000000000000..b529fe6146aa1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-legal_base_v1_5__checkpoint2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English legal_base_v1_5__checkpoint2_pipeline pipeline RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: legal_base_v1_5__checkpoint2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_base_v1_5__checkpoint2_pipeline` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_base_v1_5__checkpoint2_pipeline_en_5.5.0_3.0_1727042054519.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_base_v1_5__checkpoint2_pipeline_en_5.5.0_3.0_1727042054519.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("legal_base_v1_5__checkpoint2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("legal_base_v1_5__checkpoint2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
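
In the `transform` call above, `df` is expected to be a Spark DataFrame with a `text` column. For a quick check on a single string, a `PretrainedPipeline` can also annotate text directly; a small sketch (the sample sentence is illustrative only), including the import the snippet assumes, is:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("legal_base_v1_5__checkpoint2_pipeline", lang="en")

# annotate() returns a dict keyed by output column (document, token, embeddings, ...).
annotations = pipeline.annotate("The agreement may be terminated with thirty days written notice.")
print(annotations.keys())
```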
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_base_v1_5__checkpoint2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|296.5 MB| + +## References + +https://huggingface.co/eduagarcia-temp/legal_base_v1_5__checkpoint2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-legal_bert_squad_law_en.md b/docs/_posts/ahmedlone127/2024-09-22-legal_bert_squad_law_en.md new file mode 100644 index 00000000000000..6c3d426f5b86b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-legal_bert_squad_law_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English legal_bert_squad_law BertForQuestionAnswering from lisa +author: John Snow Labs +name: legal_bert_squad_law +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_bert_squad_law` is a English model originally trained by lisa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_bert_squad_law_en_5.5.0_3.0_1726978911133.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_bert_squad_law_en_5.5.0_3.0_1726978911133.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("legal_bert_squad_law","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("legal_bert_squad_law", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
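
The predicted answer span ends up in the `answer` column of `pipelineDF`; it can be read back with a short query such as the following sketch, based on the pipeline defined above:

```python
# "answer.result" holds the extracted answer text for each question/context pair.
pipelineDF.selectExpr("document_question.result AS question", "answer.result AS answer") \
    .show(truncate=False)
```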
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_bert_squad_law| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.3 MB| + +## References + +https://huggingface.co/lisa/legal-bert-squad-law \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-legal_bert_squad_law_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-legal_bert_squad_law_pipeline_en.md new file mode 100644 index 00000000000000..05a6fcf1ea4791 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-legal_bert_squad_law_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English legal_bert_squad_law_pipeline pipeline BertForQuestionAnswering from lisa +author: John Snow Labs +name: legal_bert_squad_law_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_bert_squad_law_pipeline` is a English model originally trained by lisa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_bert_squad_law_pipeline_en_5.5.0_3.0_1726978929448.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_bert_squad_law_pipeline_en_5.5.0_3.0_1726978929448.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("legal_bert_squad_law_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("legal_bert_squad_law_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_bert_squad_law_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.3 MB| + +## References + +https://huggingface.co/lisa/legal-bert-squad-law + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-lenate_model_4_en.md b/docs/_posts/ahmedlone127/2024-09-22-lenate_model_4_en.md new file mode 100644 index 00000000000000..dfedddb16db12f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-lenate_model_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English lenate_model_4 DistilBertForSequenceClassification from lenate +author: John Snow Labs +name: lenate_model_4 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lenate_model_4` is a English model originally trained by lenate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lenate_model_4_en_5.5.0_3.0_1726980427424.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lenate_model_4_en_5.5.0_3.0_1726980427424.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("lenate_model_4","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("lenate_model_4", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
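
Predicted labels land in the `class` column, and per-class scores are kept in the annotation metadata. A short sketch for reading both from the result of the example above:

```python
# "class.result" is the predicted label; the metadata map carries the class probabilities.
pipelineDF.select("text", "class.result", "class.metadata").show(truncate=False)
```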
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lenate_model_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lenate/lenate_model_4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-lenate_model_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-lenate_model_4_pipeline_en.md new file mode 100644 index 00000000000000..869ba74fb58496 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-lenate_model_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lenate_model_4_pipeline pipeline DistilBertForSequenceClassification from lenate +author: John Snow Labs +name: lenate_model_4_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lenate_model_4_pipeline` is a English model originally trained by lenate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lenate_model_4_pipeline_en_5.5.0_3.0_1726980439372.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lenate_model_4_pipeline_en_5.5.0_3.0_1726980439372.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lenate_model_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lenate_model_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lenate_model_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lenate/lenate_model_4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-leth_en.md b/docs/_posts/ahmedlone127/2024-09-22-leth_en.md new file mode 100644 index 00000000000000..0882fcfabf997e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-leth_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English leth RoBertaForSequenceClassification from Arlethh +author: John Snow Labs +name: leth +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`leth` is a English model originally trained by Arlethh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/leth_en_5.5.0_3.0_1727016715405.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/leth_en_5.5.0_3.0_1727016715405.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("leth","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("leth", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|leth| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/Arlethh/leth \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-leth_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-leth_pipeline_en.md new file mode 100644 index 00000000000000..9f1273877fe15a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-leth_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English leth_pipeline pipeline RoBertaForSequenceClassification from Arlethh +author: John Snow Labs +name: leth_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`leth_pipeline` is a English model originally trained by Arlethh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/leth_pipeline_en_5.5.0_3.0_1727016730115.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/leth_pipeline_en_5.5.0_3.0_1727016730115.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("leth_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("leth_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|leth_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/Arlethh/leth + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-llm_practice001_en.md b/docs/_posts/ahmedlone127/2024-09-22-llm_practice001_en.md new file mode 100644 index 00000000000000..81797c2799908f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-llm_practice001_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English llm_practice001 DistilBertForSequenceClassification from JiAYu1997 +author: John Snow Labs +name: llm_practice001 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llm_practice001` is a English model originally trained by JiAYu1997. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llm_practice001_en_5.5.0_3.0_1727020505317.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llm_practice001_en_5.5.0_3.0_1727020505317.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("llm_practice001","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("llm_practice001", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llm_practice001| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/JiAYu1997/LLM_Practice001 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-llm_practice001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-llm_practice001_pipeline_en.md new file mode 100644 index 00000000000000..1930a245aa048a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-llm_practice001_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English llm_practice001_pipeline pipeline DistilBertForSequenceClassification from JiAYu1997 +author: John Snow Labs +name: llm_practice001_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llm_practice001_pipeline` is a English model originally trained by JiAYu1997. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llm_practice001_pipeline_en_5.5.0_3.0_1727020517685.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llm_practice001_pipeline_en_5.5.0_3.0_1727020517685.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("llm_practice001_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("llm_practice001_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llm_practice001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/JiAYu1997/LLM_Practice001 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-llm_project_en.md b/docs/_posts/ahmedlone127/2024-09-22-llm_project_en.md new file mode 100644 index 00000000000000..1447110116642c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-llm_project_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English llm_project DistilBertForSequenceClassification from ThuyTran102 +author: John Snow Labs +name: llm_project +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llm_project` is a English model originally trained by ThuyTran102. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llm_project_en_5.5.0_3.0_1727020559412.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llm_project_en_5.5.0_3.0_1727020559412.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("llm_project","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("llm_project", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llm_project| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ThuyTran102/LLM_project \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-llm_project_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-llm_project_pipeline_en.md new file mode 100644 index 00000000000000..21e703669b338e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-llm_project_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English llm_project_pipeline pipeline DistilBertForSequenceClassification from ThuyTran102 +author: John Snow Labs +name: llm_project_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llm_project_pipeline` is a English model originally trained by ThuyTran102. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llm_project_pipeline_en_5.5.0_3.0_1727020571975.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llm_project_pipeline_en_5.5.0_3.0_1727020571975.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("llm_project_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("llm_project_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llm_project_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ThuyTran102/LLM_project + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-llmhw01_chtsai2104_en.md b/docs/_posts/ahmedlone127/2024-09-22-llmhw01_chtsai2104_en.md new file mode 100644 index 00000000000000..307642b8292f20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-llmhw01_chtsai2104_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English llmhw01_chtsai2104 DistilBertForSequenceClassification from chtsai2104 +author: John Snow Labs +name: llmhw01_chtsai2104 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llmhw01_chtsai2104` is a English model originally trained by chtsai2104. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llmhw01_chtsai2104_en_5.5.0_3.0_1726980108123.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llmhw01_chtsai2104_en_5.5.0_3.0_1726980108123.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("llmhw01_chtsai2104","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("llmhw01_chtsai2104", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llmhw01_chtsai2104| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chtsai2104/llmhw01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-llmhw01_chtsai2104_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-llmhw01_chtsai2104_pipeline_en.md new file mode 100644 index 00000000000000..5ef999666d3493 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-llmhw01_chtsai2104_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English llmhw01_chtsai2104_pipeline pipeline DistilBertForSequenceClassification from chtsai2104 +author: John Snow Labs +name: llmhw01_chtsai2104_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llmhw01_chtsai2104_pipeline` is a English model originally trained by chtsai2104. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llmhw01_chtsai2104_pipeline_en_5.5.0_3.0_1726980119550.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llmhw01_chtsai2104_pipeline_en_5.5.0_3.0_1726980119550.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("llmhw01_chtsai2104_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("llmhw01_chtsai2104_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llmhw01_chtsai2104_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chtsai2104/llmhw01 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-malicious_prompt_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-22-malicious_prompt_classifier_en.md new file mode 100644 index 00000000000000..fb87ec854f42ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-malicious_prompt_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English malicious_prompt_classifier DistilBertForSequenceClassification from Al-Chan +author: John Snow Labs +name: malicious_prompt_classifier +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`malicious_prompt_classifier` is a English model originally trained by Al-Chan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/malicious_prompt_classifier_en_5.5.0_3.0_1727012789878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/malicious_prompt_classifier_en_5.5.0_3.0_1727012789878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("malicious_prompt_classifier","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("malicious_prompt_classifier", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|malicious_prompt_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Al-Chan/Malicious_Prompt_Classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-malicious_prompt_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-malicious_prompt_classifier_pipeline_en.md new file mode 100644 index 00000000000000..75de87ed103d0f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-malicious_prompt_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English malicious_prompt_classifier_pipeline pipeline DistilBertForSequenceClassification from Al-Chan +author: John Snow Labs +name: malicious_prompt_classifier_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`malicious_prompt_classifier_pipeline` is a English model originally trained by Al-Chan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/malicious_prompt_classifier_pipeline_en_5.5.0_3.0_1727012801765.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/malicious_prompt_classifier_pipeline_en_5.5.0_3.0_1727012801765.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("malicious_prompt_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("malicious_prompt_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|malicious_prompt_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Al-Chan/Malicious_Prompt_Classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-marathi_sentiment_movie_reviews_mr.md b/docs/_posts/ahmedlone127/2024-09-22-marathi_sentiment_movie_reviews_mr.md new file mode 100644 index 00000000000000..894d475d830368 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-marathi_sentiment_movie_reviews_mr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Marathi marathi_sentiment_movie_reviews BertForSequenceClassification from l3cube-pune +author: John Snow Labs +name: marathi_sentiment_movie_reviews +date: 2024-09-22 +tags: [mr, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: mr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marathi_sentiment_movie_reviews` is a Marathi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marathi_sentiment_movie_reviews_mr_5.5.0_3.0_1727007645855.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marathi_sentiment_movie_reviews_mr_5.5.0_3.0_1727007645855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("marathi_sentiment_movie_reviews","mr") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("marathi_sentiment_movie_reviews", "mr")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marathi_sentiment_movie_reviews| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|mr| +|Size:|892.8 MB| + +## References + +https://huggingface.co/l3cube-pune/marathi-sentiment-movie-reviews \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-marathi_sentiment_movie_reviews_pipeline_mr.md b/docs/_posts/ahmedlone127/2024-09-22-marathi_sentiment_movie_reviews_pipeline_mr.md new file mode 100644 index 00000000000000..40405d1e4b623a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-marathi_sentiment_movie_reviews_pipeline_mr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Marathi marathi_sentiment_movie_reviews_pipeline pipeline BertForSequenceClassification from l3cube-pune +author: John Snow Labs +name: marathi_sentiment_movie_reviews_pipeline +date: 2024-09-22 +tags: [mr, open_source, pipeline, onnx] +task: Text Classification +language: mr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marathi_sentiment_movie_reviews_pipeline` is a Marathi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marathi_sentiment_movie_reviews_pipeline_mr_5.5.0_3.0_1727007685674.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marathi_sentiment_movie_reviews_pipeline_mr_5.5.0_3.0_1727007685674.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marathi_sentiment_movie_reviews_pipeline", lang = "mr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marathi_sentiment_movie_reviews_pipeline", lang = "mr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marathi_sentiment_movie_reviews_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mr| +|Size:|892.9 MB| + +## References + +https://huggingface.co/l3cube-pune/marathi-sentiment-movie-reviews + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-marathi_topic_all_doc_v2_mr.md b/docs/_posts/ahmedlone127/2024-09-22-marathi_topic_all_doc_v2_mr.md new file mode 100644 index 00000000000000..8cdd0960510e0a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-marathi_topic_all_doc_v2_mr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Marathi marathi_topic_all_doc_v2 BertForSequenceClassification from l3cube-pune +author: John Snow Labs +name: marathi_topic_all_doc_v2 +date: 2024-09-22 +tags: [mr, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: mr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marathi_topic_all_doc_v2` is a Marathi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marathi_topic_all_doc_v2_mr_5.5.0_3.0_1726991085459.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marathi_topic_all_doc_v2_mr_5.5.0_3.0_1726991085459.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("marathi_topic_all_doc_v2","mr") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("marathi_topic_all_doc_v2", "mr")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marathi_topic_all_doc_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|mr| +|Size:|892.9 MB| + +## References + +https://huggingface.co/l3cube-pune/marathi-topic-all-doc-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-marathi_topic_all_doc_v2_pipeline_mr.md b/docs/_posts/ahmedlone127/2024-09-22-marathi_topic_all_doc_v2_pipeline_mr.md new file mode 100644 index 00000000000000..fa8e56d6d4b9be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-marathi_topic_all_doc_v2_pipeline_mr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Marathi marathi_topic_all_doc_v2_pipeline pipeline BertForSequenceClassification from l3cube-pune +author: John Snow Labs +name: marathi_topic_all_doc_v2_pipeline +date: 2024-09-22 +tags: [mr, open_source, pipeline, onnx] +task: Text Classification +language: mr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marathi_topic_all_doc_v2_pipeline` is a Marathi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marathi_topic_all_doc_v2_pipeline_mr_5.5.0_3.0_1726991124414.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marathi_topic_all_doc_v2_pipeline_mr_5.5.0_3.0_1726991124414.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marathi_topic_all_doc_v2_pipeline", lang = "mr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marathi_topic_all_doc_v2_pipeline", lang = "mr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marathi_topic_all_doc_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mr| +|Size:|892.9 MB| + +## References + +https://huggingface.co/l3cube-pune/marathi-topic-all-doc-v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-md3_sentiment_analysis_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-md3_sentiment_analysis_pipeline_en.md new file mode 100644 index 00000000000000..e40450980c1c58 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-md3_sentiment_analysis_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English md3_sentiment_analysis_pipeline pipeline DistilBertForSequenceClassification from kassfir +author: John Snow Labs +name: md3_sentiment_analysis_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`md3_sentiment_analysis_pipeline` is a English model originally trained by kassfir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/md3_sentiment_analysis_pipeline_en_5.5.0_3.0_1727033269052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/md3_sentiment_analysis_pipeline_en_5.5.0_3.0_1727033269052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("md3_sentiment_analysis_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("md3_sentiment_analysis_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|md3_sentiment_analysis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kassfir/md3-sentiment-analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-memo_bert_wsd_danskbert_en.md b/docs/_posts/ahmedlone127/2024-09-22-memo_bert_wsd_danskbert_en.md new file mode 100644 index 00000000000000..1ed9a8642e39f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-memo_bert_wsd_danskbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English memo_bert_wsd_danskbert XlmRoBertaForSequenceClassification from yemen2016 +author: John Snow Labs +name: memo_bert_wsd_danskbert +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`memo_bert_wsd_danskbert` is a English model originally trained by yemen2016. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/memo_bert_wsd_danskbert_en_5.5.0_3.0_1727009142736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/memo_bert_wsd_danskbert_en_5.5.0_3.0_1727009142736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("memo_bert_wsd_danskbert","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("memo_bert_wsd_danskbert", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|memo_bert_wsd_danskbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|427.5 MB| + +## References + +https://huggingface.co/yemen2016/MeMo_BERT-WSD-DanskBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-memo_bert_wsd_danskbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-memo_bert_wsd_danskbert_pipeline_en.md new file mode 100644 index 00000000000000..62ab075adbb0d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-memo_bert_wsd_danskbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English memo_bert_wsd_danskbert_pipeline pipeline XlmRoBertaForSequenceClassification from yemen2016 +author: John Snow Labs +name: memo_bert_wsd_danskbert_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`memo_bert_wsd_danskbert_pipeline` is a English model originally trained by yemen2016. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/memo_bert_wsd_danskbert_pipeline_en_5.5.0_3.0_1727009177537.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/memo_bert_wsd_danskbert_pipeline_en_5.5.0_3.0_1727009177537.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("memo_bert_wsd_danskbert_pipeline", lang = "en")
+# assumes the exported pipeline reads its input from a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("memo_bert_wsd_danskbert_pipeline", lang = "en")
+// assumes the exported pipeline reads its input from a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|memo_bert_wsd_danskbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|427.5 MB| + +## References + +https://huggingface.co/yemen2016/MeMo_BERT-WSD-DanskBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-mentalroberta_empai_final2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-mentalroberta_empai_final2_pipeline_en.md new file mode 100644 index 00000000000000..f5f7b6574954fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-mentalroberta_empai_final2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mentalroberta_empai_final2_pipeline pipeline RoBertaEmbeddings from LuangMV97 +author: John Snow Labs +name: mentalroberta_empai_final2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mentalroberta_empai_final2_pipeline` is a English model originally trained by LuangMV97. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mentalroberta_empai_final2_pipeline_en_5.5.0_3.0_1726999372370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mentalroberta_empai_final2_pipeline_en_5.5.0_3.0_1726999372370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("mentalroberta_empai_final2_pipeline", lang = "en")
+# assumes the exported pipeline reads its input from a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("mentalroberta_empai_final2_pipeline", lang = "en")
+// assumes the exported pipeline reads its input from a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
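+
+The output column names of a downloaded pipeline are fixed when it was exported, so before selecting the embeddings it is safest to inspect the schema. A small sketch reusing the `annotations` DataFrame from the example above:
+
+```python
+# lists every column the pipeline produced (document, token, embeddings, ...)
+annotations.printSchema()
+```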
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mentalroberta_empai_final2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/LuangMV97/MentalRoBERTa_EmpAI_final2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-minilmv2_l6_h384_from_bert_large_mrqa_en.md b/docs/_posts/ahmedlone127/2024-09-22-minilmv2_l6_h384_from_bert_large_mrqa_en.md new file mode 100644 index 00000000000000..bbb8cb0afd916c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-minilmv2_l6_h384_from_bert_large_mrqa_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English minilmv2_l6_h384_from_bert_large_mrqa BertForQuestionAnswering from VMware +author: John Snow Labs +name: minilmv2_l6_h384_from_bert_large_mrqa +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`minilmv2_l6_h384_from_bert_large_mrqa` is a English model originally trained by VMware. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/minilmv2_l6_h384_from_bert_large_mrqa_en_5.5.0_3.0_1726991819674.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/minilmv2_l6_h384_from_bert_large_mrqa_en_5.5.0_3.0_1726991819674.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assemble question/context pairs, then extract answer spans with the pretrained model
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("minilmv2_l6_h384_from_bert_large_mrqa","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("minilmv2_l6_h384_from_bert_large_mrqa", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
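+
+The extracted answer spans land in the `answer` output column. A minimal sketch reusing `pipelineDF` from the example above (`result` is the annotation field carrying the answer text):
+
+```python
+pipelineDF.select("document_question.result", "answer.result").show(truncate=False)
+```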
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|minilmv2_l6_h384_from_bert_large_mrqa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|84.3 MB| + +## References + +https://huggingface.co/VMware/minilmv2-l6-h384-from-bert-large-mrqa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-minilmv2_l6_h384_from_bert_large_mrqa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-minilmv2_l6_h384_from_bert_large_mrqa_pipeline_en.md new file mode 100644 index 00000000000000..073162d3dc6f72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-minilmv2_l6_h384_from_bert_large_mrqa_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English minilmv2_l6_h384_from_bert_large_mrqa_pipeline pipeline BertForQuestionAnswering from VMware +author: John Snow Labs +name: minilmv2_l6_h384_from_bert_large_mrqa_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`minilmv2_l6_h384_from_bert_large_mrqa_pipeline` is a English model originally trained by VMware. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/minilmv2_l6_h384_from_bert_large_mrqa_pipeline_en_5.5.0_3.0_1726991824080.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/minilmv2_l6_h384_from_bert_large_mrqa_pipeline_en_5.5.0_3.0_1726991824080.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("minilmv2_l6_h384_from_bert_large_mrqa_pipeline", lang = "en")
+# assumes the exported pipeline reads its inputs from "question" and "context" columns
+df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("minilmv2_l6_h384_from_bert_large_mrqa_pipeline", lang = "en")
+// assumes the exported pipeline reads its inputs from "question" and "context" columns
+val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|minilmv2_l6_h384_from_bert_large_mrqa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|84.3 MB| + +## References + +https://huggingface.co/VMware/minilmv2-l6-h384-from-bert-large-mrqa + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-minilmv2_l6_h384_mlm_multi_emails_hq_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-minilmv2_l6_h384_mlm_multi_emails_hq_pipeline_en.md new file mode 100644 index 00000000000000..81d0cd4cf814f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-minilmv2_l6_h384_mlm_multi_emails_hq_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English minilmv2_l6_h384_mlm_multi_emails_hq_pipeline pipeline RoBertaEmbeddings from postbot +author: John Snow Labs +name: minilmv2_l6_h384_mlm_multi_emails_hq_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`minilmv2_l6_h384_mlm_multi_emails_hq_pipeline` is a English model originally trained by postbot. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/minilmv2_l6_h384_mlm_multi_emails_hq_pipeline_en_5.5.0_3.0_1727042054048.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/minilmv2_l6_h384_mlm_multi_emails_hq_pipeline_en_5.5.0_3.0_1727042054048.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("minilmv2_l6_h384_mlm_multi_emails_hq_pipeline", lang = "en")
+# assumes the exported pipeline reads its input from a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("minilmv2_l6_h384_mlm_multi_emails_hq_pipeline", lang = "en")
+// assumes the exported pipeline reads its input from a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|minilmv2_l6_h384_mlm_multi_emails_hq_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|114.2 MB| + +## References + +https://huggingface.co/postbot/MiniLMv2-L6-H384-mlm-multi-emails-hq + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-mmarco_mminilmv2_l12_h384_v1_en.md b/docs/_posts/ahmedlone127/2024-09-22-mmarco_mminilmv2_l12_h384_v1_en.md new file mode 100644 index 00000000000000..7b26f9012a9a86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-mmarco_mminilmv2_l12_h384_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mmarco_mminilmv2_l12_h384_v1 XlmRoBertaForSequenceClassification from lpsantao +author: John Snow Labs +name: mmarco_mminilmv2_l12_h384_v1 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mmarco_mminilmv2_l12_h384_v1` is a English model originally trained by lpsantao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mmarco_mminilmv2_l12_h384_v1_en_5.5.0_3.0_1727009949173.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mmarco_mminilmv2_l12_h384_v1_en_5.5.0_3.0_1727009949173.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assemble raw text into documents, tokenize, then classify with the pretrained model
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("mmarco_mminilmv2_l12_h384_v1","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("mmarco_mminilmv2_l12_h384_v1", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mmarco_mminilmv2_l12_h384_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|399.6 MB| + +## References + +https://huggingface.co/lpsantao/mmarco-mMiniLMv2-L12-H384-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-mmarco_mminilmv2_l12_h384_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-mmarco_mminilmv2_l12_h384_v1_pipeline_en.md new file mode 100644 index 00000000000000..9050197fa7d548 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-mmarco_mminilmv2_l12_h384_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mmarco_mminilmv2_l12_h384_v1_pipeline pipeline XlmRoBertaForSequenceClassification from lpsantao +author: John Snow Labs +name: mmarco_mminilmv2_l12_h384_v1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mmarco_mminilmv2_l12_h384_v1_pipeline` is a English model originally trained by lpsantao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mmarco_mminilmv2_l12_h384_v1_pipeline_en_5.5.0_3.0_1727009979068.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mmarco_mminilmv2_l12_h384_v1_pipeline_en_5.5.0_3.0_1727009979068.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("mmarco_mminilmv2_l12_h384_v1_pipeline", lang = "en")
+# assumes the exported pipeline reads its input from a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("mmarco_mminilmv2_l12_h384_v1_pipeline", lang = "en")
+// assumes the exported pipeline reads its input from a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mmarco_mminilmv2_l12_h384_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|399.6 MB| + +## References + +https://huggingface.co/lpsantao/mmarco-mMiniLMv2-L12-H384-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-mmlu_physics_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-22-mmlu_physics_classifier_en.md new file mode 100644 index 00000000000000..2260a5ad6c0767 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-mmlu_physics_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mmlu_physics_classifier RoBertaForSequenceClassification from chrisliu298 +author: John Snow Labs +name: mmlu_physics_classifier +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mmlu_physics_classifier` is a English model originally trained by chrisliu298. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mmlu_physics_classifier_en_5.5.0_3.0_1727026588008.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mmlu_physics_classifier_en_5.5.0_3.0_1727026588008.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assemble raw text into documents, tokenize, then classify with the pretrained model
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("mmlu_physics_classifier","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("mmlu_physics_classifier", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mmlu_physics_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|439.3 MB| + +## References + +https://huggingface.co/chrisliu298/mmlu-physics_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-mmlu_physics_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-mmlu_physics_classifier_pipeline_en.md new file mode 100644 index 00000000000000..812ebbf6ef8718 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-mmlu_physics_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mmlu_physics_classifier_pipeline pipeline RoBertaForSequenceClassification from chrisliu298 +author: John Snow Labs +name: mmlu_physics_classifier_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mmlu_physics_classifier_pipeline` is a English model originally trained by chrisliu298. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mmlu_physics_classifier_pipeline_en_5.5.0_3.0_1727026612058.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mmlu_physics_classifier_pipeline_en_5.5.0_3.0_1727026612058.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("mmlu_physics_classifier_pipeline", lang = "en")
+# assumes the exported pipeline reads its input from a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("mmlu_physics_classifier_pipeline", lang = "en")
+// assumes the exported pipeline reads its input from a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mmlu_physics_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|439.3 MB| + +## References + +https://huggingface.co/chrisliu298/mmlu-physics_classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-model_2_7_en.md b/docs/_posts/ahmedlone127/2024-09-22-model_2_7_en.md new file mode 100644 index 00000000000000..25b1c893bd5926 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-model_2_7_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_2_7 RoBertaForSequenceClassification from raydentseng +author: John Snow Labs +name: model_2_7 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_2_7` is a English model originally trained by raydentseng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_2_7_en_5.5.0_3.0_1727037745999.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_2_7_en_5.5.0_3.0_1727037745999.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assemble raw text into documents, tokenize, then classify with the pretrained model
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("model_2_7","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("model_2_7", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_2_7| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|437.9 MB| + +## References + +https://huggingface.co/raydentseng/model_2_7 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-model_2_7_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-model_2_7_pipeline_en.md new file mode 100644 index 00000000000000..d366bf1079b338 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-model_2_7_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_2_7_pipeline pipeline RoBertaForSequenceClassification from raydentseng +author: John Snow Labs +name: model_2_7_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_2_7_pipeline` is a English model originally trained by raydentseng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_2_7_pipeline_en_5.5.0_3.0_1727037770788.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_2_7_pipeline_en_5.5.0_3.0_1727037770788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("model_2_7_pipeline", lang = "en")
+# assumes the exported pipeline reads its input from a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("model_2_7_pipeline", lang = "en")
+// assumes the exported pipeline reads its input from a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_2_7_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.0 MB| + +## References + +https://huggingface.co/raydentseng/model_2_7 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-model_4_en.md b/docs/_posts/ahmedlone127/2024-09-22-model_4_en.md new file mode 100644 index 00000000000000..98fff5d1deae31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-model_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_4 BertForSequenceClassification from cannotbolt +author: John Snow Labs +name: model_4 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_4` is a English model originally trained by cannotbolt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_4_en_5.5.0_3.0_1727034784048.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_4_en_5.5.0_3.0_1727034784048.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assemble raw text into documents, tokenize, then classify with the pretrained model
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("model_4","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("model_4", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/cannotbolt/model_4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-model_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-model_4_pipeline_en.md new file mode 100644 index 00000000000000..fa1f34cbc86f31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-model_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_4_pipeline pipeline BertForSequenceClassification from cannotbolt +author: John Snow Labs +name: model_4_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_4_pipeline` is a English model originally trained by cannotbolt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_4_pipeline_en_5.5.0_3.0_1727034804772.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_4_pipeline_en_5.5.0_3.0_1727034804772.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("model_4_pipeline", lang = "en")
+# assumes the exported pipeline reads its input from a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("model_4_pipeline", lang = "en")
+// assumes the exported pipeline reads its input from a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/cannotbolt/model_4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-model_sentence_entailment_hackaton_coliee_en.md b/docs/_posts/ahmedlone127/2024-09-22-model_sentence_entailment_hackaton_coliee_en.md new file mode 100644 index 00000000000000..481e674adbf445 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-model_sentence_entailment_hackaton_coliee_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_sentence_entailment_hackaton_coliee RoBertaForSequenceClassification from ludoviciarraga +author: John Snow Labs +name: model_sentence_entailment_hackaton_coliee +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_sentence_entailment_hackaton_coliee` is a English model originally trained by ludoviciarraga. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_sentence_entailment_hackaton_coliee_en_5.5.0_3.0_1726967623298.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_sentence_entailment_hackaton_coliee_en_5.5.0_3.0_1726967623298.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assemble raw text into documents, tokenize, then classify with the pretrained model
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("model_sentence_entailment_hackaton_coliee","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("model_sentence_entailment_hackaton_coliee", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_sentence_entailment_hackaton_coliee| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ludoviciarraga/model_sentence_entailment_hackaton_coliee \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-model_sentence_entailment_hackaton_coliee_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-model_sentence_entailment_hackaton_coliee_pipeline_en.md new file mode 100644 index 00000000000000..a94e1ed8dce1d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-model_sentence_entailment_hackaton_coliee_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_sentence_entailment_hackaton_coliee_pipeline pipeline RoBertaForSequenceClassification from ludoviciarraga +author: John Snow Labs +name: model_sentence_entailment_hackaton_coliee_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_sentence_entailment_hackaton_coliee_pipeline` is a English model originally trained by ludoviciarraga. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_sentence_entailment_hackaton_coliee_pipeline_en_5.5.0_3.0_1726967688097.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_sentence_entailment_hackaton_coliee_pipeline_en_5.5.0_3.0_1726967688097.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("model_sentence_entailment_hackaton_coliee_pipeline", lang = "en")
+# assumes the exported pipeline reads its input from a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("model_sentence_entailment_hackaton_coliee_pipeline", lang = "en")
+// assumes the exported pipeline reads its input from a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_sentence_entailment_hackaton_coliee_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ludoviciarraga/model_sentence_entailment_hackaton_coliee + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-model_token_classification_bert_base_ner_en.md b/docs/_posts/ahmedlone127/2024-09-22-model_token_classification_bert_base_ner_en.md new file mode 100644 index 00000000000000..9596e1218d5907 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-model_token_classification_bert_base_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_token_classification_bert_base_ner BertForTokenClassification from Ornelas7 +author: John Snow Labs +name: model_token_classification_bert_base_ner +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_token_classification_bert_base_ner` is a English model originally trained by Ornelas7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_token_classification_bert_base_ner_en_5.5.0_3.0_1727045793742.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_token_classification_bert_base_ner_en_5.5.0_3.0_1727045793742.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assemble raw text into documents, tokenize, then tag each token with the pretrained NER model
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("model_token_classification_bert_base_ner","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("model_token_classification_bert_base_ner", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
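+
+For token classification the `ner` column carries one label per token, aligned with the `token` column. A minimal sketch reusing `pipelineDF` from the example above:
+
+```python
+# tokens and their predicted tags, row by row
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```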
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_token_classification_bert_base_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Ornelas7/model-token-classification-bert-base-NER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-model_token_classification_bert_base_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-model_token_classification_bert_base_ner_pipeline_en.md new file mode 100644 index 00000000000000..34367ac8ef3cae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-model_token_classification_bert_base_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_token_classification_bert_base_ner_pipeline pipeline BertForTokenClassification from Ornelas7 +author: John Snow Labs +name: model_token_classification_bert_base_ner_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_token_classification_bert_base_ner_pipeline` is a English model originally trained by Ornelas7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_token_classification_bert_base_ner_pipeline_en_5.5.0_3.0_1727045812995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_token_classification_bert_base_ner_pipeline_en_5.5.0_3.0_1727045812995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("model_token_classification_bert_base_ner_pipeline", lang = "en")
+# assumes the exported pipeline reads its input from a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("model_token_classification_bert_base_ner_pipeline", lang = "en")
+// assumes the exported pipeline reads its input from a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_token_classification_bert_base_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Ornelas7/model-token-classification-bert-base-NER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-modelofine4_en.md b/docs/_posts/ahmedlone127/2024-09-22-modelofine4_en.md new file mode 100644 index 00000000000000..f0346789beeed3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-modelofine4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English modelofine4 RoBertaForSequenceClassification from adriansanz +author: John Snow Labs +name: modelofine4 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`modelofine4` is a English model originally trained by adriansanz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/modelofine4_en_5.5.0_3.0_1727027620273.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/modelofine4_en_5.5.0_3.0_1727027620273.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assemble raw text into documents, tokenize, then classify with the pretrained model
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("modelofine4","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("modelofine4", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|modelofine4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|448.4 MB| + +## References + +https://huggingface.co/adriansanz/modelofine4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-modelofine4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-modelofine4_pipeline_en.md new file mode 100644 index 00000000000000..ecf6ae61ccc3ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-modelofine4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English modelofine4_pipeline pipeline RoBertaForSequenceClassification from adriansanz +author: John Snow Labs +name: modelofine4_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`modelofine4_pipeline` is a English model originally trained by adriansanz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/modelofine4_pipeline_en_5.5.0_3.0_1727027647545.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/modelofine4_pipeline_en_5.5.0_3.0_1727027647545.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("modelofine4_pipeline", lang = "en")
+# assumes the exported pipeline reads its input from a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("modelofine4_pipeline", lang = "en")
+// assumes the exported pipeline reads its input from a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|modelofine4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|448.4 MB| + +## References + +https://huggingface.co/adriansanz/modelofine4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-mpoclassification_en.md b/docs/_posts/ahmedlone127/2024-09-22-mpoclassification_en.md new file mode 100644 index 00000000000000..d6c89069d987a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-mpoclassification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mpoclassification DistilBertForSequenceClassification from inXistant +author: John Snow Labs +name: mpoclassification +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mpoclassification` is a English model originally trained by inXistant. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mpoclassification_en_5.5.0_3.0_1727033889376.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mpoclassification_en_5.5.0_3.0_1727033889376.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assemble raw text into documents, tokenize, then classify with the pretrained model
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("mpoclassification","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("mpoclassification", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mpoclassification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/inXistant/MPOClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-mpoclassification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-mpoclassification_pipeline_en.md new file mode 100644 index 00000000000000..402a3017699c32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-mpoclassification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mpoclassification_pipeline pipeline DistilBertForSequenceClassification from inXistant +author: John Snow Labs +name: mpoclassification_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mpoclassification_pipeline` is a English model originally trained by inXistant. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mpoclassification_pipeline_en_5.5.0_3.0_1727033901341.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mpoclassification_pipeline_en_5.5.0_3.0_1727033901341.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("mpoclassification_pipeline", lang = "en")
+# assumes the exported pipeline reads its input from a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("mpoclassification_pipeline", lang = "en")
+// assumes the exported pipeline reads its input from a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mpoclassification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/inXistant/MPOClassification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-multiwd_en.md b/docs/_posts/ahmedlone127/2024-09-22-multiwd_en.md new file mode 100644 index 00000000000000..fedb16934b1f35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-multiwd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English multiwd BertForSequenceClassification from Tianlin668 +author: John Snow Labs +name: multiwd +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multiwd` is a English model originally trained by Tianlin668. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multiwd_en_5.5.0_3.0_1727030510510.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multiwd_en_5.5.0_3.0_1727030510510.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("multiwd","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("multiwd", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
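+
+The predicted label for each row can then be read back from the `class` output column. A short follow-up sketch, assuming the Python pipeline above has been run:
+
+```python
+# "class" holds the classifier annotations; "class.result" exposes just the
+# predicted label strings.
+pipelineDF.select("class.result").show(truncate = False)
+```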
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multiwd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|408.8 MB| + +## References + +https://huggingface.co/Tianlin668/MultiWD \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-multiwd_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-multiwd_pipeline_en.md new file mode 100644 index 00000000000000..2b072f49819893 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-multiwd_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English multiwd_pipeline pipeline BertForSequenceClassification from Tianlin668 +author: John Snow Labs +name: multiwd_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multiwd_pipeline` is a English model originally trained by Tianlin668. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multiwd_pipeline_en_5.5.0_3.0_1727030531152.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multiwd_pipeline_en_5.5.0_3.0_1727030531152.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the input text in a column named "text"
+pipeline = PretrainedPipeline("multiwd_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the input text in a column named "text"
+val pipeline = new PretrainedPipeline("multiwd_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multiwd_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.9 MB| + +## References + +https://huggingface.co/Tianlin668/MultiWD + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-mysterious_bouncy_flan_2_en.md b/docs/_posts/ahmedlone127/2024-09-22-mysterious_bouncy_flan_2_en.md new file mode 100644 index 00000000000000..5d2d649d26bbe0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-mysterious_bouncy_flan_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mysterious_bouncy_flan_2 DistilBertForSequenceClassification from gaodrew +author: John Snow Labs +name: mysterious_bouncy_flan_2 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mysterious_bouncy_flan_2` is a English model originally trained by gaodrew. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mysterious_bouncy_flan_2_en_5.5.0_3.0_1727012813647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mysterious_bouncy_flan_2_en_5.5.0_3.0_1727012813647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("mysterious_bouncy_flan_2","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("mysterious_bouncy_flan_2", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mysterious_bouncy_flan_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gaodrew/mysterious-bouncy-flan-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-mysterious_bouncy_flan_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-mysterious_bouncy_flan_2_pipeline_en.md new file mode 100644 index 00000000000000..820297ffd5bf25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-mysterious_bouncy_flan_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mysterious_bouncy_flan_2_pipeline pipeline DistilBertForSequenceClassification from gaodrew +author: John Snow Labs +name: mysterious_bouncy_flan_2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mysterious_bouncy_flan_2_pipeline` is a English model originally trained by gaodrew. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mysterious_bouncy_flan_2_pipeline_en_5.5.0_3.0_1727012825109.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mysterious_bouncy_flan_2_pipeline_en_5.5.0_3.0_1727012825109.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the input text in a column named "text"
+pipeline = PretrainedPipeline("mysterious_bouncy_flan_2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the input text in a column named "text"
+val pipeline = new PretrainedPipeline("mysterious_bouncy_flan_2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mysterious_bouncy_flan_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gaodrew/mysterious-bouncy-flan-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_imdb_padding90model_en.md b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_imdb_padding90model_en.md new file mode 100644 index 00000000000000..4d4138839bdaeb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_imdb_padding90model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_distilbert_imdb_padding90model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_imdb_padding90model +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_imdb_padding90model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_imdb_padding90model_en_5.5.0_3.0_1727013035647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_imdb_padding90model_en_5.5.0_3.0_1727013035647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_imdb_padding90model","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_imdb_padding90model", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_imdb_padding90model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_imdb_padding90model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_imdb_padding90model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_imdb_padding90model_pipeline_en.md new file mode 100644 index 00000000000000..b66eb61b62c96f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_imdb_padding90model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_distilbert_imdb_padding90model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_imdb_padding90model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_imdb_padding90model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_imdb_padding90model_pipeline_en_5.5.0_3.0_1727013048202.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_imdb_padding90model_pipeline_en_5.5.0_3.0_1727013048202.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the input text in a column named "text"
+pipeline = PretrainedPipeline("n_distilbert_imdb_padding90model_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the input text in a column named "text"
+val pipeline = new PretrainedPipeline("n_distilbert_imdb_padding90model_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_imdb_padding90model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_imdb_padding90model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst2_padding50model_en.md b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst2_padding50model_en.md new file mode 100644 index 00000000000000..669f924ee82737 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst2_padding50model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_distilbert_sst2_padding50model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_sst2_padding50model +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_sst2_padding50model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_sst2_padding50model_en_5.5.0_3.0_1727020791054.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_sst2_padding50model_en_5.5.0_3.0_1727020791054.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst2_padding50model","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst2_padding50model", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_sst2_padding50model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_sst2_padding50model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst2_padding50model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst2_padding50model_pipeline_en.md new file mode 100644 index 00000000000000..45e97a85a63079 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst2_padding50model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_distilbert_sst2_padding50model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_sst2_padding50model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_sst2_padding50model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_sst2_padding50model_pipeline_en_5.5.0_3.0_1727020802858.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_sst2_padding50model_pipeline_en_5.5.0_3.0_1727020802858.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the input text in a column named "text"
+pipeline = PretrainedPipeline("n_distilbert_sst2_padding50model_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the input text in a column named "text"
+val pipeline = new PretrainedPipeline("n_distilbert_sst2_padding50model_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_sst2_padding50model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_sst2_padding50model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst5_padding10model_realgon_en.md b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst5_padding10model_realgon_en.md new file mode 100644 index 00000000000000..b6e415c98620aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst5_padding10model_realgon_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_distilbert_sst5_padding10model_realgon DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_sst5_padding10model_realgon +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_sst5_padding10model_realgon` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding10model_realgon_en_5.5.0_3.0_1727033698893.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding10model_realgon_en_5.5.0_3.0_1727033698893.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst5_padding10model_realgon","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst5_padding10model_realgon", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_sst5_padding10model_realgon| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_sst5_padding10model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst5_padding10model_realgon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst5_padding10model_realgon_pipeline_en.md new file mode 100644 index 00000000000000..79e33f8ce1006e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst5_padding10model_realgon_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_distilbert_sst5_padding10model_realgon_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_sst5_padding10model_realgon_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_sst5_padding10model_realgon_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding10model_realgon_pipeline_en_5.5.0_3.0_1727033711955.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding10model_realgon_pipeline_en_5.5.0_3.0_1727033711955.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the input text in a column named "text"
+pipeline = PretrainedPipeline("n_distilbert_sst5_padding10model_realgon_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the input text in a column named "text"
+val pipeline = new PretrainedPipeline("n_distilbert_sst5_padding10model_realgon_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_sst5_padding10model_realgon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_sst5_padding10model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-nbme_roberta_large_en.md b/docs/_posts/ahmedlone127/2024-09-22-nbme_roberta_large_en.md new file mode 100644 index 00000000000000..bb2aff96414830 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-nbme_roberta_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nbme_roberta_large RoBertaEmbeddings from smeoni +author: John Snow Labs +name: nbme_roberta_large +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nbme_roberta_large` is a English model originally trained by smeoni. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nbme_roberta_large_en_5.5.0_3.0_1726999421960.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nbme_roberta_large_en_5.5.0_3.0_1726999421960.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("nbme_roberta_large","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("nbme_roberta_large","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
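+
+To sanity-check the output, the per-token vectors can be unpacked from the `embeddings` column. A brief follow-up sketch, assuming the Python pipeline above has been run:
+
+```python
+# Each annotation in "embeddings" carries the token text in `result` and its
+# vector in the `embeddings` field.
+pipelineDF.selectExpr("explode(embeddings) as emb") \
+    .selectExpr("emb.result as token", "emb.embeddings as vector") \
+    .show(truncate = 80)
+```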
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nbme_roberta_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/smeoni/nbme-roberta-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-nbme_roberta_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-nbme_roberta_large_pipeline_en.md new file mode 100644 index 00000000000000..8e7fb654bed43b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-nbme_roberta_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nbme_roberta_large_pipeline pipeline RoBertaEmbeddings from smeoni +author: John Snow Labs +name: nbme_roberta_large_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nbme_roberta_large_pipeline` is a English model originally trained by smeoni. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nbme_roberta_large_pipeline_en_5.5.0_3.0_1726999479308.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nbme_roberta_large_pipeline_en_5.5.0_3.0_1726999479308.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the input text in a column named "text"
+pipeline = PretrainedPipeline("nbme_roberta_large_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the input text in a column named "text"
+val pipeline = new PretrainedPipeline("nbme_roberta_large_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nbme_roberta_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/smeoni/nbme-roberta-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ner_ehr_spanish_model_mulitlingual_bert_en.md b/docs/_posts/ahmedlone127/2024-09-22-ner_ehr_spanish_model_mulitlingual_bert_en.md new file mode 100644 index 00000000000000..cb535d2285384d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ner_ehr_spanish_model_mulitlingual_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_ehr_spanish_model_mulitlingual_bert BertForTokenClassification from ajtamayoh +author: John Snow Labs +name: ner_ehr_spanish_model_mulitlingual_bert +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_ehr_spanish_model_mulitlingual_bert` is a English model originally trained by ajtamayoh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_ehr_spanish_model_mulitlingual_bert_en_5.5.0_3.0_1727040450475.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_ehr_spanish_model_mulitlingual_bert_en_5.5.0_3.0_1727040450475.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("ner_ehr_spanish_model_mulitlingual_bert","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("ner_ehr_spanish_model_mulitlingual_bert", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
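+
+If grouped entity chunks are needed rather than per-token tags, Spark NLP's `NerConverter` can be appended to the pipeline above. This is a sketch of an optional extra step, not part of the original card:
+
+```python
+from sparknlp.annotator import NerConverter
+
+# Collapses IOB-style token tags from the "ner" column into entity chunks.
+converter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, converter])
+pipeline.fit(data).transform(data).select("ner_chunk.result").show(truncate = False)
+```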
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_ehr_spanish_model_mulitlingual_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/ajtamayoh/NER_EHR_Spanish_model_Mulitlingual_BERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ner_ehr_spanish_model_mulitlingual_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-ner_ehr_spanish_model_mulitlingual_bert_pipeline_en.md new file mode 100644 index 00000000000000..868545e6ac6f18 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ner_ehr_spanish_model_mulitlingual_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ner_ehr_spanish_model_mulitlingual_bert_pipeline pipeline BertForTokenClassification from ajtamayoh +author: John Snow Labs +name: ner_ehr_spanish_model_mulitlingual_bert_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_ehr_spanish_model_mulitlingual_bert_pipeline` is a English model originally trained by ajtamayoh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_ehr_spanish_model_mulitlingual_bert_pipeline_en_5.5.0_3.0_1727040484405.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_ehr_spanish_model_mulitlingual_bert_pipeline_en_5.5.0_3.0_1727040484405.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the input text in a column named "text"
+pipeline = PretrainedPipeline("ner_ehr_spanish_model_mulitlingual_bert_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the input text in a column named "text"
+val pipeline = new PretrainedPipeline("ner_ehr_spanish_model_mulitlingual_bert_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_ehr_spanish_model_mulitlingual_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/ajtamayoh/NER_EHR_Spanish_model_Mulitlingual_BERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ner_gec_roberta_v3_en.md b/docs/_posts/ahmedlone127/2024-09-22-ner_gec_roberta_v3_en.md new file mode 100644 index 00000000000000..12f6965f6d1c38 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ner_gec_roberta_v3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_gec_roberta_v3 RoBertaForTokenClassification from fursov +author: John Snow Labs +name: ner_gec_roberta_v3 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_gec_roberta_v3` is a English model originally trained by fursov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_gec_roberta_v3_en_5.5.0_3.0_1727048525709.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_gec_roberta_v3_en_5.5.0_3.0_1727048525709.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("ner_gec_roberta_v3","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("ner_gec_roberta_v3", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_gec_roberta_v3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|439.1 MB| + +## References + +https://huggingface.co/fursov/ner-gec-roberta-v3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ner_gec_roberta_v3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-ner_gec_roberta_v3_pipeline_en.md new file mode 100644 index 00000000000000..6d17f8cf4cc9f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ner_gec_roberta_v3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ner_gec_roberta_v3_pipeline pipeline RoBertaForTokenClassification from fursov +author: John Snow Labs +name: ner_gec_roberta_v3_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_gec_roberta_v3_pipeline` is a English model originally trained by fursov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_gec_roberta_v3_pipeline_en_5.5.0_3.0_1727048557044.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_gec_roberta_v3_pipeline_en_5.5.0_3.0_1727048557044.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the input text in a column named "text"
+pipeline = PretrainedPipeline("ner_gec_roberta_v3_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the input text in a column named "text"
+val pipeline = new PretrainedPipeline("ner_gec_roberta_v3_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_gec_roberta_v3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|439.1 MB| + +## References + +https://huggingface.co/fursov/ner-gec-roberta-v3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ner_ner_random2_seed2_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-22-ner_ner_random2_seed2_bernice_en.md new file mode 100644 index 00000000000000..30a42656563ab7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ner_ner_random2_seed2_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_ner_random2_seed2_bernice XlmRoBertaForTokenClassification from tweettemposhift +author: John Snow Labs +name: ner_ner_random2_seed2_bernice +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_ner_random2_seed2_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_ner_random2_seed2_bernice_en_5.5.0_3.0_1726969888631.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_ner_random2_seed2_bernice_en_5.5.0_3.0_1726969888631.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("ner_ner_random2_seed2_bernice","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("ner_ner_random2_seed2_bernice", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_ner_random2_seed2_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|802.5 MB| + +## References + +https://huggingface.co/tweettemposhift/ner-ner_random2_seed2-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ner_ner_random2_seed2_bernice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-ner_ner_random2_seed2_bernice_pipeline_en.md new file mode 100644 index 00000000000000..40856b705332ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ner_ner_random2_seed2_bernice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ner_ner_random2_seed2_bernice_pipeline pipeline XlmRoBertaForTokenClassification from tweettemposhift +author: John Snow Labs +name: ner_ner_random2_seed2_bernice_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_ner_random2_seed2_bernice_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_ner_random2_seed2_bernice_pipeline_en_5.5.0_3.0_1726970022139.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_ner_random2_seed2_bernice_pipeline_en_5.5.0_3.0_1726970022139.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the input text in a column named "text"
+pipeline = PretrainedPipeline("ner_ner_random2_seed2_bernice_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the input text in a column named "text"
+val pipeline = new PretrainedPipeline("ner_ner_random2_seed2_bernice_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_ner_random2_seed2_bernice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|802.5 MB| + +## References + +https://huggingface.co/tweettemposhift/ner-ner_random2_seed2-bernice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ner_productname_en.md b/docs/_posts/ahmedlone127/2024-09-22-ner_productname_en.md new file mode 100644 index 00000000000000..0fbc67c9e24724 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ner_productname_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_productname BertForTokenClassification from sianbrumm +author: John Snow Labs +name: ner_productname +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_productname` is a English model originally trained by sianbrumm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_productname_en_5.5.0_3.0_1726966209567.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_productname_en_5.5.0_3.0_1726966209567.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("ner_productname","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("ner_productname", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_productname| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|409.9 MB| + +## References + +https://huggingface.co/sianbrumm/Ner_Productname \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ner_productname_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-ner_productname_pipeline_en.md new file mode 100644 index 00000000000000..3a91932acd20de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ner_productname_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ner_productname_pipeline pipeline BertForTokenClassification from sianbrumm +author: John Snow Labs +name: ner_productname_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_productname_pipeline` is a English model originally trained by sianbrumm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_productname_pipeline_en_5.5.0_3.0_1726966228144.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_productname_pipeline_en_5.5.0_3.0_1726966228144.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the input text in a column named "text"
+pipeline = PretrainedPipeline("ner_productname_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the input text in a column named "text"
+val pipeline = new PretrainedPipeline("ner_productname_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_productname_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/sianbrumm/Ner_Productname + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ner_serverstable_v0_en.md b/docs/_posts/ahmedlone127/2024-09-22-ner_serverstable_v0_en.md new file mode 100644 index 00000000000000..a95a650fc0661f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ner_serverstable_v0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_serverstable_v0 BertForTokenClassification from procit002 +author: John Snow Labs +name: ner_serverstable_v0 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_serverstable_v0` is a English model originally trained by procit002. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_serverstable_v0_en_5.5.0_3.0_1727045435582.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_serverstable_v0_en_5.5.0_3.0_1727045435582.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("ner_serverstable_v0","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("ner_serverstable_v0", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_serverstable_v0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/procit002/NER_ServerStable_v0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ner_serverstable_v0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-ner_serverstable_v0_pipeline_en.md new file mode 100644 index 00000000000000..26d12b940ab797 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ner_serverstable_v0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ner_serverstable_v0_pipeline pipeline BertForTokenClassification from procit002 +author: John Snow Labs +name: ner_serverstable_v0_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_serverstable_v0_pipeline` is a English model originally trained by procit002. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_serverstable_v0_pipeline_en_5.5.0_3.0_1727045459554.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_serverstable_v0_pipeline_en_5.5.0_3.0_1727045459554.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ner_serverstable_v0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ner_serverstable_v0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
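
The snippet above assumes a DataFrame `df` that already has a `text` column. A minimal, self-contained sketch of preparing such input and running the pipeline (the sample sentence is only illustrative):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start (or attach to) a Spark NLP session and build a one-row input DataFrame
spark = sparknlp.start()
df = spark.createDataFrame([["My name is John and I live in Berlin."]]).toDF("text")

pipeline = PretrainedPipeline("ner_serverstable_v0_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.show(truncate=False)
```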
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_serverstable_v0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/procit002/NER_ServerStable_v0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-neurips_distilbert_combined_1_en.md b/docs/_posts/ahmedlone127/2024-09-22-neurips_distilbert_combined_1_en.md new file mode 100644 index 00000000000000..66b3289eb43bd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-neurips_distilbert_combined_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English neurips_distilbert_combined_1 DistilBertForSequenceClassification from neurips-user +author: John Snow Labs +name: neurips_distilbert_combined_1 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`neurips_distilbert_combined_1` is a English model originally trained by neurips-user. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/neurips_distilbert_combined_1_en_5.5.0_3.0_1727020853525.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/neurips_distilbert_combined_1_en_5.5.0_3.0_1727020853525.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("neurips_distilbert_combined_1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("neurips_distilbert_combined_1", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
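
To read off the predicted label, select the `class` column that the classifier writes (the name comes from `setOutputCol("class")` in the snippet above):

```python
# Show the input text next to the predicted class label
pipelineDF.select("text", "class.result").show(truncate=False)
```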
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|neurips_distilbert_combined_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/neurips-user/neurips-distilbert-combined-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-neurips_distilbert_combined_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-neurips_distilbert_combined_1_pipeline_en.md new file mode 100644 index 00000000000000..c7ec04a8bb6ddd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-neurips_distilbert_combined_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English neurips_distilbert_combined_1_pipeline pipeline DistilBertForSequenceClassification from neurips-user +author: John Snow Labs +name: neurips_distilbert_combined_1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`neurips_distilbert_combined_1_pipeline` is a English model originally trained by neurips-user. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/neurips_distilbert_combined_1_pipeline_en_5.5.0_3.0_1727020865318.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/neurips_distilbert_combined_1_pipeline_en_5.5.0_3.0_1727020865318.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("neurips_distilbert_combined_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("neurips_distilbert_combined_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
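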
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|neurips_distilbert_combined_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/neurips-user/neurips-distilbert-combined-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_en.md b/docs/_posts/ahmedlone127/2024-09-22-nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_en.md new file mode 100644 index 00000000000000..a9d104b9b736bf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1 RoBertaForSequenceClassification from NinjaBanana1 +author: John Snow Labs +name: nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1` is a English model originally trained by NinjaBanana1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_en_5.5.0_3.0_1726971998891.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_en_5.5.0_3.0_1726971998891.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
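
For quick, single-string inference without building a DataFrame, the fitted pipeline can be wrapped in a `LightPipeline`. A minimal sketch, reusing `pipelineModel` and the `class` output column from the snippet above (the input string is only illustrative):

```python
from sparknlp.base import LightPipeline

# Annotate a raw string in memory; the result is a dict keyed by output column names
light = LightPipeline(pipelineModel)
print(light.annotate("This product exceeded my expectations.")["class"])
```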
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/NinjaBanana1/nli-roberta-base-finetuned-for-amazon-review-ratings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_pipeline_en.md new file mode 100644 index 00000000000000..e71e3b112467c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_pipeline pipeline RoBertaForSequenceClassification from NinjaBanana1 +author: John Snow Labs +name: nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_pipeline` is a English model originally trained by NinjaBanana1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_pipeline_en_5.5.0_3.0_1726972020072.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_pipeline_en_5.5.0_3.0_1726972020072.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/NinjaBanana1/nli-roberta-base-finetuned-for-amazon-review-ratings + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-nlp2_base_3e_4_fixed_en.md b/docs/_posts/ahmedlone127/2024-09-22-nlp2_base_3e_4_fixed_en.md new file mode 100644 index 00000000000000..f3689c7482d605 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-nlp2_base_3e_4_fixed_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nlp2_base_3e_4_fixed DistilBertForSequenceClassification from VRT-2428211 +author: John Snow Labs +name: nlp2_base_3e_4_fixed +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp2_base_3e_4_fixed` is a English model originally trained by VRT-2428211. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp2_base_3e_4_fixed_en_5.5.0_3.0_1727033713332.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp2_base_3e_4_fixed_en_5.5.0_3.0_1727033713332.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp2_base_3e_4_fixed","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp2_base_3e_4_fixed", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
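
A fitted pipeline can be persisted with the standard Spark ML writer and reloaded later, so the pretrained weights do not need to be fetched again. A minimal sketch, reusing `pipelineModel` and `data` from the snippet above (the path is only illustrative):

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline to disk and load it back for reuse
pipelineModel.write().overwrite().save("/tmp/nlp2_base_3e_4_fixed_pipeline")
reloaded = PipelineModel.load("/tmp/nlp2_base_3e_4_fixed_pipeline")
reloaded.transform(data).select("class.result").show(truncate=False)
```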
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp2_base_3e_4_fixed| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/VRT-2428211/NLP2_Base_3e-4_Fixed \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-nlp_hf_workshop2_en.md b/docs/_posts/ahmedlone127/2024-09-22-nlp_hf_workshop2_en.md new file mode 100644 index 00000000000000..f81122cba08781 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-nlp_hf_workshop2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nlp_hf_workshop2 DistilBertForSequenceClassification from Bahareh0281 +author: John Snow Labs +name: nlp_hf_workshop2 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_hf_workshop2` is a English model originally trained by Bahareh0281. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop2_en_5.5.0_3.0_1726980528174.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop2_en_5.5.0_3.0_1726980528174.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp_hf_workshop2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp_hf_workshop2", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_hf_workshop2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/Bahareh0281/NLP_HF_Workshop2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-nlp_hf_workshop2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-nlp_hf_workshop2_pipeline_en.md new file mode 100644 index 00000000000000..d7345173374c8d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-nlp_hf_workshop2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nlp_hf_workshop2_pipeline pipeline DistilBertForSequenceClassification from Bahareh0281 +author: John Snow Labs +name: nlp_hf_workshop2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_hf_workshop2_pipeline` is a English model originally trained by Bahareh0281. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop2_pipeline_en_5.5.0_3.0_1726980539416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop2_pipeline_en_5.5.0_3.0_1726980539416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nlp_hf_workshop2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nlp_hf_workshop2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
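
Besides `transform` on a DataFrame, a pretrained pipeline can annotate raw strings directly, which is convenient for quick checks. A minimal sketch (sample text is only illustrative):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("nlp_hf_workshop2_pipeline", lang="en")
# annotate() returns a dict keyed by the pipeline's output columns
print(pipeline.annotate("I love spark-nlp"))
```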
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_hf_workshop2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/Bahareh0281/NLP_HF_Workshop2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-p_model_2_en.md b/docs/_posts/ahmedlone127/2024-09-22-p_model_2_en.md new file mode 100644 index 00000000000000..7aa5e6392e2b49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-p_model_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English p_model_2 DistilBertForSequenceClassification from Habaznya +author: John Snow Labs +name: p_model_2 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`p_model_2` is a English model originally trained by Habaznya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/p_model_2_en_5.5.0_3.0_1727012560507.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/p_model_2_en_5.5.0_3.0_1727012560507.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("p_model_2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("p_model_2", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|p_model_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/Habaznya/p_model_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-p_model_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-p_model_2_pipeline_en.md new file mode 100644 index 00000000000000..77c3d49c98d0c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-p_model_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English p_model_2_pipeline pipeline DistilBertForSequenceClassification from Habaznya +author: John Snow Labs +name: p_model_2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`p_model_2_pipeline` is a English model originally trained by Habaznya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/p_model_2_pipeline_en_5.5.0_3.0_1727012583305.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/p_model_2_pipeline_en_5.5.0_3.0_1727012583305.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("p_model_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("p_model_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
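
`fullAnnotate` returns the complete annotation objects, including begin/end offsets and metadata, rather than just the result strings. A minimal sketch (sample text is only illustrative):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("p_model_2_pipeline", lang="en")
# Each annotation carries annotatorType, begin, end, result and metadata
print(pipeline.fullAnnotate("I love spark-nlp"))
```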
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|p_model_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/Habaznya/p_model_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-paludistilbertpartaugmentedoriginal_en.md b/docs/_posts/ahmedlone127/2024-09-22-paludistilbertpartaugmentedoriginal_en.md new file mode 100644 index 00000000000000..7b431c1e84c706 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-paludistilbertpartaugmentedoriginal_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English paludistilbertpartaugmentedoriginal DistilBertForSequenceClassification from Palu001 +author: John Snow Labs +name: paludistilbertpartaugmentedoriginal +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`paludistilbertpartaugmentedoriginal` is a English model originally trained by Palu001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/paludistilbertpartaugmentedoriginal_en_5.5.0_3.0_1726980224901.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/paludistilbertpartaugmentedoriginal_en_5.5.0_3.0_1726980224901.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("paludistilbertpartaugmentedoriginal","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("paludistilbertpartaugmentedoriginal", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|paludistilbertpartaugmentedoriginal| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Palu001/PaluDistilbertPartAugmentedOriginal \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-paludistilbertpartaugmentedoriginal_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-paludistilbertpartaugmentedoriginal_pipeline_en.md new file mode 100644 index 00000000000000..1b75d928934e8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-paludistilbertpartaugmentedoriginal_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English paludistilbertpartaugmentedoriginal_pipeline pipeline DistilBertForSequenceClassification from Palu001 +author: John Snow Labs +name: paludistilbertpartaugmentedoriginal_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`paludistilbertpartaugmentedoriginal_pipeline` is a English model originally trained by Palu001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/paludistilbertpartaugmentedoriginal_pipeline_en_5.5.0_3.0_1726980236573.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/paludistilbertpartaugmentedoriginal_pipeline_en_5.5.0_3.0_1726980236573.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("paludistilbertpartaugmentedoriginal_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("paludistilbertpartaugmentedoriginal_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|paludistilbertpartaugmentedoriginal_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Palu001/PaluDistilbertPartAugmentedOriginal + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-panamianmodel_en.md b/docs/_posts/ahmedlone127/2024-09-22-panamianmodel_en.md new file mode 100644 index 00000000000000..ccce7ffe864284 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-panamianmodel_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English panamianmodel RoBertaEmbeddings from joseangelatm +author: John Snow Labs +name: panamianmodel +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`panamianmodel` is a English model originally trained by joseangelatm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/panamianmodel_en_5.5.0_3.0_1727041601341.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/panamianmodel_en_5.5.0_3.0_1727041601341.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("panamianmodel","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("panamianmodel","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
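
The token vectors produced above are stored inside the annotation structs of the `embeddings` column (the name set via `setOutputCol("embeddings")`). A minimal sketch for unpacking them, assuming the `pipelineDF` from the Python snippet:

```python
# Explode the annotations and show each token next to its embedding vector
pipelineDF.selectExpr("explode(embeddings) as emb") \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(truncate=False)
```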
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|panamianmodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.4 MB| + +## References + +https://huggingface.co/joseangelatm/PanamianModel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-panamianmodel_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-panamianmodel_pipeline_en.md new file mode 100644 index 00000000000000..13defd3cfa4e55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-panamianmodel_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English panamianmodel_pipeline pipeline RoBertaEmbeddings from joseangelatm +author: John Snow Labs +name: panamianmodel_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`panamianmodel_pipeline` is a English model originally trained by joseangelatm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/panamianmodel_pipeline_en_5.5.0_3.0_1727041624936.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/panamianmodel_pipeline_en_5.5.0_3.0_1727041624936.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("panamianmodel_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("panamianmodel_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|panamianmodel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.4 MB| + +## References + +https://huggingface.co/joseangelatm/PanamianModel + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-persian_text_emotion_bert_v1_pipeline_fa.md b/docs/_posts/ahmedlone127/2024-09-22-persian_text_emotion_bert_v1_pipeline_fa.md new file mode 100644 index 00000000000000..a62fe6a28fb298 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-persian_text_emotion_bert_v1_pipeline_fa.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Persian persian_text_emotion_bert_v1_pipeline pipeline BertForSequenceClassification from SeyedAli +author: John Snow Labs +name: persian_text_emotion_bert_v1_pipeline +date: 2024-09-22 +tags: [fa, open_source, pipeline, onnx] +task: Text Classification +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`persian_text_emotion_bert_v1_pipeline` is a Persian model originally trained by SeyedAli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/persian_text_emotion_bert_v1_pipeline_fa_5.5.0_3.0_1726988688390.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/persian_text_emotion_bert_v1_pipeline_fa_5.5.0_3.0_1726988688390.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("persian_text_emotion_bert_v1_pipeline", lang = "fa") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("persian_text_emotion_bert_v1_pipeline", lang = "fa") +val annotations = pipeline.transform(df) + +``` +
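
The snippet assumes a DataFrame `df` with a `text` column holding Persian input. A minimal sketch of the setup (the placeholder stands in for real Persian text):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
# Replace the placeholder with actual Persian text
df = spark.createDataFrame([["<Persian text here>"]]).toDF("text")

pipeline = PretrainedPipeline("persian_text_emotion_bert_v1_pipeline", lang="fa")
pipeline.transform(df).show(truncate=False)
```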
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|persian_text_emotion_bert_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fa| +|Size:|608.7 MB| + +## References + +https://huggingface.co/SeyedAli/Persian-Text-Emotion-Bert-V1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-platzi_distilroberta_base_mrpc_glue_angrim_en.md b/docs/_posts/ahmedlone127/2024-09-22-platzi_distilroberta_base_mrpc_glue_angrim_en.md new file mode 100644 index 00000000000000..b9c9f5e8779113 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-platzi_distilroberta_base_mrpc_glue_angrim_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English platzi_distilroberta_base_mrpc_glue_angrim RoBertaForSequenceClassification from platzi +author: John Snow Labs +name: platzi_distilroberta_base_mrpc_glue_angrim +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_mrpc_glue_angrim` is a English model originally trained by platzi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_angrim_en_5.5.0_3.0_1727027149342.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_angrim_en_5.5.0_3.0_1727027149342.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_angrim","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_angrim", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_mrpc_glue_angrim| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/platzi/platzi-distilroberta-base-mrpc-glue-angrim \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-platzi_distilroberta_base_mrpc_glue_ricardo_talavera_en.md b/docs/_posts/ahmedlone127/2024-09-22-platzi_distilroberta_base_mrpc_glue_ricardo_talavera_en.md new file mode 100644 index 00000000000000..dab036b6654e72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-platzi_distilroberta_base_mrpc_glue_ricardo_talavera_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English platzi_distilroberta_base_mrpc_glue_ricardo_talavera RoBertaForSequenceClassification from ricardotalavera +author: John Snow Labs +name: platzi_distilroberta_base_mrpc_glue_ricardo_talavera +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_mrpc_glue_ricardo_talavera` is a English model originally trained by ricardotalavera. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_ricardo_talavera_en_5.5.0_3.0_1727026768137.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_ricardo_talavera_en_5.5.0_3.0_1727026768137.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_ricardo_talavera","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_ricardo_talavera", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_mrpc_glue_ricardo_talavera| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/ricardotalavera/platzi-distilroberta-base-mrpc-glue-ricardo-talavera \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-platzi_distilroberta_base_mrpc_glue_ricardo_talavera_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-platzi_distilroberta_base_mrpc_glue_ricardo_talavera_pipeline_en.md new file mode 100644 index 00000000000000..0d760bdf7bda2d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-platzi_distilroberta_base_mrpc_glue_ricardo_talavera_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English platzi_distilroberta_base_mrpc_glue_ricardo_talavera_pipeline pipeline RoBertaForSequenceClassification from ricardotalavera +author: John Snow Labs +name: platzi_distilroberta_base_mrpc_glue_ricardo_talavera_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_mrpc_glue_ricardo_talavera_pipeline` is a English model originally trained by ricardotalavera. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_ricardo_talavera_pipeline_en_5.5.0_3.0_1727026782855.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_ricardo_talavera_pipeline_en_5.5.0_3.0_1727026782855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("platzi_distilroberta_base_mrpc_glue_ricardo_talavera_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("platzi_distilroberta_base_mrpc_glue_ricardo_talavera_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_mrpc_glue_ricardo_talavera_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/ricardotalavera/platzi-distilroberta-base-mrpc-glue-ricardo-talavera + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-proposed_mediumf_model_en.md b/docs/_posts/ahmedlone127/2024-09-22-proposed_mediumf_model_en.md new file mode 100644 index 00000000000000..ede5721ffc3d62 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-proposed_mediumf_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English proposed_mediumf_model RoBertaEmbeddings from athar +author: John Snow Labs +name: proposed_mediumf_model +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`proposed_mediumf_model` is a English model originally trained by athar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/proposed_mediumf_model_en_5.5.0_3.0_1727041549211.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/proposed_mediumf_model_en_5.5.0_3.0_1727041549211.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("proposed_mediumf_model","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("proposed_mediumf_model","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|proposed_mediumf_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|364.7 MB| + +## References + +https://huggingface.co/athar/proposed_MEDIUMF-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-proposed_mediumf_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-proposed_mediumf_model_pipeline_en.md new file mode 100644 index 00000000000000..f9fc667b6f9be8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-proposed_mediumf_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English proposed_mediumf_model_pipeline pipeline RoBertaEmbeddings from athar +author: John Snow Labs +name: proposed_mediumf_model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`proposed_mediumf_model_pipeline` is a English model originally trained by athar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/proposed_mediumf_model_pipeline_en_5.5.0_3.0_1727041567718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/proposed_mediumf_model_pipeline_en_5.5.0_3.0_1727041567718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("proposed_mediumf_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("proposed_mediumf_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|proposed_mediumf_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|364.7 MB| + +## References + +https://huggingface.co/athar/proposed_MEDIUMF-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ptcrawl_plus_legal_base_v3_5__checkpoint_last_en.md b/docs/_posts/ahmedlone127/2024-09-22-ptcrawl_plus_legal_base_v3_5__checkpoint_last_en.md new file mode 100644 index 00000000000000..3c4daca7052bfc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ptcrawl_plus_legal_base_v3_5__checkpoint_last_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ptcrawl_plus_legal_base_v3_5__checkpoint_last RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: ptcrawl_plus_legal_base_v3_5__checkpoint_last +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ptcrawl_plus_legal_base_v3_5__checkpoint_last` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ptcrawl_plus_legal_base_v3_5__checkpoint_last_en_5.5.0_3.0_1727041842066.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ptcrawl_plus_legal_base_v3_5__checkpoint_last_en_5.5.0_3.0_1727041842066.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("ptcrawl_plus_legal_base_v3_5__checkpoint_last","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("ptcrawl_plus_legal_base_v3_5__checkpoint_last","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ptcrawl_plus_legal_base_v3_5__checkpoint_last| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|296.7 MB| + +## References + +https://huggingface.co/eduagarcia-temp/ptcrawl_plus_legal_base_v3_5__checkpoint_last \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ptcrawl_plus_legal_base_v3_5__checkpoint_last_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-ptcrawl_plus_legal_base_v3_5__checkpoint_last_pipeline_en.md new file mode 100644 index 00000000000000..f185be0d85a346 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ptcrawl_plus_legal_base_v3_5__checkpoint_last_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ptcrawl_plus_legal_base_v3_5__checkpoint_last_pipeline pipeline RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: ptcrawl_plus_legal_base_v3_5__checkpoint_last_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ptcrawl_plus_legal_base_v3_5__checkpoint_last_pipeline` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ptcrawl_plus_legal_base_v3_5__checkpoint_last_pipeline_en_5.5.0_3.0_1727041930558.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ptcrawl_plus_legal_base_v3_5__checkpoint_last_pipeline_en_5.5.0_3.0_1727041930558.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("ptcrawl_plus_legal_base_v3_5__checkpoint_last_pipeline", lang = "en")
+# a Spark DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("ptcrawl_plus_legal_base_v3_5__checkpoint_last_pipeline", lang = "en")
+// a Spark DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ptcrawl_plus_legal_base_v3_5__checkpoint_last_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|296.7 MB| + +## References + +https://huggingface.co/eduagarcia-temp/ptcrawl_plus_legal_base_v3_5__checkpoint_last + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-pytorch_distilbert3_fallsclassifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-pytorch_distilbert3_fallsclassifier_pipeline_en.md new file mode 100644 index 00000000000000..340cba8fa21a63 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-pytorch_distilbert3_fallsclassifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English pytorch_distilbert3_fallsclassifier_pipeline pipeline DistilBertForSequenceClassification from Blaise-MR +author: John Snow Labs +name: pytorch_distilbert3_fallsclassifier_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pytorch_distilbert3_fallsclassifier_pipeline` is a English model originally trained by Blaise-MR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pytorch_distilbert3_fallsclassifier_pipeline_en_5.5.0_3.0_1726980013810.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pytorch_distilbert3_fallsclassifier_pipeline_en_5.5.0_3.0_1726980013810.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("pytorch_distilbert3_fallsclassifier_pipeline", lang = "en")
+# a Spark DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("pytorch_distilbert3_fallsclassifier_pipeline", lang = "en")
+// a Spark DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pytorch_distilbert3_fallsclassifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Blaise-MR/pytorch_distilbert3_fallsclassifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-qa_bert_base_multilingual_cased_finetuned_squad_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-22-qa_bert_base_multilingual_cased_finetuned_squad_pipeline_xx.md new file mode 100644 index 00000000000000..3b6b961e010c8d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-qa_bert_base_multilingual_cased_finetuned_squad_pipeline_xx.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Multilingual qa_bert_base_multilingual_cased_finetuned_squad_pipeline pipeline BertForQuestionAnswering from itsamitkumar +author: John Snow Labs +name: qa_bert_base_multilingual_cased_finetuned_squad_pipeline +date: 2024-09-22 +tags: [xx, open_source, pipeline, onnx] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_bert_base_multilingual_cased_finetuned_squad_pipeline` is a Multilingual model originally trained by itsamitkumar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_bert_base_multilingual_cased_finetuned_squad_pipeline_xx_5.5.0_3.0_1727049250023.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_bert_base_multilingual_cased_finetuned_squad_pipeline_xx_5.5.0_3.0_1727049250023.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("qa_bert_base_multilingual_cased_finetuned_squad_pipeline", lang = "xx")
+# the pipeline starts with a MultiDocumentAssembler; question/context column names are assumed
+df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("qa_bert_base_multilingual_cased_finetuned_squad_pipeline", lang = "xx")
+// the pipeline starts with a MultiDocumentAssembler; question/context column names are assumed
+val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_bert_base_multilingual_cased_finetuned_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/itsamitkumar/qa_bert-base-multilingual-cased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-qa_bert_base_multilingual_cased_finetuned_squad_xx.md b/docs/_posts/ahmedlone127/2024-09-22-qa_bert_base_multilingual_cased_finetuned_squad_xx.md new file mode 100644 index 00000000000000..853146ef143518 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-qa_bert_base_multilingual_cased_finetuned_squad_xx.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Multilingual qa_bert_base_multilingual_cased_finetuned_squad BertForQuestionAnswering from itsamitkumar +author: John Snow Labs +name: qa_bert_base_multilingual_cased_finetuned_squad +date: 2024-09-22 +tags: [xx, open_source, onnx, question_answering, bert] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_bert_base_multilingual_cased_finetuned_squad` is a Multilingual model originally trained by itsamitkumar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_bert_base_multilingual_cased_finetuned_squad_xx_5.5.0_3.0_1727049211091.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_bert_base_multilingual_cased_finetuned_squad_xx_5.5.0_3.0_1727049211091.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("qa_bert_base_multilingual_cased_finetuned_squad","xx") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.MultiDocumentAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("qa_bert_base_multilingual_cased_finetuned_squad", "xx")
+    .setInputCols(Array("document_question", "document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
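+
+The predicted answer spans land in the `answer` output column. A minimal sketch for reading them back (assumes the pipeline above has been run):
+
+```python
+pipelineDF.select("question", "answer.result").show(truncate=False)
+```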
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_bert_base_multilingual_cased_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|xx| +|Size:|665.0 MB| + +## References + +https://huggingface.co/itsamitkumar/qa_bert-base-multilingual-cased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-qnli_roberta_base_seed_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-qnli_roberta_base_seed_3_pipeline_en.md new file mode 100644 index 00000000000000..ea1e623259d30e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-qnli_roberta_base_seed_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English qnli_roberta_base_seed_3_pipeline pipeline RoBertaForSequenceClassification from utahnlp +author: John Snow Labs +name: qnli_roberta_base_seed_3_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qnli_roberta_base_seed_3_pipeline` is a English model originally trained by utahnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qnli_roberta_base_seed_3_pipeline_en_5.5.0_3.0_1727037198802.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qnli_roberta_base_seed_3_pipeline_en_5.5.0_3.0_1727037198802.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("qnli_roberta_base_seed_3_pipeline", lang = "en")
+# a Spark DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("qnli_roberta_base_seed_3_pipeline", lang = "en")
+// a Spark DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qnli_roberta_base_seed_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|461.6 MB| + +## References + +https://huggingface.co/utahnlp/qnli_roberta-base_seed-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-results_javierorjuela_en.md b/docs/_posts/ahmedlone127/2024-09-22-results_javierorjuela_en.md new file mode 100644 index 00000000000000..b868c917d503ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-results_javierorjuela_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English results_javierorjuela DistilBertForSequenceClassification from javierorjuela +author: John Snow Labs +name: results_javierorjuela +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`results_javierorjuela` is a English model originally trained by javierorjuela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/results_javierorjuela_en_5.5.0_3.0_1726980181665.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/results_javierorjuela_en_5.5.0_3.0_1726980181665.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("results_javierorjuela","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("results_javierorjuela", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
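+
+The predicted label for each input row is available in the `class` output column, with per-label scores kept in the annotation metadata. A minimal sketch (assumes the pipeline above has been run):
+
+```python
+pipelineDF.select("text", "class.result").show(truncate=False)
+```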
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|results_javierorjuela| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/javierorjuela/results \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-results_javierorjuela_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-results_javierorjuela_pipeline_en.md new file mode 100644 index 00000000000000..10086769171913 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-results_javierorjuela_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English results_javierorjuela_pipeline pipeline DistilBertForSequenceClassification from javierorjuela +author: John Snow Labs +name: results_javierorjuela_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`results_javierorjuela_pipeline` is a English model originally trained by javierorjuela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/results_javierorjuela_pipeline_en_5.5.0_3.0_1726980205103.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/results_javierorjuela_pipeline_en_5.5.0_3.0_1726980205103.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("results_javierorjuela_pipeline", lang = "en")
+# a Spark DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("results_javierorjuela_pipeline", lang = "en")
+// a Spark DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|results_javierorjuela_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/javierorjuela/results + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-reward_model_en.md b/docs/_posts/ahmedlone127/2024-09-22-reward_model_en.md new file mode 100644 index 00000000000000..ceee99d26db197 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-reward_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English reward_model RoBertaForSequenceClassification from lillybak +author: John Snow Labs +name: reward_model +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`reward_model` is a English model originally trained by lillybak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/reward_model_en_5.5.0_3.0_1727037836585.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/reward_model_en_5.5.0_3.0_1727037836585.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("reward_model","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("reward_model", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
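+
+For low-latency scoring of individual texts outside of a Spark job, the fitted model can also be wrapped in a `LightPipeline`. A minimal sketch (assumes `pipelineModel` from the example above):
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+# returns a dict with the document, token and class annotations for the given string
+print(light.annotate("I love spark-nlp"))
+```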
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|reward_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/lillybak/reward_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-reward_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-reward_model_pipeline_en.md new file mode 100644 index 00000000000000..d342c6078adfaa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-reward_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English reward_model_pipeline pipeline RoBertaForSequenceClassification from lillybak +author: John Snow Labs +name: reward_model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`reward_model_pipeline` is a English model originally trained by lillybak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/reward_model_pipeline_en_5.5.0_3.0_1727037852561.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/reward_model_pipeline_en_5.5.0_3.0_1727037852561.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("reward_model_pipeline", lang = "en")
+# a Spark DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("reward_model_pipeline", lang = "en")
+// a Spark DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|reward_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.1 MB| + +## References + +https://huggingface.co/lillybak/reward_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-rl_grp_prj_per_cls_en.md b/docs/_posts/ahmedlone127/2024-09-22-rl_grp_prj_per_cls_en.md new file mode 100644 index 00000000000000..79b23bf22f302e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-rl_grp_prj_per_cls_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English rl_grp_prj_per_cls RoBertaForSequenceClassification from nasheed +author: John Snow Labs +name: rl_grp_prj_per_cls +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rl_grp_prj_per_cls` is a English model originally trained by nasheed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rl_grp_prj_per_cls_en_5.5.0_3.0_1727016878903.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rl_grp_prj_per_cls_en_5.5.0_3.0_1727016878903.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("rl_grp_prj_per_cls","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("rl_grp_prj_per_cls", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rl_grp_prj_per_cls| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|422.0 MB| + +## References + +https://huggingface.co/nasheed/rl-grp-prj-per-cls \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-rl_grp_prj_per_cls_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-rl_grp_prj_per_cls_pipeline_en.md new file mode 100644 index 00000000000000..78cfd119d4b777 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-rl_grp_prj_per_cls_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English rl_grp_prj_per_cls_pipeline pipeline RoBertaForSequenceClassification from nasheed +author: John Snow Labs +name: rl_grp_prj_per_cls_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rl_grp_prj_per_cls_pipeline` is a English model originally trained by nasheed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rl_grp_prj_per_cls_pipeline_en_5.5.0_3.0_1727016918434.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rl_grp_prj_per_cls_pipeline_en_5.5.0_3.0_1727016918434.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("rl_grp_prj_per_cls_pipeline", lang = "en")
+# a Spark DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("rl_grp_prj_per_cls_pipeline", lang = "en")
+// a Spark DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rl_grp_prj_per_cls_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|422.1 MB| + +## References + +https://huggingface.co/nasheed/rl-grp-prj-per-cls + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-robbert_emotions_en.md b/docs/_posts/ahmedlone127/2024-09-22-robbert_emotions_en.md new file mode 100644 index 00000000000000..801cd7020186cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-robbert_emotions_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robbert_emotions RoBertaForSequenceClassification from rroell +author: John Snow Labs +name: robbert_emotions +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_emotions` is a English model originally trained by rroell. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_emotions_en_5.5.0_3.0_1727026865707.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_emotions_en_5.5.0_3.0_1727026865707.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("robbert_emotions","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("robbert_emotions", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_emotions| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|437.9 MB| + +## References + +https://huggingface.co/rroell/RoBBERT-emotions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-robbert_emotions_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-robbert_emotions_pipeline_en.md new file mode 100644 index 00000000000000..5f3e59e33d0735 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-robbert_emotions_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robbert_emotions_pipeline pipeline RoBertaForSequenceClassification from rroell +author: John Snow Labs +name: robbert_emotions_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_emotions_pipeline` is a English model originally trained by rroell. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_emotions_pipeline_en_5.5.0_3.0_1727026887748.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_emotions_pipeline_en_5.5.0_3.0_1727026887748.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("robbert_emotions_pipeline", lang = "en")
+# a Spark DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("robbert_emotions_pipeline", lang = "en")
+// a Spark DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_emotions_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.0 MB| + +## References + +https://huggingface.co/rroell/RoBBERT-emotions + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_9_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_9_en.md new file mode 100644 index 00000000000000..c4ff8d6a011dcf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_9_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_9 RoBertaForSequenceClassification from mollypak +author: John Snow Labs +name: roberta_9 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_9` is a English model originally trained by mollypak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_9_en_5.5.0_3.0_1726972265563.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_9_en_5.5.0_3.0_1726972265563.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_9","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_9", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_9| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|422.9 MB| + +## References + +https://huggingface.co/mollypak/roberta_9 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_9_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_9_pipeline_en.md new file mode 100644 index 00000000000000..2c1c1a7ed5403b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_9_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_9_pipeline pipeline RoBertaForSequenceClassification from mollypak +author: John Snow Labs +name: roberta_9_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_9_pipeline` is a English model originally trained by mollypak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_9_pipeline_en_5.5.0_3.0_1726972304265.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_9_pipeline_en_5.5.0_3.0_1726972304265.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("roberta_9_pipeline", lang = "en")
+# a Spark DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("roberta_9_pipeline", lang = "en")
+// a Spark DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_9_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|422.9 MB| + +## References + +https://huggingface.co/mollypak/roberta_9 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_augmented_finetuned_atis_5pct_v1_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_augmented_finetuned_atis_5pct_v1_en.md new file mode 100644 index 00000000000000..fa6f745efba293 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_augmented_finetuned_atis_5pct_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_augmented_finetuned_atis_5pct_v1 RoBertaForSequenceClassification from benayas +author: John Snow Labs +name: roberta_augmented_finetuned_atis_5pct_v1 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_augmented_finetuned_atis_5pct_v1` is a English model originally trained by benayas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_augmented_finetuned_atis_5pct_v1_en_5.5.0_3.0_1727027540028.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_augmented_finetuned_atis_5pct_v1_en_5.5.0_3.0_1727027540028.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_augmented_finetuned_atis_5pct_v1","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_augmented_finetuned_atis_5pct_v1", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_augmented_finetuned_atis_5pct_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|425.6 MB| + +## References + +https://huggingface.co/benayas/roberta-augmented-finetuned-atis_5pct_v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_augmented_finetuned_atis_5pct_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_augmented_finetuned_atis_5pct_v1_pipeline_en.md new file mode 100644 index 00000000000000..09663f8de86b77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_augmented_finetuned_atis_5pct_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_augmented_finetuned_atis_5pct_v1_pipeline pipeline RoBertaForSequenceClassification from benayas +author: John Snow Labs +name: roberta_augmented_finetuned_atis_5pct_v1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_augmented_finetuned_atis_5pct_v1_pipeline` is a English model originally trained by benayas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_augmented_finetuned_atis_5pct_v1_pipeline_en_5.5.0_3.0_1727027578661.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_augmented_finetuned_atis_5pct_v1_pipeline_en_5.5.0_3.0_1727027578661.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("roberta_augmented_finetuned_atis_5pct_v1_pipeline", lang = "en")
+# a Spark DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("roberta_augmented_finetuned_atis_5pct_v1_pipeline", lang = "en")
+// a Spark DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_augmented_finetuned_atis_5pct_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|425.6 MB| + +## References + +https://huggingface.co/benayas/roberta-augmented-finetuned-atis_5pct_v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_babe_3epochs_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_babe_3epochs_en.md new file mode 100644 index 00000000000000..676eb20c52efb4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_babe_3epochs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_babe_3epochs RoBertaForSequenceClassification from jordankrishnayah +author: John Snow Labs +name: roberta_babe_3epochs +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_babe_3epochs` is a English model originally trained by jordankrishnayah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_babe_3epochs_en_5.5.0_3.0_1727017645046.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_babe_3epochs_en_5.5.0_3.0_1727017645046.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_babe_3epochs","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_babe_3epochs", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_babe_3epochs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|438.4 MB| + +## References + +https://huggingface.co/jordankrishnayah/ROBERTA-BABE-3epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_babe_3epochs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_babe_3epochs_pipeline_en.md new file mode 100644 index 00000000000000..bfe5aa4b0d5e6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_babe_3epochs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_babe_3epochs_pipeline pipeline RoBertaForSequenceClassification from jordankrishnayah +author: John Snow Labs +name: roberta_babe_3epochs_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_babe_3epochs_pipeline` is a English model originally trained by jordankrishnayah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_babe_3epochs_pipeline_en_5.5.0_3.0_1727017673953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_babe_3epochs_pipeline_en_5.5.0_3.0_1727017673953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("roberta_babe_3epochs_pipeline", lang = "en")
+# a Spark DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("roberta_babe_3epochs_pipeline", lang = "en")
+// a Spark DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_babe_3epochs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.4 MB| + +## References + +https://huggingface.co/jordankrishnayah/ROBERTA-BABE-3epochs + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_airlines_news_binary_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_airlines_news_binary_en.md new file mode 100644 index 00000000000000..4c131a00dd12e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_airlines_news_binary_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_airlines_news_binary RoBertaForSequenceClassification from dahe827 +author: John Snow Labs +name: roberta_base_airlines_news_binary +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_airlines_news_binary` is a English model originally trained by dahe827. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_airlines_news_binary_en_5.5.0_3.0_1727036976097.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_airlines_news_binary_en_5.5.0_3.0_1727036976097.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_airlines_news_binary","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_airlines_news_binary", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_airlines_news_binary| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|438.4 MB| + +## References + +https://huggingface.co/dahe827/roberta-base-airlines-news-binary \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_airlines_news_binary_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_airlines_news_binary_pipeline_en.md new file mode 100644 index 00000000000000..c15045e86d2466 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_airlines_news_binary_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_airlines_news_binary_pipeline pipeline RoBertaForSequenceClassification from dahe827 +author: John Snow Labs +name: roberta_base_airlines_news_binary_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_airlines_news_binary_pipeline` is a English model originally trained by dahe827. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_airlines_news_binary_pipeline_en_5.5.0_3.0_1727037015137.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_airlines_news_binary_pipeline_en_5.5.0_3.0_1727037015137.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("roberta_base_airlines_news_binary_pipeline", lang = "en")
+# a Spark DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("roberta_base_airlines_news_binary_pipeline", lang = "en")
+// a Spark DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_airlines_news_binary_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.4 MB| + +## References + +https://huggingface.co/dahe827/roberta-base-airlines-news-binary + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_en.md new file mode 100644 index 00000000000000..d1e8b748df5a75 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb RoBertaForSequenceClassification from pamelapaolacb +author: John Snow Labs +name: roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb` is a English model originally trained by pamelapaolacb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_en_5.5.0_3.0_1726972309460.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_en_5.5.0_3.0_1726972309460.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
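
After the pipeline has been fitted and applied as above, each prediction is stored as an annotation in the `class` column, with the label in its `result` field. A short sketch for inspecting it:

```python
# Show each input text next to its predicted label
pipelineDF.select("text", "class.result").show(truncate=False)
```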
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|446.8 MB| + +## References + +https://huggingface.co/pamelapaolacb/roberta-base-bne-jou-amazon_reviews_multi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_pipeline_en.md new file mode 100644 index 00000000000000..da77f3cf560b77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_pipeline pipeline RoBertaForSequenceClassification from pamelapaolacb +author: John Snow Labs +name: roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_pipeline` is a English model originally trained by pamelapaolacb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_pipeline_en_5.5.0_3.0_1726972331072.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_pipeline_en_5.5.0_3.0_1726972331072.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_pipeline", lang = "en")
# Any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_pipeline", lang = "en")
// Any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|446.8 MB| + +## References + +https://huggingface.co/pamelapaolacb/roberta-base-bne-jou-amazon_reviews_multi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_cased_finetuned_mnli_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_cased_finetuned_mnli_en.md new file mode 100644 index 00000000000000..cb1164c37af251 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_cased_finetuned_mnli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_cased_finetuned_mnli RoBertaForSequenceClassification from George-Ogden +author: John Snow Labs +name: roberta_base_cased_finetuned_mnli +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_cased_finetuned_mnli` is a English model originally trained by George-Ogden. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_cased_finetuned_mnli_en_5.5.0_3.0_1727017431827.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_cased_finetuned_mnli_en_5.5.0_3.0_1727017431827.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_cased_finetuned_mnli","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_cased_finetuned_mnli", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
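
For one-off sentences it can be more convenient to skip the DataFrame round trip. A sketch using Spark NLP's `LightPipeline`, assuming the `pipelineModel` fitted above:

```python
from sparknlp.base import LightPipeline

# Wrap the fitted PipelineModel for fast in-memory annotation of plain strings
light = LightPipeline(pipelineModel)
print(light.annotate("I love spark-nlp")["class"])
```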
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_cased_finetuned_mnli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|462.5 MB| + +## References + +https://huggingface.co/George-Ogden/roberta-base-cased-finetuned-mnli \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_cased_finetuned_mnli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_cased_finetuned_mnli_pipeline_en.md new file mode 100644 index 00000000000000..443f11b0688b0d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_cased_finetuned_mnli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_cased_finetuned_mnli_pipeline pipeline RoBertaForSequenceClassification from George-Ogden +author: John Snow Labs +name: roberta_base_cased_finetuned_mnli_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_cased_finetuned_mnli_pipeline` is a English model originally trained by George-Ogden. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_cased_finetuned_mnli_pipeline_en_5.5.0_3.0_1727017454794.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_cased_finetuned_mnli_pipeline_en_5.5.0_3.0_1727017454794.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_base_cased_finetuned_mnli_pipeline", lang = "en")
# Any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_base_cased_finetuned_mnli_pipeline", lang = "en")
// Any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_cased_finetuned_mnli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|462.5 MB| + +## References + +https://huggingface.co/George-Ogden/roberta-base-cased-finetuned-mnli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_dutch_oscar23_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_dutch_oscar23_en.md new file mode 100644 index 00000000000000..d3f786cf65499d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_dutch_oscar23_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_dutch_oscar23 RoBertaEmbeddings from FremyCompany +author: John Snow Labs +name: roberta_base_dutch_oscar23 +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_dutch_oscar23` is a English model originally trained by FremyCompany. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_dutch_oscar23_en_5.5.0_3.0_1726999567115.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_dutch_oscar23_en_5.5.0_3.0_1726999567115.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_dutch_oscar23","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_dutch_oscar23","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
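
The `embeddings` column produced above contains one annotation per token, with the vector stored in its `embeddings` field. A rough sketch for pulling the vectors out of `pipelineDF`:

```python
from pyspark.sql.functions import explode

# One row per token: the token text and its embedding vector (an array of floats)
pipelineDF.select(explode("embeddings").alias("emb")) \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(truncate=False)
```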
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_dutch_oscar23| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|464.8 MB| + +## References + +https://huggingface.co/FremyCompany/roberta-base-nl-oscar23 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_dutch_oscar23_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_dutch_oscar23_pipeline_en.md new file mode 100644 index 00000000000000..b86517e842f370 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_dutch_oscar23_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_dutch_oscar23_pipeline pipeline RoBertaEmbeddings from FremyCompany +author: John Snow Labs +name: roberta_base_dutch_oscar23_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_dutch_oscar23_pipeline` is a English model originally trained by FremyCompany. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_dutch_oscar23_pipeline_en_5.5.0_3.0_1726999587737.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_dutch_oscar23_pipeline_en_5.5.0_3.0_1726999587737.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_base_dutch_oscar23_pipeline", lang = "en")
# Any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_base_dutch_oscar23_pipeline", lang = "en")
// Any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_dutch_oscar23_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.8 MB| + +## References + +https://huggingface.co/FremyCompany/roberta-base-nl-oscar23 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_dark_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_dark_en.md new file mode 100644 index 00000000000000..22b17a3476494e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_dark_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_dark RoBertaForSequenceClassification from geektech +author: John Snow Labs +name: roberta_base_finetuned_dark +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_dark` is a English model originally trained by geektech. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_dark_en_5.5.0_3.0_1727026437444.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_dark_en_5.5.0_3.0_1727026437444.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_dark","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_dark", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
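
Besides the winning label, the classification annotations typically carry the per-class scores in their `metadata` map. A sketch over the `pipelineDF` produced above:

```python
# `result` holds the predicted label; `metadata` usually includes the class scores
pipelineDF.select("class.result", "class.metadata").show(truncate=False)
```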
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_dark| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|423.3 MB| + +## References + +https://huggingface.co/geektech/roberta-base-finetuned-dark \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_dark_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_dark_pipeline_en.md new file mode 100644 index 00000000000000..3d60abc7342a10 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_dark_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_dark_pipeline pipeline RoBertaForSequenceClassification from geektech +author: John Snow Labs +name: roberta_base_finetuned_dark_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_dark_pipeline` is a English model originally trained by geektech. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_dark_pipeline_en_5.5.0_3.0_1727026475926.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_dark_pipeline_en_5.5.0_3.0_1727026475926.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_base_finetuned_dark_pipeline", lang = "en")
# Any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_base_finetuned_dark_pipeline", lang = "en")
// Any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_dark_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|423.3 MB| + +## References + +https://huggingface.co/geektech/roberta-base-finetuned-dark + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_mnli_kuaaangwen_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_mnli_kuaaangwen_en.md new file mode 100644 index 00000000000000..18c8b26b59ac0f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_mnli_kuaaangwen_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_mnli_kuaaangwen RoBertaForSequenceClassification from Kuaaangwen +author: John Snow Labs +name: roberta_base_finetuned_mnli_kuaaangwen +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_mnli_kuaaangwen` is a English model originally trained by Kuaaangwen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_mnli_kuaaangwen_en_5.5.0_3.0_1727017592170.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_mnli_kuaaangwen_en_5.5.0_3.0_1727017592170.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_mnli_kuaaangwen","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_mnli_kuaaangwen", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_mnli_kuaaangwen| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|465.5 MB| + +## References + +https://huggingface.co/Kuaaangwen/roberta-base-finetuned-mnli \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_mnli_kuaaangwen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_mnli_kuaaangwen_pipeline_en.md new file mode 100644 index 00000000000000..da9136630c384e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_mnli_kuaaangwen_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_mnli_kuaaangwen_pipeline pipeline RoBertaForSequenceClassification from Kuaaangwen +author: John Snow Labs +name: roberta_base_finetuned_mnli_kuaaangwen_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_mnli_kuaaangwen_pipeline` is a English model originally trained by Kuaaangwen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_mnli_kuaaangwen_pipeline_en_5.5.0_3.0_1727017614208.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_mnli_kuaaangwen_pipeline_en_5.5.0_3.0_1727017614208.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_base_finetuned_mnli_kuaaangwen_pipeline", lang = "en")
# Any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_base_finetuned_mnli_kuaaangwen_pipeline", lang = "en")
// Any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
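
After `transform`, the returned DataFrame can be inspected like any other Spark DataFrame. A sketch, assuming the classifier stage of this pretrained pipeline writes to a column named `class`:

```python
# List the columns added by the pipeline, then peek at the predicted labels
annotations.printSchema()
annotations.select("class.result").show(truncate=False)  # "class" is an assumed column name
```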
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_mnli_kuaaangwen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.5 MB| + +## References + +https://huggingface.co/Kuaaangwen/roberta-base-finetuned-mnli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_sleevelength_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_sleevelength_en.md new file mode 100644 index 00000000000000..01e7223dc93fee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_sleevelength_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_sleevelength RoBertaForSequenceClassification from Cournane +author: John Snow Labs +name: roberta_base_finetuned_sleevelength +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_sleevelength` is a English model originally trained by Cournane. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_sleevelength_en_5.5.0_3.0_1727036898412.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_sleevelength_en_5.5.0_3.0_1727036898412.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_sleevelength","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_sleevelength", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_sleevelength| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|427.2 MB| + +## References + +https://huggingface.co/Cournane/roberta-base-finetuned-SleeveLength \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_hoax_classifier_defs_1h2r_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_hoax_classifier_defs_1h2r_en.md new file mode 100644 index 00000000000000..6037cc86cc731a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_hoax_classifier_defs_1h2r_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_hoax_classifier_defs_1h2r RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_base_hoax_classifier_defs_1h2r +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_hoax_classifier_defs_1h2r` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_hoax_classifier_defs_1h2r_en_5.5.0_3.0_1727037375415.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_hoax_classifier_defs_1h2r_en_5.5.0_3.0_1727037375415.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_hoax_classifier_defs_1h2r","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_hoax_classifier_defs_1h2r", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
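
To see the full annotation objects (label, character offsets, and metadata) for a single string, `LightPipeline.fullAnnotate` can be used. A sketch, assuming the `pipelineModel` fitted above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# Returns a list with one dict per input string, holding complete Annotation objects
full_result = light.fullAnnotate("I love spark-nlp")
print(full_result[0]["class"])
```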
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_hoax_classifier_defs_1h2r| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|432.6 MB| + +## References + +https://huggingface.co/research-dump/roberta-base_hoax_classifier_defs_1h2r \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_hoax_classifier_defs_1h2r_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_hoax_classifier_defs_1h2r_pipeline_en.md new file mode 100644 index 00000000000000..3b7f749560b0b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_hoax_classifier_defs_1h2r_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_hoax_classifier_defs_1h2r_pipeline pipeline RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_base_hoax_classifier_defs_1h2r_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_hoax_classifier_defs_1h2r_pipeline` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_hoax_classifier_defs_1h2r_pipeline_en_5.5.0_3.0_1727037412878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_hoax_classifier_defs_1h2r_pipeline_en_5.5.0_3.0_1727037412878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_base_hoax_classifier_defs_1h2r_pipeline", lang = "en")
# Any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_base_hoax_classifier_defs_1h2r_pipeline", lang = "en")
// Any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_hoax_classifier_defs_1h2r_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|432.7 MB| + +## References + +https://huggingface.co/research-dump/roberta-base_hoax_classifier_defs_1h2r + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_md_gender_bias_trained_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_md_gender_bias_trained_en.md new file mode 100644 index 00000000000000..7530e1a61be7c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_md_gender_bias_trained_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_md_gender_bias_trained RoBertaForSequenceClassification from JakobKaiser +author: John Snow Labs +name: roberta_base_md_gender_bias_trained +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_md_gender_bias_trained` is a English model originally trained by JakobKaiser. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_md_gender_bias_trained_en_5.5.0_3.0_1727017094626.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_md_gender_bias_trained_en_5.5.0_3.0_1727017094626.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_md_gender_bias_trained","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_md_gender_bias_trained", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
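
The same fitted pipeline also handles batches; a brief sketch scoring a few illustrative sentences with the stages defined above:

```python
# Each row receives its own predicted label in `class.result`
batch = spark.createDataFrame([["He went to the store"], ["She went to the store"]]).toDF("text")
pipelineModel.transform(batch).select("text", "class.result").show(truncate=False)
```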
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_md_gender_bias_trained| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|436.0 MB| + +## References + +https://huggingface.co/JakobKaiser/roberta-base-md_gender_bias-trained \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_md_gender_bias_trained_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_md_gender_bias_trained_pipeline_en.md new file mode 100644 index 00000000000000..5d0dcb4b6a0a31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_md_gender_bias_trained_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_md_gender_bias_trained_pipeline pipeline RoBertaForSequenceClassification from JakobKaiser +author: John Snow Labs +name: roberta_base_md_gender_bias_trained_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_md_gender_bias_trained_pipeline` is a English model originally trained by JakobKaiser. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_md_gender_bias_trained_pipeline_en_5.5.0_3.0_1727017119375.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_md_gender_bias_trained_pipeline_en_5.5.0_3.0_1727017119375.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_base_md_gender_bias_trained_pipeline", lang = "en")
# Any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_base_md_gender_bias_trained_pipeline", lang = "en")
// Any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_md_gender_bias_trained_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|436.0 MB| + +## References + +https://huggingface.co/JakobKaiser/roberta-base-md_gender_bias-trained + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_oscar_chen_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_oscar_chen_en.md new file mode 100644 index 00000000000000..59c30351e5f3c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_oscar_chen_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_oscar_chen RoBertaForSequenceClassification from Oscar-chen +author: John Snow Labs +name: roberta_base_oscar_chen +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_oscar_chen` is a English model originally trained by Oscar-chen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_oscar_chen_en_5.5.0_3.0_1726972171071.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_oscar_chen_en_5.5.0_3.0_1726972171071.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_oscar_chen","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_oscar_chen", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_oscar_chen| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|433.1 MB| + +## References + +https://huggingface.co/Oscar-chen/roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_oscar_chen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_oscar_chen_pipeline_en.md new file mode 100644 index 00000000000000..df39bf412024ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_oscar_chen_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_oscar_chen_pipeline pipeline RoBertaForSequenceClassification from Oscar-chen +author: John Snow Labs +name: roberta_base_oscar_chen_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_oscar_chen_pipeline` is a English model originally trained by Oscar-chen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_oscar_chen_pipeline_en_5.5.0_3.0_1726972198300.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_oscar_chen_pipeline_en_5.5.0_3.0_1726972198300.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_base_oscar_chen_pipeline", lang = "en")
# Any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_base_oscar_chen_pipeline", lang = "en")
// Any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_oscar_chen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|433.2 MB| + +## References + +https://huggingface.co/Oscar-chen/roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_philpapers_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_philpapers_en.md new file mode 100644 index 00000000000000..d02ccd8a5cec47 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_philpapers_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_philpapers RoBertaEmbeddings from Dhruvil47 +author: John Snow Labs +name: roberta_base_philpapers +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_philpapers` is a English model originally trained by Dhruvil47. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_philpapers_en_5.5.0_3.0_1726999716378.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_philpapers_en_5.5.0_3.0_1726999716378.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_philpapers","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_philpapers","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
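
If the embeddings are needed as Spark ML vectors (for example, to feed a downstream classifier), an `EmbeddingsFinisher` stage can be appended to the pipeline output. A minimal sketch over the `pipelineDF` produced above:

```python
from sparknlp.base import EmbeddingsFinisher

# Convert Spark NLP embedding annotations into Spark ML vectors
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

finished = finisher.transform(pipelineDF)
finished.selectExpr("explode(finished_embeddings) as vector").show(truncate=False)
```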
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_philpapers| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/Dhruvil47/roberta-base-philpapers \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_philpapers_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_philpapers_pipeline_en.md new file mode 100644 index 00000000000000..52ab5efce4b4dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_philpapers_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_philpapers_pipeline pipeline RoBertaEmbeddings from Dhruvil47 +author: John Snow Labs +name: roberta_base_philpapers_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_philpapers_pipeline` is a English model originally trained by Dhruvil47. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_philpapers_pipeline_en_5.5.0_3.0_1726999737210.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_philpapers_pipeline_en_5.5.0_3.0_1726999737210.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_base_philpapers_pipeline", lang = "en")
# Any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_base_philpapers_pipeline", lang = "en")
// Any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_philpapers_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/Dhruvil47/roberta-base-philpapers + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_plausibility_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_plausibility_en.md new file mode 100644 index 00000000000000..2f58898c2ac32d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_plausibility_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_plausibility RoBertaForSequenceClassification from ianporada +author: John Snow Labs +name: roberta_base_plausibility +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_plausibility` is a English model originally trained by ianporada. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_plausibility_en_5.5.0_3.0_1727026440179.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_plausibility_en_5.5.0_3.0_1727026440179.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_plausibility","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_plausibility", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
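
Once fitted, the whole pipeline can be persisted and reloaded later without refitting; a sketch using standard Spark ML persistence (the path below is illustrative):

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline to an illustrative path and load it back
pipelineModel.write().overwrite().save("/tmp/roberta_base_plausibility_pipeline")
reloaded = PipelineModel.load("/tmp/roberta_base_plausibility_pipeline")
```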
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_plausibility| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|420.3 MB| + +## References + +https://huggingface.co/ianporada/roberta_base_plausibility \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_plausibility_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_plausibility_pipeline_en.md new file mode 100644 index 00000000000000..17dacec546c471 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_plausibility_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_plausibility_pipeline pipeline RoBertaForSequenceClassification from ianporada +author: John Snow Labs +name: roberta_base_plausibility_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_plausibility_pipeline` is a English model originally trained by ianporada. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_plausibility_pipeline_en_5.5.0_3.0_1727026481622.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_plausibility_pipeline_en_5.5.0_3.0_1727026481622.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_base_plausibility_pipeline", lang = "en")
# Example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_base_plausibility_pipeline", lang = "en")
// Example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
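For quick experiments on a handful of strings, the pipeline can also be called without building a DataFrame first. A small sketch, reusing the `pipeline` object downloaded in the example above:

```python
# `annotate` runs all pipeline stages on a single string and returns a dict of
# output column name -> list of results (e.g. tokens, predicted class).
result = pipeline.annotate("I love spark-nlp")
print(result)
```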
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_plausibility_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|420.4 MB| + +## References + +https://huggingface.co/ianporada/roberta_base_plausibility + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_reduced_upper_fabric_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_reduced_upper_fabric_en.md new file mode 100644 index 00000000000000..53466b04685b00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_reduced_upper_fabric_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_reduced_upper_fabric RoBertaForSequenceClassification from Cournane +author: John Snow Labs +name: roberta_base_reduced_upper_fabric +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_reduced_upper_fabric` is a English model originally trained by Cournane. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_reduced_upper_fabric_en_5.5.0_3.0_1727017731890.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_reduced_upper_fabric_en_5.5.0_3.0_1727017731890.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_reduced_upper_fabric","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_reduced_upper_fabric", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_reduced_upper_fabric| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|427.2 MB| + +## References + +https://huggingface.co/Cournane/roberta-base-reduced-Upper_fabric \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_reduced_upper_fabric_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_reduced_upper_fabric_pipeline_en.md new file mode 100644 index 00000000000000..605444a6c08441 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_reduced_upper_fabric_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_reduced_upper_fabric_pipeline pipeline RoBertaForSequenceClassification from Cournane +author: John Snow Labs +name: roberta_base_reduced_upper_fabric_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_reduced_upper_fabric_pipeline` is a English model originally trained by Cournane. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_reduced_upper_fabric_pipeline_en_5.5.0_3.0_1727017758021.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_reduced_upper_fabric_pipeline_en_5.5.0_3.0_1727017758021.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_base_reduced_upper_fabric_pipeline", lang = "en")
# Example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_base_reduced_upper_fabric_pipeline", lang = "en")
// Example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_reduced_upper_fabric_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|427.2 MB| + +## References + +https://huggingface.co/Cournane/roberta-base-reduced-Upper_fabric + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_turkish_uncased_stance_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_turkish_uncased_stance_pipeline_tr.md new file mode 100644 index 00000000000000..55b7ebc26b995d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_turkish_uncased_stance_pipeline_tr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Turkish roberta_base_turkish_uncased_stance_pipeline pipeline RoBertaForSequenceClassification from byunal +author: John Snow Labs +name: roberta_base_turkish_uncased_stance_pipeline +date: 2024-09-22 +tags: [tr, open_source, pipeline, onnx] +task: Text Classification +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_turkish_uncased_stance_pipeline` is a Turkish model originally trained by byunal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_turkish_uncased_stance_pipeline_tr_5.5.0_3.0_1727017395547.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_turkish_uncased_stance_pipeline_tr_5.5.0_3.0_1727017395547.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_base_turkish_uncased_stance_pipeline", lang = "tr")
# Example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_base_turkish_uncased_stance_pipeline", lang = "tr")
// Example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_turkish_uncased_stance_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|463.7 MB| + +## References + +https://huggingface.co/byunal/roberta-base-turkish-uncased-stance + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_turkish_uncased_stance_tr.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_turkish_uncased_stance_tr.md new file mode 100644 index 00000000000000..f16c34d53b5502 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_turkish_uncased_stance_tr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Turkish roberta_base_turkish_uncased_stance RoBertaForSequenceClassification from byunal +author: John Snow Labs +name: roberta_base_turkish_uncased_stance +date: 2024-09-22 +tags: [tr, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_turkish_uncased_stance` is a Turkish model originally trained by byunal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_turkish_uncased_stance_tr_5.5.0_3.0_1727017374018.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_turkish_uncased_stance_tr_5.5.0_3.0_1727017374018.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_turkish_uncased_stance","tr") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_turkish_uncased_stance", "tr")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_turkish_uncased_stance| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|tr| +|Size:|463.7 MB| + +## References + +https://huggingface.co/byunal/roberta-base-turkish-uncased-stance \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_tweet_topic_multi_2020_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_tweet_topic_multi_2020_en.md new file mode 100644 index 00000000000000..8e1d7a3ef35de5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_tweet_topic_multi_2020_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_tweet_topic_multi_2020 RoBertaForSequenceClassification from cardiffnlp +author: John Snow Labs +name: roberta_base_tweet_topic_multi_2020 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_tweet_topic_multi_2020` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_tweet_topic_multi_2020_en_5.5.0_3.0_1726967441556.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_tweet_topic_multi_2020_en_5.5.0_3.0_1726967441556.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_tweet_topic_multi_2020","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_tweet_topic_multi_2020", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_tweet_topic_multi_2020| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|441.6 MB| + +## References + +https://huggingface.co/cardiffnlp/roberta-base-tweet-topic-multi-2020 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_tweet_topic_multi_2020_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_tweet_topic_multi_2020_pipeline_en.md new file mode 100644 index 00000000000000..a0feb6c5723d4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_tweet_topic_multi_2020_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_tweet_topic_multi_2020_pipeline pipeline RoBertaForSequenceClassification from cardiffnlp +author: John Snow Labs +name: roberta_base_tweet_topic_multi_2020_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_tweet_topic_multi_2020_pipeline` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_tweet_topic_multi_2020_pipeline_en_5.5.0_3.0_1726967470783.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_tweet_topic_multi_2020_pipeline_en_5.5.0_3.0_1726967470783.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_base_tweet_topic_multi_2020_pipeline", lang = "en")
# Example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_base_tweet_topic_multi_2020_pipeline", lang = "en")
// Example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_tweet_topic_multi_2020_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|441.7 MB| + +## References + +https://huggingface.co/cardiffnlp/roberta-base-tweet-topic-multi-2020 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_wechsel_ukrainian_uk.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_wechsel_ukrainian_uk.md new file mode 100644 index 00000000000000..dfb6cb293a48d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_wechsel_ukrainian_uk.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Ukrainian roberta_base_wechsel_ukrainian RoBertaEmbeddings from benjamin +author: John Snow Labs +name: roberta_base_wechsel_ukrainian +date: 2024-09-22 +tags: [uk, open_source, onnx, embeddings, roberta] +task: Embeddings +language: uk +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_wechsel_ukrainian` is a Ukrainian model originally trained by benjamin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_wechsel_ukrainian_uk_5.5.0_3.0_1727041915094.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_wechsel_ukrainian_uk_5.5.0_3.0_1727041915094.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("roberta_base_wechsel_ukrainian","uk") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("roberta_base_wechsel_ukrainian","uk")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
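The `embeddings` output column holds one annotation per token, each carrying the token text and its vector. A minimal sketch of how the vectors could be unpacked from `pipelineDF` in the example above:

```python
from pyspark.sql import functions as F

# One row per token: the token text and its embedding vector.
pipelineDF.select(F.explode("embeddings").alias("emb")) \
    .select(F.col("emb.result").alias("token"), F.col("emb.embeddings").alias("vector")) \
    .show(truncate=80)
```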
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_wechsel_ukrainian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|uk| +|Size:|465.9 MB| + +## References + +https://huggingface.co/benjamin/roberta-base-wechsel-ukrainian \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_bert_10_unmalicious_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_bert_10_unmalicious_en.md new file mode 100644 index 00000000000000..14046d95cb34eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_bert_10_unmalicious_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_bert_10_unmalicious RoBertaEmbeddings from ubaskota +author: John Snow Labs +name: roberta_bert_10_unmalicious +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_bert_10_unmalicious` is a English model originally trained by ubaskota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_bert_10_unmalicious_en_5.5.0_3.0_1726999624127.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_bert_10_unmalicious_en_5.5.0_3.0_1726999624127.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("roberta_bert_10_unmalicious","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("roberta_bert_10_unmalicious","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_bert_10_unmalicious| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/ubaskota/roberta_BERT_10_unmalicious \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_bert_10_unmalicious_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_bert_10_unmalicious_pipeline_en.md new file mode 100644 index 00000000000000..44b388c74b3f59 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_bert_10_unmalicious_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_bert_10_unmalicious_pipeline pipeline RoBertaEmbeddings from ubaskota +author: John Snow Labs +name: roberta_bert_10_unmalicious_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_bert_10_unmalicious_pipeline` is a English model originally trained by ubaskota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_bert_10_unmalicious_pipeline_en_5.5.0_3.0_1726999645200.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_bert_10_unmalicious_pipeline_en_5.5.0_3.0_1726999645200.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_bert_10_unmalicious_pipeline", lang = "en")
# Example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_bert_10_unmalicious_pipeline", lang = "en")
// Example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_bert_10_unmalicious_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/ubaskota/roberta_BERT_10_unmalicious + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_cws_ctb6_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_cws_ctb6_en.md new file mode 100644 index 00000000000000..960ca7773f42c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_cws_ctb6_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_cws_ctb6 BertForTokenClassification from tjspross +author: John Snow Labs +name: roberta_cws_ctb6 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_cws_ctb6` is a English model originally trained by tjspross. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_cws_ctb6_en_5.5.0_3.0_1727045698272.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_cws_ctb6_en_5.5.0_3.0_1727045698272.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("roberta_cws_ctb6","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("roberta_cws_ctb6", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
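Because this is a token-level model, the `ner` output column contains one predicted tag per token. A small sketch, reusing `pipelineDF` from the example above, that lines tokens up with their predicted tags:

```python
# `token.result` and `ner.result` are parallel arrays: token i receives tag i.
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```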
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_cws_ctb6| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/tjspross/roberta_cws_ctb6 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_cws_ctb6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_cws_ctb6_pipeline_en.md new file mode 100644 index 00000000000000..2179a17a482cfc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_cws_ctb6_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_cws_ctb6_pipeline pipeline BertForTokenClassification from tjspross +author: John Snow Labs +name: roberta_cws_ctb6_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_cws_ctb6_pipeline` is a English model originally trained by tjspross. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_cws_ctb6_pipeline_en_5.5.0_3.0_1727045755776.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_cws_ctb6_pipeline_en_5.5.0_3.0_1727045755776.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_cws_ctb6_pipeline", lang = "en")
# Example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_cws_ctb6_pipeline", lang = "en")
// Example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_cws_ctb6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/tjspross/roberta_cws_ctb6 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_full_finetuned_banking_100pct_v2_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_full_finetuned_banking_100pct_v2_en.md new file mode 100644 index 00000000000000..11d2791b03bf5c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_full_finetuned_banking_100pct_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_full_finetuned_banking_100pct_v2 RoBertaForSequenceClassification from benayas +author: John Snow Labs +name: roberta_full_finetuned_banking_100pct_v2 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_full_finetuned_banking_100pct_v2` is a English model originally trained by benayas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_full_finetuned_banking_100pct_v2_en_5.5.0_3.0_1726971832743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_full_finetuned_banking_100pct_v2_en_5.5.0_3.0_1726971832743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_full_finetuned_banking_100pct_v2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_full_finetuned_banking_100pct_v2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_full_finetuned_banking_100pct_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|420.4 MB| + +## References + +https://huggingface.co/benayas/roberta-full-finetuned-banking_100pct_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_full_finetuned_banking_100pct_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_full_finetuned_banking_100pct_v2_pipeline_en.md new file mode 100644 index 00000000000000..42759c414174ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_full_finetuned_banking_100pct_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_full_finetuned_banking_100pct_v2_pipeline pipeline RoBertaForSequenceClassification from benayas +author: John Snow Labs +name: roberta_full_finetuned_banking_100pct_v2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_full_finetuned_banking_100pct_v2_pipeline` is a English model originally trained by benayas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_full_finetuned_banking_100pct_v2_pipeline_en_5.5.0_3.0_1726971872088.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_full_finetuned_banking_100pct_v2_pipeline_en_5.5.0_3.0_1726971872088.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_full_finetuned_banking_100pct_v2_pipeline", lang = "en")
# Example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_full_finetuned_banking_100pct_v2_pipeline", lang = "en")
// Example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_full_finetuned_banking_100pct_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|420.4 MB| + +## References + +https://huggingface.co/benayas/roberta-full-finetuned-banking_100pct_v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_ingredients_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_ingredients_en.md new file mode 100644 index 00000000000000..c67d3f0a8a6fa2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_ingredients_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_ingredients RoBertaEmbeddings from ggilley +author: John Snow Labs +name: roberta_ingredients +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_ingredients` is a English model originally trained by ggilley. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_ingredients_en_5.5.0_3.0_1727042083306.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_ingredients_en_5.5.0_3.0_1727042083306.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("roberta_ingredients","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("roberta_ingredients","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_ingredients| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/ggilley/roberta-ingredients \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_large_mnli_model3_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_large_mnli_model3_en.md new file mode 100644 index 00000000000000..2e1127737b76ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_large_mnli_model3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_mnli_model3 RoBertaForSequenceClassification from varun-v-rao +author: John Snow Labs +name: roberta_large_mnli_model3 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_mnli_model3` is a English model originally trained by varun-v-rao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_mnli_model3_en_5.5.0_3.0_1727038017764.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_mnli_model3_en_5.5.0_3.0_1727038017764.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_mnli_model3","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_mnli_model3", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
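Beyond the predicted label, the annotation metadata usually carries the score assigned to each candidate class, which is handy for thresholding or debugging. A sketch using `pipelineDF` from the example above:

```python
from pyspark.sql import functions as F

# `result` is the winning label; `metadata` typically maps each label to its score.
pipelineDF.select(F.explode("class").alias("pred")) \
    .select(F.col("pred.result").alias("label"), F.col("pred.metadata").alias("scores")) \
    .show(truncate=False)
```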
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_mnli_model3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/varun-v-rao/roberta-large-mnli-model3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_large_mnli_model3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_large_mnli_model3_pipeline_en.md new file mode 100644 index 00000000000000..794907ad110878 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_large_mnli_model3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_mnli_model3_pipeline pipeline RoBertaForSequenceClassification from varun-v-rao +author: John Snow Labs +name: roberta_large_mnli_model3_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_mnli_model3_pipeline` is a English model originally trained by varun-v-rao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_mnli_model3_pipeline_en_5.5.0_3.0_1727038084170.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_mnli_model3_pipeline_en_5.5.0_3.0_1727038084170.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_large_mnli_model3_pipeline", lang = "en")
# Example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_mnli_model3_pipeline", lang = "en")
// Example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_mnli_model3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/varun-v-rao/roberta-large-mnli-model3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_large_sst_2_16_13_smoothed_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_large_sst_2_16_13_smoothed_en.md new file mode 100644 index 00000000000000..0d35cd12f1441d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_large_sst_2_16_13_smoothed_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_sst_2_16_13_smoothed RoBertaForSequenceClassification from simonycl +author: John Snow Labs +name: roberta_large_sst_2_16_13_smoothed +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_sst_2_16_13_smoothed` is a English model originally trained by simonycl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_sst_2_16_13_smoothed_en_5.5.0_3.0_1726972443205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_sst_2_16_13_smoothed_en_5.5.0_3.0_1726972443205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_sst_2_16_13_smoothed","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_sst_2_16_13_smoothed", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_sst_2_16_13_smoothed| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/simonycl/roberta-large-sst-2-16-13-smoothed \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_large_sst_2_16_13_smoothed_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_large_sst_2_16_13_smoothed_pipeline_en.md new file mode 100644 index 00000000000000..4d77ac6a9032a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_large_sst_2_16_13_smoothed_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_sst_2_16_13_smoothed_pipeline pipeline RoBertaForSequenceClassification from simonycl +author: John Snow Labs +name: roberta_large_sst_2_16_13_smoothed_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_sst_2_16_13_smoothed_pipeline` is a English model originally trained by simonycl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_sst_2_16_13_smoothed_pipeline_en_5.5.0_3.0_1726972516150.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_sst_2_16_13_smoothed_pipeline_en_5.5.0_3.0_1726972516150.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_large_sst_2_16_13_smoothed_pipeline", lang = "en")
# Example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_sst_2_16_13_smoothed_pipeline", lang = "en")
// Example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_sst_2_16_13_smoothed_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/simonycl/roberta-large-sst-2-16-13-smoothed + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_large_temp_classifier_v2_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_large_temp_classifier_v2_en.md new file mode 100644 index 00000000000000..1f7e9c0f500b98 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_large_temp_classifier_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_temp_classifier_v2 RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_large_temp_classifier_v2 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_temp_classifier_v2` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_temp_classifier_v2_en_5.5.0_3.0_1727017028599.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_temp_classifier_v2_en_5.5.0_3.0_1727017028599.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_temp_classifier_v2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_temp_classifier_v2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_temp_classifier_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/research-dump/roberta_large_temp_classifier_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_large_temp_classifier_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_large_temp_classifier_v2_pipeline_en.md new file mode 100644 index 00000000000000..008280336bbb9f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_large_temp_classifier_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_temp_classifier_v2_pipeline pipeline RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_large_temp_classifier_v2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_temp_classifier_v2_pipeline` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_temp_classifier_v2_pipeline_en_5.5.0_3.0_1727017097486.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_temp_classifier_v2_pipeline_en_5.5.0_3.0_1727017097486.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_large_temp_classifier_v2_pipeline", lang = "en")
# Example input DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_temp_classifier_v2_pipeline", lang = "en")
// Example input DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_temp_classifier_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/research-dump/roberta_large_temp_classifier_v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_model_sst2_babylm_challenge_strict_small_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_model_sst2_babylm_challenge_strict_small_en.md new file mode 100644 index 00000000000000..f2207c63cf74b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_model_sst2_babylm_challenge_strict_small_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_model_sst2_babylm_challenge_strict_small RoBertaForSequenceClassification from TheBguy87 +author: John Snow Labs +name: roberta_model_sst2_babylm_challenge_strict_small +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_model_sst2_babylm_challenge_strict_small` is a English model originally trained by TheBguy87. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_model_sst2_babylm_challenge_strict_small_en_5.5.0_3.0_1727037298360.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_model_sst2_babylm_challenge_strict_small_en_5.5.0_3.0_1727037298360.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_model_sst2_babylm_challenge_strict_small","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_model_sst2_babylm_challenge_strict_small", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_model_sst2_babylm_challenge_strict_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|313.7 MB| + +## References + +https://huggingface.co/TheBguy87/roBERTa-Model-sst2-BabyLM-Challenge-Strict-Small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_model_sst2_babylm_challenge_strict_small_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_model_sst2_babylm_challenge_strict_small_pipeline_en.md new file mode 100644 index 00000000000000..a959fd5e6862cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_model_sst2_babylm_challenge_strict_small_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_model_sst2_babylm_challenge_strict_small_pipeline pipeline RoBertaForSequenceClassification from TheBguy87 +author: John Snow Labs +name: roberta_model_sst2_babylm_challenge_strict_small_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_model_sst2_babylm_challenge_strict_small_pipeline` is a English model originally trained by TheBguy87. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_model_sst2_babylm_challenge_strict_small_pipeline_en_5.5.0_3.0_1727037316848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_model_sst2_babylm_challenge_strict_small_pipeline_en_5.5.0_3.0_1727037316848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_model_sst2_babylm_challenge_strict_small_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_model_sst2_babylm_challenge_strict_small_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
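+
+The `df` referenced above is not defined in this card. As a minimal sketch (the sample text and the final `show` call are illustrative assumptions), it could be prepared like this:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame; any DataFrame with a "text" column works.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_model_sst2_babylm_challenge_strict_small_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.show(truncate = False)
+```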
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_model_sst2_babylm_challenge_strict_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|313.7 MB| + +## References + +https://huggingface.co/TheBguy87/roBERTa-Model-sst2-BabyLM-Challenge-Strict-Small + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_poetry_religion_crpo_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_poetry_religion_crpo_en.md new file mode 100644 index 00000000000000..dbf78d50f7a29f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_poetry_religion_crpo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_poetry_religion_crpo RoBertaEmbeddings from andreipb +author: John Snow Labs +name: roberta_poetry_religion_crpo +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_poetry_religion_crpo` is a English model originally trained by andreipb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_poetry_religion_crpo_en_5.5.0_3.0_1726999704632.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_poetry_religion_crpo_en_5.5.0_3.0_1726999704632.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_poetry_religion_crpo","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_poetry_religion_crpo","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
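+
+A minimal sketch for reading the vectors back out of the `pipelineDF` produced by the Python example above (the call below is illustrative and follows Spark NLP's annotation schema):
+
+```python
+# Each annotation in the "embeddings" column carries one vector per token.
+pipelineDF.select("embeddings.embeddings").show(truncate = False)
+```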
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_poetry_religion_crpo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/andreipb/roberta-poetry-religion-crpo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_poetry_religion_crpo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_poetry_religion_crpo_pipeline_en.md new file mode 100644 index 00000000000000..48e1be3196413a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_poetry_religion_crpo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_poetry_religion_crpo_pipeline pipeline RoBertaEmbeddings from andreipb +author: John Snow Labs +name: roberta_poetry_religion_crpo_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_poetry_religion_crpo_pipeline` is a English model originally trained by andreipb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_poetry_religion_crpo_pipeline_en_5.5.0_3.0_1726999728004.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_poetry_religion_crpo_pipeline_en_5.5.0_3.0_1726999728004.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_poetry_religion_crpo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_poetry_religion_crpo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
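+
+The `df` referenced above is not defined in this card. As a minimal sketch (the sample text and the final `show` call are illustrative assumptions), it could be prepared like this:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame; any DataFrame with a "text" column works.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_poetry_religion_crpo_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.show(truncate = False)
+```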
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_poetry_religion_crpo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/andreipb/roberta-poetry-religion-crpo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_sentiment_classification_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_sentiment_classification_en.md new file mode 100644 index 00000000000000..5f1b6da3b83bb3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_sentiment_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_sentiment_classification RoBertaForSequenceClassification from newsmediabias +author: John Snow Labs +name: roberta_sentiment_classification +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_sentiment_classification` is a English model originally trained by newsmediabias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_sentiment_classification_en_5.5.0_3.0_1727036898297.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_sentiment_classification_en_5.5.0_3.0_1727036898297.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_sentiment_classification","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_sentiment_classification", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_sentiment_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|450.9 MB| + +## References + +https://huggingface.co/newsmediabias/Roberta_Sentiment_Classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_sentiment_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_sentiment_classification_pipeline_en.md new file mode 100644 index 00000000000000..65e0eee659292a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_sentiment_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_sentiment_classification_pipeline pipeline RoBertaForSequenceClassification from newsmediabias +author: John Snow Labs +name: roberta_sentiment_classification_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_sentiment_classification_pipeline` is a English model originally trained by newsmediabias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_sentiment_classification_pipeline_en_5.5.0_3.0_1727036931274.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_sentiment_classification_pipeline_en_5.5.0_3.0_1727036931274.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_sentiment_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_sentiment_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
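+
+The `df` referenced above is not defined in this card. As a minimal sketch (the sample text and the final `show` call are illustrative assumptions), it could be prepared like this:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame; any DataFrame with a "text" column works.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_sentiment_classification_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.show(truncate = False)
+```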
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_sentiment_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|451.0 MB| + +## References + +https://huggingface.co/newsmediabias/Roberta_Sentiment_Classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_tgmd_large_b_es1_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_tgmd_large_b_es1_en.md new file mode 100644 index 00000000000000..724c95583072d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_tgmd_large_b_es1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_tgmd_large_b_es1 RoBertaForSequenceClassification from hadish +author: John Snow Labs +name: roberta_tgmd_large_b_es1 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tgmd_large_b_es1` is a English model originally trained by hadish. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tgmd_large_b_es1_en_5.5.0_3.0_1726971775641.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tgmd_large_b_es1_en_5.5.0_3.0_1726971775641.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_tgmd_large_b_es1","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_tgmd_large_b_es1", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tgmd_large_b_es1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/hadish/roberta-TGMD-large-B-ES1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_tgmd_large_b_es1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_tgmd_large_b_es1_pipeline_en.md new file mode 100644 index 00000000000000..b79e8dd36e0a17 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_tgmd_large_b_es1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_tgmd_large_b_es1_pipeline pipeline RoBertaForSequenceClassification from hadish +author: John Snow Labs +name: roberta_tgmd_large_b_es1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tgmd_large_b_es1_pipeline` is a English model originally trained by hadish. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tgmd_large_b_es1_pipeline_en_5.5.0_3.0_1726971838898.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tgmd_large_b_es1_pipeline_en_5.5.0_3.0_1726971838898.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_tgmd_large_b_es1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_tgmd_large_b_es1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
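+
+The `df` referenced above is not defined in this card. As a minimal sketch (the sample text and the final `show` call are illustrative assumptions), it could be prepared like this:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame; any DataFrame with a "text" column works.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_tgmd_large_b_es1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.show(truncate = False)
+```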
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tgmd_large_b_es1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/hadish/roberta-TGMD-large-B-ES1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_untrained_1eps_seed925_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_untrained_1eps_seed925_en.md new file mode 100644 index 00000000000000..63b3192c25f398 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_untrained_1eps_seed925_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_untrained_1eps_seed925 RoBertaForSequenceClassification from custeau +author: John Snow Labs +name: roberta_untrained_1eps_seed925 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_untrained_1eps_seed925` is a English model originally trained by custeau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_untrained_1eps_seed925_en_5.5.0_3.0_1726971603303.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_untrained_1eps_seed925_en_5.5.0_3.0_1726971603303.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_untrained_1eps_seed925","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_untrained_1eps_seed925", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_untrained_1eps_seed925| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|447.8 MB| + +## References + +https://huggingface.co/custeau/roberta_untrained_1eps_seed925 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_untrained_1eps_seed925_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_untrained_1eps_seed925_pipeline_en.md new file mode 100644 index 00000000000000..e4fb3645e255c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_untrained_1eps_seed925_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_untrained_1eps_seed925_pipeline pipeline RoBertaForSequenceClassification from custeau +author: John Snow Labs +name: roberta_untrained_1eps_seed925_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_untrained_1eps_seed925_pipeline` is a English model originally trained by custeau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_untrained_1eps_seed925_pipeline_en_5.5.0_3.0_1726971631010.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_untrained_1eps_seed925_pipeline_en_5.5.0_3.0_1726971631010.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_untrained_1eps_seed925_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_untrained_1eps_seed925_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
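+
+The `df` referenced above is not defined in this card. As a minimal sketch (the sample text and the final `show` call are illustrative assumptions), it could be prepared like this:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame; any DataFrame with a "text" column works.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("roberta_untrained_1eps_seed925_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.show(truncate = False)
+```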
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_untrained_1eps_seed925_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|447.8 MB| + +## References + +https://huggingface.co/custeau/roberta_untrained_1eps_seed925 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-robertacnnrnnfnntransformer2_en.md b/docs/_posts/ahmedlone127/2024-09-22-robertacnnrnnfnntransformer2_en.md new file mode 100644 index 00000000000000..41aa44d6860398 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-robertacnnrnnfnntransformer2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robertacnnrnnfnntransformer2 RoBertaEmbeddings from Mukundhan32 +author: John Snow Labs +name: robertacnnrnnfnntransformer2 +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertacnnrnnfnntransformer2` is a English model originally trained by Mukundhan32. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertacnnrnnfnntransformer2_en_5.5.0_3.0_1726999495285.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertacnnrnnfnntransformer2_en_5.5.0_3.0_1726999495285.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("robertacnnrnnfnntransformer2","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("robertacnnrnnfnntransformer2","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
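+
+A minimal sketch for reading the vectors back out of the `pipelineDF` produced by the Python example above (the call below is illustrative and follows Spark NLP's annotation schema):
+
+```python
+# Each annotation in the "embeddings" column carries one vector per token.
+pipelineDF.select("embeddings.embeddings").show(truncate = False)
+```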
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertacnnrnnfnntransformer2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|446.9 MB| + +## References + +https://huggingface.co/Mukundhan32/RobertaCnnRnnFnnTransformer2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-robertacnnrnnfnntransformer2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-robertacnnrnnfnntransformer2_pipeline_en.md new file mode 100644 index 00000000000000..002e834f772db0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-robertacnnrnnfnntransformer2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robertacnnrnnfnntransformer2_pipeline pipeline RoBertaEmbeddings from Mukundhan32 +author: John Snow Labs +name: robertacnnrnnfnntransformer2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertacnnrnnfnntransformer2_pipeline` is a English model originally trained by Mukundhan32. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertacnnrnnfnntransformer2_pipeline_en_5.5.0_3.0_1726999521851.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertacnnrnnfnntransformer2_pipeline_en_5.5.0_3.0_1726999521851.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robertacnnrnnfnntransformer2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robertacnnrnnfnntransformer2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
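+
+The `df` referenced above is not defined in this card. As a minimal sketch (the sample text and the final `show` call are illustrative assumptions), it could be prepared like this:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame; any DataFrame with a "text" column works.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("robertacnnrnnfnntransformer2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.show(truncate = False)
+```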
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertacnnrnnfnntransformer2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|446.9 MB| + +## References + +https://huggingface.co/Mukundhan32/RobertaCnnRnnFnnTransformer2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-robertalex_ptbr_ulyssesner_en.md b/docs/_posts/ahmedlone127/2024-09-22-robertalex_ptbr_ulyssesner_en.md new file mode 100644 index 00000000000000..1f8d0222d54486 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-robertalex_ptbr_ulyssesner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robertalex_ptbr_ulyssesner RoBertaForTokenClassification from giliardgodoi +author: John Snow Labs +name: robertalex_ptbr_ulyssesner +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertalex_ptbr_ulyssesner` is a English model originally trained by giliardgodoi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertalex_ptbr_ulyssesner_en_5.5.0_3.0_1727048953872.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertalex_ptbr_ulyssesner_en_5.5.0_3.0_1727048953872.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("robertalex_ptbr_ulyssesner","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("robertalex_ptbr_ulyssesner", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertalex_ptbr_ulyssesner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|434.5 MB| + +## References + +https://huggingface.co/giliardgodoi/robertalex-ptbr-ulyssesner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-robertalex_ptbr_ulyssesner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-robertalex_ptbr_ulyssesner_pipeline_en.md new file mode 100644 index 00000000000000..8c7bf6d77fe8f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-robertalex_ptbr_ulyssesner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robertalex_ptbr_ulyssesner_pipeline pipeline RoBertaForTokenClassification from giliardgodoi +author: John Snow Labs +name: robertalex_ptbr_ulyssesner_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertalex_ptbr_ulyssesner_pipeline` is a English model originally trained by giliardgodoi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertalex_ptbr_ulyssesner_pipeline_en_5.5.0_3.0_1727048978989.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertalex_ptbr_ulyssesner_pipeline_en_5.5.0_3.0_1727048978989.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robertalex_ptbr_ulyssesner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robertalex_ptbr_ulyssesner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
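+
+The `df` referenced above is not defined in this card. As a minimal sketch (the sample text and the final `show` call are illustrative assumptions), it could be prepared like this:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame; any DataFrame with a "text" column works.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("robertalex_ptbr_ulyssesner_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.show(truncate = False)
+```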
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertalex_ptbr_ulyssesner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|434.5 MB| + +## References + +https://huggingface.co/giliardgodoi/robertalex-ptbr-ulyssesner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-romansh_prima_en.md b/docs/_posts/ahmedlone127/2024-09-22-romansh_prima_en.md new file mode 100644 index 00000000000000..bc4a561889ab06 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-romansh_prima_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English romansh_prima RoBertaForSequenceClassification from davidgaofc +author: John Snow Labs +name: romansh_prima +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`romansh_prima` is a English model originally trained by davidgaofc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/romansh_prima_en_5.5.0_3.0_1727037129621.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/romansh_prima_en_5.5.0_3.0_1727037129621.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("romansh_prima","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("romansh_prima", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|romansh_prima| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/davidgaofc/RM_prima \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-rubert_finetuned_ner_dondosss_en.md b/docs/_posts/ahmedlone127/2024-09-22-rubert_finetuned_ner_dondosss_en.md new file mode 100644 index 00000000000000..022f7a32f09149 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-rubert_finetuned_ner_dondosss_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English rubert_finetuned_ner_dondosss BertForTokenClassification from dondosss +author: John Snow Labs +name: rubert_finetuned_ner_dondosss +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_finetuned_ner_dondosss` is a English model originally trained by dondosss. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_finetuned_ner_dondosss_en_5.5.0_3.0_1726977944951.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_finetuned_ner_dondosss_en_5.5.0_3.0_1726977944951.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("rubert_finetuned_ner_dondosss","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("rubert_finetuned_ner_dondosss", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_finetuned_ner_dondosss| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|664.3 MB| + +## References + +https://huggingface.co/dondosss/rubert-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-rubert_finetuned_ner_dondosss_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-rubert_finetuned_ner_dondosss_pipeline_en.md new file mode 100644 index 00000000000000..916dd7dd6e5a14 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-rubert_finetuned_ner_dondosss_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English rubert_finetuned_ner_dondosss_pipeline pipeline BertForTokenClassification from dondosss +author: John Snow Labs +name: rubert_finetuned_ner_dondosss_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_finetuned_ner_dondosss_pipeline` is a English model originally trained by dondosss. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_finetuned_ner_dondosss_pipeline_en_5.5.0_3.0_1726977973677.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_finetuned_ner_dondosss_pipeline_en_5.5.0_3.0_1726977973677.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rubert_finetuned_ner_dondosss_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rubert_finetuned_ner_dondosss_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
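+
+The `df` referenced above is not defined in this card. As a minimal sketch (the sample text and the final `show` call are illustrative assumptions), it could be prepared like this:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame; any DataFrame with a "text" column works.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("rubert_finetuned_ner_dondosss_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.show(truncate = False)
+```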
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_finetuned_ner_dondosss_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|664.3 MB| + +## References + +https://huggingface.co/dondosss/rubert-finetuned-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-scenario_non_kd_po_copy_cdf_english_d2_data_english_cardiff_eng_only_beta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-scenario_non_kd_po_copy_cdf_english_d2_data_english_cardiff_eng_only_beta_pipeline_en.md new file mode 100644 index 00000000000000..81780bc6c6de8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-scenario_non_kd_po_copy_cdf_english_d2_data_english_cardiff_eng_only_beta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English scenario_non_kd_po_copy_cdf_english_d2_data_english_cardiff_eng_only_beta_pipeline pipeline XlmRoBertaForSequenceClassification from haryoaw +author: John Snow Labs +name: scenario_non_kd_po_copy_cdf_english_d2_data_english_cardiff_eng_only_beta_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scenario_non_kd_po_copy_cdf_english_d2_data_english_cardiff_eng_only_beta_pipeline` is a English model originally trained by haryoaw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scenario_non_kd_po_copy_cdf_english_d2_data_english_cardiff_eng_only_beta_pipeline_en_5.5.0_3.0_1727009653857.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scenario_non_kd_po_copy_cdf_english_d2_data_english_cardiff_eng_only_beta_pipeline_en_5.5.0_3.0_1727009653857.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("scenario_non_kd_po_copy_cdf_english_d2_data_english_cardiff_eng_only_beta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("scenario_non_kd_po_copy_cdf_english_d2_data_english_cardiff_eng_only_beta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
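+
+The `df` referenced above is not defined in this card. As a minimal sketch (the sample text and the final `show` call are illustrative assumptions), it could be prepared like this:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame; any DataFrame with a "text" column works.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("scenario_non_kd_po_copy_cdf_english_d2_data_english_cardiff_eng_only_beta_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.show(truncate = False)
+```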
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scenario_non_kd_po_copy_cdf_english_d2_data_english_cardiff_eng_only_beta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|672.4 MB| + +## References + +https://huggingface.co/haryoaw/scenario-NON-KD-PO-COPY-CDF-EN-D2_data-en-cardiff_eng_only_beta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-scinertopic_en.md b/docs/_posts/ahmedlone127/2024-09-22-scinertopic_en.md new file mode 100644 index 00000000000000..332c6f8ecdd059 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-scinertopic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English scinertopic BertForTokenClassification from RJuro +author: John Snow Labs +name: scinertopic +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scinertopic` is a English model originally trained by RJuro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scinertopic_en_5.5.0_3.0_1726974698673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scinertopic_en_5.5.0_3.0_1726974698673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("scinertopic","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("scinertopic", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scinertopic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/RJuro/SciNERTopic \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-scinertopic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-scinertopic_pipeline_en.md new file mode 100644 index 00000000000000..4195270f51b205 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-scinertopic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English scinertopic_pipeline pipeline BertForTokenClassification from RJuro +author: John Snow Labs +name: scinertopic_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scinertopic_pipeline` is a English model originally trained by RJuro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scinertopic_pipeline_en_5.5.0_3.0_1726974716669.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scinertopic_pipeline_en_5.5.0_3.0_1726974716669.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("scinertopic_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("scinertopic_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
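+
+The `df` referenced above is not defined in this card. As a minimal sketch (the sample text and the final `show` call are illustrative assumptions), it could be prepared like this:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame; any DataFrame with a "text" column works.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("scinertopic_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.show(truncate = False)
+```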
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scinertopic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/RJuro/SciNERTopic + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_arabertmo_base_v6_ar.md b/docs/_posts/ahmedlone127/2024-09-22-sent_arabertmo_base_v6_ar.md new file mode 100644 index 00000000000000..8664848875c7b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_arabertmo_base_v6_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic sent_arabertmo_base_v6 BertSentenceEmbeddings from Ebtihal +author: John Snow Labs +name: sent_arabertmo_base_v6 +date: 2024-09-22 +tags: [ar, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_arabertmo_base_v6` is a Arabic model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v6_ar_5.5.0_3.0_1727044755571.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v6_ar_5.5.0_3.0_1727044755571.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_arabertmo_base_v6","ar") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_arabertmo_base_v6","ar") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
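As a brief sketch beyond the original card, the sentence vectors can be pulled out of the `embeddings` annotation column produced above; the field names follow the standard Spark NLP annotation schema.

```python
from pyspark.sql import functions as F

# Each annotation keeps the sentence text in "result" and its vector in "embeddings"
pipelineDF.select(F.explode("embeddings").alias("ann")) \
    .select(F.col("ann.result").alias("sentence"),
            F.col("ann.embeddings").alias("vector")) \
    .show(truncate=80)
```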
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_arabertmo_base_v6| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ar| +|Size:|407.9 MB| + +## References + +https://huggingface.co/Ebtihal/AraBertMo_base_V6 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_arabertmo_base_v6_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-22-sent_arabertmo_base_v6_pipeline_ar.md new file mode 100644 index 00000000000000..67b1f407965509 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_arabertmo_base_v6_pipeline_ar.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Arabic sent_arabertmo_base_v6_pipeline pipeline BertSentenceEmbeddings from Ebtihal +author: John Snow Labs +name: sent_arabertmo_base_v6_pipeline +date: 2024-09-22 +tags: [ar, open_source, pipeline, onnx] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_arabertmo_base_v6_pipeline` is a Arabic model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v6_pipeline_ar_5.5.0_3.0_1727044776904.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v6_pipeline_ar_5.5.0_3.0_1727044776904.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Create a DataFrame with a "text" column and run the pretrained pipeline over it
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_arabertmo_base_v6_pipeline", lang = "ar")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Create a DataFrame with a "text" column and run the pretrained pipeline over it
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_arabertmo_base_v6_pipeline", lang = "ar")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_arabertmo_base_v6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|408.4 MB| + +## References + +https://huggingface.co/Ebtihal/AraBertMo_base_V6 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_astro_hep_bert_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_astro_hep_bert_en.md new file mode 100644 index 00000000000000..474e149420dee4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_astro_hep_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_astro_hep_bert BertSentenceEmbeddings from arnosimons +author: John Snow Labs +name: sent_astro_hep_bert +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_astro_hep_bert` is a English model originally trained by arnosimons. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_astro_hep_bert_en_5.5.0_3.0_1726964678554.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_astro_hep_bert_en_5.5.0_3.0_1726964678554.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_astro_hep_bert","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_astro_hep_bert","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_astro_hep_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|404.1 MB| + +## References + +https://huggingface.co/arnosimons/astro-hep-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_astro_hep_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_astro_hep_bert_pipeline_en.md new file mode 100644 index 00000000000000..84dcee88b8122f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_astro_hep_bert_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_astro_hep_bert_pipeline pipeline BertSentenceEmbeddings from arnosimons +author: John Snow Labs +name: sent_astro_hep_bert_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_astro_hep_bert_pipeline` is a English model originally trained by arnosimons. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_astro_hep_bert_pipeline_en_5.5.0_3.0_1726964696982.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_astro_hep_bert_pipeline_en_5.5.0_3.0_1726964696982.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Create a DataFrame with a "text" column and run the pretrained pipeline over it
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_astro_hep_bert_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Create a DataFrame with a "text" column and run the pretrained pipeline over it
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_astro_hep_bert_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
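For quick experiments on a handful of strings, `PretrainedPipeline` also exposes `annotate`, which skips the DataFrame entirely; a small sketch (the sample sentence is illustrative only):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("sent_astro_hep_bert_pipeline", lang = "en")

# annotate() runs every included stage on a plain string and returns a dict
result = pipeline.annotate("The cosmic microwave background was mapped in detail by Planck.")
print(result.keys())  # output columns produced by the included stages
```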
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_astro_hep_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.6 MB| + +## References + +https://huggingface.co/arnosimons/astro-hep-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_banglabert_generator_bn.md b/docs/_posts/ahmedlone127/2024-09-22-sent_banglabert_generator_bn.md new file mode 100644 index 00000000000000..04a64d133664a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_banglabert_generator_bn.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Bengali sent_banglabert_generator BertSentenceEmbeddings from csebuetnlp +author: John Snow Labs +name: sent_banglabert_generator +date: 2024-09-22 +tags: [bn, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_banglabert_generator` is a Bengali model originally trained by csebuetnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_banglabert_generator_bn_5.5.0_3.0_1727004429749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_banglabert_generator_bn_5.5.0_3.0_1727004429749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_banglabert_generator","bn") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_banglabert_generator","bn") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_banglabert_generator| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|bn| +|Size:|130.0 MB| + +## References + +https://huggingface.co/csebuetnlp/banglabert_generator \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_banglabert_generator_pipeline_bn.md b/docs/_posts/ahmedlone127/2024-09-22-sent_banglabert_generator_pipeline_bn.md new file mode 100644 index 00000000000000..db7e4edae30845 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_banglabert_generator_pipeline_bn.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Bengali sent_banglabert_generator_pipeline pipeline BertSentenceEmbeddings from csebuetnlp +author: John Snow Labs +name: sent_banglabert_generator_pipeline +date: 2024-09-22 +tags: [bn, open_source, pipeline, onnx] +task: Embeddings +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_banglabert_generator_pipeline` is a Bengali model originally trained by csebuetnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_banglabert_generator_pipeline_bn_5.5.0_3.0_1727004435762.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_banglabert_generator_pipeline_bn_5.5.0_3.0_1727004435762.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Create a DataFrame with a "text" column and run the pretrained pipeline over it
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_banglabert_generator_pipeline", lang = "bn")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Create a DataFrame with a "text" column and run the pretrained pipeline over it
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_banglabert_generator_pipeline", lang = "bn")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_banglabert_generator_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bn| +|Size:|130.5 MB| + +## References + +https://huggingface.co/csebuetnlp/banglabert_generator + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_cased_model_attribution_challenge_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_cased_model_attribution_challenge_en.md new file mode 100644 index 00000000000000..c62ec0ee9ed905 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_cased_model_attribution_challenge_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_cased_model_attribution_challenge BertSentenceEmbeddings from model-attribution-challenge +author: John Snow Labs +name: sent_bert_base_cased_model_attribution_challenge +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_cased_model_attribution_challenge` is a English model originally trained by model-attribution-challenge. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_model_attribution_challenge_en_5.5.0_3.0_1727004091661.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_model_attribution_challenge_en_5.5.0_3.0_1727004091661.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_cased_model_attribution_challenge","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_cased_model_attribution_challenge","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
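If plain Spark ML vectors are preferred over annotation structs, an `EmbeddingsFinisher` stage can be appended to the pipeline defined above; this is a sketch that assumes the earlier stage variables (`documentAssembler`, `sentenceDL`, `embeddings`, `data`) are unchanged.

```python
from sparknlp.base import EmbeddingsFinisher
from pyspark.ml import Pipeline

# Convert the "embeddings" annotations into Spark ML vectors for downstream stages
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings, finisher])
pipeline.fit(data).transform(data).select("finished_embeddings").show(truncate=80)
```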
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_cased_model_attribution_challenge| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/model-attribution-challenge/bert-base-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_cased_model_attribution_challenge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_cased_model_attribution_challenge_pipeline_en.md new file mode 100644 index 00000000000000..b521740b6cebdc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_cased_model_attribution_challenge_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_cased_model_attribution_challenge_pipeline pipeline BertSentenceEmbeddings from model-attribution-challenge +author: John Snow Labs +name: sent_bert_base_cased_model_attribution_challenge_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_cased_model_attribution_challenge_pipeline` is a English model originally trained by model-attribution-challenge. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_model_attribution_challenge_pipeline_en_5.5.0_3.0_1727004112566.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_model_attribution_challenge_pipeline_en_5.5.0_3.0_1727004112566.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Create a DataFrame with a "text" column and run the pretrained pipeline over it
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_cased_model_attribution_challenge_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Create a DataFrame with a "text" column and run the pretrained pipeline over it
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_cased_model_attribution_challenge_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_cased_model_attribution_challenge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.2 MB| + +## References + +https://huggingface.co/model-attribution-challenge/bert-base-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_danish_cased_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_danish_cased_en.md new file mode 100644 index 00000000000000..2f3708719730ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_danish_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_danish_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_danish_cased +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_danish_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_danish_cased_en_5.5.0_3.0_1727044582855.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_danish_cased_en_5.5.0_3.0_1727044582855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_danish_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_danish_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
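As an illustrative follow-up not covered by the card, two sentence vectors collected to the driver can be compared with cosine similarity; the helper below is hypothetical and only suitable for small result sets.

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Collect the first two sentence vectors from the result DataFrame (small data only)
rows = pipelineDF.selectExpr("explode(embeddings) as ann").select("ann.embeddings").take(2)
if len(rows) == 2:
    print(cosine_similarity(rows[0][0], rows[1][0]))
```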
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_danish_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|414.6 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-da-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_danish_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_danish_cased_pipeline_en.md new file mode 100644 index 00000000000000..47515528d66cd1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_danish_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_danish_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_danish_cased_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_danish_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_danish_cased_pipeline_en_5.5.0_3.0_1727044604106.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_danish_cased_pipeline_en_5.5.0_3.0_1727044604106.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Create a DataFrame with a "text" column and run the pretrained pipeline over it
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_english_danish_cased_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Create a DataFrame with a "text" column and run the pretrained pipeline over it
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_english_danish_cased_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_danish_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|415.1 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-da-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_italian_cased_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_italian_cased_en.md new file mode 100644 index 00000000000000..6c7085501d9fb6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_italian_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_italian_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_italian_cased +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_italian_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_italian_cased_en_5.5.0_3.0_1727047180968.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_italian_cased_en_5.5.0_3.0_1727047180968.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_italian_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_italian_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_italian_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|417.9 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-it-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_italian_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_italian_cased_pipeline_en.md new file mode 100644 index 00000000000000..7529d98ac19cd1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_italian_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_italian_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_italian_cased_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_italian_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_italian_cased_pipeline_en_5.5.0_3.0_1727047201097.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_italian_cased_pipeline_en_5.5.0_3.0_1727047201097.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Create a DataFrame with a "text" column and run the pretrained pipeline over it
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_english_italian_cased_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Create a DataFrame with a "text" column and run the pretrained pipeline over it
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_english_italian_cased_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_italian_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|418.5 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-it-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_thai_cased_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_thai_cased_en.md new file mode 100644 index 00000000000000..77b9ffa69efaec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_thai_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_thai_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_thai_cased +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_thai_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_thai_cased_en_5.5.0_3.0_1727004279720.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_thai_cased_en_5.5.0_3.0_1727004279720.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_thai_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_thai_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_thai_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|404.3 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-th-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_thai_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_thai_cased_pipeline_en.md new file mode 100644 index 00000000000000..0405f32b787682 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_thai_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_thai_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_thai_cased_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_thai_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_thai_cased_pipeline_en_5.5.0_3.0_1727004297490.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_thai_cased_pipeline_en_5.5.0_3.0_1727004297490.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Create a DataFrame with a "text" column and run the pretrained pipeline over it
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_english_thai_cased_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Create a DataFrame with a "text" column and run the pretrained pipeline over it
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_english_thai_cased_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_thai_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.8 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-th-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_finnish_europeana_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_finnish_europeana_cased_pipeline_en.md new file mode 100644 index 00000000000000..ef1938c0260ae6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_finnish_europeana_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_finnish_europeana_cased_pipeline pipeline BertSentenceEmbeddings from dbmdz +author: John Snow Labs +name: sent_bert_base_finnish_europeana_cased_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_finnish_europeana_cased_pipeline` is a English model originally trained by dbmdz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_finnish_europeana_cased_pipeline_en_5.5.0_3.0_1727013611484.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_finnish_europeana_cased_pipeline_en_5.5.0_3.0_1727013611484.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Create a DataFrame with a "text" column and run the pretrained pipeline over it
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_finnish_europeana_cased_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Create a DataFrame with a "text" column and run the pretrained pipeline over it
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_finnish_europeana_cased_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
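A short, assumption-laden way to see what the pipeline adds to the DataFrame: this assumes the `PretrainedPipeline` wrapper exposes its underlying `PipelineModel` as `model`, and uses `printSchema()` to reveal the columns created by the stages listed under "Included Models".

```python
# Stage list of the wrapped PipelineModel (assumed to be exposed as .model)
print(pipeline.model.stages)

# Columns added by DocumentAssembler, Tokenizer, SentenceDetectorDL and BertSentenceEmbeddings
annotations.printSchema()
```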
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_finnish_europeana_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.9 MB| + +## References + +https://huggingface.co/dbmdz/bert-base-finnish-europeana-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_cased_finetuned_swiss_de.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_cased_finetuned_swiss_de.md new file mode 100644 index 00000000000000..4d024e245f4387 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_cased_finetuned_swiss_de.md @@ -0,0 +1,94 @@ +--- +layout: model +title: German sent_bert_base_german_cased_finetuned_swiss BertSentenceEmbeddings from statworx +author: John Snow Labs +name: sent_bert_base_german_cased_finetuned_swiss +date: 2024-09-22 +tags: [de, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_german_cased_finetuned_swiss` is a German model originally trained by statworx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_german_cased_finetuned_swiss_de_5.5.0_3.0_1727047043482.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_german_cased_finetuned_swiss_de_5.5.0_3.0_1727047043482.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_german_cased_finetuned_swiss","de") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_german_cased_finetuned_swiss","de") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_german_cased_finetuned_swiss| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|de| +|Size:|406.9 MB| + +## References + +https://huggingface.co/statworx/bert-base-german-cased-finetuned-swiss \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_cased_finetuned_swiss_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_cased_finetuned_swiss_pipeline_de.md new file mode 100644 index 00000000000000..d87e16d4b4a069 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_cased_finetuned_swiss_pipeline_de.md @@ -0,0 +1,71 @@ +--- +layout: model +title: German sent_bert_base_german_cased_finetuned_swiss_pipeline pipeline BertSentenceEmbeddings from statworx +author: John Snow Labs +name: sent_bert_base_german_cased_finetuned_swiss_pipeline +date: 2024-09-22 +tags: [de, open_source, pipeline, onnx] +task: Embeddings +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_german_cased_finetuned_swiss_pipeline` is a German model originally trained by statworx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_german_cased_finetuned_swiss_pipeline_de_5.5.0_3.0_1727047063507.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_german_cased_finetuned_swiss_pipeline_de_5.5.0_3.0_1727047063507.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Create a DataFrame with a "text" column and run the pretrained pipeline over it
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_german_cased_finetuned_swiss_pipeline", lang = "de")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Create a DataFrame with a "text" column and run the pretrained pipeline over it
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_german_cased_finetuned_swiss_pipeline", lang = "de")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_german_cased_finetuned_swiss_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|407.4 MB| + +## References + +https://huggingface.co/statworx/bert-base-german-cased-finetuned-swiss + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_europeana_td_cased_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_europeana_td_cased_en.md new file mode 100644 index 00000000000000..f60dee84235952 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_europeana_td_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_german_europeana_td_cased BertSentenceEmbeddings from dbmdz +author: John Snow Labs +name: sent_bert_base_german_europeana_td_cased +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_german_europeana_td_cased` is a English model originally trained by dbmdz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_german_europeana_td_cased_en_5.5.0_3.0_1727013463916.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_german_europeana_td_cased_en_5.5.0_3.0_1727013463916.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_german_europeana_td_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_german_europeana_td_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_german_europeana_td_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|412.4 MB| + +## References + +https://huggingface.co/dbmdz/bert-base-german-europeana-td-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_europeana_td_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_europeana_td_cased_pipeline_en.md new file mode 100644 index 00000000000000..5dd134da0afeb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_europeana_td_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_german_europeana_td_cased_pipeline pipeline BertSentenceEmbeddings from dbmdz +author: John Snow Labs +name: sent_bert_base_german_europeana_td_cased_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_german_europeana_td_cased_pipeline` is a English model originally trained by dbmdz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_german_europeana_td_cased_pipeline_en_5.5.0_3.0_1727013483644.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_german_europeana_td_cased_pipeline_en_5.5.0_3.0_1727013483644.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Create a DataFrame with a "text" column and run the pretrained pipeline over it
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_german_europeana_td_cased_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Create a DataFrame with a "text" column and run the pretrained pipeline over it
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_german_europeana_td_cased_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_german_europeana_td_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.9 MB| + +## References + +https://huggingface.co/dbmdz/bert-base-german-europeana-td-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_multilingual_cased_based_encoder_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_multilingual_cased_based_encoder_pipeline_xx.md new file mode 100644 index 00000000000000..cb7d73d54242ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_multilingual_cased_based_encoder_pipeline_xx.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Multilingual sent_bert_base_multilingual_cased_based_encoder_pipeline pipeline BertSentenceEmbeddings from shsha0110 +author: John Snow Labs +name: sent_bert_base_multilingual_cased_based_encoder_pipeline +date: 2024-09-22 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_multilingual_cased_based_encoder_pipeline` is a Multilingual model originally trained by shsha0110. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_based_encoder_pipeline_xx_5.5.0_3.0_1727001937322.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_based_encoder_pipeline_xx_5.5.0_3.0_1727001937322.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Create a DataFrame with a "text" column and run the pretrained pipeline over it
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_multilingual_cased_based_encoder_pipeline", lang = "xx")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Create a DataFrame with a "text" column and run the pretrained pipeline over it
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_multilingual_cased_based_encoder_pipeline", lang = "xx")
val annotations = pipeline.transform(df)
```
</div>
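Where the complete annotation objects (including the embedding arrays) are wanted for just a few texts, `fullAnnotate` can be used instead of a DataFrame; a brief sketch with an illustrative sentence:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("sent_bert_base_multilingual_cased_based_encoder_pipeline", lang = "xx")

# fullAnnotate returns Annotation objects, each carrying its embeddings array
results = pipeline.fullAnnotate("Spark NLP supports more than one language.")
print(results[0].keys())  # inspect the actual output column names before drilling in
```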
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_multilingual_cased_based_encoder_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.4 MB| + +## References + +https://huggingface.co/shsha0110/bert-base-multilingual-cased-based-encoder + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_multilingual_cased_finetuned_kinyarwanda_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_multilingual_cased_finetuned_kinyarwanda_pipeline_xx.md new file mode 100644 index 00000000000000..2f42ea7be2cb83 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_multilingual_cased_finetuned_kinyarwanda_pipeline_xx.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Multilingual sent_bert_base_multilingual_cased_finetuned_kinyarwanda_pipeline pipeline BertSentenceEmbeddings from Davlan +author: John Snow Labs +name: sent_bert_base_multilingual_cased_finetuned_kinyarwanda_pipeline +date: 2024-09-22 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_multilingual_cased_finetuned_kinyarwanda_pipeline` is a Multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_kinyarwanda_pipeline_xx_5.5.0_3.0_1727001838902.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_kinyarwanda_pipeline_xx_5.5.0_3.0_1727001838902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Create a DataFrame with a "text" column and run the pretrained pipeline over it
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_multilingual_cased_finetuned_kinyarwanda_pipeline", lang = "xx")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Create a DataFrame with a "text" column and run the pretrained pipeline over it
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_multilingual_cased_finetuned_kinyarwanda_pipeline", lang = "xx")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_multilingual_cased_finetuned_kinyarwanda_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.6 MB| + +## References + +https://huggingface.co/Davlan/bert-base-multilingual-cased-finetuned-kinyarwanda + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_multilingual_cased_finetuned_kinyarwanda_xx.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_multilingual_cased_finetuned_kinyarwanda_xx.md new file mode 100644 index 00000000000000..e468c521b4f489 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_multilingual_cased_finetuned_kinyarwanda_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual sent_bert_base_multilingual_cased_finetuned_kinyarwanda BertSentenceEmbeddings from Davlan +author: John Snow Labs +name: sent_bert_base_multilingual_cased_finetuned_kinyarwanda +date: 2024-09-22 +tags: [xx, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_multilingual_cased_finetuned_kinyarwanda` is a Multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_kinyarwanda_xx_5.5.0_3.0_1727001810047.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_kinyarwanda_xx_5.5.0_3.0_1727001810047.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_multilingual_cased_finetuned_kinyarwanda","xx") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_multilingual_cased_finetuned_kinyarwanda","xx") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
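The Python snippet above omits its imports. Following the usual Spark NLP conventions, the classes it uses would be imported as in this sketch, which should be run once per session before any `.pretrained()` call:

```python
# Imports assumed, but not shown, by the snippet above.
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

# A Spark session with Spark NLP on the classpath is required before pretrained
# models can be downloaded.
spark = sparknlp.start()
```

Once `pipelineDF` is built as in the snippet, the raw float vectors live in the `embeddings` field of each annotation inside the `embeddings` output column.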
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_multilingual_cased_finetuned_kinyarwanda| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|xx| +|Size:|665.0 MB| + +## References + +https://huggingface.co/Davlan/bert-base-multilingual-cased-finetuned-kinyarwanda \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_multilingual_cased_finetuned_swahili_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_multilingual_cased_finetuned_swahili_pipeline_xx.md new file mode 100644 index 00000000000000..fba8ba76e20848 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_multilingual_cased_finetuned_swahili_pipeline_xx.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Multilingual sent_bert_base_multilingual_cased_finetuned_swahili_pipeline pipeline BertSentenceEmbeddings from Davlan +author: John Snow Labs +name: sent_bert_base_multilingual_cased_finetuned_swahili_pipeline +date: 2024-09-22 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_multilingual_cased_finetuned_swahili_pipeline` is a Multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_swahili_pipeline_xx_5.5.0_3.0_1727001928114.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_swahili_pipeline_xx_5.5.0_3.0_1727001928114.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_multilingual_cased_finetuned_swahili_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_multilingual_cased_finetuned_swahili_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_multilingual_cased_finetuned_swahili_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|664.7 MB| + +## References + +https://huggingface.co/Davlan/bert-base-multilingual-cased-finetuned-swahili + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_nli_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_nli_en.md new file mode 100644 index 00000000000000..a7c9c7eefd92db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_nli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_nli BertSentenceEmbeddings from binwang +author: John Snow Labs +name: sent_bert_base_nli +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_nli` is a English model originally trained by binwang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_nli_en_5.5.0_3.0_1726965199929.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_nli_en_5.5.0_3.0_1726965199929.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_nli","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_nli","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_nli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/binwang/bert-base-nli \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_nli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_nli_pipeline_en.md new file mode 100644 index 00000000000000..08d0f979748325 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_nli_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_nli_pipeline pipeline BertSentenceEmbeddings from binwang +author: John Snow Labs +name: sent_bert_base_nli_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_nli_pipeline` is a English model originally trained by binwang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_nli_pipeline_en_5.5.0_3.0_1726965217957.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_nli_pipeline_en_5.5.0_3.0_1726965217957.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_nli_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_nli_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
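Besides `transform` on a DataFrame, a `PretrainedPipeline` can also be applied to plain strings for quick experiments via the standard `fullAnnotate` helper. A sketch, which lists the output column names produced by the pipeline instead of assuming them:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("sent_bert_base_nli_pipeline", lang="en")

# fullAnnotate accepts raw strings, so no DataFrame is needed for a quick check.
result = pipeline.fullAnnotate("I love spark-nlp")[0]

# Show each output column and how many annotations it produced for this input.
for column, annotations in result.items():
    print(column, len(annotations))
```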
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_nli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/binwang/bert-base-nli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_1802_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_1802_en.md new file mode 100644 index 00000000000000..fcb76cb3520a72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_1802_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_1802 BertSentenceEmbeddings from JamesKim +author: John Snow Labs +name: sent_bert_base_uncased_1802 +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_1802` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_en_5.5.0_3.0_1727046935475.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_en_5.5.0_3.0_1727046935475.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_1802","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_1802","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_1802| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/JamesKim/bert-base-uncased_1802 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_1802_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_1802_pipeline_en.md new file mode 100644 index 00000000000000..957efba71159c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_1802_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_1802_pipeline pipeline BertSentenceEmbeddings from JamesKim +author: John Snow Labs +name: sent_bert_base_uncased_1802_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_1802_pipeline` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_pipeline_en_5.5.0_3.0_1727046955301.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_pipeline_en_5.5.0_3.0_1727046955301.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_1802_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_1802_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_1802_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/JamesKim/bert-base-uncased_1802 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_dstc9_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_dstc9_pipeline_en.md new file mode 100644 index 00000000000000..eaf1236e33a07c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_dstc9_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_dstc9_pipeline pipeline BertSentenceEmbeddings from wilsontam +author: John Snow Labs +name: sent_bert_base_uncased_dstc9_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_dstc9_pipeline` is a English model originally trained by wilsontam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_dstc9_pipeline_en_5.5.0_3.0_1727004514773.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_dstc9_pipeline_en_5.5.0_3.0_1727004514773.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_dstc9_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_dstc9_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_dstc9_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/wilsontam/bert-base-uncased-dstc9 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_bible_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_bible_en.md new file mode 100644 index 00000000000000..06980b4b50c23a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_bible_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_bible BertSentenceEmbeddings from Pragash-Mohanarajah +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_bible +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_bible` is a English model originally trained by Pragash-Mohanarajah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_bible_en_5.5.0_3.0_1727001517315.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_bible_en_5.5.0_3.0_1727001517315.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_bible","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_bible","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
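A common downstream use of sentence embeddings is semantic similarity. A sketch of that workflow with this model, collecting the vectors to the driver and scoring them with NumPy (the two example verses and the driver-side scoring are illustrative assumptions; the column handling relies on the `embeddings` output column set above):

```python
import numpy as np
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]).setOutputCol("sentence")
embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_bible", "en") \
    .setInputCols(["sentence"]).setOutputCol("embeddings")

data = spark.createDataFrame([["In the beginning was the Word."],
                              ["Love your neighbour as yourself."]]).toDF("text")

model = Pipeline(stages=[documentAssembler, sentenceDL, embeddings]).fit(data)

# Each annotation's `embeddings` field holds the raw float vector for one sentence.
rows = model.transform(data).selectExpr("explode(embeddings.embeddings) AS vector").collect()
a, b = (np.array(r.vector) for r in rows)

cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {cosine:.3f}")
```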
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_bible| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/Pragash-Mohanarajah/bert-base-uncased-finetuned-bible \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_bible_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_bible_pipeline_en.md new file mode 100644 index 00000000000000..d7fe75ea8923cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_bible_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_bible_pipeline pipeline BertSentenceEmbeddings from Pragash-Mohanarajah +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_bible_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_bible_pipeline` is a English model originally trained by Pragash-Mohanarajah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_bible_pipeline_en_5.5.0_3.0_1727001535467.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_bible_pipeline_en_5.5.0_3.0_1727001535467.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_bible_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_bible_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_bible_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/Pragash-Mohanarajah/bert-base-uncased-finetuned-bible + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_news_2010_2015_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_news_2010_2015_en.md new file mode 100644 index 00000000000000..a7601c31b3e440 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_news_2010_2015_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_news_2010_2015 BertSentenceEmbeddings from sally9805 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_news_2010_2015 +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_news_2010_2015` is a English model originally trained by sally9805. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_news_2010_2015_en_5.5.0_3.0_1726965225645.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_news_2010_2015_en_5.5.0_3.0_1726965225645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_news_2010_2015","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_news_2010_2015","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_news_2010_2015| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sally9805/bert-base-uncased-finetuned-news-2010-2015 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_news_2010_2015_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_news_2010_2015_pipeline_en.md new file mode 100644 index 00000000000000..285de6232b6d1e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_news_2010_2015_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_news_2010_2015_pipeline pipeline BertSentenceEmbeddings from sally9805 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_news_2010_2015_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_news_2010_2015_pipeline` is a English model originally trained by sally9805. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_news_2010_2015_pipeline_en_5.5.0_3.0_1726965244154.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_news_2010_2015_pipeline_en_5.5.0_3.0_1726965244154.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_news_2010_2015_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_news_2010_2015_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_news_2010_2015_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/sally9805/bert-base-uncased-finetuned-news-2010-2015 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_wikitext_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_wikitext_en.md new file mode 100644 index 00000000000000..b69fd87af36c00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_wikitext_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_wikitext BertSentenceEmbeddings from peteryushunli +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_wikitext +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_wikitext` is a English model originally trained by peteryushunli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wikitext_en_5.5.0_3.0_1727047195540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wikitext_en_5.5.0_3.0_1727047195540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wikitext","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wikitext","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_wikitext| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/peteryushunli/bert-base-uncased-finetuned-wikitext \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_anantonios9_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_anantonios9_en.md new file mode 100644 index 00000000000000..1a6ae168d0e289 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_anantonios9_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_anantonios9 BertSentenceEmbeddings from anantonios9 +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_anantonios9 +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_anantonios9` is a English model originally trained by anantonios9. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_anantonios9_en_5.5.0_3.0_1726964887546.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_anantonios9_en_5.5.0_3.0_1726964887546.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_anantonios9","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_anantonios9","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
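To feed these sentence embeddings into downstream Spark ML estimators, the annotation column usually has to be converted into plain vectors first. A sketch using Spark NLP's `EmbeddingsFinisher` for that conversion; the extra stage, its output column name, and the one-row example frame are illustrative assumptions rather than part of this card:

```python
import sparknlp
from sparknlp.base import DocumentAssembler, EmbeddingsFinisher
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]).setOutputCol("sentence")
embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_anantonios9", "en") \
    .setInputCols(["sentence"]).setOutputCol("embeddings")

# Converts embedding annotations into Spark ML vectors that classifiers or clustering
# algorithms can consume directly.
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = Pipeline(stages=[documentAssembler, sentenceDL, embeddings, finisher])
pipeline.fit(data).transform(data).select("finished_embeddings").show(truncate=60)
```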
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_anantonios9| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/anantonios9/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_anantonios9_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_anantonios9_pipeline_en.md new file mode 100644 index 00000000000000..128ca7da9d3e17 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_anantonios9_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_anantonios9_pipeline pipeline BertSentenceEmbeddings from anantonios9 +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_anantonios9_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_anantonios9_pipeline` is a English model originally trained by anantonios9. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_anantonios9_pipeline_en_5.5.0_3.0_1726964905859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_anantonios9_pipeline_en_5.5.0_3.0_1726964905859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_anantonios9_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_anantonios9_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_anantonios9_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/anantonios9/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_hndc_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_hndc_en.md new file mode 100644 index 00000000000000..dc6b8cfaf1fb72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_hndc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_hndc BertSentenceEmbeddings from hndc +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_hndc +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_hndc` is a English model originally trained by hndc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_hndc_en_5.5.0_3.0_1727013706027.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_hndc_en_5.5.0_3.0_1727013706027.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_hndc","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_hndc","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
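Pretrained annotators are ordinary Spark ML stages, so the downloaded model can be saved to a local path once and reloaded later without re-downloading. A sketch using standard Spark ML persistence; the destination path is chosen arbitrarily for illustration:

```python
import sparknlp
from sparknlp.annotator import BertSentenceEmbeddings

spark = sparknlp.start()

# Download from the models hub once...
embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_hndc", "en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

# ...save it with standard Spark ML persistence...
embeddings.write().overwrite().save("./sent_bert_base_uncased_issues_128_hndc_spark_nlp")

# ...and load it back offline in a later session.
reloaded = BertSentenceEmbeddings.load("./sent_bert_base_uncased_issues_128_hndc_spark_nlp") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")
```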
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_hndc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/hndc/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_hndc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_hndc_pipeline_en.md new file mode 100644 index 00000000000000..118566c4a9b290 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_hndc_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_hndc_pipeline pipeline BertSentenceEmbeddings from hndc +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_hndc_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_hndc_pipeline` is a English model originally trained by hndc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_hndc_pipeline_en_5.5.0_3.0_1727013726241.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_hndc_pipeline_en_5.5.0_3.0_1727013726241.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_hndc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_hndc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_hndc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/hndc/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_transformersbook_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_transformersbook_en.md new file mode 100644 index 00000000000000..297032ee0aff4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_transformersbook_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_transformersbook BertSentenceEmbeddings from transformersbook +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_transformersbook +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_transformersbook` is a English model originally trained by transformersbook. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_transformersbook_en_5.5.0_3.0_1727013623210.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_transformersbook_en_5.5.0_3.0_1727013623210.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_transformersbook","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_transformersbook","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_transformersbook| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/transformersbook/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_transformersbook_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_transformersbook_pipeline_en.md new file mode 100644 index 00000000000000..71c67c9420867a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_transformersbook_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_transformersbook_pipeline pipeline BertSentenceEmbeddings from transformersbook +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_transformersbook_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_transformersbook_pipeline` is a English model originally trained by transformersbook. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_transformersbook_pipeline_en_5.5.0_3.0_1727013642163.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_transformersbook_pipeline_en_5.5.0_3.0_1727013642163.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_transformersbook_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_transformersbook_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_transformersbook_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/transformersbook/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_l10_h256_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_l10_h256_uncased_en.md new file mode 100644 index 00000000000000..0b960e6865657e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_l10_h256_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_l10_h256_uncased BertSentenceEmbeddings from gaunernst +author: John Snow Labs +name: sent_bert_l10_h256_uncased +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_l10_h256_uncased` is a English model originally trained by gaunernst. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_l10_h256_uncased_en_5.5.0_3.0_1727004245964.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_l10_h256_uncased_en_5.5.0_3.0_1727004245964.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_l10_h256_uncased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_l10_h256_uncased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
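Unlike the bert-base checkpoints elsewhere on this page, this L10/H256 model is a compact architecture, so the sentence vectors it produces are smaller (presumably 256 dimensions, matching the hidden size, though that is an assumption worth verifying). A sketch for checking the dimensionality of the output vectors, with the one-row example frame added for illustration:

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]).setOutputCol("sentence")
embeddings = BertSentenceEmbeddings.pretrained("sent_bert_l10_h256_uncased", "en") \
    .setInputCols(["sentence"]).setOutputCol("embeddings")

data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
result = Pipeline(stages=[documentAssembler, sentenceDL, embeddings]).fit(data).transform(data)

# Report the length of each sentence vector produced by this checkpoint.
result.selectExpr("explode(embeddings.embeddings) AS vector") \
      .selectExpr("size(vector) AS dimensions").show()
```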
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_l10_h256_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|59.6 MB| + +## References + +https://huggingface.co/gaunernst/bert-L10-H256-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_small_cord19_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_small_cord19_en.md new file mode 100644 index 00000000000000..4f627293a04bd3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_small_cord19_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_small_cord19 BertSentenceEmbeddings from NeuML +author: John Snow Labs +name: sent_bert_small_cord19 +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_small_cord19` is a English model originally trained by NeuML. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_small_cord19_en_5.5.0_3.0_1727046962818.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_small_cord19_en_5.5.0_3.0_1727046962818.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_small_cord19","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_small_cord19","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_small_cord19| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|130.6 MB| + +## References + +https://huggingface.co/NeuML/bert-small-cord19 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_small_cord19_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_small_cord19_pipeline_en.md new file mode 100644 index 00000000000000..f4eb9e1b1b9d70 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_small_cord19_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_small_cord19_pipeline pipeline BertSentenceEmbeddings from NeuML +author: John Snow Labs +name: sent_bert_small_cord19_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_small_cord19_pipeline` is a English model originally trained by NeuML. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_small_cord19_pipeline_en_5.5.0_3.0_1727046968827.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_small_cord19_pipeline_en_5.5.0_3.0_1727046968827.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_small_cord19_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_small_cord19_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
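+
+The snippet above assumes an existing Spark session and a DataFrame `df` with a `text` column; a minimal end-to-end sketch under those assumptions:
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+# Start a Spark NLP session and build the input DataFrame the snippet expects.
+spark = sparknlp.start()
+df = spark.createDataFrame([["I love Spark NLP."]]).toDF("text")
+
+pipeline = PretrainedPipeline("sent_bert_small_cord19_pipeline", lang="en")
+annotations = pipeline.transform(df)
+annotations.printSchema()  # inspect which annotation columns the pipeline adds
+```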
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_small_cord19_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|131.1 MB| + +## References + +https://huggingface.co/NeuML/bert-small-cord19 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_tagalog_base_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_tagalog_base_uncased_en.md new file mode 100644 index 00000000000000..2c2cc2ba1ce0ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_tagalog_base_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_tagalog_base_uncased BertSentenceEmbeddings from GKLMIP +author: John Snow Labs +name: sent_bert_tagalog_base_uncased +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_tagalog_base_uncased` is a English model originally trained by GKLMIP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_tagalog_base_uncased_en_5.5.0_3.0_1727001705277.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_tagalog_base_uncased_en_5.5.0_3.0_1727001705277.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_tagalog_base_uncased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_tagalog_base_uncased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_tagalog_base_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|469.7 MB| + +## References + +https://huggingface.co/GKLMIP/bert-tagalog-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_tagalog_base_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_tagalog_base_uncased_pipeline_en.md new file mode 100644 index 00000000000000..2051ed00c12a09 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_tagalog_base_uncased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_tagalog_base_uncased_pipeline pipeline BertSentenceEmbeddings from GKLMIP +author: John Snow Labs +name: sent_bert_tagalog_base_uncased_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_tagalog_base_uncased_pipeline` is a English model originally trained by GKLMIP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_tagalog_base_uncased_pipeline_en_5.5.0_3.0_1727001725672.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_tagalog_base_uncased_pipeline_en_5.5.0_3.0_1727001725672.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_tagalog_base_uncased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_tagalog_base_uncased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_tagalog_base_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|470.3 MB| + +## References + +https://huggingface.co/GKLMIP/bert-tagalog-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_two_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_two_en.md new file mode 100644 index 00000000000000..d5deaf6cfe92af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_two_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_two BertSentenceEmbeddings from emma7897 +author: John Snow Labs +name: sent_bert_two +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_two` is a English model originally trained by emma7897. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_two_en_5.5.0_3.0_1727044311067.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_two_en_5.5.0_3.0_1727044311067.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_two","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_two","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
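+
+The fitted `pipelineModel` from the Python snippet in this card can be persisted with the standard Spark ML writer and reloaded later without refitting. The output path below is only a placeholder.
+
+```python
+from pyspark.ml import PipelineModel
+
+# Save the fitted pipeline to a (hypothetical) local path and load it back.
+pipelineModel.write().overwrite().save("/tmp/sent_bert_two_pipeline_model")
+restored = PipelineModel.load("/tmp/sent_bert_two_pipeline_model")
+restored.transform(data).select("embeddings").show(truncate=False)
+```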
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_two| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/emma7897/bert_two \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_two_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_two_pipeline_en.md new file mode 100644 index 00000000000000..84d7ac1d97978b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_two_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_two_pipeline pipeline BertSentenceEmbeddings from emma7897 +author: John Snow Labs +name: sent_bert_two_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_two_pipeline` is a English model originally trained by emma7897. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_two_pipeline_en_5.5.0_3.0_1727044331597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_two_pipeline_en_5.5.0_3.0_1727044331597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_two_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_two_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_two_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.2 MB| + +## References + +https://huggingface.co/emma7897/bert_two + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bertbek_news_big_cased_pipeline_uz.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bertbek_news_big_cased_pipeline_uz.md new file mode 100644 index 00000000000000..916dd9201cd015 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bertbek_news_big_cased_pipeline_uz.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Uzbek sent_bertbek_news_big_cased_pipeline pipeline BertSentenceEmbeddings from elmurod1202 +author: John Snow Labs +name: sent_bertbek_news_big_cased_pipeline +date: 2024-09-22 +tags: [uz, open_source, pipeline, onnx] +task: Embeddings +language: uz +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertbek_news_big_cased_pipeline` is a Uzbek model originally trained by elmurod1202. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertbek_news_big_cased_pipeline_uz_5.5.0_3.0_1727047021905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertbek_news_big_cased_pipeline_uz_5.5.0_3.0_1727047021905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bertbek_news_big_cased_pipeline", lang = "uz") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bertbek_news_big_cased_pipeline", lang = "uz") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertbek_news_big_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|uz| +|Size:|406.0 MB| + +## References + +https://huggingface.co/elmurod1202/bertbek-news-big-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bertbek_news_big_cased_uz.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bertbek_news_big_cased_uz.md new file mode 100644 index 00000000000000..4f2b4916bf8bc7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bertbek_news_big_cased_uz.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Uzbek sent_bertbek_news_big_cased BertSentenceEmbeddings from elmurod1202 +author: John Snow Labs +name: sent_bertbek_news_big_cased +date: 2024-09-22 +tags: [uz, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: uz +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertbek_news_big_cased` is a Uzbek model originally trained by elmurod1202. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertbek_news_big_cased_uz_5.5.0_3.0_1727047001984.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertbek_news_big_cased_uz_5.5.0_3.0_1727047001984.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bertbek_news_big_cased","uz") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bertbek_news_big_cased","uz") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertbek_news_big_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|uz| +|Size:|405.5 MB| + +## References + +https://huggingface.co/elmurod1202/bertbek-news-big-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bertimbau_base_finetuned_lener_breton_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bertimbau_base_finetuned_lener_breton_pipeline_pt.md new file mode 100644 index 00000000000000..f0b7cb5046ded9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bertimbau_base_finetuned_lener_breton_pipeline_pt.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Portuguese sent_bertimbau_base_finetuned_lener_breton_pipeline pipeline BertSentenceEmbeddings from Luciano +author: John Snow Labs +name: sent_bertimbau_base_finetuned_lener_breton_pipeline +date: 2024-09-22 +tags: [pt, open_source, pipeline, onnx] +task: Embeddings +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertimbau_base_finetuned_lener_breton_pipeline` is a Portuguese model originally trained by Luciano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertimbau_base_finetuned_lener_breton_pipeline_pt_5.5.0_3.0_1727013445081.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertimbau_base_finetuned_lener_breton_pipeline_pt_5.5.0_3.0_1727013445081.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bertimbau_base_finetuned_lener_breton_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bertimbau_base_finetuned_lener_breton_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertimbau_base_finetuned_lener_breton_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|406.4 MB| + +## References + +https://huggingface.co/Luciano/bertimbau-base-finetuned-lener-br + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bertimbau_base_finetuned_lener_breton_pt.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bertimbau_base_finetuned_lener_breton_pt.md new file mode 100644 index 00000000000000..3ecd0b1f5e29cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bertimbau_base_finetuned_lener_breton_pt.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Portuguese sent_bertimbau_base_finetuned_lener_breton BertSentenceEmbeddings from Luciano +author: John Snow Labs +name: sent_bertimbau_base_finetuned_lener_breton +date: 2024-09-22 +tags: [pt, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertimbau_base_finetuned_lener_breton` is a Portuguese model originally trained by Luciano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertimbau_base_finetuned_lener_breton_pt_5.5.0_3.0_1727013425342.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertimbau_base_finetuned_lener_breton_pt_5.5.0_3.0_1727013425342.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bertimbau_base_finetuned_lener_breton","pt") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bertimbau_base_finetuned_lener_breton","pt") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
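+
+The template's sample sentence is English even though this checkpoint is Portuguese and was fine-tuned on legal text (LeNER-Br). A sketch with an illustrative Portuguese sentence (not taken from the original model card), reusing the fitted `pipelineModel` from the Python snippet in this card:
+
+```python
+# Run the fitted pipeline on Portuguese input instead of the English placeholder.
+data_pt = spark.createDataFrame([["Eu adoro usar o Spark NLP para processar textos jurídicos."]]).toDF("text")
+result_pt = pipelineModel.transform(data_pt)
+result_pt.selectExpr("explode(embeddings) AS sentence") \
+    .selectExpr("sentence.result", "size(sentence.embeddings) AS dimensions") \
+    .show(truncate=False)
+```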
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertimbau_base_finetuned_lener_breton| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|pt| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Luciano/bertimbau-base-finetuned-lener-br \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bertimbau_large_fine_tuned_md_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bertimbau_large_fine_tuned_md_en.md new file mode 100644 index 00000000000000..55015ca0bb26bf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bertimbau_large_fine_tuned_md_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bertimbau_large_fine_tuned_md BertSentenceEmbeddings from AVSilva +author: John Snow Labs +name: sent_bertimbau_large_fine_tuned_md +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertimbau_large_fine_tuned_md` is a English model originally trained by AVSilva. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertimbau_large_fine_tuned_md_en_5.5.0_3.0_1726964815373.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertimbau_large_fine_tuned_md_en_5.5.0_3.0_1726964815373.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bertimbau_large_fine_tuned_md","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bertimbau_large_fine_tuned_md","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
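+
+This checkpoint is roughly 1.2 GB, so the default driver memory may be too small when running locally. A sketch assuming the `memory` argument of `sparknlp.start` is available in your Spark NLP release; treat the exact value as machine-dependent.
+
+```python
+import sparknlp
+
+# Allocate more driver memory before building the pipeline above (adjust as needed).
+spark = sparknlp.start(memory="8G")
+```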
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertimbau_large_fine_tuned_md| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/AVSilva/bertimbau-large-fine-tuned-md \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bertimbau_large_fine_tuned_md_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bertimbau_large_fine_tuned_md_pipeline_en.md new file mode 100644 index 00000000000000..a7f9cea9b607c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bertimbau_large_fine_tuned_md_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bertimbau_large_fine_tuned_md_pipeline pipeline BertSentenceEmbeddings from AVSilva +author: John Snow Labs +name: sent_bertimbau_large_fine_tuned_md_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertimbau_large_fine_tuned_md_pipeline` is a English model originally trained by AVSilva. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertimbau_large_fine_tuned_md_pipeline_en_5.5.0_3.0_1726964870212.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertimbau_large_fine_tuned_md_pipeline_en_5.5.0_3.0_1726964870212.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bertimbau_large_fine_tuned_md_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bertimbau_large_fine_tuned_md_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertimbau_large_fine_tuned_md_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/AVSilva/bertimbau-large-fine-tuned-md + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_beto_clinical_wl_spanish_es.md b/docs/_posts/ahmedlone127/2024-09-22-sent_beto_clinical_wl_spanish_es.md new file mode 100644 index 00000000000000..1d54989658055b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_beto_clinical_wl_spanish_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish sent_beto_clinical_wl_spanish BertSentenceEmbeddings from plncmm +author: John Snow Labs +name: sent_beto_clinical_wl_spanish +date: 2024-09-22 +tags: [es, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_beto_clinical_wl_spanish` is a Castilian, Spanish model originally trained by plncmm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_beto_clinical_wl_spanish_es_5.5.0_3.0_1727044736651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_beto_clinical_wl_spanish_es_5.5.0_3.0_1727044736651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_beto_clinical_wl_spanish","es") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_beto_clinical_wl_spanish","es") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
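+
+Sentence embeddings from a clinical model are typically consumed downstream, for example in similarity search. A sketch comparing two illustrative Spanish clinical-style sentences (not from the original card); it assumes NumPy is available and reuses the fitted `pipelineModel` from the Python snippet in this card.
+
+```python
+import numpy as np
+
+data_es = spark.createDataFrame([
+    ["El paciente presenta fiebre alta y tos persistente."],
+    ["La paciente refiere cefalea intensa desde hace dos días."]
+]).toDF("text")
+
+rows = pipelineModel.transform(data_es).selectExpr("explode(embeddings) AS s").collect()
+vectors = [np.array(r["s"]["embeddings"]) for r in rows]
+
+# Cosine similarity between the two sentence vectors.
+cosine = float(vectors[0] @ vectors[1] / (np.linalg.norm(vectors[0]) * np.linalg.norm(vectors[1])))
+print(f"cosine similarity: {cosine:.3f}")
+```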
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_beto_clinical_wl_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|es| +|Size:|409.7 MB| + +## References + +https://huggingface.co/plncmm/beto-clinical-wl-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_beto_clinical_wl_spanish_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-22-sent_beto_clinical_wl_spanish_pipeline_es.md new file mode 100644 index 00000000000000..406f0768ac0384 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_beto_clinical_wl_spanish_pipeline_es.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Castilian, Spanish sent_beto_clinical_wl_spanish_pipeline pipeline BertSentenceEmbeddings from plncmm +author: John Snow Labs +name: sent_beto_clinical_wl_spanish_pipeline +date: 2024-09-22 +tags: [es, open_source, pipeline, onnx] +task: Embeddings +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_beto_clinical_wl_spanish_pipeline` is a Castilian, Spanish model originally trained by plncmm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_beto_clinical_wl_spanish_pipeline_es_5.5.0_3.0_1727044757989.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_beto_clinical_wl_spanish_pipeline_es_5.5.0_3.0_1727044757989.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_beto_clinical_wl_spanish_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_beto_clinical_wl_spanish_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_beto_clinical_wl_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|410.2 MB| + +## References + +https://huggingface.co/plncmm/beto-clinical-wl-es + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bibert_v0_1_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bibert_v0_1_en.md new file mode 100644 index 00000000000000..3a43598efd9fcb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bibert_v0_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bibert_v0_1 BertSentenceEmbeddings from yugen-ok +author: John Snow Labs +name: sent_bibert_v0_1 +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bibert_v0_1` is a English model originally trained by yugen-ok. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bibert_v0_1_en_5.5.0_3.0_1727044456673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bibert_v0_1_en_5.5.0_3.0_1727044456673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bibert_v0_1","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bibert_v0_1","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bibert_v0_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/yugen-ok/bibert-v0.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bibert_v0_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bibert_v0_1_pipeline_en.md new file mode 100644 index 00000000000000..5b32125b66562f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bibert_v0_1_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bibert_v0_1_pipeline pipeline BertSentenceEmbeddings from yugen-ok +author: John Snow Labs +name: sent_bibert_v0_1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bibert_v0_1_pipeline` is a English model originally trained by yugen-ok. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bibert_v0_1_pipeline_en_5.5.0_3.0_1727044478271.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bibert_v0_1_pipeline_en_5.5.0_3.0_1727044478271.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bibert_v0_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bibert_v0_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bibert_v0_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/yugen-ok/bibert-v0.1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bioptimus_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bioptimus_en.md new file mode 100644 index 00000000000000..96e4d099f274e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bioptimus_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bioptimus BertSentenceEmbeddings from rttl-ai +author: John Snow Labs +name: sent_bioptimus +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bioptimus` is a English model originally trained by rttl-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bioptimus_en_5.5.0_3.0_1727047066143.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bioptimus_en_5.5.0_3.0_1727047066143.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bioptimus","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bioptimus","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bioptimus| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|408.1 MB| + +## References + +https://huggingface.co/rttl-ai/BIOptimus \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bioptimus_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bioptimus_pipeline_en.md new file mode 100644 index 00000000000000..d8ce22ac4bcccc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bioptimus_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bioptimus_pipeline pipeline BertSentenceEmbeddings from rttl-ai +author: John Snow Labs +name: sent_bioptimus_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bioptimus_pipeline` is a English model originally trained by rttl-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bioptimus_pipeline_en_5.5.0_3.0_1727047088568.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bioptimus_pipeline_en_5.5.0_3.0_1727047088568.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bioptimus_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bioptimus_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bioptimus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.7 MB| + +## References + +https://huggingface.co/rttl-ai/BIOptimus + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_deberta_base_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_deberta_base_uncased_en.md new file mode 100644 index 00000000000000..297fd7048f8b92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_deberta_base_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_deberta_base_uncased BertSentenceEmbeddings from mlcorelib +author: John Snow Labs +name: sent_deberta_base_uncased +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_deberta_base_uncased` is a English model originally trained by mlcorelib. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_deberta_base_uncased_en_5.5.0_3.0_1727004091994.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_deberta_base_uncased_en_5.5.0_3.0_1727004091994.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_deberta_base_uncased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_deberta_base_uncased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_deberta_base_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/mlcorelib/deberta-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_deberta_base_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_deberta_base_uncased_pipeline_en.md new file mode 100644 index 00000000000000..367e3c83387df3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_deberta_base_uncased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_deberta_base_uncased_pipeline pipeline BertSentenceEmbeddings from mlcorelib +author: John Snow Labs +name: sent_deberta_base_uncased_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_deberta_base_uncased_pipeline` is a English model originally trained by mlcorelib. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_deberta_base_uncased_pipeline_en_5.5.0_3.0_1727004112981.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_deberta_base_uncased_pipeline_en_5.5.0_3.0_1727004112981.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_deberta_base_uncased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_deberta_base_uncased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_deberta_base_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/mlcorelib/deberta-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_hindi_bpe_bert_test_2m_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_hindi_bpe_bert_test_2m_en.md new file mode 100644 index 00000000000000..9c2440fe844865 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_hindi_bpe_bert_test_2m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_hindi_bpe_bert_test_2m BertSentenceEmbeddings from rg1683 +author: John Snow Labs +name: sent_hindi_bpe_bert_test_2m +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hindi_bpe_bert_test_2m` is a English model originally trained by rg1683. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hindi_bpe_bert_test_2m_en_5.5.0_3.0_1727004329991.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hindi_bpe_bert_test_2m_en_5.5.0_3.0_1727004329991.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_hindi_bpe_bert_test_2m","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_hindi_bpe_bert_test_2m","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hindi_bpe_bert_test_2m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|377.8 MB| + +## References + +https://huggingface.co/rg1683/hindi_bpe_bert_test_2m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_hindi_bpe_bert_test_2m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_hindi_bpe_bert_test_2m_pipeline_en.md new file mode 100644 index 00000000000000..2a7677f5e7bda0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_hindi_bpe_bert_test_2m_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_hindi_bpe_bert_test_2m_pipeline pipeline BertSentenceEmbeddings from rg1683 +author: John Snow Labs +name: sent_hindi_bpe_bert_test_2m_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hindi_bpe_bert_test_2m_pipeline` is a English model originally trained by rg1683. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hindi_bpe_bert_test_2m_pipeline_en_5.5.0_3.0_1727004347253.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hindi_bpe_bert_test_2m_pipeline_en_5.5.0_3.0_1727004347253.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_hindi_bpe_bert_test_2m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_hindi_bpe_bert_test_2m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
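+
+Besides `transform` on a DataFrame, a `PretrainedPipeline` can annotate raw strings directly, which is handy for a quick sanity check. A minimal sketch:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("sent_hindi_bpe_bert_test_2m_pipeline", lang="en")
+
+# fullAnnotate returns one dict of annotation lists per input string.
+light_result = pipeline.fullAnnotate("I love Spark NLP.")
+print(light_result[0].keys())
+```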
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hindi_bpe_bert_test_2m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|378.3 MB| + +## References + +https://huggingface.co/rg1683/hindi_bpe_bert_test_2m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_incel_alberto_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_incel_alberto_en.md new file mode 100644 index 00000000000000..f12ff35693d2be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_incel_alberto_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_incel_alberto BertSentenceEmbeddings from pgajo +author: John Snow Labs +name: sent_incel_alberto +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_incel_alberto` is a English model originally trained by pgajo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_incel_alberto_en_5.5.0_3.0_1727044458578.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_incel_alberto_en_5.5.0_3.0_1727044458578.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_incel_alberto","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_incel_alberto","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
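The example above stops at `pipelineDF`. The minimal sketch below shows one way to pull the raw sentence vectors out of that DataFrame, assuming the standard Spark NLP annotation schema (an `embeddings` column of annotations, each carrying an `embeddings` float array).

```python
from pyspark.sql.functions import explode

# One row per detected sentence: the sentence text and its embedding vector.
pipelineDF.select(explode("embeddings").alias("ann")) \
    .selectExpr("ann.result AS sentence", "ann.embeddings AS vector") \
    .show(truncate=80)
```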
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_incel_alberto| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|688.7 MB| + +## References + +https://huggingface.co/pgajo/incel-alberto \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_incel_alberto_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_incel_alberto_pipeline_en.md new file mode 100644 index 00000000000000..01fd9c274bf04f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_incel_alberto_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_incel_alberto_pipeline pipeline BertSentenceEmbeddings from pgajo +author: John Snow Labs +name: sent_incel_alberto_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_incel_alberto_pipeline` is a English model originally trained by pgajo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_incel_alberto_pipeline_en_5.5.0_3.0_1727044495226.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_incel_alberto_pipeline_en_5.5.0_3.0_1727044495226.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("sent_incel_alberto_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: any DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("sent_incel_alberto_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_incel_alberto_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|689.3 MB| + +## References + +https://huggingface.co/pgajo/incel-alberto + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_indobert_base_p2_finetuned_mer_80k_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_indobert_base_p2_finetuned_mer_80k_en.md new file mode 100644 index 00000000000000..4850bb1194e7da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_indobert_base_p2_finetuned_mer_80k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_indobert_base_p2_finetuned_mer_80k BertSentenceEmbeddings from stevenwh +author: John Snow Labs +name: sent_indobert_base_p2_finetuned_mer_80k +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_indobert_base_p2_finetuned_mer_80k` is a English model originally trained by stevenwh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_indobert_base_p2_finetuned_mer_80k_en_5.5.0_3.0_1727001702109.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_indobert_base_p2_finetuned_mer_80k_en_5.5.0_3.0_1727001702109.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_indobert_base_p2_finetuned_mer_80k","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_indobert_base_p2_finetuned_mer_80k","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
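As a quick sanity check on the output of the example above, the minimal sketch below prints the dimensionality of the produced sentence embeddings, assuming the standard Spark NLP annotation schema.

```python
from pyspark.sql.functions import explode, size

pipelineDF.select(explode("embeddings").alias("ann")) \
    .select(size("ann.embeddings").alias("embedding_dimension")) \
    .show(1)
```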
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_indobert_base_p2_finetuned_mer_80k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|464.2 MB| + +## References + +https://huggingface.co/stevenwh/indobert-base-p2-finetuned-mer-80k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_indobert_base_p2_finetuned_mer_80k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_indobert_base_p2_finetuned_mer_80k_pipeline_en.md new file mode 100644 index 00000000000000..168a4a0b39ea1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_indobert_base_p2_finetuned_mer_80k_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_indobert_base_p2_finetuned_mer_80k_pipeline pipeline BertSentenceEmbeddings from stevenwh +author: John Snow Labs +name: sent_indobert_base_p2_finetuned_mer_80k_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_indobert_base_p2_finetuned_mer_80k_pipeline` is a English model originally trained by stevenwh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_indobert_base_p2_finetuned_mer_80k_pipeline_en_5.5.0_3.0_1727001722140.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_indobert_base_p2_finetuned_mer_80k_pipeline_en_5.5.0_3.0_1727001722140.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("sent_indobert_base_p2_finetuned_mer_80k_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: any DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("sent_indobert_base_p2_finetuned_mer_80k_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_indobert_base_p2_finetuned_mer_80k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.8 MB| + +## References + +https://huggingface.co/stevenwh/indobert-base-p2-finetuned-mer-80k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_knowbias_bert_base_uncased_race_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_knowbias_bert_base_uncased_race_en.md new file mode 100644 index 00000000000000..47f0b2e36e6f51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_knowbias_bert_base_uncased_race_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_knowbias_bert_base_uncased_race BertSentenceEmbeddings from squiduu +author: John Snow Labs +name: sent_knowbias_bert_base_uncased_race +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_knowbias_bert_base_uncased_race` is a English model originally trained by squiduu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_knowbias_bert_base_uncased_race_en_5.5.0_3.0_1727013798024.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_knowbias_bert_base_uncased_race_en_5.5.0_3.0_1727013798024.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_knowbias_bert_base_uncased_race","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_knowbias_bert_base_uncased_race","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_knowbias_bert_base_uncased_race| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/squiduu/knowbias-bert-base-uncased-race \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_knowbias_bert_base_uncased_race_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_knowbias_bert_base_uncased_race_pipeline_en.md new file mode 100644 index 00000000000000..04b6ed37cfcd8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_knowbias_bert_base_uncased_race_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_knowbias_bert_base_uncased_race_pipeline pipeline BertSentenceEmbeddings from squiduu +author: John Snow Labs +name: sent_knowbias_bert_base_uncased_race_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_knowbias_bert_base_uncased_race_pipeline` is a English model originally trained by squiduu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_knowbias_bert_base_uncased_race_pipeline_en_5.5.0_3.0_1727013817401.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_knowbias_bert_base_uncased_race_pipeline_en_5.5.0_3.0_1727013817401.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("sent_knowbias_bert_base_uncased_race_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: any DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("sent_knowbias_bert_base_uncased_race_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_knowbias_bert_base_uncased_race_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/squiduu/knowbias-bert-base-uncased-race + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_ksl_bert_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_ksl_bert_en.md new file mode 100644 index 00000000000000..9e5b0673a3ef86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_ksl_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_ksl_bert BertSentenceEmbeddings from dobbytk +author: John Snow Labs +name: sent_ksl_bert +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_ksl_bert` is a English model originally trained by dobbytk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_ksl_bert_en_5.5.0_3.0_1727047077059.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_ksl_bert_en_5.5.0_3.0_1727047077059.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_ksl_bert","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_ksl_bert","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
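The minimal sketch below shows one way to bring a single sentence vector back to the driver as a plain Python list, assuming the `pipelineDF` produced above and the standard Spark NLP annotation schema.

```python
# Take the embedding of the first sentence of the first row.
first = pipelineDF.selectExpr("embeddings.embeddings[0] AS vector").first()
print(len(first["vector"]), first["vector"][:5])
```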
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_ksl_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|406.2 MB| + +## References + +https://huggingface.co/dobbytk/KSL-BERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_ksl_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_ksl_bert_pipeline_en.md new file mode 100644 index 00000000000000..599dec27802abe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_ksl_bert_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_ksl_bert_pipeline pipeline BertSentenceEmbeddings from dobbytk +author: John Snow Labs +name: sent_ksl_bert_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_ksl_bert_pipeline` is a English model originally trained by dobbytk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_ksl_bert_pipeline_en_5.5.0_3.0_1727047097969.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_ksl_bert_pipeline_en_5.5.0_3.0_1727047097969.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("sent_ksl_bert_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: any DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("sent_ksl_bert_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_ksl_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.7 MB| + +## References + +https://huggingface.co/dobbytk/KSL-BERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_protaugment_lm_banking77_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_protaugment_lm_banking77_en.md new file mode 100644 index 00000000000000..7891ed9e095e35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_protaugment_lm_banking77_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_protaugment_lm_banking77 BertSentenceEmbeddings from tdopierre +author: John Snow Labs +name: sent_protaugment_lm_banking77 +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_protaugment_lm_banking77` is a English model originally trained by tdopierre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_protaugment_lm_banking77_en_5.5.0_3.0_1726964955944.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_protaugment_lm_banking77_en_5.5.0_3.0_1726964955944.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_protaugment_lm_banking77","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_protaugment_lm_banking77","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
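For low-latency, single-sentence use, the fitted pipeline above can typically be wrapped in a LightPipeline; the sketch below is illustrative (the example sentence is made up) and assumes the `pipelineModel` object created above.

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# Returns plain Python structures instead of a distributed DataFrame.
print(light.annotate("How do I activate my new card?"))
```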
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_protaugment_lm_banking77| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/tdopierre/ProtAugment-LM-BANKING77 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_protaugment_lm_banking77_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_protaugment_lm_banking77_pipeline_en.md new file mode 100644 index 00000000000000..0ab43f0a96045a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_protaugment_lm_banking77_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_protaugment_lm_banking77_pipeline pipeline BertSentenceEmbeddings from tdopierre +author: John Snow Labs +name: sent_protaugment_lm_banking77_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_protaugment_lm_banking77_pipeline` is a English model originally trained by tdopierre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_protaugment_lm_banking77_pipeline_en_5.5.0_3.0_1726964974170.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_protaugment_lm_banking77_pipeline_en_5.5.0_3.0_1726964974170.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("sent_protaugment_lm_banking77_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: any DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("sent_protaugment_lm_banking77_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_protaugment_lm_banking77_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.1 MB| + +## References + +https://huggingface.co/tdopierre/ProtAugment-LM-BANKING77 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_python_code_comment_classification_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_python_code_comment_classification_en.md new file mode 100644 index 00000000000000..2ef2d3f55b1dc3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_python_code_comment_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_python_code_comment_classification BertSentenceEmbeddings from ZarahShibli +author: John Snow Labs +name: sent_python_code_comment_classification +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_python_code_comment_classification` is a English model originally trained by ZarahShibli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_python_code_comment_classification_en_5.5.0_3.0_1727047423894.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_python_code_comment_classification_en_5.5.0_3.0_1727047423894.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_python_code_comment_classification","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_python_code_comment_classification","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_python_code_comment_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|406.7 MB| + +## References + +https://huggingface.co/ZarahShibli/python-code-comment-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_python_code_comment_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_python_code_comment_classification_pipeline_en.md new file mode 100644 index 00000000000000..fb08de1298413e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_python_code_comment_classification_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_python_code_comment_classification_pipeline pipeline BertSentenceEmbeddings from ZarahShibli +author: John Snow Labs +name: sent_python_code_comment_classification_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_python_code_comment_classification_pipeline` is a English model originally trained by ZarahShibli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_python_code_comment_classification_pipeline_en_5.5.0_3.0_1727047443516.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_python_code_comment_classification_pipeline_en_5.5.0_3.0_1727047443516.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("sent_python_code_comment_classification_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: any DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("sent_python_code_comment_classification_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_python_code_comment_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ZarahShibli/python-code-comment-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_splade_v3_lexical_nirantk_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_splade_v3_lexical_nirantk_en.md new file mode 100644 index 00000000000000..92892a90c0d4e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_splade_v3_lexical_nirantk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_splade_v3_lexical_nirantk BertSentenceEmbeddings from nirantk +author: John Snow Labs +name: sent_splade_v3_lexical_nirantk +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_splade_v3_lexical_nirantk` is a English model originally trained by nirantk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_splade_v3_lexical_nirantk_en_5.5.0_3.0_1727001477816.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_splade_v3_lexical_nirantk_en_5.5.0_3.0_1727001477816.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_splade_v3_lexical_nirantk","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_splade_v3_lexical_nirantk","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_splade_v3_lexical_nirantk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.4 MB| + +## References + +https://huggingface.co/nirantk/splade-v3-lexical \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_splade_v3_lexical_nirantk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_splade_v3_lexical_nirantk_pipeline_en.md new file mode 100644 index 00000000000000..8df612c5099bcd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_splade_v3_lexical_nirantk_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_splade_v3_lexical_nirantk_pipeline pipeline BertSentenceEmbeddings from nirantk +author: John Snow Labs +name: sent_splade_v3_lexical_nirantk_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_splade_v3_lexical_nirantk_pipeline` is a English model originally trained by nirantk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_splade_v3_lexical_nirantk_pipeline_en_5.5.0_3.0_1727001497745.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_splade_v3_lexical_nirantk_pipeline_en_5.5.0_3.0_1727001497745.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("sent_splade_v3_lexical_nirantk_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: any DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("sent_splade_v3_lexical_nirantk_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_splade_v3_lexical_nirantk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.9 MB| + +## References + +https://huggingface.co/nirantk/splade-v3-lexical + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentence_classification_bitnet_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentence_classification_bitnet_en.md new file mode 100644 index 00000000000000..abbe265caace01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentence_classification_bitnet_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentence_classification_bitnet DistilBertForSequenceClassification from sanjeev-bhandari01 +author: John Snow Labs +name: sentence_classification_bitnet +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentence_classification_bitnet` is a English model originally trained by sanjeev-bhandari01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentence_classification_bitnet_en_5.5.0_3.0_1726980414810.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentence_classification_bitnet_en_5.5.0_3.0_1726980414810.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentence_classification_bitnet","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentence_classification_bitnet", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
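To read the actual prediction out of the example above, the minimal sketch below extracts the label from the `class` column, assuming the standard Spark NLP annotation schema (`class.result` holds the predicted labels).

```python
# One predicted label per input row (index 0 of the result array).
pipelineDF.selectExpr("text", "class.result[0] AS predicted_label") \
    .show(truncate=False)
```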
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentence_classification_bitnet| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|248.5 MB| + +## References + +https://huggingface.co/sanjeev-bhandari01/sentence_classification_bitnet \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentence_classification_bitnet_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentence_classification_bitnet_pipeline_en.md new file mode 100644 index 00000000000000..1623d68d2fc3cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentence_classification_bitnet_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentence_classification_bitnet_pipeline pipeline DistilBertForSequenceClassification from sanjeev-bhandari01 +author: John Snow Labs +name: sentence_classification_bitnet_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentence_classification_bitnet_pipeline` is a English model originally trained by sanjeev-bhandari01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentence_classification_bitnet_pipeline_en_5.5.0_3.0_1726980426598.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentence_classification_bitnet_pipeline_en_5.5.0_3.0_1726980426598.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("sentence_classification_bitnet_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: any DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("sentence_classification_bitnet_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
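For quick, single-string experiments, a pretrained pipeline can usually be called without building a DataFrame first; the minimal sketch below is illustrative and assumes the `pipeline` object created above.

```python
# annotate() runs the pipeline on a plain string and returns a dict of annotations.
light_result = pipeline.annotate("I love spark-nlp")
print(light_result)
```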
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentence_classification_bitnet_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|248.5 MB| + +## References + +https://huggingface.co/sanjeev-bhandari01/sentence_classification_bitnet + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment1_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment1_en.md new file mode 100644 index 00000000000000..829dd2a33a7b9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment1 DistilBertForSequenceClassification from ben-ongys +author: John Snow Labs +name: sentiment1 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment1` is a English model originally trained by ben-ongys. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment1_en_5.5.0_3.0_1727035375346.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment1_en_5.5.0_3.0_1727035375346.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment1","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment1", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ben-ongys/sentiment1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment1_pipeline_en.md new file mode 100644 index 00000000000000..dae71b88ddfa03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment1_pipeline pipeline DistilBertForSequenceClassification from ben-ongys +author: John Snow Labs +name: sentiment1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment1_pipeline` is a English model originally trained by ben-ongys. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment1_pipeline_en_5.5.0_3.0_1727035389325.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment1_pipeline_en_5.5.0_3.0_1727035389325.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("sentiment1_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: any DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("sentiment1_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ben-ongys/sentiment1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_distilbert_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_distilbert_en.md new file mode 100644 index 00000000000000..2b047a1e7c789b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_distilbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_distilbert DistilBertForSequenceClassification from arnavmahapatra +author: John Snow Labs +name: sentiment_analysis_distilbert +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_distilbert` is a English model originally trained by arnavmahapatra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_distilbert_en_5.5.0_3.0_1726980227790.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_distilbert_en_5.5.0_3.0_1726980227790.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_distilbert","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_distilbert", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
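Because the fitted `pipelineModel` above is a regular Spark ML model, it can be reused on new texts; the sketch below is illustrative (the review strings are made up) and assumes an active SparkSession named `spark`.

```python
new_reviews = spark.createDataFrame(
    [["The plot was predictable and dull."], ["Absolutely loved every minute of it!"]]
).toDF("text")

pipelineModel.transform(new_reviews) \
    .selectExpr("text", "class.result[0] AS sentiment") \
    .show(truncate=False)
```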
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_distilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/arnavmahapatra/sentiment-analysis-distilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..e32d4bd6f9f625 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_distilbert_pipeline pipeline DistilBertForSequenceClassification from arnavmahapatra +author: John Snow Labs +name: sentiment_analysis_distilbert_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_distilbert_pipeline` is a English model originally trained by arnavmahapatra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_distilbert_pipeline_en_5.5.0_3.0_1726980239220.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_distilbert_pipeline_en_5.5.0_3.0_1726980239220.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: any DataFrame with a "text" column to annotate
pipeline = PretrainedPipeline("sentiment_analysis_distilbert_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: any DataFrame with a "text" column to annotate
val pipeline = new PretrainedPipeline("sentiment_analysis_distilbert_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/arnavmahapatra/sentiment-analysis-distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_llm_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_llm_en.md new file mode 100644 index 00000000000000..556f9c80bf3cb1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_llm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_llm RoBertaForSequenceClassification from Agra2002 +author: John Snow Labs +name: sentiment_analysis_llm +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_llm` is a English model originally trained by Agra2002. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_llm_en_5.5.0_3.0_1726972503397.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_llm_en_5.5.0_3.0_1726972503397.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_analysis_llm","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_analysis_llm", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
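When the example above is run over a larger DataFrame, a simple way to summarize the output is to count predictions per label; the minimal sketch below assumes the standard Spark NLP annotation schema for the `class` column.

```python
pipelineDF.selectExpr("explode(class.result) AS label") \
    .groupBy("label") \
    .count() \
    .show()
```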
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_llm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|467.7 MB| + +## References + +https://huggingface.co/Agra2002/sentiment_analysis_LLM \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_llm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_llm_pipeline_en.md new file mode 100644 index 00000000000000..beec11fa196a8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_llm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_llm_pipeline pipeline RoBertaForSequenceClassification from Agra2002 +author: John Snow Labs +name: sentiment_analysis_llm_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_llm_pipeline` is a English model originally trained by Agra2002. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_llm_pipeline_en_5.5.0_3.0_1726972524131.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_llm_pipeline_en_5.5.0_3.0_1726972524131.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("sentiment_analysis_llm_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("sentiment_analysis_llm_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
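For quick experiments, the same pretrained pipeline object can also annotate plain strings without building a DataFrame first; a minimal sketch reusing the `pipeline` created above:

```python
# annotate() returns a Python dict keyed by the pipeline's output columns
result = pipeline.annotate("I love spark-nlp")
print(result)
```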
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_llm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|467.7 MB| + +## References + +https://huggingface.co/Agra2002/sentiment_analysis_LLM + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_preetham04_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_preetham04_en.md new file mode 100644 index 00000000000000..f5f7266b11bc5c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_preetham04_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_preetham04 BertForSequenceClassification from Preetham04 +author: John Snow Labs +name: sentiment_analysis_preetham04 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_preetham04` is a English model originally trained by Preetham04. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_preetham04_en_5.5.0_3.0_1727034100449.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_preetham04_en_5.5.0_3.0_1727034100449.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("sentiment_analysis_preetham04","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("sentiment_analysis_preetham04", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
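When only a handful of texts need to be scored, wrapping the fitted model in a LightPipeline avoids the overhead of a full Spark job; a sketch assuming the `pipelineModel` produced in the example above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# Returns plain Python dicts rather than a DataFrame
print(light.annotate("I love spark-nlp"))
```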
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_preetham04| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Preetham04/sentiment-analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_preetham04_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_preetham04_pipeline_en.md new file mode 100644 index 00000000000000..d01519373dc3ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_preetham04_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_preetham04_pipeline pipeline BertForSequenceClassification from Preetham04 +author: John Snow Labs +name: sentiment_analysis_preetham04_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_preetham04_pipeline` is a English model originally trained by Preetham04. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_preetham04_pipeline_en_5.5.0_3.0_1727034120954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_preetham04_pipeline_en_5.5.0_3.0_1727034120954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("sentiment_analysis_preetham04_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("sentiment_analysis_preetham04_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_preetham04_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Preetham04/sentiment-analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_shadez25_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_shadez25_en.md new file mode 100644 index 00000000000000..94d1cfe7216b45 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_shadez25_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_shadez25 DistilBertForSequenceClassification from Shadez25 +author: John Snow Labs +name: sentiment_analysis_shadez25 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_shadez25` is a English model originally trained by Shadez25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_shadez25_en_5.5.0_3.0_1727033492393.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_shadez25_en_5.5.0_3.0_1727033492393.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_shadez25","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_shadez25", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
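Per-label confidence scores are stored in the annotation metadata rather than in `result`; a minimal sketch over the `pipelineDF` built above:

```python
from pyspark.sql.functions import explode

# One row per prediction, showing the winning label and its score metadata
pipelineDF.select(explode("class").alias("prediction")) \
    .select("prediction.result", "prediction.metadata") \
    .show(truncate=False)
```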
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_shadez25| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Shadez25/sentiment-analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_shadez25_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_shadez25_pipeline_en.md new file mode 100644 index 00000000000000..d274ee64ced64e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_shadez25_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_shadez25_pipeline pipeline DistilBertForSequenceClassification from Shadez25 +author: John Snow Labs +name: sentiment_analysis_shadez25_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_shadez25_pipeline` is a English model originally trained by Shadez25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_shadez25_pipeline_en_5.5.0_3.0_1727033506405.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_shadez25_pipeline_en_5.5.0_3.0_1727033506405.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("sentiment_analysis_shadez25_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("sentiment_analysis_shadez25_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_shadez25_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Shadez25/sentiment-analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_en.md new file mode 100644 index 00000000000000..030a582ef40375 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020 RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_en_5.5.0_3.0_1727026421434.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_en_5.5.0_3.0_1727026421434.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random0_seed2-twitter-roberta-base-dec2020 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_pipeline_en.md new file mode 100644 index 00000000000000..ab0cb502546e1f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_pipeline_en_5.5.0_3.0_1727026444053.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_pipeline_en_5.5.0_3.0_1727026444053.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random0_seed2-twitter-roberta-base-dec2020 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_en.md new file mode 100644 index 00000000000000..d8f2cf363a8735 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_en_5.5.0_3.0_1727037162690.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_en_5.5.0_3.0_1727037162690.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random1_seed1-twitter-roberta-base-2019-90m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_pipeline_en.md new file mode 100644 index 00000000000000..c5f1b3b9a39b29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_pipeline_en_5.5.0_3.0_1727037186523.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_pipeline_en_5.5.0_3.0_1727037186523.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random1_seed1-twitter-roberta-base-2019-90m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_bertweet_large_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_bertweet_large_en.md new file mode 100644 index 00000000000000..6a6afeec5861ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_bertweet_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_sentiment_small_random3_seed0_bertweet_large RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random3_seed0_bertweet_large +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random3_seed0_bertweet_large` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_bertweet_large_en_5.5.0_3.0_1727037682282.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_bertweet_large_en_5.5.0_3.0_1727037682282.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random3_seed0_bertweet_large","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random3_seed0_bertweet_large", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random3_seed0_bertweet_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random3_seed0-bertweet-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_bertweet_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_bertweet_large_pipeline_en.md new file mode 100644 index 00000000000000..ed537da0cff14c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_bertweet_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_sentiment_small_random3_seed0_bertweet_large_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random3_seed0_bertweet_large_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random3_seed0_bertweet_large_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_bertweet_large_pipeline_en_5.5.0_3.0_1727037780253.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_bertweet_large_pipeline_en_5.5.0_3.0_1727037780253.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("sentiment_sentiment_small_random3_seed0_bertweet_large_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("sentiment_sentiment_small_random3_seed0_bertweet_large_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random3_seed0_bertweet_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random3_seed0-bertweet-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_roberta_base_en.md new file mode 100644 index 00000000000000..8a028ebaf92777 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_roberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_sentiment_small_random3_seed0_roberta_base RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random3_seed0_roberta_base +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random3_seed0_roberta_base` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_roberta_base_en_5.5.0_3.0_1727026583169.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_roberta_base_en_5.5.0_3.0_1727026583169.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random3_seed0_roberta_base","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random3_seed0_roberta_base", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
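Over a larger corpus it is often useful to see how the predicted labels are distributed; a sketch that aggregates the `class` column produced by the pipeline above:

```python
from pyspark.sql.functions import explode

pipelineDF.select(explode("class.result").alias("label")) \
    .groupBy("label") \
    .count() \
    .show()
```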
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random3_seed0_roberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|430.5 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random3_seed0-roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..5c5e5ad3618c74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_sentiment_small_random3_seed0_roberta_base_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random3_seed0_roberta_base_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random3_seed0_roberta_base_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_roberta_base_pipeline_en_5.5.0_3.0_1727026620010.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_roberta_base_pipeline_en_5.5.0_3.0_1727026620010.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("sentiment_sentiment_small_random3_seed0_roberta_base_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("sentiment_sentiment_small_random3_seed0_roberta_base_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random3_seed0_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|430.5 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random3_seed0-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed1_bertweet_large_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed1_bertweet_large_en.md new file mode 100644 index 00000000000000..d771b5b5aecffe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed1_bertweet_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_sentiment_small_random3_seed1_bertweet_large RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random3_seed1_bertweet_large +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random3_seed1_bertweet_large` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed1_bertweet_large_en_5.5.0_3.0_1727037875534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed1_bertweet_large_en_5.5.0_3.0_1727037875534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random3_seed1_bertweet_large","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random3_seed1_bertweet_large", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random3_seed1_bertweet_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random3_seed1-bertweet-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed1_bertweet_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed1_bertweet_large_pipeline_en.md new file mode 100644 index 00000000000000..70a364b82d7fc2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed1_bertweet_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_sentiment_small_random3_seed1_bertweet_large_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random3_seed1_bertweet_large_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random3_seed1_bertweet_large_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed1_bertweet_large_pipeline_en_5.5.0_3.0_1727037968650.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed1_bertweet_large_pipeline_en_5.5.0_3.0_1727037968650.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("sentiment_sentiment_small_random3_seed1_bertweet_large_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("sentiment_sentiment_small_random3_seed1_bertweet_large_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random3_seed1_bertweet_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random3_seed1-bertweet-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed2_twitter_roberta_base_2019_90m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed2_twitter_roberta_base_2019_90m_pipeline_en.md new file mode 100644 index 00000000000000..e0c95dce7c8a8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed2_twitter_roberta_base_2019_90m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_sentiment_small_random3_seed2_twitter_roberta_base_2019_90m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random3_seed2_twitter_roberta_base_2019_90m_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random3_seed2_twitter_roberta_base_2019_90m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed2_twitter_roberta_base_2019_90m_pipeline_en_5.5.0_3.0_1726972066460.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed2_twitter_roberta_base_2019_90m_pipeline_en_5.5.0_3.0_1726972066460.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("sentiment_sentiment_small_random3_seed2_twitter_roberta_base_2019_90m_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("sentiment_sentiment_small_random3_seed2_twitter_roberta_base_2019_90m_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random3_seed2_twitter_roberta_base_2019_90m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random3_seed2-twitter-roberta-base-2019-90m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_temporal_twitter_roberta_base_2022_154m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_temporal_twitter_roberta_base_2022_154m_pipeline_en.md new file mode 100644 index 00000000000000..c385fdf4a48bd5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_temporal_twitter_roberta_base_2022_154m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_sentiment_small_temporal_twitter_roberta_base_2022_154m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_temporal_twitter_roberta_base_2022_154m_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_temporal_twitter_roberta_base_2022_154m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_temporal_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1727017012982.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_temporal_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1727017012982.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("sentiment_sentiment_small_temporal_twitter_roberta_base_2022_154m_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("sentiment_sentiment_small_temporal_twitter_roberta_base_2022_154m_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
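After `transform`, the predictions land in the pipeline's output annotation column. A minimal sketch, assuming the `annotations` DataFrame from above and that this pretrained pipeline names its output column `class` (as its sibling model cards do):

```python
annotations.selectExpr("text", "class.result AS prediction").show(truncate=False)
```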
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_temporal_twitter_roberta_base_2022_154m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_temporal-twitter-roberta-base-2022-154m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sequence_classification_en.md b/docs/_posts/ahmedlone127/2024-09-22-sequence_classification_en.md new file mode 100644 index 00000000000000..db113779642aff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sequence_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sequence_classification BertForSequenceClassification from xysmalobia +author: John Snow Labs +name: sequence_classification +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sequence_classification` is a English model originally trained by xysmalobia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sequence_classification_en_5.5.0_3.0_1727030482744.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sequence_classification_en_5.5.0_3.0_1727030482744.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("sequence_classification","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("sequence_classification", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
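The same stages score any number of documents at once; a small sketch reusing the `pipeline` defined above with a multi-row DataFrame:

```python
texts = [["I love spark-nlp"], ["This library is hard to use"]]
batch = spark.createDataFrame(texts).toDF("text")
pipeline.fit(batch).transform(batch).select("text", "class.result").show(truncate=False)
```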
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sequence_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/xysmalobia/sequence_classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sequence_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sequence_classification_pipeline_en.md new file mode 100644 index 00000000000000..74d5ca00fc9a90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sequence_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sequence_classification_pipeline pipeline BertForSequenceClassification from xysmalobia +author: John Snow Labs +name: sequence_classification_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sequence_classification_pipeline` is a English model originally trained by xysmalobia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sequence_classification_pipeline_en_5.5.0_3.0_1727030503598.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sequence_classification_pipeline_en_5.5.0_3.0_1727030503598.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("sequence_classification_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("sequence_classification_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sequence_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/xysmalobia/sequence_classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-smsa_xlm_r_en.md b/docs/_posts/ahmedlone127/2024-09-22-smsa_xlm_r_en.md new file mode 100644 index 00000000000000..4caf575a0af09e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-smsa_xlm_r_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English smsa_xlm_r XlmRoBertaForSequenceClassification from Cincin-nvp +author: John Snow Labs +name: smsa_xlm_r +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`smsa_xlm_r` is a English model originally trained by Cincin-nvp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/smsa_xlm_r_en_5.5.0_3.0_1727009897441.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/smsa_xlm_r_en_5.5.0_3.0_1727009897441.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("smsa_xlm_r","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("smsa_xlm_r", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
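A fitted pipeline can be persisted and reloaded so the pretrained weights are not downloaded again on every run; a sketch assuming the `pipelineModel` above and an illustrative local path:

```python
from pyspark.ml import PipelineModel

# "/tmp/smsa_xlm_r_pipeline" is only an example path, not a published artifact
pipelineModel.write().overwrite().save("/tmp/smsa_xlm_r_pipeline")
reloaded = PipelineModel.load("/tmp/smsa_xlm_r_pipeline")
```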
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|smsa_xlm_r| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|847.9 MB| + +## References + +https://huggingface.co/Cincin-nvp/SmSA_XLM-R \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-smsa_xlm_r_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-smsa_xlm_r_pipeline_en.md new file mode 100644 index 00000000000000..ec775e973359b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-smsa_xlm_r_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English smsa_xlm_r_pipeline pipeline XlmRoBertaForSequenceClassification from Cincin-nvp +author: John Snow Labs +name: smsa_xlm_r_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`smsa_xlm_r_pipeline` is a English model originally trained by Cincin-nvp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/smsa_xlm_r_pipeline_en_5.5.0_3.0_1727009966108.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/smsa_xlm_r_pipeline_en_5.5.0_3.0_1727009966108.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("smsa_xlm_r_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("smsa_xlm_r_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
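If you want to confirm what the downloaded pipeline actually contains, the `PretrainedPipeline` wrapper exposes the fitted Spark `PipelineModel` through its `model` attribute; the stages should mirror the Included Models list further down. A small sketch:

```python
# pipeline is the PretrainedPipeline loaded above
for stage in pipeline.model.stages:
    print(type(stage).__name__)  # e.g. DocumentAssembler, TokenizerModel, XlmRoBertaForSequenceClassification
```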
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|smsa_xlm_r_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|847.9 MB| + +## References + +https://huggingface.co/Cincin-nvp/SmSA_XLM-R + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-smsphishing_en.md b/docs/_posts/ahmedlone127/2024-09-22-smsphishing_en.md new file mode 100644 index 00000000000000..0a096cccc5cee6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-smsphishing_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English smsphishing RoBertaForSequenceClassification from matanbn +author: John Snow Labs +name: smsphishing +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`smsphishing` is a English model originally trained by matanbn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/smsphishing_en_5.5.0_3.0_1727017042474.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/smsphishing_en_5.5.0_3.0_1727017042474.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("smsphishing","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("smsphishing", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|smsphishing| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|427.2 MB| + +## References + +https://huggingface.co/matanbn/smsPhishing \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-smsphishing_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-smsphishing_pipeline_en.md new file mode 100644 index 00000000000000..7e082211527fa9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-smsphishing_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English smsphishing_pipeline pipeline RoBertaForSequenceClassification from matanbn +author: John Snow Labs +name: smsphishing_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`smsphishing_pipeline` is a English model originally trained by matanbn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/smsphishing_pipeline_en_5.5.0_3.0_1727017078269.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/smsphishing_pipeline_en_5.5.0_3.0_1727017078269.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("smsphishing_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("smsphishing_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|smsphishing_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|427.3 MB| + +## References + +https://huggingface.co/matanbn/smsPhishing + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-social_orientation_en.md b/docs/_posts/ahmedlone127/2024-09-22-social_orientation_en.md new file mode 100644 index 00000000000000..92e8806d3f9109 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-social_orientation_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English social_orientation DistilBertForSequenceClassification from tee-oh-double-dee +author: John Snow Labs +name: social_orientation +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`social_orientation` is a English model originally trained by tee-oh-double-dee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/social_orientation_en_5.5.0_3.0_1727033470309.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/social_orientation_en_5.5.0_3.0_1727033470309.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("social_orientation","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("social_orientation", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|social_orientation| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tee-oh-double-dee/social-orientation \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-social_orientation_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-social_orientation_pipeline_en.md new file mode 100644 index 00000000000000..9eb1cf4e72bad8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-social_orientation_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English social_orientation_pipeline pipeline DistilBertForSequenceClassification from tee-oh-double-dee +author: John Snow Labs +name: social_orientation_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`social_orientation_pipeline` is a English model originally trained by tee-oh-double-dee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/social_orientation_pipeline_en_5.5.0_3.0_1727033484452.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/social_orientation_pipeline_en_5.5.0_3.0_1727033484452.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("social_orientation_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("social_orientation_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|social_orientation_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tee-oh-double-dee/social-orientation + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-spa_portuguese_xlm_r_es.md b/docs/_posts/ahmedlone127/2024-09-22-spa_portuguese_xlm_r_es.md new file mode 100644 index 00000000000000..9d46f1906a231e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-spa_portuguese_xlm_r_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish spa_portuguese_xlm_r XlmRoBertaForTokenClassification from mbruton +author: John Snow Labs +name: spa_portuguese_xlm_r +date: 2024-09-22 +tags: [es, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spa_portuguese_xlm_r` is a Castilian, Spanish model originally trained by mbruton. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spa_portuguese_xlm_r_es_5.5.0_3.0_1726970122075.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spa_portuguese_xlm_r_es_5.5.0_3.0_1726970122075.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("spa_portuguese_xlm_r","es") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("spa_portuguese_xlm_r", "es")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
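Because this is a token classifier, the `ner` output column holds one annotation per token rather than one per document. A minimal sketch for viewing tokens next to their predicted tags, reusing `pipelineDF` from the Python example above:

```python
# "ner" is the output column configured on the token classifier above;
# token.result and ner.result are aligned element-wise (one entry per token)
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```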
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spa_portuguese_xlm_r| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|865.1 MB| + +## References + +https://huggingface.co/mbruton/spa_pt_XLM-R \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-spa_portuguese_xlm_r_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-22-spa_portuguese_xlm_r_pipeline_es.md new file mode 100644 index 00000000000000..63ffc94011a806 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-spa_portuguese_xlm_r_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish spa_portuguese_xlm_r_pipeline pipeline XlmRoBertaForTokenClassification from mbruton +author: John Snow Labs +name: spa_portuguese_xlm_r_pipeline +date: 2024-09-22 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spa_portuguese_xlm_r_pipeline` is a Castilian, Spanish model originally trained by mbruton. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spa_portuguese_xlm_r_pipeline_es_5.5.0_3.0_1726970179501.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spa_portuguese_xlm_r_pipeline_es_5.5.0_3.0_1726970179501.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("spa_portuguese_xlm_r_pipeline", lang = "es")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("spa_portuguese_xlm_r_pipeline", lang = "es")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spa_portuguese_xlm_r_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|865.1 MB| + +## References + +https://huggingface.co/mbruton/spa_pt_XLM-R + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-spam_en.md b/docs/_posts/ahmedlone127/2024-09-22-spam_en.md new file mode 100644 index 00000000000000..223d28120f71ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-spam_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English spam DistilBertForSequenceClassification from Luisdahuis +author: John Snow Labs +name: spam +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spam` is a English model originally trained by Luisdahuis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spam_en_5.5.0_3.0_1727020393173.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spam_en_5.5.0_3.0_1727020393173.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("spam","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("spam", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spam| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Luisdahuis/spam \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-spam_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-spam_pipeline_en.md new file mode 100644 index 00000000000000..9df6863b68e366 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-spam_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English spam_pipeline pipeline DistilBertForSequenceClassification from Luisdahuis +author: John Snow Labs +name: spam_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spam_pipeline` is a English model originally trained by Luisdahuis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spam_pipeline_en_5.5.0_3.0_1727020407646.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spam_pipeline_en_5.5.0_3.0_1727020407646.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("spam_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("spam_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spam_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Luisdahuis/spam + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sreegeni_finetune_textclass_auto_en.md b/docs/_posts/ahmedlone127/2024-09-22-sreegeni_finetune_textclass_auto_en.md new file mode 100644 index 00000000000000..0e2342d0b3fd42 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sreegeni_finetune_textclass_auto_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sreegeni_finetune_textclass_auto RoBertaForSequenceClassification from sreerammadhu +author: John Snow Labs +name: sreegeni_finetune_textclass_auto +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sreegeni_finetune_textclass_auto` is a English model originally trained by sreerammadhu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sreegeni_finetune_textclass_auto_en_5.5.0_3.0_1727017243296.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sreegeni_finetune_textclass_auto_en_5.5.0_3.0_1727017243296.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("sreegeni_finetune_textclass_auto","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sreegeni_finetune_textclass_auto", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sreegeni_finetune_textclass_auto| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/sreerammadhu/sreegeni-finetune-textclass-auto \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sreegeni_finetune_textclass_auto_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sreegeni_finetune_textclass_auto_pipeline_en.md new file mode 100644 index 00000000000000..97863d957d57d4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sreegeni_finetune_textclass_auto_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sreegeni_finetune_textclass_auto_pipeline pipeline RoBertaForSequenceClassification from sreerammadhu +author: John Snow Labs +name: sreegeni_finetune_textclass_auto_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sreegeni_finetune_textclass_auto_pipeline` is a English model originally trained by sreerammadhu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sreegeni_finetune_textclass_auto_pipeline_en_5.5.0_3.0_1727017258993.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sreegeni_finetune_textclass_auto_pipeline_en_5.5.0_3.0_1727017258993.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("sreegeni_finetune_textclass_auto_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("sreegeni_finetune_textclass_auto_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sreegeni_finetune_textclass_auto_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.7 MB| + +## References + +https://huggingface.co/sreerammadhu/sreegeni-finetune-textclass-auto + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-stance_gottbert_en.md b/docs/_posts/ahmedlone127/2024-09-22-stance_gottbert_en.md new file mode 100644 index 00000000000000..753e342321bdad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-stance_gottbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stance_gottbert RoBertaForSequenceClassification from ogoshi2000 +author: John Snow Labs +name: stance_gottbert +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stance_gottbert` is a English model originally trained by ogoshi2000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stance_gottbert_en_5.5.0_3.0_1726971983560.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stance_gottbert_en_5.5.0_3.0_1726971983560.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("stance_gottbert","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("stance_gottbert", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stance_gottbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|472.9 MB| + +## References + +https://huggingface.co/ogoshi2000/stance-gottbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-stance_gottbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-stance_gottbert_pipeline_en.md new file mode 100644 index 00000000000000..41a3ac15f4f950 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-stance_gottbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stance_gottbert_pipeline pipeline RoBertaForSequenceClassification from ogoshi2000 +author: John Snow Labs +name: stance_gottbert_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stance_gottbert_pipeline` is a English model originally trained by ogoshi2000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stance_gottbert_pipeline_en_5.5.0_3.0_1726972007326.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stance_gottbert_pipeline_en_5.5.0_3.0_1726972007326.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("stance_gottbert_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("stance_gottbert_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stance_gottbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|472.9 MB| + +## References + +https://huggingface.co/ogoshi2000/stance-gottbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-stereoset_roberta_large_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-22-stereoset_roberta_large_finetuned_en.md new file mode 100644 index 00000000000000..4ffdb7d52af9b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-stereoset_roberta_large_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stereoset_roberta_large_finetuned RoBertaForSequenceClassification from henryscheible +author: John Snow Labs +name: stereoset_roberta_large_finetuned +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stereoset_roberta_large_finetuned` is a English model originally trained by henryscheible. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stereoset_roberta_large_finetuned_en_5.5.0_3.0_1727037325345.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stereoset_roberta_large_finetuned_en_5.5.0_3.0_1727037325345.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("stereoset_roberta_large_finetuned","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("stereoset_roberta_large_finetuned", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
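Since this checkpoint is a roberta-large model (about 1.3 GB), it can help to cap the batch size and maximum sentence length on memory-constrained hardware. The sketch below uses purely illustrative values; `setBatchSize` and `setMaxSentenceLength` are the usual knobs on Spark NLP transformer classifiers:

```python
# Illustrative tuning only - pick values that match your cluster's memory budget
sequenceClassifier = RoBertaForSequenceClassification.pretrained("stereoset_roberta_large_finetuned", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class") \
    .setBatchSize(4) \
    .setMaxSentenceLength(128)
```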
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stereoset_roberta_large_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/henryscheible/stereoset_roberta-large_finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-stereoset_roberta_large_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-stereoset_roberta_large_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..befdc1f8de6b44 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-stereoset_roberta_large_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stereoset_roberta_large_finetuned_pipeline pipeline RoBertaForSequenceClassification from henryscheible +author: John Snow Labs +name: stereoset_roberta_large_finetuned_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stereoset_roberta_large_finetuned_pipeline` is a English model originally trained by henryscheible. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stereoset_roberta_large_finetuned_pipeline_en_5.5.0_3.0_1727037414939.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stereoset_roberta_large_finetuned_pipeline_en_5.5.0_3.0_1727037414939.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("stereoset_roberta_large_finetuned_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("stereoset_roberta_large_finetuned_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stereoset_roberta_large_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/henryscheible/stereoset_roberta-large_finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-stock_sentiment_hp_en.md b/docs/_posts/ahmedlone127/2024-09-22-stock_sentiment_hp_en.md new file mode 100644 index 00000000000000..10724ac8f6cfe4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-stock_sentiment_hp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stock_sentiment_hp BertForSequenceClassification from slisowski +author: John Snow Labs +name: stock_sentiment_hp +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stock_sentiment_hp` is a English model originally trained by slisowski. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stock_sentiment_hp_en_5.5.0_3.0_1727032021377.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stock_sentiment_hp_en_5.5.0_3.0_1727032021377.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("stock_sentiment_hp","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("stock_sentiment_hp", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stock_sentiment_hp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/slisowski/stock_sentiment_hp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-stock_sentiment_hp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-stock_sentiment_hp_pipeline_en.md new file mode 100644 index 00000000000000..f61f7b0831fe00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-stock_sentiment_hp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stock_sentiment_hp_pipeline pipeline BertForSequenceClassification from slisowski +author: John Snow Labs +name: stock_sentiment_hp_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stock_sentiment_hp_pipeline` is a English model originally trained by slisowski. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stock_sentiment_hp_pipeline_en_5.5.0_3.0_1727032044844.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stock_sentiment_hp_pipeline_en_5.5.0_3.0_1727032044844.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("stock_sentiment_hp_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("stock_sentiment_hp_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stock_sentiment_hp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/slisowski/stock_sentiment_hp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-strategytransitionplanv1_en.md b/docs/_posts/ahmedlone127/2024-09-22-strategytransitionplanv1_en.md new file mode 100644 index 00000000000000..6afe2940ffebda --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-strategytransitionplanv1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English strategytransitionplanv1 RoBertaForSequenceClassification from lomov +author: John Snow Labs +name: strategytransitionplanv1 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`strategytransitionplanv1` is a English model originally trained by lomov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/strategytransitionplanv1_en_5.5.0_3.0_1727017356142.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/strategytransitionplanv1_en_5.5.0_3.0_1727017356142.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("strategytransitionplanv1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("strategytransitionplanv1", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|strategytransitionplanv1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.6 MB| + +## References + +https://huggingface.co/lomov/strategytransitionplanv1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-strategytransitionplanv1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-strategytransitionplanv1_pipeline_en.md new file mode 100644 index 00000000000000..4d1e620e7e6cd4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-strategytransitionplanv1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English strategytransitionplanv1_pipeline pipeline RoBertaForSequenceClassification from lomov +author: John Snow Labs +name: strategytransitionplanv1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`strategytransitionplanv1_pipeline` is a English model originally trained by lomov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/strategytransitionplanv1_pipeline_en_5.5.0_3.0_1727017370066.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/strategytransitionplanv1_pipeline_en_5.5.0_3.0_1727017370066.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("strategytransitionplanv1_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("strategytransitionplanv1_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|strategytransitionplanv1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.7 MB| + +## References + +https://huggingface.co/lomov/strategytransitionplanv1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_en.md b/docs/_posts/ahmedlone127/2024-09-22-stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_en.md new file mode 100644 index 00000000000000..247006eee7f521 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stsb_tinybert_l_4_finetuned_auc_151221_top3_op2 BertForSequenceClassification from Katsiaryna +author: John Snow Labs +name: stsb_tinybert_l_4_finetuned_auc_151221_top3_op2 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stsb_tinybert_l_4_finetuned_auc_151221_top3_op2` is a English model originally trained by Katsiaryna. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_en_5.5.0_3.0_1727034717612.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_en_5.5.0_3.0_1727034717612.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("stsb_tinybert_l_4_finetuned_auc_151221_top3_op2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("stsb_tinybert_l_4_finetuned_auc_151221_top3_op2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stsb_tinybert_l_4_finetuned_auc_151221_top3_op2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|54.2 MB| + +## References + +https://huggingface.co/Katsiaryna/stsb-TinyBERT-L-4-finetuned_auc_151221-top3_op2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_pipeline_en.md new file mode 100644 index 00000000000000..402125694c1c6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_pipeline pipeline BertForSequenceClassification from Katsiaryna +author: John Snow Labs +name: stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_pipeline` is a English model originally trained by Katsiaryna. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_pipeline_en_5.5.0_3.0_1727034720633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_pipeline_en_5.5.0_3.0_1727034720633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|54.2 MB| + +## References + +https://huggingface.co/Katsiaryna/stsb-TinyBERT-L-4-finetuned_auc_151221-top3_op2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-suicide_distilbert_2_4_en.md b/docs/_posts/ahmedlone127/2024-09-22-suicide_distilbert_2_4_en.md new file mode 100644 index 00000000000000..0e4c5daa1d0a0a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-suicide_distilbert_2_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English suicide_distilbert_2_4 DistilBertForSequenceClassification from cuadron11 +author: John Snow Labs +name: suicide_distilbert_2_4 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`suicide_distilbert_2_4` is a English model originally trained by cuadron11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/suicide_distilbert_2_4_en_5.5.0_3.0_1727012649237.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/suicide_distilbert_2_4_en_5.5.0_3.0_1727012649237.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("suicide_distilbert_2_4","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("suicide_distilbert_2_4", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
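The fitted pipeline stores the predicted label in the `class` column as Spark NLP annotations. A minimal sketch of how the prediction could be inspected, assuming the stages defined above:

```python
# Show the input text next to the predicted label (sketch; assumes the pipeline above)
pipelineDF.select("text", "class.result").show(truncate=False)
```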
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|suicide_distilbert_2_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cuadron11/suicide-distilbert-2-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-suicide_distilbert_2_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-suicide_distilbert_2_4_pipeline_en.md new file mode 100644 index 00000000000000..a95eb6227f8b83 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-suicide_distilbert_2_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English suicide_distilbert_2_4_pipeline pipeline DistilBertForSequenceClassification from cuadron11 +author: John Snow Labs +name: suicide_distilbert_2_4_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`suicide_distilbert_2_4_pipeline` is a English model originally trained by cuadron11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/suicide_distilbert_2_4_pipeline_en_5.5.0_3.0_1727012661672.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/suicide_distilbert_2_4_pipeline_en_5.5.0_3.0_1727012661672.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("suicide_distilbert_2_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("suicide_distilbert_2_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|suicide_distilbert_2_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cuadron11/suicide-distilbert-2-4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-swahili_ner_bertbase_cased_pipeline_sw.md b/docs/_posts/ahmedlone127/2024-09-22-swahili_ner_bertbase_cased_pipeline_sw.md new file mode 100644 index 00000000000000..5437302b34c777 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-swahili_ner_bertbase_cased_pipeline_sw.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Swahili (macrolanguage) swahili_ner_bertbase_cased_pipeline pipeline BertForTokenClassification from eolang +author: John Snow Labs +name: swahili_ner_bertbase_cased_pipeline +date: 2024-09-22 +tags: [sw, open_source, pipeline, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`swahili_ner_bertbase_cased_pipeline` is a Swahili (macrolanguage) model originally trained by eolang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/swahili_ner_bertbase_cased_pipeline_sw_5.5.0_3.0_1727040538207.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/swahili_ner_bertbase_cased_pipeline_sw_5.5.0_3.0_1727040538207.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("swahili_ner_bertbase_cased_pipeline", lang = "sw") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("swahili_ner_bertbase_cased_pipeline", lang = "sw") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|swahili_ner_bertbase_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sw| +|Size:|665.1 MB| + +## References + +https://huggingface.co/eolang/Swahili-NER-BertBase-Cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-swahili_ner_bertbase_cased_sw.md b/docs/_posts/ahmedlone127/2024-09-22-swahili_ner_bertbase_cased_sw.md new file mode 100644 index 00000000000000..e96120ec880b71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-swahili_ner_bertbase_cased_sw.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Swahili (macrolanguage) swahili_ner_bertbase_cased BertForTokenClassification from eolang +author: John Snow Labs +name: swahili_ner_bertbase_cased +date: 2024-09-22 +tags: [sw, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`swahili_ner_bertbase_cased` is a Swahili (macrolanguage) model originally trained by eolang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/swahili_ner_bertbase_cased_sw_5.5.0_3.0_1727040504584.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/swahili_ner_bertbase_cased_sw_5.5.0_3.0_1727040504584.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("swahili_ner_bertbase_cased","sw") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("swahili_ner_bertbase_cased", "sw")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
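Token-level tags can be grouped into entity chunks with Spark NLP's `NerConverter`. A minimal sketch, with stage and column names following the example above:

```python
from sparknlp.annotator import NerConverter

# Group IOB-style token tags into entity chunks (sketch; assumes the pipeline above)
nerConverter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

pipelineWithChunks = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, nerConverter])
resultDF = pipelineWithChunks.fit(data).transform(data)
resultDF.select("ner_chunk.result").show(truncate=False)
```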
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|swahili_ner_bertbase_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|sw| +|Size:|665.1 MB| + +## References + +https://huggingface.co/eolang/Swahili-NER-BertBase-Cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-swot_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-22-swot_classifier_en.md new file mode 100644 index 00000000000000..6094792ab197b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-swot_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English swot_classifier DistilBertForSequenceClassification from jcaponigro +author: John Snow Labs +name: swot_classifier +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`swot_classifier` is a English model originally trained by jcaponigro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/swot_classifier_en_5.5.0_3.0_1727012230273.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/swot_classifier_en_5.5.0_3.0_1727012230273.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("swot_classifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("swot_classifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|swot_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jcaponigro/SWOT_Classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-swot_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-swot_classifier_pipeline_en.md new file mode 100644 index 00000000000000..bd4f20628cdedf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-swot_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English swot_classifier_pipeline pipeline DistilBertForSequenceClassification from jcaponigro +author: John Snow Labs +name: swot_classifier_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`swot_classifier_pipeline` is a English model originally trained by jcaponigro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/swot_classifier_pipeline_en_5.5.0_3.0_1727012243916.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/swot_classifier_pipeline_en_5.5.0_3.0_1727012243916.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("swot_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("swot_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
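For quick experiments on raw strings, the same pretrained pipeline can also be queried with `annotate`, which returns plain Python dictionaries instead of a DataFrame. A small sketch; the sample sentence and the `class` output key are assumptions based on the pipeline's classifier stage:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("swot_classifier_pipeline", lang="en")

# annotate() runs the pipeline on a single string and returns a dict of output columns
result = pipeline.annotate("Our loyal customer base is the company's biggest strength.")
print(result["class"])  # predicted label(s); key assumed from the classifier's output column
```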
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|swot_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jcaponigro/SWOT_Classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-t_108_en.md b/docs/_posts/ahmedlone127/2024-09-22-t_108_en.md new file mode 100644 index 00000000000000..0bc4dd326cda7e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-t_108_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English t_108 RoBertaForSequenceClassification from Pablojmed +author: John Snow Labs +name: t_108 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t_108` is a English model originally trained by Pablojmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t_108_en_5.5.0_3.0_1726972378305.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t_108_en_5.5.0_3.0_1726972378305.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("t_108","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("t_108", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
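For low-latency, single-sentence inference the fitted pipeline can be wrapped in a `LightPipeline`, which avoids building a Spark DataFrame per request. A minimal sketch assuming the `pipelineModel` fitted above:

```python
from sparknlp.base import LightPipeline

# LightPipeline runs the fitted stages on plain strings, outside of Spark DataFrames
lightPipeline = LightPipeline(pipelineModel)
print(lightPipeline.annotate("I love spark-nlp")["class"])
```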
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t_108| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|438.2 MB| + +## References + +https://huggingface.co/Pablojmed/t_108 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-t_108_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-t_108_pipeline_en.md new file mode 100644 index 00000000000000..afa83b84f374fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-t_108_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English t_108_pipeline pipeline RoBertaForSequenceClassification from Pablojmed +author: John Snow Labs +name: t_108_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t_108_pipeline` is a English model originally trained by Pablojmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t_108_pipeline_en_5.5.0_3.0_1726972401481.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t_108_pipeline_en_5.5.0_3.0_1726972401481.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("t_108_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("t_108_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t_108_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.2 MB| + +## References + +https://huggingface.co/Pablojmed/t_108 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-t_7_en.md b/docs/_posts/ahmedlone127/2024-09-22-t_7_en.md new file mode 100644 index 00000000000000..798452b53b2d21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-t_7_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English t_7 RoBertaForSequenceClassification from Pablojmed +author: John Snow Labs +name: t_7 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t_7` is a English model originally trained by Pablojmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t_7_en_5.5.0_3.0_1727017187438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t_7_en_5.5.0_3.0_1727017187438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("t_7","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("t_7", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t_7| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|438.2 MB| + +## References + +https://huggingface.co/Pablojmed/t_7 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-t_7_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-t_7_pipeline_en.md new file mode 100644 index 00000000000000..0d2c34f566347f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-t_7_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English t_7_pipeline pipeline RoBertaForSequenceClassification from Pablojmed +author: John Snow Labs +name: t_7_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t_7_pipeline` is a English model originally trained by Pablojmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t_7_pipeline_en_5.5.0_3.0_1727017215307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t_7_pipeline_en_5.5.0_3.0_1727017215307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("t_7_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("t_7_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t_7_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.2 MB| + +## References + +https://huggingface.co/Pablojmed/t_7 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-tanya_mama_ner_en.md b/docs/_posts/ahmedlone127/2024-09-22-tanya_mama_ner_en.md new file mode 100644 index 00000000000000..a45f9ccf69e6f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-tanya_mama_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tanya_mama_ner XlmRoBertaForTokenClassification from Domo123 +author: John Snow Labs +name: tanya_mama_ner +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tanya_mama_ner` is a English model originally trained by Domo123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tanya_mama_ner_en_5.5.0_3.0_1727019506290.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tanya_mama_ner_en_5.5.0_3.0_1727019506290.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("tanya_mama_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("tanya_mama_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tanya_mama_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|841.6 MB| + +## References + +https://huggingface.co/Domo123/tanya-mama-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-tanya_mama_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-tanya_mama_ner_pipeline_en.md new file mode 100644 index 00000000000000..bec6e4d2a3e527 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-tanya_mama_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tanya_mama_ner_pipeline pipeline XlmRoBertaForTokenClassification from Domo123 +author: John Snow Labs +name: tanya_mama_ner_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tanya_mama_ner_pipeline` is a English model originally trained by Domo123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tanya_mama_ner_pipeline_en_5.5.0_3.0_1727019586437.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tanya_mama_ner_pipeline_en_5.5.0_3.0_1727019586437.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tanya_mama_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tanya_mama_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tanya_mama_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|841.6 MB| + +## References + +https://huggingface.co/Domo123/tanya-mama-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-tapp_multilabel_climatebert_f_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-tapp_multilabel_climatebert_f_pipeline_en.md new file mode 100644 index 00000000000000..1a16abec5c79e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-tapp_multilabel_climatebert_f_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tapp_multilabel_climatebert_f_pipeline pipeline RoBertaForSequenceClassification from GIZ +author: John Snow Labs +name: tapp_multilabel_climatebert_f_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tapp_multilabel_climatebert_f_pipeline` is a English model originally trained by GIZ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tapp_multilabel_climatebert_f_pipeline_en_5.5.0_3.0_1726972206134.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tapp_multilabel_climatebert_f_pipeline_en_5.5.0_3.0_1726972206134.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tapp_multilabel_climatebert_f_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tapp_multilabel_climatebert_f_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tapp_multilabel_climatebert_f_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.7 MB| + +## References + +https://huggingface.co/GIZ/TAPP-multilabel-climatebert_f + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-task4_2_en.md b/docs/_posts/ahmedlone127/2024-09-22-task4_2_en.md new file mode 100644 index 00000000000000..800adf85bbce40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-task4_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English task4_2 DistilBertForSequenceClassification from Koreander +author: John Snow Labs +name: task4_2 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`task4_2` is a English model originally trained by Koreander. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/task4_2_en_5.5.0_3.0_1727033707058.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/task4_2_en_5.5.0_3.0_1727033707058.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("task4_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("task4_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|task4_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Koreander/task4-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-test_fith_ko.md b/docs/_posts/ahmedlone127/2024-09-22-test_fith_ko.md new file mode 100644 index 00000000000000..46ca0cfd77821d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-test_fith_ko.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Korean test_fith WhisperForCTC from kyungmin011029 +author: John Snow Labs +name: test_fith +date: 2024-09-22 +tags: [ko, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_fith` is a Korean model originally trained by kyungmin011029. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_fith_ko_5.5.0_3.0_1726981307031.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_fith_ko_5.5.0_3.0_1726981307031.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is expected to be a DataFrame with an "audio_content" column of raw audio floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("test_fith","ko") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// "data" is expected to be a DataFrame with an "audio_content" column of raw audio floats
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("test_fith", "ko")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
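The example above assumes `data` already holds raw audio. One way to build such a DataFrame is to decode an audio file into floats first, for instance with `librosa`; the library, file name, and 16 kHz sampling rate are assumptions rather than part of the original example:

```python
import librosa

# Decode an audio file to a float array; Whisper checkpoints typically expect 16 kHz audio
audio, sampling_rate = librosa.load("sample.wav", sr=16000)

# AudioAssembler reads a column of raw float values named as in setInputCol above
data = spark.createDataFrame([[audio.tolist()]]).toDF("audio_content")
```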
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_fith| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ko| +|Size:|643.3 MB| + +## References + +https://huggingface.co/kyungmin011029/test_fith \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-test_fith_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-22-test_fith_pipeline_ko.md new file mode 100644 index 00000000000000..da0b825a38e282 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-test_fith_pipeline_ko.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Korean test_fith_pipeline pipeline WhisperForCTC from kyungmin011029 +author: John Snow Labs +name: test_fith_pipeline +date: 2024-09-22 +tags: [ko, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_fith_pipeline` is a Korean model originally trained by kyungmin011029. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_fith_pipeline_ko_5.5.0_3.0_1726981337434.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_fith_pipeline_ko_5.5.0_3.0_1726981337434.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_fith_pipeline", lang = "ko") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_fith_pipeline", lang = "ko") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_fith_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|643.3 MB| + +## References + +https://huggingface.co/kyungmin011029/test_fith + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-test_glue_en.md b/docs/_posts/ahmedlone127/2024-09-22-test_glue_en.md new file mode 100644 index 00000000000000..68c7fd4022fc56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-test_glue_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_glue DistilBertForSequenceClassification from honghk +author: John Snow Labs +name: test_glue +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_glue` is a English model originally trained by honghk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_glue_en_5.5.0_3.0_1727012749644.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_glue_en_5.5.0_3.0_1727012749644.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_glue","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_glue", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
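Once fitted, the pipeline is a regular Spark ML `PipelineModel`, so it can be persisted and reloaded without downloading the model again. A short sketch; the target path is an assumption:

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline to disk (or HDFS/S3) and load it back later
pipelineModel.write().overwrite().save("/tmp/test_glue_pipeline_model")
restored = PipelineModel.load("/tmp/test_glue_pipeline_model")
restored.transform(data).select("class.result").show()
```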
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_glue| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/honghk/test-glue \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-test_glue_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-test_glue_pipeline_en.md new file mode 100644 index 00000000000000..5d47623797709e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-test_glue_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_glue_pipeline pipeline DistilBertForSequenceClassification from honghk +author: John Snow Labs +name: test_glue_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_glue_pipeline` is a English model originally trained by honghk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_glue_pipeline_en_5.5.0_3.0_1727012761309.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_glue_pipeline_en_5.5.0_3.0_1727012761309.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_glue_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_glue_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_glue_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/honghk/test-glue + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-test_medner_en.md b/docs/_posts/ahmedlone127/2024-09-22-test_medner_en.md new file mode 100644 index 00000000000000..bc3c40361e2e41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-test_medner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_medner BertForTokenClassification from adalbertojunior +author: John Snow Labs +name: test_medner +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_medner` is a English model originally trained by adalbertojunior. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_medner_en_5.5.0_3.0_1727040323041.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_medner_en_5.5.0_3.0_1727040323041.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("test_medner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("test_medner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
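To see which tag was assigned to each token, the `token` and `ner` annotation arrays can be zipped together with Spark SQL functions. A minimal sketch assuming the `pipelineDF` produced above:

```python
from pyspark.sql import functions as F

# Pair every token with its predicted tag (both columns are arrays of annotations)
pipelineDF.select(
    F.explode(F.arrays_zip("token.result", "ner.result")).alias("token_tag")
).show(truncate=False)
```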
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_medner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|664.9 MB| + +## References + +https://huggingface.co/adalbertojunior/test-medner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-test_medner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-test_medner_pipeline_en.md new file mode 100644 index 00000000000000..986581b6039bb2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-test_medner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_medner_pipeline pipeline BertForTokenClassification from adalbertojunior +author: John Snow Labs +name: test_medner_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_medner_pipeline` is a English model originally trained by adalbertojunior. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_medner_pipeline_en_5.5.0_3.0_1727040356909.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_medner_pipeline_en_5.5.0_3.0_1727040356909.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_medner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_medner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_medner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|664.9 MB| + +## References + +https://huggingface.co/adalbertojunior/test-medner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-test_model_2_en.md b/docs/_posts/ahmedlone127/2024-09-22-test_model_2_en.md new file mode 100644 index 00000000000000..9e30af59c33b48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-test_model_2_en.md @@ -0,0 +1,96 @@ +--- +layout: model +title: English test_model_2 DistilBertEmbeddings from TamBeo +author: John Snow Labs +name: test_model_2 +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, distilbert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_model_2` is a English model originally trained by TamBeo. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_model_2_en_5.5.0_3.0_1727012643926.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_model_2_en_5.5.0_3.0_1727012643926.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = DistilBertEmbeddings.pretrained("test_model_2","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = DistilBertEmbeddings.pretrained("test_model_2","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
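To expose the token embeddings as plain Spark ML vectors (for example, to feed them into a downstream Spark ML estimator), an `EmbeddingsFinisher` stage can be appended. A minimal sketch, assuming the pipeline defined above:

```python
from sparknlp.base import EmbeddingsFinisher

# Convert Spark NLP embedding annotations into Spark ML vectors
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

finishedDF = finisher.transform(pipelineDF)
finishedDF.selectExpr("explode(finished_embeddings) as token_vector").show(3)
```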
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_model_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +References + +https://huggingface.co/TamBeo/test_model_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-test_roberta_euhaq_en.md b/docs/_posts/ahmedlone127/2024-09-22-test_roberta_euhaq_en.md new file mode 100644 index 00000000000000..d13284e878253a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-test_roberta_euhaq_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_roberta_euhaq RoBertaForSequenceClassification from euhaq +author: John Snow Labs +name: test_roberta_euhaq +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_roberta_euhaq` is a English model originally trained by euhaq. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_roberta_euhaq_en_5.5.0_3.0_1726967540876.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_roberta_euhaq_en_5.5.0_3.0_1726967540876.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("test_roberta_euhaq","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("test_roberta_euhaq", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_roberta_euhaq| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|421.9 MB| + +## References + +https://huggingface.co/euhaq/test_roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-test_roberta_euhaq_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-test_roberta_euhaq_pipeline_en.md new file mode 100644 index 00000000000000..775978fcaecaf6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-test_roberta_euhaq_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_roberta_euhaq_pipeline pipeline RoBertaForSequenceClassification from euhaq +author: John Snow Labs +name: test_roberta_euhaq_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_roberta_euhaq_pipeline` is a English model originally trained by euhaq. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_roberta_euhaq_pipeline_en_5.5.0_3.0_1726967578627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_roberta_euhaq_pipeline_en_5.5.0_3.0_1726967578627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with the raw input text (typically a column named "text")
+pipeline = PretrainedPipeline("test_roberta_euhaq_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame with the raw input text (typically a column named "text")
+val pipeline = new PretrainedPipeline("test_roberta_euhaq_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
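+
+For quick experiments on a single string, the same pretrained pipeline can be used without building a DataFrame; a minimal sketch relying on the `pipeline` object created above:
+
+```python
+# annotate() runs the full pipeline on one string and returns a dict of output columns
+print(pipeline.annotate("I love spark-nlp"))
+```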
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_roberta_euhaq_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|421.9 MB| + +## References + +https://huggingface.co/euhaq/test_roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-test_trainer11a_en.md b/docs/_posts/ahmedlone127/2024-09-22-test_trainer11a_en.md new file mode 100644 index 00000000000000..de42e6e00af75a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-test_trainer11a_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_trainer11a DistilBertForSequenceClassification from SimoneJLaudani +author: John Snow Labs +name: test_trainer11a +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_trainer11a` is a English model originally trained by SimoneJLaudani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_trainer11a_en_5.5.0_3.0_1727020392937.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_trainer11a_en_5.5.0_3.0_1727020392937.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# Downloads the pretrained model and reads the document and token columns defined above
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_trainer11a","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_trainer11a", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_trainer11a| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/SimoneJLaudani/test_trainer11a \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-test_trainer11a_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-test_trainer11a_pipeline_en.md new file mode 100644 index 00000000000000..b5d62145c5b19d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-test_trainer11a_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_trainer11a_pipeline pipeline DistilBertForSequenceClassification from SimoneJLaudani +author: John Snow Labs +name: test_trainer11a_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_trainer11a_pipeline` is a English model originally trained by SimoneJLaudani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_trainer11a_pipeline_en_5.5.0_3.0_1727020407533.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_trainer11a_pipeline_en_5.5.0_3.0_1727020407533.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with the raw input text (typically a column named "text")
+pipeline = PretrainedPipeline("test_trainer11a_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame with the raw input text (typically a column named "text")
+val pipeline = new PretrainedPipeline("test_trainer11a_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_trainer11a_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/SimoneJLaudani/test_trainer11a + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-test_ver5_ko.md b/docs/_posts/ahmedlone127/2024-09-22-test_ver5_ko.md new file mode 100644 index 00000000000000..6134ad775edbf4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-test_ver5_ko.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Korean test_ver5 WhisperForCTC from Jpep26 +author: John Snow Labs +name: test_ver5 +date: 2024-09-22 +tags: [ko, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_ver5` is a Korean model originally trained by Jpep26. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_ver5_ko_5.5.0_3.0_1727023573446.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_ver5_ko_5.5.0_3.0_1727023573446.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("test_ver5","ko") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data is assumed to be a Spark DataFrame with an "audio_content" column holding the waveform as an array of floats
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("test_ver5", "ko")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data is assumed to be a Spark DataFrame with an "audio_content" column holding the waveform as an array of floats
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
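+
+The `data` DataFrame above is only assumed; one possible way to build it from a local WAV file (using librosa to decode and resample to 16 kHz, with a placeholder path):
+
+```python
+import librosa
+
+# Decode the audio file into float samples at 16 kHz, the sampling rate Whisper expects
+waveform, _ = librosa.load("/path/to/sample.wav", sr=16000)
+data = spark.createDataFrame([[waveform.tolist()]]).toDF("audio_content")
+```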
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_ver5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ko| +|Size:|642.4 MB| + +## References + +https://huggingface.co/Jpep26/test_ver5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-test_ver5_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-22-test_ver5_pipeline_ko.md new file mode 100644 index 00000000000000..4873ef831319b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-test_ver5_pipeline_ko.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Korean test_ver5_pipeline pipeline WhisperForCTC from Jpep26 +author: John Snow Labs +name: test_ver5_pipeline +date: 2024-09-22 +tags: [ko, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_ver5_pipeline` is a Korean model originally trained by Jpep26. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_ver5_pipeline_ko_5.5.0_3.0_1727023606900.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_ver5_pipeline_ko_5.5.0_3.0_1727023606900.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame carrying raw audio samples (an array of floats per row), not text
+pipeline = PretrainedPipeline("test_ver5_pipeline", lang = "ko")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame carrying raw audio samples (an array of floats per row), not text
+val pipeline = new PretrainedPipeline("test_ver5_pipeline", lang = "ko")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_ver5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|642.4 MB| + +## References + +https://huggingface.co/Jpep26/test_ver5 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-text_classification_5000_en.md b/docs/_posts/ahmedlone127/2024-09-22-text_classification_5000_en.md new file mode 100644 index 00000000000000..ed9d284c2f7c13 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-text_classification_5000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English text_classification_5000 DistilBertForSequenceClassification from Neroism8422 +author: John Snow Labs +name: text_classification_5000 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_classification_5000` is a English model originally trained by Neroism8422. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_classification_5000_en_5.5.0_3.0_1727021071557.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_classification_5000_en_5.5.0_3.0_1727021071557.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# Downloads the pretrained model and reads the document and token columns defined above
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_classification_5000","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_classification_5000", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_classification_5000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Neroism8422/Text_Classification_5000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-text_classification_5000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-text_classification_5000_pipeline_en.md new file mode 100644 index 00000000000000..ff1edc0e89c102 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-text_classification_5000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English text_classification_5000_pipeline pipeline DistilBertForSequenceClassification from Neroism8422 +author: John Snow Labs +name: text_classification_5000_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_classification_5000_pipeline` is a English model originally trained by Neroism8422. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_classification_5000_pipeline_en_5.5.0_3.0_1727021083192.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_classification_5000_pipeline_en_5.5.0_3.0_1727021083192.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with the raw input text (typically a column named "text")
+pipeline = PretrainedPipeline("text_classification_5000_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame with the raw input text (typically a column named "text")
+val pipeline = new PretrainedPipeline("text_classification_5000_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_classification_5000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Neroism8422/Text_Classification_5000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-text_emotion_classifier_distilbert_en.md b/docs/_posts/ahmedlone127/2024-09-22-text_emotion_classifier_distilbert_en.md new file mode 100644 index 00000000000000..11727d193a0929 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-text_emotion_classifier_distilbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English text_emotion_classifier_distilbert BertForSequenceClassification from dima806 +author: John Snow Labs +name: text_emotion_classifier_distilbert +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_emotion_classifier_distilbert` is a English model originally trained by dima806. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_emotion_classifier_distilbert_en_5.5.0_3.0_1726988477721.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_emotion_classifier_distilbert_en_5.5.0_3.0_1726988477721.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# Downloads the pretrained model and reads the document and token columns defined above
+sequenceClassifier = BertForSequenceClassification.pretrained("text_emotion_classifier_distilbert","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("text_emotion_classifier_distilbert", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
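+
+Besides the winning label, Spark NLP sequence classifiers usually attach per-class scores to each annotation's metadata; a small sketch for inspecting them (field names assumed from the standard annotation schema):
+
+```python
+from pyspark.sql import functions as F
+
+# Explode the annotations so the predicted label and its metadata (class scores) sit side by side
+pipelineDF.select(F.explode("class").alias("prediction")) \
+    .select("prediction.result", "prediction.metadata") \
+    .show(truncate=False)
+```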
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_emotion_classifier_distilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/dima806/text-emotion-classifier-distilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-text_emotion_classifier_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-text_emotion_classifier_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..1151f93e7a1302 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-text_emotion_classifier_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English text_emotion_classifier_distilbert_pipeline pipeline BertForSequenceClassification from dima806 +author: John Snow Labs +name: text_emotion_classifier_distilbert_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_emotion_classifier_distilbert_pipeline` is a English model originally trained by dima806. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_emotion_classifier_distilbert_pipeline_en_5.5.0_3.0_1726988497857.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_emotion_classifier_distilbert_pipeline_en_5.5.0_3.0_1726988497857.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with the raw input text (typically a column named "text")
+pipeline = PretrainedPipeline("text_emotion_classifier_distilbert_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame with the raw input text (typically a column named "text")
+val pipeline = new PretrainedPipeline("text_emotion_classifier_distilbert_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_emotion_classifier_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/dima806/text-emotion-classifier-distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-tiny_bert_flax_en.md b/docs/_posts/ahmedlone127/2024-09-22-tiny_bert_flax_en.md new file mode 100644 index 00000000000000..f7907b09191639 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-tiny_bert_flax_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English tiny_bert_flax BertForQuestionAnswering from harshil10 +author: John Snow Labs +name: tiny_bert_flax +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_bert_flax` is a English model originally trained by harshil10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_bert_flax_en_5.5.0_3.0_1726991925976.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_bert_flax_en_5.5.0_3.0_1726991925976.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+# Downloads the pretrained model and reads the assembled question and context columns
+spanClassifier = BertForQuestionAnswering.pretrained("tiny_bert_flax","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.MultiDocumentAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("tiny_bert_flax", "en")
+    .setInputCols(Array("document_question", "document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
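+
+The extracted span ends up in the `answer` column; it can be checked against the question, for example:
+
+```python
+# Show each question next to the answer text predicted for it
+pipelineDF.select("document_question.result", "answer.result").show(truncate=False)
+```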
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_bert_flax| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|16.7 MB| + +## References + +https://huggingface.co/harshil10/tiny_bert_flax \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-tiny_bert_flax_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-tiny_bert_flax_pipeline_en.md new file mode 100644 index 00000000000000..e5065a8167d509 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-tiny_bert_flax_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English tiny_bert_flax_pipeline pipeline BertForQuestionAnswering from harshil10 +author: John Snow Labs +name: tiny_bert_flax_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_bert_flax_pipeline` is a English model originally trained by harshil10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_bert_flax_pipeline_en_5.5.0_3.0_1726991927086.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_bert_flax_pipeline_en_5.5.0_3.0_1726991927086.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with the question and its context passage (typically columns "question" and "context")
+pipeline = PretrainedPipeline("tiny_bert_flax_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame with the question and its context passage (typically columns "question" and "context")
+val pipeline = new PretrainedPipeline("tiny_bert_flax_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_bert_flax_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|16.7 MB| + +## References + +https://huggingface.co/harshil10/tiny_bert_flax + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_en.md b/docs/_posts/ahmedlone127/2024-09-22-tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_en.md new file mode 100644 index 00000000000000..1db666bf8032ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13 WhisperForCTC from saahith +author: John Snow Labs +name: tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13 +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_en_5.5.0_3.0_1727023433362.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_en_5.5.0_3.0_1727023433362.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data is assumed to be a Spark DataFrame with an "audio_content" column holding the waveform as an array of floats
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data is assumed to be a Spark DataFrame with an "audio_content" column holding the waveform as an array of floats
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
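+
+Once the pipeline has run, the transcription is available in the `text` output column; for example:
+
+```python
+# Print the recognized transcript for each audio row
+pipelineDF.select("text.result").show(truncate=False)
+```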
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|394.5 MB| + +## References + +https://huggingface.co/saahith/tiny.en-combined_v3-trimmed-1-0.15-8-1e-05-dainty-sweep-13 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_pipeline_en.md new file mode 100644 index 00000000000000..d7d186e173c03b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_pipeline pipeline WhisperForCTC from saahith +author: John Snow Labs +name: tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_pipeline` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_pipeline_en_5.5.0_3.0_1727023452999.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_pipeline_en_5.5.0_3.0_1727023452999.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame carrying raw audio samples (an array of floats per row), not text
+pipeline = PretrainedPipeline("tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame carrying raw audio samples (an array of floats per row), not text
+val pipeline = new PretrainedPipeline("tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|394.5 MB| + +## References + +https://huggingface.co/saahith/tiny.en-combined_v3-trimmed-1-0.15-8-1e-05-dainty-sweep-13 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random1_seed1_roberta_large_en.md b/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random1_seed1_roberta_large_en.md new file mode 100644 index 00000000000000..32ab34fc51f6bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random1_seed1_roberta_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English topic_topic_random1_seed1_roberta_large RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random1_seed1_roberta_large +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random1_seed1_roberta_large` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random1_seed1_roberta_large_en_5.5.0_3.0_1727026950525.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random1_seed1_roberta_large_en_5.5.0_3.0_1727026950525.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# Downloads the pretrained model and reads the document and token columns defined above
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("topic_topic_random1_seed1_roberta_large","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("topic_topic_random1_seed1_roberta_large", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random1_seed1_roberta_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random1_seed1-roberta-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random1_seed1_roberta_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random1_seed1_roberta_large_pipeline_en.md new file mode 100644 index 00000000000000..50118f187bc6fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random1_seed1_roberta_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English topic_topic_random1_seed1_roberta_large_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random1_seed1_roberta_large_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random1_seed1_roberta_large_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random1_seed1_roberta_large_pipeline_en_5.5.0_3.0_1727027029830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random1_seed1_roberta_large_pipeline_en_5.5.0_3.0_1727027029830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with the raw input text (typically a column named "text")
+pipeline = PretrainedPipeline("topic_topic_random1_seed1_roberta_large_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame with the raw input text (typically a column named "text")
+val pipeline = new PretrainedPipeline("topic_topic_random1_seed1_roberta_large_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random1_seed1_roberta_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random1_seed1-roberta-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random2_seed2_bertweet_large_en.md b/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random2_seed2_bertweet_large_en.md new file mode 100644 index 00000000000000..c56319d3ccc6c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random2_seed2_bertweet_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English topic_topic_random2_seed2_bertweet_large RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random2_seed2_bertweet_large +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random2_seed2_bertweet_large` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random2_seed2_bertweet_large_en_5.5.0_3.0_1727016881218.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random2_seed2_bertweet_large_en_5.5.0_3.0_1727016881218.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# Downloads the pretrained model and reads the document and token columns defined above
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("topic_topic_random2_seed2_bertweet_large","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("topic_topic_random2_seed2_bertweet_large", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random2_seed2_bertweet_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random2_seed2-bertweet-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random2_seed2_bertweet_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random2_seed2_bertweet_large_pipeline_en.md new file mode 100644 index 00000000000000..1a8c1edaf62be6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random2_seed2_bertweet_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English topic_topic_random2_seed2_bertweet_large_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random2_seed2_bertweet_large_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random2_seed2_bertweet_large_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random2_seed2_bertweet_large_pipeline_en_5.5.0_3.0_1727016953455.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random2_seed2_bertweet_large_pipeline_en_5.5.0_3.0_1727016953455.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with the raw input text (typically a column named "text")
+pipeline = PretrainedPipeline("topic_topic_random2_seed2_bertweet_large_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame with the raw input text (typically a column named "text")
+val pipeline = new PretrainedPipeline("topic_topic_random2_seed2_bertweet_large_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random2_seed2_bertweet_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random2_seed2-bertweet-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-trainer2b_en.md b/docs/_posts/ahmedlone127/2024-09-22-trainer2b_en.md new file mode 100644 index 00000000000000..8d211950aaad8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-trainer2b_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trainer2b DistilBertForSequenceClassification from SimoneJLaudani +author: John Snow Labs +name: trainer2b +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trainer2b` is a English model originally trained by SimoneJLaudani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trainer2b_en_5.5.0_3.0_1726980109846.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trainer2b_en_5.5.0_3.0_1726980109846.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# Downloads the pretrained model and reads the document and token columns defined above
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("trainer2b","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("trainer2b", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trainer2b| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/SimoneJLaudani/trainer2b \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-trainer8_en.md b/docs/_posts/ahmedlone127/2024-09-22-trainer8_en.md new file mode 100644 index 00000000000000..7ddf5565836888 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-trainer8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trainer8 DistilBertForSequenceClassification from SimoneJLaudani +author: John Snow Labs +name: trainer8 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trainer8` is a English model originally trained by SimoneJLaudani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trainer8_en_5.5.0_3.0_1727020486538.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trainer8_en_5.5.0_3.0_1727020486538.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# Downloads the pretrained model and reads the document and token columns defined above
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("trainer8","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("trainer8", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trainer8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/SimoneJLaudani/trainer8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-transformer_classification_lex_ceo_test_en.md b/docs/_posts/ahmedlone127/2024-09-22-transformer_classification_lex_ceo_test_en.md new file mode 100644 index 00000000000000..aa5f1591bf1b5e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-transformer_classification_lex_ceo_test_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English transformer_classification_lex_ceo_test RoBertaForSequenceClassification from rd-1 +author: John Snow Labs +name: transformer_classification_lex_ceo_test +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`transformer_classification_lex_ceo_test` is a English model originally trained by rd-1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/transformer_classification_lex_ceo_test_en_5.5.0_3.0_1727027006388.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/transformer_classification_lex_ceo_test_en_5.5.0_3.0_1727027006388.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# Downloads the pretrained model and reads the document and token columns defined above
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("transformer_classification_lex_ceo_test","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("transformer_classification_lex_ceo_test", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|transformer_classification_lex_ceo_test| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.4 MB| + +## References + +https://huggingface.co/rd-1/transformer_classification_lex_ceo_test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-transformer_classification_lex_ceo_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-transformer_classification_lex_ceo_test_pipeline_en.md new file mode 100644 index 00000000000000..fe40b5ea4906d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-transformer_classification_lex_ceo_test_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English transformer_classification_lex_ceo_test_pipeline pipeline RoBertaForSequenceClassification from rd-1 +author: John Snow Labs +name: transformer_classification_lex_ceo_test_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`transformer_classification_lex_ceo_test_pipeline` is a English model originally trained by rd-1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/transformer_classification_lex_ceo_test_pipeline_en_5.5.0_3.0_1727027029444.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/transformer_classification_lex_ceo_test_pipeline_en_5.5.0_3.0_1727027029444.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame with the raw input text (typically a column named "text")
+pipeline = PretrainedPipeline("transformer_classification_lex_ceo_test_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame with the raw input text (typically a column named "text")
+val pipeline = new PretrainedPipeline("transformer_classification_lex_ceo_test_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|transformer_classification_lex_ceo_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.4 MB| + +## References + +https://huggingface.co/rd-1/transformer_classification_lex_ceo_test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-trial_model_hrudhai_rajasekhar_en.md b/docs/_posts/ahmedlone127/2024-09-22-trial_model_hrudhai_rajasekhar_en.md new file mode 100644 index 00000000000000..e3eb9556e3cb6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-trial_model_hrudhai_rajasekhar_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trial_model_hrudhai_rajasekhar RoBertaForSequenceClassification from hrudhai-rajasekhar +author: John Snow Labs +name: trial_model_hrudhai_rajasekhar +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trial_model_hrudhai_rajasekhar` is a English model originally trained by hrudhai-rajasekhar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trial_model_hrudhai_rajasekhar_en_5.5.0_3.0_1726967529018.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trial_model_hrudhai_rajasekhar_en_5.5.0_3.0_1726967529018.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier reads the assembled document ("document") together with the tokens
sequenceClassifier = RoBertaForSequenceClassification.pretrained("trial_model_hrudhai_rajasekhar","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("trial_model_hrudhai_rajasekhar", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
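
Once the pipeline has been fitted and applied as above, the predicted label for each row sits in the `class` annotation column; its metadata typically carries the per-label scores. A short sketch for inspecting the output:

```python
# Pair each input text with its predicted label and the raw score map
pipelineDF.selectExpr("text", "explode(class) as prediction") \
    .selectExpr("text", "prediction.result as label", "prediction.metadata as scores") \
    .show(truncate=False)
```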
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trial_model_hrudhai_rajasekhar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|416.1 MB| + +## References + +https://huggingface.co/hrudhai-rajasekhar/trial-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-trial_model_hrudhai_rajasekhar_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-trial_model_hrudhai_rajasekhar_pipeline_en.md new file mode 100644 index 00000000000000..71afcb5558d9a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-trial_model_hrudhai_rajasekhar_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trial_model_hrudhai_rajasekhar_pipeline pipeline RoBertaForSequenceClassification from hrudhai-rajasekhar +author: John Snow Labs +name: trial_model_hrudhai_rajasekhar_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trial_model_hrudhai_rajasekhar_pipeline` is a English model originally trained by hrudhai-rajasekhar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trial_model_hrudhai_rajasekhar_pipeline_en_5.5.0_3.0_1726967571784.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trial_model_hrudhai_rajasekhar_pipeline_en_5.5.0_3.0_1726967571784.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("trial_model_hrudhai_rajasekhar_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("trial_model_hrudhai_rajasekhar_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trial_model_hrudhai_rajasekhar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|416.1 MB| + +## References + +https://huggingface.co/hrudhai-rajasekhar/trial-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-twitchleaguebert_260k_en.md b/docs/_posts/ahmedlone127/2024-09-22-twitchleaguebert_260k_en.md new file mode 100644 index 00000000000000..1c3b1827e88e48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-twitchleaguebert_260k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twitchleaguebert_260k RoBertaEmbeddings from Epidot +author: John Snow Labs +name: twitchleaguebert_260k +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitchleaguebert_260k` is a English model originally trained by Epidot. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitchleaguebert_260k_en_5.5.0_3.0_1726999577563.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitchleaguebert_260k_en_5.5.0_3.0_1726999577563.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("twitchleaguebert_260k","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("twitchleaguebert_260k","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
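
Each token in the input ends up as one annotation in the `embeddings` column, with the vector stored in its `embeddings` field. A minimal sketch for pulling the vectors out of the result above:

```python
# One row per token: the covered word and its embedding vector
pipelineDF.selectExpr("explode(embeddings) as emb") \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(truncate=80)
```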
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitchleaguebert_260k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|305.9 MB| + +## References + +https://huggingface.co/Epidot/TwitchLeagueBert-260k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-twitchleaguebert_260k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-twitchleaguebert_260k_pipeline_en.md new file mode 100644 index 00000000000000..b3dac2e1d41fe4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-twitchleaguebert_260k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitchleaguebert_260k_pipeline pipeline RoBertaEmbeddings from Epidot +author: John Snow Labs +name: twitchleaguebert_260k_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitchleaguebert_260k_pipeline` is a English model originally trained by Epidot. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitchleaguebert_260k_pipeline_en_5.5.0_3.0_1726999591099.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitchleaguebert_260k_pipeline_en_5.5.0_3.0_1726999591099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("twitchleaguebert_260k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("twitchleaguebert_260k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitchleaguebert_260k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|305.9 MB| + +## References + +https://huggingface.co/Epidot/TwitchLeagueBert-260k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-twitter_roberta_base_3epoch10_64_en.md b/docs/_posts/ahmedlone127/2024-09-22-twitter_roberta_base_3epoch10_64_en.md new file mode 100644 index 00000000000000..243510cd5a18b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-twitter_roberta_base_3epoch10_64_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twitter_roberta_base_3epoch10_64 RoBertaForSequenceClassification from dianamihalache27 +author: John Snow Labs +name: twitter_roberta_base_3epoch10_64 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_3epoch10_64` is a English model originally trained by dianamihalache27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_3epoch10_64_en_5.5.0_3.0_1727036995070.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_3epoch10_64_en_5.5.0_3.0_1727036995070.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier reads the assembled document ("document") together with the tokens
sequenceClassifier = RoBertaForSequenceClassification.pretrained("twitter_roberta_base_3epoch10_64","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("twitter_roberta_base_3epoch10_64", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_3epoch10_64| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/dianamihalache27/twitter-roberta-base_3epoch10.64 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-twitter_roberta_base_3epoch10_64_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-twitter_roberta_base_3epoch10_64_pipeline_en.md new file mode 100644 index 00000000000000..ea54e7b0206a81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-twitter_roberta_base_3epoch10_64_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitter_roberta_base_3epoch10_64_pipeline pipeline RoBertaForSequenceClassification from dianamihalache27 +author: John Snow Labs +name: twitter_roberta_base_3epoch10_64_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_3epoch10_64_pipeline` is a English model originally trained by dianamihalache27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_3epoch10_64_pipeline_en_5.5.0_3.0_1727037019441.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_3epoch10_64_pipeline_en_5.5.0_3.0_1727037019441.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("twitter_roberta_base_3epoch10_64_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("twitter_roberta_base_3epoch10_64_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_3epoch10_64_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/dianamihalache27/twitter-roberta-base_3epoch10.64 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-twitter_roberta_base_nerd_latest_en.md b/docs/_posts/ahmedlone127/2024-09-22-twitter_roberta_base_nerd_latest_en.md new file mode 100644 index 00000000000000..0a8c49cb39238b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-twitter_roberta_base_nerd_latest_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twitter_roberta_base_nerd_latest RoBertaForSequenceClassification from cardiffnlp +author: John Snow Labs +name: twitter_roberta_base_nerd_latest +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_nerd_latest` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_nerd_latest_en_5.5.0_3.0_1726967300700.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_nerd_latest_en_5.5.0_3.0_1726967300700.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier reads the assembled document ("document") together with the tokens
sequenceClassifier = RoBertaForSequenceClassification.pretrained("twitter_roberta_base_nerd_latest","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("twitter_roberta_base_nerd_latest", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_nerd_latest| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.6 MB| + +## References + +https://huggingface.co/cardiffnlp/twitter-roberta-base-nerd-latest \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-twitter_roberta_base_nerd_latest_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-twitter_roberta_base_nerd_latest_pipeline_en.md new file mode 100644 index 00000000000000..bafc1c01c0cc37 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-twitter_roberta_base_nerd_latest_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitter_roberta_base_nerd_latest_pipeline pipeline RoBertaForSequenceClassification from cardiffnlp +author: John Snow Labs +name: twitter_roberta_base_nerd_latest_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_nerd_latest_pipeline` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_nerd_latest_pipeline_en_5.5.0_3.0_1726967322635.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_nerd_latest_pipeline_en_5.5.0_3.0_1726967322635.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("twitter_roberta_base_nerd_latest_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("twitter_roberta_base_nerd_latest_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_nerd_latest_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.6 MB| + +## References + +https://huggingface.co/cardiffnlp/twitter-roberta-base-nerd-latest + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-type_inference_en.md b/docs/_posts/ahmedlone127/2024-09-22-type_inference_en.md new file mode 100644 index 00000000000000..5c9096d78e304a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-type_inference_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English type_inference RoBertaEmbeddings from ZQ +author: John Snow Labs +name: type_inference +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`type_inference` is a English model originally trained by ZQ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/type_inference_en_5.5.0_3.0_1727041895483.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/type_inference_en_5.5.0_3.0_1727041895483.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("type_inference","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("type_inference","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|type_inference| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/ZQ/Type_Inference \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-type_inference_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-type_inference_pipeline_en.md new file mode 100644 index 00000000000000..5daa836f12fb3d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-type_inference_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English type_inference_pipeline pipeline RoBertaEmbeddings from ZQ +author: John Snow Labs +name: type_inference_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`type_inference_pipeline` is a English model originally trained by ZQ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/type_inference_pipeline_en_5.5.0_3.0_1727041919414.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/type_inference_pipeline_en_5.5.0_3.0_1727041919414.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("type_inference_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("type_inference_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|type_inference_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/ZQ/Type_Inference + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-uned_tfg_08_62_mas_frecuentes_en.md b/docs/_posts/ahmedlone127/2024-09-22-uned_tfg_08_62_mas_frecuentes_en.md new file mode 100644 index 00000000000000..642b62e7fe4d9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-uned_tfg_08_62_mas_frecuentes_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English uned_tfg_08_62_mas_frecuentes RoBertaForSequenceClassification from alexisdr +author: John Snow Labs +name: uned_tfg_08_62_mas_frecuentes +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`uned_tfg_08_62_mas_frecuentes` is a English model originally trained by alexisdr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/uned_tfg_08_62_mas_frecuentes_en_5.5.0_3.0_1727026987671.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/uned_tfg_08_62_mas_frecuentes_en_5.5.0_3.0_1727026987671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier reads the assembled document ("document") together with the tokens
sequenceClassifier = RoBertaForSequenceClassification.pretrained("uned_tfg_08_62_mas_frecuentes","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("uned_tfg_08_62_mas_frecuentes", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|uned_tfg_08_62_mas_frecuentes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|430.8 MB| + +## References + +https://huggingface.co/alexisdr/uned-tfg-08.62_mas_frecuentes \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-uned_tfg_08_62_mas_frecuentes_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-uned_tfg_08_62_mas_frecuentes_pipeline_en.md new file mode 100644 index 00000000000000..25257bcaf249ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-uned_tfg_08_62_mas_frecuentes_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English uned_tfg_08_62_mas_frecuentes_pipeline pipeline RoBertaForSequenceClassification from alexisdr +author: John Snow Labs +name: uned_tfg_08_62_mas_frecuentes_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`uned_tfg_08_62_mas_frecuentes_pipeline` is a English model originally trained by alexisdr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/uned_tfg_08_62_mas_frecuentes_pipeline_en_5.5.0_3.0_1727027016389.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/uned_tfg_08_62_mas_frecuentes_pipeline_en_5.5.0_3.0_1727027016389.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("uned_tfg_08_62_mas_frecuentes_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("uned_tfg_08_62_mas_frecuentes_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|uned_tfg_08_62_mas_frecuentes_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|430.8 MB| + +## References + +https://huggingface.co/alexisdr/uned-tfg-08.62_mas_frecuentes + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-voicemath_tiny_en.md b/docs/_posts/ahmedlone127/2024-09-22-voicemath_tiny_en.md new file mode 100644 index 00000000000000..709590bdb89fad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-voicemath_tiny_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English voicemath_tiny WhisperForCTC from hoonsung +author: John Snow Labs +name: voicemath_tiny +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`voicemath_tiny` is a English model originally trained by hoonsung. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/voicemath_tiny_en_5.5.0_3.0_1726995741643.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/voicemath_tiny_en_5.5.0_3.0_1726995741643.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be a DataFrame with raw 16 kHz audio floats in an "audio_content" column
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("voicemath_tiny","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// "data" is assumed to be a DataFrame with raw 16 kHz audio floats in an "audio_content" column
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("voicemath_tiny", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
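
The example above assumes a DataFrame `data` already exists. A minimal sketch of building it from a local recording, assuming the `librosa` library and an illustrative file name; Whisper models expect raw mono audio resampled to 16 kHz:

```python
import librosa

# Load the recording as a flat list of floats at 16 kHz
raw_audio, sampling_rate = librosa.load("recording.wav", sr=16000)
data = spark.createDataFrame([[raw_audio.tolist()]], ["audio_content"])

# Reuse the pipeline defined above to transcribe the audio
result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```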
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|voicemath_tiny| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|394.9 MB| + +## References + +https://huggingface.co/hoonsung/VoiceMath-Tiny \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-voicemath_tiny_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-voicemath_tiny_pipeline_en.md new file mode 100644 index 00000000000000..cb7c91f50f003b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-voicemath_tiny_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English voicemath_tiny_pipeline pipeline WhisperForCTC from hoonsung +author: John Snow Labs +name: voicemath_tiny_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`voicemath_tiny_pipeline` is a English model originally trained by hoonsung. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/voicemath_tiny_pipeline_en_5.5.0_3.0_1726995760568.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/voicemath_tiny_pipeline_en_5.5.0_3.0_1726995760568.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("voicemath_tiny_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("voicemath_tiny_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|voicemath_tiny_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|394.9 MB| + +## References + +https://huggingface.co/hoonsung/VoiceMath-Tiny + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_ai_nose_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_ai_nose_en.md new file mode 100644 index 00000000000000..3e28a0723c44a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_ai_nose_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_ai_nose WhisperForCTC from susmitabhatt +author: John Snow Labs +name: whisper_ai_nose +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_ai_nose` is a English model originally trained by susmitabhatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_ai_nose_en_5.5.0_3.0_1727022388766.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_ai_nose_en_5.5.0_3.0_1727022388766.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be a DataFrame with raw 16 kHz audio floats in an "audio_content" column
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_ai_nose","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// "data" is assumed to be a DataFrame with raw 16 kHz audio floats in an "audio_content" column
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_ai_nose", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_ai_nose| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/susmitabhatt/whisper-ai-nose \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_ai_nose_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_ai_nose_pipeline_en.md new file mode 100644 index 00000000000000..3b891a789404df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_ai_nose_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_ai_nose_pipeline pipeline WhisperForCTC from susmitabhatt +author: John Snow Labs +name: whisper_ai_nose_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_ai_nose_pipeline` is a English model originally trained by susmitabhatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_ai_nose_pipeline_en_5.5.0_3.0_1727022480769.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_ai_nose_pipeline_en_5.5.0_3.0_1727022480769.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_ai_nose_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_ai_nose_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_ai_nose_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/susmitabhatt/whisper-ai-nose + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_base2_ko.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_base2_ko.md new file mode 100644 index 00000000000000..71dfa1ef9325ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_base2_ko.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Korean whisper_base2 WhisperForCTC from Dearlie +author: John Snow Labs +name: whisper_base2 +date: 2024-09-22 +tags: [ko, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base2` is a Korean model originally trained by Dearlie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base2_ko_5.5.0_3.0_1726997724494.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base2_ko_5.5.0_3.0_1726997724494.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be a DataFrame with raw 16 kHz audio floats in an "audio_content" column
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_base2","ko") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// "data" is assumed to be a DataFrame with raw 16 kHz audio floats in an "audio_content" column
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_base2", "ko")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ko| +|Size:|398.6 MB| + +## References + +https://huggingface.co/Dearlie/whisper-base2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_base2_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_base2_pipeline_ko.md new file mode 100644 index 00000000000000..ba0e6a601a257f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_base2_pipeline_ko.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Korean whisper_base2_pipeline pipeline WhisperForCTC from Dearlie +author: John Snow Labs +name: whisper_base2_pipeline +date: 2024-09-22 +tags: [ko, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base2_pipeline` is a Korean model originally trained by Dearlie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base2_pipeline_ko_5.5.0_3.0_1726997834648.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base2_pipeline_ko_5.5.0_3.0_1726997834648.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base2_pipeline", lang = "ko") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base2_pipeline", lang = "ko") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|398.6 MB| + +## References + +https://huggingface.co/Dearlie/whisper-base2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_base_ga2en_v1_1_ga.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_base_ga2en_v1_1_ga.md new file mode 100644 index 00000000000000..fb32c3f6dc701c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_base_ga2en_v1_1_ga.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Irish whisper_base_ga2en_v1_1 WhisperForCTC from ymoslem +author: John Snow Labs +name: whisper_base_ga2en_v1_1 +date: 2024-09-22 +tags: [ga, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ga +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_ga2en_v1_1` is a Irish model originally trained by ymoslem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_ga2en_v1_1_ga_5.5.0_3.0_1727024714531.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_ga2en_v1_1_ga_5.5.0_3.0_1727024714531.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be a DataFrame with raw 16 kHz audio floats in an "audio_content" column
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_base_ga2en_v1_1","ga") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// "data" is assumed to be a DataFrame with raw 16 kHz audio floats in an "audio_content" column
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_base_ga2en_v1_1", "ga")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_ga2en_v1_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ga| +|Size:|641.2 MB| + +## References + +https://huggingface.co/ymoslem/whisper-base-ga2en-v1.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_base_ga2en_v1_1_pipeline_ga.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_base_ga2en_v1_1_pipeline_ga.md new file mode 100644 index 00000000000000..05a847f3b19073 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_base_ga2en_v1_1_pipeline_ga.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Irish whisper_base_ga2en_v1_1_pipeline pipeline WhisperForCTC from ymoslem +author: John Snow Labs +name: whisper_base_ga2en_v1_1_pipeline +date: 2024-09-22 +tags: [ga, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ga +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_ga2en_v1_1_pipeline` is a Irish model originally trained by ymoslem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_ga2en_v1_1_pipeline_ga_5.5.0_3.0_1727024753949.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_ga2en_v1_1_pipeline_ga_5.5.0_3.0_1727024753949.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_ga2en_v1_1_pipeline", lang = "ga") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_ga2en_v1_1_pipeline", lang = "ga") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_ga2en_v1_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ga| +|Size:|641.2 MB| + +## References + +https://huggingface.co/ymoslem/whisper-base-ga2en-v1.1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_base_germanmed_full_de.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_base_germanmed_full_de.md new file mode 100644 index 00000000000000..87a9e542d42685 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_base_germanmed_full_de.md @@ -0,0 +1,84 @@ +--- +layout: model +title: German whisper_base_germanmed_full WhisperForCTC from Hanhpt23 +author: John Snow Labs +name: whisper_base_germanmed_full +date: 2024-09-22 +tags: [de, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_germanmed_full` is a German model originally trained by Hanhpt23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_germanmed_full_de_5.5.0_3.0_1727023098544.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_germanmed_full_de_5.5.0_3.0_1727023098544.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be a DataFrame with raw 16 kHz audio floats in an "audio_content" column
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_base_germanmed_full","de") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// "data" is assumed to be a DataFrame with raw 16 kHz audio floats in an "audio_content" column
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_base_germanmed_full", "de")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_germanmed_full| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|de| +|Size:|614.6 MB| + +## References + +https://huggingface.co/Hanhpt23/whisper-base-GermanMed-full \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_base_germanmed_full_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_base_germanmed_full_pipeline_de.md new file mode 100644 index 00000000000000..59714bd0ae1e60 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_base_germanmed_full_pipeline_de.md @@ -0,0 +1,69 @@ +--- +layout: model +title: German whisper_base_germanmed_full_pipeline pipeline WhisperForCTC from Hanhpt23 +author: John Snow Labs +name: whisper_base_germanmed_full_pipeline +date: 2024-09-22 +tags: [de, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_germanmed_full_pipeline` is a German model originally trained by Hanhpt23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_germanmed_full_pipeline_de_5.5.0_3.0_1727023141529.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_germanmed_full_pipeline_de_5.5.0_3.0_1727023141529.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_germanmed_full_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_germanmed_full_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_germanmed_full_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|614.6 MB| + +## References + +https://huggingface.co/Hanhpt23/whisper-base-GermanMed-full + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_base_malayalam_ml.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_base_malayalam_ml.md new file mode 100644 index 00000000000000..e3151948444dd5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_base_malayalam_ml.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Malayalam whisper_base_malayalam WhisperForCTC from parambharat +author: John Snow Labs +name: whisper_base_malayalam +date: 2024-09-22 +tags: [ml, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ml +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_malayalam` is a Malayalam model originally trained by parambharat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_malayalam_ml_5.5.0_3.0_1726985249638.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_malayalam_ml_5.5.0_3.0_1726985249638.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be a DataFrame with raw 16 kHz audio floats in an "audio_content" column
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_base_malayalam","ml") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// "data" is assumed to be a DataFrame with raw 16 kHz audio floats in an "audio_content" column
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_base_malayalam", "ml")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_malayalam| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ml| +|Size:|643.9 MB| + +## References + +https://huggingface.co/parambharat/whisper-base-ml \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_base_malayalam_pipeline_ml.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_base_malayalam_pipeline_ml.md new file mode 100644 index 00000000000000..664d0ca4167fc9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_base_malayalam_pipeline_ml.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Malayalam whisper_base_malayalam_pipeline pipeline WhisperForCTC from parambharat +author: John Snow Labs +name: whisper_base_malayalam_pipeline +date: 2024-09-22 +tags: [ml, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ml +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_malayalam_pipeline` is a Malayalam model originally trained by parambharat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_malayalam_pipeline_ml_5.5.0_3.0_1726985279379.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_malayalam_pipeline_ml_5.5.0_3.0_1726985279379.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_base_malayalam_pipeline", lang = "ml")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_base_malayalam_pipeline", lang = "ml")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_malayalam_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ml| +|Size:|643.9 MB| + +## References + +https://huggingface.co/parambharat/whisper-base-ml + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_medium_bambara_field_bm.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_medium_bambara_field_bm.md new file mode 100644 index 00000000000000..75bf1eb4053036 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_medium_bambara_field_bm.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Bambara whisper_medium_bambara_field WhisperForCTC from RobbieJimersonJr +author: John Snow Labs +name: whisper_medium_bambara_field +date: 2024-09-22 +tags: [bm, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: bm +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_bambara_field` is a Bambara model originally trained by RobbieJimersonJr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_bambara_field_bm_5.5.0_3.0_1727025634380.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_bambara_field_bm_5.5.0_3.0_1727025634380.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_medium_bambara_field", "bm") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# data is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_medium_bambara_field", "bm")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

// data is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_bambara_field| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|bm| +|Size:|4.8 GB| + +## References + +https://huggingface.co/RobbieJimersonJr/whisper-medium-Bambara-field \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_medium_bambara_field_pipeline_bm.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_medium_bambara_field_pipeline_bm.md new file mode 100644 index 00000000000000..52c75d02f7ebe2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_medium_bambara_field_pipeline_bm.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Bambara whisper_medium_bambara_field_pipeline pipeline WhisperForCTC from RobbieJimersonJr +author: John Snow Labs +name: whisper_medium_bambara_field_pipeline +date: 2024-09-22 +tags: [bm, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: bm +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_bambara_field_pipeline` is a Bambara model originally trained by RobbieJimersonJr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_bambara_field_pipeline_bm_5.5.0_3.0_1727025839775.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_bambara_field_pipeline_bm_5.5.0_3.0_1727025839775.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_medium_bambara_field_pipeline", lang = "bm")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_medium_bambara_field_pipeline", lang = "bm")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_bambara_field_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bm| +|Size:|4.8 GB| + +## References + +https://huggingface.co/RobbieJimersonJr/whisper-medium-Bambara-field + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_medium_tajik_tj_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_medium_tajik_tj_en.md new file mode 100644 index 00000000000000..6f353616033629 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_medium_tajik_tj_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_medium_tajik_tj WhisperForCTC from muhtasham +author: John Snow Labs +name: whisper_medium_tajik_tj +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_tajik_tj` is a English model originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_tajik_tj_en_5.5.0_3.0_1727024476664.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_tajik_tj_en_5.5.0_3.0_1727024476664.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_medium_tajik_tj", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# data is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_medium_tajik_tj", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

// data is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
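
At roughly 4.8 GB (see the table below), the medium checkpoints need considerably more driver memory than the base and small ones. One way to start the session with extra headroom is sketched below; the `memory` and `gpu` arguments of `sparknlp.start()` are assumed to be available in your Spark NLP release, otherwise set `spark.driver.memory` on the session builder directly.

```python
import sparknlp

# Start Spark NLP with more driver memory; set gpu=True if a CUDA device is available
spark = sparknlp.start(memory="16G", gpu=False)
```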
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_tajik_tj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/muhtasham/whisper-medium-tg_tj \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_sft_german_de.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_sft_german_de.md new file mode 100644 index 00000000000000..04b00dc75d67f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_sft_german_de.md @@ -0,0 +1,84 @@ +--- +layout: model +title: German whisper_sft_german WhisperForCTC from Jamin20 +author: John Snow Labs +name: whisper_sft_german +date: 2024-09-22 +tags: [de, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_sft_german` is a German model originally trained by Jamin20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_sft_german_de_5.5.0_3.0_1726982555988.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_sft_german_de_5.5.0_3.0_1726982555988.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_sft_german", "de") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# data is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_sft_german", "de")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

// data is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_sft_german| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|de| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Jamin20/whisper_sft_de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_allsnr_v4_de.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_allsnr_v4_de.md new file mode 100644 index 00000000000000..1627ff442a7f6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_allsnr_v4_de.md @@ -0,0 +1,84 @@ +--- +layout: model +title: German whisper_small_allsnr_v4 WhisperForCTC from marccgrau +author: John Snow Labs +name: whisper_small_allsnr_v4 +date: 2024-09-22 +tags: [de, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_allsnr_v4` is a German model originally trained by marccgrau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_allsnr_v4_de_5.5.0_3.0_1727024315731.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_allsnr_v4_de_5.5.0_3.0_1727024315731.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_allsnr_v4", "de") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# data is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_allsnr_v4", "de")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

// data is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_allsnr_v4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|de| +|Size:|1.7 GB| + +## References + +https://huggingface.co/marccgrau/whisper-small-allSNR-v4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_allsnr_v4_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_allsnr_v4_pipeline_de.md new file mode 100644 index 00000000000000..8faa4e15835957 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_allsnr_v4_pipeline_de.md @@ -0,0 +1,69 @@ +--- +layout: model +title: German whisper_small_allsnr_v4_pipeline pipeline WhisperForCTC from marccgrau +author: John Snow Labs +name: whisper_small_allsnr_v4_pipeline +date: 2024-09-22 +tags: [de, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_allsnr_v4_pipeline` is a German model originally trained by marccgrau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_allsnr_v4_pipeline_de_5.5.0_3.0_1727024417964.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_allsnr_v4_pipeline_de_5.5.0_3.0_1727024417964.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_allsnr_v4_pipeline", lang = "de")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_allsnr_v4_pipeline", lang = "de")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_allsnr_v4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|1.7 GB| + +## References + +https://huggingface.co/marccgrau/whisper-small-allSNR-v4 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabic_abosteet_ar.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabic_abosteet_ar.md new file mode 100644 index 00000000000000..e5748311995210 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabic_abosteet_ar.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Arabic whisper_small_arabic_abosteet WhisperForCTC from Abosteet +author: John Snow Labs +name: whisper_small_arabic_abosteet +date: 2024-09-22 +tags: [ar, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_abosteet` is a Arabic model originally trained by Abosteet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_abosteet_ar_5.5.0_3.0_1726981948928.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_abosteet_ar_5.5.0_3.0_1726981948928.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_arabic_abosteet", "ar") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# data is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_arabic_abosteet", "ar")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

// data is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
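
The example above leaves `data` undefined. A minimal sketch of batching several local recordings into it follows; the directory path, `librosa`, and the 16 kHz resampling are assumptions made for illustration.

```python
import glob
import librosa

# One row of raw floats per recording; paths are illustrative
rows = []
for path in glob.glob("/path/to/arabic_recordings/*.wav"):
    samples, _ = librosa.load(path, sr=16000)
    rows.append([samples.tolist()])

data = spark.createDataFrame(rows).toDF("audio_content")
```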
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_abosteet| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Abosteet/whisper-small-arabic \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabic_abosteet_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabic_abosteet_pipeline_ar.md new file mode 100644 index 00000000000000..7e028c7dd0abd7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabic_abosteet_pipeline_ar.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Arabic whisper_small_arabic_abosteet_pipeline pipeline WhisperForCTC from Abosteet +author: John Snow Labs +name: whisper_small_arabic_abosteet_pipeline +date: 2024-09-22 +tags: [ar, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_abosteet_pipeline` is a Arabic model originally trained by Abosteet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_abosteet_pipeline_ar_5.5.0_3.0_1726982030498.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_abosteet_pipeline_ar_5.5.0_3.0_1726982030498.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_arabic_abosteet_pipeline", lang = "ar")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_arabic_abosteet_pipeline", lang = "ar")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_abosteet_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Abosteet/whisper-small-arabic + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabic_danielizham_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabic_danielizham_pipeline_ar.md new file mode 100644 index 00000000000000..13e14e2d0e15ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabic_danielizham_pipeline_ar.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Arabic whisper_small_arabic_danielizham_pipeline pipeline WhisperForCTC from danielizham +author: John Snow Labs +name: whisper_small_arabic_danielizham_pipeline +date: 2024-09-22 +tags: [ar, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_danielizham_pipeline` is a Arabic model originally trained by danielizham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_danielizham_pipeline_ar_5.5.0_3.0_1726986608742.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_danielizham_pipeline_ar_5.5.0_3.0_1726986608742.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_arabic_danielizham_pipeline", lang = "ar")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_arabic_danielizham_pipeline", lang = "ar")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_danielizham_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/danielizham/whisper-small-ar + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabic_huggingpanda_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabic_huggingpanda_pipeline_en.md new file mode 100644 index 00000000000000..667a983e8dff93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabic_huggingpanda_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_arabic_huggingpanda_pipeline pipeline WhisperForCTC from HuggingPanda +author: John Snow Labs +name: whisper_small_arabic_huggingpanda_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_huggingpanda_pipeline` is a English model originally trained by HuggingPanda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_huggingpanda_pipeline_en_5.5.0_3.0_1727023032941.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_huggingpanda_pipeline_en_5.5.0_3.0_1727023032941.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_arabic_huggingpanda_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_arabic_huggingpanda_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_huggingpanda_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/HuggingPanda/whisper-small-arabic + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabict12_ar.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabict12_ar.md new file mode 100644 index 00000000000000..6acba62c36b384 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabict12_ar.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Arabic whisper_small_arabict12 WhisperForCTC from taqwa92 +author: John Snow Labs +name: whisper_small_arabict12 +date: 2024-09-22 +tags: [ar, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabict12` is a Arabic model originally trained by taqwa92. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabict12_ar_5.5.0_3.0_1726994510951.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabict12_ar_5.5.0_3.0_1726994510951.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_arabict12", "ar") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# data is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_arabict12", "ar")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

// data is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabict12| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/taqwa92/whisper-small-ArabicT12 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabict12_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabict12_pipeline_ar.md new file mode 100644 index 00000000000000..7e0f1bf5bb10db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabict12_pipeline_ar.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Arabic whisper_small_arabict12_pipeline pipeline WhisperForCTC from taqwa92 +author: John Snow Labs +name: whisper_small_arabict12_pipeline +date: 2024-09-22 +tags: [ar, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabict12_pipeline` is a Arabic model originally trained by taqwa92. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabict12_pipeline_ar_5.5.0_3.0_1726994585142.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabict12_pipeline_ar_5.5.0_3.0_1726994585142.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_arabict12_pipeline", lang = "ar")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_arabict12_pipeline", lang = "ar")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabict12_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/taqwa92/whisper-small-ArabicT12 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_atc_san2003m_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_atc_san2003m_en.md new file mode 100644 index 00000000000000..080bb3436725f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_atc_san2003m_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_atc_san2003m WhisperForCTC from san2003m +author: John Snow Labs +name: whisper_small_atc_san2003m +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_atc_san2003m` is a English model originally trained by san2003m. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_atc_san2003m_en_5.5.0_3.0_1726983312154.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_atc_san2003m_en_5.5.0_3.0_1726983312154.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_atc_san2003m", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# data is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_atc_san2003m", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

// data is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_atc_san2003m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/san2003m/whisper-small-atc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_atc_san2003m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_atc_san2003m_pipeline_en.md new file mode 100644 index 00000000000000..47622ed9204855 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_atc_san2003m_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_atc_san2003m_pipeline pipeline WhisperForCTC from san2003m +author: John Snow Labs +name: whisper_small_atc_san2003m_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_atc_san2003m_pipeline` is a English model originally trained by san2003m. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_atc_san2003m_pipeline_en_5.5.0_3.0_1726983388523.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_atc_san2003m_pipeline_en_5.5.0_3.0_1726983388523.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_atc_san2003m_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_atc_san2003m_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_atc_san2003m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/san2003m/whisper-small-atc + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_avnishkanungo_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_avnishkanungo_en.md new file mode 100644 index 00000000000000..5270386f3e6c46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_avnishkanungo_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_divehi_avnishkanungo WhisperForCTC from avnishkanungo +author: John Snow Labs +name: whisper_small_divehi_avnishkanungo +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_avnishkanungo` is a English model originally trained by avnishkanungo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_avnishkanungo_en_5.5.0_3.0_1727024617000.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_avnishkanungo_en_5.5.0_3.0_1727024617000.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_divehi_avnishkanungo", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# data is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_divehi_avnishkanungo", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

// data is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_avnishkanungo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/avnishkanungo/whisper-small-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_avnishkanungo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_avnishkanungo_pipeline_en.md new file mode 100644 index 00000000000000..a70907333a7bc5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_avnishkanungo_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_divehi_avnishkanungo_pipeline pipeline WhisperForCTC from avnishkanungo +author: John Snow Labs +name: whisper_small_divehi_avnishkanungo_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_avnishkanungo_pipeline` is a English model originally trained by avnishkanungo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_avnishkanungo_pipeline_en_5.5.0_3.0_1727024706116.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_avnishkanungo_pipeline_en_5.5.0_3.0_1727024706116.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_divehi_avnishkanungo_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_divehi_avnishkanungo_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_avnishkanungo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/avnishkanungo/whisper-small-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_cordwainersmith_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_cordwainersmith_pipeline_en.md new file mode 100644 index 00000000000000..8f802262d2dcb3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_cordwainersmith_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_divehi_cordwainersmith_pipeline pipeline WhisperForCTC from CordwainerSmith +author: John Snow Labs +name: whisper_small_divehi_cordwainersmith_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_cordwainersmith_pipeline` is a English model originally trained by CordwainerSmith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_cordwainersmith_pipeline_en_5.5.0_3.0_1726997375263.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_cordwainersmith_pipeline_en_5.5.0_3.0_1726997375263.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_divehi_cordwainersmith_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_divehi_cordwainersmith_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_cordwainersmith_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/CordwainerSmith/whisper-small-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_dahml_dv.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_dahml_dv.md new file mode 100644 index 00000000000000..fabe22ca64f645 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_dahml_dv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_dahml WhisperForCTC from DahmL +author: John Snow Labs +name: whisper_small_divehi_dahml +date: 2024-09-22 +tags: [dv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_dahml` is a Dhivehi, Divehi, Maldivian model originally trained by DahmL. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_dahml_dv_5.5.0_3.0_1726982482552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_dahml_dv_5.5.0_3.0_1726982482552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_divehi_dahml", "dv") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# data is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_divehi_dahml", "dv")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

// data is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_dahml| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/DahmL/whisper-small-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_dahml_pipeline_dv.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_dahml_pipeline_dv.md new file mode 100644 index 00000000000000..f764abefc59170 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_dahml_pipeline_dv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_dahml_pipeline pipeline WhisperForCTC from DahmL +author: John Snow Labs +name: whisper_small_divehi_dahml_pipeline +date: 2024-09-22 +tags: [dv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_dahml_pipeline` is a Dhivehi, Divehi, Maldivian model originally trained by DahmL. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_dahml_pipeline_dv_5.5.0_3.0_1726982557555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_dahml_pipeline_dv_5.5.0_3.0_1726982557555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_divehi_dahml_pipeline", lang = "dv")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_divehi_dahml_pipeline", lang = "dv")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_dahml_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/DahmL/whisper-small-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_estonian_rristo_et.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_estonian_rristo_et.md new file mode 100644 index 00000000000000..5f4e7312b283b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_estonian_rristo_et.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Estonian whisper_small_estonian_rristo WhisperForCTC from rristo +author: John Snow Labs +name: whisper_small_estonian_rristo +date: 2024-09-22 +tags: [et, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: et +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_estonian_rristo` is a Estonian model originally trained by rristo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_estonian_rristo_et_5.5.0_3.0_1726997327646.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_estonian_rristo_et_5.5.0_3.0_1726997327646.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_estonian_rristo", "et") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# data is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_estonian_rristo", "et")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

// data is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_estonian_rristo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|et| +|Size:|1.7 GB| + +## References + +https://huggingface.co/rristo/whisper-small-et \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_estonian_rristo_pipeline_et.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_estonian_rristo_pipeline_et.md new file mode 100644 index 00000000000000..d1717b1c06e328 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_estonian_rristo_pipeline_et.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Estonian whisper_small_estonian_rristo_pipeline pipeline WhisperForCTC from rristo +author: John Snow Labs +name: whisper_small_estonian_rristo_pipeline +date: 2024-09-22 +tags: [et, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: et +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_estonian_rristo_pipeline` is a Estonian model originally trained by rristo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_estonian_rristo_pipeline_et_5.5.0_3.0_1726997409451.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_estonian_rristo_pipeline_et_5.5.0_3.0_1726997409451.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_estonian_rristo_pipeline", lang = "et")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_estonian_rristo_pipeline", lang = "et")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_estonian_rristo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|et| +|Size:|1.7 GB| + +## References + +https://huggingface.co/rristo/whisper-small-et + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_hindi_dv.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_hindi_dv.md new file mode 100644 index 00000000000000..0253475ca23a29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_hindi_dv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_hindi WhisperForCTC from Froptor +author: John Snow Labs +name: whisper_small_hindi +date: 2024-09-22 +tags: [dv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi` is a Dhivehi, Divehi, Maldivian model originally trained by Froptor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_dv_5.5.0_3.0_1726995593038.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_dv_5.5.0_3.0_1726995593038.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_hindi", "dv") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# data is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_hindi", "dv")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

// data is assumed to be a DataFrame with raw 16 kHz audio floats in a column named "audio_content"
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Froptor/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_hindi_pipeline_dv.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_hindi_pipeline_dv.md new file mode 100644 index 00000000000000..35d0d1032e7cd4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_hindi_pipeline_dv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_hindi_pipeline pipeline WhisperForCTC from Froptor +author: John Snow Labs +name: whisper_small_hindi_pipeline +date: 2024-09-22 +tags: [dv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_pipeline` is a Dhivehi, Divehi, Maldivian model originally trained by Froptor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_pipeline_dv_5.5.0_3.0_1726995680491.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_pipeline_dv_5.5.0_3.0_1726995680491.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an `audio_content` column holding the raw audio samples as an array of floats
pipeline = PretrainedPipeline("whisper_small_hindi_pipeline", lang = "dv")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an `audio_content` column holding the raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_hindi_pipeline", lang = "dv")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Froptor/whisper-small-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_hindi_srirama_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_hindi_srirama_en.md new file mode 100644 index 00000000000000..e0e92440d1131c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_hindi_srirama_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_hindi_srirama WhisperForCTC from srirama +author: John Snow Labs +name: whisper_small_hindi_srirama +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_srirama` is a English model originally trained by srirama. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_srirama_en_5.5.0_3.0_1727024528122.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_srirama_en_5.5.0_3.0_1727024528122.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is a DataFrame with an `audio_content` column holding the raw audio samples as an array of floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_hindi_srirama", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is a DataFrame with an `audio_content` column holding the raw audio samples
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_srirama", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_srirama| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/srirama/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_hindi_srirama_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_hindi_srirama_pipeline_en.md new file mode 100644 index 00000000000000..d4089dd724d260 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_hindi_srirama_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_hindi_srirama_pipeline pipeline WhisperForCTC from srirama +author: John Snow Labs +name: whisper_small_hindi_srirama_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_srirama_pipeline` is a English model originally trained by srirama. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_srirama_pipeline_en_5.5.0_3.0_1727024625869.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_srirama_pipeline_en_5.5.0_3.0_1727024625869.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an `audio_content` column holding the raw audio samples as an array of floats
pipeline = PretrainedPipeline("whisper_small_hindi_srirama_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an `audio_content` column holding the raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_hindi_srirama_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_srirama_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/srirama/whisper-small-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_hre3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_hre3_pipeline_en.md new file mode 100644 index 00000000000000..3edb62cffe0727 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_hre3_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_hre3_pipeline pipeline WhisperForCTC from ntviet +author: John Snow Labs +name: whisper_small_hre3_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hre3_pipeline` is a English model originally trained by ntviet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hre3_pipeline_en_5.5.0_3.0_1726995411402.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hre3_pipeline_en_5.5.0_3.0_1726995411402.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an `audio_content` column holding the raw audio samples as an array of floats
pipeline = PretrainedPipeline("whisper_small_hre3_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an `audio_content` column holding the raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_hre3_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hre3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ntviet/whisper-small-hre3 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_indonesian_v1_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_indonesian_v1_en.md new file mode 100644 index 00000000000000..5885114a249a8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_indonesian_v1_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_indonesian_v1 WhisperForCTC from yusufagung29 +author: John Snow Labs +name: whisper_small_indonesian_v1 +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_indonesian_v1` is a English model originally trained by yusufagung29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_v1_en_5.5.0_3.0_1726996678139.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_v1_en_5.5.0_3.0_1726996678139.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is a DataFrame with an `audio_content` column holding the raw audio samples as an array of floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_indonesian_v1", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is a DataFrame with an `audio_content` column holding the raw audio samples
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_indonesian_v1", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_indonesian_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/yusufagung29/whisper_small_id_v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_indonesian_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_indonesian_v1_pipeline_en.md new file mode 100644 index 00000000000000..3123028635a818 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_indonesian_v1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_indonesian_v1_pipeline pipeline WhisperForCTC from yusufagung29 +author: John Snow Labs +name: whisper_small_indonesian_v1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_indonesian_v1_pipeline` is a English model originally trained by yusufagung29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_v1_pipeline_en_5.5.0_3.0_1726996766502.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_v1_pipeline_en_5.5.0_3.0_1726996766502.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an `audio_content` column holding the raw audio samples as an array of floats
pipeline = PretrainedPipeline("whisper_small_indonesian_v1_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an `audio_content` column holding the raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_indonesian_v1_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_indonesian_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/yusufagung29/whisper_small_id_v1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_irish_ga.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_irish_ga.md new file mode 100644 index 00000000000000..e1e52029333fc8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_irish_ga.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Irish whisper_small_irish WhisperForCTC from callum-canavan +author: John Snow Labs +name: whisper_small_irish +date: 2024-09-22 +tags: [ga, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ga +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_irish` is a Irish model originally trained by callum-canavan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_irish_ga_5.5.0_3.0_1726997693397.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_irish_ga_5.5.0_3.0_1726997693397.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is a DataFrame with an `audio_content` column holding the raw audio samples as an array of floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_irish", "ga") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is a DataFrame with an `audio_content` column holding the raw audio samples
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_irish", "ga")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_irish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ga| +|Size:|1.7 GB| + +## References + +https://huggingface.co/callum-canavan/whisper-small-ga \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_irish_pipeline_ga.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_irish_pipeline_ga.md new file mode 100644 index 00000000000000..77dcb8905b75b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_irish_pipeline_ga.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Irish whisper_small_irish_pipeline pipeline WhisperForCTC from callum-canavan +author: John Snow Labs +name: whisper_small_irish_pipeline +date: 2024-09-22 +tags: [ga, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ga +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_irish_pipeline` is a Irish model originally trained by callum-canavan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_irish_pipeline_ga_5.5.0_3.0_1726997773111.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_irish_pipeline_ga_5.5.0_3.0_1726997773111.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an `audio_content` column holding the raw audio samples as an array of floats
pipeline = PretrainedPipeline("whisper_small_irish_pipeline", lang = "ga")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an `audio_content` column holding the raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_irish_pipeline", lang = "ga")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_irish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ga| +|Size:|1.7 GB| + +## References + +https://huggingface.co/callum-canavan/whisper-small-ga + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_italian_edoabati_it.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_italian_edoabati_it.md new file mode 100644 index 00000000000000..d8d81c68e89a5f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_italian_edoabati_it.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Italian whisper_small_italian_edoabati WhisperForCTC from EdoAbati +author: John Snow Labs +name: whisper_small_italian_edoabati +date: 2024-09-22 +tags: [it, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_italian_edoabati` is a Italian model originally trained by EdoAbati. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_italian_edoabati_it_5.5.0_3.0_1726994790477.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_italian_edoabati_it_5.5.0_3.0_1726994790477.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is a DataFrame with an `audio_content` column holding the raw audio samples as an array of floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_italian_edoabati", "it") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is a DataFrame with an `audio_content` column holding the raw audio samples
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_italian_edoabati", "it")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_italian_edoabati| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|it| +|Size:|1.7 GB| + +## References + +https://huggingface.co/EdoAbati/whisper-small-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_italian_edoabati_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_italian_edoabati_pipeline_it.md new file mode 100644 index 00000000000000..61829e0596c5e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_italian_edoabati_pipeline_it.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Italian whisper_small_italian_edoabati_pipeline pipeline WhisperForCTC from EdoAbati +author: John Snow Labs +name: whisper_small_italian_edoabati_pipeline +date: 2024-09-22 +tags: [it, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_italian_edoabati_pipeline` is a Italian model originally trained by EdoAbati. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_italian_edoabati_pipeline_it_5.5.0_3.0_1726994865149.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_italian_edoabati_pipeline_it_5.5.0_3.0_1726994865149.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an `audio_content` column holding the raw audio samples as an array of floats
pipeline = PretrainedPipeline("whisper_small_italian_edoabati_pipeline", lang = "it")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an `audio_content` column holding the raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_italian_edoabati_pipeline", lang = "it")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_italian_edoabati_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|1.7 GB| + +## References + +https://huggingface.co/EdoAbati/whisper-small-it + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_korean_eyfreq_speed_hi.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_korean_eyfreq_speed_hi.md new file mode 100644 index 00000000000000..e220af5923f5b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_korean_eyfreq_speed_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_small_korean_eyfreq_speed WhisperForCTC from Gummybear05 +author: John Snow Labs +name: whisper_small_korean_eyfreq_speed +date: 2024-09-22 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_korean_eyfreq_speed` is a Hindi model originally trained by Gummybear05. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_korean_eyfreq_speed_hi_5.5.0_3.0_1726997502947.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_korean_eyfreq_speed_hi_5.5.0_3.0_1726997502947.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is a DataFrame with an `audio_content` column holding the raw audio samples as an array of floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_korean_eyfreq_speed", "hi") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is a DataFrame with an `audio_content` column holding the raw audio samples
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_korean_eyfreq_speed", "hi")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_korean_eyfreq_speed| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Gummybear05/whisper-small-ko-EYfreq_speed \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_korean_eyfreq_speed_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_korean_eyfreq_speed_pipeline_hi.md new file mode 100644 index 00000000000000..0b079760b0b0bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_korean_eyfreq_speed_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_small_korean_eyfreq_speed_pipeline pipeline WhisperForCTC from Gummybear05 +author: John Snow Labs +name: whisper_small_korean_eyfreq_speed_pipeline +date: 2024-09-22 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_korean_eyfreq_speed_pipeline` is a Hindi model originally trained by Gummybear05. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_korean_eyfreq_speed_pipeline_hi_5.5.0_3.0_1726997584512.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_korean_eyfreq_speed_pipeline_hi_5.5.0_3.0_1726997584512.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an `audio_content` column holding the raw audio samples as an array of floats
pipeline = PretrainedPipeline("whisper_small_korean_eyfreq_speed_pipeline", lang = "hi")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an `audio_content` column holding the raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_korean_eyfreq_speed_pipeline", lang = "hi")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_korean_eyfreq_speed_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Gummybear05/whisper-small-ko-EYfreq_speed + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_macedonian_mk.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_macedonian_mk.md new file mode 100644 index 00000000000000..e380fc71db0e7c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_macedonian_mk.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Macedonian whisper_small_macedonian WhisperForCTC from goran +author: John Snow Labs +name: whisper_small_macedonian +date: 2024-09-22 +tags: [mk, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: mk +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_macedonian` is a Macedonian model originally trained by goran. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_macedonian_mk_5.5.0_3.0_1726995006790.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_macedonian_mk_5.5.0_3.0_1726995006790.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is a DataFrame with an `audio_content` column holding the raw audio samples as an array of floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_macedonian", "mk") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is a DataFrame with an `audio_content` column holding the raw audio samples
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_macedonian", "mk")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_macedonian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|mk| +|Size:|1.7 GB| + +## References + +https://huggingface.co/goran/whisper-small.mk \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_macedonian_pipeline_mk.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_macedonian_pipeline_mk.md new file mode 100644 index 00000000000000..c2cc46f7bf7375 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_macedonian_pipeline_mk.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Macedonian whisper_small_macedonian_pipeline pipeline WhisperForCTC from goran +author: John Snow Labs +name: whisper_small_macedonian_pipeline +date: 2024-09-22 +tags: [mk, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: mk +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_macedonian_pipeline` is a Macedonian model originally trained by goran. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_macedonian_pipeline_mk_5.5.0_3.0_1726995085225.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_macedonian_pipeline_mk_5.5.0_3.0_1726995085225.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an `audio_content` column holding the raw audio samples as an array of floats
pipeline = PretrainedPipeline("whisper_small_macedonian_pipeline", lang = "mk")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an `audio_content` column holding the raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_macedonian_pipeline", lang = "mk")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_macedonian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mk| +|Size:|1.7 GB| + +## References + +https://huggingface.co/goran/whisper-small.mk + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_mongolian_dorjzodovsuren_mn.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_mongolian_dorjzodovsuren_mn.md new file mode 100644 index 00000000000000..46ed2d38a53e0c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_mongolian_dorjzodovsuren_mn.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Mongolian whisper_small_mongolian_dorjzodovsuren WhisperForCTC from Dorjzodovsuren +author: John Snow Labs +name: whisper_small_mongolian_dorjzodovsuren +date: 2024-09-22 +tags: [mn, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_mongolian_dorjzodovsuren` is a Mongolian model originally trained by Dorjzodovsuren. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_mongolian_dorjzodovsuren_mn_5.5.0_3.0_1726997069151.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_mongolian_dorjzodovsuren_mn_5.5.0_3.0_1726997069151.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is a DataFrame with an `audio_content` column holding the raw audio samples as an array of floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_mongolian_dorjzodovsuren", "mn") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is a DataFrame with an `audio_content` column holding the raw audio samples
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_mongolian_dorjzodovsuren", "mn")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_mongolian_dorjzodovsuren| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|mn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Dorjzodovsuren/whisper-small-mn \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_mongolian_dorjzodovsuren_pipeline_mn.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_mongolian_dorjzodovsuren_pipeline_mn.md new file mode 100644 index 00000000000000..b1ea619aceb0a3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_mongolian_dorjzodovsuren_pipeline_mn.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Mongolian whisper_small_mongolian_dorjzodovsuren_pipeline pipeline WhisperForCTC from Dorjzodovsuren +author: John Snow Labs +name: whisper_small_mongolian_dorjzodovsuren_pipeline +date: 2024-09-22 +tags: [mn, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_mongolian_dorjzodovsuren_pipeline` is a Mongolian model originally trained by Dorjzodovsuren. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_mongolian_dorjzodovsuren_pipeline_mn_5.5.0_3.0_1726997159353.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_mongolian_dorjzodovsuren_pipeline_mn_5.5.0_3.0_1726997159353.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an `audio_content` column holding the raw audio samples as an array of floats
pipeline = PretrainedPipeline("whisper_small_mongolian_dorjzodovsuren_pipeline", lang = "mn")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an `audio_content` column holding the raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_mongolian_dorjzodovsuren_pipeline", lang = "mn")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_mongolian_dorjzodovsuren_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Dorjzodovsuren/whisper-small-mn + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_nomo_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_nomo_en.md new file mode 100644 index 00000000000000..f75355656058a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_nomo_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_nomo WhisperForCTC from susmitabhatt +author: John Snow Labs +name: whisper_small_nomo +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_nomo` is a English model originally trained by susmitabhatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_nomo_en_5.5.0_3.0_1726994862427.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_nomo_en_5.5.0_3.0_1726994862427.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is a DataFrame with an `audio_content` column holding the raw audio samples as an array of floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_nomo", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is a DataFrame with an `audio_content` column holding the raw audio samples
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_nomo", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_nomo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/susmitabhatt/whisper-small-nomo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_nomo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_nomo_pipeline_en.md new file mode 100644 index 00000000000000..cac50e713c82e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_nomo_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_nomo_pipeline pipeline WhisperForCTC from susmitabhatt +author: John Snow Labs +name: whisper_small_nomo_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_nomo_pipeline` is a English model originally trained by susmitabhatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_nomo_pipeline_en_5.5.0_3.0_1726994946969.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_nomo_pipeline_en_5.5.0_3.0_1726994946969.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an `audio_content` column holding the raw audio samples as an array of floats
pipeline = PretrainedPipeline("whisper_small_nomo_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an `audio_content` column holding the raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_nomo_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_nomo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/susmitabhatt/whisper-small-nomo + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_persian_farsi_javadr_fa.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_persian_farsi_javadr_fa.md new file mode 100644 index 00000000000000..d63d81db5142d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_persian_farsi_javadr_fa.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Persian whisper_small_persian_farsi_javadr WhisperForCTC from javadr +author: John Snow Labs +name: whisper_small_persian_farsi_javadr +date: 2024-09-22 +tags: [fa, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_persian_farsi_javadr` is a Persian model originally trained by javadr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_persian_farsi_javadr_fa_5.5.0_3.0_1726994734624.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_persian_farsi_javadr_fa_5.5.0_3.0_1726994734624.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is a DataFrame with an `audio_content` column holding the raw audio samples as an array of floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_persian_farsi_javadr", "fa") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is a DataFrame with an `audio_content` column holding the raw audio samples
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_persian_farsi_javadr", "fa")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_persian_farsi_javadr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|fa| +|Size:|1.1 GB| + +## References + +https://huggingface.co/javadr/whisper-small-fa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_persian_farsi_javadr_pipeline_fa.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_persian_farsi_javadr_pipeline_fa.md new file mode 100644 index 00000000000000..8e289663765457 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_persian_farsi_javadr_pipeline_fa.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Persian whisper_small_persian_farsi_javadr_pipeline pipeline WhisperForCTC from javadr +author: John Snow Labs +name: whisper_small_persian_farsi_javadr_pipeline +date: 2024-09-22 +tags: [fa, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_persian_farsi_javadr_pipeline` is a Persian model originally trained by javadr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_persian_farsi_javadr_pipeline_fa_5.5.0_3.0_1726995019059.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_persian_farsi_javadr_pipeline_fa_5.5.0_3.0_1726995019059.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an `audio_content` column holding the raw audio samples as an array of floats
pipeline = PretrainedPipeline("whisper_small_persian_farsi_javadr_pipeline", lang = "fa")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an `audio_content` column holding the raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_persian_farsi_javadr_pipeline", lang = "fa")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_persian_farsi_javadr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fa| +|Size:|1.1 GB| + +## References + +https://huggingface.co/javadr/whisper-small-fa + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_r2_50k_2ep_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_r2_50k_2ep_en.md new file mode 100644 index 00000000000000..4606a058038e4c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_r2_50k_2ep_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_r2_50k_2ep WhisperForCTC from spsither +author: John Snow Labs +name: whisper_small_r2_50k_2ep +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_r2_50k_2ep` is a English model originally trained by spsither. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_r2_50k_2ep_en_5.5.0_3.0_1727024814275.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_r2_50k_2ep_en_5.5.0_3.0_1727024814275.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is a DataFrame with an `audio_content` column holding the raw audio samples as an array of floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_r2_50k_2ep", "en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is a DataFrame with an `audio_content` column holding the raw audio samples
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_r2_50k_2ep", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_r2_50k_2ep| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/spsither/whisper-small-r2-50k-2ep \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_r2_50k_2ep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_r2_50k_2ep_pipeline_en.md new file mode 100644 index 00000000000000..91c0d0b459dffe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_r2_50k_2ep_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_r2_50k_2ep_pipeline pipeline WhisperForCTC from spsither +author: John Snow Labs +name: whisper_small_r2_50k_2ep_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_r2_50k_2ep_pipeline` is a English model originally trained by spsither. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_r2_50k_2ep_pipeline_en_5.5.0_3.0_1727024901961.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_r2_50k_2ep_pipeline_en_5.5.0_3.0_1727024901961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an `audio_content` column holding the raw audio samples as an array of floats
pipeline = PretrainedPipeline("whisper_small_r2_50k_2ep_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an `audio_content` column holding the raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_r2_50k_2ep_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_r2_50k_2ep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/spsither/whisper-small-r2-50k-2ep + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_serbian_combined_pipeline_sr.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_serbian_combined_pipeline_sr.md new file mode 100644 index 00000000000000..1baad796b30a2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_serbian_combined_pipeline_sr.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Serbian whisper_small_serbian_combined_pipeline pipeline WhisperForCTC from Sagicc +author: John Snow Labs +name: whisper_small_serbian_combined_pipeline +date: 2024-09-22 +tags: [sr, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: sr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_serbian_combined_pipeline` is a Serbian model originally trained by Sagicc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_serbian_combined_pipeline_sr_5.5.0_3.0_1726994926452.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_serbian_combined_pipeline_sr_5.5.0_3.0_1726994926452.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an `audio_content` column holding the raw audio samples as an array of floats
pipeline = PretrainedPipeline("whisper_small_serbian_combined_pipeline", lang = "sr")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an `audio_content` column holding the raw audio samples
val pipeline = new PretrainedPipeline("whisper_small_serbian_combined_pipeline", lang = "sr")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_serbian_combined_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sr| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Sagicc/whisper-small-sr-combined + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_serbian_combined_sr.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_serbian_combined_sr.md new file mode 100644 index 00000000000000..dbd759aae0d394 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_serbian_combined_sr.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Serbian whisper_small_serbian_combined WhisperForCTC from Sagicc +author: John Snow Labs +name: whisper_small_serbian_combined +date: 2024-09-22 +tags: [sr, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: sr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_serbian_combined` is a Serbian model originally trained by Sagicc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_serbian_combined_sr_5.5.0_3.0_1726994849214.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_serbian_combined_sr_5.5.0_3.0_1726994849214.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline

# `data` is a DataFrame with an "audio_content" column holding arrays of audio floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_serbian_combined","sr") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is a DataFrame with an "audio_content" column holding arrays of audio floats
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_serbian_combined", "sr")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
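The `fit`/`transform` calls above assume a `data` DataFrame is already defined. A minimal sketch of how it could be created, assuming librosa is available and `recording.wav` is a placeholder path:

```python
import sparknlp
import librosa  # assumed decoder; any library that yields float arrays works
from pyspark.sql import functions as F

spark = sparknlp.start()

# Whisper models are trained on 16 kHz mono audio
audio, _ = librosa.load("recording.wav", sr=16000, mono=True)

data = spark.createDataFrame([[audio.tolist()]]).toDF("audio_content") \
    .withColumn("audio_content", F.col("audio_content").cast("array<float>"))
```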
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_serbian_combined| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|sr| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Sagicc/whisper-small-sr-combined \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_spanish_kevincrb_es.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_spanish_kevincrb_es.md new file mode 100644 index 00000000000000..9414e849a97444 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_spanish_kevincrb_es.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Castilian, Spanish whisper_small_spanish_kevincrb WhisperForCTC from KevinCRB +author: John Snow Labs +name: whisper_small_spanish_kevincrb +date: 2024-09-22 +tags: [es, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_spanish_kevincrb` is a Castilian, Spanish model originally trained by KevinCRB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_spanish_kevincrb_es_5.5.0_3.0_1726982108195.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_spanish_kevincrb_es_5.5.0_3.0_1726982108195.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline

# `data` is a DataFrame with an "audio_content" column holding arrays of audio floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_spanish_kevincrb","es") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is a DataFrame with an "audio_content" column holding arrays of audio floats
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_spanish_kevincrb", "es")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
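Continuing from the snippet above, the transcriptions can be read out of `pipelineDF`; `text` is the output column set on the annotator, and each entry is a Spark NLP annotation whose `result` field holds the decoded string:

```python
# One row per transcribed segment
pipelineDF.selectExpr("explode(text.result) as transcription").show(truncate=False)

# Or collect into plain Python strings
transcripts = [row.transcription
               for row in pipelineDF.selectExpr("explode(text.result) as transcription").collect()]
```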
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_spanish_kevincrb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|es| +|Size:|1.7 GB| + +## References + +https://huggingface.co/KevinCRB/whisper-small-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_spanish_kevincrb_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_spanish_kevincrb_pipeline_es.md new file mode 100644 index 00000000000000..ed7906beca3f04 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_spanish_kevincrb_pipeline_es.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Castilian, Spanish whisper_small_spanish_kevincrb_pipeline pipeline WhisperForCTC from KevinCRB +author: John Snow Labs +name: whisper_small_spanish_kevincrb_pipeline +date: 2024-09-22 +tags: [es, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_spanish_kevincrb_pipeline` is a Castilian, Spanish model originally trained by KevinCRB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_spanish_kevincrb_pipeline_es_5.5.0_3.0_1726982195496.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_spanish_kevincrb_pipeline_es_5.5.0_3.0_1726982195496.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an "audio_content" column holding arrays of audio floats
pipeline = PretrainedPipeline("whisper_small_spanish_kevincrb_pipeline", lang = "es")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an "audio_content" column holding arrays of audio floats
val pipeline = new PretrainedPipeline("whisper_small_spanish_kevincrb_pipeline", lang = "es")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_spanish_kevincrb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|1.7 GB| + +## References + +https://huggingface.co/KevinCRB/whisper-small-es + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_tamil_steja_pipeline_ta.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_tamil_steja_pipeline_ta.md new file mode 100644 index 00000000000000..e662841be2f83a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_tamil_steja_pipeline_ta.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Tamil whisper_small_tamil_steja_pipeline pipeline WhisperForCTC from steja +author: John Snow Labs +name: whisper_small_tamil_steja_pipeline +date: 2024-09-22 +tags: [ta, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ta +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_tamil_steja_pipeline` is a Tamil model originally trained by steja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_tamil_steja_pipeline_ta_5.5.0_3.0_1726983559495.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_tamil_steja_pipeline_ta_5.5.0_3.0_1726983559495.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an "audio_content" column holding arrays of audio floats
pipeline = PretrainedPipeline("whisper_small_tamil_steja_pipeline", lang = "ta")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an "audio_content" column holding arrays of audio floats
val pipeline = new PretrainedPipeline("whisper_small_tamil_steja_pipeline", lang = "ta")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_tamil_steja_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ta| +|Size:|1.7 GB| + +## References + +https://huggingface.co/steja/whisper-small-tamil + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_tamil_steja_ta.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_tamil_steja_ta.md new file mode 100644 index 00000000000000..6e8e8a69145e80 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_tamil_steja_ta.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Tamil whisper_small_tamil_steja WhisperForCTC from steja +author: John Snow Labs +name: whisper_small_tamil_steja +date: 2024-09-22 +tags: [ta, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ta +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_tamil_steja` is a Tamil model originally trained by steja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_tamil_steja_ta_5.5.0_3.0_1726983471823.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_tamil_steja_ta_5.5.0_3.0_1726983471823.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline

# `data` is a DataFrame with an "audio_content" column holding arrays of audio floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_tamil_steja","ta") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is a DataFrame with an "audio_content" column holding arrays of audio floats
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_tamil_steja", "ta")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
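The fitted `pipelineModel` from the snippet above is a regular Spark ML `PipelineModel`, so it can be persisted once and reloaded without downloading the checkpoint again; the path below is a placeholder:

```python
from pyspark.ml import PipelineModel

pipelineModel.write().overwrite().save("/tmp/whisper_small_tamil_steja_pipeline_model")

restored = PipelineModel.load("/tmp/whisper_small_tamil_steja_pipeline_model")
restoredDF = restored.transform(data)
```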
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_tamil_steja| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ta| +|Size:|1.7 GB| + +## References + +https://huggingface.co/steja/whisper-small-tamil \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_turkish_cp2_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_turkish_cp2_pipeline_tr.md new file mode 100644 index 00000000000000..437fd1f9924849 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_turkish_cp2_pipeline_tr.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Turkish whisper_small_turkish_cp2_pipeline pipeline WhisperForCTC from Kiwipirate +author: John Snow Labs +name: whisper_small_turkish_cp2_pipeline +date: 2024-09-22 +tags: [tr, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_turkish_cp2_pipeline` is a Turkish model originally trained by Kiwipirate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_cp2_pipeline_tr_5.5.0_3.0_1727025076704.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_cp2_pipeline_tr_5.5.0_3.0_1727025076704.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an "audio_content" column holding arrays of audio floats
pipeline = PretrainedPipeline("whisper_small_turkish_cp2_pipeline", lang = "tr")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an "audio_content" column holding arrays of audio floats
val pipeline = new PretrainedPipeline("whisper_small_turkish_cp2_pipeline", lang = "tr")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_turkish_cp2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Kiwipirate/whisper-small-tr-cp2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_turkish_cp2_tr.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_turkish_cp2_tr.md new file mode 100644 index 00000000000000..0864d327267c67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_turkish_cp2_tr.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Turkish whisper_small_turkish_cp2 WhisperForCTC from Kiwipirate +author: John Snow Labs +name: whisper_small_turkish_cp2 +date: 2024-09-22 +tags: [tr, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_turkish_cp2` is a Turkish model originally trained by Kiwipirate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_cp2_tr_5.5.0_3.0_1727024993129.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_cp2_tr_5.5.0_3.0_1727024993129.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline

# `data` is a DataFrame with an "audio_content" column holding arrays of audio floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_turkish_cp2","tr") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is a DataFrame with an "audio_content" column holding arrays of audio floats
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_turkish_cp2", "tr")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_turkish_cp2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|tr| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Kiwipirate/whisper-small-tr-cp2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_uoseftalaat_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_uoseftalaat_en.md new file mode 100644 index 00000000000000..7b004ffee1ceac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_uoseftalaat_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_uoseftalaat WhisperForCTC from uoseftalaat +author: John Snow Labs +name: whisper_small_uoseftalaat +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_uoseftalaat` is a English model originally trained by uoseftalaat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_uoseftalaat_en_5.5.0_3.0_1726997256931.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_uoseftalaat_en_5.5.0_3.0_1726997256931.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline

# `data` is a DataFrame with an "audio_content" column holding arrays of audio floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_uoseftalaat","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is a DataFrame with an "audio_content" column holding arrays of audio floats
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_uoseftalaat", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
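Whisper checkpoints are trained on 16 kHz mono audio, so recordings at other sample rates should be resampled before they reach the `audio_content` column. A sketch assuming librosa, a placeholder 44.1 kHz file, and `spark` being an active Spark NLP session (e.g. from `sparknlp.start()`):

```python
import librosa  # assumed available

# Keep the native rate first, then resample to what the model expects
audio_44k, native_sr = librosa.load("interview_44k.wav", sr=None, mono=True)
audio_16k = librosa.resample(audio_44k, orig_sr=native_sr, target_sr=16000)

data = spark.createDataFrame([[audio_16k.tolist()]]).toDF("audio_content")
```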
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_uoseftalaat| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/uoseftalaat/whisper-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_uoseftalaat_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_uoseftalaat_pipeline_en.md new file mode 100644 index 00000000000000..7a318e4635eb6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_uoseftalaat_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_uoseftalaat_pipeline pipeline WhisperForCTC from uoseftalaat +author: John Snow Labs +name: whisper_small_uoseftalaat_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_uoseftalaat_pipeline` is a English model originally trained by uoseftalaat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_uoseftalaat_pipeline_en_5.5.0_3.0_1726997335760.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_uoseftalaat_pipeline_en_5.5.0_3.0_1726997335760.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an "audio_content" column holding arrays of audio floats
pipeline = PretrainedPipeline("whisper_small_uoseftalaat_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an "audio_content" column holding arrays of audio floats
val pipeline = new PretrainedPipeline("whisper_small_uoseftalaat_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_uoseftalaat_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/uoseftalaat/whisper-small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_uzbek_with_uzbekvoice_pipeline_uz.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_uzbek_with_uzbekvoice_pipeline_uz.md new file mode 100644 index 00000000000000..3b8f4d20fbdc81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_uzbek_with_uzbekvoice_pipeline_uz.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Uzbek whisper_small_uzbek_with_uzbekvoice_pipeline pipeline WhisperForCTC from aslon1213 +author: John Snow Labs +name: whisper_small_uzbek_with_uzbekvoice_pipeline +date: 2024-09-22 +tags: [uz, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: uz +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_uzbek_with_uzbekvoice_pipeline` is a Uzbek model originally trained by aslon1213. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_uzbek_with_uzbekvoice_pipeline_uz_5.5.0_3.0_1726984917682.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_uzbek_with_uzbekvoice_pipeline_uz_5.5.0_3.0_1726984917682.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an "audio_content" column holding arrays of audio floats
pipeline = PretrainedPipeline("whisper_small_uzbek_with_uzbekvoice_pipeline", lang = "uz")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an "audio_content" column holding arrays of audio floats
val pipeline = new PretrainedPipeline("whisper_small_uzbek_with_uzbekvoice_pipeline", lang = "uz")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_uzbek_with_uzbekvoice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|uz| +|Size:|1.7 GB| + +## References + +https://huggingface.co/aslon1213/whisper-small-uz-with-uzbekvoice + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_uzbek_with_uzbekvoice_uz.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_uzbek_with_uzbekvoice_uz.md new file mode 100644 index 00000000000000..10b019058b5d98 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_uzbek_with_uzbekvoice_uz.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Uzbek whisper_small_uzbek_with_uzbekvoice WhisperForCTC from aslon1213 +author: John Snow Labs +name: whisper_small_uzbek_with_uzbekvoice +date: 2024-09-22 +tags: [uz, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: uz +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_uzbek_with_uzbekvoice` is a Uzbek model originally trained by aslon1213. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_uzbek_with_uzbekvoice_uz_5.5.0_3.0_1726984832480.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_uzbek_with_uzbekvoice_uz_5.5.0_3.0_1726984832480.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline

# `data` is a DataFrame with an "audio_content" column holding arrays of audio floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_uzbek_with_uzbekvoice","uz") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is a DataFrame with an "audio_content" column holding arrays of audio floats
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_uzbek_with_uzbekvoice", "uz")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
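Beyond the defaults shown above, `WhisperForCTC` also exposes batching and decoding setters; the parameter names and the `<|transcribe|>` task token below follow the documented Spark NLP Whisper API but should be treated as assumptions to verify against your installed version:

```python
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

# Optional decoding controls on top of the pretrained checkpoint
speechToText = WhisperForCTC.pretrained("whisper_small_uzbek_with_uzbekvoice", "uz") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text") \
    .setBatchSize(2) \
    .setTask("<|transcribe|>")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
```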
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_uzbek_with_uzbekvoice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|uz| +|Size:|1.7 GB| + +## References + +https://huggingface.co/aslon1213/whisper-small-uz-with-uzbekvoice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_yoruba_omoekan_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_yoruba_omoekan_en.md new file mode 100644 index 00000000000000..da31c2a0b19b77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_yoruba_omoekan_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_yoruba_omoekan WhisperForCTC from omoekan +author: John Snow Labs +name: whisper_small_yoruba_omoekan +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_yoruba_omoekan` is a English model originally trained by omoekan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_yoruba_omoekan_en_5.5.0_3.0_1727025247375.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_yoruba_omoekan_en_5.5.0_3.0_1727025247375.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline

# `data` is a DataFrame with an "audio_content" column holding arrays of audio floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_yoruba_omoekan","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is a DataFrame with an "audio_content" column holding arrays of audio floats
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_yoruba_omoekan", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_yoruba_omoekan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/omoekan/whisper-small-yoruba \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_yoruba_omoekan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_yoruba_omoekan_pipeline_en.md new file mode 100644 index 00000000000000..4ac11dcf4acae6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_yoruba_omoekan_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_yoruba_omoekan_pipeline pipeline WhisperForCTC from omoekan +author: John Snow Labs +name: whisper_small_yoruba_omoekan_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_yoruba_omoekan_pipeline` is a English model originally trained by omoekan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_yoruba_omoekan_pipeline_en_5.5.0_3.0_1727025329863.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_yoruba_omoekan_pipeline_en_5.5.0_3.0_1727025329863.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an "audio_content" column holding arrays of audio floats
pipeline = PretrainedPipeline("whisper_small_yoruba_omoekan_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an "audio_content" column holding arrays of audio floats
val pipeline = new PretrainedPipeline("whisper_small_yoruba_omoekan_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_yoruba_omoekan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/omoekan/whisper-small-yoruba + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_tamil_v2_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_tamil_v2_en.md new file mode 100644 index 00000000000000..7cbaf241c2121f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_tamil_v2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tamil_v2 WhisperForCTC from tamilnlpSLIIT +author: John Snow Labs +name: whisper_tamil_v2 +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tamil_v2` is a English model originally trained by tamilnlpSLIIT. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tamil_v2_en_5.5.0_3.0_1726994139593.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tamil_v2_en_5.5.0_3.0_1726994139593.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline

# `data` is a DataFrame with an "audio_content" column holding arrays of audio floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tamil_v2","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is a DataFrame with an "audio_content" column holding arrays of audio floats
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tamil_v2", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
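Continuing from the snippet above, the decoded text can be written straight to storage with standard Spark writers; the output directory is a placeholder:

```python
# One JSON record per transcribed clip
(pipelineDF
    .selectExpr("explode(text.result) as transcription")
    .write.mode("overwrite")
    .json("whisper_tamil_v2_transcriptions"))
```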
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tamil_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.0 MB| + +## References + +https://huggingface.co/tamilnlpSLIIT/whisper-ta-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_tamil_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_tamil_v2_pipeline_en.md new file mode 100644 index 00000000000000..5847ad92cf8f8d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_tamil_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tamil_v2_pipeline pipeline WhisperForCTC from tamilnlpSLIIT +author: John Snow Labs +name: whisper_tamil_v2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tamil_v2_pipeline` is a English model originally trained by tamilnlpSLIIT. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tamil_v2_pipeline_en_5.5.0_3.0_1726994158336.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tamil_v2_pipeline_en_5.5.0_3.0_1726994158336.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an "audio_content" column holding arrays of audio floats
pipeline = PretrainedPipeline("whisper_tamil_v2_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an "audio_content" column holding arrays of audio floats
val pipeline = new PretrainedPipeline("whisper_tamil_v2_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tamil_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.1 MB| + +## References + +https://huggingface.co/tamilnlpSLIIT/whisper-ta-v2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_final_cafet_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_final_cafet_en.md new file mode 100644 index 00000000000000..5124ff12e7ff9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_final_cafet_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_final_cafet WhisperForCTC from Cafet +author: John Snow Labs +name: whisper_tiny_final_cafet +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_final_cafet` is a English model originally trained by Cafet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_final_cafet_en_5.5.0_3.0_1726983621589.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_final_cafet_en_5.5.0_3.0_1726983621589.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline

# `data` is a DataFrame with an "audio_content" column holding arrays of audio floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_final_cafet","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is a DataFrame with an "audio_content" column holding arrays of audio floats
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_final_cafet", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_final_cafet| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.6 MB| + +## References + +https://huggingface.co/Cafet/whisper-tiny-final \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_final_cafet_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_final_cafet_pipeline_en.md new file mode 100644 index 00000000000000..01037aea44f1c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_final_cafet_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_final_cafet_pipeline pipeline WhisperForCTC from Cafet +author: John Snow Labs +name: whisper_tiny_final_cafet_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_final_cafet_pipeline` is a English model originally trained by Cafet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_final_cafet_pipeline_en_5.5.0_3.0_1726983640075.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_final_cafet_pipeline_en_5.5.0_3.0_1726983640075.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an "audio_content" column holding arrays of audio floats
pipeline = PretrainedPipeline("whisper_tiny_final_cafet_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an "audio_content" column holding arrays of audio floats
val pipeline = new PretrainedPipeline("whisper_tiny_final_cafet_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_final_cafet_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.6 MB| + +## References + +https://huggingface.co/Cafet/whisper-tiny-final + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_finetuned_minds14_english_v2_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_finetuned_minds14_english_v2_en.md new file mode 100644 index 00000000000000..a17df38f3bee9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_finetuned_minds14_english_v2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_finetuned_minds14_english_v2 WhisperForCTC from vineetsharma +author: John Snow Labs +name: whisper_tiny_finetuned_minds14_english_v2 +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_finetuned_minds14_english_v2` is a English model originally trained by vineetsharma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetuned_minds14_english_v2_en_5.5.0_3.0_1726981231933.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetuned_minds14_english_v2_en_5.5.0_3.0_1726981231933.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline

# `data` is a DataFrame with an "audio_content" column holding arrays of audio floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_finetuned_minds14_english_v2","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is a DataFrame with an "audio_content" column holding arrays of audio floats
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_finetuned_minds14_english_v2", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
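To transcribe a batch of clips rather than a single recording, the `data` DataFrame used above can simply hold one row per file. A sketch assuming librosa, a placeholder folder of WAV files, and an active `spark` session:

```python
import glob
import librosa  # assumed available

paths = sorted(glob.glob("minds14_samples/*.wav"))  # placeholder folder
rows = [[librosa.load(p, sr=16000, mono=True)[0].tolist()] for p in paths]

data = spark.createDataFrame(rows).toDF("audio_content")
# pass `data` to pipeline.fit(data).transform(data) exactly as in the snippet above
```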
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_finetuned_minds14_english_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/vineetsharma/whisper-tiny-finetuned-minds14-en-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_finetuned_minds14_english_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_finetuned_minds14_english_v2_pipeline_en.md new file mode 100644 index 00000000000000..cd0522c8296b65 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_finetuned_minds14_english_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_finetuned_minds14_english_v2_pipeline pipeline WhisperForCTC from vineetsharma +author: John Snow Labs +name: whisper_tiny_finetuned_minds14_english_v2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_finetuned_minds14_english_v2_pipeline` is a English model originally trained by vineetsharma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetuned_minds14_english_v2_pipeline_en_5.5.0_3.0_1726981250710.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetuned_minds14_english_v2_pipeline_en_5.5.0_3.0_1726981250710.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an "audio_content" column holding arrays of audio floats
pipeline = PretrainedPipeline("whisper_tiny_finetuned_minds14_english_v2_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an "audio_content" column holding arrays of audio floats
val pipeline = new PretrainedPipeline("whisper_tiny_finetuned_minds14_english_v2_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
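The downloaded object wraps an ordinary Spark ML `PipelineModel`; assuming the `model` attribute exposed by `PretrainedPipeline`, the included stages (AudioAssembler and WhisperForCTC, as listed under Included Models) can be inspected like this:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("whisper_tiny_finetuned_minds14_english_v2_pipeline", lang="en")

# Print the class name of each stage bundled in the pipeline
for stage in pipeline.model.stages:
    print(type(stage).__name__)
```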
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_finetuned_minds14_english_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/vineetsharma/whisper-tiny-finetuned-minds14-en-v2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_italian_mattiasu96_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_italian_mattiasu96_pipeline_it.md new file mode 100644 index 00000000000000..9c2535faa2aec4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_italian_mattiasu96_pipeline_it.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Italian whisper_tiny_italian_mattiasu96_pipeline pipeline WhisperForCTC from mattiasu96 +author: John Snow Labs +name: whisper_tiny_italian_mattiasu96_pipeline +date: 2024-09-22 +tags: [it, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_italian_mattiasu96_pipeline` is a Italian model originally trained by mattiasu96. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_italian_mattiasu96_pipeline_it_5.5.0_3.0_1727022253421.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_italian_mattiasu96_pipeline_it_5.5.0_3.0_1727022253421.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an "audio_content" column holding arrays of audio floats
pipeline = PretrainedPipeline("whisper_tiny_italian_mattiasu96_pipeline", lang = "it")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an "audio_content" column holding arrays of audio floats
val pipeline = new PretrainedPipeline("whisper_tiny_italian_mattiasu96_pipeline", lang = "it")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_italian_mattiasu96_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|390.8 MB| + +## References + +https://huggingface.co/mattiasu96/whisper-tiny-it + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_english_us_ghassenhannachi_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_english_us_ghassenhannachi_en.md new file mode 100644 index 00000000000000..b7d4d1e6325f73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_english_us_ghassenhannachi_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_minds14_english_us_ghassenhannachi WhisperForCTC from ghassenhannachi +author: John Snow Labs +name: whisper_tiny_minds14_english_us_ghassenhannachi +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_english_us_ghassenhannachi` is a English model originally trained by ghassenhannachi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_us_ghassenhannachi_en_5.5.0_3.0_1726994995336.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_us_ghassenhannachi_en_5.5.0_3.0_1726994995336.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline

# `data` is a DataFrame with an "audio_content" column holding arrays of audio floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_english_us_ghassenhannachi","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is a DataFrame with an "audio_content" column holding arrays of audio floats
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_english_us_ghassenhannachi", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_english_us_ghassenhannachi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/ghassenhannachi/whisper-tiny-minds14-en-us \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_english_us_ghassenhannachi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_english_us_ghassenhannachi_pipeline_en.md new file mode 100644 index 00000000000000..130bc1b6d929d4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_english_us_ghassenhannachi_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_minds14_english_us_ghassenhannachi_pipeline pipeline WhisperForCTC from ghassenhannachi +author: John Snow Labs +name: whisper_tiny_minds14_english_us_ghassenhannachi_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_english_us_ghassenhannachi_pipeline` is a English model originally trained by ghassenhannachi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_us_ghassenhannachi_pipeline_en_5.5.0_3.0_1726995013770.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_us_ghassenhannachi_pipeline_en_5.5.0_3.0_1726995013770.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame with an "audio_content" column holding arrays of audio floats
pipeline = PretrainedPipeline("whisper_tiny_minds14_english_us_ghassenhannachi_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame with an "audio_content" column holding arrays of audio floats
val pipeline = new PretrainedPipeline("whisper_tiny_minds14_english_us_ghassenhannachi_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_english_us_ghassenhannachi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.0 MB| + +## References + +https://huggingface.co/ghassenhannachi/whisper-tiny-minds14-en-us + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_hewliyang_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_hewliyang_en.md new file mode 100644 index 00000000000000..848ee5c99e1235 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_hewliyang_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_minds14_hewliyang WhisperForCTC from hewliyang +author: John Snow Labs +name: whisper_tiny_minds14_hewliyang +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_hewliyang` is a English model originally trained by hewliyang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_hewliyang_en_5.5.0_3.0_1727022270391.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_hewliyang_en_5.5.0_3.0_1727022270391.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be a DataFrame with an "audio_content" column holding audio samples as floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_hewliyang","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// "data" is assumed to be a DataFrame with an "audio_content" column holding audio samples as floats
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_hewliyang", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
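Once `pipelineDF` has been computed, the transcription can be read from the `text` output column. A minimal sketch, assuming `data` above held real audio:

```python
# "result" contains the plain transcription string inside each annotation.
pipelineDF.select("text.result").show(truncate=False)
```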
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_hewliyang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/hewliyang/whisper-tiny-minds14 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_hewliyang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_hewliyang_pipeline_en.md new file mode 100644 index 00000000000000..164fe8f34b3e97 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_hewliyang_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_minds14_hewliyang_pipeline pipeline WhisperForCTC from hewliyang +author: John Snow Labs +name: whisper_tiny_minds14_hewliyang_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_hewliyang_pipeline` is a English model originally trained by hewliyang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_hewliyang_pipeline_en_5.5.0_3.0_1727022290284.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_hewliyang_pipeline_en_5.5.0_3.0_1727022290284.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_minds14_hewliyang_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_minds14_hewliyang_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_hewliyang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.0 MB| + +## References + +https://huggingface.co/hewliyang/whisper-tiny-minds14 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_olegs_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_olegs_en.md new file mode 100644 index 00000000000000..278720d85c126d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_olegs_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_minds14_olegs WhisperForCTC from olegs +author: John Snow Labs +name: whisper_tiny_minds14_olegs +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_olegs` is a English model originally trained by olegs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_olegs_en_5.5.0_3.0_1726994141600.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_olegs_en_5.5.0_3.0_1726994141600.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# "data" is assumed to be a DataFrame with an "audio_content" column holding audio samples as floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_olegs","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// "data" is assumed to be a DataFrame with an "audio_content" column holding audio samples as floats
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_olegs", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_olegs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/olegs/whisper-tiny-minds14 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_olegs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_olegs_pipeline_en.md new file mode 100644 index 00000000000000..48c97c73aa499e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_olegs_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_minds14_olegs_pipeline pipeline WhisperForCTC from olegs +author: John Snow Labs +name: whisper_tiny_minds14_olegs_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_olegs_pipeline` is a English model originally trained by olegs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_olegs_pipeline_en_5.5.0_3.0_1726994160440.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_olegs_pipeline_en_5.5.0_3.0_1726994160440.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_minds14_olegs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_minds14_olegs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_olegs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/olegs/whisper-tiny-minds14 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-withinapps_ndd_mrbs_test_tags_cwadj_en.md b/docs/_posts/ahmedlone127/2024-09-22-withinapps_ndd_mrbs_test_tags_cwadj_en.md new file mode 100644 index 00000000000000..8ce181a1d323f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-withinapps_ndd_mrbs_test_tags_cwadj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English withinapps_ndd_mrbs_test_tags_cwadj DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_mrbs_test_tags_cwadj +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_mrbs_test_tags_cwadj` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mrbs_test_tags_cwadj_en_5.5.0_3.0_1727012231607.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mrbs_test_tags_cwadj_en_5.5.0_3.0_1727012231607.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_mrbs_test_tags_cwadj","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_mrbs_test_tags_cwadj", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
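After the pipeline runs, the predicted label for each input row is stored in the `class` output column. A minimal sketch of reading it back:

```python
# Show the input text next to the predicted class label.
pipelineDF.select("text", "class.result").show(truncate=False)
```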
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_mrbs_test_tags_cwadj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-mrbs_test-tags-CWAdj \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-withinapps_ndd_mrbs_test_tags_cwadj_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-withinapps_ndd_mrbs_test_tags_cwadj_pipeline_en.md new file mode 100644 index 00000000000000..92e1ab20aff7a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-withinapps_ndd_mrbs_test_tags_cwadj_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English withinapps_ndd_mrbs_test_tags_cwadj_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_mrbs_test_tags_cwadj_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_mrbs_test_tags_cwadj_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mrbs_test_tags_cwadj_pipeline_en_5.5.0_3.0_1727012247048.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mrbs_test_tags_cwadj_pipeline_en_5.5.0_3.0_1727012247048.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("withinapps_ndd_mrbs_test_tags_cwadj_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("withinapps_ndd_mrbs_test_tags_cwadj_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
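The snippet above leaves `df` undefined. Below is a minimal sketch of preparing a text DataFrame and inspecting the predictions; the `text` input column follows the DocumentAssembler stage listed under Included Models, and the `class` output column mirrors the standalone model card for this model.

```python
from sparknlp.pretrained import PretrainedPipeline

# The DocumentAssembler stage of the pretrained pipeline reads the "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("withinapps_ndd_mrbs_test_tags_cwadj_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.select("class.result").show(truncate=False)
```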
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_mrbs_test_tags_cwadj_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-mrbs_test-tags-CWAdj + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_1_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_1_en.md new file mode 100644 index 00000000000000..8f60509684b7c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_1 XlmRoBertaForSequenceClassification from alyazharr +author: John Snow Labs +name: xlm_roberta_base_1 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_1` is a English model originally trained by alyazharr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_1_en_5.5.0_3.0_1727009665776.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_1_en_5.5.0_3.0_1727009665776.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_1", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|827.5 MB| + +## References + +https://huggingface.co/alyazharr/xlm_roberta_base_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_1_pipeline_en.md new file mode 100644 index 00000000000000..aa559204ea09b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_1_pipeline pipeline XlmRoBertaForSequenceClassification from alyazharr +author: John Snow Labs +name: xlm_roberta_base_1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_1_pipeline` is a English model originally trained by alyazharr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_1_pipeline_en_5.5.0_3.0_1727009749083.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_1_pipeline_en_5.5.0_3.0_1727009749083.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.5 MB| + +## References + +https://huggingface.co/alyazharr/xlm_roberta_base_1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_balance_vietnam_aug_replace_w2v_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_balance_vietnam_aug_replace_w2v_en.md new file mode 100644 index 00000000000000..c580d1940bb1be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_balance_vietnam_aug_replace_w2v_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_balance_vietnam_aug_replace_w2v XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_balance_vietnam_aug_replace_w2v +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_balance_vietnam_aug_replace_w2v` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_replace_w2v_en_5.5.0_3.0_1727010095030.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_replace_w2v_en_5.5.0_3.0_1727010095030.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_balance_vietnam_aug_replace_w2v","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_balance_vietnam_aug_replace_w2v", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_balance_vietnam_aug_replace_w2v| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|795.9 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Balance_VietNam-aug_replace_w2v \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_balance_vietnam_aug_replace_w2v_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_balance_vietnam_aug_replace_w2v_pipeline_en.md new file mode 100644 index 00000000000000..bd7288323e4a13 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_balance_vietnam_aug_replace_w2v_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_balance_vietnam_aug_replace_w2v_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_balance_vietnam_aug_replace_w2v_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_balance_vietnam_aug_replace_w2v_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_replace_w2v_pipeline_en_5.5.0_3.0_1727010210731.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_replace_w2v_pipeline_en_5.5.0_3.0_1727010210731.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_balance_vietnam_aug_replace_w2v_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_balance_vietnam_aug_replace_w2v_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_balance_vietnam_aug_replace_w2v_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|796.0 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Balance_VietNam-aug_replace_w2v + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_arned_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_arned_en.md new file mode 100644 index 00000000000000..6162793497f444 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_arned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_arned XlmRoBertaForTokenClassification from ArneD +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_arned +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_arned` is a English model originally trained by ArneD. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_arned_en_5.5.0_3.0_1726969941698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_arned_en_5.5.0_3.0_1726969941698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_arned","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_arned", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
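To pair each token with its predicted tag after the pipeline runs, the `token` and `ner` columns can be zipped together. A minimal sketch using standard Spark SQL functions:

```python
# Explode the zipped token/tag arrays into one row per token.
pipelineDF.selectExpr("explode(arrays_zip(token.result, ner.result)) as cols") \
    .selectExpr("cols['0'] as token", "cols['1'] as ner_label") \
    .show(truncate=False)
```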
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_arned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/ArneD/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_arned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_arned_pipeline_en.md new file mode 100644 index 00000000000000..8f5aec9a6dd0f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_arned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_arned_pipeline pipeline XlmRoBertaForTokenClassification from ArneD +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_arned_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_arned_pipeline` is a English model originally trained by ArneD. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_arned_pipeline_en_5.5.0_3.0_1726970003011.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_arned_pipeline_en_5.5.0_3.0_1726970003011.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_arned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_arned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_arned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/ArneD/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_ashrielbrian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_ashrielbrian_pipeline_en.md new file mode 100644 index 00000000000000..ca5a9656bc457f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_ashrielbrian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_ashrielbrian_pipeline pipeline XlmRoBertaForTokenClassification from ashrielbrian +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_ashrielbrian_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_ashrielbrian_pipeline` is a English model originally trained by ashrielbrian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ashrielbrian_pipeline_en_5.5.0_3.0_1726970706114.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ashrielbrian_pipeline_en_5.5.0_3.0_1726970706114.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_ashrielbrian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_ashrielbrian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_ashrielbrian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/ashrielbrian/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_mealduct_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_mealduct_en.md new file mode 100644 index 00000000000000..6329f595cfbb2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_mealduct_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_mealduct XlmRoBertaForTokenClassification from MealDuct +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_mealduct +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_mealduct` is a English model originally trained by MealDuct. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_mealduct_en_5.5.0_3.0_1727018613326.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_mealduct_en_5.5.0_3.0_1727018613326.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_mealduct","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_mealduct", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_mealduct| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/MealDuct/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_obong_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_obong_en.md new file mode 100644 index 00000000000000..2cf3966d789969 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_obong_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_obong XlmRoBertaForTokenClassification from obong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_obong +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_obong` is a English model originally trained by obong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_obong_en_5.5.0_3.0_1726969816626.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_obong_en_5.5.0_3.0_1726969816626.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_obong","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_obong", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_obong| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/obong/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_obong_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_obong_pipeline_en.md new file mode 100644 index 00000000000000..9c4a0d52f65875 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_obong_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_obong_pipeline pipeline XlmRoBertaForTokenClassification from obong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_obong_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_obong_pipeline` is a English model originally trained by obong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_obong_pipeline_en_5.5.0_3.0_1726969878251.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_obong_pipeline_en_5.5.0_3.0_1726969878251.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_obong_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_obong_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_obong_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/obong/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_english_chaoli_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_english_chaoli_en.md new file mode 100644 index 00000000000000..750f76d38bef65 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_english_chaoli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_chaoli XlmRoBertaForTokenClassification from ChaoLi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_chaoli +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_chaoli` is a English model originally trained by ChaoLi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_chaoli_en_5.5.0_3.0_1726970258472.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_chaoli_en_5.5.0_3.0_1726970258472.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_chaoli","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_chaoli", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_chaoli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/ChaoLi/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_english_chaoli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_english_chaoli_pipeline_en.md new file mode 100644 index 00000000000000..d2470f680b14a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_english_chaoli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_chaoli_pipeline pipeline XlmRoBertaForTokenClassification from ChaoLi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_chaoli_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_chaoli_pipeline` is a English model originally trained by ChaoLi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_chaoli_pipeline_en_5.5.0_3.0_1726970354138.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_chaoli_pipeline_en_5.5.0_3.0_1726970354138.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_chaoli_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_chaoli_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_chaoli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/ChaoLi/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_rupe_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_rupe_en.md new file mode 100644 index 00000000000000..aac2f07232ee5c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_rupe_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_rupe XlmRoBertaForTokenClassification from RupE +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_rupe +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_rupe` is a English model originally trained by RupE. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_rupe_en_5.5.0_3.0_1727018736421.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_rupe_en_5.5.0_3.0_1727018736421.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_rupe","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_rupe", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_rupe| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|833.0 MB| + +## References + +https://huggingface.co/RupE/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_rupe_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_rupe_pipeline_en.md new file mode 100644 index 00000000000000..1c64bd8c88e603 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_rupe_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_rupe_pipeline pipeline XlmRoBertaForTokenClassification from RupE +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_rupe_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_rupe_pipeline` is a English model originally trained by RupE. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_rupe_pipeline_en_5.5.0_3.0_1727018823943.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_rupe_pipeline_en_5.5.0_3.0_1727018823943.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_rupe_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_rupe_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_rupe_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|833.0 MB| + +## References + +https://huggingface.co/RupE/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_sponomary_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_sponomary_en.md new file mode 100644 index 00000000000000..df8ebdc8ce26d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_sponomary_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_sponomary XlmRoBertaForTokenClassification from sponomary +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_sponomary +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_sponomary` is a English model originally trained by sponomary. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_sponomary_en_5.5.0_3.0_1726970904759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_sponomary_en_5.5.0_3.0_1726970904759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# The classifier reads the "document" and "token" columns produced above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_sponomary", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// The classifier reads the "document" and "token" columns produced above
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_sponomary", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
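The snippet above stops at token-level IOB tags. If entity spans are more convenient, an `NerConverter` stage can be appended to the same pipeline; this is a minimal sketch that reuses the column names from the example above:

```python
from sparknlp.annotator import NerConverter

# Groups IOB tags from the "ner" column into entity chunks
converter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, converter])
result = pipeline.fit(data).transform(data)
result.selectExpr("explode(ner_chunk.result) AS entity").show(truncate=False)
```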
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_sponomary| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/sponomary/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_sponomary_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_sponomary_pipeline_en.md new file mode 100644 index 00000000000000..e6ffb21fadc6ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_sponomary_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_sponomary_pipeline pipeline XlmRoBertaForTokenClassification from sponomary +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_sponomary_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_sponomary_pipeline` is a English model originally trained by sponomary. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_sponomary_pipeline_en_5.5.0_3.0_1726970978524.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_sponomary_pipeline_en_5.5.0_3.0_1726970978524.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# DataFrame with a "text" column holding the text to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_sponomary_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// DataFrame with a "text" column holding the text to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_sponomary_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
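For quick experiments on a single string, `PretrainedPipeline` also exposes `annotate`, which returns plain Python dictionaries instead of a DataFrame. A small sketch (the available keys depend on the output columns of the included stages):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_sponomary_pipeline", lang = "en")
result = pipeline.annotate("I love spark-nlp")
print(result.keys())
```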
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_sponomary_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/sponomary/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_yasu320001_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_yasu320001_en.md new file mode 100644 index 00000000000000..2e68e83a984b8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_yasu320001_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_yasu320001 XlmRoBertaForTokenClassification from yasu320001 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_yasu320001 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_yasu320001` is a English model originally trained by yasu320001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_yasu320001_en_5.5.0_3.0_1727018931165.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_yasu320001_en_5.5.0_3.0_1727018931165.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# The classifier reads the "document" and "token" columns produced above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_yasu320001", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// The classifier reads the "document" and "token" columns produced above
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_yasu320001", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_yasu320001| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/yasu320001/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_yasu320001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_yasu320001_pipeline_en.md new file mode 100644 index 00000000000000..1d5cdf15671df1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_yasu320001_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_yasu320001_pipeline pipeline XlmRoBertaForTokenClassification from yasu320001 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_yasu320001_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_yasu320001_pipeline` is a English model originally trained by yasu320001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_yasu320001_pipeline_en_5.5.0_3.0_1727019008995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_yasu320001_pipeline_en_5.5.0_3.0_1727019008995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# DataFrame with a "text" column holding the text to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_yasu320001_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// DataFrame with a "text" column holding the text to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_yasu320001_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_yasu320001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/yasu320001/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_daiwenbin_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_daiwenbin_en.md new file mode 100644 index 00000000000000..9de9bc7cf017ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_daiwenbin_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_daiwenbin XlmRoBertaForTokenClassification from daiwenbin +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_daiwenbin +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_daiwenbin` is a English model originally trained by daiwenbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_daiwenbin_en_5.5.0_3.0_1726970967385.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_daiwenbin_en_5.5.0_3.0_1726970967385.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# The classifier reads the "document" and "token" columns produced above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_daiwenbin", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// The classifier reads the "document" and "token" columns produced above
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_daiwenbin", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_daiwenbin| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/daiwenbin/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_daiwenbin_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_daiwenbin_pipeline_en.md new file mode 100644 index 00000000000000..6c786eed8f0dd8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_daiwenbin_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_daiwenbin_pipeline pipeline XlmRoBertaForTokenClassification from daiwenbin +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_daiwenbin_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_daiwenbin_pipeline` is a English model originally trained by daiwenbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_daiwenbin_pipeline_en_5.5.0_3.0_1726971031155.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_daiwenbin_pipeline_en_5.5.0_3.0_1726971031155.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# DataFrame with a "text" column holding the text to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_daiwenbin_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// DataFrame with a "text" column holding the text to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_daiwenbin_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_daiwenbin_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/daiwenbin/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_amitjain171980_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_amitjain171980_en.md new file mode 100644 index 00000000000000..2126ec09681301 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_amitjain171980_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_amitjain171980 XlmRoBertaForTokenClassification from amitjain171980 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_amitjain171980 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_amitjain171980` is a English model originally trained by amitjain171980. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_amitjain171980_en_5.5.0_3.0_1727018470843.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_amitjain171980_en_5.5.0_3.0_1727018470843.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# The classifier reads the "document" and "token" columns produced above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_amitjain171980", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// The classifier reads the "document" and "token" columns produced above
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_amitjain171980", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_amitjain171980| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/amitjain171980/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_amitjain171980_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_amitjain171980_pipeline_en.md new file mode 100644 index 00000000000000..94a1cfa8e51046 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_amitjain171980_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_amitjain171980_pipeline pipeline XlmRoBertaForTokenClassification from amitjain171980 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_amitjain171980_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_amitjain171980_pipeline` is a English model originally trained by amitjain171980. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_amitjain171980_pipeline_en_5.5.0_3.0_1727018532721.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_amitjain171980_pipeline_en_5.5.0_3.0_1727018532721.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# DataFrame with a "text" column holding the text to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_amitjain171980_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// DataFrame with a "text" column holding the text to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_amitjain171980_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_amitjain171980_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/amitjain171980/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_athairus_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_athairus_en.md new file mode 100644 index 00000000000000..51c4d53db98c9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_athairus_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_athairus XlmRoBertaForTokenClassification from athairus +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_athairus +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_athairus` is a English model originally trained by athairus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_athairus_en_5.5.0_3.0_1726970258084.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_athairus_en_5.5.0_3.0_1726970258084.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# The classifier reads the "document" and "token" columns produced above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_athairus", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// The classifier reads the "document" and "token" columns produced above
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_athairus", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_athairus| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/athairus/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_athairus_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_athairus_pipeline_en.md new file mode 100644 index 00000000000000..3cd375c8328621 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_athairus_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_athairus_pipeline pipeline XlmRoBertaForTokenClassification from athairus +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_athairus_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_athairus_pipeline` is a English model originally trained by athairus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_athairus_pipeline_en_5.5.0_3.0_1726970325027.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_athairus_pipeline_en_5.5.0_3.0_1726970325027.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# DataFrame with a "text" column holding the text to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_athairus_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// DataFrame with a "text" column holding the text to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_athairus_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_athairus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/athairus/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_gewissta_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_gewissta_en.md new file mode 100644 index 00000000000000..ab1ce69a1792ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_gewissta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_gewissta XlmRoBertaForTokenClassification from gewissta +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_gewissta +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_gewissta` is a English model originally trained by gewissta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_gewissta_en_5.5.0_3.0_1727019295325.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_gewissta_en_5.5.0_3.0_1727019295325.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# The classifier reads the "document" and "token" columns produced above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_gewissta", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// The classifier reads the "document" and "token" columns produced above
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_gewissta", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_gewissta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/gewissta/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_gewissta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_gewissta_pipeline_en.md new file mode 100644 index 00000000000000..0960b73fe6018a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_gewissta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_gewissta_pipeline pipeline XlmRoBertaForTokenClassification from gewissta +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_gewissta_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_gewissta_pipeline` is a English model originally trained by gewissta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_gewissta_pipeline_en_5.5.0_3.0_1727019360738.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_gewissta_pipeline_en_5.5.0_3.0_1727019360738.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# DataFrame with a "text" column holding the text to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_gewissta_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// DataFrame with a "text" column holding the text to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_gewissta_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_gewissta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/gewissta/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_hcy5561_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_hcy5561_en.md new file mode 100644 index 00000000000000..75c6ae512e5cf5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_hcy5561_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_hcy5561 XlmRoBertaForTokenClassification from hcy5561 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_hcy5561 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_hcy5561` is a English model originally trained by hcy5561. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hcy5561_en_5.5.0_3.0_1727019238386.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hcy5561_en_5.5.0_3.0_1727019238386.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# The classifier reads the "document" and "token" columns produced above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_hcy5561", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// The classifier reads the "document" and "token" columns produced above
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_hcy5561", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_hcy5561| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/hcy5561/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_hcy5561_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_hcy5561_pipeline_en.md new file mode 100644 index 00000000000000..18df591d9e0666 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_hcy5561_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_hcy5561_pipeline pipeline XlmRoBertaForTokenClassification from hcy5561 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_hcy5561_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_hcy5561_pipeline` is a English model originally trained by hcy5561. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hcy5561_pipeline_en_5.5.0_3.0_1727019320920.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hcy5561_pipeline_en_5.5.0_3.0_1727019320920.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# DataFrame with a "text" column holding the text to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_hcy5561_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// DataFrame with a "text" column holding the text to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_hcy5561_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_hcy5561_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/hcy5561/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_malduwais_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_malduwais_en.md new file mode 100644 index 00000000000000..8596ae1b2b927c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_malduwais_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_malduwais XlmRoBertaForTokenClassification from malduwais +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_malduwais +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_malduwais` is a English model originally trained by malduwais. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_malduwais_en_5.5.0_3.0_1726970594528.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_malduwais_en_5.5.0_3.0_1726970594528.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# The classifier reads the "document" and "token" columns produced above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_malduwais", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// The classifier reads the "document" and "token" columns produced above
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_malduwais", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_malduwais| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.4 MB| + +## References + +https://huggingface.co/malduwais/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_malduwais_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_malduwais_pipeline_en.md new file mode 100644 index 00000000000000..05d7f4f82fefaf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_malduwais_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_malduwais_pipeline pipeline XlmRoBertaForTokenClassification from malduwais +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_malduwais_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_malduwais_pipeline` is a English model originally trained by malduwais. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_malduwais_pipeline_en_5.5.0_3.0_1726970653599.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_malduwais_pipeline_en_5.5.0_3.0_1726970653599.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# DataFrame with a "text" column holding the text to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_malduwais_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// DataFrame with a "text" column holding the text to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_malduwais_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_malduwais_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.4 MB| + +## References + +https://huggingface.co/malduwais/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_xiao888_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_xiao888_en.md new file mode 100644 index 00000000000000..0a5805a8858149 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_xiao888_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_xiao888 XlmRoBertaForTokenClassification from Xiao888 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_xiao888 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_xiao888` is a English model originally trained by Xiao888. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_xiao888_en_5.5.0_3.0_1727018929609.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_xiao888_en_5.5.0_3.0_1727018929609.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# The classifier reads the "document" and "token" columns produced above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_xiao888", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// The classifier reads the "document" and "token" columns produced above
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_xiao888", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_xiao888| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/Xiao888/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_xiao888_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_xiao888_pipeline_en.md new file mode 100644 index 00000000000000..679d84e26bb1ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_xiao888_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_xiao888_pipeline pipeline XlmRoBertaForTokenClassification from Xiao888 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_xiao888_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_xiao888_pipeline` is a English model originally trained by Xiao888. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_xiao888_pipeline_en_5.5.0_3.0_1727018995264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_xiao888_pipeline_en_5.5.0_3.0_1727018995264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# DataFrame with a "text" column holding the text to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_xiao888_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// DataFrame with a "text" column holding the text to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_xiao888_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_xiao888_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/Xiao888/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_zeronin7_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_zeronin7_en.md new file mode 100644 index 00000000000000..11734f2d60c165 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_zeronin7_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_zeronin7 XlmRoBertaForTokenClassification from zeronin7 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_zeronin7 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_zeronin7` is a English model originally trained by zeronin7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_zeronin7_en_5.5.0_3.0_1726969840493.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_zeronin7_en_5.5.0_3.0_1726969840493.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# The classifier reads the "document" and "token" columns produced above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_zeronin7", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// The classifier reads the "document" and "token" columns produced above
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_zeronin7", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_zeronin7| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/zeronin7/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_zeronin7_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_zeronin7_pipeline_en.md new file mode 100644 index 00000000000000..c726a4dd92a084 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_zeronin7_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_zeronin7_pipeline pipeline XlmRoBertaForTokenClassification from zeronin7 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_zeronin7_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_zeronin7_pipeline` is a English model originally trained by zeronin7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_zeronin7_pipeline_en_5.5.0_3.0_1726969923292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_zeronin7_pipeline_en_5.5.0_3.0_1726969923292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Any DataFrame with a "text" column works here; this is a minimal sample.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_zeronin7_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Any DataFrame with a "text" column works here; this is a minimal sample.
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_zeronin7_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
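+
+For quick experiments you do not need a DataFrame at all: `PretrainedPipeline` also exposes `annotate` for plain strings. The sketch below assumes a running Spark NLP session and that the pipeline's stages write to output columns named `token` and `ner`, as in the standalone model card above.
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_zeronin7_pipeline", lang="en")
+
+# annotate() returns a dict keyed by the output column of each stage.
+result = pipeline.annotate("John Snow Labs ist ein Unternehmen mit Sitz in Delaware.")
+print(list(zip(result["token"], result["ner"])))
+```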
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_zeronin7_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/zeronin7/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_jb723_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_jb723_en.md new file mode 100644 index 00000000000000..81cb382cfd0e48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_jb723_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_jb723 XlmRoBertaForTokenClassification from jb723 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_jb723 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_jb723` is a English model originally trained by jb723. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jb723_en_5.5.0_3.0_1726970802529.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jb723_en_5.5.0_3.0_1726970802529.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# The classifier consumes the "document" and "token" columns created above.
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_jb723","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+// The classifier consumes the "document" and "token" columns created above.
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_jb723", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_jb723| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|854.4 MB| + +## References + +https://huggingface.co/jb723/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_jb723_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_jb723_pipeline_en.md new file mode 100644 index 00000000000000..2abf5def2f7a3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_jb723_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_jb723_pipeline pipeline XlmRoBertaForTokenClassification from jb723 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_jb723_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_jb723_pipeline` is a English model originally trained by jb723. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jb723_pipeline_en_5.5.0_3.0_1726970861243.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jb723_pipeline_en_5.5.0_3.0_1726970861243.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Any DataFrame with a "text" column works here; this is a minimal sample.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_jb723_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Any DataFrame with a "text" column works here; this is a minimal sample.
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_jb723_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_jb723_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|854.4 MB| + +## References + +https://huggingface.co/jb723/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_natrajanv_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_natrajanv_en.md new file mode 100644 index 00000000000000..ff3ec63b5d9ce7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_natrajanv_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_natrajanv XlmRoBertaForTokenClassification from natrajanv +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_natrajanv +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_natrajanv` is a English model originally trained by natrajanv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_natrajanv_en_5.5.0_3.0_1726970451330.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_natrajanv_en_5.5.0_3.0_1726970451330.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# The classifier consumes the "document" and "token" columns created above.
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_natrajanv","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+// The classifier consumes the "document" and "token" columns created above.
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_natrajanv", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_natrajanv| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|841.2 MB| + +## References + +https://huggingface.co/natrajanv/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_natrajanv_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_natrajanv_pipeline_en.md new file mode 100644 index 00000000000000..1d9c790edb0984 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_natrajanv_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_natrajanv_pipeline pipeline XlmRoBertaForTokenClassification from natrajanv +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_natrajanv_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_natrajanv_pipeline` is a English model originally trained by natrajanv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_natrajanv_pipeline_en_5.5.0_3.0_1726970531610.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_natrajanv_pipeline_en_5.5.0_3.0_1726970531610.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Any DataFrame with a "text" column works here; this is a minimal sample.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_natrajanv_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Any DataFrame with a "text" column works here; this is a minimal sample.
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_natrajanv_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_natrajanv_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|841.2 MB| + +## References + +https://huggingface.co/natrajanv/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_sh_zheng_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_sh_zheng_en.md new file mode 100644 index 00000000000000..f21e66358b5605 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_sh_zheng_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_sh_zheng XlmRoBertaForTokenClassification from sh-zheng +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_sh_zheng +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_sh_zheng` is a English model originally trained by sh-zheng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sh_zheng_en_5.5.0_3.0_1726970675630.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sh_zheng_en_5.5.0_3.0_1726970675630.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# The classifier consumes the "document" and "token" columns created above.
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_sh_zheng","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+// The classifier consumes the "document" and "token" columns created above.
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_sh_zheng", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_sh_zheng| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/sh-zheng/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_sh_zheng_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_sh_zheng_pipeline_en.md new file mode 100644 index 00000000000000..cda4599e0715a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_sh_zheng_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_sh_zheng_pipeline pipeline XlmRoBertaForTokenClassification from sh-zheng +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_sh_zheng_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_sh_zheng_pipeline` is a English model originally trained by sh-zheng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sh_zheng_pipeline_en_5.5.0_3.0_1726970755427.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sh_zheng_pipeline_en_5.5.0_3.0_1726970755427.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Any DataFrame with a "text" column works here; this is a minimal sample.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_sh_zheng_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Any DataFrame with a "text" column works here; this is a minimal sample.
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_sh_zheng_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_sh_zheng_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/sh-zheng/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_tirendaz_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_tirendaz_en.md new file mode 100644 index 00000000000000..b234af5bd2c282 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_tirendaz_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_tirendaz XlmRoBertaForTokenClassification from Tirendaz +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_tirendaz +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_tirendaz` is a English model originally trained by Tirendaz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_tirendaz_en_5.5.0_3.0_1726971061078.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_tirendaz_en_5.5.0_3.0_1726971061078.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# The classifier consumes the "document" and "token" columns created above.
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_tirendaz","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+// The classifier consumes the "document" and "token" columns created above.
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_tirendaz", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_tirendaz| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.3 MB| + +## References + +https://huggingface.co/Tirendaz/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_tirendaz_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_tirendaz_pipeline_en.md new file mode 100644 index 00000000000000..0ae1ca2346e345 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_tirendaz_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_tirendaz_pipeline pipeline XlmRoBertaForTokenClassification from Tirendaz +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_tirendaz_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_tirendaz_pipeline` is a English model originally trained by Tirendaz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_tirendaz_pipeline_en_5.5.0_3.0_1726971140160.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_tirendaz_pipeline_en_5.5.0_3.0_1726971140160.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Any DataFrame with a "text" column works here; this is a minimal sample.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_tirendaz_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Any DataFrame with a "text" column works here; this is a minimal sample.
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_tirendaz_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_tirendaz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.3 MB| + +## References + +https://huggingface.co/Tirendaz/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_italian_k3lana_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_italian_k3lana_en.md new file mode 100644 index 00000000000000..5cc1e5f7c120a1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_italian_k3lana_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_k3lana XlmRoBertaForTokenClassification from k3lana +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_k3lana +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_k3lana` is a English model originally trained by k3lana. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_k3lana_en_5.5.0_3.0_1726970446487.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_k3lana_en_5.5.0_3.0_1726970446487.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# The classifier consumes the "document" and "token" columns created above.
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_k3lana","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+// The classifier consumes the "document" and "token" columns created above.
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_k3lana", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_k3lana| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/k3lana/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_italian_k3lana_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_italian_k3lana_pipeline_en.md new file mode 100644 index 00000000000000..56f5cf02a99829 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_italian_k3lana_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_k3lana_pipeline pipeline XlmRoBertaForTokenClassification from k3lana +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_k3lana_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_k3lana_pipeline` is a English model originally trained by k3lana. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_k3lana_pipeline_en_5.5.0_3.0_1726970528017.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_k3lana_pipeline_en_5.5.0_3.0_1726970528017.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Any DataFrame with a "text" column works here; this is a minimal sample.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_k3lana_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Any DataFrame with a "text" column works here; this is a minimal sample.
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_k3lana_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_k3lana_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.7 MB| + +## References + +https://huggingface.co/k3lana/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_italian_yurit04_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_italian_yurit04_en.md new file mode 100644 index 00000000000000..19b4a4de5919b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_italian_yurit04_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_yurit04 XlmRoBertaForTokenClassification from yurit04 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_yurit04 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_yurit04` is a English model originally trained by yurit04. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_yurit04_en_5.5.0_3.0_1727018491544.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_yurit04_en_5.5.0_3.0_1727018491544.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# The classifier consumes the "document" and "token" columns created above.
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_yurit04","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+// The classifier consumes the "document" and "token" columns created above.
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_yurit04", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_yurit04| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/yurit04/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_italian_yurit04_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_italian_yurit04_pipeline_en.md new file mode 100644 index 00000000000000..799f120c715b24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_italian_yurit04_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_yurit04_pipeline pipeline XlmRoBertaForTokenClassification from yurit04 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_yurit04_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_yurit04_pipeline` is a English model originally trained by yurit04. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_yurit04_pipeline_en_5.5.0_3.0_1727018573800.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_yurit04_pipeline_en_5.5.0_3.0_1727018573800.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Any DataFrame with a "text" column works here; this is a minimal sample.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_yurit04_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Any DataFrame with a "text" column works here; this is a minimal sample.
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_yurit04_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_yurit04_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/yurit04/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_korean_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_korean_en.md new file mode 100644 index 00000000000000..03f4c4ac9a2e39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_korean_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_korean XlmRoBertaForTokenClassification from sungkwangjoong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_korean +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_korean` is a English model originally trained by sungkwangjoong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_korean_en_5.5.0_3.0_1727019124460.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_korean_en_5.5.0_3.0_1727019124460.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# The classifier consumes the "document" and "token" columns created above.
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_korean","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+// The classifier consumes the "document" and "token" columns created above.
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_korean", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_korean| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.7 MB| + +## References + +https://huggingface.co/sungkwangjoong/xlm-roberta-base-finetuned-panx-ko \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_korean_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_korean_pipeline_en.md new file mode 100644 index 00000000000000..28c9926d2c2324 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_korean_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_korean_pipeline pipeline XlmRoBertaForTokenClassification from sungkwangjoong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_korean_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_korean_pipeline` is a English model originally trained by sungkwangjoong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_korean_pipeline_en_5.5.0_3.0_1727019217304.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_korean_pipeline_en_5.5.0_3.0_1727019217304.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Any DataFrame with a "text" column works here; this is a minimal sample.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_korean_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Any DataFrame with a "text" column works here; this is a minimal sample.
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_korean_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_korean_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.7 MB| + +## References + +https://huggingface.co/sungkwangjoong/xlm-roberta-base-finetuned-panx-ko + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_tamil_neelrr_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_tamil_neelrr_en.md new file mode 100644 index 00000000000000..97aeb6c08f458d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_tamil_neelrr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_tamil_neelrr XlmRoBertaForTokenClassification from neelrr +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_tamil_neelrr +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_tamil_neelrr` is a English model originally trained by neelrr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_tamil_neelrr_en_5.5.0_3.0_1727018501487.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_tamil_neelrr_en_5.5.0_3.0_1727018501487.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# The classifier consumes the "document" and "token" columns created above.
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_tamil_neelrr","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+// The classifier consumes the "document" and "token" columns created above.
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_tamil_neelrr", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_tamil_neelrr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|833.6 MB| + +## References + +https://huggingface.co/neelrr/xlm-roberta-base-finetuned-panx-ta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_tamil_neelrr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_tamil_neelrr_pipeline_en.md new file mode 100644 index 00000000000000..aac02e2c1deeb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_tamil_neelrr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_tamil_neelrr_pipeline pipeline XlmRoBertaForTokenClassification from neelrr +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_tamil_neelrr_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_tamil_neelrr_pipeline` is a English model originally trained by neelrr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_tamil_neelrr_pipeline_en_5.5.0_3.0_1727018576700.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_tamil_neelrr_pipeline_en_5.5.0_3.0_1727018576700.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Any DataFrame with a "text" column works here; this is a minimal sample.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_tamil_neelrr_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Any DataFrame with a "text" column works here; this is a minimal sample.
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_tamil_neelrr_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_tamil_neelrr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|833.6 MB| + +## References + +https://huggingface.co/neelrr/xlm-roberta-base-finetuned-panx-ta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_en.md new file mode 100644 index 00000000000000..75806be067eda2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_ft_udpos213_top8lang_southern_sotho XlmRoBertaForTokenClassification from iceman2434 +author: John Snow Labs +name: xlm_roberta_base_ft_udpos213_top8lang_southern_sotho +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_ft_udpos213_top8lang_southern_sotho` is a English model originally trained by iceman2434. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_en_5.5.0_3.0_1727019173465.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_en_5.5.0_3.0_1727019173465.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# The classifier consumes the "document" and "token" columns created above.
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_ft_udpos213_top8lang_southern_sotho","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+// The classifier consumes the "document" and "token" columns created above.
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_ft_udpos213_top8lang_southern_sotho", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_ft_udpos213_top8lang_southern_sotho| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|805.3 MB| + +## References + +https://huggingface.co/iceman2434/xlm-roberta-base_ft_udpos213-top8lang-st \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_pipeline_en.md new file mode 100644 index 00000000000000..bd7297a062980b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_pipeline pipeline XlmRoBertaForTokenClassification from iceman2434 +author: John Snow Labs +name: xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_pipeline` is a English model originally trained by iceman2434. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_pipeline_en_5.5.0_3.0_1727019297152.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_pipeline_en_5.5.0_3.0_1727019297152.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Any DataFrame with a "text" column works here; this is a minimal sample.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Any DataFrame with a "text" column works here; this is a minimal sample.
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|805.3 MB| + +## References + +https://huggingface.co/iceman2434/xlm-roberta-base_ft_udpos213-top8lang-st + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_en.md new file mode 100644 index 00000000000000..94485777eda9c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_en_5.5.0_3.0_1727010064521.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_en_5.5.0_3.0_1727010064521.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# The classifier consumes the "document" and "token" columns created above.
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+// The classifier consumes the "document" and "token" columns created above.
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
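+
+Unlike the token classifiers above, this model emits one prediction per document. A minimal sketch for reading it back, assuming the `pipelineDF` from the Python example and its `class` output column:
+
+```python
+from pyspark.sql import functions as F
+
+# First (and only) predicted label per input row; the annotation's metadata
+# field carries the per-class confidence scores if you need them.
+pipelineDF.select(
+    F.col("text"),
+    F.element_at(F.col("class.result"), 1).alias("predicted_label")
+).show(truncate=False)
+```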
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|800.8 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr2e-05_seed42_kin-hau-eng_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_pipeline_en.md new file mode 100644 index 00000000000000..49f8ad2507d7cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_pipeline pipeline XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_pipeline` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_pipeline_en_5.5.0_3.0_1727010189358.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_pipeline_en_5.5.0_3.0_1727010189358.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
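The snippet above assumes a DataFrame `df` and the `PretrainedPipeline` import already exist. A minimal, self-contained sketch (the sample sentence, `text` column name, and session setup are illustrative assumptions, not part of the original card) could look like this:

```python
# Minimal sketch: start a session, build an input DataFrame, and run the pretrained pipeline
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_pipeline", lang="en")
pipeline.transform(df).show(truncate=False)
```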
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|800.8 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr2e-05_seed42_kin-hau-eng_train + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_en.md new file mode 100644 index 00000000000000..3fb1fb4147d91b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1 XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_en_5.5.0_3.0_1727009816875.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_en_5.5.0_3.0_1727009816875.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
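A brief usage sketch, assuming the `pipelineDF` produced by the example above: the `class` output column comes from the `setOutputCol` call, and the selection below is illustrative rather than part of the original card.

```python
# Minimal sketch: inspect the predicted labels (assumes `pipelineDF` from the example above)
from pyspark.sql.functions import col

pipelineDF.select(col("text"), col("class.result").alias("prediction")).show(truncate=False)
```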
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|795.7 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-New_VietNam-aug_insert_synonym-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_pipeline_en.md new file mode 100644 index 00000000000000..4a8cb34e743de3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_pipeline_en_5.5.0_3.0_1727009935365.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_pipeline_en_5.5.0_3.0_1727009935365.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
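The snippet above relies on an existing DataFrame `df` and the `PretrainedPipeline` import. A minimal, self-contained sketch (sample text and session setup are illustrative assumptions) might look like:

```python
# Minimal sketch: session, input DataFrame, and pretrained pipeline run
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_pipeline", lang="en")
pipeline.transform(df).show(truncate=False)
```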
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|795.7 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-New_VietNam-aug_insert_synonym-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_vietnam_aug_insert_w2v_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_vietnam_aug_insert_w2v_en.md new file mode 100644 index 00000000000000..9db83f8bb9f5da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_vietnam_aug_insert_w2v_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_vietnam_aug_insert_w2v XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_vietnam_aug_insert_w2v +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_vietnam_aug_insert_w2v` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vietnam_aug_insert_w2v_en_5.5.0_3.0_1727009589410.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vietnam_aug_insert_w2v_en_5.5.0_3.0_1727009589410.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_vietnam_aug_insert_w2v","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_vietnam_aug_insert_w2v", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
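As a hedged follow-up (not part of the original card), the sketch below pulls the predicted labels out of the `pipelineDF` built above; the column names follow the example's `setOutputCol` calls.

```python
# Minimal sketch: show the predictions (assumes `pipelineDF` from the example above)
from pyspark.sql.functions import col

pipelineDF.select(col("text"), col("class.result").alias("prediction")).show(truncate=False)
```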
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_vietnam_aug_insert_w2v| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|795.0 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-VietNam-aug_insert_w2v \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_vietnam_aug_insert_w2v_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_vietnam_aug_insert_w2v_pipeline_en.md new file mode 100644 index 00000000000000..1b888e3faf7861 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_vietnam_aug_insert_w2v_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_vietnam_aug_insert_w2v_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_vietnam_aug_insert_w2v_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_vietnam_aug_insert_w2v_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vietnam_aug_insert_w2v_pipeline_en_5.5.0_3.0_1727009711975.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vietnam_aug_insert_w2v_pipeline_en_5.5.0_3.0_1727009711975.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_vietnam_aug_insert_w2v_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_vietnam_aug_insert_w2v_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
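The snippet above assumes `df` and the `PretrainedPipeline` import are already defined. One possible self-contained sketch, with an illustrative sample sentence and session setup:

```python
# Minimal sketch: session, input DataFrame, and pretrained pipeline run
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_vietnam_aug_insert_w2v_pipeline", lang="en")
pipeline.transform(df).show(truncate=False)
```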
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_vietnam_aug_insert_w2v_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|795.0 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-VietNam-aug_insert_w2v + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_xnli_german_trimmed_german_30000_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_xnli_german_trimmed_german_30000_en.md new file mode 100644 index 00000000000000..5acf6b365d1dde --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_xnli_german_trimmed_german_30000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_xnli_german_trimmed_german_30000 XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_xnli_german_trimmed_german_30000 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_xnli_german_trimmed_german_30000` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_german_trimmed_german_30000_en_5.5.0_3.0_1727009352775.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_german_trimmed_german_30000_en_5.5.0_3.0_1727009352775.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_german_trimmed_german_30000","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_german_trimmed_german_30000", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
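A short, hedged usage sketch (not part of the original card) for reading the predicted labels back out of `pipelineDF` from the example above:

```python
# Minimal sketch: show the predictions (assumes `pipelineDF` from the example above)
from pyspark.sql.functions import col

pipelineDF.select(col("text"), col("class.result").alias("prediction")).show(truncate=False)
```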
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_xnli_german_trimmed_german_30000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|406.2 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-xnli-de-trimmed-de-30000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline_en.md new file mode 100644 index 00000000000000..cb883535365765 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline_en_5.5.0_3.0_1727009372385.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline_en_5.5.0_3.0_1727009372385.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
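The snippet above presumes `df` and the `PretrainedPipeline` import already exist; a minimal, self-contained version (sample text and session setup are illustrative assumptions) could be:

```python
# Minimal sketch: session, input DataFrame, and pretrained pipeline run
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline", lang="en")
pipeline.transform(df).show(truncate=False)
```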
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.3 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-xnli-de-trimmed-de-30000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlmr_english_chinese_all_shuffled_42_test1000_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlmr_english_chinese_all_shuffled_42_test1000_en.md new file mode 100644 index 00000000000000..f73263324f009a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlmr_english_chinese_all_shuffled_42_test1000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_english_chinese_all_shuffled_42_test1000 XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_english_chinese_all_shuffled_42_test1000 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_english_chinese_all_shuffled_42_test1000` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_english_chinese_all_shuffled_42_test1000_en_5.5.0_3.0_1727009254823.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_english_chinese_all_shuffled_42_test1000_en_5.5.0_3.0_1727009254823.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_english_chinese_all_shuffled_42_test1000","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_english_chinese_all_shuffled_42_test1000", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
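As a small, hedged follow-up, the sketch below shows one way to surface the predicted labels from the `pipelineDF` built above; the column names mirror the example and are not part of the original card.

```python
# Minimal sketch: show the predictions (assumes `pipelineDF` from the example above)
from pyspark.sql.functions import col

pipelineDF.select(col("text"), col("class.result").alias("prediction")).show(truncate=False)
```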
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_english_chinese_all_shuffled_42_test1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|826.9 MB| + +## References + +https://huggingface.co/patpizio/xlmr-en-zh-all_shuffled-42-test1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlmr_english_chinese_all_shuffled_42_test1000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlmr_english_chinese_all_shuffled_42_test1000_pipeline_en.md new file mode 100644 index 00000000000000..27c5fbde06cb54 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlmr_english_chinese_all_shuffled_42_test1000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_english_chinese_all_shuffled_42_test1000_pipeline pipeline XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_english_chinese_all_shuffled_42_test1000_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_english_chinese_all_shuffled_42_test1000_pipeline` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_english_chinese_all_shuffled_42_test1000_pipeline_en_5.5.0_3.0_1727009365740.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_english_chinese_all_shuffled_42_test1000_pipeline_en_5.5.0_3.0_1727009365740.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmr_english_chinese_all_shuffled_42_test1000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmr_english_chinese_all_shuffled_42_test1000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
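The snippet above assumes an existing DataFrame `df` and the `PretrainedPipeline` import; one possible self-contained sketch (sample sentence and session setup are illustrative):

```python
# Minimal sketch: session, input DataFrame, and pretrained pipeline run
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlmr_english_chinese_all_shuffled_42_test1000_pipeline", lang="en")
pipeline.transform(df).show(truncate=False)
```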
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_english_chinese_all_shuffled_42_test1000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.0 MB| + +## References + +https://huggingface.co/patpizio/xlmr-en-zh-all_shuffled-42-test1000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-1030_en.md b/docs/_posts/ahmedlone127/2024-09-23-1030_en.md new file mode 100644 index 00000000000000..d53e7fb06d91b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-1030_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 1030 DistilBertForSequenceClassification from tingchih +author: John Snow Labs +name: 1030 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`1030` is a English model originally trained by tingchih. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/1030_en_5.5.0_3.0_1727108743974.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/1030_en_5.5.0_3.0_1727108743974.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("1030","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("1030", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
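A brief, hedged usage sketch (not part of the original card) for inspecting the classifier output in the `pipelineDF` built above:

```python
# Minimal sketch: show the predictions (assumes `pipelineDF` from the example above)
from pyspark.sql.functions import col

pipelineDF.select(col("text"), col("class.result").alias("prediction")).show(truncate=False)
```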
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|1030| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tingchih/1030 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-2020_q1_25p_filtered_en.md b/docs/_posts/ahmedlone127/2024-09-23-2020_q1_25p_filtered_en.md new file mode 100644 index 00000000000000..60395920c87ef7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-2020_q1_25p_filtered_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2020_q1_25p_filtered RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q1_25p_filtered +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q1_25p_filtered` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q1_25p_filtered_en_5.5.0_3.0_1727121898228.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q1_25p_filtered_en_5.5.0_3.0_1727121898228.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("2020_q1_25p_filtered","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("2020_q1_25p_filtered","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
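As a hedged follow-up to the example above, the sketch below unpacks the token vectors from `pipelineDF`; it assumes the standard Spark NLP annotation schema (`result` holding the token text and `embeddings` holding the float vector) and is not part of the original card.

```python
# Minimal sketch: flatten token embeddings (assumes `pipelineDF` from the example above)
from pyspark.sql.functions import explode, col

pipelineDF.select(explode(col("embeddings")).alias("emb")) \
    .select(col("emb.result").alias("token"), col("emb.embeddings").alias("vector")) \
    .show(5, truncate=80)
```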
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q1_25p_filtered| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q1-25p-filtered \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-2020_q1_25p_filtered_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-2020_q1_25p_filtered_pipeline_en.md new file mode 100644 index 00000000000000..8d2d9036a4fc76 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-2020_q1_25p_filtered_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2020_q1_25p_filtered_pipeline pipeline RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q1_25p_filtered_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q1_25p_filtered_pipeline` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q1_25p_filtered_pipeline_en_5.5.0_3.0_1727121920175.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q1_25p_filtered_pipeline_en_5.5.0_3.0_1727121920175.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("2020_q1_25p_filtered_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("2020_q1_25p_filtered_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
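The snippet above assumes a DataFrame `df` and the `PretrainedPipeline` import already exist. A minimal, self-contained sketch (sample text, `text` column name, and session setup are illustrative assumptions):

```python
# Minimal sketch: session, input DataFrame, and pretrained embeddings pipeline run
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("2020_q1_25p_filtered_pipeline", lang="en")
pipeline.transform(df).show(truncate=80)
```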
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q1_25p_filtered_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q1-25p-filtered + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-2020_q2_50p_filtered_2_en.md b/docs/_posts/ahmedlone127/2024-09-23-2020_q2_50p_filtered_2_en.md new file mode 100644 index 00000000000000..a789e6cf15b218 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-2020_q2_50p_filtered_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2020_q2_50p_filtered_2 RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q2_50p_filtered_2 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q2_50p_filtered_2` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q2_50p_filtered_2_en_5.5.0_3.0_1727091907536.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q2_50p_filtered_2_en_5.5.0_3.0_1727091907536.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("2020_q2_50p_filtered_2","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("2020_q2_50p_filtered_2","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
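If useful, a hedged sketch for reading the token-level vectors out of the `pipelineDF` built above; the field names follow the standard Spark NLP annotation schema and are not part of the original card.

```python
# Minimal sketch: flatten token embeddings (assumes `pipelineDF` from the example above)
from pyspark.sql.functions import explode, col

pipelineDF.select(explode(col("embeddings")).alias("emb")) \
    .select(col("emb.result").alias("token"), col("emb.embeddings").alias("vector")) \
    .show(5, truncate=80)
```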
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q2_50p_filtered_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|463.9 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q2-50p-filtered_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-2020_q2_50p_filtered_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-2020_q2_50p_filtered_2_pipeline_en.md new file mode 100644 index 00000000000000..1054862eb3d8ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-2020_q2_50p_filtered_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2020_q2_50p_filtered_2_pipeline pipeline RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q2_50p_filtered_2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q2_50p_filtered_2_pipeline` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q2_50p_filtered_2_pipeline_en_5.5.0_3.0_1727091929671.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q2_50p_filtered_2_pipeline_en_5.5.0_3.0_1727091929671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("2020_q2_50p_filtered_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("2020_q2_50p_filtered_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
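The snippet above presumes `df` and the `PretrainedPipeline` import are defined elsewhere; one possible self-contained sketch, with illustrative sample text and session setup:

```python
# Minimal sketch: session, input DataFrame, and pretrained embeddings pipeline run
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("2020_q2_50p_filtered_2_pipeline", lang="en")
pipeline.transform(df).show(truncate=80)
```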
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q2_50p_filtered_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.9 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q2-50p-filtered_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-2020_q2_75p_filtered_combined90_en.md b/docs/_posts/ahmedlone127/2024-09-23-2020_q2_75p_filtered_combined90_en.md new file mode 100644 index 00000000000000..2e882975ce74d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-2020_q2_75p_filtered_combined90_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2020_q2_75p_filtered_combined90 RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q2_75p_filtered_combined90 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q2_75p_filtered_combined90` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q2_75p_filtered_combined90_en_5.5.0_3.0_1727056871223.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q2_75p_filtered_combined90_en_5.5.0_3.0_1727056871223.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("2020_q2_75p_filtered_combined90","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("2020_q2_75p_filtered_combined90","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
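As a brief, hedged follow-up, the sketch below shows one way to flatten the embeddings produced by the example above into per-token rows; it assumes the `pipelineDF` from that example and the standard annotation schema.

```python
# Minimal sketch: flatten token embeddings (assumes `pipelineDF` from the example above)
from pyspark.sql.functions import explode, col

pipelineDF.select(explode(col("embeddings")).alias("emb")) \
    .select(col("emb.result").alias("token"), col("emb.embeddings").alias("vector")) \
    .show(5, truncate=80)
```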
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q2_75p_filtered_combined90| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q2-75p-filtered_combined90 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-2020_q2_75p_filtered_combined90_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-2020_q2_75p_filtered_combined90_pipeline_en.md new file mode 100644 index 00000000000000..8d5b384ed369de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-2020_q2_75p_filtered_combined90_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2020_q2_75p_filtered_combined90_pipeline pipeline RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q2_75p_filtered_combined90_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q2_75p_filtered_combined90_pipeline` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q2_75p_filtered_combined90_pipeline_en_5.5.0_3.0_1727056895264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q2_75p_filtered_combined90_pipeline_en_5.5.0_3.0_1727056895264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("2020_q2_75p_filtered_combined90_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("2020_q2_75p_filtered_combined90_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
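The snippet above assumes an existing DataFrame `df` and the `PretrainedPipeline` import; a minimal, self-contained sketch (sample sentence and session setup are illustrative assumptions) could be:

```python
# Minimal sketch: session, input DataFrame, and pretrained embeddings pipeline run
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("2020_q2_75p_filtered_combined90_pipeline", lang="en")
pipeline.transform(df).show(truncate=80)
```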
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q2_75p_filtered_combined90_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q2-75p-filtered_combined90 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-2020_q2_90p_filtered_combined90_en.md b/docs/_posts/ahmedlone127/2024-09-23-2020_q2_90p_filtered_combined90_en.md new file mode 100644 index 00000000000000..6e88d64fd4eb39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-2020_q2_90p_filtered_combined90_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2020_q2_90p_filtered_combined90 RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q2_90p_filtered_combined90 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q2_90p_filtered_combined90` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q2_90p_filtered_combined90_en_5.5.0_3.0_1727056740557.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q2_90p_filtered_combined90_en_5.5.0_3.0_1727056740557.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("2020_q2_90p_filtered_combined90","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("2020_q2_90p_filtered_combined90","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
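A hedged usage sketch (not part of the original card) for inspecting the token vectors in the `pipelineDF` built above, assuming the standard Spark NLP annotation schema:

```python
# Minimal sketch: flatten token embeddings (assumes `pipelineDF` from the example above)
from pyspark.sql.functions import explode, col

pipelineDF.select(explode(col("embeddings")).alias("emb")) \
    .select(col("emb.result").alias("token"), col("emb.embeddings").alias("vector")) \
    .show(5, truncate=80)
```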
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q2_90p_filtered_combined90| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q2-90p-filtered_combined90 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-2020_q4_test_v0_3_en.md b/docs/_posts/ahmedlone127/2024-09-23-2020_q4_test_v0_3_en.md new file mode 100644 index 00000000000000..85e4398cabf19b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-2020_q4_test_v0_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2020_q4_test_v0_3 RoBertaEmbeddings from Magdk01 +author: John Snow Labs +name: 2020_q4_test_v0_3 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q4_test_v0_3` is a English model originally trained by Magdk01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q4_test_v0_3_en_5.5.0_3.0_1727080978995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q4_test_v0_3_en_5.5.0_3.0_1727080978995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("2020_q4_test_v0_3","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("2020_q4_test_v0_3","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
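As a short, hedged follow-up to the example above, the sketch below flattens the `embeddings` output column of `pipelineDF` into per-token rows; the field names follow the standard annotation schema and are not part of the original card.

```python
# Minimal sketch: flatten token embeddings (assumes `pipelineDF` from the example above)
from pyspark.sql.functions import explode, col

pipelineDF.select(explode(col("embeddings")).alias("emb")) \
    .select(col("emb.result").alias("token"), col("emb.embeddings").alias("vector")) \
    .show(5, truncate=80)
```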
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q4_test_v0_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/Magdk01/2020_Q4_test_v0.3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-2020_q4_test_v0_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-2020_q4_test_v0_3_pipeline_en.md new file mode 100644 index 00000000000000..828db38fe05b9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-2020_q4_test_v0_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2020_q4_test_v0_3_pipeline pipeline RoBertaEmbeddings from Magdk01 +author: John Snow Labs +name: 2020_q4_test_v0_3_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q4_test_v0_3_pipeline` is a English model originally trained by Magdk01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q4_test_v0_3_pipeline_en_5.5.0_3.0_1727081001295.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q4_test_v0_3_pipeline_en_5.5.0_3.0_1727081001295.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("2020_q4_test_v0_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("2020_q4_test_v0_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
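The snippet above assumes `df` and the `PretrainedPipeline` import already exist; one possible self-contained sketch, with illustrative sample text and session setup:

```python
# Minimal sketch: session, input DataFrame, and pretrained embeddings pipeline run
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("2020_q4_test_v0_3_pipeline", lang="en")
pipeline.transform(df).show(truncate=80)
```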
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q4_test_v0_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/Magdk01/2020_Q4_test_v0.3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-3_roberta_0_en.md b/docs/_posts/ahmedlone127/2024-09-23-3_roberta_0_en.md new file mode 100644 index 00000000000000..01923a1a992aa1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-3_roberta_0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 3_roberta_0 RoBertaForSequenceClassification from prl90777 +author: John Snow Labs +name: 3_roberta_0 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`3_roberta_0` is a English model originally trained by prl90777. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/3_roberta_0_en_5.5.0_3.0_1727055510163.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/3_roberta_0_en_5.5.0_3.0_1727055510163.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("3_roberta_0","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("3_roberta_0", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
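As a small, hedged follow-up, the sketch below reads the predicted labels back out of the `pipelineDF` built above; the `text` and `class` column names simply mirror the example and are not part of the original card.

```python
# Minimal sketch: show the predictions (assumes `pipelineDF` from the example above)
from pyspark.sql.functions import col

pipelineDF.select(col("text"), col("class.result").alias("prediction")).show(truncate=False)
```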
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|3_roberta_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|449.2 MB| + +## References + +https://huggingface.co/prl90777/3_roberta_0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-3_roberta_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-3_roberta_0_pipeline_en.md new file mode 100644 index 00000000000000..4874d52c3883ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-3_roberta_0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 3_roberta_0_pipeline pipeline RoBertaForSequenceClassification from prl90777 +author: John Snow Labs +name: 3_roberta_0_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`3_roberta_0_pipeline` is a English model originally trained by prl90777. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/3_roberta_0_pipeline_en_5.5.0_3.0_1727055538073.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/3_roberta_0_pipeline_en_5.5.0_3.0_1727055538073.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# DataFrame with a "text" column holding the documents to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("3_roberta_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// DataFrame with a "text" column holding the documents to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("3_roberta_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|3_roberta_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|449.2 MB| + +## References + +https://huggingface.co/prl90777/3_roberta_0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-afriberta_small_hausa_5e_5_en.md b/docs/_posts/ahmedlone127/2024-09-23-afriberta_small_hausa_5e_5_en.md new file mode 100644 index 00000000000000..f9e6edf89e30d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-afriberta_small_hausa_5e_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English afriberta_small_hausa_5e_5 XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afriberta_small_hausa_5e_5 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`afriberta_small_hausa_5e_5` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_small_hausa_5e_5_en_5.5.0_3.0_1727061476768.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_small_hausa_5e_5_en_5.5.0_3.0_1727061476768.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_small_hausa_5e_5", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_small_hausa_5e_5", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
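
To see the token-level predictions side by side, a short sketch using the column names from the snippet above:

```python
# Each row pairs a document's tokens with the predicted entity tags.
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```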
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afriberta_small_hausa_5e_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/grace-pro/afriberta-small-hausa-5e-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-afriberta_small_hausa_5e_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-afriberta_small_hausa_5e_5_pipeline_en.md new file mode 100644 index 00000000000000..d1838559ced798 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-afriberta_small_hausa_5e_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English afriberta_small_hausa_5e_5_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afriberta_small_hausa_5e_5_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`afriberta_small_hausa_5e_5_pipeline` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_small_hausa_5e_5_pipeline_en_5.5.0_3.0_1727061492242.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_small_hausa_5e_5_pipeline_en_5.5.0_3.0_1727061492242.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# DataFrame with a "text" column holding the documents to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("afriberta_small_hausa_5e_5_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// DataFrame with a "text" column holding the documents to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("afriberta_small_hausa_5e_5_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afriberta_small_hausa_5e_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/grace-pro/afriberta-small-hausa-5e-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-afro_xlmr_base_hausa_seed_30_en.md b/docs/_posts/ahmedlone127/2024-09-23-afro_xlmr_base_hausa_seed_30_en.md new file mode 100644 index 00000000000000..c214d9b94ae9ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-afro_xlmr_base_hausa_seed_30_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English afro_xlmr_base_hausa_seed_30 XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afro_xlmr_base_hausa_seed_30 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`afro_xlmr_base_hausa_seed_30` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_hausa_seed_30_en_5.5.0_3.0_1727061485807.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_hausa_seed_30_en_5.5.0_3.0_1727061485807.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afro_xlmr_base_hausa_seed_30", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afro_xlmr_base_hausa_seed_30", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
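
If the model's tags follow the usual IOB scheme, a `NerConverter` stage can group tagged tokens into entity chunks. The following is a sketch under that assumption and is not part of the original card:

```python
from sparknlp.annotator import NerConverter

# Groups consecutive B-/I- tagged tokens into chunks; only meaningful for IOB-style labels.
nerConverter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, nerConverter])
```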
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afro_xlmr_base_hausa_seed_30| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/afro-xlmr-base-hausa-seed-30 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-afro_xlmr_base_hausa_seed_30_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-afro_xlmr_base_hausa_seed_30_pipeline_en.md new file mode 100644 index 00000000000000..56c5acb55020eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-afro_xlmr_base_hausa_seed_30_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English afro_xlmr_base_hausa_seed_30_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afro_xlmr_base_hausa_seed_30_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`afro_xlmr_base_hausa_seed_30_pipeline` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_hausa_seed_30_pipeline_en_5.5.0_3.0_1727061539034.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_hausa_seed_30_pipeline_en_5.5.0_3.0_1727061539034.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# DataFrame with a "text" column holding the documents to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("afro_xlmr_base_hausa_seed_30_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// DataFrame with a "text" column holding the documents to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("afro_xlmr_base_hausa_seed_30_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afro_xlmr_base_hausa_seed_30_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/afro-xlmr-base-hausa-seed-30 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ag_news_roberta_base_seed_2_en.md b/docs/_posts/ahmedlone127/2024-09-23-ag_news_roberta_base_seed_2_en.md new file mode 100644 index 00000000000000..28ad0db2ee29b7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ag_news_roberta_base_seed_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ag_news_roberta_base_seed_2 RoBertaForSequenceClassification from utahnlp +author: John Snow Labs +name: ag_news_roberta_base_seed_2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ag_news_roberta_base_seed_2` is a English model originally trained by utahnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ag_news_roberta_base_seed_2_en_5.5.0_3.0_1727055252287.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ag_news_roberta_base_seed_2_en_5.5.0_3.0_1727055252287.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("ag_news_roberta_base_seed_2", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("ag_news_roberta_base_seed_2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
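
For low-latency inference on individual strings, the fitted pipeline can be wrapped in a `LightPipeline`. A brief sketch, assuming the `pipelineModel` fitted above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# Returns a dict keyed by the output column names, e.g. "class" for the predicted label.
print(light.annotate("I love spark-nlp"))
```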
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ag_news_roberta_base_seed_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|463.3 MB| + +## References + +https://huggingface.co/utahnlp/ag_news_roberta-base_seed-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-agnews_padding0model_en.md b/docs/_posts/ahmedlone127/2024-09-23-agnews_padding0model_en.md new file mode 100644 index 00000000000000..3d2f619c520eb9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-agnews_padding0model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English agnews_padding0model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: agnews_padding0model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`agnews_padding0model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/agnews_padding0model_en_5.5.0_3.0_1727108318948.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/agnews_padding0model_en_5.5.0_3.0_1727108318948.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("agnews_padding0model", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("agnews_padding0model", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
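
Beyond the label itself, the annotation metadata carries the per-label scores produced by the classifier. A hedged sketch for inspecting them:

```python
import pyspark.sql.functions as F

# "class" is the output column configured above; metadata holds label/confidence pairs.
pipelineDF.select(F.col("class.result").alias("label"),
                  F.col("class.metadata").alias("scores")).show(truncate=False)
```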
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|agnews_padding0model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/agnews_padding0model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-agnews_padding30model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-agnews_padding30model_pipeline_en.md new file mode 100644 index 00000000000000..8e77c27bf23535 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-agnews_padding30model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English agnews_padding30model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: agnews_padding30model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`agnews_padding30model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/agnews_padding30model_pipeline_en_5.5.0_3.0_1727059686642.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/agnews_padding30model_pipeline_en_5.5.0_3.0_1727059686642.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# DataFrame with a "text" column holding the documents to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("agnews_padding30model_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// DataFrame with a "text" column holding the documents to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("agnews_padding30model_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|agnews_padding30model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/agnews_padding30model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ai_generated_text_classification_en.md b/docs/_posts/ahmedlone127/2024-09-23-ai_generated_text_classification_en.md new file mode 100644 index 00000000000000..8667c7292c650f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ai_generated_text_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ai_generated_text_classification RoBertaForSequenceClassification from luciayn +author: John Snow Labs +name: ai_generated_text_classification +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ai_generated_text_classification` is a English model originally trained by luciayn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ai_generated_text_classification_en_5.5.0_3.0_1727055227830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ai_generated_text_classification_en_5.5.0_3.0_1727055227830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("ai_generated_text_classification", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("ai_generated_text_classification", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
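
Once fitted, the pipeline can be persisted and reloaded without refitting. A short illustrative sketch (the path is a placeholder):

```python
# Save the fitted pipeline for reuse; pyspark.ml.PipelineModel.load can read it back later.
pipelineModel.write().overwrite().save("/tmp/ai_generated_text_classification_model")
```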
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ai_generated_text_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/luciayn/ai-generated-text-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ai_generated_text_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-ai_generated_text_classification_pipeline_en.md new file mode 100644 index 00000000000000..5dc5bf778c9a9c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ai_generated_text_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ai_generated_text_classification_pipeline pipeline RoBertaForSequenceClassification from luciayn +author: John Snow Labs +name: ai_generated_text_classification_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ai_generated_text_classification_pipeline` is a English model originally trained by luciayn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ai_generated_text_classification_pipeline_en_5.5.0_3.0_1727055250930.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ai_generated_text_classification_pipeline_en_5.5.0_3.0_1727055250930.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# DataFrame with a "text" column holding the documents to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("ai_generated_text_classification_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// DataFrame with a "text" column holding the documents to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("ai_generated_text_classification_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ai_generated_text_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/luciayn/ai-generated-text-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-aia_hw01_qian_wu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-aia_hw01_qian_wu_pipeline_en.md new file mode 100644 index 00000000000000..9ba8c13c5c6016 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-aia_hw01_qian_wu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English aia_hw01_qian_wu_pipeline pipeline DistilBertForSequenceClassification from Qian-Wu +author: John Snow Labs +name: aia_hw01_qian_wu_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`aia_hw01_qian_wu_pipeline` is a English model originally trained by Qian-Wu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/aia_hw01_qian_wu_pipeline_en_5.5.0_3.0_1727097132445.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/aia_hw01_qian_wu_pipeline_en_5.5.0_3.0_1727097132445.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# DataFrame with a "text" column holding the documents to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("aia_hw01_qian_wu_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// DataFrame with a "text" column holding the documents to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("aia_hw01_qian_wu_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|aia_hw01_qian_wu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Qian-Wu/AIA_HW01 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-all_roberta_large_v1_auto_and_commute_1000_16_5_oos_en.md b/docs/_posts/ahmedlone127/2024-09-23-all_roberta_large_v1_auto_and_commute_1000_16_5_oos_en.md new file mode 100644 index 00000000000000..cd1d419cf647c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-all_roberta_large_v1_auto_and_commute_1000_16_5_oos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_auto_and_commute_1000_16_5_oos RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_auto_and_commute_1000_16_5_oos +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_auto_and_commute_1000_16_5_oos` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_auto_and_commute_1000_16_5_oos_en_5.5.0_3.0_1727134875001.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_auto_and_commute_1000_16_5_oos_en_5.5.0_3.0_1727134875001.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_auto_and_commute_1000_16_5_oos", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_auto_and_commute_1000_16_5_oos", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_auto_and_commute_1000_16_5_oos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-auto_and_commute-1000-16-5-oos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-all_roberta_large_v1_banking_10_16_5_en.md b/docs/_posts/ahmedlone127/2024-09-23-all_roberta_large_v1_banking_10_16_5_en.md new file mode 100644 index 00000000000000..1fe5a5bd96861b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-all_roberta_large_v1_banking_10_16_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_banking_10_16_5 RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_banking_10_16_5 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_banking_10_16_5` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_banking_10_16_5_en_5.5.0_3.0_1727055463492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_banking_10_16_5_en_5.5.0_3.0_1727055463492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_banking_10_16_5", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_banking_10_16_5", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
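
To get a quick distribution of predictions over a larger dataset, the label column can be exploded and aggregated. A sketch assuming the `class` output column configured above:

```python
import pyspark.sql.functions as F

# One row per predicted label, then counted across all input documents.
pipelineDF.select(F.explode("class.result").alias("label")) \
    .groupBy("label").count().show()
```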
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_banking_10_16_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-banking-10-16-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-all_roberta_large_v1_banking_10_16_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-all_roberta_large_v1_banking_10_16_5_pipeline_en.md new file mode 100644 index 00000000000000..c8be765d729f0f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-all_roberta_large_v1_banking_10_16_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English all_roberta_large_v1_banking_10_16_5_pipeline pipeline RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_banking_10_16_5_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_banking_10_16_5_pipeline` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_banking_10_16_5_pipeline_en_5.5.0_3.0_1727055527422.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_banking_10_16_5_pipeline_en_5.5.0_3.0_1727055527422.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# DataFrame with a "text" column holding the documents to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("all_roberta_large_v1_banking_10_16_5_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// DataFrame with a "text" column holding the documents to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("all_roberta_large_v1_banking_10_16_5_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_banking_10_16_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-banking-10-16-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-amazon_0_en.md b/docs/_posts/ahmedlone127/2024-09-23-amazon_0_en.md new file mode 100644 index 00000000000000..70f7a5ff3ccd32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-amazon_0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English amazon_0 DistilBertForSequenceClassification from draghicivlad +author: John Snow Labs +name: amazon_0 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amazon_0` is a English model originally trained by draghicivlad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amazon_0_en_5.5.0_3.0_1727108745660.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amazon_0_en_5.5.0_3.0_1727108745660.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("amazon_0", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("amazon_0", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amazon_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/draghicivlad/amazon_0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-amazon_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-amazon_0_pipeline_en.md new file mode 100644 index 00000000000000..c7150c41c5ae15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-amazon_0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English amazon_0_pipeline pipeline DistilBertForSequenceClassification from draghicivlad +author: John Snow Labs +name: amazon_0_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amazon_0_pipeline` is a English model originally trained by draghicivlad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amazon_0_pipeline_en_5.5.0_3.0_1727108757578.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amazon_0_pipeline_en_5.5.0_3.0_1727108757578.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# DataFrame with a "text" column holding the documents to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("amazon_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// DataFrame with a "text" column holding the documents to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("amazon_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amazon_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/draghicivlad/amazon_0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-amharicnewsnoncleanedsmall_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-amharicnewsnoncleanedsmall_pipeline_en.md new file mode 100644 index 00000000000000..efb835f5b8aef2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-amharicnewsnoncleanedsmall_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English amharicnewsnoncleanedsmall_pipeline pipeline XlmRoBertaForSequenceClassification from akiseid +author: John Snow Labs +name: amharicnewsnoncleanedsmall_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amharicnewsnoncleanedsmall_pipeline` is a English model originally trained by akiseid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amharicnewsnoncleanedsmall_pipeline_en_5.5.0_3.0_1727089188756.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amharicnewsnoncleanedsmall_pipeline_en_5.5.0_3.0_1727089188756.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# DataFrame with a "text" column holding the documents to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("amharicnewsnoncleanedsmall_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// DataFrame with a "text" column holding the documents to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("amharicnewsnoncleanedsmall_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
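
The transformed DataFrame keeps one annotation column per stage of the pretrained pipeline. A hedged sketch for locating and reading the classifier output (the column name is assumed, not confirmed by this card; check the printed schema if it differs):

```python
# Illustrative only: list the produced columns, then peek at the assumed classifier output.
annotations.printSchema()
annotations.select("class.result").show(truncate=False)  # column name assumed
```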
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amharicnewsnoncleanedsmall_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|839.3 MB| + +## References + +https://huggingface.co/akiseid/AmharicNewsNonCleanedSmall + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-apps2_en.md b/docs/_posts/ahmedlone127/2024-09-23-apps2_en.md new file mode 100644 index 00000000000000..0673555019a324 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-apps2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English apps2 DistilBertForSequenceClassification from Frana9812 +author: John Snow Labs +name: apps2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`apps2` is a English model originally trained by Frana9812. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/apps2_en_5.5.0_3.0_1727094073990.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/apps2_en_5.5.0_3.0_1727094073990.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("apps2", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("apps2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|apps2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Frana9812/apps2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-apps2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-apps2_pipeline_en.md new file mode 100644 index 00000000000000..deb3a892ed1b9c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-apps2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English apps2_pipeline pipeline DistilBertForSequenceClassification from Frana9812 +author: John Snow Labs +name: apps2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`apps2_pipeline` is a English model originally trained by Frana9812. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/apps2_pipeline_en_5.5.0_3.0_1727094085604.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/apps2_pipeline_en_5.5.0_3.0_1727094085604.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# DataFrame with a "text" column holding the documents to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("apps2_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// DataFrame with a "text" column holding the documents to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("apps2_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|apps2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Frana9812/apps2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-araroberta_luxembourgish_ar.md b/docs/_posts/ahmedlone127/2024-09-23-araroberta_luxembourgish_ar.md new file mode 100644 index 00000000000000..6cf6f43ce46131 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-araroberta_luxembourgish_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic araroberta_luxembourgish RoBertaEmbeddings from reemalyami +author: John Snow Labs +name: araroberta_luxembourgish +date: 2024-09-23 +tags: [ar, open_source, onnx, embeddings, roberta] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`araroberta_luxembourgish` is a Arabic model originally trained by reemalyami. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/araroberta_luxembourgish_ar_5.5.0_3.0_1727121659638.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/araroberta_luxembourgish_ar_5.5.0_3.0_1727121659638.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("araroberta_luxembourgish", "ar") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("araroberta_luxembourgish", "ar")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
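
To turn the token-level embeddings into plain Spark ML vectors (for example, as features for a downstream estimator), an `EmbeddingsFinisher` can be appended. A sketch under the column names used above:

```python
from sparknlp.base import EmbeddingsFinisher

# Converts the "embeddings" annotations into one vector per token.
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, finisher])
```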
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|araroberta_luxembourgish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|ar| +|Size:|470.6 MB| + +## References + +https://huggingface.co/reemalyami/AraRoBERTa-LB \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-araroberta_luxembourgish_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-23-araroberta_luxembourgish_pipeline_ar.md new file mode 100644 index 00000000000000..4729346d1ca87c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-araroberta_luxembourgish_pipeline_ar.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Arabic araroberta_luxembourgish_pipeline pipeline RoBertaEmbeddings from reemalyami +author: John Snow Labs +name: araroberta_luxembourgish_pipeline +date: 2024-09-23 +tags: [ar, open_source, pipeline, onnx] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`araroberta_luxembourgish_pipeline` is a Arabic model originally trained by reemalyami. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/araroberta_luxembourgish_pipeline_ar_5.5.0_3.0_1727121682180.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/araroberta_luxembourgish_pipeline_ar_5.5.0_3.0_1727121682180.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# DataFrame with a "text" column holding the documents to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("araroberta_luxembourgish_pipeline", lang = "ar")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// DataFrame with a "text" column holding the documents to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("araroberta_luxembourgish_pipeline", lang = "ar")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|araroberta_luxembourgish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|470.6 MB| + +## References + +https://huggingface.co/reemalyami/AraRoBERTa-LB + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-augmented_model_fast_1_en.md b/docs/_posts/ahmedlone127/2024-09-23-augmented_model_fast_1_en.md new file mode 100644 index 00000000000000..6c470e97d59ce1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-augmented_model_fast_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English augmented_model_fast_1 DistilBertForSequenceClassification from LeonardoFettucciari +author: John Snow Labs +name: augmented_model_fast_1 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`augmented_model_fast_1` is a English model originally trained by LeonardoFettucciari. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/augmented_model_fast_1_en_5.5.0_3.0_1727073517341.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/augmented_model_fast_1_en_5.5.0_3.0_1727073517341.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("augmented_model_fast_1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("augmented_model_fast_1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|augmented_model_fast_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LeonardoFettucciari/augmented_model_fast_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-augmented_model_one_en.md b/docs/_posts/ahmedlone127/2024-09-23-augmented_model_one_en.md new file mode 100644 index 00000000000000..01b35beaf50ce8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-augmented_model_one_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English augmented_model_one DistilBertForSequenceClassification from LeonardoFettucciari +author: John Snow Labs +name: augmented_model_one +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`augmented_model_one` is a English model originally trained by LeonardoFettucciari. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/augmented_model_one_en_5.5.0_3.0_1727087106801.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/augmented_model_one_en_5.5.0_3.0_1727087106801.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("augmented_model_one","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("augmented_model_one", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|augmented_model_one| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LeonardoFettucciari/augmented_model_one \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-augmented_model_one_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-augmented_model_one_pipeline_en.md new file mode 100644 index 00000000000000..dc3ea507e4b441 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-augmented_model_one_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English augmented_model_one_pipeline pipeline DistilBertForSequenceClassification from LeonardoFettucciari +author: John Snow Labs +name: augmented_model_one_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`augmented_model_one_pipeline` is a English model originally trained by LeonardoFettucciari. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/augmented_model_one_pipeline_en_5.5.0_3.0_1727087121223.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/augmented_model_one_pipeline_en_5.5.0_3.0_1727087121223.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("augmented_model_one_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("augmented_model_one_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|augmented_model_one_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LeonardoFettucciari/augmented_model_one + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-autotrain_qr7os_gstst_en.md b/docs/_posts/ahmedlone127/2024-09-23-autotrain_qr7os_gstst_en.md new file mode 100644 index 00000000000000..66ed9c5e830ef9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-autotrain_qr7os_gstst_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English autotrain_qr7os_gstst RoBertaForSequenceClassification from Nishthaa321 +author: John Snow Labs +name: autotrain_qr7os_gstst +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_qr7os_gstst` is a English model originally trained by Nishthaa321. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_qr7os_gstst_en_5.5.0_3.0_1727135288651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_qr7os_gstst_en_5.5.0_3.0_1727135288651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("autotrain_qr7os_gstst","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("autotrain_qr7os_gstst", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_qr7os_gstst| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/Nishthaa321/autotrain-qr7os-gstst \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-autotrain_qr7os_gstst_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-autotrain_qr7os_gstst_pipeline_en.md new file mode 100644 index 00000000000000..6a1aefdad443bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-autotrain_qr7os_gstst_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autotrain_qr7os_gstst_pipeline pipeline RoBertaForSequenceClassification from Nishthaa321 +author: John Snow Labs +name: autotrain_qr7os_gstst_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_qr7os_gstst_pipeline` is a English model originally trained by Nishthaa321. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_qr7os_gstst_pipeline_en_5.5.0_3.0_1727135312742.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_qr7os_gstst_pipeline_en_5.5.0_3.0_1727135312742.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("autotrain_qr7os_gstst_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("autotrain_qr7os_gstst_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_qr7os_gstst_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/Nishthaa321/autotrain-qr7os-gstst + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-autotrain_xlmroberta_iuexist_50302120401_en.md b/docs/_posts/ahmedlone127/2024-09-23-autotrain_xlmroberta_iuexist_50302120401_en.md new file mode 100644 index 00000000000000..6e66a131d647fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-autotrain_xlmroberta_iuexist_50302120401_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English autotrain_xlmroberta_iuexist_50302120401 XlmRoBertaForSequenceClassification from Muhsabrys +author: John Snow Labs +name: autotrain_xlmroberta_iuexist_50302120401 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_xlmroberta_iuexist_50302120401` is a English model originally trained by Muhsabrys. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_xlmroberta_iuexist_50302120401_en_5.5.0_3.0_1727125993259.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_xlmroberta_iuexist_50302120401_en_5.5.0_3.0_1727125993259.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("autotrain_xlmroberta_iuexist_50302120401","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("autotrain_xlmroberta_iuexist_50302120401", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_xlmroberta_iuexist_50302120401| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Muhsabrys/autotrain-xlmroberta-iuexist-50302120401 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-babylm_roberta_base_epoch_15_en.md b/docs/_posts/ahmedlone127/2024-09-23-babylm_roberta_base_epoch_15_en.md new file mode 100644 index 00000000000000..b24933488b0d72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-babylm_roberta_base_epoch_15_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English babylm_roberta_base_epoch_15 RoBertaEmbeddings from Raj-Sanjay-Shah +author: John Snow Labs +name: babylm_roberta_base_epoch_15 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babylm_roberta_base_epoch_15` is a English model originally trained by Raj-Sanjay-Shah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babylm_roberta_base_epoch_15_en_5.5.0_3.0_1727066228126.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babylm_roberta_base_epoch_15_en_5.5.0_3.0_1727066228126.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("babylm_roberta_base_epoch_15","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("babylm_roberta_base_epoch_15","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babylm_roberta_base_epoch_15| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.7 MB| + +## References + +https://huggingface.co/Raj-Sanjay-Shah/babyLM_roberta_base_epoch_15 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-babylm_roberta_base_epoch_15_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-babylm_roberta_base_epoch_15_pipeline_en.md new file mode 100644 index 00000000000000..8fa7a53fa33581 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-babylm_roberta_base_epoch_15_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English babylm_roberta_base_epoch_15_pipeline pipeline RoBertaEmbeddings from Raj-Sanjay-Shah +author: John Snow Labs +name: babylm_roberta_base_epoch_15_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babylm_roberta_base_epoch_15_pipeline` is a English model originally trained by Raj-Sanjay-Shah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babylm_roberta_base_epoch_15_pipeline_en_5.5.0_3.0_1727066250168.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babylm_roberta_base_epoch_15_pipeline_en_5.5.0_3.0_1727066250168.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("babylm_roberta_base_epoch_15_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("babylm_roberta_base_epoch_15_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babylm_roberta_base_epoch_15_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.7 MB| + +## References + +https://huggingface.co/Raj-Sanjay-Shah/babyLM_roberta_base_epoch_15 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-base_12_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-23-base_12_pipeline_tr.md new file mode 100644 index 00000000000000..d0e3721ad9f3e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-base_12_pipeline_tr.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Turkish base_12_pipeline pipeline WhisperForCTC from Mehtap +author: John Snow Labs +name: base_12_pipeline +date: 2024-09-23 +tags: [tr, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_12_pipeline` is a Turkish model originally trained by Mehtap. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_12_pipeline_tr_5.5.0_3.0_1727077345335.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_12_pipeline_tr_5.5.0_3.0_1727077345335.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("base_12_pipeline", lang = "tr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("base_12_pipeline", lang = "tr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_12_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|643.7 MB| + +## References + +https://huggingface.co/Mehtap/base_12 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-base_12_tr.md b/docs/_posts/ahmedlone127/2024-09-23-base_12_tr.md new file mode 100644 index 00000000000000..fcd69771ba7eb6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-base_12_tr.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Turkish base_12 WhisperForCTC from Mehtap +author: John Snow Labs +name: base_12 +date: 2024-09-23 +tags: [tr, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_12` is a Turkish model originally trained by Mehtap. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_12_tr_5.5.0_3.0_1727077312654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_12_tr_5.5.0_3.0_1727077312654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("base_12","tr") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("base_12", "tr") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_12| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|tr| +|Size:|643.7 MB| + +## References + +https://huggingface.co/Mehtap/base_12 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_en.md b/docs/_posts/ahmedlone127/2024-09-23-base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_en.md new file mode 100644 index 00000000000000..1d6a2b9aaa7773 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2 WhisperForCTC from saahith +author: John Snow Labs +name: base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2 +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_en_5.5.0_3.0_1727052276622.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_en_5.5.0_3.0_1727052276622.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|646.3 MB| + +## References + +https://huggingface.co/saahith/base.en-combined_v5_wo_emsAssist-1-0.1-8-1e-05-tough-sweep-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_pipeline_en.md new file mode 100644 index 00000000000000..7a57450c272448 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_pipeline pipeline WhisperForCTC from saahith +author: John Snow Labs +name: base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_pipeline` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_pipeline_en_5.5.0_3.0_1727052312003.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_pipeline_en_5.5.0_3.0_1727052312003.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|646.3 MB| + +## References + +https://huggingface.co/saahith/base.en-combined_v5_wo_emsAssist-1-0.1-8-1e-05-tough-sweep-2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bds_en.md b/docs/_posts/ahmedlone127/2024-09-23-bds_en.md new file mode 100644 index 00000000000000..de32c0d32a35c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bds_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bds DistilBertForSequenceClassification from LogischeIP +author: John Snow Labs +name: bds +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bds` is a English model originally trained by LogischeIP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bds_en_5.5.0_3.0_1727087001288.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bds_en_5.5.0_3.0_1727087001288.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("bds","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bds", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bds| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LogischeIP/BDS \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bds_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bds_pipeline_en.md new file mode 100644 index 00000000000000..d303a0d01fa03d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bds_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bds_pipeline pipeline DistilBertForSequenceClassification from LogischeIP +author: John Snow Labs +name: bds_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bds_pipeline` is a English model originally trained by LogischeIP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bds_pipeline_en_5.5.0_3.0_1727087015908.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bds_pipeline_en_5.5.0_3.0_1727087015908.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bds_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bds_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bds_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LogischeIP/BDS + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_fake_news_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_fake_news_en.md new file mode 100644 index 00000000000000..23c8230305dc08 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_fake_news_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_cased_fake_news BertForSequenceClassification from elozano +author: John Snow Labs +name: bert_base_cased_fake_news +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_fake_news` is a English model originally trained by elozano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_fake_news_en_5.5.0_3.0_1727095289073.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_fake_news_en_5.5.0_3.0_1727095289073.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_fake_news","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_fake_news", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_fake_news| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/elozano/bert-base-cased-fake-news \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_fake_news_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_fake_news_pipeline_en.md new file mode 100644 index 00000000000000..d16897c64768b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_fake_news_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_cased_fake_news_pipeline pipeline BertForSequenceClassification from elozano +author: John Snow Labs +name: bert_base_cased_fake_news_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_fake_news_pipeline` is a English model originally trained by elozano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_fake_news_pipeline_en_5.5.0_3.0_1727095308660.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_fake_news_pipeline_en_5.5.0_3.0_1727095308660.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_cased_fake_news_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_cased_fake_news_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_fake_news_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/elozano/bert-base-cased-fake-news + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad2_finetuned_squad_chocolatehog_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad2_finetuned_squad_chocolatehog_en.md new file mode 100644 index 00000000000000..4f42644ef631eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad2_finetuned_squad_chocolatehog_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_cased_squad2_finetuned_squad_chocolatehog BertForQuestionAnswering from chocolatehog +author: John Snow Labs +name: bert_base_cased_squad2_finetuned_squad_chocolatehog +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_squad2_finetuned_squad_chocolatehog` is a English model originally trained by chocolatehog. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_squad2_finetuned_squad_chocolatehog_en_5.5.0_3.0_1727070310824.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_squad2_finetuned_squad_chocolatehog_en_5.5.0_3.0_1727070310824.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_squad2_finetuned_squad_chocolatehog","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_squad2_finetuned_squad_chocolatehog", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_squad2_finetuned_squad_chocolatehog| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/chocolatehog/bert-base-cased-squad2-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad2_finetuned_squad_chocolatehog_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad2_finetuned_squad_chocolatehog_pipeline_en.md new file mode 100644 index 00000000000000..13d62e516f043f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad2_finetuned_squad_chocolatehog_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_cased_squad2_finetuned_squad_chocolatehog_pipeline pipeline BertForQuestionAnswering from chocolatehog +author: John Snow Labs +name: bert_base_cased_squad2_finetuned_squad_chocolatehog_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_squad2_finetuned_squad_chocolatehog_pipeline` is a English model originally trained by chocolatehog. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_squad2_finetuned_squad_chocolatehog_pipeline_en_5.5.0_3.0_1727070331020.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_squad2_finetuned_squad_chocolatehog_pipeline_en_5.5.0_3.0_1727070331020.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_cased_squad2_finetuned_squad_chocolatehog_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_cased_squad2_finetuned_squad_chocolatehog_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_squad2_finetuned_squad_chocolatehog_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/chocolatehog/bert-base-cased-squad2-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_en.md new file mode 100644 index 00000000000000..e479b908877e30 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_cased_squad_v1_1_portuguese_ibama_v0_1 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_cased_squad_v1_1_portuguese_ibama_v0_1 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_squad_v1_1_portuguese_ibama_v0_1` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_en_5.5.0_3.0_1727127794172.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_en_5.5.0_3.0_1727127794172.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_squad_v1_1_portuguese_ibama_v0_1","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_squad_v1_1_portuguese_ibama_v0_1", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_squad_v1_1_portuguese_ibama_v0_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-cased-squad-v1.1-pt_IBAMA_v0.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_pipeline_en.md new file mode 100644 index 00000000000000..e8b095a1d7b9de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_pipeline_en_5.5.0_3.0_1727127815103.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_pipeline_en_5.5.0_3.0_1727127815103.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-cased-squad-v1.1-pt_IBAMA_v0.1 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_multilingual_cased_ner_hrl_nttaii_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_multilingual_cased_ner_hrl_nttaii_pipeline_xx.md new file mode 100644 index 00000000000000..317f43844b4ed5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_multilingual_cased_ner_hrl_nttaii_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_ner_hrl_nttaii_pipeline pipeline BertForTokenClassification from nttaii +author: John Snow Labs +name: bert_base_multilingual_cased_ner_hrl_nttaii_pipeline +date: 2024-09-23 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_ner_hrl_nttaii_pipeline` is a Multilingual model originally trained by nttaii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_ner_hrl_nttaii_pipeline_xx_5.5.0_3.0_1727060451951.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_ner_hrl_nttaii_pipeline_xx_5.5.0_3.0_1727060451951.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_multilingual_cased_ner_hrl_nttaii_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_multilingual_cased_ner_hrl_nttaii_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_ner_hrl_nttaii_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.3 MB| + +## References + +https://huggingface.co/nttaii/bert-base-multilingual-cased-ner-hrl + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_multilingual_cased_ner_hrl_nttaii_xx.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_multilingual_cased_ner_hrl_nttaii_xx.md new file mode 100644 index 00000000000000..24b6e53ab5f404 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_multilingual_cased_ner_hrl_nttaii_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_ner_hrl_nttaii BertForTokenClassification from nttaii +author: John Snow Labs +name: bert_base_multilingual_cased_ner_hrl_nttaii +date: 2024-09-23 +tags: [xx, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_ner_hrl_nttaii` is a Multilingual model originally trained by nttaii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_ner_hrl_nttaii_xx_5.5.0_3.0_1727060419777.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_ner_hrl_nttaii_xx_5.5.0_3.0_1727060419777.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = BertForTokenClassification.pretrained("bert_base_multilingual_cased_ner_hrl_nttaii", "xx") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_base_multilingual_cased_ner_hrl_nttaii", "xx")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
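To group the token-level IOB tags into entity spans, a `NerConverter` stage can be appended to the Python pipeline above. This is a minimal sketch that reuses `documentAssembler`, `tokenizer`, `tokenClassifier`, and `data` from that example; the converter stage and the `ner_chunk` column name are additions, not something stated on this card.

```python
from sparknlp.annotator import NerConverter
from pyspark.ml import Pipeline

# Merges consecutive B-/I- tags from the "ner" column into whole entity chunks
nerConverter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

chunkPipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, nerConverter])
chunkPipeline.fit(data).transform(data) \
    .selectExpr("explode(ner_chunk.result) AS entity") \
    .show(truncate=False)
```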
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_ner_hrl_nttaii| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|665.3 MB| + +## References + +https://huggingface.co/nttaii/bert-base-multilingual-cased-ner-hrl \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_finetuned_qa_tar_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_finetuned_qa_tar_en.md new file mode 100644 index 00000000000000..17e36efcc35109 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_finetuned_qa_tar_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_spanish_wwm_cased_finetuned_qa_tar BertForQuestionAnswering from dccuchile +author: John Snow Labs +name: bert_base_spanish_wwm_cased_finetuned_qa_tar +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_wwm_cased_finetuned_qa_tar` is a English model originally trained by dccuchile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_finetuned_qa_tar_en_5.5.0_3.0_1727127872293.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_finetuned_qa_tar_en_5.5.0_3.0_1727127872293.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_spanish_wwm_cased_finetuned_qa_tar", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_spanish_wwm_cased_finetuned_qa_tar", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
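The model name points to a Spanish whole-word-masking, cased base checkpoint fine-tuned for QA, so a Spanish question/context pair makes a more telling smoke test than the English placeholder. A minimal sketch that reuses `pipeline` and `spark` from the example above; the sentences are invented for illustration.

```python
# Illustrative Spanish QA pair
datos = spark.createDataFrame(
    [["¿Dónde vive Clara?", "Mi nombre es Clara y vivo en Berkeley."]]
).toDF("question", "context")

pipeline.fit(datos).transform(datos).select("answer.result").show(truncate=False)
```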
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_cased_finetuned_qa_tar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased-finetuned-qa-tar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_finetuned_qa_tar_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_finetuned_qa_tar_pipeline_en.md new file mode 100644 index 00000000000000..b959cc37a54a63 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_finetuned_qa_tar_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_spanish_wwm_cased_finetuned_qa_tar_pipeline pipeline BertForQuestionAnswering from dccuchile +author: John Snow Labs +name: bert_base_spanish_wwm_cased_finetuned_qa_tar_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_wwm_cased_finetuned_qa_tar_pipeline` is a English model originally trained by dccuchile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_finetuned_qa_tar_pipeline_en_5.5.0_3.0_1727127893217.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_finetuned_qa_tar_pipeline_en_5.5.0_3.0_1727127893217.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the input column(s) expected by the stages listed under "Included Models"
pipeline = PretrainedPipeline("bert_base_spanish_wwm_cased_finetuned_qa_tar_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame holding the input column(s) expected by the stages listed under "Included Models"
val pipeline = new PretrainedPipeline("bert_base_spanish_wwm_cased_finetuned_qa_tar_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_cased_finetuned_qa_tar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased-finetuned-qa-tar + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_nubes_es.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_nubes_es.md new file mode 100644 index 00000000000000..d056cf94dd6f40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_nubes_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bert_base_spanish_wwm_cased_nubes BertForTokenClassification from IIC +author: John Snow Labs +name: bert_base_spanish_wwm_cased_nubes +date: 2024-09-23 +tags: [es, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_wwm_cased_nubes` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_nubes_es_5.5.0_3.0_1727060208662.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_nubes_es_5.5.0_3.0_1727060208662.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = BertForTokenClassification.pretrained("bert_base_spanish_wwm_cased_nubes", "es") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_base_spanish_wwm_cased_nubes", "es")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
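NUBes, referenced in the model name, is a Spanish clinical corpus annotated for negation and uncertainty, so a clinical-style Spanish sentence exercises the model more realistically than the English placeholder. A sketch reusing the Python objects defined above; the sentence is invented and the tags shown are whatever the model actually predicts.

```python
# Illustrative Spanish clinical sentence
nota = spark.createDataFrame(
    [["El paciente no presenta fiebre ni dolor abdominal."]]
).toDF("text")

resultado = pipeline.fit(nota).transform(nota)

# Pair each token with its predicted tag to inspect negation/uncertainty spans
resultado.selectExpr("explode(arrays_zip(token.result, ner.result)) AS par").show(truncate=False)
```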
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_cased_nubes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|409.5 MB| + +## References + +https://huggingface.co/IIC/bert-base-spanish-wwm-cased-nubes \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_nubes_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_nubes_pipeline_es.md new file mode 100644 index 00000000000000..68a6502081d748 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_nubes_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bert_base_spanish_wwm_cased_nubes_pipeline pipeline BertForTokenClassification from IIC +author: John Snow Labs +name: bert_base_spanish_wwm_cased_nubes_pipeline +date: 2024-09-23 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_wwm_cased_nubes_pipeline` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_nubes_pipeline_es_5.5.0_3.0_1727060228253.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_nubes_pipeline_es_5.5.0_3.0_1727060228253.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the input column(s) expected by the stages listed under "Included Models"
pipeline = PretrainedPipeline("bert_base_spanish_wwm_cased_nubes_pipeline", lang = "es")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame holding the input column(s) expected by the stages listed under "Included Models"
val pipeline = new PretrainedPipeline("bert_base_spanish_wwm_cased_nubes_pipeline", lang = "es")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_cased_nubes_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|409.5 MB| + +## References + +https://huggingface.co/IIC/bert-base-spanish-wwm-cased-nubes + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_en.md new file mode 100644 index 00000000000000..ba063d3f94f42b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_en_5.5.0_3.0_1727127885221.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_en_5.5.0_3.0_1727127885221.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
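Several question/context rows can be scored in a single pass; keeping the question column in the output makes the answers easier to read. A minimal sketch with invented Portuguese examples, reusing `pipeline` and `spark` from the example above.

```python
# Illustrative Portuguese QA pairs
perguntas = spark.createDataFrame([
    ["Qual órgão fiscaliza o meio ambiente no Brasil?",
     "O IBAMA é o órgão federal responsável pela fiscalização ambiental no Brasil."],
    ["Onde fica a sede do IBAMA?",
     "A sede do IBAMA fica em Brasília."]
]).toDF("question", "context")

pipeline.fit(perguntas).transform(perguntas) \
    .select("document_question.result", "answer.result") \
    .show(truncate=False)
```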
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.220240904191111 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_pipeline_en.md new file mode 100644 index 00000000000000..ea310d5409a402 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_pipeline_en_5.5.0_3.0_1727127906219.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_pipeline_en_5.5.0_3.0_1727127906219.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the input column(s) expected by the stages listed under "Included Models"
pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame holding the input column(s) expected by the stages listed under "Included Models"
val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
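The snippet above leaves `df` to the reader. Since the first stage listed under Included Models below is a MultiDocumentAssembler, a reasonable guess is that it expects `question` and `context` columns; treat the column names, the example row, and the `answer` output column as assumptions rather than facts from this card.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Assumed input columns for the pipeline's MultiDocumentAssembler stage
df = spark.createDataFrame(
    [["Quem aplica multas ambientais?", "O IBAMA aplica multas ambientais em todo o Brasil."]]
).toDF("question", "context")

pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_pipeline", lang="en")
pipeline.transform(df).select("answer.result").show(truncate=False)
```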
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.220240904191111 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_en.md new file mode 100644 index 00000000000000..57ca8ab1caaa15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_en_5.5.0_3.0_1727128019484.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_en_5.5.0_3.0_1727128019484.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240914220642 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_pipeline_en.md new file mode 100644 index 00000000000000..24a1e540c6aa7f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_pipeline_en_5.5.0_3.0_1727128040724.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_pipeline_en_5.5.0_3.0_1727128040724.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the input column(s) expected by the stages listed under "Included Models"
pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame holding the input column(s) expected by the stages listed under "Included Models"
val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240914220642 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_en.md new file mode 100644 index 00000000000000..2217b154314396 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_en_5.5.0_3.0_1727127747366.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_en_5.5.0_3.0_1727127747366.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915001955 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_pipeline_en.md new file mode 100644 index 00000000000000..cdee93cda10ecf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_pipeline_en_5.5.0_3.0_1727127770037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_pipeline_en_5.5.0_3.0_1727127770037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the input column(s) expected by the stages listed under "Included Models"
pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame holding the input column(s) expected by the stages listed under "Included Models"
val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915001955 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncase_conll2012_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncase_conll2012_en.md new file mode 100644 index 00000000000000..ee070370850dba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncase_conll2012_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncase_conll2012 BertForTokenClassification from sarveshsk +author: John Snow Labs +name: bert_base_uncase_conll2012 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncase_conll2012` is a English model originally trained by sarveshsk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncase_conll2012_en_5.5.0_3.0_1727111304268.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncase_conll2012_en_5.5.0_3.0_1727111304268.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncase_conll2012", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncase_conll2012", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
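The fitted pipeline is an ordinary Spark ML PipelineModel, so it can be persisted once and reloaded without refitting. A small sketch reusing `pipelineModel` and `data` from the example above; the path is a placeholder.

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline (placeholder path) and load it back for reuse
pipelineModel.write().overwrite().save("/tmp/bert_base_uncase_conll2012_pipeline")
reloaded = PipelineModel.load("/tmp/bert_base_uncase_conll2012_pipeline")
reloaded.transform(data).select("ner.result").show(truncate=False)
```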
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncase_conll2012| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.3 MB| + +## References + +https://huggingface.co/sarveshsk/bert_base_uncase_Conll2012 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_conll2003_joshuaphua_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_conll2003_joshuaphua_pipeline_en.md new file mode 100644 index 00000000000000..b94b9ed8b9c47b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_conll2003_joshuaphua_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_conll2003_joshuaphua_pipeline pipeline BertForTokenClassification from joshuaphua +author: John Snow Labs +name: bert_base_uncased_conll2003_joshuaphua_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_conll2003_joshuaphua_pipeline` is a English model originally trained by joshuaphua. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_conll2003_joshuaphua_pipeline_en_5.5.0_3.0_1727098274971.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_conll2003_joshuaphua_pipeline_en_5.5.0_3.0_1727098274971.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the input column(s) expected by the stages listed under "Included Models"
pipeline = PretrainedPipeline("bert_base_uncased_conll2003_joshuaphua_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame holding the input column(s) expected by the stages listed under "Included Models"
val pipeline = new PretrainedPipeline("bert_base_uncased_conll2003_joshuaphua_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
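As with the other pretrained pipelines on this page, `df` above is left to the reader. Given the DocumentAssembler listed under Included Models below, a single `text` column is the likely input; the column name and the example sentence are assumptions. Printing the schema is a safe way to discover the output columns the pipeline actually produces.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

df = spark.createDataFrame([["John Snow Labs is based in Delaware."]]).toDF("text")

pipeline = PretrainedPipeline("bert_base_uncased_conll2003_joshuaphua_pipeline", lang="en")
result = pipeline.transform(df)
result.printSchema()  # reveals the output columns produced by the pipeline stages
result.show(truncate=False)
```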
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_conll2003_joshuaphua_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/joshuaphua/bert-base-uncased-conll2003 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_en.md new file mode 100644 index 00000000000000..b346c78565324e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727127747505.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727127747505.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.45-b-32-lr-1.2e-06-dp-0.3-ss-300-st-False-fh-True-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..08f9904ba2f2f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727127772746.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727127772746.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the input column(s) expected by the stages listed under "Included Models"
pipeline = PretrainedPipeline("bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame holding the input column(s) expected by the stages listed under "Included Models"
val pipeline = new PretrainedPipeline("bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.45-b-32-lr-1.2e-06-dp-0.3-ss-300-st-False-fh-True-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_en.md new file mode 100644 index 00000000000000..38821d503e02e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_en_5.5.0_3.0_1727050170173.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_en_5.5.0_3.0_1727050170173.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
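If the cluster has no outbound internet access, the archive behind the Download button can be fetched manually, unzipped, and loaded with Spark ML's `load` instead of `pretrained()`. A sketch under that assumption; the local path is a placeholder.

```python
from sparknlp.annotator import BertForQuestionAnswering

# Placeholder path pointing at the unzipped model archive downloaded from the link above
spanClassifier = BertForQuestionAnswering.load(
        "/models/bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")
```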
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-0.8-lr-1e-05-wd-0.001-dp-0.99999-ss-140000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_pipeline_en.md new file mode 100644 index 00000000000000..b6dbde2244a7a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_pipeline_en_5.5.0_3.0_1727050191426.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_pipeline_en_5.5.0_3.0_1727050191426.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the input column(s) expected by the stages listed under "Included Models"
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame holding the input column(s) expected by the stages listed under "Included Models"
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-0.8-lr-1e-05-wd-0.001-dp-0.99999-ss-140000 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_en.md new file mode 100644 index 00000000000000..59042b0c3dfcc3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727049969744.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727049969744.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
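Inference behavior can usually be tuned on the annotator itself; batch size, case sensitivity, and the maximum sentence length are the common knobs. A sketch with illustrative values; the parameter names are assumed from the standard BertForQuestionAnswering interface, so check the annotator's parameter list before relying on them.

```python
# Illustrative settings; defaults may already be suitable
spanClassifier = BertForQuestionAnswering.pretrained(
        "bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer") \
    .setCaseSensitive(False) \
    .setBatchSize(8) \
    .setMaxSentenceLength(384)
```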
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-10.0-lr-4e-07-wd-1e-05-dp-1.0-ss-100-st-False-fh-True-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..af2c00e78d86de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727049992673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727049992673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the question and context text to annotate
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame holding the question and context text to annotate
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
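
The result column names are fixed by the stages bundled inside the pretrained pipeline, so check the schema first; selecting `answer` below is an assumption based on the included BertForQuestionAnswering stage:

```python
# Inspect the columns added by the pipeline; "answer" is assumed from the bundled QA stage
annotations.printSchema()
annotations.select("answer.result").show(truncate = False)
```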
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-10.0-lr-4e-07-wd-1e-05-dp-1.0-ss-100-st-False-fh-True-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_en.md new file mode 100644 index 00000000000000..34e87f140b1dc3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_en_5.5.0_3.0_1727050260036.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_en_5.5.0_3.0_1727050260036.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
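
The extracted span lands in the `answer` column; a quick way to look at it, assuming the `pipelineDF` built above:

```python
# Show the predicted answer span for each input question
pipelineDF.selectExpr("document_question.result AS question", "answer.result AS answer") \
    .show(truncate = False)
```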
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-05-wd-0.001-dp-0.99999-ss-900 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_pipeline_en.md new file mode 100644 index 00000000000000..3bd744f5790d2e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_pipeline_en_5.5.0_3.0_1727050282501.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_pipeline_en_5.5.0_3.0_1727050282501.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the question and context text to annotate
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame holding the question and context text to annotate
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
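
To locate the pipeline's output, inspect the schema of `annotations`; the `answer` column used below is assumed from the bundled question-answering stage:

```python
# Confirm the annotation columns, then peek at the (assumed) "answer" column
annotations.printSchema()
annotations.select("answer.result").show(truncate = False)
```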
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-05-wd-0.001-dp-0.99999-ss-900 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_pipeline_en.md new file mode 100644 index 00000000000000..c951f366efd8ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_pipeline_en_5.5.0_3.0_1727049603752.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_pipeline_en_5.5.0_3.0_1727049603752.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the question and context text to annotate
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame holding the question and context text to annotate
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
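
Column names are defined inside the pretrained pipeline; the sketch below assumes an `answer` column coming from the bundled BertForQuestionAnswering stage:

```python
# List the output columns and show the (assumed) answer annotations
annotations.printSchema()
annotations.select("answer.result").show(truncate = False)
```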
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-4e-07-wd-1e-05-dp-1.0-ss-0-st-False-fh-False-hs-100 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_en.md new file mode 100644 index 00000000000000..b5d4c0f997a4cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727128027683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727128027683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
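
Once the pipeline has run, the predicted span can be pulled from the `answer` column; a minimal sketch based on the example above:

```python
# Display question and extracted answer side by side
pipelineDF.selectExpr("document_question.result AS question", "answer.result AS answer") \
    .show(truncate = False)
```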
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-4e-07-wd-1e-05-dp-1.0-ss-100-st-False-fh-True-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..60df331cab47d4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727128050307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727128050307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the question and context text to annotate
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame holding the question and context text to annotate
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
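
A quick way to verify the output columns and peek at the predictions; the `answer` column is assumed from the included question-answering stage:

```python
# Check the schema, then show the (assumed) answer annotations
annotations.printSchema()
annotations.select("answer.result").show(truncate = False)
```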
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-4e-07-wd-1e-05-dp-1.0-ss-100-st-False-fh-True-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_en.md new file mode 100644 index 00000000000000..02b8b0d755c8ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_en_5.5.0_3.0_1727049749165.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_en_5.5.0_3.0_1727049749165.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
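
To surface the predicted answer, select the `answer` annotations created above (this assumes the `pipelineDF` from this example):

```python
# Print the span the model extracted for each question
pipelineDF.selectExpr("document_question.result AS question", "answer.result AS answer") \
    .show(truncate = False)
```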
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-1e-05-wd-0.001-dp-0.2-ss-4664-st-False-fh-True-hs-666 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_pipeline_en.md new file mode 100644 index 00000000000000..08ac2880664dd0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_pipeline_en_5.5.0_3.0_1727049773921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_pipeline_en_5.5.0_3.0_1727049773921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the question and context text to annotate
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame holding the question and context text to annotate
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
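
Since the annotation columns come from the pipeline's internal stages, confirm them via the schema; the `answer` selection below is an assumption:

```python
# Show the columns produced by the pipeline; "answer" is assumed from the bundled QA stage
annotations.printSchema()
annotations.select("answer.result").show(truncate = False)
```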
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-1e-05-wd-0.001-dp-0.2-ss-4664-st-False-fh-True-hs-666 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_en.md new file mode 100644 index 00000000000000..b29ae22b84f298 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_en_5.5.0_3.0_1727050070238.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_en_5.5.0_3.0_1727050070238.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
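
The answer span produced by the model sits in the `answer` column; for a quick look, assuming the example's `pipelineDF`:

```python
# Inspect the extracted answer next to the question text
pipelineDF.selectExpr("document_question.result AS question", "answer.result AS answer") \
    .show(truncate = False)
```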
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-1e-06-wd-0.001-dp-0.2-ss-2882-st-False-fh-True-hs-666 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_pipeline_en.md new file mode 100644 index 00000000000000..c73448344b1267 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_pipeline_en_5.5.0_3.0_1727050091038.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_pipeline_en_5.5.0_3.0_1727050091038.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the question and context text to annotate
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame holding the question and context text to annotate
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
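
To inspect the results, list the columns the pipeline added; selecting `answer` is assumed from the bundled question-answering stage:

```python
# Print the schema and the (assumed) answer annotations
annotations.printSchema()
annotations.select("answer.result").show(truncate = False)
```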
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-1e-06-wd-0.001-dp-0.2-ss-2882-st-False-fh-True-hs-666 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_en.md new file mode 100644 index 00000000000000..f0aeed4f2ff1bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_en_5.5.0_3.0_1727049758353.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_en_5.5.0_3.0_1727049758353.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
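
A short follow-up to display the predicted span, assuming the `pipelineDF` and columns defined in the example above:

```python
# Show the question and the answer span returned by the model
pipelineDF.selectExpr("document_question.result AS question", "answer.result AS answer") \
    .show(truncate = False)
```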
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-8e-06-wd-0.001-dp-0.999 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_pipeline_en.md new file mode 100644 index 00000000000000..068ea65942c59e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_pipeline_en_5.5.0_3.0_1727049778187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_pipeline_en_5.5.0_3.0_1727049778187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame holding the question and context text to annotate
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame holding the question and context text to annotate
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
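
For a quick sanity check of the output, print the schema and, assuming an `answer` column from the bundled QA stage, the predicted spans:

```python
# "answer" is an assumption based on the included BertForQuestionAnswering stage
annotations.printSchema()
annotations.select("answer.result").show(truncate = False)
```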
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-8e-06-wd-0.001-dp-0.999 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetuned_mrpc_w05230505_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetuned_mrpc_w05230505_en.md new file mode 100644 index 00000000000000..8212611910e771 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetuned_mrpc_w05230505_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_mrpc_w05230505 BertForSequenceClassification from w05230505 +author: John Snow Labs +name: bert_base_uncased_finetuned_mrpc_w05230505 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_mrpc_w05230505` is a English model originally trained by w05230505. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_mrpc_w05230505_en_5.5.0_3.0_1727095230954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_mrpc_w05230505_en_5.5.0_3.0_1727095230954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_mrpc_w05230505","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_mrpc_w05230505", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
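
To read off the predicted label, select the `class` column created by the classifier above (a minimal sketch assuming the `pipelineDF` from this example):

```python
# Show each input text together with its predicted label
pipelineDF.select("text", "class.result").show(truncate = False)
```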
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_mrpc_w05230505| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/w05230505/bert-base-uncased-finetuned-mrpc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetuned_mrpc_w05230505_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetuned_mrpc_w05230505_pipeline_en.md new file mode 100644 index 00000000000000..2b8824ace9517e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetuned_mrpc_w05230505_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_mrpc_w05230505_pipeline pipeline BertForSequenceClassification from w05230505 +author: John Snow Labs +name: bert_base_uncased_finetuned_mrpc_w05230505_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_mrpc_w05230505_pipeline` is a English model originally trained by w05230505. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_mrpc_w05230505_pipeline_en_5.5.0_3.0_1727095250267.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_mrpc_w05230505_pipeline_en_5.5.0_3.0_1727095250267.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a text column to classify
pipeline = PretrainedPipeline("bert_base_uncased_finetuned_mrpc_w05230505_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame with a text column to classify
val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_mrpc_w05230505_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
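
The output columns are defined by the pipeline's internal stages, so verify them via the schema; selecting `class` below is an assumption based on the bundled BertForSequenceClassification stage:

```python
# Inspect the pipeline output; "class" is assumed from the bundled classifier stage
annotations.printSchema()
annotations.select("class.result").show(truncate = False)
```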
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_mrpc_w05230505_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/w05230505/bert-base-uncased-finetuned-mrpc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_newscategoryclassification_fullmodel_3_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_newscategoryclassification_fullmodel_3_en.md new file mode 100644 index 00000000000000..fd1ebb3c205d54 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_newscategoryclassification_fullmodel_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_newscategoryclassification_fullmodel_3 DistilBertForSequenceClassification from akashmaggon +author: John Snow Labs +name: bert_base_uncased_newscategoryclassification_fullmodel_3 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_newscategoryclassification_fullmodel_3` is a English model originally trained by akashmaggon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_newscategoryclassification_fullmodel_3_en_5.5.0_3.0_1727059857892.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_newscategoryclassification_fullmodel_3_en_5.5.0_3.0_1727059857892.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_base_uncased_newscategoryclassification_fullmodel_3","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_base_uncased_newscategoryclassification_fullmodel_3", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
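
The predicted category is stored in the `class` column; a quick look, assuming the `pipelineDF` built above:

```python
# Print the input text and its predicted news category
pipelineDF.select("text", "class.result").show(truncate = False)
```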
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_newscategoryclassification_fullmodel_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/akashmaggon/bert-base-uncased-newscategoryclassification-fullmodel-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_newscategoryclassification_fullmodel_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_newscategoryclassification_fullmodel_3_pipeline_en.md new file mode 100644 index 00000000000000..2f1c37b14f152e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_newscategoryclassification_fullmodel_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_newscategoryclassification_fullmodel_3_pipeline pipeline DistilBertForSequenceClassification from akashmaggon +author: John Snow Labs +name: bert_base_uncased_newscategoryclassification_fullmodel_3_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_newscategoryclassification_fullmodel_3_pipeline` is a English model originally trained by akashmaggon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_newscategoryclassification_fullmodel_3_pipeline_en_5.5.0_3.0_1727059869698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_newscategoryclassification_fullmodel_3_pipeline_en_5.5.0_3.0_1727059869698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame with a text column to classify
pipeline = PretrainedPipeline("bert_base_uncased_newscategoryclassification_fullmodel_3_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a Spark DataFrame with a text column to classify
val pipeline = new PretrainedPipeline("bert_base_uncased_newscategoryclassification_fullmodel_3_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
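
To find the result columns, check the schema of `annotations`; the `class` column used below is assumed from the bundled DistilBertForSequenceClassification stage:

```python
# "class" is an assumption based on the included classifier stage
annotations.printSchema()
annotations.select("class.result").show(truncate = False)
```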
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_newscategoryclassification_fullmodel_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/akashmaggon/bert-base-uncased-newscategoryclassification-fullmodel-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_368items_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_368items_en.md new file mode 100644 index 00000000000000..2331595dd7d658 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_368items_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_368items BertForSequenceClassification from luminar9 +author: John Snow Labs +name: bert_finetuned_368items +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_368items` is a English model originally trained by luminar9. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_368items_en_5.5.0_3.0_1727095940278.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_368items_en_5.5.0_3.0_1727095940278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_finetuned_368items","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_finetuned_368items", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
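
To display the label assigned to each input, select the `class` annotations (this assumes the example's `pipelineDF`):

```python
# Show the predicted label for each row of the input DataFrame
pipelineDF.select("text", "class.result").show(truncate = False)
```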
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_368items| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/luminar9/bert-finetuned-368items \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_368items_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_368items_pipeline_en.md new file mode 100644 index 00000000000000..30bee3b6e7fd3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_368items_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_368items_pipeline pipeline BertForSequenceClassification from luminar9 +author: John Snow Labs +name: bert_finetuned_368items_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_368items_pipeline` is a English model originally trained by luminar9. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_368items_pipeline_en_5.5.0_3.0_1727095959274.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_368items_pipeline_en_5.5.0_3.0_1727095959274.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame that already contains the text column(s) this pipeline expects
pipeline = PretrainedPipeline("bert_finetuned_368items_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame that already contains the text column(s) this pipeline expects
val pipeline = new PretrainedPipeline("bert_finetuned_368items_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
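
For quick, ad-hoc checks it can be more convenient to annotate a single string than to build a DataFrame. A minimal sketch using `PretrainedPipeline.annotate` (the input sentence is an arbitrary placeholder):

```python
from sparknlp.pretrained import PretrainedPipeline

# Sketch: annotate one string directly and print the resulting annotation dictionary
pipeline = PretrainedPipeline("bert_finetuned_368items_pipeline", lang="en")
print(pipeline.annotate("I love spark-nlp"))
```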
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_368items_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/luminar9/bert-finetuned-368items + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_ner_asos_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_ner_asos_uncased_en.md new file mode 100644 index 00000000000000..4984779a7253ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_ner_asos_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_ner_asos_uncased BertForTokenClassification from vantagediscovery +author: John Snow Labs +name: bert_finetuned_ner_asos_uncased +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_asos_uncased` is a English model originally trained by vantagediscovery. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_asos_uncased_en_5.5.0_3.0_1727111828478.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_asos_uncased_en_5.5.0_3.0_1727111828478.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_asos_uncased","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_asos_uncased", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
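
The token-level predictions end up as parallel arrays in the `token` and `ner` annotation columns. A minimal sketch (assuming the example pipeline above) for viewing them side by side:

```python
# Sketch: tokens and their predicted NER labels as parallel arrays per row
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```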
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_asos_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.3 MB| + +## References + +https://huggingface.co/vantagediscovery/bert-finetuned-ner-asos-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_ner_asos_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_ner_asos_uncased_pipeline_en.md new file mode 100644 index 00000000000000..96a6d534921933 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_ner_asos_uncased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_ner_asos_uncased_pipeline pipeline BertForTokenClassification from vantagediscovery +author: John Snow Labs +name: bert_finetuned_ner_asos_uncased_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_asos_uncased_pipeline` is a English model originally trained by vantagediscovery. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_asos_uncased_pipeline_en_5.5.0_3.0_1727111847300.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_asos_uncased_pipeline_en_5.5.0_3.0_1727111847300.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame that already contains the text column(s) this pipeline expects
pipeline = PretrainedPipeline("bert_finetuned_ner_asos_uncased_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame that already contains the text column(s) this pipeline expects
val pipeline = new PretrainedPipeline("bert_finetuned_ner_asos_uncased_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_asos_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.3 MB| + +## References + +https://huggingface.co/vantagediscovery/bert-finetuned-ner-asos-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_squad_vidyuth_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_squad_vidyuth_en.md new file mode 100644 index 00000000000000..8ae40bdac26111 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_squad_vidyuth_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_finetuned_squad_vidyuth BertForQuestionAnswering from Vidyuth +author: John Snow Labs +name: bert_finetuned_squad_vidyuth +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_squad_vidyuth` is a English model originally trained by Vidyuth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_vidyuth_en_5.5.0_3.0_1727128425852.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_vidyuth_en_5.5.0_3.0_1727128425852.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_squad_vidyuth","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_squad_vidyuth", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
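
Once the pipeline has been applied, the extracted answer span is available in the `answer` annotation column. A minimal sketch of how to read it back (assuming the example pipeline above):

```python
# Sketch: show the question, the context and the predicted answer text
pipelineDF.select("question", "context", "answer.result").show(truncate=False)
```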
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_squad_vidyuth| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Vidyuth/bert-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_squad_vidyuth_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_squad_vidyuth_pipeline_en.md new file mode 100644 index 00000000000000..f902c675806357 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_squad_vidyuth_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_finetuned_squad_vidyuth_pipeline pipeline BertForQuestionAnswering from Vidyuth +author: John Snow Labs +name: bert_finetuned_squad_vidyuth_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_squad_vidyuth_pipeline` is a English model originally trained by Vidyuth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_vidyuth_pipeline_en_5.5.0_3.0_1727128486255.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_vidyuth_pipeline_en_5.5.0_3.0_1727128486255.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame that already contains the text column(s) this pipeline expects
pipeline = PretrainedPipeline("bert_finetuned_squad_vidyuth_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame that already contains the text column(s) this pipeline expects
val pipeline = new PretrainedPipeline("bert_finetuned_squad_vidyuth_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_squad_vidyuth_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Vidyuth/bert-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_finetuning_demo_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuning_demo_en.md new file mode 100644 index 00000000000000..296eefc00b892f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuning_demo_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_finetuning_demo BertForQuestionAnswering from internetoftim +author: John Snow Labs +name: bert_finetuning_demo +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuning_demo` is a English model originally trained by internetoftim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuning_demo_en_5.5.0_3.0_1727128443496.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuning_demo_en_5.5.0_3.0_1727128443496.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuning_demo","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuning_demo", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuning_demo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|797.5 MB| + +## References + +https://huggingface.co/internetoftim/BERT-Finetuning-Demo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_finetuning_demo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuning_demo_pipeline_en.md new file mode 100644 index 00000000000000..ccc91947ec1731 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuning_demo_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_finetuning_demo_pipeline pipeline BertForQuestionAnswering from internetoftim +author: John Snow Labs +name: bert_finetuning_demo_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuning_demo_pipeline` is a English model originally trained by internetoftim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuning_demo_pipeline_en_5.5.0_3.0_1727128670577.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuning_demo_pipeline_en_5.5.0_3.0_1727128670577.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame that already contains the text column(s) this pipeline expects
pipeline = PretrainedPipeline("bert_finetuning_demo_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame that already contains the text column(s) this pipeline expects
val pipeline = new PretrainedPipeline("bert_finetuning_demo_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuning_demo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|797.5 MB| + +## References + +https://huggingface.co/internetoftim/BERT-Finetuning-Demo + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_large_cased_finetuned_conll03_english_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_large_cased_finetuned_conll03_english_finetuned_en.md new file mode 100644 index 00000000000000..23ebba56c13db0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_large_cased_finetuned_conll03_english_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_large_cased_finetuned_conll03_english_finetuned BertForTokenClassification from alban12 +author: John Snow Labs +name: bert_large_cased_finetuned_conll03_english_finetuned +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_cased_finetuned_conll03_english_finetuned` is a English model originally trained by alban12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_cased_finetuned_conll03_english_finetuned_en_5.5.0_3.0_1727111378582.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_cased_finetuned_conll03_english_finetuned_en_5.5.0_3.0_1727111378582.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = BertForTokenClassification.pretrained("bert_large_cased_finetuned_conll03_english_finetuned","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_large_cased_finetuned_conll03_english_finetuned", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_cased_finetuned_conll03_english_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/alban12/bert-large-cased-finetuned-conll03-english-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_large_cased_finetuned_conll03_english_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_large_cased_finetuned_conll03_english_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..07bbb335805175 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_large_cased_finetuned_conll03_english_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_large_cased_finetuned_conll03_english_finetuned_pipeline pipeline BertForTokenClassification from alban12 +author: John Snow Labs +name: bert_large_cased_finetuned_conll03_english_finetuned_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_cased_finetuned_conll03_english_finetuned_pipeline` is a English model originally trained by alban12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_cased_finetuned_conll03_english_finetuned_pipeline_en_5.5.0_3.0_1727111437539.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_cased_finetuned_conll03_english_finetuned_pipeline_en_5.5.0_3.0_1727111437539.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame that already contains the text column(s) this pipeline expects
pipeline = PretrainedPipeline("bert_large_cased_finetuned_conll03_english_finetuned_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame that already contains the text column(s) this pipeline expects
val pipeline = new PretrainedPipeline("bert_large_cased_finetuned_conll03_english_finetuned_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_cased_finetuned_conll03_english_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/alban12/bert-large-cased-finetuned-conll03-english-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_large_cased_scmedium_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_large_cased_scmedium_squad_pipeline_en.md new file mode 100644 index 00000000000000..f01c41498d2033 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_large_cased_scmedium_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_large_cased_scmedium_squad_pipeline pipeline BertForQuestionAnswering from CambridgeMolecularEngineering +author: John Snow Labs +name: bert_large_cased_scmedium_squad_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_cased_scmedium_squad_pipeline` is a English model originally trained by CambridgeMolecularEngineering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_cased_scmedium_squad_pipeline_en_5.5.0_3.0_1727050127678.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_cased_scmedium_squad_pipeline_en_5.5.0_3.0_1727050127678.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame that already contains the text column(s) this pipeline expects
pipeline = PretrainedPipeline("bert_large_cased_scmedium_squad_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame that already contains the text column(s) this pipeline expects
val pipeline = new PretrainedPipeline("bert_large_cased_scmedium_squad_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_cased_scmedium_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/CambridgeMolecularEngineering/bert-large-cased-scmedium-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_finetuned_policy_number_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_finetuned_policy_number_en.md new file mode 100644 index 00000000000000..2181c47e0a3072 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_finetuned_policy_number_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_large_uncased_whole_word_masking_finetuned_policy_number BertForQuestionAnswering from Ineract +author: John Snow Labs +name: bert_large_uncased_whole_word_masking_finetuned_policy_number +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_whole_word_masking_finetuned_policy_number` is a English model originally trained by Ineract. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_finetuned_policy_number_en_5.5.0_3.0_1727049964871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_finetuned_policy_number_en_5.5.0_3.0_1727049964871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_whole_word_masking_finetuned_policy_number","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_whole_word_masking_finetuned_policy_number", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_whole_word_masking_finetuned_policy_number| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Ineract/bert-large-uncased-whole-word-masking-finetuned-policy-number \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_finetuned_policy_number_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_finetuned_policy_number_pipeline_en.md new file mode 100644 index 00000000000000..5d1235a55c2f11 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_finetuned_policy_number_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_large_uncased_whole_word_masking_finetuned_policy_number_pipeline pipeline BertForQuestionAnswering from Ineract +author: John Snow Labs +name: bert_large_uncased_whole_word_masking_finetuned_policy_number_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_whole_word_masking_finetuned_policy_number_pipeline` is a English model originally trained by Ineract. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_finetuned_policy_number_pipeline_en_5.5.0_3.0_1727050028079.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_finetuned_policy_number_pipeline_en_5.5.0_3.0_1727050028079.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame that already contains the text column(s) this pipeline expects
pipeline = PretrainedPipeline("bert_large_uncased_whole_word_masking_finetuned_policy_number_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame that already contains the text column(s) this pipeline expects
val pipeline = new PretrainedPipeline("bert_large_uncased_whole_word_masking_finetuned_policy_number_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_whole_word_masking_finetuned_policy_number_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Ineract/bert-large-uncased-whole-word-masking-finetuned-policy-number + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_finetuned_squad_dev_one_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_finetuned_squad_dev_one_en.md new file mode 100644 index 00000000000000..a20cb1ec1cf7f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_finetuned_squad_dev_one_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_large_uncased_whole_word_masking_finetuned_squad_dev_one BertForQuestionAnswering from mdzrg +author: John Snow Labs +name: bert_large_uncased_whole_word_masking_finetuned_squad_dev_one +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_whole_word_masking_finetuned_squad_dev_one` is a English model originally trained by mdzrg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_finetuned_squad_dev_one_en_5.5.0_3.0_1727049678814.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_finetuned_squad_dev_one_en_5.5.0_3.0_1727049678814.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_whole_word_masking_finetuned_squad_dev_one","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_whole_word_masking_finetuned_squad_dev_one", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_whole_word_masking_finetuned_squad_dev_one| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/mdzrg/bert-large-uncased-whole-word-masking-finetuned-squad-dev-one \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_en.md new file mode 100644 index 00000000000000..902c194dd80e71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad BertForQuestionAnswering from haddadalwi +author: John Snow Labs +name: bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad` is a English model originally trained by haddadalwi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_en_5.5.0_3.0_1727050165039.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_en_5.5.0_3.0_1727050165039.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/haddadalwi/bert-large-uncased-whole-word-masking-squad2-finetuned-islamic-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_pipeline_en.md new file mode 100644 index 00000000000000..d36a8b8d8e962f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_pipeline pipeline BertForQuestionAnswering from haddadalwi +author: John Snow Labs +name: bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_pipeline` is a English model originally trained by haddadalwi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_pipeline_en_5.5.0_3.0_1727050228698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_pipeline_en_5.5.0_3.0_1727050228698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame that already contains the text column(s) this pipeline expects
pipeline = PretrainedPipeline("bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame that already contains the text column(s) this pipeline expects
val pipeline = new PretrainedPipeline("bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/haddadalwi/bert-large-uncased-whole-word-masking-squad2-finetuned-islamic-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_en.md new file mode 100644 index 00000000000000..10390f16cc927d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_large_uncased_whole_word_masking_squad2_train_data_unmodified BertForQuestionAnswering from mdzrg +author: John Snow Labs +name: bert_large_uncased_whole_word_masking_squad2_train_data_unmodified +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_whole_word_masking_squad2_train_data_unmodified` is a English model originally trained by mdzrg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_en_5.5.0_3.0_1727128644295.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_en_5.5.0_3.0_1727128644295.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_whole_word_masking_squad2_train_data_unmodified","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_whole_word_masking_squad2_train_data_unmodified", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_whole_word_masking_squad2_train_data_unmodified| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/mdzrg/bert-large-uncased-whole-word-masking-squad2-train-data-unmodified \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_pipeline_en.md new file mode 100644 index 00000000000000..4936e8fb2dbbd5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_pipeline pipeline BertForQuestionAnswering from mdzrg +author: John Snow Labs +name: bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_pipeline` is a English model originally trained by mdzrg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_pipeline_en_5.5.0_3.0_1727128703878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_pipeline_en_5.5.0_3.0_1727128703878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame that already contains the text column(s) this pipeline expects
pipeline = PretrainedPipeline("bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame that already contains the text column(s) this pipeline expects
val pipeline = new PretrainedPipeline("bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/mdzrg/bert-large-uncased-whole-word-masking-squad2-train-data-unmodified + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_massa_es.md b/docs/_posts/ahmedlone127/2024-09-23-bert_massa_es.md new file mode 100644 index 00000000000000..00f7756686ad37 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_massa_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bert_massa XlmRoBertaForSequenceClassification from nmarinnn +author: John Snow Labs +name: bert_massa +date: 2024-09-23 +tags: [es, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_massa` is a Castilian, Spanish model originally trained by nmarinnn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_massa_es_5.5.0_3.0_1727126108155.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_massa_es_5.5.0_3.0_1727126108155.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("bert_massa","es") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("bert_massa", "es")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
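
Because this classifier was trained on Spanish data, a Spanish sentence is a more representative smoke test than the English placeholder used above. A minimal sketch with an assumed example input:

```python
# Sketch: run the fitted pipeline on a Spanish example sentence (placeholder text)
data_es = spark.createDataFrame([["Me encanta usar Spark NLP para clasificar textos."]]).toDF("text")
pipelineModel.transform(data_es).select("text", "class.result").show(truncate=False)
```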
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_massa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|es| +|Size:|1.0 GB| + +## References + +https://huggingface.co/nmarinnn/bert-massa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_massa_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-23-bert_massa_pipeline_es.md new file mode 100644 index 00000000000000..ea4fd44444ea17 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_massa_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bert_massa_pipeline pipeline XlmRoBertaForSequenceClassification from nmarinnn +author: John Snow Labs +name: bert_massa_pipeline +date: 2024-09-23 +tags: [es, open_source, pipeline, onnx] +task: Text Classification +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_massa_pipeline` is a Castilian, Spanish model originally trained by nmarinnn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_massa_pipeline_es_5.5.0_3.0_1727126157927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_massa_pipeline_es_5.5.0_3.0_1727126157927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame that already contains the text column(s) this pipeline expects
pipeline = PretrainedPipeline("bert_massa_pipeline", lang = "es")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame that already contains the text column(s) this pipeline expects
val pipeline = new PretrainedPipeline("bert_massa_pipeline", lang = "es")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_massa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|1.0 GB| + +## References + +https://huggingface.co/nmarinnn/bert-massa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_medquad_500_tokens_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_medquad_500_tokens_en.md new file mode 100644 index 00000000000000..f56772eb9bd36d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_medquad_500_tokens_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_medquad_500_tokens BertForQuestionAnswering from DataScientist1122 +author: John Snow Labs +name: bert_medquad_500_tokens +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_medquad_500_tokens` is a English model originally trained by DataScientist1122. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_medquad_500_tokens_en_5.5.0_3.0_1727128486045.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_medquad_500_tokens_en_5.5.0_3.0_1727128486045.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_medquad_500_tokens","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_medquad_500_tokens", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_medquad_500_tokens| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/DataScientist1122/BERT_MedQuad_500_tokens \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_medquad_500_tokens_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_medquad_500_tokens_pipeline_en.md new file mode 100644 index 00000000000000..40bcb3dc8a5858 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_medquad_500_tokens_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_medquad_500_tokens_pipeline pipeline BertForQuestionAnswering from DataScientist1122 +author: John Snow Labs +name: bert_medquad_500_tokens_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_medquad_500_tokens_pipeline` is a English model originally trained by DataScientist1122. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_medquad_500_tokens_pipeline_en_5.5.0_3.0_1727128508041.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_medquad_500_tokens_pipeline_en_5.5.0_3.0_1727128508041.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_medquad_500_tokens_pipeline", lang = "en")
# df is assumed to be a Spark DataFrame that already holds the input column(s) this pipeline expects
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_medquad_500_tokens_pipeline", lang = "en")
// df is assumed to be a Spark DataFrame that already holds the input column(s) this pipeline expects
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_medquad_500_tokens_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/DataScientist1122/BERT_MedQuad_500_tokens + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_poop_0_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_poop_0_en.md new file mode 100644 index 00000000000000..3b14ff42d2b47c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_poop_0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_poop_0 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_poop_0 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_poop_0` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_poop_0_en_5.5.0_3.0_1727082112064.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_poop_0_en_5.5.0_3.0_1727082112064.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_poop_0","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_poop_0", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
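
The predicted label for each row is written to the `class` output column. A minimal sketch for inspecting it, assuming the Python example above has just been run:

```python
# `class.result` holds the predicted label for each row
pipelineDF.select("class.result").show(truncate=False)
```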
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_poop_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_poop_0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_poop_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_poop_0_pipeline_en.md new file mode 100644 index 00000000000000..de7bb25bcc4c71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_poop_0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_poop_0_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_poop_0_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_poop_0_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_poop_0_pipeline_en_5.5.0_3.0_1727082124408.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_poop_0_pipeline_en_5.5.0_3.0_1727082124408.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_poop_0_pipeline", lang = "en")
# df is assumed to be a Spark DataFrame that already holds the input column(s) this pipeline expects
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_poop_0_pipeline", lang = "en")
// df is assumed to be a Spark DataFrame that already holds the input column(s) this pipeline expects
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_poop_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_poop_0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_portuguese_squad_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_portuguese_squad_en.md new file mode 100644 index 00000000000000..b68d4f49e9956a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_portuguese_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_portuguese_squad BertForQuestionAnswering from lfcc +author: John Snow Labs +name: bert_portuguese_squad +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_portuguese_squad` is a English model originally trained by lfcc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_portuguese_squad_en_5.5.0_3.0_1727127915089.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_portuguese_squad_en_5.5.0_3.0_1727127915089.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_portuguese_squad","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_portuguese_squad", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
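
The predicted answer spans end up in the `answer` output column as Spark NLP annotations. A minimal sketch for inspecting them, assuming the Python example above has just been run:

```python
# `answer.result` holds the predicted answer text for each row
pipelineDF.select("answer.result").show(truncate=False)
```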
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_portuguese_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/lfcc/bert-portuguese-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_portuguese_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_portuguese_squad_pipeline_en.md new file mode 100644 index 00000000000000..52f236944e0f26 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_portuguese_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_portuguese_squad_pipeline pipeline BertForQuestionAnswering from lfcc +author: John Snow Labs +name: bert_portuguese_squad_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_portuguese_squad_pipeline` is a English model originally trained by lfcc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_portuguese_squad_pipeline_en_5.5.0_3.0_1727127936015.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_portuguese_squad_pipeline_en_5.5.0_3.0_1727127936015.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_portuguese_squad_pipeline", lang = "en")
# df is assumed to be a Spark DataFrame that already holds the input column(s) this pipeline expects
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_portuguese_squad_pipeline", lang = "en")
// df is assumed to be a Spark DataFrame that already holds the input column(s) this pipeline expects
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_portuguese_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/lfcc/bert-portuguese-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_tonga_tonga_islands_distilbert_ner_soniquentin_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_tonga_tonga_islands_distilbert_ner_soniquentin_en.md new file mode 100644 index 00000000000000..a353a4a232cc77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_tonga_tonga_islands_distilbert_ner_soniquentin_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_tonga_tonga_islands_distilbert_ner_soniquentin BertForTokenClassification from soniquentin +author: John Snow Labs +name: bert_tonga_tonga_islands_distilbert_ner_soniquentin +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tonga_tonga_islands_distilbert_ner_soniquentin` is a English model originally trained by soniquentin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tonga_tonga_islands_distilbert_ner_soniquentin_en_5.5.0_3.0_1727060743700.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tonga_tonga_islands_distilbert_ner_soniquentin_en_5.5.0_3.0_1727060743700.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = BertForTokenClassification.pretrained("bert_tonga_tonga_islands_distilbert_ner_soniquentin","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_tonga_tonga_islands_distilbert_ner_soniquentin", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
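
The per-token predictions are written to the `ner` output column. A minimal sketch for pairing each token with its predicted tag, assuming the Python example above has just been run:

```python
# Zip tokens with their predicted tags and explode to one row per token
pipelineDF.selectExpr("explode(arrays_zip(token.result, ner.result)) as tagged").show(truncate=False)
```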
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tonga_tonga_islands_distilbert_ner_soniquentin| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|244.3 MB| + +## References + +https://huggingface.co/soniquentin/bert-to-distilbert-NER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_tonga_tonga_islands_distilbert_ner_soniquentin_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_tonga_tonga_islands_distilbert_ner_soniquentin_pipeline_en.md new file mode 100644 index 00000000000000..932a137a117cc8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_tonga_tonga_islands_distilbert_ner_soniquentin_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_tonga_tonga_islands_distilbert_ner_soniquentin_pipeline pipeline BertForTokenClassification from soniquentin +author: John Snow Labs +name: bert_tonga_tonga_islands_distilbert_ner_soniquentin_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tonga_tonga_islands_distilbert_ner_soniquentin_pipeline` is a English model originally trained by soniquentin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tonga_tonga_islands_distilbert_ner_soniquentin_pipeline_en_5.5.0_3.0_1727060755254.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tonga_tonga_islands_distilbert_ner_soniquentin_pipeline_en_5.5.0_3.0_1727060755254.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_tonga_tonga_islands_distilbert_ner_soniquentin_pipeline", lang = "en")
# df is assumed to be a Spark DataFrame that already holds the input column(s) this pipeline expects
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_tonga_tonga_islands_distilbert_ner_soniquentin_pipeline", lang = "en")
// df is assumed to be a Spark DataFrame that already holds the input column(s) this pipeline expects
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tonga_tonga_islands_distilbert_ner_soniquentin_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|244.3 MB| + +## References + +https://huggingface.co/soniquentin/bert-to-distilbert-NER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bertin_roberta_base_spanish_finetuned_xnli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bertin_roberta_base_spanish_finetuned_xnli_pipeline_en.md new file mode 100644 index 00000000000000..9fbf6aa222e35a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bertin_roberta_base_spanish_finetuned_xnli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bertin_roberta_base_spanish_finetuned_xnli_pipeline pipeline RoBertaForSequenceClassification from dccuchile +author: John Snow Labs +name: bertin_roberta_base_spanish_finetuned_xnli_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertin_roberta_base_spanish_finetuned_xnli_pipeline` is a English model originally trained by dccuchile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertin_roberta_base_spanish_finetuned_xnli_pipeline_en_5.5.0_3.0_1727135255172.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertin_roberta_base_spanish_finetuned_xnli_pipeline_en_5.5.0_3.0_1727135255172.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bertin_roberta_base_spanish_finetuned_xnli_pipeline", lang = "en")
# df is assumed to be a Spark DataFrame that already holds the input column(s) this pipeline expects
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bertin_roberta_base_spanish_finetuned_xnli_pipeline", lang = "en")
// df is assumed to be a Spark DataFrame that already holds the input column(s) this pipeline expects
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertin_roberta_base_spanish_finetuned_xnli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.5 MB| + +## References + +https://huggingface.co/dccuchile/bertin-roberta-base-spanish-finetuned-xnli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bge_large_repmus_cross_entropy_en.md b/docs/_posts/ahmedlone127/2024-09-23-bge_large_repmus_cross_entropy_en.md new file mode 100644 index 00000000000000..910831292e9ebf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bge_large_repmus_cross_entropy_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_repmus_cross_entropy BGEEmbeddings from tessimago +author: John Snow Labs +name: bge_large_repmus_cross_entropy +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_large_repmus_cross_entropy` is a English model originally trained by tessimago. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_repmus_cross_entropy_en_5.5.0_3.0_1727105946568.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_repmus_cross_entropy_en_5.5.0_3.0_1727105946568.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

embeddings = BGEEmbeddings.pretrained("bge_large_repmus_cross_entropy","en") \
    .setInputCols(["document"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val embeddings = BGEEmbeddings.pretrained("bge_large_repmus_cross_entropy","en")
  .setInputCols(Array("document"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
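
The sentence embeddings are written to the `embeddings` output column. A minimal sketch for inspecting the vectors, assuming the Python example above has just been run:

```python
# One embedding vector per annotation; explode to look at them individually
pipelineDF.selectExpr("explode(embeddings.embeddings) as vector").show(1, truncate=80)
```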
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_repmus_cross_entropy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/tessimago/bge-large-repmus-cross_entropy \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bge_large_repmus_cross_entropy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bge_large_repmus_cross_entropy_pipeline_en.md new file mode 100644 index 00000000000000..c3f7cbbd10166a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bge_large_repmus_cross_entropy_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_repmus_cross_entropy_pipeline pipeline BGEEmbeddings from tessimago +author: John Snow Labs +name: bge_large_repmus_cross_entropy_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_large_repmus_cross_entropy_pipeline` is a English model originally trained by tessimago. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_repmus_cross_entropy_pipeline_en_5.5.0_3.0_1727106012986.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_repmus_cross_entropy_pipeline_en_5.5.0_3.0_1727106012986.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bge_large_repmus_cross_entropy_pipeline", lang = "en")
# df is assumed to be a Spark DataFrame that already holds the input column(s) this pipeline expects
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bge_large_repmus_cross_entropy_pipeline", lang = "en")
// df is assumed to be a Spark DataFrame that already holds the input column(s) this pipeline expects
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_repmus_cross_entropy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/tessimago/bge-large-repmus-cross_entropy + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bloom_question_classification_en.md b/docs/_posts/ahmedlone127/2024-09-23-bloom_question_classification_en.md new file mode 100644 index 00000000000000..87bb4eb0b09876 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bloom_question_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bloom_question_classification DistilBertForSequenceClassification from MinervaBotTeam +author: John Snow Labs +name: bloom_question_classification +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bloom_question_classification` is a English model originally trained by MinervaBotTeam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bloom_question_classification_en_5.5.0_3.0_1727108414330.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bloom_question_classification_en_5.5.0_3.0_1727108414330.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("bloom_question_classification","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bloom_question_classification", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
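
The predicted label for each row is written to the `class` output column. A minimal sketch for inspecting it, assuming the Python example above has just been run:

```python
# `class.result` holds the predicted label for each row
pipelineDF.select("class.result").show(truncate=False)
```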
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bloom_question_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MinervaBotTeam/Bloom_Question_Classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bloom_question_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bloom_question_classification_pipeline_en.md new file mode 100644 index 00000000000000..9224ed0319bfa4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bloom_question_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bloom_question_classification_pipeline pipeline DistilBertForSequenceClassification from MinervaBotTeam +author: John Snow Labs +name: bloom_question_classification_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bloom_question_classification_pipeline` is a English model originally trained by MinervaBotTeam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bloom_question_classification_pipeline_en_5.5.0_3.0_1727108426660.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bloom_question_classification_pipeline_en_5.5.0_3.0_1727108426660.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bloom_question_classification_pipeline", lang = "en")
# df is assumed to be a Spark DataFrame that already holds the input column(s) this pipeline expects
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bloom_question_classification_pipeline", lang = "en")
// df is assumed to be a Spark DataFrame that already holds the input column(s) this pipeline expects
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bloom_question_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MinervaBotTeam/Bloom_Question_Classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bnt5_101_en.md b/docs/_posts/ahmedlone127/2024-09-23-bnt5_101_en.md new file mode 100644 index 00000000000000..e765143650e742 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bnt5_101_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bnt5_101 T5Transformer from kawsarahmd +author: John Snow Labs +name: bnt5_101 +date: 2024-09-23 +tags: [en, open_source, onnx, t5, question_answering, summarization, translation, text_generation] +task: [Question Answering, Summarization, Translation, Text Generation] +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: T5Transformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained T5Transformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bnt5_101` is a English model originally trained by kawsarahmd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bnt5_101_en_5.5.0_3.0_1727124636646.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bnt5_101_en_5.5.0_3.0_1727124636646.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

t5 = T5Transformer.pretrained("bnt5_101","en") \
    .setInputCols(["document"]) \
    .setOutputCol("output")

pipeline = Pipeline().setStages([documentAssembler, t5])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val t5 = T5Transformer.pretrained("bnt5_101", "en")
  .setInputCols(Array("document"))
  .setOutputCol("output")

val pipeline = new Pipeline().setStages(Array(documentAssembler, t5))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
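
The generated text is written to the `output` column. A minimal sketch for reading it back, assuming the Python example above has just been run:

```python
# `output.result` holds the generated text for each input row
pipelineDF.select("output.result").show(truncate=False)
```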
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bnt5_101| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[output]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/kawsarahmd/bnt5-101 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bnt5_101_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bnt5_101_pipeline_en.md new file mode 100644 index 00000000000000..09657fb91bcc2e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bnt5_101_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bnt5_101_pipeline pipeline T5Transformer from kawsarahmd +author: John Snow Labs +name: bnt5_101_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: [Question Answering, Summarization, Translation, Text Generation] +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained T5Transformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bnt5_101_pipeline` is a English model originally trained by kawsarahmd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bnt5_101_pipeline_en_5.5.0_3.0_1727124684796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bnt5_101_pipeline_en_5.5.0_3.0_1727124684796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bnt5_101_pipeline", lang = "en")
# df is assumed to be a Spark DataFrame that already holds the input column(s) this pipeline expects
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bnt5_101_pipeline", lang = "en")
// df is assumed to be a Spark DataFrame that already holds the input column(s) this pipeline expects
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bnt5_101_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/kawsarahmd/bnt5-101 + +## Included Models + +- DocumentAssembler +- T5Transformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bpe_selfies_pubchem_shard00_70k_en.md b/docs/_posts/ahmedlone127/2024-09-23-bpe_selfies_pubchem_shard00_70k_en.md new file mode 100644 index 00000000000000..ab6e09206c6693 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bpe_selfies_pubchem_shard00_70k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bpe_selfies_pubchem_shard00_70k RoBertaEmbeddings from seyonec +author: John Snow Labs +name: bpe_selfies_pubchem_shard00_70k +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bpe_selfies_pubchem_shard00_70k` is a English model originally trained by seyonec. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bpe_selfies_pubchem_shard00_70k_en_5.5.0_3.0_1727092327447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bpe_selfies_pubchem_shard00_70k_en_5.5.0_3.0_1727092327447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("bpe_selfies_pubchem_shard00_70k","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("bpe_selfies_pubchem_shard00_70k","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
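
The token-level embeddings are written to the `embeddings` output column. A minimal sketch for inspecting the vectors, assuming the Python example above has just been run:

```python
# One embedding vector per token annotation; explode to look at them individually
pipelineDF.selectExpr("explode(embeddings.embeddings) as vector").show(1, truncate=80)
```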
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bpe_selfies_pubchem_shard00_70k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|309.5 MB| + +## References + +https://huggingface.co/seyonec/BPE_SELFIES_PubChem_shard00_70k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bpe_selfies_pubchem_shard00_70k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bpe_selfies_pubchem_shard00_70k_pipeline_en.md new file mode 100644 index 00000000000000..7f620799b0847a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bpe_selfies_pubchem_shard00_70k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bpe_selfies_pubchem_shard00_70k_pipeline pipeline RoBertaEmbeddings from seyonec +author: John Snow Labs +name: bpe_selfies_pubchem_shard00_70k_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bpe_selfies_pubchem_shard00_70k_pipeline` is a English model originally trained by seyonec. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bpe_selfies_pubchem_shard00_70k_pipeline_en_5.5.0_3.0_1727092340901.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bpe_selfies_pubchem_shard00_70k_pipeline_en_5.5.0_3.0_1727092340901.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bpe_selfies_pubchem_shard00_70k_pipeline", lang = "en")
# df is assumed to be a Spark DataFrame that already holds the input column(s) this pipeline expects
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bpe_selfies_pubchem_shard00_70k_pipeline", lang = "en")
// df is assumed to be a Spark DataFrame that already holds the input column(s) this pipeline expects
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bpe_selfies_pubchem_shard00_70k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.5 MB| + +## References + +https://huggingface.co/seyonec/BPE_SELFIES_PubChem_shard00_70k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_carmen_livingner_humano_es.md b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_carmen_livingner_humano_es.md new file mode 100644 index 00000000000000..b0b12cfdba889e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_carmen_livingner_humano_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_carmen_livingner_humano RoBertaForTokenClassification from BSC-NLP4BIA +author: John Snow Labs +name: bsc_bio_ehr_spanish_carmen_livingner_humano +date: 2024-09-23 +tags: [es, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_carmen_livingner_humano` is a Castilian, Spanish model originally trained by BSC-NLP4BIA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_carmen_livingner_humano_es_5.5.0_3.0_1727081573115.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_carmen_livingner_humano_es_5.5.0_3.0_1727081573115.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_carmen_livingner_humano","es") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_carmen_livingner_humano", "es")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
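
The per-token predictions are written to the `ner` output column. A minimal sketch for pairing each token with its predicted tag, assuming the Python example above has just been run:

```python
# Zip tokens with their predicted tags and explode to one row per token
pipelineDF.selectExpr("explode(arrays_zip(token.result, ner.result)) as tagged").show(truncate=False)
```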
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_carmen_livingner_humano| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|454.4 MB| + +## References + +https://huggingface.co/BSC-NLP4BIA/bsc-bio-ehr-es-carmen-livingner-humano \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_carmen_livingner_humano_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_carmen_livingner_humano_pipeline_es.md new file mode 100644 index 00000000000000..9f5db6c67c9341 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_carmen_livingner_humano_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_carmen_livingner_humano_pipeline pipeline RoBertaForTokenClassification from BSC-NLP4BIA +author: John Snow Labs +name: bsc_bio_ehr_spanish_carmen_livingner_humano_pipeline +date: 2024-09-23 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_carmen_livingner_humano_pipeline` is a Castilian, Spanish model originally trained by BSC-NLP4BIA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_carmen_livingner_humano_pipeline_es_5.5.0_3.0_1727081596372.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_carmen_livingner_humano_pipeline_es_5.5.0_3.0_1727081596372.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_carmen_livingner_humano_pipeline", lang = "es")
# df is assumed to be a Spark DataFrame that already holds the input column(s) this pipeline expects
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_carmen_livingner_humano_pipeline", lang = "es")
// df is assumed to be a Spark DataFrame that already holds the input column(s) this pipeline expects
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_carmen_livingner_humano_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|454.4 MB| + +## References + +https://huggingface.co/BSC-NLP4BIA/bsc-bio-ehr-es-carmen-livingner-humano + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_distemist_ner_en.md b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_distemist_ner_en.md new file mode 100644 index 00000000000000..1f5b62c29fccc9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_distemist_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bsc_bio_ehr_spanish_distemist_ner RoBertaForTokenClassification from Rodrigo1771 +author: John Snow Labs +name: bsc_bio_ehr_spanish_distemist_ner +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_distemist_ner` is a English model originally trained by Rodrigo1771. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_distemist_ner_en_5.5.0_3.0_1727072856064.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_distemist_ner_en_5.5.0_3.0_1727072856064.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_distemist_ner","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_distemist_ner", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
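
The per-token predictions are written to the `ner` output column. A minimal sketch for pairing each token with its predicted tag, assuming the Python example above has just been run:

```python
# Zip tokens with their predicted tags and explode to one row per token
pipelineDF.selectExpr("explode(arrays_zip(token.result, ner.result)) as tagged").show(truncate=False)
```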
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_distemist_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|440.4 MB| + +## References + +https://huggingface.co/Rodrigo1771/bsc-bio-ehr-es-distemist-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_distemist_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_distemist_ner_pipeline_en.md new file mode 100644 index 00000000000000..5c2e4c75cb5d29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_distemist_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bsc_bio_ehr_spanish_distemist_ner_pipeline pipeline RoBertaForTokenClassification from Rodrigo1771 +author: John Snow Labs +name: bsc_bio_ehr_spanish_distemist_ner_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_distemist_ner_pipeline` is a English model originally trained by Rodrigo1771. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_distemist_ner_pipeline_en_5.5.0_3.0_1727072887817.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_distemist_ner_pipeline_en_5.5.0_3.0_1727072887817.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_distemist_ner_pipeline", lang = "en")
# df is assumed to be a Spark DataFrame that already holds the input column(s) this pipeline expects
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_distemist_ner_pipeline", lang = "en")
// df is assumed to be a Spark DataFrame that already holds the input column(s) this pipeline expects
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_distemist_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|440.5 MB| + +## References + +https://huggingface.co/Rodrigo1771/bsc-bio-ehr-es-distemist-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_en.md b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_en.md new file mode 100644 index 00000000000000..52368f14768672 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bsc_bio_ehr_spanish_symptemist_word2vec_85_ner RoBertaForTokenClassification from Rodrigo1771 +author: John Snow Labs +name: bsc_bio_ehr_spanish_symptemist_word2vec_85_ner +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_symptemist_word2vec_85_ner` is a English model originally trained by Rodrigo1771. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_en_5.5.0_3.0_1727115179187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_en_5.5.0_3.0_1727115179187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_symptemist_word2vec_85_ner","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_symptemist_word2vec_85_ner", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
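
The per-token predictions are written to the `ner` output column. A minimal sketch for pairing each token with its predicted tag, assuming the Python example above has just been run:

```python
# Zip tokens with their predicted tags and explode to one row per token
pipelineDF.selectExpr("explode(arrays_zip(token.result, ner.result)) as tagged").show(truncate=False)
```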
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_symptemist_word2vec_85_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|435.0 MB| + +## References + +https://huggingface.co/Rodrigo1771/bsc-bio-ehr-es-symptemist-word2vec-85-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_pipeline_en.md new file mode 100644 index 00000000000000..93a1f7749c6e1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_pipeline pipeline RoBertaForTokenClassification from Rodrigo1771 +author: John Snow Labs +name: bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_pipeline` is a English model originally trained by Rodrigo1771. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_pipeline_en_5.5.0_3.0_1727115210618.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_pipeline_en_5.5.0_3.0_1727115210618.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
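+
+In the snippet above, `df` stands for any DataFrame with a `text` column. A minimal, self-contained sketch is shown below; the Spark session start-up and the sample sentence are illustrative, and the output column name is assumed to match the standalone model example (`ner`).
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_pipeline", lang="en")
+annotations = pipeline.transform(df)
+
+# Column name assumed from the standalone model card; adjust if the packaged pipeline differs
+annotations.select("ner.result").show(truncate=False)
+```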
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|435.0 MB| + +## References + +https://huggingface.co/Rodrigo1771/bsc-bio-ehr-es-symptemist-word2vec-85-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_vih_rod_en.md b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_vih_rod_en.md new file mode 100644 index 00000000000000..067925c0795831 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_vih_rod_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bsc_bio_ehr_spanish_vih_rod RoBertaForSequenceClassification from Wariano +author: John Snow Labs +name: bsc_bio_ehr_spanish_vih_rod +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_vih_rod` is a English model originally trained by Wariano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_vih_rod_en_5.5.0_3.0_1727055073441.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_vih_rod_en_5.5.0_3.0_1727055073441.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("bsc_bio_ehr_spanish_vih_rod","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("bsc_bio_ehr_spanish_vih_rod", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
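+
+The prediction lands in the `class` column as a Spark NLP annotation. Below is a small sketch of pulling out the predicted label together with the annotation metadata, which usually carries the per-label scores; column names follow the example above.
+
+```python
+from pyspark.sql import functions as F
+
+pipelineDF.select(
+    F.col("text"),
+    F.col("class.result").alias("prediction"),
+    F.col("class.metadata").alias("scores")
+).show(truncate=False)
+```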
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_vih_rod| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|437.0 MB| + +## References + +https://huggingface.co/Wariano/bsc-bio-ehr-es-vih-rod \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_vih_rod_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_vih_rod_pipeline_en.md new file mode 100644 index 00000000000000..5bc872f9c53e43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_vih_rod_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bsc_bio_ehr_spanish_vih_rod_pipeline pipeline RoBertaForSequenceClassification from Wariano +author: John Snow Labs +name: bsc_bio_ehr_spanish_vih_rod_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_vih_rod_pipeline` is a English model originally trained by Wariano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_vih_rod_pipeline_en_5.5.0_3.0_1727055107475.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_vih_rod_pipeline_en_5.5.0_3.0_1727055107475.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_vih_rod_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_vih_rod_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_vih_rod_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|437.0 MB| + +## References + +https://huggingface.co/Wariano/bsc-bio-ehr-es-vih-rod + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_abishines_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_abishines_en.md new file mode 100644 index 00000000000000..1c73ca809c86e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_abishines_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_abishines RoBertaEmbeddings from abishines +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_abishines +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_abishines` is a English model originally trained by abishines. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_abishines_en_5.5.0_3.0_1727091980181.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_abishines_en_5.5.0_3.0_1727091980181.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_abishines","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_abishines","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
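+
+The `embeddings` column contains one annotation per token. To hand the vectors to Spark ML stages, an `EmbeddingsFinisher` can unwrap them into plain vector columns. A minimal sketch reusing the stages defined above:
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, finisher])
+result = pipeline.fit(data).transform(data)
+result.selectExpr("explode(finished_embeddings) as vector").show(truncate=80)
+```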
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_abishines| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/abishines/my_awesome_eli5_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_abishines_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_abishines_pipeline_en.md new file mode 100644 index 00000000000000..d1ea3b977cdcfc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_abishines_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_abishines_pipeline pipeline RoBertaEmbeddings from abishines +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_abishines_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_abishines_pipeline` is a English model originally trained by abishines. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_abishines_pipeline_en_5.5.0_3.0_1727091996203.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_abishines_pipeline_en_5.5.0_3.0_1727091996203.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_abishines_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_abishines_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
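+
+Besides `transform` on a DataFrame, a `PretrainedPipeline` can also be called on plain strings through `annotate`, which is handy for quick checks. The sample text below is illustrative.
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_abishines_pipeline", lang="en")
+
+# annotate() returns a dict that maps each output column to its list of results
+annotations = pipeline.annotate("I love spark-nlp")
+print(annotations.keys())
+```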
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_abishines_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/abishines/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_eitanli_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_eitanli_en.md new file mode 100644 index 00000000000000..8df04212d275c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_eitanli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_eitanli RoBertaEmbeddings from Eitanli +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_eitanli +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_eitanli` is a English model originally trained by Eitanli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_eitanli_en_5.5.0_3.0_1727080650223.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_eitanli_en_5.5.0_3.0_1727080650223.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_eitanli","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_eitanli","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
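+
+Token embeddings like these are often pooled into one vector per document before classification or similarity search. A sketch using the generic `SentenceEmbeddings` annotator with average pooling, reusing the stages from the example above:
+
+```python
+from sparknlp.annotator import SentenceEmbeddings
+from pyspark.ml import Pipeline
+
+sentenceEmbeddings = SentenceEmbeddings() \
+    .setInputCols(["document", "embeddings"]) \
+    .setOutputCol("sentence_embeddings") \
+    .setPoolingStrategy("AVERAGE")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, sentenceEmbeddings])
+result = pipeline.fit(data).transform(data)
+result.select("sentence_embeddings.embeddings").show(truncate=80)
+```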
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_eitanli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/Eitanli/my_awesome_eli5_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_eitanli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_eitanli_pipeline_en.md new file mode 100644 index 00000000000000..c205ae5a6cb673 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_eitanli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_eitanli_pipeline pipeline RoBertaEmbeddings from Eitanli +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_eitanli_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_eitanli_pipeline` is a English model originally trained by Eitanli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_eitanli_pipeline_en_5.5.0_3.0_1727080664803.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_eitanli_pipeline_en_5.5.0_3.0_1727080664803.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_eitanli_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_eitanli_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_eitanli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/Eitanli/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_nerdygene_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_nerdygene_en.md new file mode 100644 index 00000000000000..e7a36473f62ea3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_nerdygene_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_nerdygene RoBertaEmbeddings from nerdygene +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_nerdygene +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_nerdygene` is a English model originally trained by nerdygene. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_nerdygene_en_5.5.0_3.0_1727121581035.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_nerdygene_en_5.5.0_3.0_1727121581035.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_nerdygene","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_nerdygene","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
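+
+Given the emphasis on production use, it can help to persist the fitted pipeline once and reload it later instead of re-downloading the model. This is standard Spark ML behaviour; the path below is only a placeholder.
+
+```python
+from pyspark.ml import PipelineModel
+
+# Save the fitted pipeline to a placeholder path and load it back for reuse
+pipelineModel.write().overwrite().save("/tmp/eli5_mlm_embeddings_pipeline")
+restored = PipelineModel.load("/tmp/eli5_mlm_embeddings_pipeline")
+restored.transform(data).select("embeddings.embeddings").show(truncate=80)
+```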
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_nerdygene| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/nerdygene/my_awesome_eli5_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_nerdygene_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_nerdygene_pipeline_en.md new file mode 100644 index 00000000000000..1718993a6f2df1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_nerdygene_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_nerdygene_pipeline pipeline RoBertaEmbeddings from nerdygene +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_nerdygene_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_nerdygene_pipeline` is a English model originally trained by nerdygene. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_nerdygene_pipeline_en_5.5.0_3.0_1727121595562.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_nerdygene_pipeline_en_5.5.0_3.0_1727121595562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_nerdygene_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_nerdygene_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_nerdygene_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/nerdygene/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_waterboy111_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_waterboy111_en.md new file mode 100644 index 00000000000000..082699cce20e4a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_waterboy111_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_waterboy111 RoBertaEmbeddings from waterboy111 +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_waterboy111 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_waterboy111` is a English model originally trained by waterboy111. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_waterboy111_en_5.5.0_3.0_1727066058256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_waterboy111_en_5.5.0_3.0_1727066058256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_waterboy111","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_waterboy111","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_waterboy111| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/waterboy111/my_awesome_eli5_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_waterboy111_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_waterboy111_pipeline_en.md new file mode 100644 index 00000000000000..3ec5f7ced9a1dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_waterboy111_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_waterboy111_pipeline pipeline RoBertaEmbeddings from waterboy111 +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_waterboy111_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_waterboy111_pipeline` is a English model originally trained by waterboy111. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_waterboy111_pipeline_en_5.5.0_3.0_1727066074227.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_waterboy111_pipeline_en_5.5.0_3.0_1727066074227.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_waterboy111_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_waterboy111_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_waterboy111_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/waterboy111/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_bertester_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_bertester_en.md new file mode 100644 index 00000000000000..86b693130f716c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_bertester_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_bertester DistilBertForSequenceClassification from bertester +author: John Snow Labs +name: burmese_awesome_model_bertester +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_bertester` is a English model originally trained by bertester. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_bertester_en_5.5.0_3.0_1727059741813.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_bertester_en_5.5.0_3.0_1727059741813.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_bertester","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_bertester", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
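+
+For low-latency scoring of single texts, the fitted pipeline can be wrapped in a `LightPipeline`, which avoids building a DataFrame per request. A minimal sketch, assuming the `pipelineModel` fitted above:
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+
+# annotate() accepts a string or a list of strings
+print(light.annotate("I love spark-nlp")["class"])
+```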
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_bertester| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bertester/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_beto_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_beto_en.md new file mode 100644 index 00000000000000..3000f21026bc2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_beto_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_beto BertForSequenceClassification from maic1995 +author: John Snow Labs +name: burmese_awesome_model_beto +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_beto` is a English model originally trained by maic1995. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_beto_en_5.5.0_3.0_1727095427947.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_beto_en_5.5.0_3.0_1727095427947.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("burmese_awesome_model_beto","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("burmese_awesome_model_beto", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
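+
+The label set this classifier emits is stored with the model and can usually be inspected before running a pipeline; `getClasses()` is the accessor commonly available on Spark NLP sequence classifiers. A short sketch:
+
+```python
+from sparknlp.annotator import BertForSequenceClassification
+
+sequenceClassifier = BertForSequenceClassification.pretrained("burmese_awesome_model_beto", "en")
+
+# Lists the labels the model was exported with
+print(sequenceClassifier.getClasses())
+```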
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_beto| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|411.9 MB| + +## References + +https://huggingface.co/maic1995/my_awesome_model_beto \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_beto_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_beto_pipeline_en.md new file mode 100644 index 00000000000000..28624435288c5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_beto_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_beto_pipeline pipeline BertForSequenceClassification from maic1995 +author: John Snow Labs +name: burmese_awesome_model_beto_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_beto_pipeline` is a English model originally trained by maic1995. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_beto_pipeline_en_5.5.0_3.0_1727095446948.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_beto_pipeline_en_5.5.0_3.0_1727095446948.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_beto_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_beto_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_beto_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.9 MB| + +## References + +https://huggingface.co/maic1995/my_awesome_model_beto + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_brianrigoni_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_brianrigoni_en.md new file mode 100644 index 00000000000000..ff879bf7908d5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_brianrigoni_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_brianrigoni DistilBertForSequenceClassification from brianrigoni +author: John Snow Labs +name: burmese_awesome_model_brianrigoni +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_brianrigoni` is a English model originally trained by brianrigoni. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_brianrigoni_en_5.5.0_3.0_1727097055185.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_brianrigoni_en_5.5.0_3.0_1727097055185.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_brianrigoni","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_brianrigoni", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_brianrigoni| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/brianrigoni/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_brianrigoni_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_brianrigoni_pipeline_en.md new file mode 100644 index 00000000000000..6f90e4cc730e5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_brianrigoni_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_brianrigoni_pipeline pipeline DistilBertForSequenceClassification from brianrigoni +author: John Snow Labs +name: burmese_awesome_model_brianrigoni_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_brianrigoni_pipeline` is a English model originally trained by brianrigoni. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_brianrigoni_pipeline_en_5.5.0_3.0_1727097066695.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_brianrigoni_pipeline_en_5.5.0_3.0_1727097066695.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_brianrigoni_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_brianrigoni_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_brianrigoni_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/brianrigoni/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_eyeonyou_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_eyeonyou_en.md new file mode 100644 index 00000000000000..fae0c78e7ffedd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_eyeonyou_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_eyeonyou DistilBertForSequenceClassification from eyeonyou +author: John Snow Labs +name: burmese_awesome_model_eyeonyou +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_eyeonyou` is a English model originally trained by eyeonyou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_eyeonyou_en_5.5.0_3.0_1727110501859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_eyeonyou_en_5.5.0_3.0_1727110501859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_eyeonyou","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_eyeonyou", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
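+
+On larger datasets, throughput is often tuned through the batch size of the transformer stage. The sketch below assumes the annotator's `setBatchSize` setter, common to Spark NLP batched annotators; the value 16 is arbitrary.
+
+```python
+from sparknlp.annotator import DistilBertForSequenceClassification
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_eyeonyou", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class") \
+    .setBatchSize(16)  # assumed tuning knob; adjust to the cluster and hardware
+```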
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_eyeonyou| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/eyeonyou/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_eyeonyou_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_eyeonyou_pipeline_en.md new file mode 100644 index 00000000000000..ed64509c203950 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_eyeonyou_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_eyeonyou_pipeline pipeline DistilBertForSequenceClassification from eyeonyou +author: John Snow Labs +name: burmese_awesome_model_eyeonyou_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_eyeonyou_pipeline` is a English model originally trained by eyeonyou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_eyeonyou_pipeline_en_5.5.0_3.0_1727110514298.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_eyeonyou_pipeline_en_5.5.0_3.0_1727110514298.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_eyeonyou_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_eyeonyou_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_eyeonyou_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/eyeonyou/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_habiba_2227_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_habiba_2227_en.md new file mode 100644 index 00000000000000..8e87f7333419ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_habiba_2227_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_habiba_2227 DistilBertForSequenceClassification from habiba-2227 +author: John Snow Labs +name: burmese_awesome_model_habiba_2227 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_habiba_2227` is a English model originally trained by habiba-2227. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_habiba_2227_en_5.5.0_3.0_1727059385052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_habiba_2227_en_5.5.0_3.0_1727059385052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_habiba_2227","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_habiba_2227", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_habiba_2227| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/habiba-2227/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_habiba_2227_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_habiba_2227_pipeline_en.md new file mode 100644 index 00000000000000..ecca5f6cd3a031 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_habiba_2227_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_habiba_2227_pipeline pipeline DistilBertForSequenceClassification from habiba-2227 +author: John Snow Labs +name: burmese_awesome_model_habiba_2227_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_habiba_2227_pipeline` is a English model originally trained by habiba-2227. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_habiba_2227_pipeline_en_5.5.0_3.0_1727059396976.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_habiba_2227_pipeline_en_5.5.0_3.0_1727059396976.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_habiba_2227_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_habiba_2227_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_habiba_2227_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/habiba-2227/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_jbar646_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_jbar646_en.md new file mode 100644 index 00000000000000..677258c25f981f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_jbar646_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_jbar646 DistilBertForSequenceClassification from jbar646 +author: John Snow Labs +name: burmese_awesome_model_jbar646 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_jbar646` is a English model originally trained by jbar646. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_jbar646_en_5.5.0_3.0_1727059238431.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_jbar646_en_5.5.0_3.0_1727059238431.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_jbar646","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_jbar646", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_jbar646| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jbar646/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_jbar646_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_jbar646_pipeline_en.md new file mode 100644 index 00000000000000..08d838562f25de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_jbar646_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_jbar646_pipeline pipeline DistilBertForSequenceClassification from jbar646 +author: John Snow Labs +name: burmese_awesome_model_jbar646_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_jbar646_pipeline` is a English model originally trained by jbar646. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_jbar646_pipeline_en_5.5.0_3.0_1727059256013.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_jbar646_pipeline_en_5.5.0_3.0_1727059256013.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_jbar646_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_jbar646_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_jbar646_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jbar646/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_julianorosco37_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_julianorosco37_en.md new file mode 100644 index 00000000000000..ee551c49925ae6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_julianorosco37_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_julianorosco37 DistilBertForSequenceClassification from Julianorosco37 +author: John Snow Labs +name: burmese_awesome_model_julianorosco37 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_julianorosco37` is a English model originally trained by Julianorosco37. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_julianorosco37_en_5.5.0_3.0_1727082352214.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_julianorosco37_en_5.5.0_3.0_1727082352214.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_julianorosco37","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_julianorosco37", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_julianorosco37| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Julianorosco37/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_julianorosco37_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_julianorosco37_pipeline_en.md new file mode 100644 index 00000000000000..b75a2d3c8d4658 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_julianorosco37_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_julianorosco37_pipeline pipeline DistilBertForSequenceClassification from Julianorosco37 +author: John Snow Labs +name: burmese_awesome_model_julianorosco37_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_julianorosco37_pipeline` is a English model originally trained by Julianorosco37. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_julianorosco37_pipeline_en_5.5.0_3.0_1727082364112.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_julianorosco37_pipeline_en_5.5.0_3.0_1727082364112.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_julianorosco37_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_julianorosco37_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_julianorosco37_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Julianorosco37/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_kelvinleong_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_kelvinleong_en.md new file mode 100644 index 00000000000000..69acfa74ee9e1d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_kelvinleong_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_kelvinleong RoBertaForSequenceClassification from kelvinleong +author: John Snow Labs +name: burmese_awesome_model_kelvinleong +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_kelvinleong` is a English model originally trained by kelvinleong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_kelvinleong_en_5.5.0_3.0_1727055362644.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_kelvinleong_en_5.5.0_3.0_1727055362644.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("burmese_awesome_model_kelvinleong","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("burmese_awesome_model_kelvinleong", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
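For ad-hoc scoring of individual strings, the fitted `pipelineModel` can also be wrapped in a `LightPipeline`, which avoids building a DataFrame per request. A brief sketch (the sample sentence is an assumption for illustration):

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# annotate() returns a dict keyed by output column names; "class" holds the predicted label
print(light.annotate("I love spark-nlp")["class"])
```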
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_kelvinleong| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|424.1 MB| + +## References + +https://huggingface.co/kelvinleong/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_kelvinleong_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_kelvinleong_pipeline_en.md new file mode 100644 index 00000000000000..2989130641af6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_kelvinleong_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_kelvinleong_pipeline pipeline RoBertaForSequenceClassification from kelvinleong +author: John Snow Labs +name: burmese_awesome_model_kelvinleong_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_kelvinleong_pipeline` is a English model originally trained by kelvinleong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_kelvinleong_pipeline_en_5.5.0_3.0_1727055391924.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_kelvinleong_pipeline_en_5.5.0_3.0_1727055391924.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_kelvinleong_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_kelvinleong_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_kelvinleong_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|424.1 MB| + +## References + +https://huggingface.co/kelvinleong/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_nandini54_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_nandini54_en.md new file mode 100644 index 00000000000000..09f68966a30f55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_nandini54_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_nandini54 DistilBertForSequenceClassification from Nandini54 +author: John Snow Labs +name: burmese_awesome_model_nandini54 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_nandini54` is a English model originally trained by Nandini54. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_nandini54_en_5.5.0_3.0_1727108287838.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_nandini54_en_5.5.0_3.0_1727108287838.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_nandini54","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_nandini54", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_nandini54| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Nandini54/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_nandini54_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_nandini54_pipeline_en.md new file mode 100644 index 00000000000000..044dc6952be30b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_nandini54_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_nandini54_pipeline pipeline DistilBertForSequenceClassification from Nandini54 +author: John Snow Labs +name: burmese_awesome_model_nandini54_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_nandini54_pipeline` is a English model originally trained by Nandini54. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_nandini54_pipeline_en_5.5.0_3.0_1727108304530.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_nandini54_pipeline_en_5.5.0_3.0_1727108304530.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_nandini54_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_nandini54_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_nandini54_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Nandini54/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_nataliacristina_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_nataliacristina_en.md new file mode 100644 index 00000000000000..320a1fda96730e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_nataliacristina_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_nataliacristina DistilBertForSequenceClassification from nataliacristina +author: John Snow Labs +name: burmese_awesome_model_nataliacristina +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_nataliacristina` is a English model originally trained by nataliacristina. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_nataliacristina_en_5.5.0_3.0_1727073747794.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_nataliacristina_en_5.5.0_3.0_1727073747794.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_nataliacristina","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_nataliacristina", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_nataliacristina| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/nataliacristina/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_omertnks_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_omertnks_en.md new file mode 100644 index 00000000000000..6c0cecb5f5dde6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_omertnks_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_omertnks DistilBertForSequenceClassification from omertnks +author: John Snow Labs +name: burmese_awesome_model_omertnks +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_omertnks` is a English model originally trained by omertnks. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_omertnks_en_5.5.0_3.0_1727110600913.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_omertnks_en_5.5.0_3.0_1727110600913.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_omertnks","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_omertnks", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_omertnks| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/omertnks/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_omertnks_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_omertnks_pipeline_en.md new file mode 100644 index 00000000000000..360cbc46b51951 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_omertnks_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_omertnks_pipeline pipeline DistilBertForSequenceClassification from omertnks +author: John Snow Labs +name: burmese_awesome_model_omertnks_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_omertnks_pipeline` is a English model originally trained by omertnks. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_omertnks_pipeline_en_5.5.0_3.0_1727110612838.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_omertnks_pipeline_en_5.5.0_3.0_1727110612838.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_omertnks_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_omertnks_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_omertnks_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/omertnks/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_tsibbett_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_tsibbett_pipeline_en.md new file mode 100644 index 00000000000000..3fa715b01a4ba8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_tsibbett_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_tsibbett_pipeline pipeline DistilBertForSequenceClassification from tsibbett +author: John Snow Labs +name: burmese_awesome_model_tsibbett_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_tsibbett_pipeline` is a English model originally trained by tsibbett. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_tsibbett_pipeline_en_5.5.0_3.0_1727082302505.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_tsibbett_pipeline_en_5.5.0_3.0_1727082302505.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_tsibbett_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_tsibbett_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_tsibbett_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tsibbett/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_zeckhardt_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_zeckhardt_en.md new file mode 100644 index 00000000000000..4a4f6206972f20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_zeckhardt_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_zeckhardt DistilBertForSequenceClassification from zeckhardt +author: John Snow Labs +name: burmese_awesome_model_zeckhardt +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_zeckhardt` is a English model originally trained by zeckhardt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_zeckhardt_en_5.5.0_3.0_1727097140210.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_zeckhardt_en_5.5.0_3.0_1727097140210.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_zeckhardt","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_zeckhardt", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_zeckhardt| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/zeckhardt/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_zeckhardt_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_zeckhardt_pipeline_en.md new file mode 100644 index 00000000000000..5dac3fc85f87b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_zeckhardt_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_zeckhardt_pipeline pipeline DistilBertForSequenceClassification from zeckhardt +author: John Snow Labs +name: burmese_awesome_model_zeckhardt_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_zeckhardt_pipeline` is a English model originally trained by zeckhardt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_zeckhardt_pipeline_en_5.5.0_3.0_1727097152156.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_zeckhardt_pipeline_en_5.5.0_3.0_1727097152156.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_zeckhardt_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_zeckhardt_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_zeckhardt_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/zeckhardt/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_qa_model_dennischan_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_qa_model_dennischan_en.md new file mode 100644 index 00000000000000..8c520d4c715502 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_qa_model_dennischan_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_dennischan BertForQuestionAnswering from dennischan +author: John Snow Labs +name: burmese_awesome_qa_model_dennischan +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_dennischan` is a English model originally trained by dennischan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_dennischan_en_5.5.0_3.0_1727049931660.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_dennischan_en_5.5.0_3.0_1727049931660.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("burmese_awesome_qa_model_dennischan","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("burmese_awesome_qa_model_dennischan", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
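The extracted answer span for each question/context pair ends up in the `answer` annotation column; a small sketch for reading it back out:

```python
# "result" carries the raw text of each annotation
pipelineDF.select("document_question.result", "answer.result").show(truncate = False)
```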
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_dennischan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/dennischan/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_qa_model_dennischan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_qa_model_dennischan_pipeline_en.md new file mode 100644 index 00000000000000..80bca63d652869 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_qa_model_dennischan_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_dennischan_pipeline pipeline BertForQuestionAnswering from dennischan +author: John Snow Labs +name: burmese_awesome_qa_model_dennischan_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_dennischan_pipeline` is a English model originally trained by dennischan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_dennischan_pipeline_en_5.5.0_3.0_1727049952851.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_dennischan_pipeline_en_5.5.0_3.0_1727049952851.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_dennischan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_dennischan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_dennischan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/dennischan/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_efar98_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_efar98_en.md new file mode 100644 index 00000000000000..a4eaed158e01bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_efar98_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_efar98 DistilBertForTokenClassification from Efar98 +author: John Snow Labs +name: burmese_awesome_wnut_model_efar98 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_efar98` is a English model originally trained by Efar98. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_efar98_en_5.5.0_3.0_1727065440434.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_efar98_en_5.5.0_3.0_1727065440434.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = DistilBertForTokenClassification.pretrained("burmese_awesome_wnut_model_efar98","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = DistilBertForTokenClassification.pretrained("burmese_awesome_wnut_model_efar98", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
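To line each token up with its predicted entity tag, the `token` and `ner` annotation arrays can be zipped and exploded; a short sketch using standard Spark SQL functions:

```python
import pyspark.sql.functions as F

# Pair every token with the label predicted for it, one row per token
pipelineDF.select(F.explode(F.arrays_zip(pipelineDF.token.result, pipelineDF.ner.result)).alias("cols")) \
    .select(F.expr("cols['0']").alias("token"), F.expr("cols['1']").alias("ner_label")) \
    .show(truncate = False)
```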
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_efar98| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Efar98/my_awesome_wnut_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_efar98_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_efar98_pipeline_en.md new file mode 100644 index 00000000000000..486c2d30796129 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_efar98_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_efar98_pipeline pipeline DistilBertForTokenClassification from Efar98 +author: John Snow Labs +name: burmese_awesome_wnut_model_efar98_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_efar98_pipeline` is a English model originally trained by Efar98. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_efar98_pipeline_en_5.5.0_3.0_1727065452944.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_efar98_pipeline_en_5.5.0_3.0_1727065452944.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_wnut_model_efar98_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_wnut_model_efar98_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_efar98_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Efar98/my_awesome_wnut_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_mandel94_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_mandel94_en.md new file mode 100644 index 00000000000000..dd5de948ff00f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_mandel94_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_mandel94 DistilBertForTokenClassification from mandel94 +author: John Snow Labs +name: burmese_awesome_wnut_model_mandel94 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_mandel94` is a English model originally trained by mandel94. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_mandel94_en_5.5.0_3.0_1727120665317.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_mandel94_en_5.5.0_3.0_1727120665317.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = DistilBertForTokenClassification.pretrained("burmese_awesome_wnut_model_mandel94","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = DistilBertForTokenClassification.pretrained("burmese_awesome_wnut_model_mandel94", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_mandel94| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/mandel94/my_awesome_wnut_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_mandel94_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_mandel94_pipeline_en.md new file mode 100644 index 00000000000000..83024f907fd6a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_mandel94_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_mandel94_pipeline pipeline DistilBertForTokenClassification from mandel94 +author: John Snow Labs +name: burmese_awesome_wnut_model_mandel94_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_mandel94_pipeline` is a English model originally trained by mandel94. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_mandel94_pipeline_en_5.5.0_3.0_1727120677223.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_mandel94_pipeline_en_5.5.0_3.0_1727120677223.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_wnut_model_mandel94_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_wnut_model_mandel94_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_mandel94_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/mandel94/my_awesome_wnut_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_textclassification_model_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_textclassification_model_en.md new file mode 100644 index 00000000000000..d50d77324a8550 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_textclassification_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_textclassification_model DistilBertForSequenceClassification from Happpy0413 +author: John Snow Labs +name: burmese_textclassification_model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_textclassification_model` is a English model originally trained by Happpy0413. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_textclassification_model_en_5.5.0_3.0_1727059119275.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_textclassification_model_en_5.5.0_3.0_1727059119275.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_textclassification_model","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_textclassification_model", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_textclassification_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Happpy0413/my_textclassification_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_textclassification_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_textclassification_model_pipeline_en.md new file mode 100644 index 00000000000000..71fa91ebce29de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_textclassification_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_textclassification_model_pipeline pipeline DistilBertForSequenceClassification from Happpy0413 +author: John Snow Labs +name: burmese_textclassification_model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_textclassification_model_pipeline` is a English model originally trained by Happpy0413. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_textclassification_model_pipeline_en_5.5.0_3.0_1727059133075.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_textclassification_model_pipeline_en_5.5.0_3.0_1727059133075.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_textclassification_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_textclassification_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
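
The snippets above assume an existing Spark session and a DataFrame `df` with a `text` column. The sketch below shows one possible end-to-end setup in Python; the example sentence is arbitrary, and the output columns depend on the stages bundled in the pipeline.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start (or attach to) a Spark session with Spark NLP on the classpath.
spark = sparknlp.start()

# Any DataFrame with a `text` column can be fed to the pretrained pipeline.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("burmese_textclassification_model_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.show(truncate=False)
```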
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_textclassification_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Happpy0413/my_textclassification_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-canbert_en.md b/docs/_posts/ahmedlone127/2024-09-23-canbert_en.md new file mode 100644 index 00000000000000..a34f2f1fc0ea10 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-canbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English canbert RoBertaEmbeddings from ebelenwaf +author: John Snow Labs +name: canbert +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`canbert` is a English model originally trained by ebelenwaf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/canbert_en_5.5.0_3.0_1727056582415.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/canbert_en_5.5.0_3.0_1727056582415.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("canbert","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("canbert","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
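
If plain Spark vectors are more convenient downstream than annotation structs, an `EmbeddingsFinisher` stage can be appended; a minimal sketch, assuming the `documentAssembler`, `tokenizer`, `embeddings` and `data` objects from the Python example above:

```python
from sparknlp.base import EmbeddingsFinisher
from pyspark.ml import Pipeline

# Converts the token-level `embeddings` annotations into Spark ML vectors.
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

pipeline_with_vectors = Pipeline().setStages([documentAssembler, tokenizer, embeddings, finisher])
vectors = pipeline_with_vectors.fit(data).transform(data)
vectors.selectExpr("explode(finished_embeddings) as token_vector").show(truncate=100)
```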
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|canbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.4 MB| + +## References + +https://huggingface.co/ebelenwaf/canbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-canbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-canbert_pipeline_en.md new file mode 100644 index 00000000000000..12688705bc5f19 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-canbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English canbert_pipeline pipeline RoBertaEmbeddings from ebelenwaf +author: John Snow Labs +name: canbert_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`canbert_pipeline` is a English model originally trained by ebelenwaf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/canbert_pipeline_en_5.5.0_3.0_1727056597928.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/canbert_pipeline_en_5.5.0_3.0_1727056597928.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("canbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("canbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|canbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.5 MB| + +## References + +https://huggingface.co/ebelenwaf/canbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cebfil_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-23-cebfil_roberta_en.md new file mode 100644 index 00000000000000..63e25b982053f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cebfil_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cebfil_roberta RoBertaEmbeddings from jfernandez +author: John Snow Labs +name: cebfil_roberta +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cebfil_roberta` is a English model originally trained by jfernandez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cebfil_roberta_en_5.5.0_3.0_1727057002605.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cebfil_roberta_en_5.5.0_3.0_1727057002605.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("cebfil_roberta","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("cebfil_roberta","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cebfil_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|469.5 MB| + +## References + +https://huggingface.co/jfernandez/cebfil-roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cebfil_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-cebfil_roberta_pipeline_en.md new file mode 100644 index 00000000000000..c2c13f601879a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cebfil_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cebfil_roberta_pipeline pipeline RoBertaEmbeddings from jfernandez +author: John Snow Labs +name: cebfil_roberta_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cebfil_roberta_pipeline` is a English model originally trained by jfernandez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cebfil_roberta_pipeline_en_5.5.0_3.0_1727057025789.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cebfil_roberta_pipeline_en_5.5.0_3.0_1727057025789.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cebfil_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cebfil_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cebfil_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|469.5 MB| + +## References + +https://huggingface.co/jfernandez/cebfil-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-classifier_main_subjects_technology_en.md b/docs/_posts/ahmedlone127/2024-09-23-classifier_main_subjects_technology_en.md new file mode 100644 index 00000000000000..14ca07d64c927e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-classifier_main_subjects_technology_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English classifier_main_subjects_technology RoBertaForSequenceClassification from gptmurdock +author: John Snow Labs +name: classifier_main_subjects_technology +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classifier_main_subjects_technology` is a English model originally trained by gptmurdock. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classifier_main_subjects_technology_en_5.5.0_3.0_1727086265650.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classifier_main_subjects_technology_en_5.5.0_3.0_1727086265650.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("classifier_main_subjects_technology","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("classifier_main_subjects_technology", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
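
A quick way to see which labels this checkpoint can emit is to ask the loaded annotator directly; a small sketch, assuming the `sequenceClassifier` defined above:

```python
# The label set is stored with the fine-tuned model and can be listed without running the pipeline.
print(sequenceClassifier.getClasses())
```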
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classifier_main_subjects_technology| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|463.9 MB| + +## References + +https://huggingface.co/gptmurdock/classifier-main_subjects_technology \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-classifier_main_subjects_technology_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-classifier_main_subjects_technology_pipeline_en.md new file mode 100644 index 00000000000000..b1cde9d34b567c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-classifier_main_subjects_technology_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English classifier_main_subjects_technology_pipeline pipeline RoBertaForSequenceClassification from gptmurdock +author: John Snow Labs +name: classifier_main_subjects_technology_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classifier_main_subjects_technology_pipeline` is a English model originally trained by gptmurdock. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classifier_main_subjects_technology_pipeline_en_5.5.0_3.0_1727086289605.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classifier_main_subjects_technology_pipeline_en_5.5.0_3.0_1727086289605.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("classifier_main_subjects_technology_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("classifier_main_subjects_technology_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classifier_main_subjects_technology_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.9 MB| + +## References + +https://huggingface.co/gptmurdock/classifier-main_subjects_technology + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-classifier_tensorride_en.md b/docs/_posts/ahmedlone127/2024-09-23-classifier_tensorride_en.md new file mode 100644 index 00000000000000..d1a87f5ae52af6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-classifier_tensorride_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English classifier_tensorride DistilBertForSequenceClassification from Tensorride +author: John Snow Labs +name: classifier_tensorride +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classifier_tensorride` is a English model originally trained by Tensorride. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classifier_tensorride_en_5.5.0_3.0_1727059238148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classifier_tensorride_en_5.5.0_3.0_1727059238148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("classifier_tensorride","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("classifier_tensorride", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classifier_tensorride| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Tensorride/Classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-classifier_tensorride_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-classifier_tensorride_pipeline_en.md new file mode 100644 index 00000000000000..a6050c67292e9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-classifier_tensorride_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English classifier_tensorride_pipeline pipeline DistilBertForSequenceClassification from Tensorride +author: John Snow Labs +name: classifier_tensorride_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classifier_tensorride_pipeline` is a English model originally trained by Tensorride. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classifier_tensorride_pipeline_en_5.5.0_3.0_1727059256095.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classifier_tensorride_pipeline_en_5.5.0_3.0_1727059256095.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("classifier_tensorride_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("classifier_tensorride_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classifier_tensorride_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Tensorride/Classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-climate_change_en.md b/docs/_posts/ahmedlone127/2024-09-23-climate_change_en.md new file mode 100644 index 00000000000000..addff67c1f1d61 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-climate_change_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English climate_change RoBertaEmbeddings from DhirajKumarSahu +author: John Snow Labs +name: climate_change +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`climate_change` is a English model originally trained by DhirajKumarSahu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/climate_change_en_5.5.0_3.0_1727066208595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/climate_change_en_5.5.0_3.0_1727066208595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("climate_change","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("climate_change","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
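
When a single vector per document is preferable to token-level embeddings, a `SentenceEmbeddings` stage can average them; a minimal sketch, assuming the `documentAssembler`, `tokenizer`, `embeddings` and `data` objects defined above:

```python
from sparknlp.annotator import SentenceEmbeddings
from pyspark.ml import Pipeline

# Averages the RoBERTa token embeddings into one vector per document annotation.
sentenceEmbeddings = SentenceEmbeddings() \
    .setInputCols(["document", "embeddings"]) \
    .setOutputCol("sentence_embeddings") \
    .setPoolingStrategy("AVERAGE")

doc_pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, sentenceEmbeddings])
doc_vectors = doc_pipeline.fit(data).transform(data)
doc_vectors.selectExpr("explode(sentence_embeddings.embeddings) as doc_vector").show(truncate=100)
```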
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|climate_change| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|384.4 MB| + +## References + +https://huggingface.co/DhirajKumarSahu/climate_change \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-climate_change_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-climate_change_pipeline_en.md new file mode 100644 index 00000000000000..71df9b8ada85b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-climate_change_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English climate_change_pipeline pipeline RoBertaEmbeddings from DhirajKumarSahu +author: John Snow Labs +name: climate_change_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`climate_change_pipeline` is a English model originally trained by DhirajKumarSahu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/climate_change_pipeline_en_5.5.0_3.0_1727066259462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/climate_change_pipeline_en_5.5.0_3.0_1727066259462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("climate_change_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("climate_change_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|climate_change_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|384.4 MB| + +## References + +https://huggingface.co/DhirajKumarSahu/climate_change + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-code_search_codebert_base_2_en.md b/docs/_posts/ahmedlone127/2024-09-23-code_search_codebert_base_2_en.md new file mode 100644 index 00000000000000..7c19af396317dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-code_search_codebert_base_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English code_search_codebert_base_2 RoBertaForTokenClassification from DianaIulia +author: John Snow Labs +name: code_search_codebert_base_2 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`code_search_codebert_base_2` is a English model originally trained by DianaIulia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_2_en_5.5.0_3.0_1727081601856.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_2_en_5.5.0_3.0_1727081601856.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("code_search_codebert_base_2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("code_search_codebert_base_2", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
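
If the token-level tags follow the usual IOB scheme (which depends on how the original checkpoint was fine-tuned), a `NerConverter` stage can group them into entity chunks; a minimal sketch, assuming the `documentAssembler`, `tokenizer`, `tokenClassifier` and `data` objects defined above:

```python
from sparknlp.annotator import NerConverter
from pyspark.ml import Pipeline

# Merges consecutive IOB-tagged tokens from the `ner` column into whole entity chunks.
converter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

chunk_pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, converter])
chunks = chunk_pipeline.fit(data).transform(data)
chunks.selectExpr("explode(ner_chunk) as chunk") \
    .selectExpr("chunk.result as text", "chunk.metadata['entity'] as label") \
    .show(truncate=False)
```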
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|code_search_codebert_base_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DianaIulia/code_search_codebert_base_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-code_search_codebert_base_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-code_search_codebert_base_2_pipeline_en.md new file mode 100644 index 00000000000000..a304601fd13fdd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-code_search_codebert_base_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English code_search_codebert_base_2_pipeline pipeline RoBertaForTokenClassification from DianaIulia +author: John Snow Labs +name: code_search_codebert_base_2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`code_search_codebert_base_2_pipeline` is a English model originally trained by DianaIulia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_2_pipeline_en_5.5.0_3.0_1727081625835.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_2_pipeline_en_5.5.0_3.0_1727081625835.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("code_search_codebert_base_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("code_search_codebert_base_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|code_search_codebert_base_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/DianaIulia/code_search_codebert_base_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-coha1920s_en.md b/docs/_posts/ahmedlone127/2024-09-23-coha1920s_en.md new file mode 100644 index 00000000000000..f965f782b36426 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-coha1920s_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English coha1920s RoBertaEmbeddings from simonmun +author: John Snow Labs +name: coha1920s +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`coha1920s` is a English model originally trained by simonmun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/coha1920s_en_5.5.0_3.0_1727121976760.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/coha1920s_en_5.5.0_3.0_1727121976760.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("coha1920s","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("coha1920s","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|coha1920s| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.4 MB| + +## References + +https://huggingface.co/simonmun/COHA1920s \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-coha1920s_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-coha1920s_pipeline_en.md new file mode 100644 index 00000000000000..9a5de14eb39629 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-coha1920s_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English coha1920s_pipeline pipeline RoBertaEmbeddings from simonmun +author: John Snow Labs +name: coha1920s_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`coha1920s_pipeline` is a English model originally trained by simonmun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/coha1920s_pipeline_en_5.5.0_3.0_1727121990916.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/coha1920s_pipeline_en_5.5.0_3.0_1727121990916.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("coha1920s_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("coha1920s_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|coha1920s_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.5 MB| + +## References + +https://huggingface.co/simonmun/COHA1920s + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-coha1930s_en.md b/docs/_posts/ahmedlone127/2024-09-23-coha1930s_en.md new file mode 100644 index 00000000000000..276548f45c575e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-coha1930s_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English coha1930s RoBertaEmbeddings from simonmun +author: John Snow Labs +name: coha1930s +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`coha1930s` is a English model originally trained by simonmun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/coha1930s_en_5.5.0_3.0_1727121606144.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/coha1930s_en_5.5.0_3.0_1727121606144.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("coha1930s","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("coha1930s","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|coha1930s| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.8 MB| + +## References + +https://huggingface.co/simonmun/COHA1930s \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-coha1930s_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-coha1930s_pipeline_en.md new file mode 100644 index 00000000000000..7d693a28c08f97 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-coha1930s_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English coha1930s_pipeline pipeline RoBertaEmbeddings from simonmun +author: John Snow Labs +name: coha1930s_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`coha1930s_pipeline` is a English model originally trained by simonmun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/coha1930s_pipeline_en_5.5.0_3.0_1727121620775.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/coha1930s_pipeline_en_5.5.0_3.0_1727121620775.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("coha1930s_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("coha1930s_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|coha1930s_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.9 MB| + +## References + +https://huggingface.co/simonmun/COHA1930s + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr23_seed1_en.md b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr23_seed1_en.md new file mode 100644 index 00000000000000..ec937274739976 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr23_seed1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cold_fusion_itr23_seed1 RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr23_seed1 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr23_seed1` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr23_seed1_en_5.5.0_3.0_1727135590809.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr23_seed1_en_5.5.0_3.0_1727135590809.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr23_seed1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr23_seed1", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
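
For ad-hoc checks on a handful of strings, the fitted pipeline can also be wrapped in a `LightPipeline`, which avoids building a DataFrame; a small sketch, assuming the `pipelineModel` produced above:

```python
from sparknlp.base import LightPipeline

# LightPipeline runs the same stages directly on Python strings.
light = LightPipeline(pipelineModel)
print(light.annotate("I love spark-nlp"))                      # single text -> dict of results
print(light.annotate(["First example.", "Second example."]))   # list of texts -> list of dicts
```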
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr23_seed1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr23-seed1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr23_seed1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr23_seed1_pipeline_en.md new file mode 100644 index 00000000000000..9e64ec18c6eb0c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr23_seed1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cold_fusion_itr23_seed1_pipeline pipeline RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr23_seed1_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr23_seed1_pipeline` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr23_seed1_pipeline_en_5.5.0_3.0_1727135614447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr23_seed1_pipeline_en_5.5.0_3.0_1727135614447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cold_fusion_itr23_seed1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cold_fusion_itr23_seed1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr23_seed1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr23-seed1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr25_seed4_en.md b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr25_seed4_en.md new file mode 100644 index 00000000000000..12390a1fea6691 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr25_seed4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cold_fusion_itr25_seed4 RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr25_seed4 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr25_seed4` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr25_seed4_en_5.5.0_3.0_1727134691420.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr25_seed4_en_5.5.0_3.0_1727134691420.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr25_seed4","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr25_seed4", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr25_seed4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr25-seed4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr25_seed4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr25_seed4_pipeline_en.md new file mode 100644 index 00000000000000..ab4bb3d28ce5e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr25_seed4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cold_fusion_itr25_seed4_pipeline pipeline RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr25_seed4_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr25_seed4_pipeline` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr25_seed4_pipeline_en_5.5.0_3.0_1727134714843.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr25_seed4_pipeline_en_5.5.0_3.0_1727134714843.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cold_fusion_itr25_seed4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cold_fusion_itr25_seed4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr25_seed4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr25-seed4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr28_seed3_en.md b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr28_seed3_en.md new file mode 100644 index 00000000000000..1cfaf7c47fe12e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr28_seed3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cold_fusion_itr28_seed3 RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr28_seed3 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr28_seed3` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr28_seed3_en_5.5.0_3.0_1727055385938.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr28_seed3_en_5.5.0_3.0_1727055385938.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr28_seed3","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr28_seed3", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr28_seed3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr28-seed3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr28_seed3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr28_seed3_pipeline_en.md new file mode 100644 index 00000000000000..fa5aae6b0e1301 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr28_seed3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cold_fusion_itr28_seed3_pipeline pipeline RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr28_seed3_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr28_seed3_pipeline` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr28_seed3_pipeline_en_5.5.0_3.0_1727055408875.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr28_seed3_pipeline_en_5.5.0_3.0_1727055408875.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("cold_fusion_itr28_seed3_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("cold_fusion_itr28_seed3_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
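+
+For quick checks on a single string, `PretrainedPipeline` also exposes an `annotate` method that returns a plain dictionary keyed by the pipeline's output columns, so no DataFrame needs to be built first. A minimal sketch:
+
+```python
+result = pipeline.annotate("I love spark-nlp")
+print(result.keys())
+```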
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr28_seed3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr28-seed3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-comnumpndistilbertv1_big_en.md b/docs/_posts/ahmedlone127/2024-09-23-comnumpndistilbertv1_big_en.md new file mode 100644 index 00000000000000..74b8a1a28a5d6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-comnumpndistilbertv1_big_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English comnumpndistilbertv1_big DistilBertForSequenceClassification from abbassix +author: John Snow Labs +name: comnumpndistilbertv1_big +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`comnumpndistilbertv1_big` is a English model originally trained by abbassix. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/comnumpndistilbertv1_big_en_5.5.0_3.0_1727086751840.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/comnumpndistilbertv1_big_en_5.5.0_3.0_1727086751840.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("comnumpndistilbertv1_big","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("comnumpndistilbertv1_big", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
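+
+Each classification annotation also carries per-label confidence scores in its metadata, which can be useful for thresholding predictions. A minimal sketch for inspecting them, assuming the column names from the example above:
+
+```python
+from pyspark.sql.functions import explode
+
+# explode the annotation array and show the predicted label together with its metadata
+pipelineDF.select(explode("class").alias("prediction")) \
+    .select("prediction.result", "prediction.metadata") \
+    .show(truncate=False)
+```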
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|comnumpndistilbertv1_big| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|250.4 MB| + +## References + +https://huggingface.co/abbassix/ComNumPNdistilBERTv1-big \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_en.md b/docs/_posts/ahmedlone127/2024-09-23-correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_en.md new file mode 100644 index 00000000000000..88adf4a4682744 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19 BertForTokenClassification from ali2066 +author: John Snow Labs +name: correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_en_5.5.0_3.0_1727111269967.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_en_5.5.0_3.0_1727111269967.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
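+
+Because this is a token-level NER model, the per-token tags can optionally be grouped into entity chunks by appending a `NerConverter` stage. A minimal sketch, assuming the column names from the example above:
+
+```python
+from sparknlp.annotator import NerConverter
+
+# groups consecutive B-/I- tags into entity chunks
+nerConverter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, nerConverter])
+```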
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ali2066/correct_BERT_token_itr0_0.0001_all_01_03_2022-15_52_19 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_pipeline_en.md new file mode 100644 index 00000000000000..5bae7ca608497d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_pipeline pipeline BertForTokenClassification from ali2066 +author: John Snow Labs +name: correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_pipeline` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_pipeline_en_5.5.0_3.0_1727111290012.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_pipeline_en_5.5.0_3.0_1727111290012.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ali2066/correct_BERT_token_itr0_0.0001_all_01_03_2022-15_52_19 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cos_tapt_n_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-23-cos_tapt_n_roberta_en.md new file mode 100644 index 00000000000000..df8b7044394f48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cos_tapt_n_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cos_tapt_n_roberta RoBertaEmbeddings from Kyleiwaniec +author: John Snow Labs +name: cos_tapt_n_roberta +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cos_tapt_n_roberta` is a English model originally trained by Kyleiwaniec. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cos_tapt_n_roberta_en_5.5.0_3.0_1727066055984.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cos_tapt_n_roberta_en_5.5.0_3.0_1727066055984.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("cos_tapt_n_roberta","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("cos_tapt_n_roberta","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
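+
+If the token embeddings are needed as plain Spark vectors (for example, as features for a downstream ML stage), an `EmbeddingsFinisher` can be appended to the pipeline. A minimal sketch based on the column names above:
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+# converts embedding annotations into Spark ML vectors
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, finisher])
+```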
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cos_tapt_n_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Kyleiwaniec/COS_TAPT_n_RoBERTa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cos_tapt_n_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-cos_tapt_n_roberta_pipeline_en.md new file mode 100644 index 00000000000000..a0572a045cf995 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cos_tapt_n_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cos_tapt_n_roberta_pipeline pipeline RoBertaEmbeddings from Kyleiwaniec +author: John Snow Labs +name: cos_tapt_n_roberta_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cos_tapt_n_roberta_pipeline` is a English model originally trained by Kyleiwaniec. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cos_tapt_n_roberta_pipeline_en_5.5.0_3.0_1727066118449.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cos_tapt_n_roberta_pipeline_en_5.5.0_3.0_1727066118449.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("cos_tapt_n_roberta_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("cos_tapt_n_roberta_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cos_tapt_n_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Kyleiwaniec/COS_TAPT_n_RoBERTa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_en.md b/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_en.md new file mode 100644 index 00000000000000..3799d4c69bc877 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English covid_roberta_25 RoBertaEmbeddings from timoneda +author: John Snow Labs +name: covid_roberta_25 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`covid_roberta_25` is a English model originally trained by timoneda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/covid_roberta_25_en_5.5.0_3.0_1727092053022.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/covid_roberta_25_en_5.5.0_3.0_1727092053022.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("covid_roberta_25","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("covid_roberta_25","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|covid_roberta_25| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/timoneda/covid_roberta_25 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_masked_en.md b/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_masked_en.md new file mode 100644 index 00000000000000..6b96b87a528677 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_masked_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English covid_roberta_25_masked RoBertaEmbeddings from timoneda +author: John Snow Labs +name: covid_roberta_25_masked +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`covid_roberta_25_masked` is a English model originally trained by timoneda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/covid_roberta_25_masked_en_5.5.0_3.0_1727091906525.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/covid_roberta_25_masked_en_5.5.0_3.0_1727091906525.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("covid_roberta_25_masked","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("covid_roberta_25_masked","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|covid_roberta_25_masked| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/timoneda/covid_roberta_25_masked \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_masked_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_masked_pipeline_en.md new file mode 100644 index 00000000000000..dfab35fe39540c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_masked_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English covid_roberta_25_masked_pipeline pipeline RoBertaEmbeddings from timoneda +author: John Snow Labs +name: covid_roberta_25_masked_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`covid_roberta_25_masked_pipeline` is a English model originally trained by timoneda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/covid_roberta_25_masked_pipeline_en_5.5.0_3.0_1727091972081.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/covid_roberta_25_masked_pipeline_en_5.5.0_3.0_1727091972081.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("covid_roberta_25_masked_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("covid_roberta_25_masked_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|covid_roberta_25_masked_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/timoneda/covid_roberta_25_masked + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_pipeline_en.md new file mode 100644 index 00000000000000..d318073dd50d8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English covid_roberta_25_pipeline pipeline RoBertaEmbeddings from timoneda +author: John Snow Labs +name: covid_roberta_25_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`covid_roberta_25_pipeline` is a English model originally trained by timoneda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/covid_roberta_25_pipeline_en_5.5.0_3.0_1727092114506.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/covid_roberta_25_pipeline_en_5.5.0_3.0_1727092114506.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("covid_roberta_25_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("covid_roberta_25_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|covid_roberta_25_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/timoneda/covid_roberta_25 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cyber_distilbert_en.md b/docs/_posts/ahmedlone127/2024-09-23-cyber_distilbert_en.md new file mode 100644 index 00000000000000..7735149dd1bbb3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cyber_distilbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cyber_distilbert DistilBertForSequenceClassification from eysharaazia +author: John Snow Labs +name: cyber_distilbert +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cyber_distilbert` is a English model originally trained by eysharaazia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cyber_distilbert_en_5.5.0_3.0_1727093590628.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cyber_distilbert_en_5.5.0_3.0_1727093590628.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("cyber_distilbert","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("cyber_distilbert", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cyber_distilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/eysharaazia/cyber_distilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cyber_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-cyber_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..70dfd6b077c85f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cyber_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cyber_distilbert_pipeline pipeline DistilBertForSequenceClassification from eysharaazia +author: John Snow Labs +name: cyber_distilbert_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cyber_distilbert_pipeline` is a English model originally trained by eysharaazia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cyber_distilbert_pipeline_en_5.5.0_3.0_1727093615342.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cyber_distilbert_pipeline_en_5.5.0_3.0_1727093615342.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("cyber_distilbert_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("cyber_distilbert_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cyber_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/eysharaazia/cyber_distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-dapt_plus_tapt_helpfulness_base_pretraining_model_ltuzova_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-dapt_plus_tapt_helpfulness_base_pretraining_model_ltuzova_pipeline_en.md new file mode 100644 index 00000000000000..37998785e19abf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-dapt_plus_tapt_helpfulness_base_pretraining_model_ltuzova_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dapt_plus_tapt_helpfulness_base_pretraining_model_ltuzova_pipeline pipeline RoBertaEmbeddings from ltuzova +author: John Snow Labs +name: dapt_plus_tapt_helpfulness_base_pretraining_model_ltuzova_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dapt_plus_tapt_helpfulness_base_pretraining_model_ltuzova_pipeline` is a English model originally trained by ltuzova. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dapt_plus_tapt_helpfulness_base_pretraining_model_ltuzova_pipeline_en_5.5.0_3.0_1727122125631.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dapt_plus_tapt_helpfulness_base_pretraining_model_ltuzova_pipeline_en_5.5.0_3.0_1727122125631.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("dapt_plus_tapt_helpfulness_base_pretraining_model_ltuzova_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("dapt_plus_tapt_helpfulness_base_pretraining_model_ltuzova_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dapt_plus_tapt_helpfulness_base_pretraining_model_ltuzova_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/ltuzova/dapt_plus_tapt_helpfulness_base_pretraining_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-db_mc_9_2_en.md b/docs/_posts/ahmedlone127/2024-09-23-db_mc_9_2_en.md new file mode 100644 index 00000000000000..d44a7337fc25b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-db_mc_9_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English db_mc_9_2 DistilBertForSequenceClassification from exala +author: John Snow Labs +name: db_mc_9_2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`db_mc_9_2` is a English model originally trained by exala. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/db_mc_9_2_en_5.5.0_3.0_1727059224175.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/db_mc_9_2_en_5.5.0_3.0_1727059224175.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("db_mc_9_2","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("db_mc_9_2", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|db_mc_9_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/exala/db_mc_9.2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-db_mc_9_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-db_mc_9_2_pipeline_en.md new file mode 100644 index 00000000000000..ed4896bcbd2067 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-db_mc_9_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English db_mc_9_2_pipeline pipeline DistilBertForSequenceClassification from exala +author: John Snow Labs +name: db_mc_9_2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`db_mc_9_2_pipeline` is a English model originally trained by exala. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/db_mc_9_2_pipeline_en_5.5.0_3.0_1727059236326.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/db_mc_9_2_pipeline_en_5.5.0_3.0_1727059236326.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("db_mc_9_2_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("db_mc_9_2_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|db_mc_9_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/exala/db_mc_9.2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-deepset_bert_base_cased_squad2_orkg_which_5e_05_en.md b/docs/_posts/ahmedlone127/2024-09-23-deepset_bert_base_cased_squad2_orkg_which_5e_05_en.md new file mode 100644 index 00000000000000..27ee3b783a84d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-deepset_bert_base_cased_squad2_orkg_which_5e_05_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English deepset_bert_base_cased_squad2_orkg_which_5e_05 BertForQuestionAnswering from Moussab +author: John Snow Labs +name: deepset_bert_base_cased_squad2_orkg_which_5e_05 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deepset_bert_base_cased_squad2_orkg_which_5e_05` is a English model originally trained by Moussab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deepset_bert_base_cased_squad2_orkg_which_5e_05_en_5.5.0_3.0_1727070757142.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deepset_bert_base_cased_squad2_orkg_which_5e_05_en_5.5.0_3.0_1727070757142.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("deepset_bert_base_cased_squad2_orkg_which_5e_05","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("deepset_bert_base_cased_squad2_orkg_which_5e_05", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
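+
+After the transform, the extracted answer spans are available in the "answer" output column. A minimal sketch, assuming the column names defined above:
+
+```python
+# each row holds the answer span predicted for its question/context pair
+pipelineDF.select("question", "answer.result").show(truncate=False)
+```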
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deepset_bert_base_cased_squad2_orkg_which_5e_05| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/Moussab/deepset_bert-base-cased-squad2-orkg-which-5e-05 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-deepset_bert_base_cased_squad2_orkg_which_5e_05_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-deepset_bert_base_cased_squad2_orkg_which_5e_05_pipeline_en.md new file mode 100644 index 00000000000000..0effd7df1db152 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-deepset_bert_base_cased_squad2_orkg_which_5e_05_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English deepset_bert_base_cased_squad2_orkg_which_5e_05_pipeline pipeline BertForQuestionAnswering from Moussab +author: John Snow Labs +name: deepset_bert_base_cased_squad2_orkg_which_5e_05_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deepset_bert_base_cased_squad2_orkg_which_5e_05_pipeline` is a English model originally trained by Moussab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deepset_bert_base_cased_squad2_orkg_which_5e_05_pipeline_en_5.5.0_3.0_1727070776049.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deepset_bert_base_cased_squad2_orkg_which_5e_05_pipeline_en_5.5.0_3.0_1727070776049.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("deepset_bert_base_cased_squad2_orkg_which_5e_05_pipeline", lang = "en")
+# this QA pipeline starts with a MultiDocumentAssembler, so the input is assumed to need "question" and "context" columns
+df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("deepset_bert_base_cased_squad2_orkg_which_5e_05_pipeline", lang = "en")
+// this QA pipeline starts with a MultiDocumentAssembler, so the input is assumed to need "question" and "context" columns
+val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deepset_bert_base_cased_squad2_orkg_which_5e_05_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Moussab/deepset_bert-base-cased-squad2-orkg-which-5e-05 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-denilsenaxel_xlm_roberta_finetuned_language_detection_en.md b/docs/_posts/ahmedlone127/2024-09-23-denilsenaxel_xlm_roberta_finetuned_language_detection_en.md new file mode 100644 index 00000000000000..a31b7e78dbca6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-denilsenaxel_xlm_roberta_finetuned_language_detection_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English denilsenaxel_xlm_roberta_finetuned_language_detection XlmRoBertaForSequenceClassification from DenilsenAxel +author: John Snow Labs +name: denilsenaxel_xlm_roberta_finetuned_language_detection +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`denilsenaxel_xlm_roberta_finetuned_language_detection` is a English model originally trained by DenilsenAxel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/denilsenaxel_xlm_roberta_finetuned_language_detection_en_5.5.0_3.0_1727126352193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/denilsenaxel_xlm_roberta_finetuned_language_detection_en_5.5.0_3.0_1727126352193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("denilsenaxel_xlm_roberta_finetuned_language_detection","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("denilsenaxel_xlm_roberta_finetuned_language_detection", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|denilsenaxel_xlm_roberta_finetuned_language_detection| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|792.3 MB| + +## References + +https://huggingface.co/DenilsenAxel/denilsenaxel-xlm-roberta-finetuned-language-detection \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-denilsenaxel_xlm_roberta_finetuned_language_detection_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-denilsenaxel_xlm_roberta_finetuned_language_detection_pipeline_en.md new file mode 100644 index 00000000000000..788e12bfb28ad2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-denilsenaxel_xlm_roberta_finetuned_language_detection_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English denilsenaxel_xlm_roberta_finetuned_language_detection_pipeline pipeline XlmRoBertaForSequenceClassification from DenilsenAxel +author: John Snow Labs +name: denilsenaxel_xlm_roberta_finetuned_language_detection_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`denilsenaxel_xlm_roberta_finetuned_language_detection_pipeline` is a English model originally trained by DenilsenAxel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/denilsenaxel_xlm_roberta_finetuned_language_detection_pipeline_en_5.5.0_3.0_1727126484919.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/denilsenaxel_xlm_roberta_finetuned_language_detection_pipeline_en_5.5.0_3.0_1727126484919.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("denilsenaxel_xlm_roberta_finetuned_language_detection_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("denilsenaxel_xlm_roberta_finetuned_language_detection_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|denilsenaxel_xlm_roberta_finetuned_language_detection_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|792.3 MB| + +## References + +https://huggingface.co/DenilsenAxel/denilsenaxel-xlm-roberta-finetuned-language-detection + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-deproberta_v4_en.md b/docs/_posts/ahmedlone127/2024-09-23-deproberta_v4_en.md new file mode 100644 index 00000000000000..c31ca8280ee7e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-deproberta_v4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deproberta_v4 RoBertaForSequenceClassification from ericNguyen0132 +author: John Snow Labs +name: deproberta_v4 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deproberta_v4` is a English model originally trained by ericNguyen0132. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deproberta_v4_en_5.5.0_3.0_1727135610488.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deproberta_v4_en_5.5.0_3.0_1727135610488.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("deproberta_v4","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("deproberta_v4", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deproberta_v4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ericNguyen0132/DepRoBERTa-v4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-deproberta_v4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-deproberta_v4_pipeline_en.md new file mode 100644 index 00000000000000..b99c92c1576aee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-deproberta_v4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English deproberta_v4_pipeline pipeline RoBertaForSequenceClassification from ericNguyen0132 +author: John Snow Labs +name: deproberta_v4_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deproberta_v4_pipeline` is a English model originally trained by ericNguyen0132. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deproberta_v4_pipeline_en_5.5.0_3.0_1727135683679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deproberta_v4_pipeline_en_5.5.0_3.0_1727135683679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("deproberta_v4_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("deproberta_v4_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deproberta_v4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ericNguyen0132/DepRoBERTa-v4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-detect_femicide_news_xlmr_dutch_mono_freeze2_en.md b/docs/_posts/ahmedlone127/2024-09-23-detect_femicide_news_xlmr_dutch_mono_freeze2_en.md new file mode 100644 index 00000000000000..bab99a1a9a137d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-detect_femicide_news_xlmr_dutch_mono_freeze2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English detect_femicide_news_xlmr_dutch_mono_freeze2 XlmRoBertaForSequenceClassification from gossminn +author: John Snow Labs +name: detect_femicide_news_xlmr_dutch_mono_freeze2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`detect_femicide_news_xlmr_dutch_mono_freeze2` is a English model originally trained by gossminn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/detect_femicide_news_xlmr_dutch_mono_freeze2_en_5.5.0_3.0_1727100177395.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/detect_femicide_news_xlmr_dutch_mono_freeze2_en_5.5.0_3.0_1727100177395.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
  .setInputCol("text") \
  .setOutputCol("document")

tokenizer = Tokenizer() \
  .setInputCols(["document"]) \
  .setOutputCol("token")

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("detect_femicide_news_xlmr_dutch_mono_freeze2", "en") \
  .setInputCols(["document", "token"]) \
  .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("detect_femicide_news_xlmr_dutch_mono_freeze2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
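
To read off just the predicted labels rather than the full annotation structs, the `class` output column can be flattened. This is a minimal sketch, assuming the `pipelineDF` DataFrame produced by the example above:

```python
# Each row shows the input text and the label(s) predicted for it
pipelineDF.select("text", "class.result").show(truncate=False)
```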
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|detect_femicide_news_xlmr_dutch_mono_freeze2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|655.1 MB| + +## References + +https://huggingface.co/gossminn/detect-femicide-news-xlmr-nl-mono-freeze2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-detect_femicide_news_xlmr_dutch_mono_freeze2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-detect_femicide_news_xlmr_dutch_mono_freeze2_pipeline_en.md new file mode 100644 index 00000000000000..e8ef678fc96e47 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-detect_femicide_news_xlmr_dutch_mono_freeze2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English detect_femicide_news_xlmr_dutch_mono_freeze2_pipeline pipeline XlmRoBertaForSequenceClassification from gossminn +author: John Snow Labs +name: detect_femicide_news_xlmr_dutch_mono_freeze2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`detect_femicide_news_xlmr_dutch_mono_freeze2_pipeline` is a English model originally trained by gossminn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/detect_femicide_news_xlmr_dutch_mono_freeze2_pipeline_en_5.5.0_3.0_1727100356459.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/detect_femicide_news_xlmr_dutch_mono_freeze2_pipeline_en_5.5.0_3.0_1727100356459.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with the input text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("detect_femicide_news_xlmr_dutch_mono_freeze2_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df is assumed to be a Spark DataFrame with the input text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("detect_femicide_news_xlmr_dutch_mono_freeze2_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|detect_femicide_news_xlmr_dutch_mono_freeze2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|655.2 MB| + +## References + +https://huggingface.co/gossminn/detect-femicide-news-xlmr-nl-mono-freeze2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-detector_god2_en.md b/docs/_posts/ahmedlone127/2024-09-23-detector_god2_en.md new file mode 100644 index 00000000000000..73d4b2345bdecb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-detector_god2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English detector_god2 XlmRoBertaForSequenceClassification from Sydelabs +author: John Snow Labs +name: detector_god2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`detector_god2` is a English model originally trained by Sydelabs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/detector_god2_en_5.5.0_3.0_1727088261823.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/detector_god2_en_5.5.0_3.0_1727088261823.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
  .setInputCol("text") \
  .setOutputCol("document")

tokenizer = Tokenizer() \
  .setInputCols(["document"]) \
  .setOutputCol("token")

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("detector_god2", "en") \
  .setInputCols(["document", "token"]) \
  .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("detector_god2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|detector_god2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.1 GB| + +## References + +https://huggingface.co/Sydelabs/detector_god2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-detector_god2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-detector_god2_pipeline_en.md new file mode 100644 index 00000000000000..0a75f67960d0f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-detector_god2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English detector_god2_pipeline pipeline XlmRoBertaForSequenceClassification from Sydelabs +author: John Snow Labs +name: detector_god2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`detector_god2_pipeline` is a English model originally trained by Sydelabs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/detector_god2_pipeline_en_5.5.0_3.0_1727088314880.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/detector_god2_pipeline_en_5.5.0_3.0_1727088314880.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with the input text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("detector_god2_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df is assumed to be a Spark DataFrame with the input text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("detector_god2_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|detector_god2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.1 GB| + +## References + +https://huggingface.co/Sydelabs/detector_god2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-diabetes_bert_two_en.md b/docs/_posts/ahmedlone127/2024-09-23-diabetes_bert_two_en.md new file mode 100644 index 00000000000000..832d787f8fdcd4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-diabetes_bert_two_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English diabetes_bert_two RoBertaEmbeddings from ubaskota +author: John Snow Labs +name: diabetes_bert_two +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`diabetes_bert_two` is a English model originally trained by ubaskota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/diabetes_bert_two_en_5.5.0_3.0_1727122254339.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/diabetes_bert_two_en_5.5.0_3.0_1727122254339.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
  .setInputCol("text") \
  .setOutputCol("document")

tokenizer = Tokenizer() \
  .setInputCols(["document"]) \
  .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("diabetes_bert_two", "en") \
  .setInputCols(["document", "token"]) \
  .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("diabetes_bert_two", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
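
The `embeddings` column produced above contains one annotation per token, each carrying the token text and its vector. The snippet below is a minimal sketch, assuming the `pipelineDF` DataFrame from the example above, for pulling the raw vectors out:

```python
from pyspark.sql import functions as F

# One row per token: the token string and its embedding vector
pipelineDF.select(F.explode("embeddings").alias("emb")) \
    .select("emb.result", "emb.embeddings") \
    .show(truncate=False)
```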
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|diabetes_bert_two| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|463.7 MB| + +## References + +https://huggingface.co/ubaskota/diabetes_BERT_two \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-diabetes_bert_two_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-diabetes_bert_two_pipeline_en.md new file mode 100644 index 00000000000000..c72e575d6988cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-diabetes_bert_two_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English diabetes_bert_two_pipeline pipeline RoBertaEmbeddings from ubaskota +author: John Snow Labs +name: diabetes_bert_two_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`diabetes_bert_two_pipeline` is a English model originally trained by ubaskota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/diabetes_bert_two_pipeline_en_5.5.0_3.0_1727122276200.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/diabetes_bert_two_pipeline_en_5.5.0_3.0_1727122276200.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with the input text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("diabetes_bert_two_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df is assumed to be a Spark DataFrame with the input text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("diabetes_bert_two_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|diabetes_bert_two_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.7 MB| + +## References + +https://huggingface.co/ubaskota/diabetes_BERT_two + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-diffusion_robustness_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-23-diffusion_robustness_imdb_en.md new file mode 100644 index 00000000000000..1c61e0dd548e20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-diffusion_robustness_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English diffusion_robustness_imdb RoBertaEmbeddings from Maybe1407 +author: John Snow Labs +name: diffusion_robustness_imdb +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`diffusion_robustness_imdb` is a English model originally trained by Maybe1407. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/diffusion_robustness_imdb_en_5.5.0_3.0_1727080458962.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/diffusion_robustness_imdb_en_5.5.0_3.0_1727080458962.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
  .setInputCol("text") \
  .setOutputCol("document")

tokenizer = Tokenizer() \
  .setInputCols(["document"]) \
  .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("diffusion_robustness_imdb", "en") \
  .setInputCols(["document", "token"]) \
  .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("diffusion_robustness_imdb", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|diffusion_robustness_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Maybe1407/diffusion_robustness_imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-diffusion_robustness_imdb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-diffusion_robustness_imdb_pipeline_en.md new file mode 100644 index 00000000000000..8fb6844f33a202 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-diffusion_robustness_imdb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English diffusion_robustness_imdb_pipeline pipeline RoBertaEmbeddings from Maybe1407 +author: John Snow Labs +name: diffusion_robustness_imdb_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`diffusion_robustness_imdb_pipeline` is a English model originally trained by Maybe1407. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/diffusion_robustness_imdb_pipeline_en_5.5.0_3.0_1727080527683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/diffusion_robustness_imdb_pipeline_en_5.5.0_3.0_1727080527683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with the input text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("diffusion_robustness_imdb_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df is assumed to be a Spark DataFrame with the input text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("diffusion_robustness_imdb_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|diffusion_robustness_imdb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Maybe1407/diffusion_robustness_imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-disaster_tweet_3_en.md b/docs/_posts/ahmedlone127/2024-09-23-disaster_tweet_3_en.md new file mode 100644 index 00000000000000..4900b42201bf5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-disaster_tweet_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English disaster_tweet_3 RoBertaForSequenceClassification from aellxx +author: John Snow Labs +name: disaster_tweet_3 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`disaster_tweet_3` is a English model originally trained by aellxx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/disaster_tweet_3_en_5.5.0_3.0_1727134743885.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/disaster_tweet_3_en_5.5.0_3.0_1727134743885.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
  .setInputCol("text") \
  .setOutputCol("document")

tokenizer = Tokenizer() \
  .setInputCols(["document"]) \
  .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("disaster_tweet_3", "en") \
  .setInputCols(["document", "token"]) \
  .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("disaster_tweet_3", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|disaster_tweet_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/aellxx/disaster-tweet-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-disaster_tweet_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-disaster_tweet_3_pipeline_en.md new file mode 100644 index 00000000000000..2224702088916e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-disaster_tweet_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English disaster_tweet_3_pipeline pipeline RoBertaForSequenceClassification from aellxx +author: John Snow Labs +name: disaster_tweet_3_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`disaster_tweet_3_pipeline` is a English model originally trained by aellxx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/disaster_tweet_3_pipeline_en_5.5.0_3.0_1727134767965.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/disaster_tweet_3_pipeline_en_5.5.0_3.0_1727134767965.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with the input text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("disaster_tweet_3_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df is assumed to be a Spark DataFrame with the input text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("disaster_tweet_3_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|disaster_tweet_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/aellxx/disaster-tweet-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distil_whisper_medium_hindi_test_v2_en.md b/docs/_posts/ahmedlone127/2024-09-23-distil_whisper_medium_hindi_test_v2_en.md new file mode 100644 index 00000000000000..a9dddbd9a73498 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distil_whisper_medium_hindi_test_v2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English distil_whisper_medium_hindi_test_v2 WhisperForCTC from yi-ching +author: John Snow Labs +name: distil_whisper_medium_hindi_test_v2 +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distil_whisper_medium_hindi_test_v2` is a English model originally trained by yi-ching. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_whisper_medium_hindi_test_v2_en_5.5.0_3.0_1727077585696.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_whisper_medium_hindi_test_v2_en_5.5.0_3.0_1727077585696.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# rawFloats is assumed to be a list of float audio samples (a decoded mono waveform)
data = spark.createDataFrame([[rawFloats]]).toDF("audio_content")

audioAssembler = AudioAssembler() \
  .setInputCol("audio_content") \
  .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("distil_whisper_medium_hindi_test_v2", "en") \
  .setInputCols(["audio_assembler"]) \
  .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline
import spark.implicits._

// rawFloats is assumed to be an Array[Float] of audio samples (a decoded mono waveform)
val data = Seq(rawFloats).toDF("audio_content")

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("distil_whisper_medium_hindi_test_v2", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
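
The example above assumes the audio is already available as an array of float samples in the `audio_content` column. One possible way to produce such an array is sketched below; it relies on `librosa` and a placeholder file name, both of which are assumptions rather than requirements of Spark NLP. The resulting list can then be placed in a single-row DataFrame exactly as shown in the example above.

```python
import librosa

# Decode a local audio file to a mono float waveform at 16 kHz (the sample rate Whisper expects);
# "sample_audio.wav" is a placeholder path.
waveform, _ = librosa.load("sample_audio.wav", sr=16000, mono=True)
rawFloats = [float(x) for x in waveform]
```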
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distil_whisper_medium_hindi_test_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.8 GB| + +## References + +https://huggingface.co/yi-ching/distil-whisper-medium-hi-test-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding50model_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding50model_en.md new file mode 100644 index 00000000000000..c78a7d0c649608 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding50model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_agnews_padding50model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_agnews_padding50model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_agnews_padding50model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding50model_en_5.5.0_3.0_1727087106129.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding50model_en_5.5.0_3.0_1727087106129.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
  .setInputCol("text") \
  .setOutputCol("document")

tokenizer = Tokenizer() \
  .setInputCols(["document"]) \
  .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_agnews_padding50model", "en") \
  .setInputCols(["document", "token"]) \
  .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_agnews_padding50model", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_agnews_padding50model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_agnews_padding50model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding50model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding50model_pipeline_en.md new file mode 100644 index 00000000000000..6774137e8f5b60 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding50model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_agnews_padding50model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_agnews_padding50model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_agnews_padding50model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding50model_pipeline_en_5.5.0_3.0_1727087120314.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding50model_pipeline_en_5.5.0_3.0_1727087120314.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with the input text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_agnews_padding50model_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df is assumed to be a Spark DataFrame with the input text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_agnews_padding50model_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_agnews_padding50model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_agnews_padding50model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding80model_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding80model_en.md new file mode 100644 index 00000000000000..a293c4b11191dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding80model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_agnews_padding80model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_agnews_padding80model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_agnews_padding80model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding80model_en_5.5.0_3.0_1727059834158.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding80model_en_5.5.0_3.0_1727059834158.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
  .setInputCol("text") \
  .setOutputCol("document")

tokenizer = Tokenizer() \
  .setInputCols(["document"]) \
  .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_agnews_padding80model", "en") \
  .setInputCols(["document", "token"]) \
  .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_agnews_padding80model", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_agnews_padding80model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_agnews_padding80model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding80model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding80model_pipeline_en.md new file mode 100644 index 00000000000000..cf7e734733bb92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding80model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_agnews_padding80model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_agnews_padding80model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_agnews_padding80model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding80model_pipeline_en_5.5.0_3.0_1727059846344.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding80model_pipeline_en_5.5.0_3.0_1727059846344.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with the input text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_agnews_padding80model_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df is assumed to be a Spark DataFrame with the input text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_agnews_padding80model_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_agnews_padding80model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_agnews_padding80model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_cased_hatespeech_ft_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_cased_hatespeech_ft_en.md new file mode 100644 index 00000000000000..64d300b9313990 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_cased_hatespeech_ft_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_cased_hatespeech_ft DistilBertForSequenceClassification from EgehanEralp +author: John Snow Labs +name: distilbert_base_cased_hatespeech_ft +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_hatespeech_ft` is a English model originally trained by EgehanEralp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_hatespeech_ft_en_5.5.0_3.0_1727082637367.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_hatespeech_ft_en_5.5.0_3.0_1727082637367.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
  .setInputCol("text") \
  .setOutputCol("document")

tokenizer = Tokenizer() \
  .setInputCols(["document"]) \
  .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_cased_hatespeech_ft", "en") \
  .setInputCols(["document", "token"]) \
  .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_cased_hatespeech_ft", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_hatespeech_ft| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/EgehanEralp/distilbert-base-cased-hatespeech-ft \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_cased_hatespeech_ft_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_cased_hatespeech_ft_pipeline_en.md new file mode 100644 index 00000000000000..f72c9b878ba9fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_cased_hatespeech_ft_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_cased_hatespeech_ft_pipeline pipeline DistilBertForSequenceClassification from EgehanEralp +author: John Snow Labs +name: distilbert_base_cased_hatespeech_ft_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_hatespeech_ft_pipeline` is a English model originally trained by EgehanEralp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_hatespeech_ft_pipeline_en_5.5.0_3.0_1727082648897.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_hatespeech_ft_pipeline_en_5.5.0_3.0_1727082648897.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with the input text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_cased_hatespeech_ft_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df is assumed to be a Spark DataFrame with the input text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_cased_hatespeech_ft_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_hatespeech_ft_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/EgehanEralp/distilbert-base-cased-hatespeech-ft + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_finetuned_chnsenticorp_chinese_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_finetuned_chnsenticorp_chinese_pipeline_zh.md new file mode 100644 index 00000000000000..2374f3392b0c4e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_finetuned_chnsenticorp_chinese_pipeline_zh.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Chinese distilbert_base_finetuned_chnsenticorp_chinese_pipeline pipeline DistilBertForSequenceClassification from WangA +author: John Snow Labs +name: distilbert_base_finetuned_chnsenticorp_chinese_pipeline +date: 2024-09-23 +tags: [zh, open_source, pipeline, onnx] +task: Text Classification +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_finetuned_chnsenticorp_chinese_pipeline` is a Chinese model originally trained by WangA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_finetuned_chnsenticorp_chinese_pipeline_zh_5.5.0_3.0_1727082583587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_finetuned_chnsenticorp_chinese_pipeline_zh_5.5.0_3.0_1727082583587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with the input text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_finetuned_chnsenticorp_chinese_pipeline", lang = "zh")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df is assumed to be a Spark DataFrame with the input text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_finetuned_chnsenticorp_chinese_pipeline", lang = "zh")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_finetuned_chnsenticorp_chinese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|507.6 MB| + +## References + +https://huggingface.co/WangA/distilbert-base-finetuned-chnsenticorp-chinese + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_finetuned_chnsenticorp_chinese_zh.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_finetuned_chnsenticorp_chinese_zh.md new file mode 100644 index 00000000000000..1cb801a579c718 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_finetuned_chnsenticorp_chinese_zh.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Chinese distilbert_base_finetuned_chnsenticorp_chinese DistilBertForSequenceClassification from WangA +author: John Snow Labs +name: distilbert_base_finetuned_chnsenticorp_chinese +date: 2024-09-23 +tags: [zh, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_finetuned_chnsenticorp_chinese` is a Chinese model originally trained by WangA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_finetuned_chnsenticorp_chinese_zh_5.5.0_3.0_1727082557787.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_finetuned_chnsenticorp_chinese_zh_5.5.0_3.0_1727082557787.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
  .setInputCol("text") \
  .setOutputCol("document")

tokenizer = Tokenizer() \
  .setInputCols(["document"]) \
  .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_finetuned_chnsenticorp_chinese", "zh") \
  .setInputCols(["document", "token"]) \
  .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_finetuned_chnsenticorp_chinese", "zh")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_finetuned_chnsenticorp_chinese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|zh| +|Size:|507.6 MB| + +## References + +https://huggingface.co/WangA/distilbert-base-finetuned-chnsenticorp-chinese \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_3epoch10_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_3epoch10_en.md new file mode 100644 index 00000000000000..27403cf5ac2a61 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_3epoch10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_3epoch10 DistilBertForSequenceClassification from dianamihalache27 +author: John Snow Labs +name: distilbert_base_uncased_3epoch10 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_3epoch10` is a English model originally trained by dianamihalache27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_3epoch10_en_5.5.0_3.0_1727093918373.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_3epoch10_en_5.5.0_3.0_1727093918373.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
  .setInputCol("text") \
  .setOutputCol("document")

tokenizer = Tokenizer() \
  .setInputCols(["document"]) \
  .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_3epoch10", "en") \
  .setInputCols(["document", "token"]) \
  .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_3epoch10", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_3epoch10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dianamihalache27/distilbert-base-uncased_3epoch10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_3epoch10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_3epoch10_pipeline_en.md new file mode 100644 index 00000000000000..1b2de858511fcf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_3epoch10_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_3epoch10_pipeline pipeline DistilBertForSequenceClassification from dianamihalache27 +author: John Snow Labs +name: distilbert_base_uncased_3epoch10_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_3epoch10_pipeline` is a English model originally trained by dianamihalache27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_3epoch10_pipeline_en_5.5.0_3.0_1727093930102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_3epoch10_pipeline_en_5.5.0_3.0_1727093930102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with the input text in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_3epoch10_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df is assumed to be a Spark DataFrame with the input text in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_3epoch10_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_3epoch10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dianamihalache27/distilbert-base-uncased_3epoch10 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_en.md new file mode 100644 index 00000000000000..d4128211615834 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi DistilBertForSequenceClassification from Sohaibsoussi +author: John Snow Labs +name: distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi` is a English model originally trained by Sohaibsoussi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_en_5.5.0_3.0_1727087216585.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_en_5.5.0_3.0_1727087216585.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])

# Requires an active SparkSession `spark`, e.g. spark = sparknlp.start()
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the pretrained DistilBERT model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
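
The transformed DataFrame carries the prediction in the `class` column as an array of Spark NLP annotations. A minimal way to inspect the predicted labels, assuming the `pipelineDF` from the Python example above:

```python
# "result" holds the predicted label; "metadata" typically carries the per-class scores
pipelineDF.select("text", "class.result").show(truncate=False)

pipelineDF.selectExpr("explode(class) as prediction") \
    .select("prediction.result", "prediction.metadata") \
    .show(truncate=False)
```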
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/Sohaibsoussi/distilbert-base-uncased-distilled-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_pipeline_en.md new file mode 100644 index 00000000000000..0058f98fe2d32b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_pipeline pipeline DistilBertForSequenceClassification from Sohaibsoussi +author: John Snow Labs +name: distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_pipeline` is a English model originally trained by Sohaibsoussi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_pipeline_en_5.5.0_3.0_1727087228194.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_pipeline_en_5.5.0_3.0_1727087228194.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be passed to the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be passed to the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/Sohaibsoussi/distilbert-base-uncased-distilled-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_emotion_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_emotion_classifier_en.md new file mode 100644 index 00000000000000..3c63c5ffbedf3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_emotion_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_emotion_classifier DistilBertForSequenceClassification from automatichamster +author: John Snow Labs +name: distilbert_base_uncased_emotion_classifier +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_emotion_classifier` is a English model originally trained by automatichamster. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_classifier_en_5.5.0_3.0_1727087130717.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_classifier_en_5.5.0_3.0_1727087130717.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_emotion_classifier","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])

# Requires an active SparkSession `spark`, e.g. spark = sparknlp.start()
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the pretrained DistilBERT model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_emotion_classifier", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
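
When only a few strings need to be scored, Spark NLP's `LightPipeline` can wrap the fitted `pipelineModel` from the example above and run inference on the driver without building a DataFrame. A small sketch (the example sentences are placeholders):

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# A single string returns one dict of output columns to results;
# a list of strings returns a list of such dicts
print(light.annotate("I am thrilled with these results!"))
print(light.annotate(["This is wonderful", "This is terrible"]))
```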
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_emotion_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/automatichamster/distilbert-base-uncased-emotion-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_emotion_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_emotion_classifier_pipeline_en.md new file mode 100644 index 00000000000000..8269e9b8595b5a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_emotion_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_emotion_classifier_pipeline pipeline DistilBertForSequenceClassification from automatichamster +author: John Snow Labs +name: distilbert_base_uncased_emotion_classifier_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_emotion_classifier_pipeline` is a English model originally trained by automatichamster. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_classifier_pipeline_en_5.5.0_3.0_1727087143250.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_classifier_pipeline_en_5.5.0_3.0_1727087143250.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be passed to the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_emotion_classifier_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be passed to the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_emotion_classifier_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_emotion_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/automatichamster/distilbert-base-uncased-emotion-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_adl_hw1_alkhwarizmi_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_adl_hw1_alkhwarizmi_en.md new file mode 100644 index 00000000000000..9520ffb1c5b741 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_adl_hw1_alkhwarizmi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_adl_hw1_alkhwarizmi DistilBertForSequenceClassification from alkhwarizmi +author: John Snow Labs +name: distilbert_base_uncased_finetuned_adl_hw1_alkhwarizmi +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_adl_hw1_alkhwarizmi` is a English model originally trained by alkhwarizmi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw1_alkhwarizmi_en_5.5.0_3.0_1727108287535.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw1_alkhwarizmi_en_5.5.0_3.0_1727108287535.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_adl_hw1_alkhwarizmi","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])

# Requires an active SparkSession `spark`, e.g. spark = sparknlp.start()
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the pretrained DistilBERT model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_adl_hw1_alkhwarizmi", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_adl_hw1_alkhwarizmi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/alkhwarizmi/distilbert-base-uncased-finetuned-adl_hw1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_adl_hw_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_adl_hw_en.md new file mode 100644 index 00000000000000..112ebfda99951b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_adl_hw_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_adl_hw DistilBertForSequenceClassification from Hongu +author: John Snow Labs +name: distilbert_base_uncased_finetuned_adl_hw +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_adl_hw` is a English model originally trained by Hongu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw_en_5.5.0_3.0_1727074043952.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw_en_5.5.0_3.0_1727074043952.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_adl_hw","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])

# Requires an active SparkSession `spark`, e.g. spark = sparknlp.start()
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the pretrained DistilBERT model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_adl_hw", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_adl_hw| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/Hongu/distilbert-base-uncased-finetuned-adl_hw \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_adl_hw_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_adl_hw_pipeline_en.md new file mode 100644 index 00000000000000..36c046c9b721a1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_adl_hw_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_adl_hw_pipeline pipeline DistilBertForSequenceClassification from Hongu +author: John Snow Labs +name: distilbert_base_uncased_finetuned_adl_hw_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_adl_hw_pipeline` is a English model originally trained by Hongu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw_pipeline_en_5.5.0_3.0_1727074058740.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw_pipeline_en_5.5.0_3.0_1727074058740.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be passed to the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_adl_hw_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be passed to the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_adl_hw_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_adl_hw_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/Hongu/distilbert-base-uncased-finetuned-adl_hw + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cc_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cc_en.md new file mode 100644 index 00000000000000..2edf96dcf65e6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cc DistilBertForSequenceClassification from gtalibov +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cc +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cc` is a English model originally trained by gtalibov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cc_en_5.5.0_3.0_1727093714112.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cc_en_5.5.0_3.0_1727093714112.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cc","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])

# Requires an active SparkSession `spark`, e.g. spark = sparknlp.start()
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the pretrained DistilBERT model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cc", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/gtalibov/distilbert-base-uncased-finetuned-CC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cc_pipeline_en.md new file mode 100644 index 00000000000000..04ee9ae6c91306 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cc_pipeline pipeline DistilBertForSequenceClassification from gtalibov +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cc_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cc_pipeline` is a English model originally trained by gtalibov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cc_pipeline_en_5.5.0_3.0_1727093727942.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cc_pipeline_en_5.5.0_3.0_1727093727942.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be passed to the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cc_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be passed to the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cc_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/gtalibov/distilbert-base-uncased-finetuned-CC + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_bobtk_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_bobtk_en.md new file mode 100644 index 00000000000000..a2edc00b815b17 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_bobtk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_bobtk DistilBertForSequenceClassification from bobtk +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_bobtk +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_bobtk` is a English model originally trained by bobtk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_bobtk_en_5.5.0_3.0_1727093709989.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_bobtk_en_5.5.0_3.0_1727093709989.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_bobtk","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])

# Requires an active SparkSession `spark`, e.g. spark = sparknlp.start()
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the pretrained DistilBERT model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_bobtk", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_bobtk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/bobtk/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_bobtk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_bobtk_pipeline_en.md new file mode 100644 index 00000000000000..19950c8deec632 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_bobtk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_bobtk_pipeline pipeline DistilBertForSequenceClassification from bobtk +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_bobtk_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_bobtk_pipeline` is a English model originally trained by bobtk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_bobtk_pipeline_en_5.5.0_3.0_1727093722353.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_bobtk_pipeline_en_5.5.0_3.0_1727093722353.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be passed to the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_bobtk_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be passed to the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_bobtk_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_bobtk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/bobtk/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_cjfghk5697_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_cjfghk5697_en.md new file mode 100644 index 00000000000000..0d26be7cc437b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_cjfghk5697_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_cjfghk5697 DistilBertForSequenceClassification from cjfghk5697 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_cjfghk5697 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_cjfghk5697` is a English model originally trained by cjfghk5697. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_cjfghk5697_en_5.5.0_3.0_1727059227313.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_cjfghk5697_en_5.5.0_3.0_1727059227313.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_cjfghk5697","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])

# Requires an active SparkSession `spark`, e.g. spark = sparknlp.start()
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the pretrained DistilBERT model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_cjfghk5697", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_cjfghk5697| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/cjfghk5697/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_cjfghk5697_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_cjfghk5697_pipeline_en.md new file mode 100644 index 00000000000000..b03d5bae586cdd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_cjfghk5697_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_cjfghk5697_pipeline pipeline DistilBertForSequenceClassification from cjfghk5697 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_cjfghk5697_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_cjfghk5697_pipeline` is a English model originally trained by cjfghk5697. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_cjfghk5697_pipeline_en_5.5.0_3.0_1727059239264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_cjfghk5697_pipeline_en_5.5.0_3.0_1727059239264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be passed to the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_cjfghk5697_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be passed to the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_cjfghk5697_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_cjfghk5697_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/cjfghk5697/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_dodiaz2111_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_dodiaz2111_en.md new file mode 100644 index 00000000000000..0277273df754e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_dodiaz2111_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_dodiaz2111 DistilBertForSequenceClassification from dodiaz2111 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_dodiaz2111 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_dodiaz2111` is a English model originally trained by dodiaz2111. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_dodiaz2111_en_5.5.0_3.0_1727074031811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_dodiaz2111_en_5.5.0_3.0_1727074031811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_dodiaz2111","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])

# Requires an active SparkSession `spark`, e.g. spark = sparknlp.start()
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the pretrained DistilBERT model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_dodiaz2111", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_dodiaz2111| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/dodiaz2111/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_dodiaz2111_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_dodiaz2111_pipeline_en.md new file mode 100644 index 00000000000000..28f663d12cafb2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_dodiaz2111_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_dodiaz2111_pipeline pipeline DistilBertForSequenceClassification from dodiaz2111 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_dodiaz2111_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_dodiaz2111_pipeline` is a English model originally trained by dodiaz2111. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_dodiaz2111_pipeline_en_5.5.0_3.0_1727074044565.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_dodiaz2111_pipeline_en_5.5.0_3.0_1727074044565.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be passed to the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_dodiaz2111_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be passed to the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_dodiaz2111_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_dodiaz2111_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/dodiaz2111/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_hrayrm_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_hrayrm_en.md new file mode 100644 index 00000000000000..fe8024f3d4045c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_hrayrm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_hrayrm DistilBertForSequenceClassification from HrayrM +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_hrayrm +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_hrayrm` is a English model originally trained by HrayrM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_hrayrm_en_5.5.0_3.0_1727110387998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_hrayrm_en_5.5.0_3.0_1727110387998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_hrayrm","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])

# Requires an active SparkSession `spark`, e.g. spark = sparknlp.start()
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the pretrained DistilBERT model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_hrayrm", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_hrayrm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/HrayrM/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_hrayrm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_hrayrm_pipeline_en.md new file mode 100644 index 00000000000000..94dfc3a7a48bf0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_hrayrm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_hrayrm_pipeline pipeline DistilBertForSequenceClassification from HrayrM +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_hrayrm_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_hrayrm_pipeline` is a English model originally trained by HrayrM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_hrayrm_pipeline_en_5.5.0_3.0_1727110401610.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_hrayrm_pipeline_en_5.5.0_3.0_1727110401610.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be passed to the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_hrayrm_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be passed to the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_hrayrm_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_hrayrm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/HrayrM/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_joacorf33_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_joacorf33_en.md new file mode 100644 index 00000000000000..2183dd664e4e91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_joacorf33_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_joacorf33 DistilBertForSequenceClassification from joacorf33 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_joacorf33 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_joacorf33` is a English model originally trained by joacorf33. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_joacorf33_en_5.5.0_3.0_1727093600577.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_joacorf33_en_5.5.0_3.0_1727093600577.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_joacorf33","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])

# Requires an active SparkSession `spark`, e.g. spark = sparknlp.start()
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the pretrained DistilBERT model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_joacorf33", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_joacorf33| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/joacorf33/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_addie11_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_addie11_en.md new file mode 100644 index 00000000000000..14054395aca005 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_addie11_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_addie11 DistilBertForSequenceClassification from addie11 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_addie11 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_addie11` is a English model originally trained by addie11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_addie11_en_5.5.0_3.0_1727059343971.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_addie11_en_5.5.0_3.0_1727059343971.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_addie11","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])

# Requires an active SparkSession `spark`, e.g. spark = sparknlp.start()
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble raw text into documents, tokenize, then classify with the pretrained DistilBERT model
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_addie11", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_addie11| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/addie11/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_addie11_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_addie11_pipeline_en.md new file mode 100644 index 00000000000000..21c514eed983b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_addie11_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_addie11_pipeline pipeline DistilBertForSequenceClassification from addie11 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_addie11_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_addie11_pipeline` is a English model originally trained by addie11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_addie11_pipeline_en_5.5.0_3.0_1727059355826.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_addie11_pipeline_en_5.5.0_3.0_1727059355826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_addie11_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_addie11_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
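For a quick check on a single sentence, the downloaded pipeline can also be called on a plain string instead of a DataFrame. A minimal sketch, assuming the same pipeline name as above and the lightweight `annotate` API:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_addie11_pipeline", lang="en")
# annotate() returns a dict keyed by output column; "class" holds the predicted label(s)
print(pipeline.annotate("I love spark-nlp")["class"])
```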
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_addie11_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/addie11/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_gamallo_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_gamallo_en.md new file mode 100644 index 00000000000000..94d712ee054f16 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_gamallo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_gamallo DistilBertForSequenceClassification from gamallo +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_gamallo +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_gamallo` is a English model originally trained by gamallo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_gamallo_en_5.5.0_3.0_1727059375729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_gamallo_en_5.5.0_3.0_1727059375729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_gamallo","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_gamallo", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_gamallo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gamallo/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_gamallo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_gamallo_pipeline_en.md new file mode 100644 index 00000000000000..99a1733c346c86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_gamallo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_gamallo_pipeline pipeline DistilBertForSequenceClassification from gamallo +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_gamallo_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_gamallo_pipeline` is a English model originally trained by gamallo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_gamallo_pipeline_en_5.5.0_3.0_1727059387763.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_gamallo_pipeline_en_5.5.0_3.0_1727059387763.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_gamallo_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_gamallo_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_gamallo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gamallo/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_garyseventeen_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_garyseventeen_en.md new file mode 100644 index 00000000000000..d07aa04b64287f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_garyseventeen_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_garyseventeen DistilBertForSequenceClassification from Garyseventeen +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_garyseventeen +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_garyseventeen` is a English model originally trained by Garyseventeen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_garyseventeen_en_5.5.0_3.0_1727110523624.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_garyseventeen_en_5.5.0_3.0_1727110523624.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_garyseventeen","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_garyseventeen", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_garyseventeen| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Garyseventeen/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_garyseventeen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_garyseventeen_pipeline_en.md new file mode 100644 index 00000000000000..35ff5f44405fa1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_garyseventeen_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_garyseventeen_pipeline pipeline DistilBertForSequenceClassification from Garyseventeen +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_garyseventeen_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_garyseventeen_pipeline` is a English model originally trained by Garyseventeen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_garyseventeen_pipeline_en_5.5.0_3.0_1727110535749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_garyseventeen_pipeline_en_5.5.0_3.0_1727110535749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_garyseventeen_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_garyseventeen_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_garyseventeen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Garyseventeen/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_poodja_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_poodja_en.md new file mode 100644 index 00000000000000..f129e8804a0036 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_poodja_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_poodja DistilBertForSequenceClassification from Poodja +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_poodja +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_poodja` is a English model originally trained by Poodja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_poodja_en_5.5.0_3.0_1727108543457.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_poodja_en_5.5.0_3.0_1727108543457.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_poodja","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_poodja", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_poodja| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Poodja/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_poodja_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_poodja_pipeline_en.md new file mode 100644 index 00000000000000..3129e1ffbf1fe5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_poodja_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_poodja_pipeline pipeline DistilBertForSequenceClassification from Poodja +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_poodja_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_poodja_pipeline` is a English model originally trained by Poodja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_poodja_pipeline_en_5.5.0_3.0_1727108555603.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_poodja_pipeline_en_5.5.0_3.0_1727108555603.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_poodja_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_poodja_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_poodja_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Poodja/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_sjoerdvink_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_sjoerdvink_en.md new file mode 100644 index 00000000000000..4a5752d8029e01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_sjoerdvink_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_sjoerdvink DistilBertForSequenceClassification from sjoerdvink +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_sjoerdvink +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_sjoerdvink` is a English model originally trained by sjoerdvink. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_sjoerdvink_en_5.5.0_3.0_1727082297367.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_sjoerdvink_en_5.5.0_3.0_1727082297367.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_sjoerdvink","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_sjoerdvink", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
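Beyond the bare label, the classifier annotation usually carries additional information in its metadata. The snippet below is an illustrative sketch that is not part of the original card: it explodes the `class` column so each predicted label can be read next to its raw metadata map; the exact metadata keys depend on the model's label set.

```python
from pyspark.sql.functions import explode

# One row per annotation, showing the predicted label and the raw metadata map
pipelineDF.select(explode("class").alias("c")) \
    .select("c.result", "c.metadata") \
    .show(truncate=False)
```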
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_sjoerdvink| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sjoerdvink/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_zeid_hazboun_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_zeid_hazboun_en.md new file mode 100644 index 00000000000000..f2f5c9f3df3292 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_zeid_hazboun_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_zeid_hazboun DistilBertForSequenceClassification from Zeid-Hazboun +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_zeid_hazboun +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_zeid_hazboun` is a English model originally trained by Zeid-Hazboun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_zeid_hazboun_en_5.5.0_3.0_1727108397380.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_zeid_hazboun_en_5.5.0_3.0_1727108397380.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_zeid_hazboun","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_zeid_hazboun", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_zeid_hazboun| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Zeid-Hazboun/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_zeid_hazboun_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_zeid_hazboun_pipeline_en.md new file mode 100644 index 00000000000000..9722661a5f8043 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_zeid_hazboun_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_zeid_hazboun_pipeline pipeline DistilBertForSequenceClassification from Zeid-Hazboun +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_zeid_hazboun_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_zeid_hazboun_pipeline` is a English model originally trained by Zeid-Hazboun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_zeid_hazboun_pipeline_en_5.5.0_3.0_1727108410486.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_zeid_hazboun_pipeline_en_5.5.0_3.0_1727108410486.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_zeid_hazboun_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_zeid_hazboun_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_zeid_hazboun_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Zeid-Hazboun/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_conceptos_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_conceptos_en.md new file mode 100644 index 00000000000000..420f5ab8d7914b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_conceptos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_conceptos DistilBertForSequenceClassification from jcesquivel +author: John Snow Labs +name: distilbert_base_uncased_finetuned_conceptos +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_conceptos` is a English model originally trained by jcesquivel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_conceptos_en_5.5.0_3.0_1727087112051.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_conceptos_en_5.5.0_3.0_1727087112051.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_conceptos","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_conceptos", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_conceptos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jcesquivel/distilbert-base-uncased-finetuned-conceptos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_conceptos_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_conceptos_pipeline_en.md new file mode 100644 index 00000000000000..6c73e9786efdaf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_conceptos_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_conceptos_pipeline pipeline DistilBertForSequenceClassification from jcesquivel +author: John Snow Labs +name: distilbert_base_uncased_finetuned_conceptos_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_conceptos_pipeline` is a English model originally trained by jcesquivel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_conceptos_pipeline_en_5.5.0_3.0_1727087125822.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_conceptos_pipeline_en_5.5.0_3.0_1727087125822.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_conceptos_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_conceptos_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
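To see which annotators the downloaded pipeline bundles (they are also listed under "Included Models" below), the underlying Spark ML `PipelineModel` can be inspected directly. A small sketch, assuming the `pipeline` object created above and the `model` attribute exposed by `PretrainedPipeline`:

```python
# PretrainedPipeline wraps a regular Spark ML PipelineModel in its `model` attribute
for stage in pipeline.model.stages:
    print(type(stage).__name__)
```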
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_conceptos_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/jcesquivel/distilbert-base-uncased-finetuned-conceptos + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_dataset_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_dataset_en.md new file mode 100644 index 00000000000000..6353d60aaddadf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_dataset_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_dataset DistilBertForSequenceClassification from boisalai +author: John Snow Labs +name: distilbert_base_uncased_finetuned_dataset +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_dataset` is a English model originally trained by boisalai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_dataset_en_5.5.0_3.0_1727059927006.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_dataset_en_5.5.0_3.0_1727059927006.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_dataset","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_dataset", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_dataset| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/boisalai/distilbert-base-uncased-finetuned-dataset \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_depression_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_depression_en.md new file mode 100644 index 00000000000000..b2b7d5ed223a67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_depression_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_depression DistilBertForSequenceClassification from welsachy +author: John Snow Labs +name: distilbert_base_uncased_finetuned_depression +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_depression` is a English model originally trained by welsachy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_depression_en_5.5.0_3.0_1727108588595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_depression_en_5.5.0_3.0_1727108588595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_depression","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_depression", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_depression| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/welsachy/distilbert-base-uncased-finetuned-depression \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_disaster_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_disaster_en.md new file mode 100644 index 00000000000000..8b217fd6b2d0ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_disaster_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_disaster DistilBertForSequenceClassification from RaiRachit +author: John Snow Labs +name: distilbert_base_uncased_finetuned_disaster +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_disaster` is a English model originally trained by RaiRachit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_disaster_en_5.5.0_3.0_1727108517271.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_disaster_en_5.5.0_3.0_1727108517271.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_disaster","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_disaster", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_disaster| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/RaiRachit/distilbert-base-uncased-finetuned-disaster \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_disaster_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_disaster_pipeline_en.md new file mode 100644 index 00000000000000..dd5d4ef12078ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_disaster_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_disaster_pipeline pipeline DistilBertForSequenceClassification from RaiRachit +author: John Snow Labs +name: distilbert_base_uncased_finetuned_disaster_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_disaster_pipeline` is a English model originally trained by RaiRachit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_disaster_pipeline_en_5.5.0_3.0_1727108529116.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_disaster_pipeline_en_5.5.0_3.0_1727108529116.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_disaster_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_disaster_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_disaster_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/RaiRachit/distilbert-base-uncased-finetuned-disaster + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_2hab_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_2hab_en.md new file mode 100644 index 00000000000000..65c61138751766 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_2hab_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_2hab DistilBertForSequenceClassification from 2hab +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_2hab +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_2hab` is a English model originally trained by 2hab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_2hab_en_5.5.0_3.0_1727110394824.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_2hab_en_5.5.0_3.0_1727110394824.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_2hab","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_2hab", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_2hab| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/2hab/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_2hab_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_2hab_pipeline_en.md new file mode 100644 index 00000000000000..d048816bddf34b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_2hab_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_2hab_pipeline pipeline DistilBertForSequenceClassification from 2hab +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_2hab_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_2hab_pipeline` is a English model originally trained by 2hab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_2hab_pipeline_en_5.5.0_3.0_1727110408334.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_2hab_pipeline_en_5.5.0_3.0_1727110408334.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_2hab_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_2hab_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_2hab_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/2hab/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adelineli_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adelineli_en.md new file mode 100644 index 00000000000000..f57b7816679dac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adelineli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_adelineli DistilBertForSequenceClassification from adelineli +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_adelineli +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_adelineli` is a English model originally trained by adelineli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_adelineli_en_5.5.0_3.0_1727108384522.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_adelineli_en_5.5.0_3.0_1727108384522.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_adelineli","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_adelineli", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_adelineli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/adelineli/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adelineli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adelineli_pipeline_en.md new file mode 100644 index 00000000000000..8381d977289ca2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adelineli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_adelineli_pipeline pipeline DistilBertForSequenceClassification from adelineli +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_adelineli_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_adelineli_pipeline` is a English model originally trained by adelineli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_adelineli_pipeline_en_5.5.0_3.0_1727108396360.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_adelineli_pipeline_en_5.5.0_3.0_1727108396360.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and apply it to a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_adelineli_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and apply it to a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_adelineli_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
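For quick experiments on single strings, `PretrainedPipeline` also exposes an `annotate` method, so no DataFrame needs to be built first. A minimal sketch, assuming the pipeline loaded above:

```python
# Annotate a single text; the result is a dict keyed by output column
result = pipeline.annotate("I love spark-nlp")
print(result["class"])
```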
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_adelineli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/adelineli/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adidae_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adidae_en.md new file mode 100644 index 00000000000000..77078fba799dc5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adidae_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_adidae DistilBertForSequenceClassification from Adidae +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_adidae +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_adidae` is a English model originally trained by Adidae. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_adidae_en_5.5.0_3.0_1727059564967.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_adidae_en_5.5.0_3.0_1727059564967.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Convert the raw text column into a document annotation
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; its input columns must match the ones produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_adidae","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Convert the raw text column into a document annotation
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// Load the pretrained classifier; its input columns must match the ones produced above
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_adidae", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_adidae| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Adidae/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adidae_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adidae_pipeline_en.md new file mode 100644 index 00000000000000..aa0ae06c624c43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adidae_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_adidae_pipeline pipeline DistilBertForSequenceClassification from Adidae +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_adidae_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_adidae_pipeline` is a English model originally trained by Adidae. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_adidae_pipeline_en_5.5.0_3.0_1727059577102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_adidae_pipeline_en_5.5.0_3.0_1727059577102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and apply it to a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_adidae_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and apply it to a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_adidae_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
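If character offsets and metadata are needed rather than just the transformed DataFrame, `PretrainedPipeline` also offers `fullAnnotate`. A minimal sketch, assuming the pipeline loaded above:

```python
# fullAnnotate returns complete Annotation objects, one entry per input text
full = pipeline.fullAnnotate("I love spark-nlp")
for annotation in full[0]["class"]:
    print(annotation.result, annotation.metadata)
```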
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_adidae_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Adidae/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cereline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cereline_en.md new file mode 100644 index 00000000000000..a87c560d35b55d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cereline_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_cereline DistilBertForSequenceClassification from cereline +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_cereline +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_cereline` is a English model originally trained by cereline. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cereline_en_5.5.0_3.0_1727110610693.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cereline_en_5.5.0_3.0_1727110610693.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Convert the raw text column into a document annotation
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; its input columns must match the ones produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_cereline","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Convert the raw text column into a document annotation
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// Load the pretrained classifier; its input columns must match the ones produced above
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_cereline", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_cereline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cereline/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cereline_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cereline_pipeline_en.md new file mode 100644 index 00000000000000..902a4299ea2b4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cereline_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_cereline_pipeline pipeline DistilBertForSequenceClassification from cereline +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_cereline_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_cereline_pipeline` is a English model originally trained by cereline. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cereline_pipeline_en_5.5.0_3.0_1727110622417.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cereline_pipeline_en_5.5.0_3.0_1727110622417.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and apply it to a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_cereline_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and apply it to a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_cereline_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_cereline_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cereline/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cogsci13_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cogsci13_en.md new file mode 100644 index 00000000000000..85ff8714d9bc15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cogsci13_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_cogsci13 DistilBertForSequenceClassification from cogsci13 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_cogsci13 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_cogsci13` is a English model originally trained by cogsci13. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cogsci13_en_5.5.0_3.0_1727082509540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cogsci13_en_5.5.0_3.0_1727082509540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Convert the raw text column into a document annotation
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; its input columns must match the ones produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_cogsci13","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Convert the raw text column into a document annotation
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// Load the pretrained classifier; its input columns must match the ones produced above
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_cogsci13", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
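When only a handful of texts need to be scored, wrapping the fitted pipeline in a `LightPipeline` avoids the overhead of a full Spark job. A minimal sketch, assuming the `pipelineModel` fitted in the example above:

```python
from sparknlp.base import LightPipeline

# LightPipeline runs the same stages directly on plain Python strings
light = LightPipeline(pipelineModel)
print(light.annotate("I love spark-nlp")["class"])
```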
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_cogsci13| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cogsci13/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cogsci13_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cogsci13_pipeline_en.md new file mode 100644 index 00000000000000..832865b0f7fb73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cogsci13_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_cogsci13_pipeline pipeline DistilBertForSequenceClassification from cogsci13 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_cogsci13_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_cogsci13_pipeline` is a English model originally trained by cogsci13. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cogsci13_pipeline_en_5.5.0_3.0_1727082522178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cogsci13_pipeline_en_5.5.0_3.0_1727082522178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and apply it to a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_cogsci13_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and apply it to a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_cogsci13_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_cogsci13_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cogsci13/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_devs0n_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_devs0n_en.md new file mode 100644 index 00000000000000..161ab6b3dcef2c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_devs0n_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_devs0n DistilBertForSequenceClassification from devs0n +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_devs0n +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_devs0n` is a English model originally trained by devs0n. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_devs0n_en_5.5.0_3.0_1727059472829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_devs0n_en_5.5.0_3.0_1727059472829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Convert the raw text column into a document annotation
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; its input columns must match the ones produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_devs0n","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Convert the raw text column into a document annotation
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// Load the pretrained classifier; its input columns must match the ones produced above
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_devs0n", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_devs0n| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/devs0n/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_devs0n_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_devs0n_pipeline_en.md new file mode 100644 index 00000000000000..fe1a00a3e51f32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_devs0n_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_devs0n_pipeline pipeline DistilBertForSequenceClassification from devs0n +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_devs0n_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_devs0n_pipeline` is a English model originally trained by devs0n. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_devs0n_pipeline_en_5.5.0_3.0_1727059485004.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_devs0n_pipeline_en_5.5.0_3.0_1727059485004.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and apply it to a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_devs0n_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and apply it to a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_devs0n_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_devs0n_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/devs0n/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_dljh1214_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_dljh1214_pipeline_en.md new file mode 100644 index 00000000000000..64036a0c256625 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_dljh1214_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_dljh1214_pipeline pipeline DistilBertForSequenceClassification from dljh1214 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_dljh1214_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_dljh1214_pipeline` is a English model originally trained by dljh1214. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_dljh1214_pipeline_en_5.5.0_3.0_1727094032502.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_dljh1214_pipeline_en_5.5.0_3.0_1727094032502.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and apply it to a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_dljh1214_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and apply it to a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_dljh1214_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_dljh1214_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dljh1214/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_hun0520_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_hun0520_en.md new file mode 100644 index 00000000000000..fd74dd0e39de3e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_hun0520_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_hun0520 DistilBertForSequenceClassification from hun0520 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_hun0520 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_hun0520` is a English model originally trained by hun0520. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hun0520_en_5.5.0_3.0_1727093929959.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hun0520_en_5.5.0_3.0_1727093929959.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Convert the raw text column into a document annotation
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; its input columns must match the ones produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_hun0520","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Convert the raw text column into a document annotation
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// Load the pretrained classifier; its input columns must match the ones produced above
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_hun0520", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_hun0520| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hun0520/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_hun0520_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_hun0520_pipeline_en.md new file mode 100644 index 00000000000000..50034e86d97a3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_hun0520_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_hun0520_pipeline pipeline DistilBertForSequenceClassification from hun0520 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_hun0520_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_hun0520_pipeline` is a English model originally trained by hun0520. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hun0520_pipeline_en_5.5.0_3.0_1727093944930.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hun0520_pipeline_en_5.5.0_3.0_1727093944930.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and apply it to a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_hun0520_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and apply it to a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_hun0520_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_hun0520_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hun0520/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jachs182_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jachs182_en.md new file mode 100644 index 00000000000000..b9475af22bb757 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jachs182_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jachs182 DistilBertForSequenceClassification from jachs182 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jachs182 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jachs182` is a English model originally trained by jachs182. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jachs182_en_5.5.0_3.0_1727059762812.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jachs182_en_5.5.0_3.0_1727059762812.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Convert the raw text column into a document annotation
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; its input columns must match the ones produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jachs182","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Convert the raw text column into a document annotation
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// Load the pretrained classifier; its input columns must match the ones produced above
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jachs182", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
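Beyond the predicted label, each annotation in the `class` column carries metadata, which for sequence classifiers typically includes the score assigned to each label. A minimal sketch, assuming the `pipelineDF` from the example above (the exact metadata keys depend on the model's label set):

```python
# Inspect per-label scores stored in the annotation metadata
pipelineDF.selectExpr("explode(class) as prediction") \
    .select("prediction.result", "prediction.metadata") \
    .show(truncate=False)
```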
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jachs182| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jachs182/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jeongyeom_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jeongyeom_en.md new file mode 100644 index 00000000000000..58eca5db095881 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jeongyeom_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jeongyeom DistilBertForSequenceClassification from jeongyeom +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jeongyeom +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jeongyeom` is a English model originally trained by jeongyeom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jeongyeom_en_5.5.0_3.0_1727059771316.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jeongyeom_en_5.5.0_3.0_1727059771316.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Convert the raw text column into a document annotation
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; its input columns must match the ones produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jeongyeom","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Convert the raw text column into a document annotation
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// Load the pretrained classifier; its input columns must match the ones produced above
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jeongyeom", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jeongyeom| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jeongyeom/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jeongyeom_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jeongyeom_pipeline_en.md new file mode 100644 index 00000000000000..eec3b59d0cf6e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jeongyeom_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jeongyeom_pipeline pipeline DistilBertForSequenceClassification from jeongyeom +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jeongyeom_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jeongyeom_pipeline` is a English model originally trained by jeongyeom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jeongyeom_pipeline_en_5.5.0_3.0_1727059784678.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jeongyeom_pipeline_en_5.5.0_3.0_1727059784678.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and apply it to a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jeongyeom_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and apply it to a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jeongyeom_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jeongyeom_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jeongyeom/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jerilseb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jerilseb_pipeline_en.md new file mode 100644 index 00000000000000..9d97827efa3b46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jerilseb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jerilseb_pipeline pipeline DistilBertForSequenceClassification from jerilseb +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jerilseb_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jerilseb_pipeline` is a English model originally trained by jerilseb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jerilseb_pipeline_en_5.5.0_3.0_1727097310875.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jerilseb_pipeline_en_5.5.0_3.0_1727097310875.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and apply it to a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jerilseb_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and apply it to a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jerilseb_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jerilseb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jerilseb/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jrsky_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jrsky_en.md new file mode 100644 index 00000000000000..a3352d304dce1a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jrsky_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jrsky DistilBertForSequenceClassification from jrsky +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jrsky +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jrsky` is a English model originally trained by jrsky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jrsky_en_5.5.0_3.0_1727073517647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jrsky_en_5.5.0_3.0_1727073517647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Convert the raw text column into a document annotation
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; its input columns must match the ones produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jrsky","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Convert the raw text column into a document annotation
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// Load the pretrained classifier; its input columns must match the ones produced above
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jrsky", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jrsky| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jrsky/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jrsky_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jrsky_pipeline_en.md new file mode 100644 index 00000000000000..c37d128a3db7ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jrsky_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jrsky_pipeline pipeline DistilBertForSequenceClassification from jrsky +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jrsky_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jrsky_pipeline` is a English model originally trained by jrsky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jrsky_pipeline_en_5.5.0_3.0_1727073536479.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jrsky_pipeline_en_5.5.0_3.0_1727073536479.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and apply it to a DataFrame with a "text" column
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jrsky_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Download the pretrained pipeline and apply it to a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jrsky_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jrsky_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jrsky/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ladoza03_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ladoza03_en.md new file mode 100644 index 00000000000000..a8d2baa413f734 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ladoza03_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ladoza03 DistilBertForSequenceClassification from ladoza03 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ladoza03 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ladoza03` is a English model originally trained by ladoza03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ladoza03_en_5.5.0_3.0_1727110497479.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ladoza03_en_5.5.0_3.0_1727110497479.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Convert the raw text column into a document annotation
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; its input columns must match the ones produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ladoza03","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Convert the raw text column into a document annotation
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// Load the pretrained classifier; its input columns must match the ones produced above
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ladoza03", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ladoza03| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ladoza03/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ladoza03_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ladoza03_pipeline_en.md new file mode 100644 index 00000000000000..89c2828ac442aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ladoza03_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ladoza03_pipeline pipeline DistilBertForSequenceClassification from ladoza03 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ladoza03_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ladoza03_pipeline` is a English model originally trained by ladoza03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ladoza03_pipeline_en_5.5.0_3.0_1727110509485.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ladoza03_pipeline_en_5.5.0_3.0_1727110509485.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is any Spark DataFrame with a "text" column, for example:
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ladoza03_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df is any DataFrame with a "text" column, for example:
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ladoza03_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
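
For a quick check on a single string, the same `pipeline` object can also be used with `annotate`, which runs locally on the driver and returns plain Python structures. A minimal sketch, reusing the pipeline loaded above; the dictionary keys follow the output columns of the included stages:

```python
# Annotate one string in memory; keys are the stage output columns (e.g. "document", "token", "class").
result = pipeline.annotate("I love spark-nlp")
print(result)
```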
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ladoza03_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ladoza03/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_lulu5131_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_lulu5131_en.md new file mode 100644 index 00000000000000..f48af0f66bbd03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_lulu5131_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_lulu5131 DistilBertForSequenceClassification from lulu5131 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_lulu5131 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_lulu5131` is a English model originally trained by lulu5131. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_lulu5131_en_5.5.0_3.0_1727073842892.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_lulu5131_en_5.5.0_3.0_1727073842892.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_lulu5131","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_lulu5131", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
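
The fitted `pipelineModel` is a regular Spark ML `PipelineModel`, so it can be saved once and reloaded later without refitting. A small sketch assuming the Python example above has been run; `/tmp/emotion_pipeline` is only an example path:

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline and load it back for reuse.
pipelineModel.write().overwrite().save("/tmp/emotion_pipeline")
restored = PipelineModel.load("/tmp/emotion_pipeline")
restored.transform(data).select("class.result").show(truncate=False)
```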
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_lulu5131| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lulu5131/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_lulu5131_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_lulu5131_pipeline_en.md new file mode 100644 index 00000000000000..024488c948fe9a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_lulu5131_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_lulu5131_pipeline pipeline DistilBertForSequenceClassification from lulu5131 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_lulu5131_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_lulu5131_pipeline` is a English model originally trained by lulu5131. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_lulu5131_pipeline_en_5.5.0_3.0_1727073854903.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_lulu5131_pipeline_en_5.5.0_3.0_1727073854903.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is any Spark DataFrame with a "text" column, for example:
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_lulu5131_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df is any DataFrame with a "text" column, for example:
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_lulu5131_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_lulu5131_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lulu5131/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_matvey67_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_matvey67_en.md new file mode 100644 index 00000000000000..be2aeb90c16188 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_matvey67_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_matvey67 DistilBertForSequenceClassification from Matvey67 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_matvey67 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_matvey67` is a English model originally trained by Matvey67. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_matvey67_en_5.5.0_3.0_1727073944867.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_matvey67_en_5.5.0_3.0_1727073944867.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_matvey67","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_matvey67", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
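
When latency matters more than throughput (for example, serving single requests), the fitted model can be wrapped in Spark NLP's `LightPipeline`, which annotates plain strings on the driver without building a DataFrame. A minimal sketch assuming the Python example above has been run:

```python
from sparknlp.base import LightPipeline

# Fast, driver-local inference on raw strings; returns dicts of annotator outputs.
light = LightPipeline(pipelineModel)
print(light.annotate("I love spark-nlp"))
print(light.annotate(["great news today", "this is terrible"]))
```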
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_matvey67| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Matvey67/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_mikhab_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_mikhab_en.md new file mode 100644 index 00000000000000..849f27927e500c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_mikhab_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_mikhab DistilBertForSequenceClassification from mikhab +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_mikhab +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_mikhab` is a English model originally trained by mikhab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_mikhab_en_5.5.0_3.0_1727059479937.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_mikhab_en_5.5.0_3.0_1727059479937.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_mikhab","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_mikhab", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_mikhab| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mikhab/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_mikhab_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_mikhab_pipeline_en.md new file mode 100644 index 00000000000000..0532d821d39cbf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_mikhab_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_mikhab_pipeline pipeline DistilBertForSequenceClassification from mikhab +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_mikhab_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_mikhab_pipeline` is a English model originally trained by mikhab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_mikhab_pipeline_en_5.5.0_3.0_1727059492056.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_mikhab_pipeline_en_5.5.0_3.0_1727059492056.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_mikhab_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_mikhab_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
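
The downloaded pipeline wraps an ordinary Spark `PipelineModel`, so the stages listed under "Included Models" below can also be inspected programmatically. A small sketch, reusing the `pipeline` loaded above and assuming the `model` attribute exposed by `PretrainedPipeline` in recent Spark NLP releases:

```python
# Print the class name of each underlying stage
# (expected: DocumentAssembler, TokenizerModel, DistilBertForSequenceClassification).
for stage in pipeline.model.stages:
    print(type(stage).__name__)
```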
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_mikhab_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mikhab/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_praneelnihar_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_praneelnihar_en.md new file mode 100644 index 00000000000000..4028fbd34bbe20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_praneelnihar_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_praneelnihar DistilBertForSequenceClassification from PraneelNihar +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_praneelnihar +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_praneelnihar` is a English model originally trained by PraneelNihar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_praneelnihar_en_5.5.0_3.0_1727093694045.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_praneelnihar_en_5.5.0_3.0_1727093694045.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_praneelnihar","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_praneelnihar", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_praneelnihar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/PraneelNihar/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_praneelnihar_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_praneelnihar_pipeline_en.md new file mode 100644 index 00000000000000..2a60a814694861 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_praneelnihar_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_praneelnihar_pipeline pipeline DistilBertForSequenceClassification from PraneelNihar +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_praneelnihar_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_praneelnihar_pipeline` is a English model originally trained by PraneelNihar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_praneelnihar_pipeline_en_5.5.0_3.0_1727093707231.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_praneelnihar_pipeline_en_5.5.0_3.0_1727093707231.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_praneelnihar_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_praneelnihar_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_praneelnihar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/PraneelNihar/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_sharadhonavar_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_sharadhonavar_en.md new file mode 100644 index 00000000000000..8c74aa12482a09 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_sharadhonavar_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_sharadhonavar DistilBertForSequenceClassification from Sharadhonavar +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_sharadhonavar +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_sharadhonavar` is a English model originally trained by Sharadhonavar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_sharadhonavar_en_5.5.0_3.0_1727097276844.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_sharadhonavar_en_5.5.0_3.0_1727097276844.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_sharadhonavar","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_sharadhonavar", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
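
To see which labels the classification head can emit, the loaded annotator exposes its class list. A one-line sketch, assuming `getClasses` is available in your Spark NLP version and the Python example above has been run:

```python
# Print the labels the sequence-classification head was trained with.
print(sequenceClassifier.getClasses())
```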
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_sharadhonavar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Sharadhonavar/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_trsekhar_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_trsekhar_en.md new file mode 100644 index 00000000000000..891f91aed04e3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_trsekhar_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_trsekhar DistilBertForSequenceClassification from trsekhar +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_trsekhar +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_trsekhar` is a English model originally trained by trsekhar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_trsekhar_en_5.5.0_3.0_1727073952136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_trsekhar_en_5.5.0_3.0_1727073952136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_trsekhar","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_trsekhar", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_trsekhar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/trsekhar/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_trsekhar_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_trsekhar_pipeline_en.md new file mode 100644 index 00000000000000..81fbb07fe72c2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_trsekhar_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_trsekhar_pipeline pipeline DistilBertForSequenceClassification from trsekhar +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_trsekhar_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_trsekhar_pipeline` is a English model originally trained by trsekhar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_trsekhar_pipeline_en_5.5.0_3.0_1727073969465.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_trsekhar_pipeline_en_5.5.0_3.0_1727073969465.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_trsekhar_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_trsekhar_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_trsekhar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/trsekhar/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_zcstarr_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_zcstarr_en.md new file mode 100644 index 00000000000000..280e9d9fa157bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_zcstarr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_zcstarr DistilBertForSequenceClassification from zcstarr +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_zcstarr +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_zcstarr` is a English model originally trained by zcstarr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_zcstarr_en_5.5.0_3.0_1727059325176.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_zcstarr_en_5.5.0_3.0_1727059325176.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_zcstarr","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_zcstarr", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_zcstarr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/zcstarr/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_zcstarr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_zcstarr_pipeline_en.md new file mode 100644 index 00000000000000..d8938a70d1be66 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_zcstarr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_zcstarr_pipeline pipeline DistilBertForSequenceClassification from zcstarr +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_zcstarr_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_zcstarr_pipeline` is a English model originally trained by zcstarr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_zcstarr_pipeline_en_5.5.0_3.0_1727059337813.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_zcstarr_pipeline_en_5.5.0_3.0_1727059337813.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_zcstarr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_zcstarr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_zcstarr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/zcstarr/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ziwone_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ziwone_en.md new file mode 100644 index 00000000000000..0de483a5d61836 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ziwone_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ziwone DistilBertForSequenceClassification from ziwone +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ziwone +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ziwone` is a English model originally trained by ziwone. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ziwone_en_5.5.0_3.0_1727074057508.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ziwone_en_5.5.0_3.0_1727074057508.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ziwone","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ziwone", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ziwone| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ziwone/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ziwone_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ziwone_pipeline_en.md new file mode 100644 index 00000000000000..20ae2c2b5d1366 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ziwone_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ziwone_pipeline pipeline DistilBertForSequenceClassification from ziwone +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ziwone_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ziwone_pipeline` is a English model originally trained by ziwone. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ziwone_pipeline_en_5.5.0_3.0_1727074069147.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ziwone_pipeline_en_5.5.0_3.0_1727074069147.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ziwone_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ziwone_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ziwone_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ziwone/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_intro2_verizon_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_intro2_verizon_en.md new file mode 100644 index 00000000000000..2a627b3ecc85eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_intro2_verizon_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_intro2_verizon DistilBertForSequenceClassification from TieIncred +author: John Snow Labs +name: distilbert_base_uncased_finetuned_intro2_verizon +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_intro2_verizon` is a English model originally trained by TieIncred. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_intro2_verizon_en_5.5.0_3.0_1727086974505.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_intro2_verizon_en_5.5.0_3.0_1727086974505.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_intro2_verizon","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_intro2_verizon", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
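
Each prediction in the `class` column is a full Spark NLP annotation, so per-label confidence scores are available in its metadata alongside the predicted label. A minimal sketch assuming the Python example above has been run:

```python
# "result" holds the predicted label; "metadata" holds label -> score pairs (as strings).
pipelineDF.select("class.result", "class.metadata").show(truncate=False)
```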
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_intro2_verizon| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/TieIncred/distilbert-base-uncased-finetuned-intro2-verizon \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_intro2_verizon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_intro2_verizon_pipeline_en.md new file mode 100644 index 00000000000000..f9bfb2a0ea0729 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_intro2_verizon_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_intro2_verizon_pipeline pipeline DistilBertForSequenceClassification from TieIncred +author: John Snow Labs +name: distilbert_base_uncased_finetuned_intro2_verizon_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_intro2_verizon_pipeline` is a English model originally trained by TieIncred. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_intro2_verizon_pipeline_en_5.5.0_3.0_1727086987155.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_intro2_verizon_pipeline_en_5.5.0_3.0_1727086987155.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_intro2_verizon_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_intro2_verizon_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_intro2_verizon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/TieIncred/distilbert-base-uncased-finetuned-intro2-verizon + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_m_share_facts_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_m_share_facts_pipeline_en.md new file mode 100644 index 00000000000000..bd24666b649460 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_m_share_facts_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_m_share_facts_pipeline pipeline DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_m_share_facts_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_m_share_facts_pipeline` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_m_share_facts_pipeline_en_5.5.0_3.0_1727082214896.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_m_share_facts_pipeline_en_5.5.0_3.0_1727082214896.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_m_share_facts_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_m_share_facts_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_m_share_facts_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-m_share_facts + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_en.md new file mode 100644 index 00000000000000..d26a9101f864f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen DistilBertForSequenceClassification from kghanlon +author: John Snow Labs +name: distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen` is a English model originally trained by kghanlon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_en_5.5.0_3.0_1727096984328.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_en_5.5.0_3.0_1727096984328.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/kghanlon/distilbert-base-uncased-finetuned-MP-unannotated-half-frozen-v1-FULL_CLASSES-v1_un_frozen \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_pipeline_en.md new file mode 100644 index 00000000000000..c1dc311d8ee632 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_pipeline pipeline DistilBertForSequenceClassification from kghanlon +author: John Snow Labs +name: distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_pipeline` is a English model originally trained by kghanlon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_pipeline_en_5.5.0_3.0_1727096996121.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_pipeline_en_5.5.0_3.0_1727096996121.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
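+
+For quick, single-sentence inference the pipeline's `annotate` method can be used instead of a DataFrame (a minimal sketch; the returned dictionary is keyed by the pipeline's output columns, e.g. `document`, `token`, `class`):
+
+```python
+# Annotate one string and print the predicted label(s)
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])
+```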
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/kghanlon/distilbert-base-uncased-finetuned-MP-unannotated-half-frozen-v1-FULL_CLASSES-v1_un_frozen + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_pad_clf_v2_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_pad_clf_v2_en.md new file mode 100644 index 00000000000000..f00a8e90023d12 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_pad_clf_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_pad_clf_v2 DistilBertForSequenceClassification from netoferraz +author: John Snow Labs +name: distilbert_base_uncased_finetuned_pad_clf_v2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_pad_clf_v2` is a English model originally trained by netoferraz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_pad_clf_v2_en_5.5.0_3.0_1727110387504.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_pad_clf_v2_en_5.5.0_3.0_1727110387504.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_pad_clf_v2","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_pad_clf_v2", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
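+
+For low-latency inference on a handful of strings, the fitted pipeline can also be wrapped in a `LightPipeline` (a minimal sketch, assuming `pipelineModel` from the example above):
+
+```python
+from sparknlp.base import LightPipeline
+
+# LightPipeline avoids Spark DataFrame overhead for small inputs
+light_pipeline = LightPipeline(pipelineModel)
+print(light_pipeline.annotate("I love spark-nlp"))
+```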
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_pad_clf_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/netoferraz/distilbert-base-uncased-finetuned-pad-clf-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_pad_clf_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_pad_clf_v2_pipeline_en.md new file mode 100644 index 00000000000000..604640f2ee4291 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_pad_clf_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_pad_clf_v2_pipeline pipeline DistilBertForSequenceClassification from netoferraz +author: John Snow Labs +name: distilbert_base_uncased_finetuned_pad_clf_v2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_pad_clf_v2_pipeline` is a English model originally trained by netoferraz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_pad_clf_v2_pipeline_en_5.5.0_3.0_1727110399689.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_pad_clf_v2_pipeline_en_5.5.0_3.0_1727110399689.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_pad_clf_v2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_pad_clf_v2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_pad_clf_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/netoferraz/distilbert-base-uncased-finetuned-pad-clf-v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_en.md new file mode 100644 index 00000000000000..eb4df75da01309 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024 DistilBertForSequenceClassification from Beijaflor2024 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024` is a English model originally trained by Beijaflor2024. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_en_5.5.0_3.0_1727093987044.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_en_5.5.0_3.0_1727093987044.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
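+
+The snippet above assumes an active Spark session (`spark`) with Spark NLP on the classpath; one way to start it is shown below (a minimal sketch):
+
+```python
+import sparknlp
+
+# Start (or attach to) a Spark session with Spark NLP loaded
+spark = sparknlp.start()
+print(sparknlp.version())
+```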
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Beijaflor2024/distilbert-base-uncased-finetuned-sst-2-english \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_pipeline_en.md new file mode 100644 index 00000000000000..d9c58d8bf047a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_pipeline pipeline DistilBertForSequenceClassification from Beijaflor2024 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_pipeline` is a English model originally trained by Beijaflor2024. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_pipeline_en_5.5.0_3.0_1727093998673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_pipeline_en_5.5.0_3.0_1727093998673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Beijaflor2024/distilbert-base-uncased-finetuned-sst-2-english + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_tweets_dataset_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_tweets_dataset_en.md new file mode 100644 index 00000000000000..5760775125dbe8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_tweets_dataset_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_tweets_dataset DistilBertForSequenceClassification from lambda101 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_tweets_dataset +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_tweets_dataset` is a English model originally trained by lambda101. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_tweets_dataset_en_5.5.0_3.0_1727086751833.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_tweets_dataset_en_5.5.0_3.0_1727086751833.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_tweets_dataset","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_tweets_dataset", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
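+
+Beyond the predicted label, each annotation carries metadata that typically includes per-class scores (a minimal sketch, assuming the pipeline above has been run; the exact metadata keys depend on the model):
+
+```python
+from pyspark.sql.functions import explode
+
+# Expand the class annotations and show labels with their metadata
+pipelineDF.select(explode("class").alias("prediction")) \
+    .selectExpr("prediction.result", "prediction.metadata") \
+    .show(truncate=False)
+```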
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_tweets_dataset| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lambda101/distilbert-base-uncased-finetuned-tweets-dataset \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_tweets_dataset_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_tweets_dataset_pipeline_en.md new file mode 100644 index 00000000000000..603f749d1ddbe7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_tweets_dataset_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_tweets_dataset_pipeline pipeline DistilBertForSequenceClassification from lambda101 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_tweets_dataset_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_tweets_dataset_pipeline` is a English model originally trained by lambda101. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_tweets_dataset_pipeline_en_5.5.0_3.0_1727086767715.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_tweets_dataset_pipeline_en_5.5.0_3.0_1727086767715.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_tweets_dataset_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_tweets_dataset_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_tweets_dataset_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lambda101/distilbert-base-uncased-finetuned-tweets-dataset + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finteuned_emotion_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finteuned_emotion_en.md new file mode 100644 index 00000000000000..3ba4e7b67e8ee5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finteuned_emotion_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finteuned_emotion DistilBertForSequenceClassification from sknera +author: John Snow Labs +name: distilbert_base_uncased_finteuned_emotion +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finteuned_emotion` is a English model originally trained by sknera. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finteuned_emotion_en_5.5.0_3.0_1727074062122.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finteuned_emotion_en_5.5.0_3.0_1727074062122.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finteuned_emotion","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finteuned_emotion", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finteuned_emotion| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sknera/distilbert-base-uncased-finteuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finteuned_emotion_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finteuned_emotion_pipeline_en.md new file mode 100644 index 00000000000000..fe6bc3e3f253a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finteuned_emotion_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finteuned_emotion_pipeline pipeline DistilBertForSequenceClassification from sknera +author: John Snow Labs +name: distilbert_base_uncased_finteuned_emotion_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finteuned_emotion_pipeline` is a English model originally trained by sknera. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finteuned_emotion_pipeline_en_5.5.0_3.0_1727074074068.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finteuned_emotion_pipeline_en_5.5.0_3.0_1727074074068.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_finteuned_emotion_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finteuned_emotion_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finteuned_emotion_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sknera/distilbert-base-uncased-finteuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_imdb_dfurman_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_imdb_dfurman_en.md new file mode 100644 index 00000000000000..4a40a9455edc21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_imdb_dfurman_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_imdb_dfurman DistilBertForSequenceClassification from dfurman +author: John Snow Labs +name: distilbert_base_uncased_imdb_dfurman +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_imdb_dfurman` is a English model originally trained by dfurman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_imdb_dfurman_en_5.5.0_3.0_1727086925352.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_imdb_dfurman_en_5.5.0_3.0_1727086925352.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_imdb_dfurman","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_imdb_dfurman", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
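+
+A fitted pipeline can be persisted and reloaded with the standard Spark ML API (a minimal sketch; the path below is only an illustration):
+
+```python
+from pyspark.ml import PipelineModel
+
+# Save the fitted pipeline and load it back for reuse
+pipelineModel.write().overwrite().save("/tmp/distilbert_imdb_pipeline")
+reloaded = PipelineModel.load("/tmp/distilbert_imdb_pipeline")
+```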
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_imdb_dfurman| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dfurman/distilbert-base-uncased-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_imdb_dfurman_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_imdb_dfurman_pipeline_en.md new file mode 100644 index 00000000000000..4916b1f1223c02 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_imdb_dfurman_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_imdb_dfurman_pipeline pipeline DistilBertForSequenceClassification from dfurman +author: John Snow Labs +name: distilbert_base_uncased_imdb_dfurman_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_imdb_dfurman_pipeline` is a English model originally trained by dfurman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_imdb_dfurman_pipeline_en_5.5.0_3.0_1727086937837.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_imdb_dfurman_pipeline_en_5.5.0_3.0_1727086937837.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_imdb_dfurman_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_imdb_dfurman_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_imdb_dfurman_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dfurman/distilbert-base-uncased-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_lora_text_classification_lincgr_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_lora_text_classification_lincgr_en.md new file mode 100644 index 00000000000000..6e201509c0dbf4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_lora_text_classification_lincgr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_lora_text_classification_lincgr DistilBertForSequenceClassification from lincgr +author: John Snow Labs +name: distilbert_base_uncased_lora_text_classification_lincgr +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_lora_text_classification_lincgr` is a English model originally trained by lincgr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_lora_text_classification_lincgr_en_5.5.0_3.0_1727059557405.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_lora_text_classification_lincgr_en_5.5.0_3.0_1727059557405.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_lora_text_classification_lincgr","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_lora_text_classification_lincgr", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_lora_text_classification_lincgr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lincgr/distilbert-base-uncased-lora-text-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_lora_text_classification_lincgr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_lora_text_classification_lincgr_pipeline_en.md new file mode 100644 index 00000000000000..7c68cc18b70681 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_lora_text_classification_lincgr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_lora_text_classification_lincgr_pipeline pipeline DistilBertForSequenceClassification from lincgr +author: John Snow Labs +name: distilbert_base_uncased_lora_text_classification_lincgr_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_lora_text_classification_lincgr_pipeline` is a English model originally trained by lincgr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_lora_text_classification_lincgr_pipeline_en_5.5.0_3.0_1727059569407.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_lora_text_classification_lincgr_pipeline_en_5.5.0_3.0_1727059569407.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_lora_text_classification_lincgr_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_lora_text_classification_lincgr_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_lora_text_classification_lincgr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lincgr/distilbert-base-uncased-lora-text-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_en.md new file mode 100644 index 00000000000000..6d86c76286e517 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_en_5.5.0_3.0_1727059409187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_en_5.5.0_3.0_1727059409187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st13sd_ut72ut5_PLPrefix0stlarge13_simsp100_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_pipeline_en.md new file mode 100644 index 00000000000000..1229d9b810808b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_pipeline_en_5.5.0_3.0_1727059420921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_pipeline_en_5.5.0_3.0_1727059420921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st13sd_ut72ut5_PLPrefix0stlarge13_simsp100_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_en.md new file mode 100644 index 00000000000000..37801adef19cd9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_en_5.5.0_3.0_1727059122032.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_en_5.5.0_3.0_1727059122032.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
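+
+Throughput on larger datasets can usually be tuned via the classifier's batch size before running `transform` (a sketch of an assumed tuning step; 8 is an arbitrary example value):
+
+```python
+# Larger batches generally improve hardware utilisation at the cost of memory
+sequenceClassifier.setBatchSize(8)
+```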
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st14sd_ut72ut1large14PfxNf_simsp400_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_pipeline_en.md new file mode 100644 index 00000000000000..079fa026aaa050 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_pipeline_en_5.5.0_3.0_1727059139627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_pipeline_en_5.5.0_3.0_1727059139627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st14sd_ut72ut1large14PfxNf_simsp400_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_en.md new file mode 100644 index 00000000000000..534d19c5bc6ac4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_en_5.5.0_3.0_1727059527931.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_en_5.5.0_3.0_1727059527931.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st15sd_ut72ut1_PLPrefix0stlarge_simsp300_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en.md new file mode 100644 index 00000000000000..46f4b7b970d174 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en_5.5.0_3.0_1727059540372.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en_5.5.0_3.0_1727059540372.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# a DataFrame with the raw text to annotate in a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// a DataFrame with the raw text to annotate in a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
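+
+For quick experiments the pipeline can also be run on plain strings instead of a DataFrame. A minimal sketch assuming the `pipeline` object loaded above; the dictionary keys follow the pipeline's output columns (here `class` is assumed for the classifier output).
+
+```python
+# annotate() runs the whole pretrained pipeline on a single string and
+# returns a dict mapping each output column to its results
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])
+```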
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st15sd_ut72ut1_PLPrefix0stlarge_simsp300_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut12ut1_plprefix0stlarge_simsp100_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut12ut1_plprefix0stlarge_simsp100_en.md new file mode 100644 index 00000000000000..d325a7b0d9935a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut12ut1_plprefix0stlarge_simsp100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st16sd_ut12ut1_plprefix0stlarge_simsp100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st16sd_ut12ut1_plprefix0stlarge_simsp100 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st16sd_ut12ut1_plprefix0stlarge_simsp100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut12ut1_plprefix0stlarge_simsp100_en_5.5.0_3.0_1727097377994.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut12ut1_plprefix0stlarge_simsp100_en_5.5.0_3.0_1727097377994.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st16sd_ut12ut1_plprefix0stlarge_simsp100","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st16sd_ut12ut1_plprefix0stlarge_simsp100", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st16sd_ut12ut1_plprefix0stlarge_simsp100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st16sd_ut12ut1_PLPrefix0stlarge_simsp100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_en.md new file mode 100644 index 00000000000000..6e5c35bb4feb95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_en_5.5.0_3.0_1727108557731.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_en_5.5.0_3.0_1727108557731.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st16sd_ut72ut1largePfxNf_simsp300_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md new file mode 100644 index 00000000000000..c4c7f1e21c8b67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en_5.5.0_3.0_1727108569798.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en_5.5.0_3.0_1727108569798.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# a DataFrame with the raw text to annotate in a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// a DataFrame with the raw text to annotate in a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st16sd_ut72ut1largePfxNf_simsp300_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_en.md new file mode 100644 index 00000000000000..dc6b692c0cf884 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_en_5.5.0_3.0_1727110613987.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_en_5.5.0_3.0_1727110613987.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
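+
+When only a handful of strings need to be scored, wrapping the fitted pipeline in a LightPipeline avoids the overhead of a full DataFrame transform. A minimal sketch assuming the `pipelineModel` variable from the example above.
+
+```python
+from sparknlp.base import LightPipeline
+
+# LightPipeline runs the fitted pipeline on plain Python strings,
+# which is convenient for ad-hoc testing of the classifier
+light = LightPipeline(pipelineModel)
+print(light.annotate("I love spark-nlp")["class"])
+```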
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st16sd_ut72ut3_PLPrefix0stlarge_simsp100_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline_en.md new file mode 100644 index 00000000000000..461d86c16b5cf7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline_en_5.5.0_3.0_1727110627225.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline_en_5.5.0_3.0_1727110627225.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# a DataFrame with the raw text to annotate in a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// a DataFrame with the raw text to annotate in a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st16sd_ut72ut3_PLPrefix0stlarge_simsp100_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_en.md new file mode 100644 index 00000000000000..605a4b0414a49c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_en_5.5.0_3.0_1727093720340.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_en_5.5.0_3.0_1727093720340.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st17sd_ut72ut5_PLPrefix0stlarge17_simsp100_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_pipeline_en.md new file mode 100644 index 00000000000000..a7d29db27afc9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_pipeline_en_5.5.0_3.0_1727093733181.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_pipeline_en_5.5.0_3.0_1727093733181.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# a DataFrame with the raw text to annotate in a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// a DataFrame with the raw text to annotate in a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st17sd_ut72ut5_PLPrefix0stlarge17_simsp100_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge_simsp300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge_simsp300_pipeline_en.md new file mode 100644 index 00000000000000..bfa9f4a030e145 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge_simsp300_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge_simsp300_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge_simsp300_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge_simsp300_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge_simsp300_pipeline_en_5.5.0_3.0_1727108509148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge_simsp300_pipeline_en_5.5.0_3.0_1727108509148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# a DataFrame with the raw text to annotate in a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge_simsp300_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// a DataFrame with the raw text to annotate in a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge_simsp300_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge_simsp300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st19sd_ut72ut1_PLPrefix0stlarge_simsp300 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en.md new file mode 100644 index 00000000000000..12d752d8199483 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en_5.5.0_3.0_1727097220467.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en_5.5.0_3.0_1727097220467.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# a DataFrame with the raw text to annotate in a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// a DataFrame with the raw text to annotate in a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st1sd_ut72ut5_PLPrefix0stlarge_simsp100_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st21sd_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st21sd_en.md new file mode 100644 index 00000000000000..bc2665ea6ebcb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st21sd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st21sd DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st21sd +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st21sd` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st21sd_en_5.5.0_3.0_1727094104025.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st21sd_en_5.5.0_3.0_1727094104025.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st21sd","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st21sd", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
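+
+If the fitted pipeline will be reused, it can be persisted with the standard Spark ML writer and reloaded later. A sketch assuming the `pipelineModel` and `data` variables from the example above; the path is only a placeholder.
+
+```python
+from pyspark.ml import PipelineModel
+
+# persist the fitted pipeline (including the downloaded model) and reload it
+pipelineModel.write().overwrite().save("/tmp/distilbert_odm_zphr_0st21sd_pipeline")
+reloaded = PipelineModel.load("/tmp/distilbert_odm_zphr_0st21sd_pipeline")
+reloaded.transform(data).select("class.result").show(truncate=False)
+```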
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st21sd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st21sd \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en.md new file mode 100644 index 00000000000000..ffd4e83b16ec4c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en_5.5.0_3.0_1727093568416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en_5.5.0_3.0_1727093568416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st2sd_ut72ut5_PLPrefix0stlarge_simsp100_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en.md new file mode 100644 index 00000000000000..fc2cb577a1b7b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en_5.5.0_3.0_1727093579982.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en_5.5.0_3.0_1727093579982.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# a DataFrame with the raw text to annotate in a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// a DataFrame with the raw text to annotate in a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
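+
+For a closer look at individual predictions, `fullAnnotate` keeps the complete annotation objects instead of a DataFrame. A sketch assuming the `pipeline` object loaded above; the exact metadata keys depend on the underlying classifier.
+
+```python
+# fullAnnotate() returns one result per input; each annotation exposes the
+# predicted label in .result and, typically, per-label scores in .metadata
+full = pipeline.fullAnnotate("I love spark-nlp")[0]
+for annotation in full["class"]:
+    print(annotation.result, annotation.metadata)
+```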
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st2sd_ut72ut5_PLPrefix0stlarge_simsp100_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_en.md new file mode 100644 index 00000000000000..e5b3f1adf3f15a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_en_5.5.0_3.0_1727073953066.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_en_5.5.0_3.0_1727073953066.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st2sd_ut72ut5_PLPrefix0stlarge_simsp100_clean300 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en.md new file mode 100644 index 00000000000000..e8f91cdf824809 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en_5.5.0_3.0_1727073969499.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en_5.5.0_3.0_1727073969499.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# a DataFrame with the raw text to annotate in a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// a DataFrame with the raw text to annotate in a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st2sd_ut72ut5_PLPrefix0stlarge_simsp100_clean300 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_en.md new file mode 100644 index 00000000000000..613b125070f3f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_en_5.5.0_3.0_1727108642673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_en_5.5.0_3.0_1727108642673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1_PLPrefix0stlarge30_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_pipeline_en.md new file mode 100644 index 00000000000000..5b9853bf6070fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_pipeline_en_5.5.0_3.0_1727108654944.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_pipeline_en_5.5.0_3.0_1727108654944.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# a DataFrame with the raw text to annotate in a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// a DataFrame with the raw text to annotate in a "text" column
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1_PLPrefix0stlarge30_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean1sd_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean1sd_en.md new file mode 100644 index 00000000000000..fad029a3c29576 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean1sd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean1sd DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean1sd +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean1sd` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean1sd_en_5.5.0_3.0_1727082444097.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean1sd_en_5.5.0_3.0_1727082444097.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean1sd","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean1sd", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
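+
+The predictions land in the `class` column of the transformed DataFrame as Spark NLP annotations. A minimal sketch for inspecting the predicted labels (it assumes the pipeline defined above):
+
+```python
+# Show each input text next to its predicted label(s)
+pipelineDF.select("text", "class.result").show(truncate=False)
+```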
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean1sd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut5_PLPrefix0stlarge42_simsp_clean1sd \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_en.md new file mode 100644 index 00000000000000..56638b25cd5e01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_en_5.5.0_3.0_1727093823964.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_en_5.5.0_3.0_1727093823964.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut92ut1_PL0stlarge42_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_pipeline_en.md new file mode 100644 index 00000000000000..9bde458637c8fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_pipeline_en_5.5.0_3.0_1727093835824.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_pipeline_en_5.5.0_3.0_1727093835824.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut92ut1_PL0stlarge42_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en.md new file mode 100644 index 00000000000000..94218329e654f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean100 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en_5.5.0_3.0_1727082750419.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en_5.5.0_3.0_1727082750419.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean100","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean100", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st6sd_ut72ut5_PLPrefix0stlarge_simsp100_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md new file mode 100644 index 00000000000000..2b6733a6b7acd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en_5.5.0_3.0_1727073730231.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en_5.5.0_3.0_1727073730231.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st6sd_ut72ut5_PLPrefix0stlarge_simsp100_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en.md new file mode 100644 index 00000000000000..3dc04416cf68f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en_5.5.0_3.0_1727073742331.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en_5.5.0_3.0_1727073742331.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st6sd_ut72ut5_PLPrefix0stlarge_simsp100_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_pii_200_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_pii_200_en.md new file mode 100644 index 00000000000000..31df0615c8602b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_pii_200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_pii_200 DistilBertForTokenClassification from modeldev +author: John Snow Labs +name: distilbert_base_uncased_pii_200 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_pii_200` is a English model originally trained by modeldev. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_pii_200_en_5.5.0_3.0_1727065444115.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_pii_200_en_5.5.0_3.0_1727065444115.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_base_uncased_pii_200","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_base_uncased_pii_200", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
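+
+For token classification, each token receives its own tag in the `ner` column. A minimal sketch for viewing the predicted tag sequence (it assumes the pipeline defined above):
+
+```python
+# Show the tokens of each input row alongside their predicted PII tags
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```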
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_pii_200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.6 MB| + +## References + +https://huggingface.co/modeldev/distilbert-base-uncased-pii-200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_pii_200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_pii_200_pipeline_en.md new file mode 100644 index 00000000000000..a246f1dbf69086 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_pii_200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_pii_200_pipeline pipeline DistilBertForTokenClassification from modeldev +author: John Snow Labs +name: distilbert_base_uncased_pii_200_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_pii_200_pipeline` is a English model originally trained by modeldev. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_pii_200_pipeline_en_5.5.0_3.0_1727065455623.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_pii_200_pipeline_en_5.5.0_3.0_1727065455623.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_pii_200_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_pii_200_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_pii_200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.6 MB| + +## References + +https://huggingface.co/modeldev/distilbert-base-uncased-pii-200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_resumesclasssifierv1_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_resumesclasssifierv1_en.md new file mode 100644 index 00000000000000..eca1f141a206ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_resumesclasssifierv1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_resumesclasssifierv1 DistilBertForSequenceClassification from youssefkhalil320 +author: John Snow Labs +name: distilbert_base_uncased_resumesclasssifierv1 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_resumesclasssifierv1` is a English model originally trained by youssefkhalil320. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_resumesclasssifierv1_en_5.5.0_3.0_1727073670686.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_resumesclasssifierv1_en_5.5.0_3.0_1727073670686.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_resumesclasssifierv1","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_resumesclasssifierv1", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
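+
+When scoring small batches or single strings, wrapping the fitted pipeline in a `LightPipeline` avoids the DataFrame round trip. A minimal sketch (it assumes `pipelineModel` from the example above):
+
+```python
+from sparknlp.base import LightPipeline
+
+# Annotate plain strings directly; returns a dict of annotator outputs per input
+light = LightPipeline(pipelineModel)
+print(light.annotate("I love spark-nlp"))
+```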
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_resumesclasssifierv1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/youssefkhalil320/distilbert-base-uncased-resumesClasssifierV1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_resumesclasssifierv1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_resumesclasssifierv1_pipeline_en.md new file mode 100644 index 00000000000000..53bf534fec794f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_resumesclasssifierv1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_resumesclasssifierv1_pipeline pipeline DistilBertForSequenceClassification from youssefkhalil320 +author: John Snow Labs +name: distilbert_base_uncased_resumesclasssifierv1_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_resumesclasssifierv1_pipeline` is a English model originally trained by youssefkhalil320. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_resumesclasssifierv1_pipeline_en_5.5.0_3.0_1727073682591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_resumesclasssifierv1_pipeline_en_5.5.0_3.0_1727073682591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_resumesclasssifierv1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_resumesclasssifierv1_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_resumesclasssifierv1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/youssefkhalil320/distilbert-base-uncased-resumesClasssifierV1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_sancho3010_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_sancho3010_en.md new file mode 100644 index 00000000000000..2a28e868b51b0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_sancho3010_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_sancho3010 DistilBertForSequenceClassification from Sancho3010 +author: John Snow Labs +name: distilbert_base_uncased_sancho3010 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_sancho3010` is a English model originally trained by Sancho3010. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_sancho3010_en_5.5.0_3.0_1727059119274.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_sancho3010_en_5.5.0_3.0_1727059119274.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_sancho3010","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_sancho3010", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_sancho3010| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Sancho3010/distilbert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_sbulut_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_sbulut_en.md new file mode 100644 index 00000000000000..ff667563419607 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_sbulut_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_sbulut DistilBertForSequenceClassification from sbulut +author: John Snow Labs +name: distilbert_base_uncased_sbulut +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_sbulut` is a English model originally trained by sbulut. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_sbulut_en_5.5.0_3.0_1727087221032.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_sbulut_en_5.5.0_3.0_1727087221032.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_sbulut","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_sbulut", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_sbulut| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sbulut/distilbert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_sbulut_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_sbulut_pipeline_en.md new file mode 100644 index 00000000000000..f4f4f5376e4768 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_sbulut_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_sbulut_pipeline pipeline DistilBertForSequenceClassification from sbulut +author: John Snow Labs +name: distilbert_base_uncased_sbulut_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_sbulut_pipeline` is a English model originally trained by sbulut. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_sbulut_pipeline_en_5.5.0_3.0_1727087232708.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_sbulut_pipeline_en_5.5.0_3.0_1727087232708.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_sbulut_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_sbulut_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_sbulut_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sbulut/distilbert-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_thienle_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_thienle_en.md new file mode 100644 index 00000000000000..b4fd2c981d99a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_thienle_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_thienle DistilBertForSequenceClassification from thienlelong +author: John Snow Labs +name: distilbert_base_uncased_thienle +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_thienle` is a English model originally trained by thienlelong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_thienle_en_5.5.0_3.0_1727059121707.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_thienle_en_5.5.0_3.0_1727059121707.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_thienle","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_thienle", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_thienle| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thienlelong/distilbert-base-uncased-thienle \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_thienle_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_thienle_pipeline_en.md new file mode 100644 index 00000000000000..c0bc73c4b92be4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_thienle_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_thienle_pipeline pipeline DistilBertForSequenceClassification from thienlelong +author: John Snow Labs +name: distilbert_base_uncased_thienle_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_thienle_pipeline` is a English model originally trained by thienlelong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_thienle_pipeline_en_5.5.0_3.0_1727059139704.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_thienle_pipeline_en_5.5.0_3.0_1727059139704.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_thienle_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_thienle_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_thienle_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thienlelong/distilbert-base-uncased-thienle + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_three_v2_fix_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_three_v2_fix_en.md new file mode 100644 index 00000000000000..d43f5ef99bc287 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_three_v2_fix_en.md @@ -0,0 +1,96 @@ +--- +layout: model +title: English distilbert_base_uncased_three_v2_fix DistilBertForTokenClassification from devtibo +author: John Snow Labs +name: distilbert_base_uncased_three_v2_fix +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_three_v2_fix` is a English model originally trained by devtibo. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_three_v2_fix_en_5.5.0_3.0_1727065438463.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_three_v2_fix_en_5.5.0_3.0_1727065438463.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_base_uncased_three_v2_fix","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_base_uncased_three_v2_fix", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_three_v2_fix| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.6 MB| + +## References + +References + +https://huggingface.co/devtibo/distilbert-base-uncased-three-v2-fix \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_three_v2_fix_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_three_v2_fix_pipeline_en.md new file mode 100644 index 00000000000000..2fe798dcc6525d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_three_v2_fix_pipeline_en.md @@ -0,0 +1,72 @@ +--- +layout: model +title: English distilbert_base_uncased_three_v2_fix_pipeline pipeline DistilBertForTokenClassification from devtibo +author: John Snow Labs +name: distilbert_base_uncased_three_v2_fix_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_three_v2_fix_pipeline` is a English model originally trained by devtibo. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_three_v2_fix_pipeline_en_5.5.0_3.0_1727065450037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_three_v2_fix_pipeline_en_5.5.0_3.0_1727065450037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_three_v2_fix_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_three_v2_fix_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_three_v2_fix_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.6 MB| + +## References + +References + +https://huggingface.co/devtibo/distilbert-base-uncased-three-v2-fix + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut102ut1_plain_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut102ut1_plain_simsp_en.md new file mode 100644 index 00000000000000..fa8df53934ca63 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut102ut1_plain_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut102ut1_plain_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut102ut1_plain_simsp +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut102ut1_plain_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut102ut1_plain_simsp_en_5.5.0_3.0_1727096947443.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut102ut1_plain_simsp_en_5.5.0_3.0_1727096947443.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut102ut1_plain_simsp","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut102ut1_plain_simsp", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
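+
+Besides the predicted label, each `class` annotation typically carries per-label confidence scores in its metadata. A minimal sketch for surfacing them (it assumes the pipeline defined above; the exact metadata keys depend on the model's label set):
+
+```python
+# Inspect the confidence scores attached to each prediction
+pipelineDF.select("class.metadata").show(truncate=False)
+```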
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut102ut1_plain_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut102ut1_plain_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_en.md new file mode 100644 index 00000000000000..1ae3a90d759f25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_en_5.5.0_3.0_1727087339017.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_en_5.5.0_3.0_1727087339017.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut52ut1_ad7_sp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_pipeline_en.md new file mode 100644 index 00000000000000..0a6b2cc8d0100d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_pipeline_en_5.5.0_3.0_1727087350420.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_pipeline_en_5.5.0_3.0_1727087350420.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a small example DataFrame
pipeline = PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
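
For quick single-sentence checks, the same pretrained pipeline can usually be called through its `annotate` helper instead of building a DataFrame first; the sketch below assumes the `pipeline` object created in the Python example above.

```python
# Lightweight single-text inference (sketch); returns a dict keyed by output column name
result = pipeline.annotate("I love spark-nlp")
print(result["class"])
```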
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut52ut1_ad7_sp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_en.md new file mode 100644 index 00000000000000..e58758ac956c24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_en_5.5.0_3.0_1727108566162.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_en_5.5.0_3.0_1727108566162.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut72ut1_plainPrefix_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_pipeline_en.md new file mode 100644 index 00000000000000..0bf782309c3e28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_pipeline_en_5.5.0_3.0_1727108578579.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_pipeline_en_5.5.0_3.0_1727108578579.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a small example DataFrame
pipeline = PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut72ut1_plainPrefix_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_carlosramirez2112_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_carlosramirez2112_en.md new file mode 100644 index 00000000000000..d961fd0a782d63 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_carlosramirez2112_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotion_carlosramirez2112 DistilBertForSequenceClassification from carlosramirez2112 +author: John Snow Labs +name: distilbert_emotion_carlosramirez2112 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_carlosramirez2112` is a English model originally trained by carlosramirez2112. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_carlosramirez2112_en_5.5.0_3.0_1727086752254.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_carlosramirez2112_en_5.5.0_3.0_1727086752254.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_carlosramirez2112","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_carlosramirez2112", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
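
When latency matters more than throughput, the fitted model can typically be wrapped in a `LightPipeline` so that single documents are annotated in memory without launching a Spark job; this is a sketch assuming the `pipelineModel` fitted in the Python example above.

```python
from sparknlp.base import LightPipeline

# Wrap the fitted PipelineModel for fast in-memory inference (sketch)
light = LightPipeline(pipelineModel)
print(light.annotate("I love spark-nlp")["class"])
```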
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_carlosramirez2112| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/carlosramirez2112/distilbert-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_gthivaios_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_gthivaios_en.md new file mode 100644 index 00000000000000..d9f7568c1842e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_gthivaios_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotion_gthivaios DistilBertForSequenceClassification from gthivaios +author: John Snow Labs +name: distilbert_emotion_gthivaios +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_gthivaios` is a English model originally trained by gthivaios. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_gthivaios_en_5.5.0_3.0_1727087099729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_gthivaios_en_5.5.0_3.0_1727087099729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_gthivaios","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_gthivaios", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_gthivaios| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gthivaios/distilbert-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_gthivaios_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_gthivaios_pipeline_en.md new file mode 100644 index 00000000000000..856e77f46c62fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_gthivaios_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_emotion_gthivaios_pipeline pipeline DistilBertForSequenceClassification from gthivaios +author: John Snow Labs +name: distilbert_emotion_gthivaios_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_gthivaios_pipeline` is a English model originally trained by gthivaios. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_gthivaios_pipeline_en_5.5.0_3.0_1727087111405.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_gthivaios_pipeline_en_5.5.0_3.0_1727087111405.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a small example DataFrame
pipeline = PretrainedPipeline("distilbert_emotion_gthivaios_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_emotion_gthivaios_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_gthivaios_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gthivaios/distilbert-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_scaaseu_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_scaaseu_en.md new file mode 100644 index 00000000000000..bb18d1353cb61c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_scaaseu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotion_scaaseu DistilBertForSequenceClassification from scaaseu +author: John Snow Labs +name: distilbert_emotion_scaaseu +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_scaaseu` is a English model originally trained by scaaseu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_scaaseu_en_5.5.0_3.0_1727073517763.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_scaaseu_en_5.5.0_3.0_1727073517763.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_scaaseu","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_scaaseu", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
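
The same fitted pipeline can score many texts in one pass; the sketch below assumes the `pipelineModel` from the Python example above and uses made-up example sentences.

```python
# Batch scoring sketch: several texts in a single DataFrame
batch = spark.createDataFrame(
    [["I love spark-nlp"], ["This library is easy to use"]]
).toDF("text")
pipelineModel.transform(batch).select("text", "class.result").show(truncate=False)
```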
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_scaaseu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/scaaseu/distilbert-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_scaaseu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_scaaseu_pipeline_en.md new file mode 100644 index 00000000000000..6548ce610bfaab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_scaaseu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_emotion_scaaseu_pipeline pipeline DistilBertForSequenceClassification from scaaseu +author: John Snow Labs +name: distilbert_emotion_scaaseu_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_scaaseu_pipeline` is a English model originally trained by scaaseu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_scaaseu_pipeline_en_5.5.0_3.0_1727073535771.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_scaaseu_pipeline_en_5.5.0_3.0_1727073535771.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a small example DataFrame
pipeline = PretrainedPipeline("distilbert_emotion_scaaseu_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_emotion_scaaseu_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_scaaseu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/scaaseu/distilbert-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_fine_turned_classification_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_fine_turned_classification_en.md new file mode 100644 index 00000000000000..60deb08f9cd405 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_fine_turned_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_fine_turned_classification DistilBertForSequenceClassification from abhimanyuaryan +author: John Snow Labs +name: distilbert_fine_turned_classification +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_fine_turned_classification` is a English model originally trained by abhimanyuaryan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_fine_turned_classification_en_5.5.0_3.0_1727110503231.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_fine_turned_classification_en_5.5.0_3.0_1727110503231.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_fine_turned_classification","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_fine_turned_classification", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_fine_turned_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abhimanyuaryan/distilbert-fine-turned-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_fine_turned_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_fine_turned_classification_pipeline_en.md new file mode 100644 index 00000000000000..765fc7f70b7f3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_fine_turned_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_fine_turned_classification_pipeline pipeline DistilBertForSequenceClassification from abhimanyuaryan +author: John Snow Labs +name: distilbert_fine_turned_classification_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_fine_turned_classification_pipeline` is a English model originally trained by abhimanyuaryan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_fine_turned_classification_pipeline_en_5.5.0_3.0_1727110516132.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_fine_turned_classification_pipeline_en_5.5.0_3.0_1727110516132.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a small example DataFrame
pipeline = PretrainedPipeline("distilbert_fine_turned_classification_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_fine_turned_classification_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_fine_turned_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abhimanyuaryan/distilbert-fine-turned-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_finetuned_go_emotions_dataset_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_finetuned_go_emotions_dataset_pipeline_en.md new file mode 100644 index 00000000000000..f6cada97fd3c00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_finetuned_go_emotions_dataset_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_finetuned_go_emotions_dataset_pipeline pipeline DistilBertForSequenceClassification from abdurrahman22224 +author: John Snow Labs +name: distilbert_finetuned_go_emotions_dataset_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_go_emotions_dataset_pipeline` is a English model originally trained by abdurrahman22224. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_go_emotions_dataset_pipeline_en_5.5.0_3.0_1727087018911.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_go_emotions_dataset_pipeline_en_5.5.0_3.0_1727087018911.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a small example DataFrame
pipeline = PretrainedPipeline("distilbert_finetuned_go_emotions_dataset_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_finetuned_go_emotions_dataset_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
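
Since this is a prebuilt pipeline, its stages (listed under Included Models below) can usually be introspected from the downloaded `PipelineModel`; the sketch assumes the `pipeline` object from the Python example above and that the Python wrapper exposes the underlying model as `pipeline.model`.

```python
# Inspect the stages bundled in the downloaded pipeline (sketch)
for stage in pipeline.model.stages:
    print(type(stage).__name__)
```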
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_go_emotions_dataset_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/abdurrahman22224/distilbert-finetuned-go-emotions_dataset + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_foundation_category_funders_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_foundation_category_funders_en.md new file mode 100644 index 00000000000000..c8035faee74e1f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_foundation_category_funders_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_foundation_category_funders DistilBertForSequenceClassification from eric-mc2 +author: John Snow Labs +name: distilbert_foundation_category_funders +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_foundation_category_funders` is a English model originally trained by eric-mc2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_foundation_category_funders_en_5.5.0_3.0_1727108670250.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_foundation_category_funders_en_5.5.0_3.0_1727108670250.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_foundation_category_funders","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_foundation_category_funders", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
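
Beyond the bare label, Spark NLP classifier annotations generally carry per-class scores in their metadata; the sketch below assumes the `pipelineDF` from the Python example above, and the exact metadata keys depend on this model's label set.

```python
# Peek at per-label scores stored in the annotation metadata (sketch)
pipelineDF.select("class.metadata").show(truncate=False)
```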
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_foundation_category_funders| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/eric-mc2/distilbert-foundation-category-funders \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_foundation_category_funders_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_foundation_category_funders_pipeline_en.md new file mode 100644 index 00000000000000..dd16125c80ed5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_foundation_category_funders_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_foundation_category_funders_pipeline pipeline DistilBertForSequenceClassification from eric-mc2 +author: John Snow Labs +name: distilbert_foundation_category_funders_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_foundation_category_funders_pipeline` is a English model originally trained by eric-mc2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_foundation_category_funders_pipeline_en_5.5.0_3.0_1727108682112.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_foundation_category_funders_pipeline_en_5.5.0_3.0_1727108682112.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a small example DataFrame
pipeline = PretrainedPipeline("distilbert_foundation_category_funders_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_foundation_category_funders_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_foundation_category_funders_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/eric-mc2/distilbert-foundation-category-funders + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_ft_sst5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_ft_sst5_pipeline_en.md new file mode 100644 index 00000000000000..1ce753585a7eb6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_ft_sst5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_ft_sst5_pipeline pipeline DistilBertForSequenceClassification from pablo-chocobar +author: John Snow Labs +name: distilbert_ft_sst5_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_ft_sst5_pipeline` is a English model originally trained by pablo-chocobar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_ft_sst5_pipeline_en_5.5.0_3.0_1727097146414.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_ft_sst5_pipeline_en_5.5.0_3.0_1727097146414.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a small example DataFrame
pipeline = PretrainedPipeline("distilbert_ft_sst5_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_ft_sst5_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_ft_sst5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/pablo-chocobar/distilbert-ft-sst5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_imdb_decentmakeover13_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_imdb_decentmakeover13_en.md new file mode 100644 index 00000000000000..6661323b0cf8e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_imdb_decentmakeover13_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_imdb_decentmakeover13 DistilBertForSequenceClassification from decentmakeover13 +author: John Snow Labs +name: distilbert_imdb_decentmakeover13 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_decentmakeover13` is a English model originally trained by decentmakeover13. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_decentmakeover13_en_5.5.0_3.0_1727082283749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_decentmakeover13_en_5.5.0_3.0_1727082283749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_decentmakeover13","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_decentmakeover13", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
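
A fitted pipeline can be persisted with standard Spark ML persistence and reloaded later without rebuilding the stages; the path below is only a placeholder, and `pipelineModel` refers to the fitted pipeline from the Python example above.

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline to a placeholder path and load it back (sketch)
pipelineModel.write().overwrite().save("/tmp/distilbert_imdb_pipeline_model")
restored = PipelineModel.load("/tmp/distilbert_imdb_pipeline_model")
```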
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_decentmakeover13| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/decentmakeover13/distilbert-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_imdb_kuma9831_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_imdb_kuma9831_en.md new file mode 100644 index 00000000000000..e4f7958232eed4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_imdb_kuma9831_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_imdb_kuma9831 DistilBertForSequenceClassification from kuma9831 +author: John Snow Labs +name: distilbert_imdb_kuma9831 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_kuma9831` is a English model originally trained by kuma9831. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_kuma9831_en_5.5.0_3.0_1727097248222.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_kuma9831_en_5.5.0_3.0_1727097248222.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_kuma9831","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_kuma9831", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_kuma9831| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kuma9831/distilbert-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_imdb_kuma9831_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_imdb_kuma9831_pipeline_en.md new file mode 100644 index 00000000000000..50202b671e5287 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_imdb_kuma9831_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_imdb_kuma9831_pipeline pipeline DistilBertForSequenceClassification from kuma9831 +author: John Snow Labs +name: distilbert_imdb_kuma9831_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_kuma9831_pipeline` is a English model originally trained by kuma9831. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_kuma9831_pipeline_en_5.5.0_3.0_1727097259922.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_kuma9831_pipeline_en_5.5.0_3.0_1727097259922.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a small example DataFrame
pipeline = PretrainedPipeline("distilbert_imdb_kuma9831_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_imdb_kuma9831_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_kuma9831_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kuma9831/distilbert-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_nbx_all_l_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_nbx_all_l_pipeline_en.md new file mode 100644 index 00000000000000..4023867c09ee48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_nbx_all_l_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_nbx_all_l_pipeline pipeline DistilBertForSequenceClassification from vishnuhaasan +author: John Snow Labs +name: distilbert_nbx_all_l_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_nbx_all_l_pipeline` is a English model originally trained by vishnuhaasan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_nbx_all_l_pipeline_en_5.5.0_3.0_1727073535793.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_nbx_all_l_pipeline_en_5.5.0_3.0_1727073535793.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a small example DataFrame
pipeline = PretrainedPipeline("distilbert_nbx_all_l_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_nbx_all_l_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
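
If the full annotation objects (begin/end offsets and metadata) are needed rather than just strings, pretrained pipelines usually expose a `fullAnnotate` helper; the sketch assumes the `pipeline` created in the Python example above.

```python
# Full annotation output for one text (sketch); includes offsets and metadata
full = pipeline.fullAnnotate("I love spark-nlp")
print(full[0]["class"])
```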
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_nbx_all_l_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/vishnuhaasan/distilbert_nbx_all_l + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_new2_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_new2_en.md new file mode 100644 index 00000000000000..7e6648c7a27b71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_new2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_new2 DistilBertForSequenceClassification from wnic00 +author: John Snow Labs +name: distilbert_new2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_new2` is a English model originally trained by wnic00. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_new2_en_5.5.0_3.0_1727082733068.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_new2_en_5.5.0_3.0_1727082733068.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text, tokenize, then classify with the pretrained DistilBERT model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_new2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_new2", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
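
Downstream jobs often want the prediction as a plain string column rather than an annotation array; the sketch below assumes `pipelineDF` from the Python example above and simply takes the first (and only) predicted label per row.

```python
from pyspark.sql.functions import element_at, col

# Flatten the annotation array into a single string label column (sketch)
labels = pipelineDF.withColumn("predicted_label", element_at(col("class.result"), 1))
labels.select("text", "predicted_label").show(truncate=False)
```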
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_new2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/wnic00/distilbert-new2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_new2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_new2_pipeline_en.md new file mode 100644 index 00000000000000..f9ead7b84cd5a1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_new2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_new2_pipeline pipeline DistilBertForSequenceClassification from wnic00 +author: John Snow Labs +name: distilbert_new2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_new2_pipeline` is a English model originally trained by wnic00. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_new2_pipeline_en_5.5.0_3.0_1727082744930.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_new2_pipeline_en_5.5.0_3.0_1727082744930.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a small example DataFrame
pipeline = PretrainedPipeline("distilbert_new2_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_new2_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_new2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/wnic00/distilbert-new2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_pd_books_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_pd_books_en.md new file mode 100644 index 00000000000000..baff320e758cbb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_pd_books_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_pd_books DistilBertForSequenceClassification from Gaxys +author: John Snow Labs +name: distilbert_pd_books +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_pd_books` is a English model originally trained by Gaxys. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_pd_books_en_5.5.0_3.0_1727087226968.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_pd_books_en_5.5.0_3.0_1727087226968.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# The classifier consumes the "document" and "token" columns produced above.
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_pd_books","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_pd_books", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
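The transformed DataFrame can be inspected directly to see the predicted labels. A minimal sketch, assuming the `pipelineDF` variable from the Python example above and the `class` output column listed under Model Information:

```python
# Each entry in "class" is a Spark NLP annotation; its "result" field holds the label.
pipelineDF.select("text", "class.result").show(truncate=False)
```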
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_pd_books| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gaxys/DistilBERT-PD_Books \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_pd_books_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_pd_books_pipeline_en.md new file mode 100644 index 00000000000000..e16c0516a907f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_pd_books_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_pd_books_pipeline pipeline DistilBertForSequenceClassification from Gaxys +author: John Snow Labs +name: distilbert_pd_books_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_pd_books_pipeline` is a English model originally trained by Gaxys. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_pd_books_pipeline_en_5.5.0_3.0_1727087238320.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_pd_books_pipeline_en_5.5.0_3.0_1727087238320.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("distilbert_pd_books_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_pd_books_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_pd_books_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gaxys/DistilBERT-PD_Books + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_en.md new file mode 100644 index 00000000000000..b5be68090efbed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_en_5.5.0_3.0_1727082200538.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_en_5.5.0_3.0_1727082200538.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# The classifier consumes the "document" and "token" columns produced above.
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|71.6 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_data_aug_cola_256 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_pipeline_en.md new file mode 100644 index 00000000000000..c1cbe420d5f582 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_pipeline_en_5.5.0_3.0_1727082204816.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_pipeline_en_5.5.0_3.0_1727082204816.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|71.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_data_aug_cola_256 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_256_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_256_en.md new file mode 100644 index 00000000000000..db76f79a149c19 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_256_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_256 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_256 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_256` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_256_en_5.5.0_3.0_1727108475119.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_256_en_5.5.0_3.0_1727108475119.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# The classifier consumes the "document" and "token" columns produced above.
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_256","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_256", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_256| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|71.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_data_aug_mrpc_256 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_en.md new file mode 100644 index 00000000000000..18de5c490d0c27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_en_5.5.0_3.0_1727097080143.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_en_5.5.0_3.0_1727097080143.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# The classifier consumes the "document" and "token" columns produced above.
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|111.9 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_qnli_384 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_pipeline_en.md new file mode 100644 index 00000000000000..d11c685f568269 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_pipeline_en_5.5.0_3.0_1727097085462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_pipeline_en_5.5.0_3.0_1727097085462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|111.9 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_qnli_384 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_sentiment_test_2023dec_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sentiment_test_2023dec_en.md new file mode 100644 index 00000000000000..3b65635eaf2eeb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sentiment_test_2023dec_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sentiment_test_2023dec DistilBertForSequenceClassification from FungSung +author: John Snow Labs +name: distilbert_sentiment_test_2023dec +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sentiment_test_2023dec` is a English model originally trained by FungSung. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sentiment_test_2023dec_en_5.5.0_3.0_1727108675899.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sentiment_test_2023dec_en_5.5.0_3.0_1727108675899.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# The classifier consumes the "document" and "token" columns produced above.
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sentiment_test_2023dec","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sentiment_test_2023dec", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sentiment_test_2023dec| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/FungSung/distilBert_sentiment_test_2023DEC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_sentiment_test_2023dec_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sentiment_test_2023dec_pipeline_en.md new file mode 100644 index 00000000000000..072bb340235362 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sentiment_test_2023dec_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sentiment_test_2023dec_pipeline pipeline DistilBertForSequenceClassification from FungSung +author: John Snow Labs +name: distilbert_sentiment_test_2023dec_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sentiment_test_2023dec_pipeline` is a English model originally trained by FungSung. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sentiment_test_2023dec_pipeline_en_5.5.0_3.0_1727108699800.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sentiment_test_2023dec_pipeline_en_5.5.0_3.0_1727108699800.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("distilbert_sentiment_test_2023dec_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_sentiment_test_2023dec_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sentiment_test_2023dec_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/FungSung/distilBert_sentiment_test_2023DEC + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_twitterfin_padding70model_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_twitterfin_padding70model_en.md new file mode 100644 index 00000000000000..2712701fb1312c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_twitterfin_padding70model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_twitterfin_padding70model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_twitterfin_padding70model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_twitterfin_padding70model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding70model_en_5.5.0_3.0_1727059744593.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding70model_en_5.5.0_3.0_1727059744593.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# The classifier consumes the "document" and "token" columns produced above.
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_twitterfin_padding70model","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_twitterfin_padding70model", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_twitterfin_padding70model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_twitterfin_padding70model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_twitterfin_padding70model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_twitterfin_padding70model_pipeline_en.md new file mode 100644 index 00000000000000..545bc323a13a55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_twitterfin_padding70model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_twitterfin_padding70model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_twitterfin_padding70model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_twitterfin_padding70model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding70model_pipeline_en_5.5.0_3.0_1727059757491.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding70model_pipeline_en_5.5.0_3.0_1727059757491.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("distilbert_twitterfin_padding70model_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_twitterfin_padding70model_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_twitterfin_padding70model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_twitterfin_padding70model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distillbert_indiv_vocab_ver4_1_en.md b/docs/_posts/ahmedlone127/2024-09-23-distillbert_indiv_vocab_ver4_1_en.md new file mode 100644 index 00000000000000..bac514a2c10da3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distillbert_indiv_vocab_ver4_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distillbert_indiv_vocab_ver4_1 DistilBertForTokenClassification from AdiShingote +author: John Snow Labs +name: distillbert_indiv_vocab_ver4_1 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distillbert_indiv_vocab_ver4_1` is a English model originally trained by AdiShingote. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distillbert_indiv_vocab_ver4_1_en_5.5.0_3.0_1727120690210.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distillbert_indiv_vocab_ver4_1_en_5.5.0_3.0_1727120690210.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# The token classifier consumes the "document" and "token" columns produced above.
tokenClassifier = DistilBertForTokenClassification.pretrained("distillbert_indiv_vocab_ver4_1","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val tokenClassifier = DistilBertForTokenClassification.pretrained("distillbert_indiv_vocab_ver4_1", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
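If entity chunks are more convenient than per-token tags, a `NerConverter` stage can be appended after the token classifier. A minimal sketch, assuming the variables from the Python example above; the `ner_chunk` column name is illustrative:

```python
from sparknlp.annotator import NerConverter
from pyspark.ml import Pipeline

# Optional post-processing: merge IOB-style token tags into entity chunks.
converter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, converter])
pipeline.fit(data).transform(data).select("ner_chunk.result").show(truncate=False)
```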
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distillbert_indiv_vocab_ver4_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|352.8 MB| + +## References + +https://huggingface.co/AdiShingote/Distillbert-indiv-vocab-ver4.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distillbert_indiv_vocab_ver4_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distillbert_indiv_vocab_ver4_1_pipeline_en.md new file mode 100644 index 00000000000000..c693fce390488b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distillbert_indiv_vocab_ver4_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distillbert_indiv_vocab_ver4_1_pipeline pipeline DistilBertForTokenClassification from AdiShingote +author: John Snow Labs +name: distillbert_indiv_vocab_ver4_1_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distillbert_indiv_vocab_ver4_1_pipeline` is a English model originally trained by AdiShingote. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distillbert_indiv_vocab_ver4_1_pipeline_en_5.5.0_3.0_1727120706053.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distillbert_indiv_vocab_ver4_1_pipeline_en_5.5.0_3.0_1727120706053.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("distillbert_indiv_vocab_ver4_1_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distillbert_indiv_vocab_ver4_1_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distillbert_indiv_vocab_ver4_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|352.8 MB| + +## References + +https://huggingface.co/AdiShingote/Distillbert-indiv-vocab-ver4.1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_fitness_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_fitness_en.md new file mode 100644 index 00000000000000..29ef3a392dd9e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_fitness_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_ft_fitness RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_fitness +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_fitness` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_fitness_en_5.5.0_3.0_1727121703017.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_fitness_en_5.5.0_3.0_1727121703017.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_fitness","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_fitness","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_fitness| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-Fitness \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_fitness_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_fitness_pipeline_en.md new file mode 100644 index 00000000000000..19b34ba14d8203 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_fitness_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_ft_fitness_pipeline pipeline RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_fitness_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_fitness_pipeline` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_fitness_pipeline_en_5.5.0_3.0_1727121716644.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_fitness_pipeline_en_5.5.0_3.0_1727121716644.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("distilroberta_base_ft_fitness_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilroberta_base_ft_fitness_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_fitness_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-Fitness + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_gaming_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_gaming_en.md new file mode 100644 index 00000000000000..4c7975eb66fe55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_gaming_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_ft_gaming RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_gaming +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_gaming` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_gaming_en_5.5.0_3.0_1727065957320.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_gaming_en_5.5.0_3.0_1727065957320.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_gaming","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_gaming","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_gaming| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-gaming \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_gaming_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_gaming_pipeline_en.md new file mode 100644 index 00000000000000..6196a3b3052b21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_gaming_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_ft_gaming_pipeline pipeline RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_gaming_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_gaming_pipeline` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_gaming_pipeline_en_5.5.0_3.0_1727065971483.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_gaming_pipeline_en_5.5.0_3.0_1727065971483.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("distilroberta_base_ft_gaming_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilroberta_base_ft_gaming_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_gaming_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-gaming + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_model_transcript_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_model_transcript_en.md new file mode 100644 index 00000000000000..22666156d897b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_model_transcript_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_model_transcript RoBertaEmbeddings from mahaamami +author: John Snow Labs +name: distilroberta_base_model_transcript +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_model_transcript` is a English model originally trained by mahaamami. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_model_transcript_en_5.5.0_3.0_1727065862431.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_model_transcript_en_5.5.0_3.0_1727065862431.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_model_transcript","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_model_transcript","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_model_transcript| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/mahaamami/distilroberta-base-model-transcript \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilroberta_pr200k_ep20_reuters_bloomberg_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_pr200k_ep20_reuters_bloomberg_en.md new file mode 100644 index 00000000000000..2d33cc45694392 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_pr200k_ep20_reuters_bloomberg_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_pr200k_ep20_reuters_bloomberg RoBertaEmbeddings from judy93536 +author: John Snow Labs +name: distilroberta_pr200k_ep20_reuters_bloomberg +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_pr200k_ep20_reuters_bloomberg` is a English model originally trained by judy93536. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_pr200k_ep20_reuters_bloomberg_en_5.5.0_3.0_1727091853346.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_pr200k_ep20_reuters_bloomberg_en_5.5.0_3.0_1727091853346.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_pr200k_ep20_reuters_bloomberg","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_pr200k_ep20_reuters_bloomberg","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
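To feed the token embeddings into downstream Spark ML stages, an `EmbeddingsFinisher` can convert the annotations into plain vectors. A minimal sketch, assuming the variables from the Python example above; the `finished_embeddings` column name is illustrative:

```python
from sparknlp.base import EmbeddingsFinisher
from pyspark.ml import Pipeline

# Convert Spark NLP embedding annotations into Spark ML vectors.
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols("finished_embeddings") \
    .setOutputAsVector(True)

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, finisher])
result = pipeline.fit(data).transform(data)
result.selectExpr("explode(finished_embeddings) as embedding_vector").show(5)
```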
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_pr200k_ep20_reuters_bloomberg| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.4 MB| + +## References + +https://huggingface.co/judy93536/distilroberta-pr200k-ep20-reuters-bloomberg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilroberta_pr200k_ep20_reuters_bloomberg_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_pr200k_ep20_reuters_bloomberg_pipeline_en.md new file mode 100644 index 00000000000000..e06422c59196d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_pr200k_ep20_reuters_bloomberg_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_pr200k_ep20_reuters_bloomberg_pipeline pipeline RoBertaEmbeddings from judy93536 +author: John Snow Labs +name: distilroberta_pr200k_ep20_reuters_bloomberg_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_pr200k_ep20_reuters_bloomberg_pipeline` is a English model originally trained by judy93536. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_pr200k_ep20_reuters_bloomberg_pipeline_en_5.5.0_3.0_1727091868200.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_pr200k_ep20_reuters_bloomberg_pipeline_en_5.5.0_3.0_1727091868200.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column.
pipeline = PretrainedPipeline("distilroberta_pr200k_ep20_reuters_bloomberg_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilroberta_pr200k_ep20_reuters_bloomberg_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_pr200k_ep20_reuters_bloomberg_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.4 MB| + +## References + +https://huggingface.co/judy93536/distilroberta-pr200k-ep20-reuters-bloomberg + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilroberta_rb156k_opt15_ep20_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_rb156k_opt15_ep20_pipeline_en.md new file mode 100644 index 00000000000000..49ea282875f861 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_rb156k_opt15_ep20_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_rb156k_opt15_ep20_pipeline pipeline RoBertaEmbeddings from judy93536 +author: John Snow Labs +name: distilroberta_rb156k_opt15_ep20_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_rb156k_opt15_ep20_pipeline` is a English model originally trained by judy93536. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_rb156k_opt15_ep20_pipeline_en_5.5.0_3.0_1727092014779.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_rb156k_opt15_ep20_pipeline_en_5.5.0_3.0_1727092014779.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame with the input documents in a "text" column
pipeline = PretrainedPipeline("distilroberta_rb156k_opt15_ep20_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame with the input documents in a "text" column
val pipeline = new PretrainedPipeline("distilroberta_rb156k_opt15_ep20_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_rb156k_opt15_ep20_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.0 MB| + +## References + +https://huggingface.co/judy93536/distilroberta-rb156k-opt15-ep20 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-dp_roberta_large_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-23-dp_roberta_large_finetuned_en.md new file mode 100644 index 00000000000000..b3592bbe8ca64a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-dp_roberta_large_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dp_roberta_large_finetuned RoBertaForSequenceClassification from GRMenon +author: John Snow Labs +name: dp_roberta_large_finetuned +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dp_roberta_large_finetuned` is a English model originally trained by GRMenon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dp_roberta_large_finetuned_en_5.5.0_3.0_1727085991811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dp_roberta_large_finetuned_en_5.5.0_3.0_1727085991811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("dp_roberta_large_finetuned","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("dp_roberta_large_finetuned", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
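
Once the pipeline has run, the predicted label for each row is stored in the `result` field of the annotations in the `class` output column defined above. A minimal sketch for pulling it out:

```python
# Show each input text next to its predicted class label.
pipelineDF.select("text", "class.result").show(truncate=False)
```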
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dp_roberta_large_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/GRMenon/dp-roberta-large-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-dp_roberta_large_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-dp_roberta_large_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..fa15ba9d299395 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-dp_roberta_large_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dp_roberta_large_finetuned_pipeline pipeline RoBertaForSequenceClassification from GRMenon +author: John Snow Labs +name: dp_roberta_large_finetuned_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dp_roberta_large_finetuned_pipeline` is a English model originally trained by GRMenon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dp_roberta_large_finetuned_pipeline_en_5.5.0_3.0_1727086075857.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dp_roberta_large_finetuned_pipeline_en_5.5.0_3.0_1727086075857.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame with the input documents in a "text" column
pipeline = PretrainedPipeline("dp_roberta_large_finetuned_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame with the input documents in a "text" column
val pipeline = new PretrainedPipeline("dp_roberta_large_finetuned_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dp_roberta_large_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/GRMenon/dp-roberta-large-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-efficient_mlm_m0_40_finetuned_cola_en.md b/docs/_posts/ahmedlone127/2024-09-23-efficient_mlm_m0_40_finetuned_cola_en.md new file mode 100644 index 00000000000000..e5417728713839 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-efficient_mlm_m0_40_finetuned_cola_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English efficient_mlm_m0_40_finetuned_cola RoBertaForSequenceClassification from QGXQ +author: John Snow Labs +name: efficient_mlm_m0_40_finetuned_cola +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`efficient_mlm_m0_40_finetuned_cola` is a English model originally trained by QGXQ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/efficient_mlm_m0_40_finetuned_cola_en_5.5.0_3.0_1727055277613.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/efficient_mlm_m0_40_finetuned_cola_en_5.5.0_3.0_1727055277613.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("efficient_mlm_m0_40_finetuned_cola","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("efficient_mlm_m0_40_finetuned_cola", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|efficient_mlm_m0_40_finetuned_cola| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/QGXQ/efficient_mlm_m0.40-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-efficient_mlm_m0_40_finetuned_cola_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-efficient_mlm_m0_40_finetuned_cola_pipeline_en.md new file mode 100644 index 00000000000000..70f64e6434f767 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-efficient_mlm_m0_40_finetuned_cola_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English efficient_mlm_m0_40_finetuned_cola_pipeline pipeline RoBertaForSequenceClassification from QGXQ +author: John Snow Labs +name: efficient_mlm_m0_40_finetuned_cola_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`efficient_mlm_m0_40_finetuned_cola_pipeline` is a English model originally trained by QGXQ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/efficient_mlm_m0_40_finetuned_cola_pipeline_en_5.5.0_3.0_1727055342728.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/efficient_mlm_m0_40_finetuned_cola_pipeline_en_5.5.0_3.0_1727055342728.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame with the input documents in a "text" column
pipeline = PretrainedPipeline("efficient_mlm_m0_40_finetuned_cola_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame with the input documents in a "text" column
val pipeline = new PretrainedPipeline("efficient_mlm_m0_40_finetuned_cola_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|efficient_mlm_m0_40_finetuned_cola_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/QGXQ/efficient_mlm_m0.40-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ellis_v2_emotion_leadership_multi_label_en.md b/docs/_posts/ahmedlone127/2024-09-23-ellis_v2_emotion_leadership_multi_label_en.md new file mode 100644 index 00000000000000..1905103958f5e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ellis_v2_emotion_leadership_multi_label_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ellis_v2_emotion_leadership_multi_label DistilBertForSequenceClassification from gsl22 +author: John Snow Labs +name: ellis_v2_emotion_leadership_multi_label +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ellis_v2_emotion_leadership_multi_label` is a English model originally trained by gsl22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ellis_v2_emotion_leadership_multi_label_en_5.5.0_3.0_1727082268801.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ellis_v2_emotion_leadership_multi_label_en_5.5.0_3.0_1727082268801.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("ellis_v2_emotion_leadership_multi_label","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("ellis_v2_emotion_leadership_multi_label", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ellis_v2_emotion_leadership_multi_label| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gsl22/ellis-v2-emotion-leadership-multi-label \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ellis_v2_emotion_leadership_multi_label_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-ellis_v2_emotion_leadership_multi_label_pipeline_en.md new file mode 100644 index 00000000000000..19cc838667dadd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ellis_v2_emotion_leadership_multi_label_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ellis_v2_emotion_leadership_multi_label_pipeline pipeline DistilBertForSequenceClassification from gsl22 +author: John Snow Labs +name: ellis_v2_emotion_leadership_multi_label_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ellis_v2_emotion_leadership_multi_label_pipeline` is a English model originally trained by gsl22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ellis_v2_emotion_leadership_multi_label_pipeline_en_5.5.0_3.0_1727082282019.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ellis_v2_emotion_leadership_multi_label_pipeline_en_5.5.0_3.0_1727082282019.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame with the input documents in a "text" column
pipeline = PretrainedPipeline("ellis_v2_emotion_leadership_multi_label_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame with the input documents in a "text" column
val pipeline = new PretrainedPipeline("ellis_v2_emotion_leadership_multi_label_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ellis_v2_emotion_leadership_multi_label_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gsl22/ellis-v2-emotion-leadership-multi-label + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-emotions_roberta_test_4_en.md b/docs/_posts/ahmedlone127/2024-09-23-emotions_roberta_test_4_en.md new file mode 100644 index 00000000000000..237adc198063e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-emotions_roberta_test_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emotions_roberta_test_4 RoBertaForSequenceClassification from Zeyu2000 +author: John Snow Labs +name: emotions_roberta_test_4 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotions_roberta_test_4` is a English model originally trained by Zeyu2000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotions_roberta_test_4_en_5.5.0_3.0_1727054829063.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotions_roberta_test_4_en_5.5.0_3.0_1727054829063.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("emotions_roberta_test_4","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("emotions_roberta_test_4", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotions_roberta_test_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|450.3 MB| + +## References + +https://huggingface.co/Zeyu2000/emotions-roberta-test-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-emotions_roberta_test_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-emotions_roberta_test_4_pipeline_en.md new file mode 100644 index 00000000000000..46e76a321683e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-emotions_roberta_test_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emotions_roberta_test_4_pipeline pipeline RoBertaForSequenceClassification from Zeyu2000 +author: John Snow Labs +name: emotions_roberta_test_4_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotions_roberta_test_4_pipeline` is a English model originally trained by Zeyu2000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotions_roberta_test_4_pipeline_en_5.5.0_3.0_1727054851569.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotions_roberta_test_4_pipeline_en_5.5.0_3.0_1727054851569.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame with the input documents in a "text" column
pipeline = PretrainedPipeline("emotions_roberta_test_4_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame with the input documents in a "text" column
val pipeline = new PretrainedPipeline("emotions_roberta_test_4_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotions_roberta_test_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|450.3 MB| + +## References + +https://huggingface.co/Zeyu2000/emotions-roberta-test-4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-enlm_roberta_imdb_final_en.md b/docs/_posts/ahmedlone127/2024-09-23-enlm_roberta_imdb_final_en.md new file mode 100644 index 00000000000000..1b29a2cd53df48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-enlm_roberta_imdb_final_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English enlm_roberta_imdb_final XlmRoBertaForSequenceClassification from manirai91 +author: John Snow Labs +name: enlm_roberta_imdb_final +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`enlm_roberta_imdb_final` is a English model originally trained by manirai91. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/enlm_roberta_imdb_final_en_5.5.0_3.0_1727088093590.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/enlm_roberta_imdb_final_en_5.5.0_3.0_1727088093590.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("enlm_roberta_imdb_final","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("enlm_roberta_imdb_final", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
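
For quick ad-hoc scoring of a few strings, a `LightPipeline` built from the fitted model avoids the DataFrame round trip. A small sketch reusing the `pipelineModel` fitted above (the sample sentence is illustrative):

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# Returns a dict keyed by output column, including the "class" labels defined above.
print(light.annotate("I love spark-nlp"))
```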
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|enlm_roberta_imdb_final| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|466.5 MB| + +## References + +https://huggingface.co/manirai91/enlm-roberta-imdb-final \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-enlm_roberta_imdb_final_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-enlm_roberta_imdb_final_pipeline_en.md new file mode 100644 index 00000000000000..d6a99bef795f16 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-enlm_roberta_imdb_final_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English enlm_roberta_imdb_final_pipeline pipeline XlmRoBertaForSequenceClassification from manirai91 +author: John Snow Labs +name: enlm_roberta_imdb_final_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`enlm_roberta_imdb_final_pipeline` is a English model originally trained by manirai91. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/enlm_roberta_imdb_final_pipeline_en_5.5.0_3.0_1727088120024.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/enlm_roberta_imdb_final_pipeline_en_5.5.0_3.0_1727088120024.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame with the input documents in a "text" column
pipeline = PretrainedPipeline("enlm_roberta_imdb_final_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame with the input documents in a "text" column
val pipeline = new PretrainedPipeline("enlm_roberta_imdb_final_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|enlm_roberta_imdb_final_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.6 MB| + +## References + +https://huggingface.co/manirai91/enlm-roberta-imdb-final + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ensemble_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-23-ensemble_roberta_en.md new file mode 100644 index 00000000000000..24f608a4499512 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ensemble_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ensemble_roberta RoBertaForSequenceClassification from Crayo1902 +author: John Snow Labs +name: ensemble_roberta +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ensemble_roberta` is a English model originally trained by Crayo1902. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ensemble_roberta_en_5.5.0_3.0_1727135664732.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ensemble_roberta_en_5.5.0_3.0_1727135664732.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("ensemble_roberta","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("ensemble_roberta", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ensemble_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|430.7 MB| + +## References + +https://huggingface.co/Crayo1902/ensemble-roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ensemble_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-ensemble_roberta_pipeline_en.md new file mode 100644 index 00000000000000..4a8b4e4958135a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ensemble_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ensemble_roberta_pipeline pipeline RoBertaForSequenceClassification from Crayo1902 +author: John Snow Labs +name: ensemble_roberta_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ensemble_roberta_pipeline` is a English model originally trained by Crayo1902. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ensemble_roberta_pipeline_en_5.5.0_3.0_1727135692542.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ensemble_roberta_pipeline_en_5.5.0_3.0_1727135692542.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame with the input documents in a "text" column
pipeline = PretrainedPipeline("ensemble_roberta_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame with the input documents in a "text" column
val pipeline = new PretrainedPipeline("ensemble_roberta_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ensemble_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|430.7 MB| + +## References + +https://huggingface.co/Crayo1902/ensemble-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-environmentalbert_forest_en.md b/docs/_posts/ahmedlone127/2024-09-23-environmentalbert_forest_en.md new file mode 100644 index 00000000000000..bb74f8e63682c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-environmentalbert_forest_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English environmentalbert_forest RoBertaForSequenceClassification from ESGBERT +author: John Snow Labs +name: environmentalbert_forest +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`environmentalbert_forest` is a English model originally trained by ESGBERT. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/environmentalbert_forest_en_5.5.0_3.0_1727135072808.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/environmentalbert_forest_en_5.5.0_3.0_1727135072808.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("environmentalbert_forest","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("environmentalbert_forest", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|environmentalbert_forest| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/ESGBERT/EnvironmentalBERT-forest \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-environmentalbert_forest_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-environmentalbert_forest_pipeline_en.md new file mode 100644 index 00000000000000..9db6a3b81f56d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-environmentalbert_forest_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English environmentalbert_forest_pipeline pipeline RoBertaForSequenceClassification from ESGBERT +author: John Snow Labs +name: environmentalbert_forest_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`environmentalbert_forest_pipeline` is a English model originally trained by ESGBERT. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/environmentalbert_forest_pipeline_en_5.5.0_3.0_1727135088453.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/environmentalbert_forest_pipeline_en_5.5.0_3.0_1727135088453.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame with the input documents in a "text" column
pipeline = PretrainedPipeline("environmentalbert_forest_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame with the input documents in a "text" column
val pipeline = new PretrainedPipeline("environmentalbert_forest_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|environmentalbert_forest_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/ESGBERT/EnvironmentalBERT-forest + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-fake_news_en.md b/docs/_posts/ahmedlone127/2024-09-23-fake_news_en.md new file mode 100644 index 00000000000000..5fdc41856f5811 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-fake_news_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fake_news DistilBertForSequenceClassification from nlp-godfathers +author: John Snow Labs +name: fake_news +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fake_news` is a English model originally trained by nlp-godfathers. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fake_news_en_5.5.0_3.0_1727082217772.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fake_news_en_5.5.0_3.0_1727082217772.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("fake_news","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("fake_news", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fake_news| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.2 MB| + +## References + +https://huggingface.co/nlp-godfathers/fake_news \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-fake_news_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-fake_news_pipeline_en.md new file mode 100644 index 00000000000000..786ca13f902edf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-fake_news_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fake_news_pipeline pipeline DistilBertForSequenceClassification from nlp-godfathers +author: John Snow Labs +name: fake_news_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fake_news_pipeline` is a English model originally trained by nlp-godfathers. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fake_news_pipeline_en_5.5.0_3.0_1727082229446.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fake_news_pipeline_en_5.5.0_3.0_1727082229446.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame with the input documents in a "text" column
pipeline = PretrainedPipeline("fake_news_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame with the input documents in a "text" column
val pipeline = new PretrainedPipeline("fake_news_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fake_news_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.3 MB| + +## References + +https://huggingface.co/nlp-godfathers/fake_news + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-financial_phrasebank_fulltraindata_8020split_en.md b/docs/_posts/ahmedlone127/2024-09-23-financial_phrasebank_fulltraindata_8020split_en.md new file mode 100644 index 00000000000000..1a970bf5ac722b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-financial_phrasebank_fulltraindata_8020split_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English financial_phrasebank_fulltraindata_8020split RoBertaForSequenceClassification from kruthof +author: John Snow Labs +name: financial_phrasebank_fulltraindata_8020split +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`financial_phrasebank_fulltraindata_8020split` is a English model originally trained by kruthof. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/financial_phrasebank_fulltraindata_8020split_en_5.5.0_3.0_1727135422668.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/financial_phrasebank_fulltraindata_8020split_en_5.5.0_3.0_1727135422668.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("financial_phrasebank_fulltraindata_8020split","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("financial_phrasebank_fulltraindata_8020split", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|financial_phrasebank_fulltraindata_8020split| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|435.1 MB| + +## References + +https://huggingface.co/kruthof/financial_phrasebank_fullTrainData_8020split \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-financial_phrasebank_fulltraindata_8020split_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-financial_phrasebank_fulltraindata_8020split_pipeline_en.md new file mode 100644 index 00000000000000..a9ca6f10ff88c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-financial_phrasebank_fulltraindata_8020split_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English financial_phrasebank_fulltraindata_8020split_pipeline pipeline RoBertaForSequenceClassification from kruthof +author: John Snow Labs +name: financial_phrasebank_fulltraindata_8020split_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`financial_phrasebank_fulltraindata_8020split_pipeline` is a English model originally trained by kruthof. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/financial_phrasebank_fulltraindata_8020split_pipeline_en_5.5.0_3.0_1727135453320.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/financial_phrasebank_fulltraindata_8020split_pipeline_en_5.5.0_3.0_1727135453320.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame with the input documents in a "text" column
pipeline = PretrainedPipeline("financial_phrasebank_fulltraindata_8020split_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame with the input documents in a "text" column
val pipeline = new PretrainedPipeline("financial_phrasebank_fulltraindata_8020split_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|financial_phrasebank_fulltraindata_8020split_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|435.1 MB| + +## References + +https://huggingface.co/kruthof/financial_phrasebank_fullTrainData_8020split + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-fine_tune_whisper_small_sania67_en.md b/docs/_posts/ahmedlone127/2024-09-23-fine_tune_whisper_small_sania67_en.md new file mode 100644 index 00000000000000..1ff2d57dc8be5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-fine_tune_whisper_small_sania67_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English fine_tune_whisper_small_sania67 WhisperForCTC from Sania67 +author: John Snow Labs +name: fine_tune_whisper_small_sania67 +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tune_whisper_small_sania67` is a English model originally trained by Sania67. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tune_whisper_small_sania67_en_5.5.0_3.0_1727076190927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tune_whisper_small_sania67_en_5.5.0_3.0_1727076190927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# data: a DataFrame with an "audio_content" column holding the raw audio samples as an array of floats
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("fine_tune_whisper_small_sania67","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// data: a DataFrame with an "audio_content" column holding the raw audio samples as an array of floats
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("fine_tune_whisper_small_sania67", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
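
The examples above assume `data` already holds the raw audio. A minimal sketch of how such a DataFrame could be built, assuming `librosa` is installed and `sample.wav` stands in for a real recording (Whisper models expect 16 kHz mono audio):

```python
import librosa

# Load the waveform as a float array resampled to 16 kHz.
audio, sr = librosa.load("sample.wav", sr=16000)  # "sample.wav" is a hypothetical file
data = spark.createDataFrame([[audio.tolist()]]).toDF("audio_content")
```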
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tune_whisper_small_sania67| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Sania67/Fine_tune_whisper_small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-fine_tuned_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-fine_tuned_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..2050ec673e31d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-fine_tuned_roberta_base_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English fine_tuned_roberta_base_pipeline pipeline BertForQuestionAnswering from kiwakwok +author: John Snow Labs +name: fine_tuned_roberta_base_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_roberta_base_pipeline` is a English model originally trained by kiwakwok. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_roberta_base_pipeline_en_5.5.0_3.0_1727106733404.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_roberta_base_pipeline_en_5.5.0_3.0_1727106733404.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is a DataFrame with the question and context text columns this question answering pipeline expects
pipeline = PretrainedPipeline("fine_tuned_roberta_base_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is a DataFrame with the question and context text columns this question answering pipeline expects
val pipeline = new PretrainedPipeline("fine_tuned_roberta_base_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
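
Because this pipeline starts with a MultiDocumentAssembler, its input DataFrame needs two text columns rather than one. A sketch under the assumption that the assembler reads the usual `question` and `context` columns (check the pipeline's stages if the configured names differ):

```python
df = spark.createDataFrame(
    [["What does Spark NLP provide?", "Spark NLP provides production-grade NLP annotators on Apache Spark."]]
).toDF("question", "context")

annotations = pipeline.transform(df)
annotations.printSchema()  # the answers appear in the question answering stage's output column
```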
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|382.3 MB| + +## References + +https://huggingface.co/kiwakwok/fine-tuned-roberta-base + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-fine_tuning_en.md b/docs/_posts/ahmedlone127/2024-09-23-fine_tuning_en.md new file mode 100644 index 00000000000000..c175317a51d20f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-fine_tuning_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fine_tuning DistilBertForSequenceClassification from StevensRV93 +author: John Snow Labs +name: fine_tuning +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuning` is a English model originally trained by StevensRV93. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuning_en_5.5.0_3.0_1727093913733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuning_en_5.5.0_3.0_1727093913733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("fine_tuning","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("fine_tuning", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
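
After the transform above, the predictions are stored as Spark NLP annotations in the `class` column set by the classifier; the label itself sits in the nested `result` field. A quick way to inspect it, assuming the `pipelineDF` DataFrame from the Python example:

```python
# "class.result" holds the predicted label for each input row.
pipelineDF.select("text", "class.result").show(truncate=False)
```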
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuning| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/StevensRV93/Fine_tuning \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_en.md new file mode 100644 index 00000000000000..c96c44b715e302 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner RoBertaForTokenClassification from manucos +author: John Snow Labs +name: finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner` is a English model originally trained by manucos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_en_5.5.0_3.0_1727081601983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_en_5.5.0_3.0_1727081601983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
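
The token classifier above emits one tag per token in the `ner` column. If you want entities grouped into chunks rather than per-token tags, Spark NLP's `NerConverter` can usually be appended to the same pipeline. A brief sketch, assuming the `documentAssembler`, `tokenizer`, `tokenClassifier`, and `data` objects defined in the Python example:

```python
from sparknlp.annotator import NerConverter

# Groups consecutive B-/I- tags from the "ner" column into entity chunks.
converter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

chunk_pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, converter])
chunk_pipeline.fit(data).transform(data).select("ner_chunk.result").show(truncate=False)
```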
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|469.7 MB| + +## References + +https://huggingface.co/manucos/finetuned__roberta-clinical-wl-es__augmented-ultrasounds-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_pipeline_en.md new file mode 100644 index 00000000000000..801c516e3ea8ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_pipeline pipeline RoBertaForTokenClassification from manucos +author: John Snow Labs +name: finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_pipeline` is a English model originally trained by manucos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_pipeline_en_5.5.0_3.0_1727081628091.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_pipeline_en_5.5.0_3.0_1727081628091.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an existing Spark DataFrame that already contains the input column(s) this pipeline expects
pipeline = PretrainedPipeline("finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an existing Spark DataFrame that already contains the input column(s) this pipeline expects
val pipeline = new PretrainedPipeline("finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
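
For quick experiments on a handful of strings, the downloaded pipeline can also be called directly without building a DataFrame. A minimal sketch, assuming the `pipeline` object from the Python snippet above; the dictionary keys mirror the pipeline's internal output column names, so it is worth listing them first:

```python
# annotate() runs the full pipeline on a single string and returns a dict of output columns.
annotations = pipeline.annotate("El paciente presenta una lesión hepática en la ecografía.")
print(annotations.keys())  # inspect which output columns this pipeline exposes
print(annotations)
```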
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|469.8 MB| + +## References + +https://huggingface.co/manucos/finetuned__roberta-clinical-wl-es__augmented-ultrasounds-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuned_bio_clinicalbert_2012i2b2_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuned_bio_clinicalbert_2012i2b2_en.md new file mode 100644 index 00000000000000..e93f7621f1f101 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuned_bio_clinicalbert_2012i2b2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_bio_clinicalbert_2012i2b2 BertForTokenClassification from xiaojingduan +author: John Snow Labs +name: finetuned_bio_clinicalbert_2012i2b2 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_bio_clinicalbert_2012i2b2` is a English model originally trained by xiaojingduan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_bio_clinicalbert_2012i2b2_en_5.5.0_3.0_1727060787463.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_bio_clinicalbert_2012i2b2_en_5.5.0_3.0_1727060787463.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("finetuned_bio_clinicalbert_2012i2b2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("finetuned_bio_clinicalbert_2012i2b2", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_bio_clinicalbert_2012i2b2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.3 MB| + +## References + +https://huggingface.co/xiaojingduan/finetuned_bio_clinicalbert_2012i2b2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuned_bio_clinicalbert_2012i2b2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuned_bio_clinicalbert_2012i2b2_pipeline_en.md new file mode 100644 index 00000000000000..945d0e508f7bd4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuned_bio_clinicalbert_2012i2b2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_bio_clinicalbert_2012i2b2_pipeline pipeline BertForTokenClassification from xiaojingduan +author: John Snow Labs +name: finetuned_bio_clinicalbert_2012i2b2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_bio_clinicalbert_2012i2b2_pipeline` is a English model originally trained by xiaojingduan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_bio_clinicalbert_2012i2b2_pipeline_en_5.5.0_3.0_1727060807463.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_bio_clinicalbert_2012i2b2_pipeline_en_5.5.0_3.0_1727060807463.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an existing Spark DataFrame that already contains the input column(s) this pipeline expects
pipeline = PretrainedPipeline("finetuned_bio_clinicalbert_2012i2b2_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an existing Spark DataFrame that already contains the input column(s) this pipeline expects
val pipeline = new PretrainedPipeline("finetuned_bio_clinicalbert_2012i2b2_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_bio_clinicalbert_2012i2b2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.4 MB| + +## References + +https://huggingface.co/xiaojingduan/finetuned_bio_clinicalbert_2012i2b2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_imsoumyaneel_25k_epoch_10_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_imsoumyaneel_25k_epoch_10_en.md new file mode 100644 index 00000000000000..83dc09fdfb9fc6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_imsoumyaneel_25k_epoch_10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_model_imsoumyaneel_25k_epoch_10 DistilBertForSequenceClassification from rahulgaikwad007 +author: John Snow Labs +name: finetuned_model_imsoumyaneel_25k_epoch_10 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_model_imsoumyaneel_25k_epoch_10` is a English model originally trained by rahulgaikwad007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_model_imsoumyaneel_25k_epoch_10_en_5.5.0_3.0_1727086875704.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_model_imsoumyaneel_25k_epoch_10_en_5.5.0_3.0_1727086875704.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_model_imsoumyaneel_25k_epoch_10","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_model_imsoumyaneel_25k_epoch_10", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
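
For low-latency inference on individual texts, the fitted model can be wrapped in a `LightPipeline` instead of going through a DataFrame. A short sketch, assuming the `pipelineModel` fitted in the Python example above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# Returns a dict keyed by the pipeline's output columns; "class" holds the predicted label.
print(light.annotate("I love spark-nlp")["class"])
```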
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_model_imsoumyaneel_25k_epoch_10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rahulgaikwad007/Finetuned-model-imsoumyaneel-25k-Epoch-10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_imsoumyaneel_25k_epoch_10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_imsoumyaneel_25k_epoch_10_pipeline_en.md new file mode 100644 index 00000000000000..afd44c39985dd0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_imsoumyaneel_25k_epoch_10_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_model_imsoumyaneel_25k_epoch_10_pipeline pipeline DistilBertForSequenceClassification from rahulgaikwad007 +author: John Snow Labs +name: finetuned_model_imsoumyaneel_25k_epoch_10_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_model_imsoumyaneel_25k_epoch_10_pipeline` is a English model originally trained by rahulgaikwad007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_model_imsoumyaneel_25k_epoch_10_pipeline_en_5.5.0_3.0_1727086887236.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_model_imsoumyaneel_25k_epoch_10_pipeline_en_5.5.0_3.0_1727086887236.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an existing Spark DataFrame that already contains the input column(s) this pipeline expects
pipeline = PretrainedPipeline("finetuned_model_imsoumyaneel_25k_epoch_10_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an existing Spark DataFrame that already contains the input column(s) this pipeline expects
val pipeline = new PretrainedPipeline("finetuned_model_imsoumyaneel_25k_epoch_10_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_model_imsoumyaneel_25k_epoch_10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rahulgaikwad007/Finetuned-model-imsoumyaneel-25k-Epoch-10 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_on_4k_samples_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_on_4k_samples_en.md new file mode 100644 index 00000000000000..f38bb4d72abf73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_on_4k_samples_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_model_on_4k_samples DistilBertForSequenceClassification from Wolverine001 +author: John Snow Labs +name: finetuned_model_on_4k_samples +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_model_on_4k_samples` is a English model originally trained by Wolverine001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_model_on_4k_samples_en_5.5.0_3.0_1727059646912.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_model_on_4k_samples_en_5.5.0_3.0_1727059646912.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_model_on_4k_samples","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_model_on_4k_samples", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
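
Beyond the predicted label, each classification annotation carries additional information in its metadata, which is useful when you need a confidence threshold. A small sketch over the `pipelineDF` from the Python example; the exact metadata keys depend on the model's label names, so this only inspects them:

```python
from pyspark.sql import functions as F

# Each element of the "class" column is an annotation; its metadata maps label names to scores.
pipelineDF.select(F.explode("class").alias("ann")) \
    .select(F.col("ann.result").alias("label"), F.col("ann.metadata").alias("scores")) \
    .show(truncate=False)
```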
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_model_on_4k_samples| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Wolverine001/finetuned_model_on-4k-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_on_4k_samples_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_on_4k_samples_pipeline_en.md new file mode 100644 index 00000000000000..06a3f249375fa8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_on_4k_samples_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_model_on_4k_samples_pipeline pipeline DistilBertForSequenceClassification from Wolverine001 +author: John Snow Labs +name: finetuned_model_on_4k_samples_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_model_on_4k_samples_pipeline` is a English model originally trained by Wolverine001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_model_on_4k_samples_pipeline_en_5.5.0_3.0_1727059659123.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_model_on_4k_samples_pipeline_en_5.5.0_3.0_1727059659123.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an existing Spark DataFrame that already contains the input column(s) this pipeline expects
pipeline = PretrainedPipeline("finetuned_model_on_4k_samples_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an existing Spark DataFrame that already contains the input column(s) this pipeline expects
val pipeline = new PretrainedPipeline("finetuned_model_on_4k_samples_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_model_on_4k_samples_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Wolverine001/finetuned_model_on-4k-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuned_sentiment_modell_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuned_sentiment_modell_pipeline_en.md new file mode 100644 index 00000000000000..5a9aa911b6c056 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuned_sentiment_modell_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_sentiment_modell_pipeline pipeline XlmRoBertaForSequenceClassification from Justin-J +author: John Snow Labs +name: finetuned_sentiment_modell_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_sentiment_modell_pipeline` is a English model originally trained by Justin-J. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_sentiment_modell_pipeline_en_5.5.0_3.0_1727126220838.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_sentiment_modell_pipeline_en_5.5.0_3.0_1727126220838.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an existing Spark DataFrame that already contains the input column(s) this pipeline expects
pipeline = PretrainedPipeline("finetuned_sentiment_modell_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an existing Spark DataFrame that already contains the input column(s) this pipeline expects
val pipeline = new PretrainedPipeline("finetuned_sentiment_modell_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_sentiment_modell_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Justin-J/finetuned_sentiment_modell + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_emotion_model_eric313_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_emotion_model_eric313_en.md new file mode 100644 index 00000000000000..03c2b24faf0b16 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_emotion_model_eric313_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_emotion_model_eric313 DistilBertForSequenceClassification from Eric313 +author: John Snow Labs +name: finetuning_emotion_model_eric313 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_emotion_model_eric313` is a English model originally trained by Eric313. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_emotion_model_eric313_en_5.5.0_3.0_1727074054564.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_emotion_model_eric313_en_5.5.0_3.0_1727074054564.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_emotion_model_eric313","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_emotion_model_eric313", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_emotion_model_eric313| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Eric313/finetuning-emotion-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_emotion_model_eric313_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_emotion_model_eric313_pipeline_en.md new file mode 100644 index 00000000000000..ffbcbb6b39e513 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_emotion_model_eric313_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_emotion_model_eric313_pipeline pipeline DistilBertForSequenceClassification from Eric313 +author: John Snow Labs +name: finetuning_emotion_model_eric313_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_emotion_model_eric313_pipeline` is a English model originally trained by Eric313. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_emotion_model_eric313_pipeline_en_5.5.0_3.0_1727074066174.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_emotion_model_eric313_pipeline_en_5.5.0_3.0_1727074066174.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an existing Spark DataFrame that already contains the input column(s) this pipeline expects
pipeline = PretrainedPipeline("finetuning_emotion_model_eric313_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an existing Spark DataFrame that already contains the input column(s) this pipeline expects
val pipeline = new PretrainedPipeline("finetuning_emotion_model_eric313_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_emotion_model_eric313_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Eric313/finetuning-emotion-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_emotion_model_keinpyisi_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_emotion_model_keinpyisi_en.md new file mode 100644 index 00000000000000..18fee8b52a851b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_emotion_model_keinpyisi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_emotion_model_keinpyisi DistilBertForSequenceClassification from keinpyisi +author: John Snow Labs +name: finetuning_emotion_model_keinpyisi +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_emotion_model_keinpyisi` is a English model originally trained by keinpyisi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_emotion_model_keinpyisi_en_5.5.0_3.0_1727094018184.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_emotion_model_keinpyisi_en_5.5.0_3.0_1727094018184.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_emotion_model_keinpyisi","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_emotion_model_keinpyisi", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_emotion_model_keinpyisi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/keinpyisi/finetuning-emotion-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_analysis_model_team_28_mekteck_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_analysis_model_team_28_mekteck_en.md new file mode 100644 index 00000000000000..34d2b599650e1a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_analysis_model_team_28_mekteck_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_analysis_model_team_28_mekteck DistilBertForSequenceClassification from Mekteck +author: John Snow Labs +name: finetuning_sentiment_analysis_model_team_28_mekteck +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_analysis_model_team_28_mekteck` is a English model originally trained by Mekteck. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_model_team_28_mekteck_en_5.5.0_3.0_1727093817128.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_model_team_28_mekteck_en_5.5.0_3.0_1727093817128.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_analysis_model_team_28_mekteck","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_analysis_model_team_28_mekteck", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
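
Since the fitted object is a regular Spark ML `PipelineModel`, it can be persisted once and reloaded later without refitting. A minimal sketch, assuming the `pipelineModel` and `data` from the Python example above; the path is illustrative and can be any filesystem URI Spark can write to:

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline once, then reload it in a later session.
pipelineModel.write().overwrite().save("/tmp/sentiment_analysis_pipeline_model")
restored = PipelineModel.load("/tmp/sentiment_analysis_pipeline_model")
restored.transform(data).select("class.result").show(truncate=False)
```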
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_analysis_model_team_28_mekteck| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Mekteck/finetuning-sentiment-analysis-model-team-28 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_assoboss_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_assoboss_en.md new file mode 100644 index 00000000000000..552fd6348a54ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_assoboss_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_assoboss DistilBertForSequenceClassification from assoboss +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_assoboss +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_assoboss` is a English model originally trained by assoboss. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_assoboss_en_5.5.0_3.0_1727074269741.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_assoboss_en_5.5.0_3.0_1727074269741.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_assoboss","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_assoboss", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_assoboss| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/assoboss/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ducpham1501_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ducpham1501_en.md new file mode 100644 index 00000000000000..9c1f43ba19bdaf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ducpham1501_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ducpham1501 DistilBertForSequenceClassification from DucPham1501 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ducpham1501 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ducpham1501` is a English model originally trained by DucPham1501. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ducpham1501_en_5.5.0_3.0_1727087281383.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ducpham1501_en_5.5.0_3.0_1727087281383.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ducpham1501","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ducpham1501", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ducpham1501| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DucPham1501/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ducpham1501_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ducpham1501_pipeline_en.md new file mode 100644 index 00000000000000..a9746674632a4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ducpham1501_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ducpham1501_pipeline pipeline DistilBertForSequenceClassification from DucPham1501 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ducpham1501_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ducpham1501_pipeline` is a English model originally trained by DucPham1501. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ducpham1501_pipeline_en_5.5.0_3.0_1727087293159.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ducpham1501_pipeline_en_5.5.0_3.0_1727087293159.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an existing Spark DataFrame that already contains the input column(s) this pipeline expects
pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_ducpham1501_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an existing Spark DataFrame that already contains the input column(s) this pipeline expects
val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_ducpham1501_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ducpham1501_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DucPham1501/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_emmaly0937245_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_emmaly0937245_en.md new file mode 100644 index 00000000000000..f70dc8870db5b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_emmaly0937245_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_emmaly0937245 DistilBertForSequenceClassification from emmaly0937245 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_emmaly0937245 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_emmaly0937245` is a English model originally trained by emmaly0937245. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_emmaly0937245_en_5.5.0_3.0_1727059655993.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_emmaly0937245_en_5.5.0_3.0_1727059655993.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_emmaly0937245","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_emmaly0937245", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_emmaly0937245| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/emmaly0937245/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_emmaly0937245_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_emmaly0937245_pipeline_en.md new file mode 100644 index 00000000000000..c98fc2486bcaf9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_emmaly0937245_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_emmaly0937245_pipeline pipeline DistilBertForSequenceClassification from emmaly0937245 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_emmaly0937245_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_emmaly0937245_pipeline` is a English model originally trained by emmaly0937245. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_emmaly0937245_pipeline_en_5.5.0_3.0_1727059668020.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_emmaly0937245_pipeline_en_5.5.0_3.0_1727059668020.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is an existing Spark DataFrame that already contains the input column(s) this pipeline expects
pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_emmaly0937245_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is an existing Spark DataFrame that already contains the input column(s) this pipeline expects
val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_emmaly0937245_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_emmaly0937245_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/emmaly0937245/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ginosky_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ginosky_en.md new file mode 100644 index 00000000000000..a69b32765d4d56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ginosky_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ginosky DistilBertForSequenceClassification from ginosky +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ginosky +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ginosky` is a English model originally trained by ginosky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ginosky_en_5.5.0_3.0_1727073714240.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ginosky_en_5.5.0_3.0_1727073714240.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# fine-tuned DistilBERT classifier; reads document and token annotations, writes "class"
sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ginosky","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ginosky", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
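
Once the pipeline has been applied, the predicted label and its confidence scores can be read back from the `class` annotation column. A minimal sketch, assuming the `pipelineDF` produced above:

```python
# "class.result" holds the predicted label; "class.metadata" carries per-label scores
pipelineDF.select("text", "class.result", "class.metadata").show(truncate=False)
```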
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ginosky| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ginosky/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ginosky_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ginosky_pipeline_en.md new file mode 100644 index 00000000000000..99034fc252d19f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ginosky_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ginosky_pipeline pipeline DistilBertForSequenceClassification from ginosky +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ginosky_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ginosky_pipeline` is a English model originally trained by ginosky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ginosky_pipeline_en_5.5.0_3.0_1727073726626.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ginosky_pipeline_en_5.5.0_3.0_1727073726626.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# download the pretrained pipeline and run it over a small example DataFrame
pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_ginosky_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_ginosky_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ginosky_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ginosky/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_h9v8_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_h9v8_en.md new file mode 100644 index 00000000000000..d25435add01229 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_h9v8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_h9v8 DistilBertForSequenceClassification from H9V8 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_h9v8 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_h9v8` is a English model originally trained by H9V8. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_h9v8_en_5.5.0_3.0_1727097239450.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_h9v8_en_5.5.0_3.0_1727097239450.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# fine-tuned DistilBERT classifier; reads document and token annotations, writes "class"
sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_h9v8","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_h9v8", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
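
The single-row DataFrame above is only a placeholder; the fitted `pipelineModel` can score many texts at once. A sketch with a few hypothetical reviews:

```python
reviews = spark.createDataFrame(
    [["The movie was fantastic"],
     ["Terrible service, would not recommend"],
     ["It was okay, nothing special"]]
).toDF("text")

# reuse the fitted pipelineModel from the snippet above
predictions = pipelineModel.transform(reviews)
predictions.select("text", "class.result").show(truncate=False)
```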
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_h9v8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/H9V8/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_h9v8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_h9v8_pipeline_en.md new file mode 100644 index 00000000000000..4cd48b5bf2f75f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_h9v8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_h9v8_pipeline pipeline DistilBertForSequenceClassification from H9V8 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_h9v8_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_h9v8_pipeline` is a English model originally trained by H9V8. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_h9v8_pipeline_en_5.5.0_3.0_1727097250957.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_h9v8_pipeline_en_5.5.0_3.0_1727097250957.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# download the pretrained pipeline and run it over a small example DataFrame
pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_h9v8_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_h9v8_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_h9v8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/H9V8/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ih8l1ght_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ih8l1ght_en.md new file mode 100644 index 00000000000000..443932da4ccede --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ih8l1ght_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ih8l1ght DistilBertForSequenceClassification from ih8l1ght +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ih8l1ght +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ih8l1ght` is a English model originally trained by ih8l1ght. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ih8l1ght_en_5.5.0_3.0_1727094027763.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ih8l1ght_en_5.5.0_3.0_1727094027763.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# fine-tuned DistilBERT classifier; reads document and token annotations, writes "class"
sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ih8l1ght","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ih8l1ght", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
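
In environments without internet access, the zip linked above can be downloaded, extracted, and loaded from disk instead of calling `pretrained()`. A minimal sketch; the path below is a placeholder for wherever the archive was unpacked.

```python
# hypothetical local path to the extracted model archive
local_model_path = "/models/finetuning_sentiment_model_3000_samples_ih8l1ght_en"

sequenceClassifier = DistilBertForSequenceClassification.load(local_model_path) \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")
```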
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ih8l1ght| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ih8l1ght/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ih8l1ght_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ih8l1ght_pipeline_en.md new file mode 100644 index 00000000000000..88a2219f69d1aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ih8l1ght_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ih8l1ght_pipeline pipeline DistilBertForSequenceClassification from ih8l1ght +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ih8l1ght_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ih8l1ght_pipeline` is a English model originally trained by ih8l1ght. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ih8l1ght_pipeline_en_5.5.0_3.0_1727094039365.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ih8l1ght_pipeline_en_5.5.0_3.0_1727094039365.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# download the pretrained pipeline and run it over a small example DataFrame
pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_ih8l1ght_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_ih8l1ght_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ih8l1ght_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ih8l1ght/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_inn_ctrl_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_inn_ctrl_en.md new file mode 100644 index 00000000000000..e53501897118a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_inn_ctrl_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_inn_ctrl DistilBertForSequenceClassification from inn-ctrl +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_inn_ctrl +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_inn_ctrl` is a English model originally trained by inn-ctrl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_inn_ctrl_en_5.5.0_3.0_1727110394593.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_inn_ctrl_en_5.5.0_3.0_1727110394593.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# fine-tuned DistilBERT classifier; reads document and token annotations, writes "class"
sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_inn_ctrl","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_inn_ctrl", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
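
Throughput on larger datasets can usually be tuned through the annotator's batching parameters. The values below are illustrative rather than benchmarked, and both setters are assumed to behave as on other Spark NLP transformer-based classifiers.

```python
sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_inn_ctrl", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class") \
    .setBatchSize(16) \
    .setMaxSentenceLength(128)  # truncate long inputs; DistilBERT accepts at most 512 tokens
```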
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_inn_ctrl| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/inn-ctrl/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_inn_ctrl_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_inn_ctrl_pipeline_en.md new file mode 100644 index 00000000000000..a354d6f29381a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_inn_ctrl_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_inn_ctrl_pipeline pipeline DistilBertForSequenceClassification from inn-ctrl +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_inn_ctrl_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_inn_ctrl_pipeline` is a English model originally trained by inn-ctrl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_inn_ctrl_pipeline_en_5.5.0_3.0_1727110406993.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_inn_ctrl_pipeline_en_5.5.0_3.0_1727110406993.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# download the pretrained pipeline and run it over a small example DataFrame
pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_inn_ctrl_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_inn_ctrl_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_inn_ctrl_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/inn-ctrl/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_lusinep_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_lusinep_en.md new file mode 100644 index 00000000000000..f25796d7b2c6dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_lusinep_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_lusinep DistilBertForSequenceClassification from lusinep +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_lusinep +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_lusinep` is a English model originally trained by lusinep. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_lusinep_en_5.5.0_3.0_1727059771336.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_lusinep_en_5.5.0_3.0_1727059771336.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# fine-tuned DistilBERT classifier; reads document and token annotations, writes "class"
sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_lusinep","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_lusinep", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
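
The fitted pipeline can be persisted once and reloaded later, so the pretrained weights are not downloaded on every run. A minimal sketch using the standard Spark ML persistence API, with a placeholder path:

```python
from pyspark.ml import PipelineModel

# hypothetical output location
pipelineModel.write().overwrite().save("/tmp/sentiment_pipeline_lusinep")

restored = PipelineModel.load("/tmp/sentiment_pipeline_lusinep")
restored.transform(data).select("class.result").show()
```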
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_lusinep| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lusinep/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_lusinep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_lusinep_pipeline_en.md new file mode 100644 index 00000000000000..288482b39730c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_lusinep_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_lusinep_pipeline pipeline DistilBertForSequenceClassification from lusinep +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_lusinep_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_lusinep_pipeline` is a English model originally trained by lusinep. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_lusinep_pipeline_en_5.5.0_3.0_1727059785312.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_lusinep_pipeline_en_5.5.0_3.0_1727059785312.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# download the pretrained pipeline and run it over a small example DataFrame
pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_lusinep_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_lusinep_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_lusinep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lusinep/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_murali07_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_murali07_pipeline_en.md new file mode 100644 index 00000000000000..4c79ce580369f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_murali07_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_murali07_pipeline pipeline DistilBertForSequenceClassification from murali07 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_murali07_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_murali07_pipeline` is a English model originally trained by murali07. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_murali07_pipeline_en_5.5.0_3.0_1727108485534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_murali07_pipeline_en_5.5.0_3.0_1727108485534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# download the pretrained pipeline and run it over a small example DataFrame
pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_murali07_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_murali07_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_murali07_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/murali07/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_nandyala12_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_nandyala12_en.md new file mode 100644 index 00000000000000..a222ffa089c2e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_nandyala12_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_nandyala12 DistilBertForSequenceClassification from Nandyala12 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_nandyala12 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_nandyala12` is a English model originally trained by Nandyala12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_nandyala12_en_5.5.0_3.0_1727097358137.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_nandyala12_en_5.5.0_3.0_1727097358137.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# fine-tuned DistilBERT classifier; reads document and token annotations, writes "class"
sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_nandyala12","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_nandyala12", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_nandyala12| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Nandyala12/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_nandyala12_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_nandyala12_pipeline_en.md new file mode 100644 index 00000000000000..8a86eba06d4e88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_nandyala12_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_nandyala12_pipeline pipeline DistilBertForSequenceClassification from Nandyala12 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_nandyala12_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_nandyala12_pipeline` is a English model originally trained by Nandyala12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_nandyala12_pipeline_en_5.5.0_3.0_1727097369985.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_nandyala12_pipeline_en_5.5.0_3.0_1727097369985.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# download the pretrained pipeline and run it over a small example DataFrame
pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_nandyala12_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_nandyala12_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_nandyala12_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Nandyala12/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ritesh47_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ritesh47_en.md new file mode 100644 index 00000000000000..3e4216f5bc15cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ritesh47_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ritesh47 DistilBertForSequenceClassification from ritesh47 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ritesh47 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ritesh47` is a English model originally trained by ritesh47. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ritesh47_en_5.5.0_3.0_1727073633754.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ritesh47_en_5.5.0_3.0_1727073633754.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# fine-tuned DistilBERT classifier; reads document and token annotations, writes "class"
sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ritesh47","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ritesh47", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
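
For low-latency, single-document inference outside of a distributed job, Spark NLP's `LightPipeline` can wrap the fitted model. A minimal sketch, assuming the `pipelineModel` from the example above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# annotate() returns a dict; the "class" entry holds the predicted sentiment label
print(light.annotate("The plot was dull and the acting worse"))
```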
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ritesh47| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ritesh47/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ritesh47_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ritesh47_pipeline_en.md new file mode 100644 index 00000000000000..ba31eb5aa7cb1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ritesh47_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ritesh47_pipeline pipeline DistilBertForSequenceClassification from ritesh47 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ritesh47_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ritesh47_pipeline` is a English model originally trained by ritesh47. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ritesh47_pipeline_en_5.5.0_3.0_1727073645874.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ritesh47_pipeline_en_5.5.0_3.0_1727073645874.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# download the pretrained pipeline and run it over a small example DataFrame
pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_ritesh47_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_ritesh47_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ritesh47_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ritesh47/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_thanhchauns2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_thanhchauns2_pipeline_en.md new file mode 100644 index 00000000000000..6a4356e0f3e3cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_thanhchauns2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_thanhchauns2_pipeline pipeline DistilBertForSequenceClassification from thanhchauns2 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_thanhchauns2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_thanhchauns2_pipeline` is a English model originally trained by thanhchauns2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_thanhchauns2_pipeline_en_5.5.0_3.0_1727082623262.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_thanhchauns2_pipeline_en_5.5.0_3.0_1727082623262.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# download the pretrained pipeline and run it over a small example DataFrame
pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_thanhchauns2_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_thanhchauns2_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_thanhchauns2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thanhchauns2/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_5000_amazon_cm_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_5000_amazon_cm_en.md new file mode 100644 index 00000000000000..81f344d1f4fffd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_5000_amazon_cm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_5000_amazon_cm DistilBertForSequenceClassification from abyesses +author: John Snow Labs +name: finetuning_sentiment_model_5000_amazon_cm +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_5000_amazon_cm` is a English model originally trained by abyesses. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazon_cm_en_5.5.0_3.0_1727096946966.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazon_cm_en_5.5.0_3.0_1727096946966.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# fine-tuned DistilBERT classifier; reads document and token annotations, writes "class"
sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_5000_amazon_cm","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_5000_amazon_cm", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
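
The snippets above assume a SparkSession with Spark NLP on the classpath is already available as `spark`. One common way to obtain such a session, sketched below:

```python
import sparknlp

# starts (or reuses) a SparkSession with the Spark NLP dependencies attached
spark = sparknlp.start()
print(sparknlp.version(), spark.version)
```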
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_5000_amazon_cm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abyesses/finetuning-sentiment-model-5000-amazon_cm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_5000_amazon_cm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_5000_amazon_cm_pipeline_en.md new file mode 100644 index 00000000000000..9a5bcf401c82a3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_5000_amazon_cm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_5000_amazon_cm_pipeline pipeline DistilBertForSequenceClassification from abyesses +author: John Snow Labs +name: finetuning_sentiment_model_5000_amazon_cm_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_5000_amazon_cm_pipeline` is a English model originally trained by abyesses. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazon_cm_pipeline_en_5.5.0_3.0_1727096962720.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazon_cm_pipeline_en_5.5.0_3.0_1727096962720.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# download the pretrained pipeline and run it over a small example DataFrame
pipeline = PretrainedPipeline("finetuning_sentiment_model_5000_amazon_cm_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_5000_amazon_cm_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_5000_amazon_cm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abyesses/finetuning-sentiment-model-5000-amazon_cm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_lomve_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_lomve_en.md new file mode 100644 index 00000000000000..b98abf3fbf261e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_lomve_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_lomve DistilBertForSequenceClassification from lomve +author: John Snow Labs +name: finetuning_sentiment_model_lomve +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_lomve` is a English model originally trained by lomve. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_lomve_en_5.5.0_3.0_1727082406112.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_lomve_en_5.5.0_3.0_1727082406112.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# fine-tuned DistilBERT classifier; reads document and token annotations, writes "class"
sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_lomve","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_lomve", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
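
The label set carried inside the fine-tuned checkpoint can be inspected before wiring the pipeline together; `getClasses()` is assumed to behave here as it does on other Spark NLP sequence classifiers.

```python
classifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_lomve", "en")

# lists the target labels the checkpoint was fine-tuned with (e.g. positive / negative)
print(classifier.getClasses())
```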
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_lomve| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lomve/finetuning-sentiment-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_lomve_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_lomve_pipeline_en.md new file mode 100644 index 00000000000000..509e015a9dad1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_lomve_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_lomve_pipeline pipeline DistilBertForSequenceClassification from lomve +author: John Snow Labs +name: finetuning_sentiment_model_lomve_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_lomve_pipeline` is a English model originally trained by lomve. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_lomve_pipeline_en_5.5.0_3.0_1727082420725.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_lomve_pipeline_en_5.5.0_3.0_1727082420725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# download the pretrained pipeline and run it over a small example DataFrame
pipeline = PretrainedPipeline("finetuning_sentiment_model_lomve_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("finetuning_sentiment_model_lomve_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_lomve_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lomve/finetuning-sentiment-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-gal_sayula_popoluca_iw_2_en.md b/docs/_posts/ahmedlone127/2024-09-23-gal_sayula_popoluca_iw_2_en.md new file mode 100644 index 00000000000000..078d36d78761aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-gal_sayula_popoluca_iw_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English gal_sayula_popoluca_iw_2 XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: gal_sayula_popoluca_iw_2 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gal_sayula_popoluca_iw_2` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_sayula_popoluca_iw_2_en_5.5.0_3.0_1727132889328.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_sayula_popoluca_iw_2_en_5.5.0_3.0_1727132889328.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_sayula_popoluca_iw_2","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_sayula_popoluca_iw_2", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
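+
+A minimal sketch for pairing each token with its predicted tag, assuming the `pipelineDF` DataFrame produced by the example above:
+
+```python
+# Zip token strings with their IOB-style predictions from the "ner" column
+pipelineDF.selectExpr("explode(arrays_zip(token.result, ner.result)) as prediction") \
+    .show(truncate=False)
+```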
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_sayula_popoluca_iw_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|417.1 MB| + +## References + +https://huggingface.co/homersimpson/gal-pos-iw-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-graphcodebert_base_german_en.md b/docs/_posts/ahmedlone127/2024-09-23-graphcodebert_base_german_en.md new file mode 100644 index 00000000000000..72bc852d8c5665 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-graphcodebert_base_german_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English graphcodebert_base_german RoBertaEmbeddings from meloneneneis +author: John Snow Labs +name: graphcodebert_base_german +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`graphcodebert_base_german` is a English model originally trained by meloneneneis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/graphcodebert_base_german_en_5.5.0_3.0_1727080576847.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/graphcodebert_base_german_en_5.5.0_3.0_1727080576847.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("graphcodebert_base_german","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("graphcodebert_base_german","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
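+
+Each annotation in the `embeddings` column carries the token text in its `result` field and the vector in its `embeddings` field. A minimal sketch for materialising them, assuming the `pipelineDF` DataFrame produced by the example above:
+
+```python
+# One row per token with its embedding vector
+pipelineDF.selectExpr("explode(embeddings) as emb") \
+    .selectExpr("emb.result as token", "emb.embeddings as vector") \
+    .show(truncate=False)
+```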
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|graphcodebert_base_german| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|480.3 MB| + +## References + +https://huggingface.co/meloneneneis/graphcodebert-base-german \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-graphcodebert_base_german_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-graphcodebert_base_german_pipeline_en.md new file mode 100644 index 00000000000000..74903d4804e43f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-graphcodebert_base_german_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English graphcodebert_base_german_pipeline pipeline RoBertaEmbeddings from meloneneneis +author: John Snow Labs +name: graphcodebert_base_german_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`graphcodebert_base_german_pipeline` is a English model originally trained by meloneneneis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/graphcodebert_base_german_pipeline_en_5.5.0_3.0_1727080599841.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/graphcodebert_base_german_pipeline_en_5.5.0_3.0_1727080599841.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("graphcodebert_base_german_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("graphcodebert_base_german_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|graphcodebert_base_german_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|480.3 MB| + +## References + +https://huggingface.co/meloneneneis/graphcodebert-base-german + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-groberta_goemotions_en.md b/docs/_posts/ahmedlone127/2024-09-23-groberta_goemotions_en.md new file mode 100644 index 00000000000000..7a3218d1ded30d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-groberta_goemotions_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English groberta_goemotions RoBertaForSequenceClassification from Mukundhan32 +author: John Snow Labs +name: groberta_goemotions +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`groberta_goemotions` is a English model originally trained by Mukundhan32. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/groberta_goemotions_en_5.5.0_3.0_1727085809523.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/groberta_goemotions_en_5.5.0_3.0_1727085809523.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("groberta_goemotions","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("groberta_goemotions", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|groberta_goemotions| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|453.3 MB| + +## References + +https://huggingface.co/Mukundhan32/Groberta-goemotions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-groberta_goemotions_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-groberta_goemotions_pipeline_en.md new file mode 100644 index 00000000000000..1a695e14a2a2f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-groberta_goemotions_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English groberta_goemotions_pipeline pipeline RoBertaForSequenceClassification from Mukundhan32 +author: John Snow Labs +name: groberta_goemotions_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`groberta_goemotions_pipeline` is a English model originally trained by Mukundhan32. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/groberta_goemotions_pipeline_en_5.5.0_3.0_1727085834370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/groberta_goemotions_pipeline_en_5.5.0_3.0_1727085834370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("groberta_goemotions_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("groberta_goemotions_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|groberta_goemotions_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|453.4 MB| + +## References + +https://huggingface.co/Mukundhan32/Groberta-goemotions + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-happy_en.md b/docs/_posts/ahmedlone127/2024-09-23-happy_en.md new file mode 100644 index 00000000000000..918b3c779cc0c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-happy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English happy RoBertaEmbeddings from MatthijsN +author: John Snow Labs +name: happy +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`happy` is a English model originally trained by MatthijsN. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/happy_en_5.5.0_3.0_1727057061183.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/happy_en_5.5.0_3.0_1727057061183.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("happy","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("happy","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|happy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/MatthijsN/happy \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-happy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-happy_pipeline_en.md new file mode 100644 index 00000000000000..9f3343de126ced --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-happy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English happy_pipeline pipeline RoBertaEmbeddings from MatthijsN +author: John Snow Labs +name: happy_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`happy_pipeline` is a English model originally trained by MatthijsN. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/happy_pipeline_en_5.5.0_3.0_1727057083302.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/happy_pipeline_en_5.5.0_3.0_1727057083302.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("happy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("happy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|happy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/MatthijsN/happy + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-harmful_content_trainer_en.md b/docs/_posts/ahmedlone127/2024-09-23-harmful_content_trainer_en.md new file mode 100644 index 00000000000000..1c5084101da8af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-harmful_content_trainer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English harmful_content_trainer DistilBertForSequenceClassification from AIUs3r0 +author: John Snow Labs +name: harmful_content_trainer +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`harmful_content_trainer` is a English model originally trained by AIUs3r0. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/harmful_content_trainer_en_5.5.0_3.0_1727059847466.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/harmful_content_trainer_en_5.5.0_3.0_1727059847466.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("harmful_content_trainer","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("harmful_content_trainer", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|harmful_content_trainer| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AIUs3r0/Harmful_Content_Trainer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-harmful_content_trainer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-harmful_content_trainer_pipeline_en.md new file mode 100644 index 00000000000000..020d7fa5c49456 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-harmful_content_trainer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English harmful_content_trainer_pipeline pipeline DistilBertForSequenceClassification from AIUs3r0 +author: John Snow Labs +name: harmful_content_trainer_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`harmful_content_trainer_pipeline` is a English model originally trained by AIUs3r0. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/harmful_content_trainer_pipeline_en_5.5.0_3.0_1727059860413.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/harmful_content_trainer_pipeline_en_5.5.0_3.0_1727059860413.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("harmful_content_trainer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("harmful_content_trainer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|harmful_content_trainer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AIUs3r0/Harmful_Content_Trainer + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random0_seed0_bertweet_large_en.md b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random0_seed0_bertweet_large_en.md new file mode 100644 index 00000000000000..7c8f27a191dc5e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random0_seed0_bertweet_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_balance_random0_seed0_bertweet_large RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random0_seed0_bertweet_large +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random0_seed0_bertweet_large` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random0_seed0_bertweet_large_en_5.5.0_3.0_1727055095050.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random0_seed0_bertweet_large_en_5.5.0_3.0_1727055095050.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random0_seed0_bertweet_large","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random0_seed0_bertweet_large", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random0_seed0_bertweet_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random0_seed0-bertweet-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random0_seed0_bertweet_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random0_seed0_bertweet_large_pipeline_en.md new file mode 100644 index 00000000000000..6d20c59d82f608 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random0_seed0_bertweet_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_balance_random0_seed0_bertweet_large_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random0_seed0_bertweet_large_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random0_seed0_bertweet_large_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random0_seed0_bertweet_large_pipeline_en_5.5.0_3.0_1727055179350.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random0_seed0_bertweet_large_pipeline_en_5.5.0_3.0_1727055179350.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hate_hate_balance_random0_seed0_bertweet_large_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hate_hate_balance_random0_seed0_bertweet_large_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random0_seed0_bertweet_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random0_seed0-bertweet-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_en.md b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_en.md new file mode 100644 index 00000000000000..71a769de9178a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_en_5.5.0_3.0_1727055765647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_en_5.5.0_3.0_1727055765647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random1_seed1-twitter-roberta-large-2022-154m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_pipeline_en.md new file mode 100644 index 00000000000000..2879babf4000b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_pipeline_en_5.5.0_3.0_1727055826494.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_pipeline_en_5.5.0_3.0_1727055826494.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random1_seed1-twitter-roberta-large-2022-154m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_en.md b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_en.md new file mode 100644 index 00000000000000..a06c37e990072d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_en_5.5.0_3.0_1727055474650.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_en_5.5.0_3.0_1727055474650.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random2_seed0-twitter-roberta-base-2021-124m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_pipeline_en.md new file mode 100644 index 00000000000000..16e09992363aa4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_pipeline_en_5.5.0_3.0_1727055499145.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_pipeline_en_5.5.0_3.0_1727055499145.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random2_seed0-twitter-roberta-base-2021-124m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hindi_bpe_bert_test_2m_en.md b/docs/_posts/ahmedlone127/2024-09-23-hindi_bpe_bert_test_2m_en.md new file mode 100644 index 00000000000000..afa6a57ee8ec92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hindi_bpe_bert_test_2m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hindi_bpe_bert_test_2m BertEmbeddings from rg1683 +author: John Snow Labs +name: hindi_bpe_bert_test_2m +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hindi_bpe_bert_test_2m` is a English model originally trained by rg1683. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hindi_bpe_bert_test_2m_en_5.5.0_3.0_1727107688305.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hindi_bpe_bert_test_2m_en_5.5.0_3.0_1727107688305.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("hindi_bpe_bert_test_2m","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("hindi_bpe_bert_test_2m","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hindi_bpe_bert_test_2m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|377.8 MB| + +## References + +https://huggingface.co/rg1683/hindi_bpe_bert_test_2m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hindi_bpe_bert_test_2m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-hindi_bpe_bert_test_2m_pipeline_en.md new file mode 100644 index 00000000000000..66ef7d06068c2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hindi_bpe_bert_test_2m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hindi_bpe_bert_test_2m_pipeline pipeline BertEmbeddings from rg1683 +author: John Snow Labs +name: hindi_bpe_bert_test_2m_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hindi_bpe_bert_test_2m_pipeline` is a English model originally trained by rg1683. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hindi_bpe_bert_test_2m_pipeline_en_5.5.0_3.0_1727107705675.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hindi_bpe_bert_test_2m_pipeline_en_5.5.0_3.0_1727107705675.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hindi_bpe_bert_test_2m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hindi_bpe_bert_test_2m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hindi_bpe_bert_test_2m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|377.8 MB| + +## References + +https://huggingface.co/rg1683/hindi_bpe_bert_test_2m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-humor_recognition_greek_distilbert_el.md b/docs/_posts/ahmedlone127/2024-09-23-humor_recognition_greek_distilbert_el.md new file mode 100644 index 00000000000000..a9e93ccc0f8f35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-humor_recognition_greek_distilbert_el.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Modern Greek (1453-) humor_recognition_greek_distilbert DistilBertForSequenceClassification from Kalloniatis +author: John Snow Labs +name: humor_recognition_greek_distilbert +date: 2024-09-23 +tags: [el, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: el +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`humor_recognition_greek_distilbert` is a Modern Greek (1453-) model originally trained by Kalloniatis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/humor_recognition_greek_distilbert_el_5.5.0_3.0_1727074177409.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/humor_recognition_greek_distilbert_el_5.5.0_3.0_1727074177409.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("humor_recognition_greek_distilbert","el") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("humor_recognition_greek_distilbert", "el")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
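+
+Because this model targets Modern Greek, the English placeholder sentence above is only a template. A hedged sketch with Greek input, reusing `pipelineModel` from the example above (the sentence is illustrative):
+
+```python
+# Illustrative Greek input for the Greek humor-recognition model
+data_el = spark.createDataFrame([["Μου αρέσει πολύ το Spark NLP"]]).toDF("text")
+pipelineModel.transform(data_el).select("class.result").show(truncate=False)
+```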
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|humor_recognition_greek_distilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|el| +|Size:|507.6 MB| + +## References + +https://huggingface.co/Kalloniatis/Humor-Recognition-Greek-DistilBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-humor_recognition_greek_distilbert_pipeline_el.md b/docs/_posts/ahmedlone127/2024-09-23-humor_recognition_greek_distilbert_pipeline_el.md new file mode 100644 index 00000000000000..1b162b89532230 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-humor_recognition_greek_distilbert_pipeline_el.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Modern Greek (1453-) humor_recognition_greek_distilbert_pipeline pipeline DistilBertForSequenceClassification from Kalloniatis +author: John Snow Labs +name: humor_recognition_greek_distilbert_pipeline +date: 2024-09-23 +tags: [el, open_source, pipeline, onnx] +task: Text Classification +language: el +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`humor_recognition_greek_distilbert_pipeline` is a Modern Greek (1453-) model originally trained by Kalloniatis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/humor_recognition_greek_distilbert_pipeline_el_5.5.0_3.0_1727074201489.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/humor_recognition_greek_distilbert_pipeline_el_5.5.0_3.0_1727074201489.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("humor_recognition_greek_distilbert_pipeline", lang = "el") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("humor_recognition_greek_distilbert_pipeline", lang = "el") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|humor_recognition_greek_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|el| +|Size:|507.6 MB| + +## References + +https://huggingface.co/Kalloniatis/Humor-Recognition-Greek-DistilBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hw01_chchang_en.md b/docs/_posts/ahmedlone127/2024-09-23-hw01_chchang_en.md new file mode 100644 index 00000000000000..928a4b1a93716f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hw01_chchang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hw01_chchang DistilBertForSequenceClassification from CHChang +author: John Snow Labs +name: hw01_chchang +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hw01_chchang` is a English model originally trained by CHChang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hw01_chchang_en_5.5.0_3.0_1727093797638.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hw01_chchang_en_5.5.0_3.0_1727093797638.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("hw01_chchang","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("hw01_chchang", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hw01_chchang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/CHChang/HW01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hw1223_01_en.md b/docs/_posts/ahmedlone127/2024-09-23-hw1223_01_en.md new file mode 100644 index 00000000000000..d2be1409d91e0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hw1223_01_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hw1223_01 DistilBertForSequenceClassification from tunyu +author: John Snow Labs +name: hw1223_01 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hw1223_01` is a English model originally trained by tunyu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hw1223_01_en_5.5.0_3.0_1727059540790.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hw1223_01_en_5.5.0_3.0_1727059540790.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("hw1223_01","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("hw1223_01", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hw1223_01| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tunyu/HW1223_01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hw1223_01_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-hw1223_01_pipeline_en.md new file mode 100644 index 00000000000000..4496732edd9a25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hw1223_01_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hw1223_01_pipeline pipeline DistilBertForSequenceClassification from tunyu +author: John Snow Labs +name: hw1223_01_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hw1223_01_pipeline` is a English model originally trained by tunyu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hw1223_01_pipeline_en_5.5.0_3.0_1727059554935.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hw1223_01_pipeline_en_5.5.0_3.0_1727059554935.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: the texts to classify, in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("hw1223_01_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: the texts to classify, in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("hw1223_01_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hw1223_01_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tunyu/HW1223_01 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hw_1_irisliou_en.md b/docs/_posts/ahmedlone127/2024-09-23-hw_1_irisliou_en.md new file mode 100644 index 00000000000000..c9964578560936 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hw_1_irisliou_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hw_1_irisliou DistilBertForSequenceClassification from IrisLiou +author: John Snow Labs +name: hw_1_irisliou +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hw_1_irisliou` is a English model originally trained by IrisLiou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hw_1_irisliou_en_5.5.0_3.0_1727093820951.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hw_1_irisliou_en_5.5.0_3.0_1727093820951.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("hw_1_irisliou","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("hw_1_irisliou", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hw_1_irisliou| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/IrisLiou/hw-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hw_1_irisliou_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-hw_1_irisliou_pipeline_en.md new file mode 100644 index 00000000000000..2b7f9feb9472d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hw_1_irisliou_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hw_1_irisliou_pipeline pipeline DistilBertForSequenceClassification from IrisLiou +author: John Snow Labs +name: hw_1_irisliou_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hw_1_irisliou_pipeline` is a English model originally trained by IrisLiou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hw_1_irisliou_pipeline_en_5.5.0_3.0_1727093832450.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hw_1_irisliou_pipeline_en_5.5.0_3.0_1727093832450.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: the texts to classify, in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("hw_1_irisliou_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: the texts to classify, in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("hw_1_irisliou_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hw_1_irisliou_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/IrisLiou/hw-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-id2223_lab2_whisper_nelanbu_en.md b/docs/_posts/ahmedlone127/2024-09-23-id2223_lab2_whisper_nelanbu_en.md new file mode 100644 index 00000000000000..d19e6b1c3178a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-id2223_lab2_whisper_nelanbu_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English id2223_lab2_whisper_nelanbu WhisperForCTC from nelanbu +author: John Snow Labs +name: id2223_lab2_whisper_nelanbu +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`id2223_lab2_whisper_nelanbu` is a English model originally trained by nelanbu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/id2223_lab2_whisper_nelanbu_en_5.5.0_3.0_1727051979457.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/id2223_lab2_whisper_nelanbu_en_5.5.0_3.0_1727051979457.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("id2223_lab2_whisper_nelanbu","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# data: a DataFrame with raw audio floats in an "audio_content" column
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("id2223_lab2_whisper_nelanbu", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// data: a DataFrame with raw audio floats in an "audio_content" column
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
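
The `data` DataFrame above must carry the raw waveform as an array of floats in an `audio_content` column. One way to build it, sketched under the assumption that the `librosa` library is installed and a local `sample.wav` file exists (neither is part of this model card):

```python
import librosa

# Whisper checkpoints expect 16 kHz mono audio
audio_floats, _ = librosa.load("sample.wav", sr=16000)

data = spark.createDataFrame([[audio_floats.tolist()]]).toDF("audio_content")
```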
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|id2223_lab2_whisper_nelanbu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/nelanbu/ID2223_Lab2_Whisper \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-id2223_lab2_whisper_nelanbu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-id2223_lab2_whisper_nelanbu_pipeline_en.md new file mode 100644 index 00000000000000..042a3977ca5835 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-id2223_lab2_whisper_nelanbu_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English id2223_lab2_whisper_nelanbu_pipeline pipeline WhisperForCTC from nelanbu +author: John Snow Labs +name: id2223_lab2_whisper_nelanbu_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`id2223_lab2_whisper_nelanbu_pipeline` is a English model originally trained by nelanbu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/id2223_lab2_whisper_nelanbu_pipeline_en_5.5.0_3.0_1727052076788.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/id2223_lab2_whisper_nelanbu_pipeline_en_5.5.0_3.0_1727052076788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame with raw audio floats in an "audio_content" column
pipeline = PretrainedPipeline("id2223_lab2_whisper_nelanbu_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with raw audio floats in an "audio_content" column
val pipeline = new PretrainedPipeline("id2223_lab2_whisper_nelanbu_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|id2223_lab2_whisper_nelanbu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/nelanbu/ID2223_Lab2_Whisper + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-imdbreviews_classification_distilbert_sst2_transfer_learning_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-imdbreviews_classification_distilbert_sst2_transfer_learning_pipeline_en.md new file mode 100644 index 00000000000000..7e28c52af1c38f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-imdbreviews_classification_distilbert_sst2_transfer_learning_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English imdbreviews_classification_distilbert_sst2_transfer_learning_pipeline pipeline DistilBertForSequenceClassification from darmendarizp +author: John Snow Labs +name: imdbreviews_classification_distilbert_sst2_transfer_learning_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdbreviews_classification_distilbert_sst2_transfer_learning_pipeline` is a English model originally trained by darmendarizp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_distilbert_sst2_transfer_learning_pipeline_en_5.5.0_3.0_1727082664941.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_distilbert_sst2_transfer_learning_pipeline_en_5.5.0_3.0_1727082664941.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: the texts to classify, in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("imdbreviews_classification_distilbert_sst2_transfer_learning_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: the texts to classify, in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("imdbreviews_classification_distilbert_sst2_transfer_learning_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdbreviews_classification_distilbert_sst2_transfer_learning_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/darmendarizp/imdbreviews_classification_distilbert_sst2_transfer_learning + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-imdbreviews_classification_distilbert_v02_mancd_en.md b/docs/_posts/ahmedlone127/2024-09-23-imdbreviews_classification_distilbert_v02_mancd_en.md new file mode 100644 index 00000000000000..61976dcb02087a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-imdbreviews_classification_distilbert_v02_mancd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English imdbreviews_classification_distilbert_v02_mancd DistilBertForSequenceClassification from ManCD +author: John Snow Labs +name: imdbreviews_classification_distilbert_v02_mancd +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdbreviews_classification_distilbert_v02_mancd` is a English model originally trained by ManCD. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_distilbert_v02_mancd_en_5.5.0_3.0_1727093814161.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_distilbert_v02_mancd_en_5.5.0_3.0_1727093814161.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("imdbreviews_classification_distilbert_v02_mancd","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("imdbreviews_classification_distilbert_v02_mancd", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdbreviews_classification_distilbert_v02_mancd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ManCD/imdbreviews_classification_distilbert_v02 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-imdbreviews_classification_distilbert_v02_mancd_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-imdbreviews_classification_distilbert_v02_mancd_pipeline_en.md new file mode 100644 index 00000000000000..8c5c52407f464c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-imdbreviews_classification_distilbert_v02_mancd_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English imdbreviews_classification_distilbert_v02_mancd_pipeline pipeline DistilBertForSequenceClassification from ManCD +author: John Snow Labs +name: imdbreviews_classification_distilbert_v02_mancd_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdbreviews_classification_distilbert_v02_mancd_pipeline` is a English model originally trained by ManCD. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_distilbert_v02_mancd_pipeline_en_5.5.0_3.0_1727093825749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_distilbert_v02_mancd_pipeline_en_5.5.0_3.0_1727093825749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: the texts to classify, in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("imdbreviews_classification_distilbert_v02_mancd_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: the texts to classify, in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("imdbreviews_classification_distilbert_v02_mancd_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdbreviews_classification_distilbert_v02_mancd_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ManCD/imdbreviews_classification_distilbert_v02 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-inisw08_robert_mlm_adamw_torch_bs8_en.md b/docs/_posts/ahmedlone127/2024-09-23-inisw08_robert_mlm_adamw_torch_bs8_en.md new file mode 100644 index 00000000000000..24b77883261731 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-inisw08_robert_mlm_adamw_torch_bs8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English inisw08_robert_mlm_adamw_torch_bs8 RoBertaEmbeddings from ugiugi +author: John Snow Labs +name: inisw08_robert_mlm_adamw_torch_bs8 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`inisw08_robert_mlm_adamw_torch_bs8` is a English model originally trained by ugiugi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/inisw08_robert_mlm_adamw_torch_bs8_en_5.5.0_3.0_1727066103391.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/inisw08_robert_mlm_adamw_torch_bs8_en_5.5.0_3.0_1727066103391.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("inisw08_robert_mlm_adamw_torch_bs8","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("inisw08_robert_mlm_adamw_torch_bs8","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
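
To feed the token embeddings into downstream Spark ML stages, they can be flattened into plain vectors with an `EmbeddingsFinisher`. A minimal sketch that extends the pipeline above with one extra stage, assuming the variables defined in the previous snippet:

```python
from pyspark.ml import Pipeline
from sparknlp.base import EmbeddingsFinisher

embeddingsFinisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols("finished_embeddings") \
    .setOutputAsVector(True)

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, embeddingsFinisher])
result = pipeline.fit(data).transform(data)

# One dense vector per token
result.selectExpr("explode(finished_embeddings) AS embedding").show(5)
```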
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|inisw08_robert_mlm_adamw_torch_bs8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.6 MB| + +## References + +https://huggingface.co/ugiugi/inisw08-RoBERT-mlm-adamw_torch_bs8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-inisw08_robert_mlm_adamw_torch_bs8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-inisw08_robert_mlm_adamw_torch_bs8_pipeline_en.md new file mode 100644 index 00000000000000..0aec2fc2d4e865 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-inisw08_robert_mlm_adamw_torch_bs8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English inisw08_robert_mlm_adamw_torch_bs8_pipeline pipeline RoBertaEmbeddings from ugiugi +author: John Snow Labs +name: inisw08_robert_mlm_adamw_torch_bs8_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`inisw08_robert_mlm_adamw_torch_bs8_pipeline` is a English model originally trained by ugiugi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/inisw08_robert_mlm_adamw_torch_bs8_pipeline_en_5.5.0_3.0_1727066126742.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/inisw08_robert_mlm_adamw_torch_bs8_pipeline_en_5.5.0_3.0_1727066126742.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: the texts to embed, in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("inisw08_robert_mlm_adamw_torch_bs8_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: the texts to embed, in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("inisw08_robert_mlm_adamw_torch_bs8_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|inisw08_robert_mlm_adamw_torch_bs8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.7 MB| + +## References + +https://huggingface.co/ugiugi/inisw08-RoBERT-mlm-adamw_torch_bs8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-interview_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-23-interview_classifier_en.md new file mode 100644 index 00000000000000..9f099fc38c1cb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-interview_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English interview_classifier DistilBertForSequenceClassification from eskayML +author: John Snow Labs +name: interview_classifier +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`interview_classifier` is a English model originally trained by eskayML. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/interview_classifier_en_5.5.0_3.0_1727073517281.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/interview_classifier_en_5.5.0_3.0_1727073517281.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("interview_classifier","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("interview_classifier", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|interview_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/eskayML/interview_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-kanglish_offensive_language_identification_en.md b/docs/_posts/ahmedlone127/2024-09-23-kanglish_offensive_language_identification_en.md new file mode 100644 index 00000000000000..fe81565abc1430 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-kanglish_offensive_language_identification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English kanglish_offensive_language_identification RoBertaForSequenceClassification from seanbenhur +author: John Snow Labs +name: kanglish_offensive_language_identification +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kanglish_offensive_language_identification` is a English model originally trained by seanbenhur. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kanglish_offensive_language_identification_en_5.5.0_3.0_1727134915581.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kanglish_offensive_language_identification_en_5.5.0_3.0_1727134915581.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("kanglish_offensive_language_identification","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("kanglish_offensive_language_identification", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kanglish_offensive_language_identification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|451.8 MB| + +## References + +https://huggingface.co/seanbenhur/kanglish-offensive-language-identification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-kanglish_offensive_language_identification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-kanglish_offensive_language_identification_pipeline_en.md new file mode 100644 index 00000000000000..ec0ee3747fe3f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-kanglish_offensive_language_identification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English kanglish_offensive_language_identification_pipeline pipeline RoBertaForSequenceClassification from seanbenhur +author: John Snow Labs +name: kanglish_offensive_language_identification_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kanglish_offensive_language_identification_pipeline` is a English model originally trained by seanbenhur. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kanglish_offensive_language_identification_pipeline_en_5.5.0_3.0_1727134938987.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kanglish_offensive_language_identification_pipeline_en_5.5.0_3.0_1727134938987.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: the texts to classify, in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("kanglish_offensive_language_identification_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: the texts to classify, in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("kanglish_offensive_language_identification_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kanglish_offensive_language_identification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|451.8 MB| + +## References + +https://huggingface.co/seanbenhur/kanglish-offensive-language-identification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-kor_bert_qa_test_2_en.md b/docs/_posts/ahmedlone127/2024-09-23-kor_bert_qa_test_2_en.md new file mode 100644 index 00000000000000..db63e0edae4675 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-kor_bert_qa_test_2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English kor_bert_qa_test_2 BertForQuestionAnswering from lemonTree5366 +author: John Snow Labs +name: kor_bert_qa_test_2 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kor_bert_qa_test_2` is a English model originally trained by lemonTree5366. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kor_bert_qa_test_2_en_5.5.0_3.0_1727070299469.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kor_bert_qa_test_2_en_5.5.0_3.0_1727070299469.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("kor_bert_qa_test_2","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("kor_bert_qa_test_2", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
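
The extracted answer span is returned as an annotation in the `answer` column. A minimal sketch of reading it back out, assuming `pipelineDF` from the snippet above:

```python
# `answer.result` holds the predicted answer text for each question/context pair
pipelineDF.select("question", "answer.result").show(truncate=False)
```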
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kor_bert_qa_test_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|441.2 MB| + +## References + +https://huggingface.co/lemonTree5366/kor_bert_qa_test_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-kor_bert_qa_test_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-kor_bert_qa_test_2_pipeline_en.md new file mode 100644 index 00000000000000..2f47827171af5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-kor_bert_qa_test_2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English kor_bert_qa_test_2_pipeline pipeline BertForQuestionAnswering from lemonTree5366 +author: John Snow Labs +name: kor_bert_qa_test_2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kor_bert_qa_test_2_pipeline` is a English model originally trained by lemonTree5366. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kor_bert_qa_test_2_pipeline_en_5.5.0_3.0_1727070321205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kor_bert_qa_test_2_pipeline_en_5.5.0_3.0_1727070321205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame holding the questions and the contexts to answer over
pipeline = PretrainedPipeline("kor_bert_qa_test_2_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame holding the questions and the contexts to answer over
val pipeline = new PretrainedPipeline("kor_bert_qa_test_2_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kor_bert_qa_test_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|441.2 MB| + +## References + +https://huggingface.co/lemonTree5366/kor_bert_qa_test_2 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-lab1_yanwen9969_en.md b/docs/_posts/ahmedlone127/2024-09-23-lab1_yanwen9969_en.md new file mode 100644 index 00000000000000..6eb0b09b70278f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-lab1_yanwen9969_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English lab1_yanwen9969 DistilBertForSequenceClassification from Yanwen9969 +author: John Snow Labs +name: lab1_yanwen9969 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab1_yanwen9969` is a English model originally trained by Yanwen9969. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab1_yanwen9969_en_5.5.0_3.0_1727108287546.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab1_yanwen9969_en_5.5.0_3.0_1727108287546.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("lab1_yanwen9969","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("lab1_yanwen9969", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab1_yanwen9969| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Yanwen9969/Lab1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-lab1_yanwen9969_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-lab1_yanwen9969_pipeline_en.md new file mode 100644 index 00000000000000..7c0f44568f52c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-lab1_yanwen9969_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lab1_yanwen9969_pipeline pipeline DistilBertForSequenceClassification from Yanwen9969 +author: John Snow Labs +name: lab1_yanwen9969_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab1_yanwen9969_pipeline` is a English model originally trained by Yanwen9969. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab1_yanwen9969_pipeline_en_5.5.0_3.0_1727108304479.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab1_yanwen9969_pipeline_en_5.5.0_3.0_1727108304479.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: the texts to classify, in a column named "text"
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("lab1_yanwen9969_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// df: the texts to classify, in a column named "text"
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("lab1_yanwen9969_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab1_yanwen9969_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Yanwen9969/Lab1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-lab2_whisper_swedish_hi.md b/docs/_posts/ahmedlone127/2024-09-23-lab2_whisper_swedish_hi.md new file mode 100644 index 00000000000000..5faf945a3a1300 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-lab2_whisper_swedish_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi lab2_whisper_swedish WhisperForCTC from SodraZatre +author: John Snow Labs +name: lab2_whisper_swedish +date: 2024-09-23 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab2_whisper_swedish` is a Hindi model originally trained by SodraZatre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab2_whisper_swedish_hi_5.5.0_3.0_1727116461072.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab2_whisper_swedish_hi_5.5.0_3.0_1727116461072.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("lab2_whisper_swedish","hi") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# data: a DataFrame with raw audio floats in an "audio_content" column
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("lab2_whisper_swedish", "hi")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// data: a DataFrame with raw audio floats in an "audio_content" column
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab2_whisper_swedish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/SodraZatre/lab2-whisper-sv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-lab2_whisper_swedish_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-23-lab2_whisper_swedish_pipeline_hi.md new file mode 100644 index 00000000000000..aebf71baf5d78e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-lab2_whisper_swedish_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi lab2_whisper_swedish_pipeline pipeline WhisperForCTC from SodraZatre +author: John Snow Labs +name: lab2_whisper_swedish_pipeline +date: 2024-09-23 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab2_whisper_swedish_pipeline` is a Hindi model originally trained by SodraZatre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab2_whisper_swedish_pipeline_hi_5.5.0_3.0_1727116560650.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab2_whisper_swedish_pipeline_hi_5.5.0_3.0_1727116560650.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame with raw audio floats in an "audio_content" column
pipeline = PretrainedPipeline("lab2_whisper_swedish_pipeline", lang = "hi")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with raw audio floats in an "audio_content" column
val pipeline = new PretrainedPipeline("lab2_whisper_swedish_pipeline", lang = "hi")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab2_whisper_swedish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/SodraZatre/lab2-whisper-sv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-lab_11_distilbert_sentiment_en.md b/docs/_posts/ahmedlone127/2024-09-23-lab_11_distilbert_sentiment_en.md new file mode 100644 index 00000000000000..4f88a4d6884ff4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-lab_11_distilbert_sentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English lab_11_distilbert_sentiment DistilBertForSequenceClassification from Malecc +author: John Snow Labs +name: lab_11_distilbert_sentiment +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab_11_distilbert_sentiment` is a English model originally trained by Malecc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab_11_distilbert_sentiment_en_5.5.0_3.0_1727097178410.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab_11_distilbert_sentiment_en_5.5.0_3.0_1727097178410.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("lab_11_distilbert_sentiment","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("lab_11_distilbert_sentiment", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
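
For low-latency scoring of individual strings (for example inside a service endpoint), the fitted pipeline can be wrapped in a `LightPipeline`, which avoids building a DataFrame per request. A minimal sketch, assuming `pipelineModel` from the snippet above:

```python
from sparknlp.base import LightPipeline

light_pipeline = LightPipeline(pipelineModel)

# annotate() accepts plain strings and returns a dict mapping output columns to values
print(light_pipeline.annotate("I love spark-nlp"))
```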
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab_11_distilbert_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Malecc/lab_11_distilbert_sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-lab_11_distilbert_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-lab_11_distilbert_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..a5a6680c0ded8d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-lab_11_distilbert_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lab_11_distilbert_sentiment_pipeline pipeline DistilBertForSequenceClassification from Malecc +author: John Snow Labs +name: lab_11_distilbert_sentiment_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab_11_distilbert_sentiment_pipeline` is a English model originally trained by Malecc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab_11_distilbert_sentiment_pipeline_en_5.5.0_3.0_1727097190060.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab_11_distilbert_sentiment_pipeline_en_5.5.0_3.0_1727097190060.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lab_11_distilbert_sentiment_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lab_11_distilbert_sentiment_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
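+
+The `df` above is assumed to already exist. A minimal, self-contained sketch (the `text` input column name is assumed from the DocumentAssembler inside the pipeline):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Build a one-row DataFrame and run the pretrained pipeline end to end
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("lab_11_distilbert_sentiment_pipeline", lang = "en")
+pipeline.transform(df).select("class.result").show(truncate = False)
+```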
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab_11_distilbert_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Malecc/lab_11_distilbert_sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-legal_gqa_bert_augmented_1000_en.md b/docs/_posts/ahmedlone127/2024-09-23-legal_gqa_bert_augmented_1000_en.md new file mode 100644 index 00000000000000..8cf33187e85663 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-legal_gqa_bert_augmented_1000_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English legal_gqa_bert_augmented_1000 BertForQuestionAnswering from farid1088 +author: John Snow Labs +name: legal_gqa_bert_augmented_1000 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_gqa_bert_augmented_1000` is a English model originally trained by farid1088. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_gqa_bert_augmented_1000_en_5.5.0_3.0_1727050234095.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_gqa_bert_augmented_1000_en_5.5.0_3.0_1727050234095.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("legal_gqa_bert_augmented_1000","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("legal_gqa_bert_augmented_1000", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
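+
+As a brief follow-up (not part of the original example), the extracted answer span is available in the `answer` column:
+
+```python
+# Question, context, and the predicted answer span
+pipelineDF.select("question", "context", "answer.result").show(truncate = False)
+```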
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_gqa_bert_augmented_1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/farid1088/Legal_GQA_BERT_augmented_1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-legal_gqa_bert_augmented_1000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-legal_gqa_bert_augmented_1000_pipeline_en.md new file mode 100644 index 00000000000000..40ea022b5f4254 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-legal_gqa_bert_augmented_1000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English legal_gqa_bert_augmented_1000_pipeline pipeline BertForQuestionAnswering from farid1088 +author: John Snow Labs +name: legal_gqa_bert_augmented_1000_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_gqa_bert_augmented_1000_pipeline` is a English model originally trained by farid1088. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_gqa_bert_augmented_1000_pipeline_en_5.5.0_3.0_1727050257283.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_gqa_bert_augmented_1000_pipeline_en_5.5.0_3.0_1727050257283.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("legal_gqa_bert_augmented_1000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("legal_gqa_bert_augmented_1000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
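+
+The `df` above is assumed to already exist. A minimal sketch of feeding the pipeline a question/context pair (the input column names are assumed to match the MultiDocumentAssembler inside the pipeline):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame(
+    [["What framework do I use?", "I use spark-nlp."]],
+    ["question", "context"])
+pipeline = PretrainedPipeline("legal_gqa_bert_augmented_1000_pipeline", lang = "en")
+pipeline.transform(df).select("answer.result").show(truncate = False)
+```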
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_gqa_bert_augmented_1000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/farid1088/Legal_GQA_BERT_augmented_1000 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-len_pruned_30_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-len_pruned_30_model_pipeline_en.md new file mode 100644 index 00000000000000..e8b150caff080f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-len_pruned_30_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English len_pruned_30_model_pipeline pipeline DistilBertForSequenceClassification from andygoh5 +author: John Snow Labs +name: len_pruned_30_model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`len_pruned_30_model_pipeline` is a English model originally trained by andygoh5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/len_pruned_30_model_pipeline_en_5.5.0_3.0_1727093724827.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/len_pruned_30_model_pipeline_en_5.5.0_3.0_1727093724827.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("len_pruned_30_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("len_pruned_30_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
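+
+Besides `transform`, `PretrainedPipeline` also offers an `annotate` call for plain strings. A rough sketch, assuming the pipeline's classifier output is named `class` as in the sibling model card:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("len_pruned_30_model_pipeline", lang = "en")
+# annotate() runs the pipeline on a single string without building a DataFrame
+print(pipeline.annotate("I love spark-nlp")["class"])
+```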
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|len_pruned_30_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/andygoh5/len-pruned-30-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-liar_fake_news_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-23-liar_fake_news_roberta_base_en.md new file mode 100644 index 00000000000000..53f2758d1321c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-liar_fake_news_roberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English liar_fake_news_roberta_base RoBertaEmbeddings from Jawaher +author: John Snow Labs +name: liar_fake_news_roberta_base +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`liar_fake_news_roberta_base` is a English model originally trained by Jawaher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/liar_fake_news_roberta_base_en_5.5.0_3.0_1727092161072.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/liar_fake_news_roberta_base_en_5.5.0_3.0_1727092161072.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("liar_fake_news_roberta_base","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("liar_fake_news_roberta_base","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
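+
+To inspect the token-level vectors produced above (a follow-up sketch, not part of the original example):
+
+```python
+from pyspark.sql import functions as F
+
+# One output row per token: the token text and its embedding vector
+pipelineDF.select(F.explode("embeddings").alias("emb")) \
+    .select(F.col("emb.result").alias("token"), F.col("emb.embeddings").alias("vector")) \
+    .show(truncate = 80)
+```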
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|liar_fake_news_roberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/Jawaher/LIAR-fake-news-roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-liar_fake_news_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-liar_fake_news_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..0d0003b82e9ffa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-liar_fake_news_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English liar_fake_news_roberta_base_pipeline pipeline RoBertaEmbeddings from Jawaher +author: John Snow Labs +name: liar_fake_news_roberta_base_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`liar_fake_news_roberta_base_pipeline` is a English model originally trained by Jawaher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/liar_fake_news_roberta_base_pipeline_en_5.5.0_3.0_1727092185302.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/liar_fake_news_roberta_base_pipeline_en_5.5.0_3.0_1727092185302.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("liar_fake_news_roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("liar_fake_news_roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|liar_fake_news_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/Jawaher/LIAR-fake-news-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-lnm_ner_en.md b/docs/_posts/ahmedlone127/2024-09-23-lnm_ner_en.md new file mode 100644 index 00000000000000..a53cf63d2a4b74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-lnm_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English lnm_ner BertForTokenClassification from suyashmittal +author: John Snow Labs +name: lnm_ner +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lnm_ner` is a English model originally trained by suyashmittal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lnm_ner_en_5.5.0_3.0_1727129850854.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lnm_ner_en_5.5.0_3.0_1727129850854.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("lnm_ner","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("lnm_ner", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
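+
+A short follow-up sketch (not part of the original example) pairing tokens with their predicted tags:
+
+```python
+# Tokens and their predicted entity tags, row by row
+pipelineDF.select("token.result", "ner.result").show(truncate = False)
+```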
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lnm_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/suyashmittal/lnm-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-melodea_final_model_en.md b/docs/_posts/ahmedlone127/2024-09-23-melodea_final_model_en.md new file mode 100644 index 00000000000000..11c6216466a90a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-melodea_final_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English melodea_final_model RoBertaForSequenceClassification from GabiRayman +author: John Snow Labs +name: melodea_final_model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`melodea_final_model` is a English model originally trained by GabiRayman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/melodea_final_model_en_5.5.0_3.0_1727055072815.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/melodea_final_model_en_5.5.0_3.0_1727055072815.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("melodea_final_model","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("melodea_final_model", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|melodea_final_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|429.4 MB| + +## References + +https://huggingface.co/GabiRayman/melodea_final-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-melodea_final_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-melodea_final_model_pipeline_en.md new file mode 100644 index 00000000000000..3f44f1481b697b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-melodea_final_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English melodea_final_model_pipeline pipeline RoBertaForSequenceClassification from GabiRayman +author: John Snow Labs +name: melodea_final_model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`melodea_final_model_pipeline` is a English model originally trained by GabiRayman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/melodea_final_model_pipeline_en_5.5.0_3.0_1727055109373.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/melodea_final_model_pipeline_en_5.5.0_3.0_1727055109373.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("melodea_final_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("melodea_final_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|melodea_final_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|429.4 MB| + +## References + +https://huggingface.co/GabiRayman/melodea_final-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-missingbertmodelfinal1_en.md b/docs/_posts/ahmedlone127/2024-09-23-missingbertmodelfinal1_en.md new file mode 100644 index 00000000000000..874bfa0ef3f053 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-missingbertmodelfinal1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English missingbertmodelfinal1 DistilBertForSequenceClassification from sachit56 +author: John Snow Labs +name: missingbertmodelfinal1 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`missingbertmodelfinal1` is a English model originally trained by sachit56. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/missingbertmodelfinal1_en_5.5.0_3.0_1727059660384.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/missingbertmodelfinal1_en_5.5.0_3.0_1727059660384.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("missingbertmodelfinal1","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("missingbertmodelfinal1", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|missingbertmodelfinal1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sachit56/missingbertmodelfinal1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-missingbertmodelfinal1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-missingbertmodelfinal1_pipeline_en.md new file mode 100644 index 00000000000000..0f5ab07f52bb52 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-missingbertmodelfinal1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English missingbertmodelfinal1_pipeline pipeline DistilBertForSequenceClassification from sachit56 +author: John Snow Labs +name: missingbertmodelfinal1_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`missingbertmodelfinal1_pipeline` is a English model originally trained by sachit56. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/missingbertmodelfinal1_pipeline_en_5.5.0_3.0_1727059673068.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/missingbertmodelfinal1_pipeline_en_5.5.0_3.0_1727059673068.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("missingbertmodelfinal1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("missingbertmodelfinal1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|missingbertmodelfinal1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sachit56/missingbertmodelfinal1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-mnli_distilled_bart_cross_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-23-mnli_distilled_bart_cross_roberta_en.md new file mode 100644 index 00000000000000..4001a0bdd6102c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-mnli_distilled_bart_cross_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mnli_distilled_bart_cross_roberta RoBertaForSequenceClassification from Sayan01 +author: John Snow Labs +name: mnli_distilled_bart_cross_roberta +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mnli_distilled_bart_cross_roberta` is a English model originally trained by Sayan01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mnli_distilled_bart_cross_roberta_en_5.5.0_3.0_1727085580049.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mnli_distilled_bart_cross_roberta_en_5.5.0_3.0_1727085580049.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("mnli_distilled_bart_cross_roberta","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("mnli_distilled_bart_cross_roberta", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
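+
+A brief sketch for reading the predicted class together with its scores; the metadata keys depend on the model's label set, which this card does not list:
+
+```python
+from pyspark.sql import functions as F
+
+pipelineDF.select(F.explode("class").alias("pred")) \
+    .select(F.col("pred.result").alias("label"), F.col("pred.metadata").alias("scores")) \
+    .show(truncate = False)
+```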
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mnli_distilled_bart_cross_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.9 MB| + +## References + +https://huggingface.co/Sayan01/mnli-distilled-bart-cross-roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-mnli_distilled_bart_cross_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-mnli_distilled_bart_cross_roberta_pipeline_en.md new file mode 100644 index 00000000000000..cc3f3c4a1bf02a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-mnli_distilled_bart_cross_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mnli_distilled_bart_cross_roberta_pipeline pipeline RoBertaForSequenceClassification from Sayan01 +author: John Snow Labs +name: mnli_distilled_bart_cross_roberta_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mnli_distilled_bart_cross_roberta_pipeline` is a English model originally trained by Sayan01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mnli_distilled_bart_cross_roberta_pipeline_en_5.5.0_3.0_1727085594749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mnli_distilled_bart_cross_roberta_pipeline_en_5.5.0_3.0_1727085594749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mnli_distilled_bart_cross_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mnli_distilled_bart_cross_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mnli_distilled_bart_cross_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/Sayan01/mnli-distilled-bart-cross-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-mobilebert_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-23-mobilebert_imdb_en.md new file mode 100644 index 00000000000000..4b8f75836888b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-mobilebert_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mobilebert_imdb DistilBertForSequenceClassification from Cippppy +author: John Snow Labs +name: mobilebert_imdb +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mobilebert_imdb` is a English model originally trained by Cippppy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mobilebert_imdb_en_5.5.0_3.0_1727086985826.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mobilebert_imdb_en_5.5.0_3.0_1727086985826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("mobilebert_imdb","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("mobilebert_imdb", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
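+
+A quick sanity-check sketch on a couple of review-like sentences, reusing the fitted pipeline from above (the example sentences are placeholders, not evaluation data):
+
+```python
+extra = spark.createDataFrame(
+    [["A wonderful, moving film."], ["Two hours of my life I will never get back."]],
+    ["text"])
+pipelineModel.transform(extra).select("text", "class.result").show(truncate = False)
+```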
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mobilebert_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|238.8 MB| + +## References + +https://huggingface.co/Cippppy/mobileBert_imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-mobilebert_imdb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-mobilebert_imdb_pipeline_en.md new file mode 100644 index 00000000000000..fbc7645f21af80 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-mobilebert_imdb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mobilebert_imdb_pipeline pipeline DistilBertForSequenceClassification from Cippppy +author: John Snow Labs +name: mobilebert_imdb_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mobilebert_imdb_pipeline` is a English model originally trained by Cippppy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mobilebert_imdb_pipeline_en_5.5.0_3.0_1727087001139.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mobilebert_imdb_pipeline_en_5.5.0_3.0_1727087001139.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mobilebert_imdb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mobilebert_imdb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mobilebert_imdb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|238.8 MB| + +## References + +https://huggingface.co/Cippppy/mobileBert_imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-model_1_8_en.md b/docs/_posts/ahmedlone127/2024-09-23-model_1_8_en.md new file mode 100644 index 00000000000000..4b9d05b7c96055 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-model_1_8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_1_8 RoBertaForSequenceClassification from raydentseng +author: John Snow Labs +name: model_1_8 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_1_8` is a English model originally trained by raydentseng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_1_8_en_5.5.0_3.0_1727135064897.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_1_8_en_5.5.0_3.0_1727135064897.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("model_1_8","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("model_1_8", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_1_8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|437.9 MB| + +## References + +https://huggingface.co/raydentseng/model_1_8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-model_1_8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-model_1_8_pipeline_en.md new file mode 100644 index 00000000000000..9dc2ac5d17ef21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-model_1_8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_1_8_pipeline pipeline RoBertaForSequenceClassification from raydentseng +author: John Snow Labs +name: model_1_8_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_1_8_pipeline` is a English model originally trained by raydentseng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_1_8_pipeline_en_5.5.0_3.0_1727135087186.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_1_8_pipeline_en_5.5.0_3.0_1727135087186.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("model_1_8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("model_1_8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_1_8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.0 MB| + +## References + +https://huggingface.co/raydentseng/model_1_8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-model_sentence_entailment_hackaton_2_en.md b/docs/_posts/ahmedlone127/2024-09-23-model_sentence_entailment_hackaton_2_en.md new file mode 100644 index 00000000000000..fe20e7d95a0695 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-model_sentence_entailment_hackaton_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_sentence_entailment_hackaton_2 RoBertaForSequenceClassification from ludoviciarraga +author: John Snow Labs +name: model_sentence_entailment_hackaton_2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_sentence_entailment_hackaton_2` is a English model originally trained by ludoviciarraga. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_sentence_entailment_hackaton_2_en_5.5.0_3.0_1727135295638.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_sentence_entailment_hackaton_2_en_5.5.0_3.0_1727135295638.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("model_sentence_entailment_hackaton_2","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("model_sentence_entailment_hackaton_2", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_sentence_entailment_hackaton_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ludoviciarraga/model_sentence_entailment_hackaton_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-model_sentence_entailment_hackaton_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-model_sentence_entailment_hackaton_2_pipeline_en.md new file mode 100644 index 00000000000000..07b37a002441fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-model_sentence_entailment_hackaton_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_sentence_entailment_hackaton_2_pipeline pipeline RoBertaForSequenceClassification from ludoviciarraga +author: John Snow Labs +name: model_sentence_entailment_hackaton_2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_sentence_entailment_hackaton_2_pipeline` is a English model originally trained by ludoviciarraga. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_sentence_entailment_hackaton_2_pipeline_en_5.5.0_3.0_1727135371470.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_sentence_entailment_hackaton_2_pipeline_en_5.5.0_3.0_1727135371470.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("model_sentence_entailment_hackaton_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("model_sentence_entailment_hackaton_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_sentence_entailment_hackaton_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ludoviciarraga/model_sentence_entailment_hackaton_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-model_test_for_upload_en.md b/docs/_posts/ahmedlone127/2024-09-23-model_test_for_upload_en.md new file mode 100644 index 00000000000000..bc6f1c3226a82f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-model_test_for_upload_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_test_for_upload DistilBertForSequenceClassification from JACOBBBB +author: John Snow Labs +name: model_test_for_upload +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_test_for_upload` is a English model originally trained by JACOBBBB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_test_for_upload_en_5.5.0_3.0_1727086844099.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_test_for_upload_en_5.5.0_3.0_1727086844099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("model_test_for_upload","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("model_test_for_upload", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_test_for_upload| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/JACOBBBB/model_test_for_upload \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-model_test_for_upload_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-model_test_for_upload_pipeline_en.md new file mode 100644 index 00000000000000..4f531315643a93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-model_test_for_upload_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_test_for_upload_pipeline pipeline DistilBertForSequenceClassification from JACOBBBB +author: John Snow Labs +name: model_test_for_upload_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_test_for_upload_pipeline` is a English model originally trained by JACOBBBB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_test_for_upload_pipeline_en_5.5.0_3.0_1727086855901.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_test_for_upload_pipeline_en_5.5.0_3.0_1727086855901.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("model_test_for_upload_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("model_test_for_upload_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_test_for_upload_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/JACOBBBB/model_test_for_upload + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-msroberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-msroberta_pipeline_en.md new file mode 100644 index 00000000000000..2e4ba32a0c6e6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-msroberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English msroberta_pipeline pipeline RoBertaEmbeddings from nkoh01 +author: John Snow Labs +name: msroberta_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`msroberta_pipeline` is a English model originally trained by nkoh01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/msroberta_pipeline_en_5.5.0_3.0_1727056624961.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/msroberta_pipeline_en_5.5.0_3.0_1727056624961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("msroberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("msroberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
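
A minimal usage sketch, assuming the pipeline reads a `text` column and that its RoBertaEmbeddings stage writes token annotations to an `embeddings` column (the saved column names are not listed on this card):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("msroberta_pipeline", lang="en")
result = pipeline.transform(df)

# Each annotation carries the token text in `result` and its vector in `embeddings`
result.selectExpr("explode(embeddings) as emb") \
      .selectExpr("emb.result as token", "size(emb.embeddings) as dim") \
      .show(truncate=False)
```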
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|msroberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/nkoh01/MSRoberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-mt5_base_visp_s2_en.md b/docs/_posts/ahmedlone127/2024-09-23-mt5_base_visp_s2_en.md new file mode 100644 index 00000000000000..4709d8a62ad48f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-mt5_base_visp_s2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English mt5_base_visp_s2 T5Transformer from ngwgsang +author: John Snow Labs +name: mt5_base_visp_s2 +date: 2024-09-23 +tags: [en, open_source, onnx, t5, question_answering, summarization, translation, text_generation] +task: [Question Answering, Summarization, Translation, Text Generation] +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: T5Transformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained T5Transformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mt5_base_visp_s2` is a English model originally trained by ngwgsang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mt5_base_visp_s2_en_5.5.0_3.0_1727068944585.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mt5_base_visp_s2_en_5.5.0_3.0_1727068944585.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

t5 = T5Transformer.pretrained("mt5_base_visp_s2","en") \
    .setInputCols(["document"]) \
    .setOutputCol("output")

pipeline = Pipeline().setStages([documentAssembler, t5])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// The input column must match the DocumentAssembler output column ("document")
val t5 = T5Transformer.pretrained("mt5_base_visp_s2", "en")
  .setInputCols(Array("document"))
  .setOutputCol("output")

val pipeline = new Pipeline().setStages(Array(documentAssembler, t5))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
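
As an illustrative follow-up (not part of the original card), the generated text can be read back from the `output` column configured above:

```python
# Each row of "output" holds the T5 generations for the corresponding input document
pipelineDF.select("output.result").show(truncate=False)
```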
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mt5_base_visp_s2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[output]| +|Language:|en| +|Size:|2.3 GB| + +## References + +https://huggingface.co/ngwgsang/mt5-base-visp-s2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-mt5_base_visp_s2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-mt5_base_visp_s2_pipeline_en.md new file mode 100644 index 00000000000000..3b4f162859ce80 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-mt5_base_visp_s2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English mt5_base_visp_s2_pipeline pipeline T5Transformer from ngwgsang +author: John Snow Labs +name: mt5_base_visp_s2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: [Question Answering, Summarization, Translation, Text Generation] +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained T5Transformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mt5_base_visp_s2_pipeline` is a English model originally trained by ngwgsang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mt5_base_visp_s2_pipeline_en_5.5.0_3.0_1727069187914.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mt5_base_visp_s2_pipeline_en_5.5.0_3.0_1727069187914.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mt5_base_visp_s2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mt5_base_visp_s2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
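
A minimal usage sketch, assuming the pipeline reads a `text` column (per the DocumentAssembler listed under Included Models) and that its T5 stage writes generations to an `output` column:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("mt5_base_visp_s2_pipeline", lang="en")
annotations = pipeline.transform(df)

# "output" is an assumed column name; adjust it to the pipeline's actual output column
annotations.select("output.result").show(truncate=False)
```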
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mt5_base_visp_s2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|2.3 GB| + +## References + +https://huggingface.co/ngwgsang/mt5-base-visp-s2 + +## Included Models + +- DocumentAssembler +- T5Transformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-muril_large_chaii_hi.md b/docs/_posts/ahmedlone127/2024-09-23-muril_large_chaii_hi.md new file mode 100644 index 00000000000000..969550450a428b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-muril_large_chaii_hi.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Hindi muril_large_chaii BertForQuestionAnswering from abhishek +author: John Snow Labs +name: muril_large_chaii +date: 2024-09-23 +tags: [hi, open_source, onnx, question_answering, bert] +task: Question Answering +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`muril_large_chaii` is a Hindi model originally trained by abhishek. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/muril_large_chaii_hi_5.5.0_3.0_1727070766823.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/muril_large_chaii_hi_5.5.0_3.0_1727070766823.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("muril_large_chaii","hi") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
# The DataFrame columns must match the MultiDocumentAssembler input columns
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("muril_large_chaii", "hi")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
// The DataFrame columns must match the MultiDocumentAssembler input columns
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
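
As an illustrative follow-up, the extracted answer span can be read from the `answer` column configured above:

```python
# One answer annotation per question/context pair
pipelineDF.select("answer.result").show(truncate=False)
```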
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|muril_large_chaii| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|hi| +|Size:|1.9 GB| + +## References + +https://huggingface.co/abhishek/muril-large-chaii \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-muril_large_chaii_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-23-muril_large_chaii_pipeline_hi.md new file mode 100644 index 00000000000000..6a2470aea3797f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-muril_large_chaii_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi muril_large_chaii_pipeline pipeline BertForQuestionAnswering from abhishek +author: John Snow Labs +name: muril_large_chaii_pipeline +date: 2024-09-23 +tags: [hi, open_source, pipeline, onnx] +task: Question Answering +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`muril_large_chaii_pipeline` is a Hindi model originally trained by abhishek. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/muril_large_chaii_pipeline_hi_5.5.0_3.0_1727070855702.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/muril_large_chaii_pipeline_hi_5.5.0_3.0_1727070855702.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("muril_large_chaii_pipeline", lang = "hi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("muril_large_chaii_pipeline", lang = "hi") +val annotations = pipeline.transform(df) + +``` +
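
A minimal usage sketch; the `question`/`context` input columns, the `answer` output column, and the Hindi example sentence are assumptions, since the saved pipeline's column names are not listed on this card:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Question-answering pipelines in Spark NLP typically read a question and a context column
df = spark.createDataFrame(
    [["भारत की राजधानी क्या है?", "भारत की राजधानी नई दिल्ली है।"]]
).toDF("question", "context")

pipeline = PretrainedPipeline("muril_large_chaii_pipeline", lang="hi")
annotations = pipeline.transform(df)

annotations.select("answer.result").show(truncate=False)
```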
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|muril_large_chaii_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.9 GB| + +## References + +https://huggingface.co/abhishek/muril-large-chaii + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding20model_en.md b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding20model_en.md new file mode 100644 index 00000000000000..2f08870b8c9710 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding20model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_distilbert_sst5_padding20model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_sst5_padding20model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_sst5_padding20model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding20model_en_5.5.0_3.0_1727110752226.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding20model_en_5.5.0_3.0_1727110752226.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The input columns must match the output columns of the previous stages
# ("document" and "token")
sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst5_padding20model","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// The input columns must match the output columns of the previous stages
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst5_padding20model", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
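
As an illustrative follow-up, the predicted label for each input text can be read from the `class` column configured above:

```python
# "class.result" holds the predicted label for each input row
pipelineDF.select("text", "class.result").show(truncate=False)
```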
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_sst5_padding20model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_sst5_padding20model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding20model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding20model_pipeline_en.md new file mode 100644 index 00000000000000..415bd98d40877c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding20model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_distilbert_sst5_padding20model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_sst5_padding20model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_sst5_padding20model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding20model_pipeline_en_5.5.0_3.0_1727110763622.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding20model_pipeline_en_5.5.0_3.0_1727110763622.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("n_distilbert_sst5_padding20model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("n_distilbert_sst5_padding20model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_sst5_padding20model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_sst5_padding20model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding60model_en.md b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding60model_en.md new file mode 100644 index 00000000000000..3161373f706be1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding60model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_distilbert_sst5_padding60model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_sst5_padding60model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_sst5_padding60model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding60model_en_5.5.0_3.0_1727093928776.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding60model_en_5.5.0_3.0_1727093928776.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The input columns must match the output columns of the previous stages
# ("document" and "token")
sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst5_padding60model","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// The input columns must match the output columns of the previous stages
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst5_padding60model", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_sst5_padding60model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_sst5_padding60model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding60model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding60model_pipeline_en.md new file mode 100644 index 00000000000000..69524f584658e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding60model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_distilbert_sst5_padding60model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_sst5_padding60model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_sst5_padding60model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding60model_pipeline_en_5.5.0_3.0_1727093940809.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding60model_pipeline_en_5.5.0_3.0_1727093940809.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("n_distilbert_sst5_padding60model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("n_distilbert_sst5_padding60model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_sst5_padding60model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_sst5_padding60model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_twitterfin_padding30model_en.md b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_twitterfin_padding30model_en.md new file mode 100644 index 00000000000000..325b8beed62dbf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_twitterfin_padding30model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_distilbert_twitterfin_padding30model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_twitterfin_padding30model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_twitterfin_padding30model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding30model_en_5.5.0_3.0_1727110724511.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding30model_en_5.5.0_3.0_1727110724511.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The input columns must match the output columns of the previous stages
# ("document" and "token")
sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_twitterfin_padding30model","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// The input columns must match the output columns of the previous stages
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_twitterfin_padding30model", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_twitterfin_padding30model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_twitterfin_padding30model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_twitterfin_padding30model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_twitterfin_padding30model_pipeline_en.md new file mode 100644 index 00000000000000..cf541633bafc6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_twitterfin_padding30model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_distilbert_twitterfin_padding30model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_twitterfin_padding30model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_twitterfin_padding30model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding30model_pipeline_en_5.5.0_3.0_1727110736186.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding30model_pipeline_en_5.5.0_3.0_1727110736186.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("n_distilbert_twitterfin_padding30model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("n_distilbert_twitterfin_padding30model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_twitterfin_padding30model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_twitterfin_padding30model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-nameattrsbert_en.md b/docs/_posts/ahmedlone127/2024-09-23-nameattrsbert_en.md new file mode 100644 index 00000000000000..df408980558369 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-nameattrsbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nameattrsbert BertForSequenceClassification from madgnome +author: John Snow Labs +name: nameattrsbert +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nameattrsbert` is a English model originally trained by madgnome. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nameattrsbert_en_5.5.0_3.0_1727095464620.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nameattrsbert_en_5.5.0_3.0_1727095464620.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The input columns must match the output columns of the previous stages
# ("document" and "token")
sequenceClassifier = BertForSequenceClassification.pretrained("nameattrsbert","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// The input columns must match the output columns of the previous stages
val sequenceClassifier = BertForSequenceClassification.pretrained("nameattrsbert", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nameattrsbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|669.1 MB| + +## References + +https://huggingface.co/madgnome/nameattrsbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-nameattrsbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-nameattrsbert_pipeline_en.md new file mode 100644 index 00000000000000..33c7e80fde2433 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-nameattrsbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nameattrsbert_pipeline pipeline BertForSequenceClassification from madgnome +author: John Snow Labs +name: nameattrsbert_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nameattrsbert_pipeline` is a English model originally trained by madgnome. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nameattrsbert_pipeline_en_5.5.0_3.0_1727095496474.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nameattrsbert_pipeline_en_5.5.0_3.0_1727095496474.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nameattrsbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nameattrsbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nameattrsbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|669.1 MB| + +## References + +https://huggingface.co/madgnome/nameattrsbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_dataset_bert_en.md b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_dataset_bert_en.md new file mode 100644 index 00000000000000..2954f7f6c61df6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_dataset_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nepal_bhasa_dataset_bert RoBertaEmbeddings from ubaskota +author: John Snow Labs +name: nepal_bhasa_dataset_bert +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_dataset_bert` is a English model originally trained by ubaskota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_dataset_bert_en_5.5.0_3.0_1727122069822.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_dataset_bert_en_5.5.0_3.0_1727122069822.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("nepal_bhasa_dataset_bert","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("nepal_bhasa_dataset_bert","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
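
As an illustrative follow-up (not part of the original card), the token vectors can be unpacked from the `embeddings` column configured above:

```python
from pyspark.sql.functions import explode

# One annotation per token: `result` is the token text, `embeddings` is its vector
pipelineDF.select(explode("embeddings").alias("emb")) \
          .selectExpr("emb.result as token", "size(emb.embeddings) as dim") \
          .show(truncate=False)
```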
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_dataset_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|461.7 MB| + +## References + +https://huggingface.co/ubaskota/new_dataset_bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_dataset_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_dataset_bert_pipeline_en.md new file mode 100644 index 00000000000000..789269d0439f5f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_dataset_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nepal_bhasa_dataset_bert_pipeline pipeline RoBertaEmbeddings from ubaskota +author: John Snow Labs +name: nepal_bhasa_dataset_bert_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_dataset_bert_pipeline` is a English model originally trained by ubaskota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_dataset_bert_pipeline_en_5.5.0_3.0_1727122092171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_dataset_bert_pipeline_en_5.5.0_3.0_1727122092171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nepal_bhasa_dataset_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nepal_bhasa_dataset_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_dataset_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|461.7 MB| + +## References + +https://huggingface.co/ubaskota/new_dataset_bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_model_vierpiet_en.md b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_model_vierpiet_en.md new file mode 100644 index 00000000000000..a6b6583a1e7eb3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_model_vierpiet_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nepal_bhasa_model_vierpiet DistilBertForSequenceClassification from vierpiet +author: John Snow Labs +name: nepal_bhasa_model_vierpiet +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_model_vierpiet` is a English model originally trained by vierpiet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_model_vierpiet_en_5.5.0_3.0_1727082665415.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_model_vierpiet_en_5.5.0_3.0_1727082665415.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The input columns must match the output columns of the previous stages
# ("document" and "token")
sequenceClassifier = DistilBertForSequenceClassification.pretrained("nepal_bhasa_model_vierpiet","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// The input columns must match the output columns of the previous stages
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("nepal_bhasa_model_vierpiet", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_model_vierpiet| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/vierpiet/new_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_model_vierpiet_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_model_vierpiet_pipeline_en.md new file mode 100644 index 00000000000000..d271df77dc5aca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_model_vierpiet_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nepal_bhasa_model_vierpiet_pipeline pipeline DistilBertForSequenceClassification from vierpiet +author: John Snow Labs +name: nepal_bhasa_model_vierpiet_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_model_vierpiet_pipeline` is a English model originally trained by vierpiet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_model_vierpiet_pipeline_en_5.5.0_3.0_1727082678736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_model_vierpiet_pipeline_en_5.5.0_3.0_1727082678736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nepal_bhasa_model_vierpiet_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nepal_bhasa_model_vierpiet_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_model_vierpiet_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/vierpiet/new_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_phishing_email_detection2_en.md b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_phishing_email_detection2_en.md new file mode 100644 index 00000000000000..32e3351f3e7299 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_phishing_email_detection2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nepal_bhasa_phishing_email_detection2 DistilBertForSequenceClassification from kamikaze20 +author: John Snow Labs +name: nepal_bhasa_phishing_email_detection2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_phishing_email_detection2` is a English model originally trained by kamikaze20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_phishing_email_detection2_en_5.5.0_3.0_1727074260367.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_phishing_email_detection2_en_5.5.0_3.0_1727074260367.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The input columns must match the output columns of the previous stages
# ("document" and "token")
sequenceClassifier = DistilBertForSequenceClassification.pretrained("nepal_bhasa_phishing_email_detection2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// The input columns must match the output columns of the previous stages
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("nepal_bhasa_phishing_email_detection2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_phishing_email_detection2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/kamikaze20/new_phishing-email-detection2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_phishing_email_detection2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_phishing_email_detection2_pipeline_en.md new file mode 100644 index 00000000000000..f42654403d6f08 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_phishing_email_detection2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nepal_bhasa_phishing_email_detection2_pipeline pipeline DistilBertForSequenceClassification from kamikaze20 +author: John Snow Labs +name: nepal_bhasa_phishing_email_detection2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_phishing_email_detection2_pipeline` is a English model originally trained by kamikaze20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_phishing_email_detection2_pipeline_en_5.5.0_3.0_1727074271998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_phishing_email_detection2_pipeline_en_5.5.0_3.0_1727074271998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nepal_bhasa_phishing_email_detection2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nepal_bhasa_phishing_email_detection2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_phishing_email_detection2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/kamikaze20/new_phishing-email-detection2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_pretrained_model_en.md b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_pretrained_model_en.md new file mode 100644 index 00000000000000..59129916a9e17a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_pretrained_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nepal_bhasa_pretrained_model DistilBertForSequenceClassification from Vigneshwaran255 +author: John Snow Labs +name: nepal_bhasa_pretrained_model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_pretrained_model` is a English model originally trained by Vigneshwaran255. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_pretrained_model_en_5.5.0_3.0_1727073859354.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_pretrained_model_en_5.5.0_3.0_1727073859354.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The input columns must match the output columns of the previous stages
# ("document" and "token")
sequenceClassifier = DistilBertForSequenceClassification.pretrained("nepal_bhasa_pretrained_model","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// The input columns must match the output columns of the previous stages
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("nepal_bhasa_pretrained_model", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_pretrained_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Vigneshwaran255/new_pretrained_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ner_clue_en.md b/docs/_posts/ahmedlone127/2024-09-23-ner_clue_en.md new file mode 100644 index 00000000000000..300018cc981e5f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ner_clue_en.md @@ -0,0 +1,88 @@ +--- +layout: model +title: English ner_clue T5Transformer from helloya0908 +author: John Snow Labs +name: ner_clue +date: 2024-09-23 +tags: [en, open_source, onnx, t5, question_answering, summarization, translation, text_generation] +task: [Question Answering, Summarization, Translation, Text Generation] +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: T5Transformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained T5Transformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_clue` is a English model originally trained by helloya0908. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_clue_en_5.5.0_3.0_1727124792734.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_clue_en_5.5.0_3.0_1727124792734.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

t5 = T5Transformer.pretrained("ner_clue","en") \
    .setInputCols(["document"]) \
    .setOutputCol("output")

pipeline = Pipeline().setStages([documentAssembler, t5])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// The input column must match the DocumentAssembler output column ("document")
val t5 = T5Transformer.pretrained("ner_clue", "en")
  .setInputCols(Array("document"))
  .setOutputCol("output")

val pipeline = new Pipeline().setStages(Array(documentAssembler, t5))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_clue| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[output]| +|Language:|en| +|Size:|950.4 MB| + +## References + +References + +https://huggingface.co/helloya0908/NER_CLUE \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ner_clue_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-ner_clue_pipeline_en.md new file mode 100644 index 00000000000000..9434ed18a628d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ner_clue_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English ner_clue_pipeline pipeline T5Transformer from helloya0908 +author: John Snow Labs +name: ner_clue_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: [Question Answering, Summarization, Translation, Text Generation] +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained T5Transformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_clue_pipeline` is a English model originally trained by helloya0908. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_clue_pipeline_en_5.5.0_3.0_1727124855743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_clue_pipeline_en_5.5.0_3.0_1727124855743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +pipeline = PretrainedPipeline("ner_clue_pipeline", lang = "en") +annotations = pipeline.transform(df) +``` +```scala +val pipeline = new PretrainedPipeline("ner_clue_pipeline", lang = "en") +val annotations = pipeline.transform(df) +``` +
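The snippets above assume an existing Spark NLP session and a DataFrame `df` with a `text` column. A minimal sketch of setting both up in Python (the example sentence is only a placeholder):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("ner_clue_pipeline", lang = "en")
annotations = pipeline.transform(df)

# annotate() is a convenient alternative for a single string; it returns a dict
print(pipeline.annotate("I love spark-nlp"))
```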
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_clue_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|950.4 MB| + +## References + +References + +https://huggingface.co/helloya0908/NER_CLUE + +## Included Models + +- DocumentAssembler +- T5Transformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ner_model_nathali99_en.md b/docs/_posts/ahmedlone127/2024-09-23-ner_model_nathali99_en.md new file mode 100644 index 00000000000000..b6a052d2090d5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ner_model_nathali99_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_model_nathali99 BertForTokenClassification from Nathali99 +author: John Snow Labs +name: ner_model_nathali99 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_model_nathali99` is a English model originally trained by Nathali99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_model_nathali99_en_5.5.0_3.0_1727129840770.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_model_nathali99_en_5.5.0_3.0_1727129840770.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("ner_model_nathali99","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("ner_model_nathali99", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
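To inspect the predictions, the token and NER annotation columns produced above can be selected directly; a minimal sketch, assuming the Python pipeline has been run:

```python
# `token.result` and `ner.result` are aligned arrays of tokens and predicted tags
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```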
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_model_nathali99| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Nathali99/ner-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ner_model_nathali99_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-ner_model_nathali99_pipeline_en.md new file mode 100644 index 00000000000000..bc8974cf1fdcbd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ner_model_nathali99_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ner_model_nathali99_pipeline pipeline BertForTokenClassification from Nathali99 +author: John Snow Labs +name: ner_model_nathali99_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_model_nathali99_pipeline` is a English model originally trained by Nathali99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_model_nathali99_pipeline_en_5.5.0_3.0_1727129861197.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_model_nathali99_pipeline_en_5.5.0_3.0_1727129861197.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ner_model_nathali99_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ner_model_nathali99_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_model_nathali99_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Nathali99/ner-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ner_ner_random1_seed0_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-23-ner_ner_random1_seed0_bernice_en.md new file mode 100644 index 00000000000000..d57952b78b534f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ner_ner_random1_seed0_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_ner_random1_seed0_bernice XlmRoBertaForTokenClassification from tweettemposhift +author: John Snow Labs +name: ner_ner_random1_seed0_bernice +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_ner_random1_seed0_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_ner_random1_seed0_bernice_en_5.5.0_3.0_1727132537180.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_ner_random1_seed0_bernice_en_5.5.0_3.0_1727132537180.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("ner_ner_random1_seed0_bernice","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("ner_ner_random1_seed0_bernice", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_ner_random1_seed0_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|802.5 MB| + +## References + +https://huggingface.co/tweettemposhift/ner-ner_random1_seed0-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-nerd_nerd_random0_seed1_roberta_large_en.md b/docs/_posts/ahmedlone127/2024-09-23-nerd_nerd_random0_seed1_roberta_large_en.md new file mode 100644 index 00000000000000..dfb3d08e4e3860 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-nerd_nerd_random0_seed1_roberta_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nerd_nerd_random0_seed1_roberta_large RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: nerd_nerd_random0_seed1_roberta_large +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerd_nerd_random0_seed1_roberta_large` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerd_nerd_random0_seed1_roberta_large_en_5.5.0_3.0_1727135019648.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerd_nerd_random0_seed1_roberta_large_en_5.5.0_3.0_1727135019648.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("nerd_nerd_random0_seed1_roberta_large","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("nerd_nerd_random0_seed1_roberta_large", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
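The predicted label for each input row lands in the `class` annotation column; a minimal sketch of reading it back, assuming the Python pipeline above has been run:

```python
# `class.result` holds the predicted label(s) for each document
pipelineDF.select("text", "class.result").show(truncate=False)
```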
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerd_nerd_random0_seed1_roberta_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/nerd-nerd_random0_seed1-roberta-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-nerd_nerd_random0_seed1_roberta_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-nerd_nerd_random0_seed1_roberta_large_pipeline_en.md new file mode 100644 index 00000000000000..c7d3d083afd8ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-nerd_nerd_random0_seed1_roberta_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nerd_nerd_random0_seed1_roberta_large_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: nerd_nerd_random0_seed1_roberta_large_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerd_nerd_random0_seed1_roberta_large_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerd_nerd_random0_seed1_roberta_large_pipeline_en_5.5.0_3.0_1727135092501.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerd_nerd_random0_seed1_roberta_large_pipeline_en_5.5.0_3.0_1727135092501.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nerd_nerd_random0_seed1_roberta_large_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nerd_nerd_random0_seed1_roberta_large_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerd_nerd_random0_seed1_roberta_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/nerd-nerd_random0_seed1-roberta-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-news_classifier_bert_en.md b/docs/_posts/ahmedlone127/2024-09-23-news_classifier_bert_en.md new file mode 100644 index 00000000000000..7364bed0898fe1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-news_classifier_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English news_classifier_bert DistilBertForSequenceClassification from MehdiHosseiniMoghadam +author: John Snow Labs +name: news_classifier_bert +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`news_classifier_bert` is a English model originally trained by MehdiHosseiniMoghadam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/news_classifier_bert_en_5.5.0_3.0_1727073940171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/news_classifier_bert_en_5.5.0_3.0_1727073940171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("news_classifier_bert","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("news_classifier_bert", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|news_classifier_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MehdiHosseiniMoghadam/news_classifier-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-news_classifier_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-news_classifier_bert_pipeline_en.md new file mode 100644 index 00000000000000..7321d04a7ae7fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-news_classifier_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English news_classifier_bert_pipeline pipeline DistilBertForSequenceClassification from MehdiHosseiniMoghadam +author: John Snow Labs +name: news_classifier_bert_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`news_classifier_bert_pipeline` is a English model originally trained by MehdiHosseiniMoghadam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/news_classifier_bert_pipeline_en_5.5.0_3.0_1727073952177.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/news_classifier_bert_pipeline_en_5.5.0_3.0_1727073952177.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("news_classifier_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("news_classifier_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|news_classifier_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MehdiHosseiniMoghadam/news_classifier-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-newsmodelclassification_en.md b/docs/_posts/ahmedlone127/2024-09-23-newsmodelclassification_en.md new file mode 100644 index 00000000000000..a2aa58ce9fa5e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-newsmodelclassification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English newsmodelclassification DistilBertForSequenceClassification from aatmasidha +author: John Snow Labs +name: newsmodelclassification +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`newsmodelclassification` is a English model originally trained by aatmasidha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/newsmodelclassification_en_5.5.0_3.0_1727082475431.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/newsmodelclassification_en_5.5.0_3.0_1727082475431.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("newsmodelclassification","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("newsmodelclassification", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|newsmodelclassification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aatmasidha/newsmodelclassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-oceanprotocol_tweet_impact_v1_en.md b/docs/_posts/ahmedlone127/2024-09-23-oceanprotocol_tweet_impact_v1_en.md new file mode 100644 index 00000000000000..f1cc18b5c0b5de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-oceanprotocol_tweet_impact_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English oceanprotocol_tweet_impact_v1 RoBertaForSequenceClassification from lucaordronneau +author: John Snow Labs +name: oceanprotocol_tweet_impact_v1 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`oceanprotocol_tweet_impact_v1` is a English model originally trained by lucaordronneau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/oceanprotocol_tweet_impact_v1_en_5.5.0_3.0_1727085367829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/oceanprotocol_tweet_impact_v1_en_5.5.0_3.0_1727085367829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("oceanprotocol_tweet_impact_v1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("oceanprotocol_tweet_impact_v1", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|oceanprotocol_tweet_impact_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|467.9 MB| + +## References + +https://huggingface.co/lucaordronneau/oceanprotocol-tweet-impact-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-oceanprotocol_tweet_impact_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-oceanprotocol_tweet_impact_v1_pipeline_en.md new file mode 100644 index 00000000000000..0ddfb30517cf27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-oceanprotocol_tweet_impact_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English oceanprotocol_tweet_impact_v1_pipeline pipeline RoBertaForSequenceClassification from lucaordronneau +author: John Snow Labs +name: oceanprotocol_tweet_impact_v1_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`oceanprotocol_tweet_impact_v1_pipeline` is a English model originally trained by lucaordronneau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/oceanprotocol_tweet_impact_v1_pipeline_en_5.5.0_3.0_1727085390712.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/oceanprotocol_tweet_impact_v1_pipeline_en_5.5.0_3.0_1727085390712.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("oceanprotocol_tweet_impact_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("oceanprotocol_tweet_impact_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|oceanprotocol_tweet_impact_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|467.9 MB| + +## References + +https://huggingface.co/lucaordronneau/oceanprotocol-tweet-impact-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-odiaberta_en.md b/docs/_posts/ahmedlone127/2024-09-23-odiaberta_en.md new file mode 100644 index 00000000000000..00ccc7cd1bd13f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-odiaberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English odiaberta RoBertaEmbeddings from Nikhil7280 +author: John Snow Labs +name: odiaberta +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`odiaberta` is a English model originally trained by Nikhil7280. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/odiaberta_en_5.5.0_3.0_1727080662114.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/odiaberta_en_5.5.0_3.0_1727080662114.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("odiaberta","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("odiaberta","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
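Each token receives one embedding vector in the `embeddings` column; a minimal sketch of checking the vector dimension, assuming the Python pipeline above has been run:

```python
import pyspark.sql.functions as F

# One annotation per token; the `embeddings` field holds the float vector
pipelineDF.select(F.explode("embeddings.embeddings").alias("vector")) \
    .select(F.size("vector").alias("dimensions")) \
    .show(5)
```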
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|odiaberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.5 MB| + +## References + +https://huggingface.co/Nikhil7280/OdiaBERTa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-odiaberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-odiaberta_pipeline_en.md new file mode 100644 index 00000000000000..decf71420857b7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-odiaberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English odiaberta_pipeline pipeline RoBertaEmbeddings from Nikhil7280 +author: John Snow Labs +name: odiaberta_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`odiaberta_pipeline` is a English model originally trained by Nikhil7280. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/odiaberta_pipeline_en_5.5.0_3.0_1727080677236.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/odiaberta_pipeline_en_5.5.0_3.0_1727080677236.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("odiaberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("odiaberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|odiaberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.6 MB| + +## References + +https://huggingface.co/Nikhil7280/OdiaBERTa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-paraphrase_russian_crossencoder_mminilmv2_l12_h384_v1_en.md b/docs/_posts/ahmedlone127/2024-09-23-paraphrase_russian_crossencoder_mminilmv2_l12_h384_v1_en.md new file mode 100644 index 00000000000000..4a744ac0b28ae7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-paraphrase_russian_crossencoder_mminilmv2_l12_h384_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English paraphrase_russian_crossencoder_mminilmv2_l12_h384_v1 XlmRoBertaForSequenceClassification from victorych22 +author: John Snow Labs +name: paraphrase_russian_crossencoder_mminilmv2_l12_h384_v1 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`paraphrase_russian_crossencoder_mminilmv2_l12_h384_v1` is a English model originally trained by victorych22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/paraphrase_russian_crossencoder_mminilmv2_l12_h384_v1_en_5.5.0_3.0_1727088308163.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/paraphrase_russian_crossencoder_mminilmv2_l12_h384_v1_en_5.5.0_3.0_1727088308163.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("paraphrase_russian_crossencoder_mminilmv2_l12_h384_v1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("paraphrase_russian_crossencoder_mminilmv2_l12_h384_v1", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|paraphrase_russian_crossencoder_mminilmv2_l12_h384_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|145.7 MB| + +## References + +https://huggingface.co/victorych22/paraphrase-russian-crossencoder-mMiniLMv2-L12-H384-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-paraquantizar_en.md b/docs/_posts/ahmedlone127/2024-09-23-paraquantizar_en.md new file mode 100644 index 00000000000000..d48443f8580568 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-paraquantizar_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English paraquantizar RoBertaForSequenceClassification from Heber77 +author: John Snow Labs +name: paraquantizar +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`paraquantizar` is a English model originally trained by Heber77. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/paraquantizar_en_5.5.0_3.0_1727055575249.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/paraquantizar_en_5.5.0_3.0_1727055575249.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("paraquantizar","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("paraquantizar", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|paraquantizar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/Heber77/paraquantizar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-paraquantizar_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-paraquantizar_pipeline_en.md new file mode 100644 index 00000000000000..9d2031dda8d805 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-paraquantizar_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English paraquantizar_pipeline pipeline RoBertaForSequenceClassification from Heber77 +author: John Snow Labs +name: paraquantizar_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`paraquantizar_pipeline` is a English model originally trained by Heber77. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/paraquantizar_pipeline_en_5.5.0_3.0_1727055597488.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/paraquantizar_pipeline_en_5.5.0_3.0_1727055597488.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("paraquantizar_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("paraquantizar_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|paraquantizar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/Heber77/paraquantizar + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-paws_x_xlm_r_only_spanish_en.md b/docs/_posts/ahmedlone127/2024-09-23-paws_x_xlm_r_only_spanish_en.md new file mode 100644 index 00000000000000..542738ef763009 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-paws_x_xlm_r_only_spanish_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English paws_x_xlm_r_only_spanish XlmRoBertaForSequenceClassification from semindan +author: John Snow Labs +name: paws_x_xlm_r_only_spanish +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`paws_x_xlm_r_only_spanish` is a English model originally trained by semindan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/paws_x_xlm_r_only_spanish_en_5.5.0_3.0_1727099329600.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/paws_x_xlm_r_only_spanish_en_5.5.0_3.0_1727099329600.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("paws_x_xlm_r_only_spanish","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("paws_x_xlm_r_only_spanish", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|paws_x_xlm_r_only_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|803.4 MB| + +## References + +https://huggingface.co/semindan/paws_x_xlm_r_only_es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-paws_x_xlm_r_only_spanish_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-paws_x_xlm_r_only_spanish_pipeline_en.md new file mode 100644 index 00000000000000..52af452662e97b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-paws_x_xlm_r_only_spanish_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English paws_x_xlm_r_only_spanish_pipeline pipeline XlmRoBertaForSequenceClassification from semindan +author: John Snow Labs +name: paws_x_xlm_r_only_spanish_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`paws_x_xlm_r_only_spanish_pipeline` is a English model originally trained by semindan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/paws_x_xlm_r_only_spanish_pipeline_en_5.5.0_3.0_1727099457879.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/paws_x_xlm_r_only_spanish_pipeline_en_5.5.0_3.0_1727099457879.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("paws_x_xlm_r_only_spanish_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("paws_x_xlm_r_only_spanish_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|paws_x_xlm_r_only_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|803.4 MB| + +## References + +https://huggingface.co/semindan/paws_x_xlm_r_only_es + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-phowhisper_small_pipeline_vi.md b/docs/_posts/ahmedlone127/2024-09-23-phowhisper_small_pipeline_vi.md new file mode 100644 index 00000000000000..acdbfe675d005a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-phowhisper_small_pipeline_vi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Vietnamese phowhisper_small_pipeline pipeline WhisperForCTC from huuquyet +author: John Snow Labs +name: phowhisper_small_pipeline +date: 2024-09-23 +tags: [vi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: vi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phowhisper_small_pipeline` is a Vietnamese model originally trained by huuquyet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phowhisper_small_pipeline_vi_5.5.0_3.0_1727117104510.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phowhisper_small_pipeline_vi_5.5.0_3.0_1727117104510.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("phowhisper_small_pipeline", lang = "vi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("phowhisper_small_pipeline", lang = "vi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phowhisper_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|vi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/huuquyet/PhoWhisper-small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-phowhisper_small_vi.md b/docs/_posts/ahmedlone127/2024-09-23-phowhisper_small_vi.md new file mode 100644 index 00000000000000..966e2f72c6b635 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-phowhisper_small_vi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Vietnamese phowhisper_small WhisperForCTC from huuquyet +author: John Snow Labs +name: phowhisper_small +date: 2024-09-23 +tags: [vi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: vi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phowhisper_small` is a Vietnamese model originally trained by huuquyet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phowhisper_small_vi_5.5.0_3.0_1727117014948.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phowhisper_small_vi_5.5.0_3.0_1727117014948.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("phowhisper_small","vi") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
# `data` must be a DataFrame with an "audio_content" column of raw float samples
# (see the sketch after this example)
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("phowhisper_small", "vi")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
// `data` must be a DataFrame with an "audio_content" column of raw float samples
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
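The example above expects `data` to carry the decoded audio as float samples. A minimal sketch of building it in Python, assuming the recording is decoded locally with librosa (the 16 kHz sample rate and the `audio.wav` path are placeholders for illustration):

```python
import librosa

# Decode the file into mono float samples at 16 kHz (Whisper's expected rate)
raw_floats, _ = librosa.load("audio.wav", sr=16000)

# One row per recording; the column name must match AudioAssembler's input column
data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")
```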
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phowhisper_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|vi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/huuquyet/PhoWhisper-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-phrase_bank_yeah_en.md b/docs/_posts/ahmedlone127/2024-09-23-phrase_bank_yeah_en.md new file mode 100644 index 00000000000000..91fb751d9c2c5c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-phrase_bank_yeah_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English phrase_bank_yeah DistilBertForSequenceClassification from EthanHosier +author: John Snow Labs +name: phrase_bank_yeah +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phrase_bank_yeah` is a English model originally trained by EthanHosier. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phrase_bank_yeah_en_5.5.0_3.0_1727096947022.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phrase_bank_yeah_en_5.5.0_3.0_1727096947022.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("phrase_bank_yeah","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("phrase_bank_yeah", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phrase_bank_yeah| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/EthanHosier/phrase_bank_yeah \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-platzi_distilroberta_base_glue_mrpc_eduardo_ag_en.md b/docs/_posts/ahmedlone127/2024-09-23-platzi_distilroberta_base_glue_mrpc_eduardo_ag_en.md new file mode 100644 index 00000000000000..7dc88b4cf69a78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-platzi_distilroberta_base_glue_mrpc_eduardo_ag_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English platzi_distilroberta_base_glue_mrpc_eduardo_ag RoBertaForSequenceClassification from platzi +author: John Snow Labs +name: platzi_distilroberta_base_glue_mrpc_eduardo_ag +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_glue_mrpc_eduardo_ag` is a English model originally trained by platzi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_glue_mrpc_eduardo_ag_en_5.5.0_3.0_1727085898144.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_glue_mrpc_eduardo_ag_en_5.5.0_3.0_1727085898144.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_glue_mrpc_eduardo_ag","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_glue_mrpc_eduardo_ag", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_glue_mrpc_eduardo_ag| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/platzi/platzi-distilroberta-base-glue-mrpc-eduardo-ag \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-platzi_distilroberta_base_glue_mrpc_eduardo_ag_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-platzi_distilroberta_base_glue_mrpc_eduardo_ag_pipeline_en.md new file mode 100644 index 00000000000000..354cdf1cc13680 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-platzi_distilroberta_base_glue_mrpc_eduardo_ag_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English platzi_distilroberta_base_glue_mrpc_eduardo_ag_pipeline pipeline RoBertaForSequenceClassification from platzi +author: John Snow Labs +name: platzi_distilroberta_base_glue_mrpc_eduardo_ag_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_glue_mrpc_eduardo_ag_pipeline` is a English model originally trained by platzi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_glue_mrpc_eduardo_ag_pipeline_en_5.5.0_3.0_1727085913302.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_glue_mrpc_eduardo_ag_pipeline_en_5.5.0_3.0_1727085913302.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("platzi_distilroberta_base_glue_mrpc_eduardo_ag_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # "text" input column assumed, as in the standalone example
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("platzi_distilroberta_base_glue_mrpc_eduardo_ag_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")  // "text" input column assumed, as in the standalone example
val annotations = pipeline.transform(df)
```
</div>
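For quick ad-hoc checks, the downloaded pipeline can also score a single string without building a DataFrame. A minimal sketch, assuming the `pipeline` object from the example above; the keys of the returned dictionary depend on the pipeline's output columns:

```python
# annotate() runs the whole pipeline on one string and returns a dict of annotation results.
light_result = pipeline.annotate("I love spark-nlp")
print(light_result)
```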
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_glue_mrpc_eduardo_ag_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/platzi/platzi-distilroberta-base-glue-mrpc-eduardo-ag + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-portuguese_up_xlmr_truetrue_0_4_best_en.md b/docs/_posts/ahmedlone127/2024-09-23-portuguese_up_xlmr_truetrue_0_4_best_en.md new file mode 100644 index 00000000000000..ee6cc2d268a680 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-portuguese_up_xlmr_truetrue_0_4_best_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English portuguese_up_xlmr_truetrue_0_4_best XlmRoBertaForSequenceClassification from harish +author: John Snow Labs +name: portuguese_up_xlmr_truetrue_0_4_best +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`portuguese_up_xlmr_truetrue_0_4_best` is a English model originally trained by harish. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/portuguese_up_xlmr_truetrue_0_4_best_en_5.5.0_3.0_1727089422482.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/portuguese_up_xlmr_truetrue_0_4_best_en_5.5.0_3.0_1727089422482.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("portuguese_up_xlmr_truetrue_0_4_best","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("portuguese_up_xlmr_truetrue_0_4_best", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
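To see the score attached to each prediction, the annotation metadata can be unpacked. A sketch assuming the `pipelineDF` produced by the Python example above:

```python
from pyspark.sql import functions as F

# Each entry in "class" is an annotation struct; its metadata map carries per-label confidences.
pipelineDF.select(F.explode("class").alias("prediction")) \
    .select("prediction.result", "prediction.metadata") \
    .show(truncate=False)
```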
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|portuguese_up_xlmr_truetrue_0_4_best| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|788.1 MB| + +## References + +https://huggingface.co/harish/PT-UP-xlmR-TrueTrue-0_4_BEST \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-predict_perception_xlmr_cause_human_en.md b/docs/_posts/ahmedlone127/2024-09-23-predict_perception_xlmr_cause_human_en.md new file mode 100644 index 00000000000000..1596a5ac143e30 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-predict_perception_xlmr_cause_human_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English predict_perception_xlmr_cause_human XlmRoBertaForSequenceClassification from responsibility-framing +author: John Snow Labs +name: predict_perception_xlmr_cause_human +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`predict_perception_xlmr_cause_human` is a English model originally trained by responsibility-framing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/predict_perception_xlmr_cause_human_en_5.5.0_3.0_1727126499130.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/predict_perception_xlmr_cause_human_en_5.5.0_3.0_1727126499130.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("predict_perception_xlmr_cause_human","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("predict_perception_xlmr_cause_human", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|predict_perception_xlmr_cause_human| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|837.6 MB| + +## References + +https://huggingface.co/responsibility-framing/predict-perception-xlmr-cause-human \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-predict_perception_xlmr_cause_human_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-predict_perception_xlmr_cause_human_pipeline_en.md new file mode 100644 index 00000000000000..264b921e61fe2c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-predict_perception_xlmr_cause_human_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English predict_perception_xlmr_cause_human_pipeline pipeline XlmRoBertaForSequenceClassification from responsibility-framing +author: John Snow Labs +name: predict_perception_xlmr_cause_human_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`predict_perception_xlmr_cause_human_pipeline` is a English model originally trained by responsibility-framing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/predict_perception_xlmr_cause_human_pipeline_en_5.5.0_3.0_1727126563385.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/predict_perception_xlmr_cause_human_pipeline_en_5.5.0_3.0_1727126563385.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("predict_perception_xlmr_cause_human_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # "text" input column assumed, as in the standalone example
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("predict_perception_xlmr_cause_human_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")  // "text" input column assumed, as in the standalone example
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|predict_perception_xlmr_cause_human_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|837.6 MB| + +## References + +https://huggingface.co/responsibility-framing/predict-perception-xlmr-cause-human + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-pruned_30_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-pruned_30_model_pipeline_en.md new file mode 100644 index 00000000000000..fb340c0d237fbf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-pruned_30_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English pruned_30_model_pipeline pipeline DistilBertForSequenceClassification from andygoh5 +author: John Snow Labs +name: pruned_30_model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pruned_30_model_pipeline` is a English model originally trained by andygoh5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pruned_30_model_pipeline_en_5.5.0_3.0_1727082419368.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pruned_30_model_pipeline_en_5.5.0_3.0_1727082419368.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("pruned_30_model_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # "text" input column assumed, as in the standalone example
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("pruned_30_model_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")  // "text" input column assumed, as in the standalone example
val annotations = pipeline.transform(df)
```
</div>
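The pretrained pipeline wraps an ordinary Spark ML `PipelineModel`, so its bundled stages can be listed or reused inside a larger pipeline. A sketch assuming the `pipeline` object from the example above; the `model` attribute is how recent Spark NLP releases expose the wrapped `PipelineModel`, so verify it against your installed version:

```python
# List the annotators bundled inside the downloaded pipeline.
for stage in pipeline.model.stages:
    print(type(stage).__name__)
```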
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pruned_30_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/andygoh5/pruned-30-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-qa_model_gigazinie_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-qa_model_gigazinie_pipeline_en.md new file mode 100644 index 00000000000000..6602a4f025ee73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-qa_model_gigazinie_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English qa_model_gigazinie_pipeline pipeline BertForQuestionAnswering from Gigazinie +author: John Snow Labs +name: qa_model_gigazinie_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_model_gigazinie_pipeline` is a English model originally trained by Gigazinie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_model_gigazinie_pipeline_en_5.5.0_3.0_1727070468565.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_model_gigazinie_pipeline_en_5.5.0_3.0_1727070468565.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("qa_model_gigazinie_pipeline", lang = "en")
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("document_question", "document_context")  # column names assumed to match the pipeline's assembler
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("qa_model_gigazinie_pipeline", lang = "en")
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("document_question", "document_context")  // column names assumed to match the pipeline's assembler
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_model_gigazinie_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Gigazinie/QA_model + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-qa_persian_albert_persian_farsi_zwnj_base_v2_en.md b/docs/_posts/ahmedlone127/2024-09-23-qa_persian_albert_persian_farsi_zwnj_base_v2_en.md new file mode 100644 index 00000000000000..1ab5f6e7b695a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-qa_persian_albert_persian_farsi_zwnj_base_v2_en.md @@ -0,0 +1,88 @@ +--- +layout: model +title: English qa_persian_albert_persian_farsi_zwnj_base_v2 AlbertForQuestionAnswering from makhataei +author: John Snow Labs +name: qa_persian_albert_persian_farsi_zwnj_base_v2 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, albert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_persian_albert_persian_farsi_zwnj_base_v2` is a English model originally trained by makhataei. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_persian_albert_persian_farsi_zwnj_base_v2_en_5.5.0_3.0_1727128416736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_persian_albert_persian_farsi_zwnj_base_v2_en_5.5.0_3.0_1727128416736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import MultiDocumentAssembler
from sparknlp.annotator import AlbertForQuestionAnswering
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = AlbertForQuestionAnswering.pretrained("qa_persian_albert_persian_farsi_zwnj_base_v2","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = AlbertForQuestionAnswering.pretrained("qa_persian_albert_persian_farsi_zwnj_base_v2", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
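Once the pipeline has run, the predicted span lives in the `answer` column. A minimal sketch assuming the `pipelineDF` from the example above:

```python
# Keep only the extracted answer text for each question/context pair.
pipelineDF.select("answer.result").show(truncate=False)
```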
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_persian_albert_persian_farsi_zwnj_base_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|41.8 MB| + +## References + +References + +https://huggingface.co/makhataei/qa-persian-albert-fa-zwnj-base-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-qa_persian_albert_persian_farsi_zwnj_base_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-qa_persian_albert_persian_farsi_zwnj_base_v2_pipeline_en.md new file mode 100644 index 00000000000000..9035bb7eb9dee5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-qa_persian_albert_persian_farsi_zwnj_base_v2_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English qa_persian_albert_persian_farsi_zwnj_base_v2_pipeline pipeline AlbertForQuestionAnswering from makhataei +author: John Snow Labs +name: qa_persian_albert_persian_farsi_zwnj_base_v2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_persian_albert_persian_farsi_zwnj_base_v2_pipeline` is a English model originally trained by makhataei. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_persian_albert_persian_farsi_zwnj_base_v2_pipeline_en_5.5.0_3.0_1727128419239.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_persian_albert_persian_farsi_zwnj_base_v2_pipeline_en_5.5.0_3.0_1727128419239.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("qa_persian_albert_persian_farsi_zwnj_base_v2_pipeline", lang = "en")
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("document_question", "document_context")  # column names assumed to match the pipeline's assembler
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("qa_persian_albert_persian_farsi_zwnj_base_v2_pipeline", lang = "en")
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("document_question", "document_context")  // column names assumed to match the pipeline's assembler
val annotations = pipeline.transform(df)
```
</div>
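For interactive use, recent Spark NLP releases also let a question-answering pipeline annotate a single question/context pair directly; treat the two-argument call below as an assumption to verify against your installed version:

```python
# fullAnnotate(question, context) skips the DataFrame round-trip for one-off queries.
result = pipeline.fullAnnotate("What framework do I use?", "I use spark-nlp.")
print(result)
```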
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_persian_albert_persian_farsi_zwnj_base_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|41.8 MB| + +## References + +References + +https://huggingface.co/makhataei/qa-persian-albert-fa-zwnj-base-v2 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-question_answering_model_finetuned_on_bert_en.md b/docs/_posts/ahmedlone127/2024-09-23-question_answering_model_finetuned_on_bert_en.md new file mode 100644 index 00000000000000..60780cf865ec3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-question_answering_model_finetuned_on_bert_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English question_answering_model_finetuned_on_bert BertForQuestionAnswering from Vardan-verma +author: John Snow Labs +name: question_answering_model_finetuned_on_bert +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`question_answering_model_finetuned_on_bert` is a English model originally trained by Vardan-verma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/question_answering_model_finetuned_on_bert_en_5.5.0_3.0_1727049603017.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/question_answering_model_finetuned_on_bert_en_5.5.0_3.0_1727049603017.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import MultiDocumentAssembler
from sparknlp.annotator import BertForQuestionAnswering
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("question_answering_model_finetuned_on_bert","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("question_answering_model_finetuned_on_bert", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
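For low-latency, single-pair inference the fitted pipeline can be wrapped in a `LightPipeline`. A sketch assuming `pipelineModel` from the example above; the two-argument `fullAnnotate` is assumed to map to question and context here:

```python
from sparknlp.base import LightPipeline

# LightPipeline avoids Spark job overhead for small, interactive workloads.
light = LightPipeline(pipelineModel)
print(light.fullAnnotate("What framework do I use?", "I use spark-nlp."))
```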
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|question_answering_model_finetuned_on_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|797.4 MB| + +## References + +https://huggingface.co/Vardan-verma/Question_Answering_model_finetuned_on_bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-question_answering_model_finetuned_on_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-question_answering_model_finetuned_on_bert_pipeline_en.md new file mode 100644 index 00000000000000..72b8d1d5eb2f46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-question_answering_model_finetuned_on_bert_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English question_answering_model_finetuned_on_bert_pipeline pipeline BertForQuestionAnswering from Vardan-verma +author: John Snow Labs +name: question_answering_model_finetuned_on_bert_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`question_answering_model_finetuned_on_bert_pipeline` is a English model originally trained by Vardan-verma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/question_answering_model_finetuned_on_bert_pipeline_en_5.5.0_3.0_1727049830318.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/question_answering_model_finetuned_on_bert_pipeline_en_5.5.0_3.0_1727049830318.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("question_answering_model_finetuned_on_bert_pipeline", lang = "en")
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("document_question", "document_context")  # column names assumed to match the pipeline's assembler
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("question_answering_model_finetuned_on_bert_pipeline", lang = "en")
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("document_question", "document_context")  // column names assumed to match the pipeline's assembler
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|question_answering_model_finetuned_on_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|797.4 MB| + +## References + +https://huggingface.co/Vardan-verma/Question_Answering_model_finetuned_on_bert + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-red_green_classification_v3_en.md b/docs/_posts/ahmedlone127/2024-09-23-red_green_classification_v3_en.md new file mode 100644 index 00000000000000..245df0fc3f88b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-red_green_classification_v3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English red_green_classification_v3 DistilBertForSequenceClassification from pnr-svc +author: John Snow Labs +name: red_green_classification_v3 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`red_green_classification_v3` is a English model originally trained by pnr-svc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/red_green_classification_v3_en_5.5.0_3.0_1727108644725.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/red_green_classification_v3_en_5.5.0_3.0_1727108644725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("red_green_classification_v3","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("red_green_classification_v3", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
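Inference behaviour can be tuned on the loaded classifier before it goes into the pipeline. The setters below exist on Spark NLP's transformer-based classifiers; the values are illustrative only:

```python
# Optional inference settings: batch size, casing and truncation length.
sequenceClassifier = DistilBertForSequenceClassification.pretrained("red_green_classification_v3", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class") \
    .setBatchSize(8) \
    .setCaseSensitive(False) \
    .setMaxSentenceLength(128)
```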
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|red_green_classification_v3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/pnr-svc/red-green-classification-v3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-red_green_classification_v3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-red_green_classification_v3_pipeline_en.md new file mode 100644 index 00000000000000..4f66fae0c4267f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-red_green_classification_v3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English red_green_classification_v3_pipeline pipeline DistilBertForSequenceClassification from pnr-svc +author: John Snow Labs +name: red_green_classification_v3_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`red_green_classification_v3_pipeline` is a English model originally trained by pnr-svc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/red_green_classification_v3_pipeline_en_5.5.0_3.0_1727108657324.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/red_green_classification_v3_pipeline_en_5.5.0_3.0_1727108657324.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("red_green_classification_v3_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # "text" input column assumed, as in the standalone example
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("red_green_classification_v3_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")  // "text" input column assumed, as in the standalone example
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|red_green_classification_v3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/pnr-svc/red-green-classification-v3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-results_lr_0_001_en.md b/docs/_posts/ahmedlone127/2024-09-23-results_lr_0_001_en.md new file mode 100644 index 00000000000000..0020eeea3adb0a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-results_lr_0_001_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English results_lr_0_001 DistilBertForSequenceClassification from Benuehlinger +author: John Snow Labs +name: results_lr_0_001 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`results_lr_0_001` is a English model originally trained by Benuehlinger. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/results_lr_0_001_en_5.5.0_3.0_1727073645417.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/results_lr_0_001_en_5.5.0_3.0_1727073645417.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("results_lr_0_001","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("results_lr_0_001", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
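If you need to know which labels this checkpoint can emit before scoring anything, the loaded classifier exposes them directly. A sketch assuming the `sequenceClassifier` from the example above:

```python
# Print the label set baked into the fine-tuned checkpoint.
print(sequenceClassifier.getClasses())
```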
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|results_lr_0_001| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.4 MB| + +## References + +https://huggingface.co/Benuehlinger/results_lr_0.001 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-results_lr_0_001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-results_lr_0_001_pipeline_en.md new file mode 100644 index 00000000000000..dcc3f90a959d5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-results_lr_0_001_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English results_lr_0_001_pipeline pipeline DistilBertForSequenceClassification from Benuehlinger +author: John Snow Labs +name: results_lr_0_001_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`results_lr_0_001_pipeline` is a English model originally trained by Benuehlinger. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/results_lr_0_001_pipeline_en_5.5.0_3.0_1727073658319.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/results_lr_0_001_pipeline_en_5.5.0_3.0_1727073658319.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("results_lr_0_001_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # "text" input column assumed, as in the standalone example
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("results_lr_0_001_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")  // "text" input column assumed, as in the standalone example
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|results_lr_0_001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Benuehlinger/results_lr_0.001 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-results_yildizt_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-results_yildizt_pipeline_en.md new file mode 100644 index 00000000000000..2daccee7238df3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-results_yildizt_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English results_yildizt_pipeline pipeline DistilBertForSequenceClassification from yildizt +author: John Snow Labs +name: results_yildizt_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`results_yildizt_pipeline` is a English model originally trained by yildizt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/results_yildizt_pipeline_en_5.5.0_3.0_1727087309816.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/results_yildizt_pipeline_en_5.5.0_3.0_1727087309816.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("results_yildizt_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # "text" input column assumed, as in the standalone example
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("results_yildizt_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")  // "text" input column assumed, as in the standalone example
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|results_yildizt_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yildizt/results + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-reviews_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-23-reviews_classifier_en.md new file mode 100644 index 00000000000000..45b5f258aec717 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-reviews_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English reviews_classifier RoBertaForSequenceClassification from dariadaria +author: John Snow Labs +name: reviews_classifier +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`reviews_classifier` is a English model originally trained by dariadaria. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/reviews_classifier_en_5.5.0_3.0_1727085667042.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/reviews_classifier_en_5.5.0_3.0_1727085667042.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("reviews_classifier","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("reviews_classifier", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
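To avoid re-downloading the model on every run, the fitted pipeline can be persisted like any Spark ML model. A sketch assuming `pipelineModel` from the example above; the path is illustrative:

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline once, then reload it from disk in later jobs.
pipelineModel.write().overwrite().save("/tmp/reviews_classifier_pipeline")
reloaded = PipelineModel.load("/tmp/reviews_classifier_pipeline")
```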
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|reviews_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/dariadaria/reviews_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-reviews_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-reviews_classifier_pipeline_en.md new file mode 100644 index 00000000000000..46aea56fb3f588 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-reviews_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English reviews_classifier_pipeline pipeline RoBertaForSequenceClassification from dariadaria +author: John Snow Labs +name: reviews_classifier_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`reviews_classifier_pipeline` is a English model originally trained by dariadaria. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/reviews_classifier_pipeline_en_5.5.0_3.0_1727085690760.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/reviews_classifier_pipeline_en_5.5.0_3.0_1727085690760.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("reviews_classifier_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # "text" input column assumed, as in the standalone example
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("reviews_classifier_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")  // "text" input column assumed, as in the standalone example
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|reviews_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/dariadaria/reviews_classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-robbert_2023_dutch_base_abb_nl.md b/docs/_posts/ahmedlone127/2024-09-23-robbert_2023_dutch_base_abb_nl.md new file mode 100644 index 00000000000000..0b8b92d8da932d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-robbert_2023_dutch_base_abb_nl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Dutch, Flemish robbert_2023_dutch_base_abb RoBertaEmbeddings from svercoutere +author: John Snow Labs +name: robbert_2023_dutch_base_abb +date: 2024-09-23 +tags: [nl, open_source, onnx, embeddings, roberta] +task: Embeddings +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_2023_dutch_base_abb` is a Dutch, Flemish model originally trained by svercoutere. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_2023_dutch_base_abb_nl_5.5.0_3.0_1727065849629.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_2023_dutch_base_abb_nl_5.5.0_3.0_1727065849629.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("robbert_2023_dutch_base_abb","nl") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("robbert_2023_dutch_base_abb","nl") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
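The `embeddings` column holds token-level annotations rather than plain vectors. If a downstream Spark ML estimator needs vector columns, an `EmbeddingsFinisher` can be appended; a minimal sketch assuming the `pipelineDF` from the example above:

```python
from sparknlp.base import EmbeddingsFinisher

# Convert Spark NLP embedding annotations into Spark ML vectors.
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

finished = finisher.transform(pipelineDF)
finished.selectExpr("explode(finished_embeddings) as vector").show(5)
```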
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_2023_dutch_base_abb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|nl| +|Size:|464.8 MB| + +## References + +https://huggingface.co/svercoutere/robbert-2023-dutch-base-abb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-robbert_2023_dutch_base_abb_pipeline_nl.md b/docs/_posts/ahmedlone127/2024-09-23-robbert_2023_dutch_base_abb_pipeline_nl.md new file mode 100644 index 00000000000000..20e2b5c1f01704 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-robbert_2023_dutch_base_abb_pipeline_nl.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Dutch, Flemish robbert_2023_dutch_base_abb_pipeline pipeline RoBertaEmbeddings from svercoutere +author: John Snow Labs +name: robbert_2023_dutch_base_abb_pipeline +date: 2024-09-23 +tags: [nl, open_source, pipeline, onnx] +task: Embeddings +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_2023_dutch_base_abb_pipeline` is a Dutch, Flemish model originally trained by svercoutere. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_2023_dutch_base_abb_pipeline_nl_5.5.0_3.0_1727065872429.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_2023_dutch_base_abb_pipeline_nl_5.5.0_3.0_1727065872429.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("robbert_2023_dutch_base_abb_pipeline", lang = "nl")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # "text" input column assumed, as in the standalone example
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("robbert_2023_dutch_base_abb_pipeline", lang = "nl")
val df = Seq("I love spark-nlp").toDF("text")  // "text" input column assumed, as in the standalone example
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_2023_dutch_base_abb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|nl| +|Size:|464.8 MB| + +## References + +https://huggingface.co/svercoutere/robbert-2023-dutch-base-abb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-robbert_cosmetic_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-23-robbert_cosmetic_finetuned_en.md new file mode 100644 index 00000000000000..9c249942247cc0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-robbert_cosmetic_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robbert_cosmetic_finetuned RoBertaEmbeddings from ymelka +author: John Snow Labs +name: robbert_cosmetic_finetuned +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_cosmetic_finetuned` is a English model originally trained by ymelka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_cosmetic_finetuned_en_5.5.0_3.0_1727121936523.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_cosmetic_finetuned_en_5.5.0_3.0_1727121936523.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("robbert_cosmetic_finetuned","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("robbert_cosmetic_finetuned","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
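For document-level similarity or classification, the token embeddings above can be pooled into one vector per document with a `SentenceEmbeddings` stage. A sketch extending the pipeline from the example; the averaging strategy is a choice, not a requirement:

```python
from sparknlp.annotator import SentenceEmbeddings
from pyspark.ml import Pipeline

# Average the RoBERTa token vectors into a single embedding per document.
sentenceEmbeddings = SentenceEmbeddings() \
    .setInputCols(["document", "embeddings"]) \
    .setOutputCol("sentence_embeddings") \
    .setPoolingStrategy("AVERAGE")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, sentenceEmbeddings])
```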
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_cosmetic_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|443.8 MB| + +## References + +https://huggingface.co/ymelka/robbert-cosmetic-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-robbert_cosmetic_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-robbert_cosmetic_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..5715b6f78cff1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-robbert_cosmetic_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robbert_cosmetic_finetuned_pipeline pipeline RoBertaEmbeddings from ymelka +author: John Snow Labs +name: robbert_cosmetic_finetuned_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_cosmetic_finetuned_pipeline` is a English model originally trained by ymelka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_cosmetic_finetuned_pipeline_en_5.5.0_3.0_1727121957099.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_cosmetic_finetuned_pipeline_en_5.5.0_3.0_1727121957099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robbert_cosmetic_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robbert_cosmetic_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
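+
+The snippet above assumes a DataFrame `df` with a `text` column already exists. A minimal sketch of preparing that input and inspecting the output, with an illustrative sample sentence:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# The pretrained pipeline expects the raw text in a column named "text".
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("robbert_cosmetic_finetuned_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.printSchema()  # inspect which annotation columns the pipeline produced
+```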
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_cosmetic_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|443.8 MB| + +## References + +https://huggingface.co/ymelka/robbert-cosmetic-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_5_epochs_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_5_epochs_en.md new file mode 100644 index 00000000000000..d4c955a42c3253 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_5_epochs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_5_epochs RoBertaEmbeddings from mingAAA +author: John Snow Labs +name: roberta_base_5_epochs +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_5_epochs` is a English model originally trained by mingAAA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_5_epochs_en_5.5.0_3.0_1727056655328.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_5_epochs_en_5.5.0_3.0_1727056655328.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_5_epochs","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_5_epochs","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_5_epochs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|298.2 MB| + +## References + +https://huggingface.co/mingAAA/roberta-base-5-epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_5_epochs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_5_epochs_pipeline_en.md new file mode 100644 index 00000000000000..e01a4bdd2ace8e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_5_epochs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_5_epochs_pipeline pipeline RoBertaEmbeddings from mingAAA +author: John Snow Labs +name: roberta_base_5_epochs_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_5_epochs_pipeline` is a English model originally trained by mingAAA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_5_epochs_pipeline_en_5.5.0_3.0_1727056738251.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_5_epochs_pipeline_en_5.5.0_3.0_1727056738251.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_5_epochs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_5_epochs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_5_epochs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|298.2 MB| + +## References + +https://huggingface.co/mingAAA/roberta-base-5-epochs + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_airlines_news_multi_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_airlines_news_multi_en.md new file mode 100644 index 00000000000000..a3b37aaf2f4a8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_airlines_news_multi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_airlines_news_multi RoBertaForSequenceClassification from dahe827 +author: John Snow Labs +name: roberta_base_airlines_news_multi +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_airlines_news_multi` is a English model originally trained by dahe827. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_airlines_news_multi_en_5.5.0_3.0_1727085376883.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_airlines_news_multi_en_5.5.0_3.0_1727085376883.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_airlines_news_multi","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_airlines_news_multi", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
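+
+After `transform`, the predicted label is stored inside the `class` annotation column. A short sketch of extracting it, using the column names from the example above:
+
+```python
+# "class.result" holds the predicted label string(s) for each input row.
+pipelineDF.select("text", "class.result").show(truncate = False)
+```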
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_airlines_news_multi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|434.1 MB| + +## References + +https://huggingface.co/dahe827/roberta-base-airlines-news-multi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_airlines_news_multi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_airlines_news_multi_pipeline_en.md new file mode 100644 index 00000000000000..6cf3f5cd8f7acb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_airlines_news_multi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_airlines_news_multi_pipeline pipeline RoBertaForSequenceClassification from dahe827 +author: John Snow Labs +name: roberta_base_airlines_news_multi_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_airlines_news_multi_pipeline` is a English model originally trained by dahe827. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_airlines_news_multi_pipeline_en_5.5.0_3.0_1727085408158.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_airlines_news_multi_pipeline_en_5.5.0_3.0_1727085408158.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_airlines_news_multi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_airlines_news_multi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_airlines_news_multi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|434.2 MB| + +## References + +https://huggingface.co/dahe827/roberta-base-airlines-news-multi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_en.md new file mode 100644 index 00000000000000..3d0617f6c3a7c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad RoBertaForSequenceClassification from vg055 +author: John Snow Labs +name: roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad` is a English model originally trained by vg055. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_en_5.5.0_3.0_1727135401091.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_en_5.5.0_3.0_1727135401091.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|463.6 MB| + +## References + +https://huggingface.co/vg055/roberta-base-bne-finetuned-analisis-sentimiento-textos-turisticos-mx-polaridad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_pipeline_en.md new file mode 100644 index 00000000000000..1f93189266948f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_pipeline pipeline RoBertaForSequenceClassification from vg055 +author: John Snow Labs +name: roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_pipeline` is a English model originally trained by vg055. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_pipeline_en_5.5.0_3.0_1727135425452.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_pipeline_en_5.5.0_3.0_1727135425452.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.6 MB| + +## References + +https://huggingface.co/vg055/roberta-base-bne-finetuned-analisis-sentimiento-textos-turisticos-mx-polaridad + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_en.md new file mode 100644 index 00000000000000..843292f4d7f53d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad RoBertaForSequenceClassification from vg055 +author: John Snow Labs +name: roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad` is a English model originally trained by vg055. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_en_5.5.0_3.0_1727055061144.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_en_5.5.0_3.0_1727055061144.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/vg055/roberta-base-bne-finetuned-TripAdvisorDomainAdaptation-finetuned-e2-RestMex2023-polaridad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_pipeline_en.md new file mode 100644 index 00000000000000..11d2e6cb6f2a39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_pipeline pipeline RoBertaForSequenceClassification from vg055 +author: John Snow Labs +name: roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_pipeline` is a English model originally trained by vg055. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_pipeline_en_5.5.0_3.0_1727055083718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_pipeline_en_5.5.0_3.0_1727055083718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/vg055/roberta-base-bne-finetuned-TripAdvisorDomainAdaptation-finetuned-e2-RestMex2023-polaridad + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_26_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_26_en.md new file mode 100644 index 00000000000000..3c841b249f2219 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_26_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_26 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_26 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_26` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_26_en_5.5.0_3.0_1727057123268.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_26_en_5.5.0_3.0_1727057123268.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_26","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_26","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_26| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_26 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_26_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_26_pipeline_en.md new file mode 100644 index 00000000000000..2d93559b863e0e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_26_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_26_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_26_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_26_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_26_pipeline_en_5.5.0_3.0_1727057203348.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_26_pipeline_en_5.5.0_3.0_1727057203348.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_26_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_26_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_26_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_26 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_36_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_36_en.md new file mode 100644 index 00000000000000..f5201ab47cbd99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_36_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_36 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_36 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_36` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_36_en_5.5.0_3.0_1727080612388.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_36_en_5.5.0_3.0_1727080612388.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_36","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_36","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
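+
+For quick, driver-side inference on a handful of strings, the fitted pipeline above can be wrapped in Spark NLP's `LightPipeline`. A minimal sketch, assuming the variable names from the example:
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+annotated = light.fullAnnotate("I love spark-nlp")[0]
+# Each token annotation in the "embeddings" output carries its vector.
+print(len(annotated["embeddings"]), "token embeddings produced")
+```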
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_36| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_36 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_36_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_36_pipeline_en.md new file mode 100644 index 00000000000000..420ec41e82953f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_36_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_36_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_36_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_36_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_36_pipeline_en_5.5.0_3.0_1727080698172.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_36_pipeline_en_5.5.0_3.0_1727080698172.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_36_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_36_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_36_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_36 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_45_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_45_en.md new file mode 100644 index 00000000000000..990b5508e437e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_45_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_45 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_45 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_45` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_45_en_5.5.0_3.0_1727056789478.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_45_en_5.5.0_3.0_1727056789478.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_45","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_45","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_45| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_45 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_45_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_45_pipeline_en.md new file mode 100644 index 00000000000000..b2ae5c6a659830 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_45_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_45_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_45_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_45_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_45_pipeline_en_5.5.0_3.0_1727056872118.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_45_pipeline_en_5.5.0_3.0_1727056872118.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_45_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_45_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_45_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_45 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_46_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_46_en.md new file mode 100644 index 00000000000000..f1ab1ca8443aa6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_46_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_46 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_46 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_46` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_46_en_5.5.0_3.0_1727122195755.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_46_en_5.5.0_3.0_1727122195755.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_46","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_46","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_46| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_46 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_46_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_46_pipeline_en.md new file mode 100644 index 00000000000000..54cbd129f71e3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_46_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_46_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_46_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_46_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_46_pipeline_en_5.5.0_3.0_1727122276146.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_46_pipeline_en_5.5.0_3.0_1727122276146.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_46_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_46_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_46_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_46 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_50_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_50_en.md new file mode 100644 index 00000000000000..cc661ff73d9762 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_50_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_50 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_50 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_50` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_50_en_5.5.0_3.0_1727121907390.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_50_en_5.5.0_3.0_1727121907390.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_50","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_50","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
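+
+Token vectors can also be pooled into a single vector per document by appending a `SentenceEmbeddings` stage after the embeddings annotator. A sketch assuming the same column names as the example above:
+
+```python
+from sparknlp.annotator import SentenceEmbeddings
+
+# Average the RoBERTa token vectors into one vector per document.
+sentenceEmbeddings = SentenceEmbeddings() \
+    .setInputCols(["document", "embeddings"]) \
+    .setOutputCol("sentence_embeddings") \
+    .setPoolingStrategy("AVERAGE")
+
+pooled = sentenceEmbeddings.transform(pipelineDF)
+pooled.select("sentence_embeddings.embeddings").show(1, truncate = 80)
+```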
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_50| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_50 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_50_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_50_pipeline_en.md new file mode 100644 index 00000000000000..b32b562fd2fa48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_50_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_50_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_50_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_50_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_50_pipeline_en_5.5.0_3.0_1727121989570.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_50_pipeline_en_5.5.0_3.0_1727121989570.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_50_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_50_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_50_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_50 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_sotus_v1_rile_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_sotus_v1_rile_v1_pipeline_en.md new file mode 100644 index 00000000000000..20cec9de03f6db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_sotus_v1_rile_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_sotus_v1_rile_v1_pipeline pipeline RoBertaForSequenceClassification from kghanlon +author: John Snow Labs +name: roberta_base_finetuned_sotus_v1_rile_v1_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_sotus_v1_rile_v1_pipeline` is a English model originally trained by kghanlon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_sotus_v1_rile_v1_pipeline_en_5.5.0_3.0_1727135671938.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_sotus_v1_rile_v1_pipeline_en_5.5.0_3.0_1727135671938.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_sotus_v1_rile_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_sotus_v1_rile_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_sotus_v1_rile_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/kghanlon/roberta-base-finetuned-SOTUs-v1-RILE-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_6ep_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_6ep_en.md new file mode 100644 index 00000000000000..84f923e93fe5df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_6ep_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_manual_6ep RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_manual_6ep +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_manual_6ep` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_6ep_en_5.5.0_3.0_1727066025615.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_6ep_en_5.5.0_3.0_1727066025615.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_wallisian_manual_6ep","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_wallisian_manual_6ep","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_manual_6ep| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.0 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-manual-6ep \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_6ep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_6ep_pipeline_en.md new file mode 100644 index 00000000000000..30efe62fd3831a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_6ep_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_manual_6ep_pipeline pipeline RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_manual_6ep_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_manual_6ep_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_6ep_pipeline_en_5.5.0_3.0_1727066048873.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_6ep_pipeline_en_5.5.0_3.0_1727066048873.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_wallisian_manual_6ep_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_wallisian_manual_6ep_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
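The `df` in the snippet above is assumed to be an existing Spark DataFrame with a `text` column. A slightly more self-contained sketch, with the imports spelled out (the input sentence is only a placeholder):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start (or reuse) a Spark session with Spark NLP on the classpath
spark = sparknlp.start()

pipeline = PretrainedPipeline("roberta_base_finetuned_wallisian_manual_6ep_pipeline", lang="en")

# annotate() runs the whole pipeline on a single string and returns a dict of annotations
print(pipeline.annotate("I love spark-nlp"))

# transform() expects a DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline.transform(df).show(truncate=False)
```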
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_manual_6ep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.0 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-manual-6ep + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_7ep_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_7ep_en.md new file mode 100644 index 00000000000000..40777cde5a5887 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_7ep_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_manual_7ep RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_manual_7ep +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_manual_7ep` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_7ep_en_5.5.0_3.0_1727092281288.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_7ep_en_5.5.0_3.0_1727092281288.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_wallisian_manual_7ep","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_wallisian_manual_7ep","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_manual_7ep| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.0 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-manual-7ep \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_7ep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_7ep_pipeline_en.md new file mode 100644 index 00000000000000..2b11b0a8aa1cf4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_7ep_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_manual_7ep_pipeline pipeline RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_manual_7ep_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_manual_7ep_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_7ep_pipeline_en_5.5.0_3.0_1727092303503.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_7ep_pipeline_en_5.5.0_3.0_1727092303503.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_wallisian_manual_7ep_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_wallisian_manual_7ep_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_manual_7ep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.0 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-manual-7ep + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_whisper_9ep_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_whisper_9ep_en.md new file mode 100644 index 00000000000000..6896f5ff5c0628 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_whisper_9ep_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_whisper_9ep RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_whisper_9ep +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_whisper_9ep` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_whisper_9ep_en_5.5.0_3.0_1727092137100.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_whisper_9ep_en_5.5.0_3.0_1727092137100.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_wallisian_whisper_9ep","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_wallisian_whisper_9ep","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_whisper_9ep| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-whisper-9ep \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetunedlabelsmooting_ner_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetunedlabelsmooting_ner_en.md new file mode 100644 index 00000000000000..8a9f5d3cd2d30a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetunedlabelsmooting_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetunedlabelsmooting_ner RoBertaForTokenClassification from lobrien001 +author: John Snow Labs +name: roberta_base_finetunedlabelsmooting_ner +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetunedlabelsmooting_ner` is a English model originally trained by lobrien001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetunedlabelsmooting_ner_en_5.5.0_3.0_1727081796172.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetunedlabelsmooting_ner_en_5.5.0_3.0_1727081796172.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark session started with sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_finetunedlabelsmooting_ner","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_finetunedlabelsmooting_ner", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetunedlabelsmooting_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|359.3 MB| + +## References + +https://huggingface.co/lobrien001/roberta-base-finetunedLabelSmooting-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetunedlabelsmooting_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetunedlabelsmooting_ner_pipeline_en.md new file mode 100644 index 00000000000000..54b39402c5d03d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetunedlabelsmooting_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetunedlabelsmooting_ner_pipeline pipeline RoBertaForTokenClassification from lobrien001 +author: John Snow Labs +name: roberta_base_finetunedlabelsmooting_ner_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetunedlabelsmooting_ner_pipeline` is a English model originally trained by lobrien001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetunedlabelsmooting_ner_pipeline_en_5.5.0_3.0_1727081861060.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetunedlabelsmooting_ner_pipeline_en_5.5.0_3.0_1727081861060.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetunedlabelsmooting_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetunedlabelsmooting_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetunedlabelsmooting_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|359.3 MB| + +## References + +https://huggingface.co/lobrien001/roberta-base-finetunedLabelSmooting-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_go_emotions_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_go_emotions_en.md new file mode 100644 index 00000000000000..a5f56c9c243c42 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_go_emotions_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English roberta_base_go_emotions RoBertaForSequenceClassification from SamLowe +author: John Snow Labs +name: roberta_base_go_emotions +date: 2024-09-23 +tags: [roberta, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_go_emotions` is a English model originally trained by SamLowe. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_go_emotions_en_5.5.0_3.0_1727082387278.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_go_emotions_en_5.5.0_3.0_1727082387278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark session started with sparknlp.start()
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_go_emotions","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier])

data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")

result = pipeline.fit(data).transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_go_emotions","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))

val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
```
</div>
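Once the pipeline has run, the predicted emotion label sits in the `result` field of the `class` column; a short sketch for pulling it out (column names come from the example above, the rest is standard Spark SQL):

```python
from pyspark.sql import functions as F

# "class" holds one annotation per input document; "class.result" is the list of predicted labels.
result.select(
    F.col("text"),
    F.col("class.result").alias("predicted_labels")
).show(truncate=False)
```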
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_go_emotions| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +References + +https://huggingface.co/SamLowe/roberta-base-go_emotions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_he111111_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_he111111_en.md new file mode 100644 index 00000000000000..9120b794fe0e86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_he111111_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_he111111 RoBertaForSequenceClassification from he111111 +author: John Snow Labs +name: roberta_base_he111111 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_he111111` is a English model originally trained by he111111. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_he111111_en_5.5.0_3.0_1727135155055.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_he111111_en_5.5.0_3.0_1727135155055.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark session started with sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_he111111","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_he111111", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_he111111| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|451.4 MB| + +## References + +https://huggingface.co/he111111/Roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_he111111_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_he111111_pipeline_en.md new file mode 100644 index 00000000000000..2d21f206d437a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_he111111_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_he111111_pipeline pipeline RoBertaForSequenceClassification from he111111 +author: John Snow Labs +name: roberta_base_he111111_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_he111111_pipeline` is a English model originally trained by he111111. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_he111111_pipeline_en_5.5.0_3.0_1727135183943.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_he111111_pipeline_en_5.5.0_3.0_1727135183943.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_he111111_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_he111111_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_he111111_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|451.4 MB| + +## References + +https://huggingface.co/he111111/Roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_hoax_classifier_defs_1h100r_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_hoax_classifier_defs_1h100r_en.md new file mode 100644 index 00000000000000..f0a3d69b826b28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_hoax_classifier_defs_1h100r_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_hoax_classifier_defs_1h100r RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_base_hoax_classifier_defs_1h100r +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_hoax_classifier_defs_1h100r` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_hoax_classifier_defs_1h100r_en_5.5.0_3.0_1727085791843.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_hoax_classifier_defs_1h100r_en_5.5.0_3.0_1727085791843.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark session started with sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_hoax_classifier_defs_1h100r","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_hoax_classifier_defs_1h100r", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_hoax_classifier_defs_1h100r| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|451.1 MB| + +## References + +https://huggingface.co/research-dump/roberta-base_hoax_classifier_defs_1h100r \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_hoax_classifier_defs_1h100r_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_hoax_classifier_defs_1h100r_pipeline_en.md new file mode 100644 index 00000000000000..ccefed147906a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_hoax_classifier_defs_1h100r_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_hoax_classifier_defs_1h100r_pipeline pipeline RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_base_hoax_classifier_defs_1h100r_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_hoax_classifier_defs_1h100r_pipeline` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_hoax_classifier_defs_1h100r_pipeline_en_5.5.0_3.0_1727085818982.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_hoax_classifier_defs_1h100r_pipeline_en_5.5.0_3.0_1727085818982.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_hoax_classifier_defs_1h100r_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_hoax_classifier_defs_1h100r_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_hoax_classifier_defs_1h100r_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|451.1 MB| + +## References + +https://huggingface.co/research-dump/roberta-base_hoax_classifier_defs_1h100r + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_s2d_saved_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_s2d_saved_en.md new file mode 100644 index 00000000000000..5cdafe5f72b64a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_s2d_saved_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_s2d_saved RoBertaForSequenceClassification from thaile +author: John Snow Labs +name: roberta_base_s2d_saved +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_s2d_saved` is a English model originally trained by thaile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_s2d_saved_en_5.5.0_3.0_1727085687280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_s2d_saved_en_5.5.0_3.0_1727085687280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark session started with sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_s2d_saved","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_s2d_saved", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_s2d_saved| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|427.0 MB| + +## References + +https://huggingface.co/thaile/roberta-base-s2d-saved \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_s2d_saved_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_s2d_saved_pipeline_en.md new file mode 100644 index 00000000000000..d58ee63bd8499c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_s2d_saved_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_s2d_saved_pipeline pipeline RoBertaForSequenceClassification from thaile +author: John Snow Labs +name: roberta_base_s2d_saved_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_s2d_saved_pipeline` is a English model originally trained by thaile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_s2d_saved_pipeline_en_5.5.0_3.0_1727085715448.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_s2d_saved_pipeline_en_5.5.0_3.0_1727085715448.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_s2d_saved_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_s2d_saved_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_s2d_saved_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|427.1 MB| + +## References + +https://huggingface.co/thaile/roberta-base-s2d-saved + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_sentiment_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_sentiment_en.md new file mode 100644 index 00000000000000..7c59688a7e235c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_sentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_sentiment RoBertaForSequenceClassification from 51la5 +author: John Snow Labs +name: roberta_base_sentiment +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_sentiment` is a English model originally trained by 51la5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_sentiment_en_5.5.0_3.0_1727135542779.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_sentiment_en_5.5.0_3.0_1727135542779.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark session started with sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_sentiment","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_sentiment", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
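For quick single-string inference without building a DataFrame, the fitted pipeline can also be wrapped in a `LightPipeline`; a minimal sketch reusing `pipelineModel` from the example above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# Returns a plain Python dict; the predicted sentiment label is under the "class" key.
print(light.annotate("I love spark-nlp"))
```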
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|433.2 MB| + +## References + +https://huggingface.co/51la5/roberta-base-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..2e89720f0a4403 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_sentiment_pipeline pipeline RoBertaForSequenceClassification from 51la5 +author: John Snow Labs +name: roberta_base_sentiment_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_sentiment_pipeline` is a English model originally trained by 51la5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_sentiment_pipeline_en_5.5.0_3.0_1727135582929.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_sentiment_pipeline_en_5.5.0_3.0_1727135582929.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_sentiment_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_sentiment_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|433.2 MB| + +## References + +https://huggingface.co/51la5/roberta-base-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_snli_mtreviso_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_snli_mtreviso_en.md new file mode 100644 index 00000000000000..e9f7c02f67bf39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_snli_mtreviso_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_snli_mtreviso RoBertaForSequenceClassification from mtreviso +author: John Snow Labs +name: roberta_base_snli_mtreviso +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_snli_mtreviso` is a English model originally trained by mtreviso. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_snli_mtreviso_en_5.5.0_3.0_1727134839277.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_snli_mtreviso_en_5.5.0_3.0_1727134839277.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark session started with sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_snli_mtreviso","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_snli_mtreviso", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_snli_mtreviso| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|447.8 MB| + +## References + +https://huggingface.co/mtreviso/roberta-base-snli \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_snli_mtreviso_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_snli_mtreviso_pipeline_en.md new file mode 100644 index 00000000000000..db9c1c3ba239f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_snli_mtreviso_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_snli_mtreviso_pipeline pipeline RoBertaForSequenceClassification from mtreviso +author: John Snow Labs +name: roberta_base_snli_mtreviso_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_snli_mtreviso_pipeline` is a English model originally trained by mtreviso. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_snli_mtreviso_pipeline_en_5.5.0_3.0_1727134870752.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_snli_mtreviso_pipeline_en_5.5.0_3.0_1727134870752.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_snli_mtreviso_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_snli_mtreviso_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_snli_mtreviso_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|447.9 MB| + +## References + +https://huggingface.co/mtreviso/roberta-base-snli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_university_writing2_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_university_writing2_en.md new file mode 100644 index 00000000000000..a4e3be82ec174f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_university_writing2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_university_writing2 RoBertaEmbeddings from egumasa +author: John Snow Labs +name: roberta_base_university_writing2 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_university_writing2` is a English model originally trained by egumasa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_university_writing2_en_5.5.0_3.0_1727080448557.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_university_writing2_en_5.5.0_3.0_1727080448557.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_university_writing2","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_university_writing2","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_university_writing2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/egumasa/roberta-base-university-writing2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_university_writing2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_university_writing2_pipeline_en.md new file mode 100644 index 00000000000000..acf8fbae17a855 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_university_writing2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_university_writing2_pipeline pipeline RoBertaEmbeddings from egumasa +author: John Snow Labs +name: roberta_base_university_writing2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_university_writing2_pipeline` is a English model originally trained by egumasa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_university_writing2_pipeline_en_5.5.0_3.0_1727080470581.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_university_writing2_pipeline_en_5.5.0_3.0_1727080470581.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_university_writing2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_university_writing2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_university_writing2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/egumasa/roberta-base-university-writing2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_conll_epoch_4_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_conll_epoch_4_en.md new file mode 100644 index 00000000000000..bb1bea33e0397e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_conll_epoch_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_conll_epoch_4 RoBertaForTokenClassification from ICT2214Team7 +author: John Snow Labs +name: roberta_conll_epoch_4 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_conll_epoch_4` is a English model originally trained by ICT2214Team7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_4_en_5.5.0_3.0_1727081465346.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_4_en_5.5.0_3.0_1727081465346.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark session started with sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_conll_epoch_4","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_conll_epoch_4", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
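To see the predicted tags next to their tokens, the `token` and `ner` columns from the example above can be selected side by side; a minimal sketch in standard Spark SQL:

```python
from pyspark.sql import functions as F

# "token.result" is the list of tokens and "ner.result" the list of predicted tags,
# in the same order.
pipelineDF.select(
    F.col("token.result").alias("tokens"),
    F.col("ner.result").alias("tags")
).show(truncate=False)
```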
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_conll_epoch_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/ICT2214Team7/RoBERTa_conll_epoch_4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_conll_epoch_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_conll_epoch_4_pipeline_en.md new file mode 100644 index 00000000000000..003b15c78fab18 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_conll_epoch_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_conll_epoch_4_pipeline pipeline RoBertaForTokenClassification from ICT2214Team7 +author: John Snow Labs +name: roberta_conll_epoch_4_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_conll_epoch_4_pipeline` is a English model originally trained by ICT2214Team7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_4_pipeline_en_5.5.0_3.0_1727081480063.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_4_pipeline_en_5.5.0_3.0_1727081480063.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_conll_epoch_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_conll_epoch_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_conll_epoch_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/ICT2214Team7/RoBERTa_conll_epoch_4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_english_financialnews_tuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_english_financialnews_tuned_pipeline_en.md new file mode 100644 index 00000000000000..354f2553c4afeb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_english_financialnews_tuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_english_financialnews_tuned_pipeline pipeline RoBertaEmbeddings from CCCCC5 +author: John Snow Labs +name: roberta_english_financialnews_tuned_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_english_financialnews_tuned_pipeline` is a English model originally trained by CCCCC5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_english_financialnews_tuned_pipeline_en_5.5.0_3.0_1727066359701.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_english_financialnews_tuned_pipeline_en_5.5.0_3.0_1727066359701.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_english_financialnews_tuned_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_english_financialnews_tuned_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_english_financialnews_tuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|426.0 MB| + +## References + +https://huggingface.co/CCCCC5/RoBERTa_English_FinancialNews_tuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_20_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_20_en.md new file mode 100644 index 00000000000000..f8e60dbe67b625 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_20_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_finetuned_20 RoBertaEmbeddings from yzxjb +author: John Snow Labs +name: roberta_finetuned_20 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_20` is a English model originally trained by yzxjb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_20_en_5.5.0_3.0_1727056993808.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_20_en_5.5.0_3.0_1727056993808.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then embed each token
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("roberta_finetuned_20","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("roberta_finetuned_20","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
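Each token in the input receives one vector, stored in the `embeddings` annotation column created above. A quick way to look at the vectors, reusing `pipelineDF` from the example (a minimal sketch):

```python
# Sketch: one row per token, showing the word piece and its embedding vector.
pipelineDF.selectExpr("explode(embeddings) as emb") \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(5, truncate=80)
```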
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_20| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/yzxjb/roberta-finetuned-20 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_20_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_20_pipeline_en.md new file mode 100644 index 00000000000000..6c3b29b2a13d4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_20_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_finetuned_20_pipeline pipeline RoBertaEmbeddings from yzxjb +author: John Snow Labs +name: roberta_finetuned_20_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_20_pipeline` is a English model originally trained by yzxjb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_20_pipeline_en_5.5.0_3.0_1727057015874.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_20_pipeline_en_5.5.0_3.0_1727057015874.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_finetuned_20_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_finetuned_20_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_20_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/yzxjb/roberta-finetuned-20 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_danish_task_b_100k_5_labels_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_danish_task_b_100k_5_labels_en.md new file mode 100644 index 00000000000000..b5eed69c84ee3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_danish_task_b_100k_5_labels_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_finetuned_danish_task_b_100k_5_labels RoBertaForSequenceClassification from bitsanlp +author: John Snow Labs +name: roberta_finetuned_danish_task_b_100k_5_labels +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_danish_task_b_100k_5_labels` is a English model originally trained by bitsanlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_danish_task_b_100k_5_labels_en_5.5.0_3.0_1727085646890.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_danish_task_b_100k_5_labels_en_5.5.0_3.0_1727085646890.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify each document
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_finetuned_danish_task_b_100k_5_labels","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_finetuned_danish_task_b_100k_5_labels", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
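The classifier writes one prediction per input document into the `class` column. Assuming the `pipelineDF` produced by the example above, the labels can be read back like this (a minimal sketch):

```python
# Sketch: show each input text next to its predicted label.
pipelineDF.select("text", "class.result").show(truncate=False)
```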
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_danish_task_b_100k_5_labels| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/bitsanlp/roberta-finetuned-DA-task-B-100k-5-labels \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_inspirational_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_inspirational_en.md new file mode 100644 index 00000000000000..416907077dbd07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_inspirational_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_finetuned_inspirational XlmRoBertaForSequenceClassification from reecursion +author: John Snow Labs +name: roberta_finetuned_inspirational +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_inspirational` is a English model originally trained by reecursion. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_inspirational_en_5.5.0_3.0_1727127152928.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_inspirational_en_5.5.0_3.0_1727127152928.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify each document
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("roberta_finetuned_inspirational","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("roberta_finetuned_inspirational", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_inspirational| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|813.0 MB| + +## References + +https://huggingface.co/reecursion/roberta-finetuned-inspirational \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_inspirational_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_inspirational_pipeline_en.md new file mode 100644 index 00000000000000..ce8804b49b39bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_inspirational_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_finetuned_inspirational_pipeline pipeline XlmRoBertaForSequenceClassification from reecursion +author: John Snow Labs +name: roberta_finetuned_inspirational_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_inspirational_pipeline` is a English model originally trained by reecursion. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_inspirational_pipeline_en_5.5.0_3.0_1727127269042.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_inspirational_pipeline_en_5.5.0_3.0_1727127269042.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_finetuned_inspirational_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_finetuned_inspirational_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_inspirational_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|813.1 MB| + +## References + +https://huggingface.co/reecursion/roberta-finetuned-inspirational + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_conv_contradiction_detector_v0_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_conv_contradiction_detector_v0_en.md new file mode 100644 index 00000000000000..6df1ed738cb00b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_conv_contradiction_detector_v0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_conv_contradiction_detector_v0 RoBertaForSequenceClassification from ynie +author: John Snow Labs +name: roberta_large_conv_contradiction_detector_v0 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_conv_contradiction_detector_v0` is a English model originally trained by ynie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_conv_contradiction_detector_v0_en_5.5.0_3.0_1727086292184.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_conv_contradiction_detector_v0_en_5.5.0_3.0_1727086292184.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify each document
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_conv_contradiction_detector_v0","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_conv_contradiction_detector_v0", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
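When only a sentence or two needs to be scored at a time, wrapping the fitted pipeline in a `LightPipeline` avoids the overhead of a full Spark job. A sketch, assuming the `pipelineModel` fitted in the example above and the `class` output column it defines:

```python
from sparknlp.base import LightPipeline

# Sketch: low-latency, in-memory inference on plain Python strings.
light = LightPipeline(pipelineModel)
print(light.annotate("I stayed home all day. I went hiking all day.")["class"])
```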
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_conv_contradiction_detector_v0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ynie/roberta-large_conv_contradiction_detector_v0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_conv_contradiction_detector_v0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_conv_contradiction_detector_v0_pipeline_en.md new file mode 100644 index 00000000000000..0ba8006fdc86c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_conv_contradiction_detector_v0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_conv_contradiction_detector_v0_pipeline pipeline RoBertaForSequenceClassification from ynie +author: John Snow Labs +name: roberta_large_conv_contradiction_detector_v0_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_conv_contradiction_detector_v0_pipeline` is a English model originally trained by ynie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_conv_contradiction_detector_v0_pipeline_en_5.5.0_3.0_1727086363656.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_conv_contradiction_detector_v0_pipeline_en_5.5.0_3.0_1727086363656.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_large_conv_contradiction_detector_v0_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_conv_contradiction_detector_v0_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_conv_contradiction_detector_v0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ynie/roberta-large_conv_contradiction_detector_v0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_fever_sagnikrayc_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_fever_sagnikrayc_en.md new file mode 100644 index 00000000000000..6fdd455353f7c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_fever_sagnikrayc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_fever_sagnikrayc RoBertaForSequenceClassification from sagnikrayc +author: John Snow Labs +name: roberta_large_fever_sagnikrayc +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_fever_sagnikrayc` is a English model originally trained by sagnikrayc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_fever_sagnikrayc_en_5.5.0_3.0_1727135453464.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_fever_sagnikrayc_en_5.5.0_3.0_1727135453464.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify each document
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_fever_sagnikrayc","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_fever_sagnikrayc", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_fever_sagnikrayc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/sagnikrayc/roberta-large-fever \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_fever_sagnikrayc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_fever_sagnikrayc_pipeline_en.md new file mode 100644 index 00000000000000..4bd6161f3fbb26 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_fever_sagnikrayc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_fever_sagnikrayc_pipeline pipeline RoBertaForSequenceClassification from sagnikrayc +author: John Snow Labs +name: roberta_large_fever_sagnikrayc_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_fever_sagnikrayc_pipeline` is a English model originally trained by sagnikrayc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_fever_sagnikrayc_pipeline_en_5.5.0_3.0_1727135524693.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_fever_sagnikrayc_pipeline_en_5.5.0_3.0_1727135524693.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_large_fever_sagnikrayc_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_fever_sagnikrayc_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_fever_sagnikrayc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/sagnikrayc/roberta-large-fever + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_finetuned_ner_hypnosis0930_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_finetuned_ner_hypnosis0930_en.md new file mode 100644 index 00000000000000..60a16f735b3f70 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_finetuned_ner_hypnosis0930_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_finetuned_ner_hypnosis0930 RoBertaForTokenClassification from hypnosis0930 +author: John Snow Labs +name: roberta_large_finetuned_ner_hypnosis0930 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_ner_hypnosis0930` is a English model originally trained by hypnosis0930. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_ner_hypnosis0930_en_5.5.0_3.0_1727072856735.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_ner_hypnosis0930_en_5.5.0_3.0_1727072856735.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then tag each token
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_finetuned_ner_hypnosis0930","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_finetuned_ner_hypnosis0930", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
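The `ner` column holds one IOB-style tag per token. To line the tokens up with their predicted tags, reusing `pipelineDF` from the example above (a minimal sketch):

```python
# Sketch: tokens and their IOB tags for the example sentence.
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```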
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_ner_hypnosis0930| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/hypnosis0930/roberta-large-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_finetuned_ner_hypnosis0930_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_finetuned_ner_hypnosis0930_pipeline_en.md new file mode 100644 index 00000000000000..d3021effea35f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_finetuned_ner_hypnosis0930_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_finetuned_ner_hypnosis0930_pipeline pipeline RoBertaForTokenClassification from hypnosis0930 +author: John Snow Labs +name: roberta_large_finetuned_ner_hypnosis0930_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_ner_hypnosis0930_pipeline` is a English model originally trained by hypnosis0930. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_ner_hypnosis0930_pipeline_en_5.5.0_3.0_1727072925612.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_ner_hypnosis0930_pipeline_en_5.5.0_3.0_1727072925612.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_large_finetuned_ner_hypnosis0930_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_finetuned_ner_hypnosis0930_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_ner_hypnosis0930_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/hypnosis0930/roberta-large-finetuned-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_gd1_v1_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_gd1_v1_en.md new file mode 100644 index 00000000000000..e078d0494fd5e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_gd1_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_gd1_v1 RoBertaForSequenceClassification from ericNguyen0132 +author: John Snow Labs +name: roberta_large_gd1_v1 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_gd1_v1` is a English model originally trained by ericNguyen0132. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_gd1_v1_en_5.5.0_3.0_1727054960782.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_gd1_v1_en_5.5.0_3.0_1727054960782.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify each document
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_gd1_v1","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_gd1_v1", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
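This card does not list the label set, but the downloaded annotator can report it directly; a small sketch, reusing `sequenceClassifier` from the example above:

```python
# Sketch: print the labels stored inside the downloaded classifier.
print(sequenceClassifier.getClasses())
```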
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_gd1_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ericNguyen0132/RoBERTa-large-GD1-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_gd1_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_gd1_v1_pipeline_en.md new file mode 100644 index 00000000000000..be9c280d72515e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_gd1_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_gd1_v1_pipeline pipeline RoBertaForSequenceClassification from ericNguyen0132 +author: John Snow Labs +name: roberta_large_gd1_v1_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_gd1_v1_pipeline` is a English model originally trained by ericNguyen0132. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_gd1_v1_pipeline_en_5.5.0_3.0_1727055025719.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_gd1_v1_pipeline_en_5.5.0_3.0_1727055025719.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_large_gd1_v1_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_gd1_v1_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_gd1_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ericNguyen0132/RoBERTa-large-GD1-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_movie_genre_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_movie_genre_en.md new file mode 100644 index 00000000000000..7c7418fe2a21fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_movie_genre_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_movie_genre RoBertaEmbeddings from Shiro +author: John Snow Labs +name: roberta_large_movie_genre +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_movie_genre` is a English model originally trained by Shiro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_movie_genre_en_5.5.0_3.0_1727121940506.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_movie_genre_en_5.5.0_3.0_1727121940506.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then embed each token
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("roberta_large_movie_genre","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("roberta_large_movie_genre","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_movie_genre| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Shiro/roberta-large-movie-genre \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_movie_genre_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_movie_genre_pipeline_en.md new file mode 100644 index 00000000000000..1b54e56ca21add --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_movie_genre_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_movie_genre_pipeline pipeline RoBertaEmbeddings from Shiro +author: John Snow Labs +name: roberta_large_movie_genre_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_movie_genre_pipeline` is a English model originally trained by Shiro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_movie_genre_pipeline_en_5.5.0_3.0_1727122002400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_movie_genre_pipeline_en_5.5.0_3.0_1727122002400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_large_movie_genre_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_movie_genre_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_movie_genre_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Shiro/roberta-large-movie-genre + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_pm_m3_voc_hf_finetuned_ner_v2_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_pm_m3_voc_hf_finetuned_ner_v2_en.md new file mode 100644 index 00000000000000..46ed0be6ccd1fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_pm_m3_voc_hf_finetuned_ner_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_pm_m3_voc_hf_finetuned_ner_v2 RoBertaForTokenClassification from ktgiahieu +author: John Snow Labs +name: roberta_large_pm_m3_voc_hf_finetuned_ner_v2 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_pm_m3_voc_hf_finetuned_ner_v2` is a English model originally trained by ktgiahieu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_pm_m3_voc_hf_finetuned_ner_v2_en_5.5.0_3.0_1727081817052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_pm_m3_voc_hf_finetuned_ner_v2_en_5.5.0_3.0_1727081817052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then tag each token
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_pm_m3_voc_hf_finetuned_ner_v2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_pm_m3_voc_hf_finetuned_ner_v2", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_pm_m3_voc_hf_finetuned_ner_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ktgiahieu/RoBERTa-large-PM-M3-Voc-hf-finetuned-ner-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_pm_m3_voc_hf_finetuned_ner_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_pm_m3_voc_hf_finetuned_ner_v2_pipeline_en.md new file mode 100644 index 00000000000000..4c0b776b8464ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_pm_m3_voc_hf_finetuned_ner_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_pm_m3_voc_hf_finetuned_ner_v2_pipeline pipeline RoBertaForTokenClassification from ktgiahieu +author: John Snow Labs +name: roberta_large_pm_m3_voc_hf_finetuned_ner_v2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_pm_m3_voc_hf_finetuned_ner_v2_pipeline` is a English model originally trained by ktgiahieu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_pm_m3_voc_hf_finetuned_ner_v2_pipeline_en_5.5.0_3.0_1727081884133.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_pm_m3_voc_hf_finetuned_ner_v2_pipeline_en_5.5.0_3.0_1727081884133.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Download the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("roberta_large_pm_m3_voc_hf_finetuned_ner_v2_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_pm_m3_voc_hf_finetuned_ner_v2_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_pm_m3_voc_hf_finetuned_ner_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ktgiahieu/RoBERTa-large-PM-M3-Voc-hf-finetuned-ner-v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_unlabeled_gab_semeval2023_task10_45000sample_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_unlabeled_gab_semeval2023_task10_45000sample_en.md new file mode 100644 index 00000000000000..e7092ac4948f0f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_unlabeled_gab_semeval2023_task10_45000sample_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_unlabeled_gab_semeval2023_task10_45000sample RoBertaEmbeddings from HPL +author: John Snow Labs +name: roberta_large_unlabeled_gab_semeval2023_task10_45000sample +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_unlabeled_gab_semeval2023_task10_45000sample` is a English model originally trained by HPL. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_unlabeled_gab_semeval2023_task10_45000sample_en_5.5.0_3.0_1727121645108.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_unlabeled_gab_semeval2023_task10_45000sample_en_5.5.0_3.0_1727121645108.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then embed each token
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("roberta_large_unlabeled_gab_semeval2023_task10_45000sample","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("roberta_large_unlabeled_gab_semeval2023_task10_45000sample","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
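To feed these embeddings into standard Spark ML estimators (clustering, classification, and so on), the annotation structs first have to be converted into plain vectors; `EmbeddingsFinisher` handles that. A sketch that appends it to the `pipelineDF` output from the example above:

```python
from sparknlp.base import EmbeddingsFinisher

# Sketch: turn annotation embeddings into Spark ML vectors for downstream estimators.
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

finisher.transform(pipelineDF).select("finished_embeddings").show(1, truncate=80)
```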
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_unlabeled_gab_semeval2023_task10_45000sample| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/HPL/roberta-large-unlabeled-gab-semeval2023-task10-45000sample \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_nba_v2_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_nba_v2_en.md new file mode 100644 index 00000000000000..89f73c54aa3bd1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_nba_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_nba_v2 RoBertaForSequenceClassification from sivakarri +author: John Snow Labs +name: roberta_nba_v2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_nba_v2` is a English model originally trained by sivakarri. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_nba_v2_en_5.5.0_3.0_1727055315012.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_nba_v2_en_5.5.0_3.0_1727055315012.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify each document
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_nba_v2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_nba_v2", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_nba_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|430.7 MB| + +## References + +https://huggingface.co/sivakarri/roberta_nba_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_ner_omvibhandik_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_ner_omvibhandik_en.md new file mode 100644 index 00000000000000..7fc51a4d4400eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_ner_omvibhandik_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_ner_omvibhandik RoBertaForTokenClassification from OmVibhandik +author: John Snow Labs +name: roberta_ner_omvibhandik +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_ner_omvibhandik` is a English model originally trained by OmVibhandik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_ner_omvibhandik_en_5.5.0_3.0_1727081347076.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_ner_omvibhandik_en_5.5.0_3.0_1727081347076.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# Input columns must match the names produced above ("document", "token")
+tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_ner_omvibhandik","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_ner_omvibhandik", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
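+
+The `ner` column contains one IOB-style tag per token. If whole entity chunks are more convenient, a `NerConverter` stage can be appended to the same pipeline. This is a minimal sketch assuming the Python pipeline above; the `ner_chunk` column name is an illustrative choice.
+
+```python
+from sparknlp.annotator import NerConverter
+
+# Groups IOB tags (e.g. B-PER, I-PER) into entity chunks
+nerConverter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, nerConverter])
+result = pipeline.fit(data).transform(data)
+result.select("ner_chunk.result", "ner_chunk.metadata").show(truncate=False)
+```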
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_ner_omvibhandik| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|429.4 MB| + +## References + +https://huggingface.co/OmVibhandik/roBERTa_NER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_ner_omvibhandik_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_ner_omvibhandik_pipeline_en.md new file mode 100644 index 00000000000000..ef315ec209368f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_ner_omvibhandik_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_ner_omvibhandik_pipeline pipeline RoBertaForTokenClassification from OmVibhandik +author: John Snow Labs +name: roberta_ner_omvibhandik_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_ner_omvibhandik_pipeline` is a English model originally trained by OmVibhandik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_ner_omvibhandik_pipeline_en_5.5.0_3.0_1727081378852.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_ner_omvibhandik_pipeline_en_5.5.0_3.0_1727081378852.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: any Spark DataFrame with the input text in a "text" column
+pipeline = PretrainedPipeline("roberta_ner_omvibhandik_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: any Spark DataFrame with the input text in a "text" column
+val pipeline = new PretrainedPipeline("roberta_ner_omvibhandik_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
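+
+In the snippet above, `df` stands for any Spark DataFrame carrying the input text. The sketch below shows both usage modes, assuming an active Spark NLP session; the example sentence and the "text" column name are illustrative (the column name should match what the pipeline's DocumentAssembler expects).
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("roberta_ner_omvibhandik_pipeline", lang = "en")
+
+# Batch mode: transform a DataFrame
+df = spark.createDataFrame([["John works at John Snow Labs."]]).toDF("text")
+annotations = pipeline.transform(df)
+print(annotations.columns)
+
+# Single-string mode: annotate() returns a plain Python dict
+print(pipeline.annotate("John works at John Snow Labs."))
+```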
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_ner_omvibhandik_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|429.4 MB| + +## References + +https://huggingface.co/OmVibhandik/roBERTa_NER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_romanian_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_romanian_en.md new file mode 100644 index 00000000000000..ea7d39bddfea47 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_romanian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_tagalog_base_ft_udpos213_romanian RoBertaForTokenClassification from hellojimson +author: John Snow Labs +name: roberta_tagalog_base_ft_udpos213_romanian +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tagalog_base_ft_udpos213_romanian` is a English model originally trained by hellojimson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_romanian_en_5.5.0_3.0_1727081261016.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_romanian_en_5.5.0_3.0_1727081261016.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# Input columns must match the names produced above ("document", "token")
+tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_tagalog_base_ft_udpos213_romanian","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_tagalog_base_ft_udpos213_romanian", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tagalog_base_ft_udpos213_romanian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/hellojimson/roberta-tagalog-base-ft-udpos213-ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_romanian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_romanian_pipeline_en.md new file mode 100644 index 00000000000000..249d11deb04039 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_romanian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_tagalog_base_ft_udpos213_romanian_pipeline pipeline RoBertaForTokenClassification from hellojimson +author: John Snow Labs +name: roberta_tagalog_base_ft_udpos213_romanian_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tagalog_base_ft_udpos213_romanian_pipeline` is a English model originally trained by hellojimson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_romanian_pipeline_en_5.5.0_3.0_1727081282609.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_romanian_pipeline_en_5.5.0_3.0_1727081282609.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: any Spark DataFrame with the input text in a "text" column
+pipeline = PretrainedPipeline("roberta_tagalog_base_ft_udpos213_romanian_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: any Spark DataFrame with the input text in a "text" column
+val pipeline = new PretrainedPipeline("roberta_tagalog_base_ft_udpos213_romanian_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tagalog_base_ft_udpos213_romanian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/hellojimson/roberta-tagalog-base-ft-udpos213-ro + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_top4lang_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_top4lang_en.md new file mode 100644 index 00000000000000..38eb6838102e57 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_top4lang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_tagalog_base_ft_udpos213_top4lang RoBertaForTokenClassification from katrinatan +author: John Snow Labs +name: roberta_tagalog_base_ft_udpos213_top4lang +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tagalog_base_ft_udpos213_top4lang` is a English model originally trained by katrinatan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_top4lang_en_5.5.0_3.0_1727081261010.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_top4lang_en_5.5.0_3.0_1727081261010.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# Input columns must match the names produced above ("document", "token")
+tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_tagalog_base_ft_udpos213_top4lang","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_tagalog_base_ft_udpos213_top4lang", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tagalog_base_ft_udpos213_top4lang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/katrinatan/roberta-tagalog-base-ft-udpos213-top4lang \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_top4lang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_top4lang_pipeline_en.md new file mode 100644 index 00000000000000..176120a060987f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_top4lang_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_tagalog_base_ft_udpos213_top4lang_pipeline pipeline RoBertaForTokenClassification from katrinatan +author: John Snow Labs +name: roberta_tagalog_base_ft_udpos213_top4lang_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tagalog_base_ft_udpos213_top4lang_pipeline` is a English model originally trained by katrinatan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_top4lang_pipeline_en_5.5.0_3.0_1727081282876.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_top4lang_pipeline_en_5.5.0_3.0_1727081282876.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: any Spark DataFrame with the input text in a "text" column
+pipeline = PretrainedPipeline("roberta_tagalog_base_ft_udpos213_top4lang_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: any Spark DataFrame with the input text in a "text" column
+val pipeline = new PretrainedPipeline("roberta_tagalog_base_ft_udpos213_top4lang_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tagalog_base_ft_udpos213_top4lang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/katrinatan/roberta-tagalog-base-ft-udpos213-top4lang + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_tomasz_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_tomasz_en.md new file mode 100644 index 00000000000000..818933759c0dcf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_tomasz_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_tomasz RoBertaEmbeddings from Tomasz +author: John Snow Labs +name: roberta_tomasz +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tomasz` is a English model originally trained by Tomasz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tomasz_en_5.5.0_3.0_1727080784279.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tomasz_en_5.5.0_3.0_1727080784279.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("roberta_tomasz","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("roberta_tomasz","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
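+
+The `embeddings` column stores one vector per token inside Spark NLP's annotation struct. If plain Spark ML vectors are needed downstream, an `EmbeddingsFinisher` can be appended to the pipeline. This is a sketch assuming the Python pipeline above; the output column name is illustrative.
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+# Converts embedding annotations into Spark ML vectors, one per token
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, finisher])
+result = pipeline.fit(data).transform(data)
+result.selectExpr("explode(finished_embeddings) as vector").show(truncate=False)
+```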
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tomasz| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|845.7 MB| + +## References + +https://huggingface.co/Tomasz/roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_tomasz_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_tomasz_pipeline_en.md new file mode 100644 index 00000000000000..806a4ef0d551c8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_tomasz_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_tomasz_pipeline pipeline RoBertaEmbeddings from Tomasz +author: John Snow Labs +name: roberta_tomasz_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tomasz_pipeline` is a English model originally trained by Tomasz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tomasz_pipeline_en_5.5.0_3.0_1727081025723.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tomasz_pipeline_en_5.5.0_3.0_1727081025723.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: any Spark DataFrame with the input text in a "text" column
+pipeline = PretrainedPipeline("roberta_tomasz_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: any Spark DataFrame with the input text in a "text" column
+val pipeline = new PretrainedPipeline("roberta_tomasz_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tomasz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|845.7 MB| + +## References + +https://huggingface.co/Tomasz/roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_wiki_english_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_wiki_english_en.md new file mode 100644 index 00000000000000..1daa4c4f317b6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_wiki_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_wiki_english RoBertaEmbeddings from juanquivilla +author: John Snow Labs +name: roberta_wiki_english +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_wiki_english` is a English model originally trained by juanquivilla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_wiki_english_en_5.5.0_3.0_1727066083725.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_wiki_english_en_5.5.0_3.0_1727066083725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("roberta_wiki_english","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("roberta_wiki_english","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_wiki_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/juanquivilla/roberta-wiki-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_wiki_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_wiki_english_pipeline_en.md new file mode 100644 index 00000000000000..18c3f73e903380 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_wiki_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_wiki_english_pipeline pipeline RoBertaEmbeddings from juanquivilla +author: John Snow Labs +name: roberta_wiki_english_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_wiki_english_pipeline` is a English model originally trained by juanquivilla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_wiki_english_pipeline_en_5.5.0_3.0_1727066105583.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_wiki_english_pipeline_en_5.5.0_3.0_1727066105583.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: any Spark DataFrame with the input text in a "text" column
+pipeline = PretrainedPipeline("roberta_wiki_english_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: any Spark DataFrame with the input text in a "text" column
+val pipeline = new PretrainedPipeline("roberta_wiki_english_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_wiki_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/juanquivilla/roberta-wiki-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-robertvarcs_10pc_en.md b/docs/_posts/ahmedlone127/2024-09-23-robertvarcs_10pc_en.md new file mode 100644 index 00000000000000..cb708ab7e5a2cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-robertvarcs_10pc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robertvarcs_10pc RoBertaEmbeddings from gnathoi +author: John Snow Labs +name: robertvarcs_10pc +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertvarcs_10pc` is a English model originally trained by gnathoi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertvarcs_10pc_en_5.5.0_3.0_1727080857649.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertvarcs_10pc_en_5.5.0_3.0_1727080857649.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("robertvarcs_10pc","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("robertvarcs_10pc","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertvarcs_10pc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|298.2 MB| + +## References + +https://huggingface.co/gnathoi/RoBERTvarCS_10pc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-robertvarcs_10pc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-robertvarcs_10pc_pipeline_en.md new file mode 100644 index 00000000000000..328a095e43b0d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-robertvarcs_10pc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robertvarcs_10pc_pipeline pipeline RoBertaEmbeddings from gnathoi +author: John Snow Labs +name: robertvarcs_10pc_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertvarcs_10pc_pipeline` is a English model originally trained by gnathoi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertvarcs_10pc_pipeline_en_5.5.0_3.0_1727080941881.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertvarcs_10pc_pipeline_en_5.5.0_3.0_1727080941881.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: any Spark DataFrame with the input text in a "text" column
+pipeline = PretrainedPipeline("robertvarcs_10pc_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: any Spark DataFrame with the input text in a "text" column
+val pipeline = new PretrainedPipeline("robertvarcs_10pc_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertvarcs_10pc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|298.2 MB| + +## References + +https://huggingface.co/gnathoi/RoBERTvarCS_10pc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roboust_nlp_xlmr_en.md b/docs/_posts/ahmedlone127/2024-09-23-roboust_nlp_xlmr_en.md new file mode 100644 index 00000000000000..988459b1d8ddd9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roboust_nlp_xlmr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roboust_nlp_xlmr XlmRoBertaEmbeddings from Blue7Bird +author: John Snow Labs +name: roboust_nlp_xlmr +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, xlm_roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roboust_nlp_xlmr` is a English model originally trained by Blue7Bird. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roboust_nlp_xlmr_en_5.5.0_3.0_1727071750749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roboust_nlp_xlmr_en_5.5.0_3.0_1727071750749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = XlmRoBertaEmbeddings.pretrained("roboust_nlp_xlmr","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = XlmRoBertaEmbeddings.pretrained("roboust_nlp_xlmr","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
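+
+Token-level vectors can also be pooled into a single vector per document with the `SentenceEmbeddings` annotator, which averages the token embeddings. The sketch below assumes the Python pipeline above; the pooling strategy and column name are illustrative choices.
+
+```python
+from sparknlp.annotator import SentenceEmbeddings
+
+# Averages token embeddings into one vector per document
+sentenceEmbeddings = SentenceEmbeddings() \
+    .setInputCols(["document", "embeddings"]) \
+    .setOutputCol("sentence_embeddings") \
+    .setPoolingStrategy("AVERAGE")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, sentenceEmbeddings])
+result = pipeline.fit(data).transform(data)
+result.select("sentence_embeddings.embeddings").show(truncate=False)
+```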
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roboust_nlp_xlmr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[xlm_roberta]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Blue7Bird/Roboust_nlp_xlmr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roboust_nlp_xlmr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roboust_nlp_xlmr_pipeline_en.md new file mode 100644 index 00000000000000..6b8479ffd6b9a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roboust_nlp_xlmr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roboust_nlp_xlmr_pipeline pipeline XlmRoBertaEmbeddings from Blue7Bird +author: John Snow Labs +name: roboust_nlp_xlmr_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roboust_nlp_xlmr_pipeline` is a English model originally trained by Blue7Bird. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roboust_nlp_xlmr_pipeline_en_5.5.0_3.0_1727071802512.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roboust_nlp_xlmr_pipeline_en_5.5.0_3.0_1727071802512.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: any Spark DataFrame with the input text in a "text" column
+pipeline = PretrainedPipeline("roboust_nlp_xlmr_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: any Spark DataFrame with the input text in a "text" column
+val pipeline = new PretrainedPipeline("roboust_nlp_xlmr_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roboust_nlp_xlmr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Blue7Bird/Roboust_nlp_xlmr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ruroberta_large_neg_en.md b/docs/_posts/ahmedlone127/2024-09-23-ruroberta_large_neg_en.md new file mode 100644 index 00000000000000..39ae1f0547a179 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ruroberta_large_neg_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ruroberta_large_neg RoBertaForTokenClassification from DimasikKurd +author: John Snow Labs +name: ruroberta_large_neg +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ruroberta_large_neg` is a English model originally trained by DimasikKurd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ruroberta_large_neg_en_5.5.0_3.0_1727072654278.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ruroberta_large_neg_en_5.5.0_3.0_1727072654278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# Input columns must match the names produced above ("document", "token")
+tokenClassifier = RoBertaForTokenClassification.pretrained("ruroberta_large_neg","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("ruroberta_large_neg", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ruroberta_large_neg| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/DimasikKurd/ruRoberta-large_neg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ruroberta_large_neg_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-ruroberta_large_neg_pipeline_en.md new file mode 100644 index 00000000000000..2b5a29e6f331ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ruroberta_large_neg_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ruroberta_large_neg_pipeline pipeline RoBertaForTokenClassification from DimasikKurd +author: John Snow Labs +name: ruroberta_large_neg_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ruroberta_large_neg_pipeline` is a English model originally trained by DimasikKurd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ruroberta_large_neg_pipeline_en_5.5.0_3.0_1727072720324.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ruroberta_large_neg_pipeline_en_5.5.0_3.0_1727072720324.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: any Spark DataFrame with the input text in a "text" column
+pipeline = PretrainedPipeline("ruroberta_large_neg_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: any Spark DataFrame with the input text in a "text" column
+val pipeline = new PretrainedPipeline("ruroberta_large_neg_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ruroberta_large_neg_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/DimasikKurd/ruRoberta-large_neg + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-s_ohm_en.md b/docs/_posts/ahmedlone127/2024-09-23-s_ohm_en.md new file mode 100644 index 00000000000000..acb6929db97a92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-s_ohm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English s_ohm RoBertaEmbeddings from anandohm +author: John Snow Labs +name: s_ohm +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`s_ohm` is a English model originally trained by anandohm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/s_ohm_en_5.5.0_3.0_1727091880416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/s_ohm_en_5.5.0_3.0_1727091880416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("s_ohm","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("s_ohm","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|s_ohm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|310.1 MB| + +## References + +https://huggingface.co/anandohm/S_ohm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-s_ohm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-s_ohm_pipeline_en.md new file mode 100644 index 00000000000000..61c37d824d01af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-s_ohm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English s_ohm_pipeline pipeline RoBertaEmbeddings from anandohm +author: John Snow Labs +name: s_ohm_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`s_ohm_pipeline` is a English model originally trained by anandohm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/s_ohm_pipeline_en_5.5.0_3.0_1727091895560.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/s_ohm_pipeline_en_5.5.0_3.0_1727091895560.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: any Spark DataFrame with the input text in a "text" column
+pipeline = PretrainedPipeline("s_ohm_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: any Spark DataFrame with the input text in a "text" column
+val pipeline = new PretrainedPipeline("s_ohm_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|s_ohm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|310.1 MB| + +## References + +https://huggingface.co/anandohm/S_ohm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_en.md b/docs/_posts/ahmedlone127/2024-09-23-sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_en.md new file mode 100644 index 00000000000000..8894a5ed060b73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15 BertForQuestionAnswering from phd411r1 +author: John Snow Labs +name: sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15` is a English model originally trained by phd411r1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_en_5.5.0_3.0_1727128026158.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_en_5.5.0_3.0_1727128026158.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+# The input DataFrame columns must match the assembler's input names ("question", "context")
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.MultiDocumentAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
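+
+The extracted answer span is returned in the `answer` column as an annotation whose `result` field holds the text. A minimal sketch of reading it back, assuming the Python pipeline above:
+
+```python
+# `result` contains the answer text extracted from the context
+pipelineDF.select("question", "answer.result").show(truncate=False)
+```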
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|606.5 MB| + +## References + +https://huggingface.co/phd411r1/SajjadAyoubi_bert-base-fa-qa_finetune_on_am_15 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_pipeline_en.md new file mode 100644 index 00000000000000..011ce86c77e54f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_pipeline pipeline BertForQuestionAnswering from phd411r1 +author: John Snow Labs +name: sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_pipeline` is a English model originally trained by phd411r1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_pipeline_en_5.5.0_3.0_1727128061168.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_pipeline_en_5.5.0_3.0_1727128061168.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the question and context columns expected by the
+# pipeline's MultiDocumentAssembler (typically "question" and "context")
+pipeline = PretrainedPipeline("sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a DataFrame with the question and context columns expected by the
+// pipeline's MultiDocumentAssembler (typically "question" and "context")
+val pipeline = new PretrainedPipeline("sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|606.5 MB| + +## References + +https://huggingface.co/phd411r1/SajjadAyoubi_bert-base-fa-qa_finetune_on_am_15 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_en.md b/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_en.md new file mode 100644 index 00000000000000..1e4ca78e19dded --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma XlmRoBertaForSequenceClassification from haryoaw +author: John Snow Labs +name: scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma` is a English model originally trained by haryoaw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_en_5.5.0_3.0_1727099539900.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_en_5.5.0_3.0_1727099539900.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# Input columns must match the names produced above ("document", "token")
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|801.7 MB| + +## References + +https://huggingface.co/haryoaw/scenario-NON-KD-PO-COPY-D2_data-AmazonScience_massive_all_1_1_gamma \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_pipeline_en.md new file mode 100644 index 00000000000000..95429e74af41cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_pipeline pipeline XlmRoBertaForSequenceClassification from haryoaw +author: John Snow Labs +name: scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_pipeline` is a English model originally trained by haryoaw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_pipeline_en_5.5.0_3.0_1727099587307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_pipeline_en_5.5.0_3.0_1727099587307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
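+
+As an optional sketch (not shown in the original card), the same pretrained pipeline can also annotate a raw string directly; the output key name, assumed here to be `class`, depends on the pipeline's internal column names:
+
+```python
+# annotate() runs the pipeline on a plain string and returns a dict of output columns
+result = pipeline.annotate("I love spark-nlp")
+print(result.get("class"))
+```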
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|801.7 MB| + +## References + +https://huggingface.co/haryoaw/scenario-NON-KD-PO-COPY-D2_data-AmazonScience_massive_all_1_1_gamma + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_pipeline_xx.md new file mode 100644 index 00000000000000..9e3ea0c97eb569 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_pipeline pipeline XlmRoBertaForSequenceClassification from haryoaw +author: John Snow Labs +name: scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_pipeline +date: 2024-09-23 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_pipeline` is a Multilingual model originally trained by haryoaw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_pipeline_xx_5.5.0_3.0_1727089013755.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_pipeline_xx_5.5.0_3.0_1727089013755.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_pipeline", lang = "xx")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_pipeline", lang = "xx")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|883.8 MB| + +## References + +https://huggingface.co/haryoaw/scenario-NON-KD-SCR-D2_data-cardiffnlp_tweet_sentiment_multilingual_all2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_xx.md b/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_xx.md new file mode 100644 index 00000000000000..ec1da4abb0ed42 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2 XlmRoBertaForSequenceClassification from haryoaw +author: John Snow Labs +name: scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2 +date: 2024-09-23 +tags: [xx, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2` is a Multilingual model originally trained by haryoaw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_xx_5.5.0_3.0_1727088971728.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_xx_5.5.0_3.0_1727088971728.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2","xx") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2", "xx")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|883.7 MB| + +## References + +https://huggingface.co/haryoaw/scenario-NON-KD-SCR-D2_data-cardiffnlp_tweet_sentiment_multilingual_all2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-secondo_modello_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-secondo_modello_pipeline_en.md new file mode 100644 index 00000000000000..7964fd1acf4582 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-secondo_modello_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English secondo_modello_pipeline pipeline DistilBertForSequenceClassification from soniarocca31 +author: John Snow Labs +name: secondo_modello_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`secondo_modello_pipeline` is a English model originally trained by soniarocca31. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/secondo_modello_pipeline_en_5.5.0_3.0_1727074181815.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/secondo_modello_pipeline_en_5.5.0_3.0_1727074181815.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("secondo_modello_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("secondo_modello_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|secondo_modello_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/soniarocca31/secondo_modello + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sector_multilabel_climatebert_f_en.md b/docs/_posts/ahmedlone127/2024-09-23-sector_multilabel_climatebert_f_en.md new file mode 100644 index 00000000000000..3cc5011fac7648 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sector_multilabel_climatebert_f_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sector_multilabel_climatebert_f RoBertaForSequenceClassification from GIZ +author: John Snow Labs +name: sector_multilabel_climatebert_f +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sector_multilabel_climatebert_f` is a English model originally trained by GIZ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sector_multilabel_climatebert_f_en_5.5.0_3.0_1727085913596.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sector_multilabel_climatebert_f_en_5.5.0_3.0_1727085913596.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("sector_multilabel_climatebert_f","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sector_multilabel_climatebert_f", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
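+
+Since this is a multi-label classifier, it may be useful to inspect each predicted label together with the scores kept in the annotation metadata; a minimal sketch based on the `class` column defined above:
+
+```python
+# Explode the annotations to see every predicted label with its metadata (per-class scores)
+pipelineDF.selectExpr("explode(class) as prediction") \
+    .selectExpr("prediction.result as label", "prediction.metadata as scores") \
+    .show(truncate=False)
+```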
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sector_multilabel_climatebert_f| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.7 MB| + +## References + +https://huggingface.co/GIZ/SECTOR-multilabel-climatebert_f \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sector_multilabel_climatebert_f_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sector_multilabel_climatebert_f_pipeline_en.md new file mode 100644 index 00000000000000..711f1e1eb518f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sector_multilabel_climatebert_f_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sector_multilabel_climatebert_f_pipeline pipeline RoBertaForSequenceClassification from GIZ +author: John Snow Labs +name: sector_multilabel_climatebert_f_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sector_multilabel_climatebert_f_pipeline` is a English model originally trained by GIZ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sector_multilabel_climatebert_f_pipeline_en_5.5.0_3.0_1727085931657.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sector_multilabel_climatebert_f_pipeline_en_5.5.0_3.0_1727085931657.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("sector_multilabel_climatebert_f_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("sector_multilabel_climatebert_f_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sector_multilabel_climatebert_f_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.7 MB| + +## References + +https://huggingface.co/GIZ/SECTOR-multilabel-climatebert_f + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-securebert_finetuned_autoisac_en.md b/docs/_posts/ahmedlone127/2024-09-23-securebert_finetuned_autoisac_en.md new file mode 100644 index 00000000000000..dfd64313e38da0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-securebert_finetuned_autoisac_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English securebert_finetuned_autoisac RoBertaEmbeddings from frankharman +author: John Snow Labs +name: securebert_finetuned_autoisac +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`securebert_finetuned_autoisac` is a English model originally trained by frankharman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/securebert_finetuned_autoisac_en_5.5.0_3.0_1727080870169.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/securebert_finetuned_autoisac_en_5.5.0_3.0_1727080870169.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("securebert_finetuned_autoisac","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("securebert_finetuned_autoisac","en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
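+
+A small follow-up sketch (assuming the `embeddings` output column from the pipeline above) showing how the token-level vectors can be pulled out of the result:
+
+```python
+# Each exploded row is one token annotation; "result" is the token text, "embeddings" its vector
+pipelineDF.selectExpr("explode(embeddings) as emb") \
+    .selectExpr("emb.result as token", "emb.embeddings as vector") \
+    .show(truncate=False)
+```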
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|securebert_finetuned_autoisac| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/frankharman/securebert-finetuned-autoisac \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-securebert_finetuned_autoisac_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-securebert_finetuned_autoisac_pipeline_en.md new file mode 100644 index 00000000000000..45967a9774ce20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-securebert_finetuned_autoisac_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English securebert_finetuned_autoisac_pipeline pipeline RoBertaEmbeddings from frankharman +author: John Snow Labs +name: securebert_finetuned_autoisac_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`securebert_finetuned_autoisac_pipeline` is a English model originally trained by frankharman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/securebert_finetuned_autoisac_pipeline_en_5.5.0_3.0_1727080892361.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/securebert_finetuned_autoisac_pipeline_en_5.5.0_3.0_1727080892361.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("securebert_finetuned_autoisac_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("securebert_finetuned_autoisac_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|securebert_finetuned_autoisac_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/frankharman/securebert-finetuned-autoisac + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_arabertmo_base_v8_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_arabertmo_base_v8_en.md new file mode 100644 index 00000000000000..520aab40ab8269 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_arabertmo_base_v8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_arabertmo_base_v8 BertSentenceEmbeddings from Ebtihal +author: John Snow Labs +name: sent_arabertmo_base_v8 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_arabertmo_base_v8` is a English model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v8_en_5.5.0_3.0_1727091051376.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v8_en_5.5.0_3.0_1727091051376.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_arabertmo_base_v8","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_arabertmo_base_v8","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
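+
+As a brief usage sketch (not part of the original card), the sentence-level vectors can be inspected from the `embeddings` column produced above:
+
+```python
+# One annotation per detected sentence; "embeddings" holds the sentence vector
+pipelineDF.selectExpr("explode(embeddings) as sent") \
+    .selectExpr("sent.result as sentence", "sent.embeddings as vector") \
+    .show(truncate=False)
+```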
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_arabertmo_base_v8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.9 MB| + +## References + +https://huggingface.co/Ebtihal/AraBertMo_base_V8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_arabertmo_base_v8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_arabertmo_base_v8_pipeline_en.md new file mode 100644 index 00000000000000..0b7b2f59b3332d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_arabertmo_base_v8_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_arabertmo_base_v8_pipeline pipeline BertSentenceEmbeddings from Ebtihal +author: John Snow Labs +name: sent_arabertmo_base_v8_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_arabertmo_base_v8_pipeline` is a English model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v8_pipeline_en_5.5.0_3.0_1727091070845.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v8_pipeline_en_5.5.0_3.0_1727091070845.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("sent_arabertmo_base_v8_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("sent_arabertmo_base_v8_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_arabertmo_base_v8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.4 MB| + +## References + +https://huggingface.co/Ebtihal/AraBertMo_base_V8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_buddhist_sanskrit_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_buddhist_sanskrit_en.md new file mode 100644 index 00000000000000..f5f916357820f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_buddhist_sanskrit_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_buddhist_sanskrit BertSentenceEmbeddings from Matej +author: John Snow Labs +name: sent_bert_base_buddhist_sanskrit +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_buddhist_sanskrit` is a English model originally trained by Matej. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_buddhist_sanskrit_en_5.5.0_3.0_1727105457117.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_buddhist_sanskrit_en_5.5.0_3.0_1727105457117.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_buddhist_sanskrit","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_buddhist_sanskrit","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_buddhist_sanskrit| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/Matej/bert-base-buddhist-sanskrit \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_buddhist_sanskrit_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_buddhist_sanskrit_pipeline_en.md new file mode 100644 index 00000000000000..ff38966dcc8479 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_buddhist_sanskrit_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_buddhist_sanskrit_pipeline pipeline BertSentenceEmbeddings from Matej +author: John Snow Labs +name: sent_bert_base_buddhist_sanskrit_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_buddhist_sanskrit_pipeline` is a English model originally trained by Matej. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_buddhist_sanskrit_pipeline_en_5.5.0_3.0_1727105476589.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_buddhist_sanskrit_pipeline_en_5.5.0_3.0_1727105476589.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("sent_bert_base_buddhist_sanskrit_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("sent_bert_base_buddhist_sanskrit_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_buddhist_sanskrit_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.3 MB| + +## References + +https://huggingface.co/Matej/bert-base-buddhist-sanskrit + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_portuguese_lenerbr_alynneoya_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_portuguese_lenerbr_alynneoya_en.md new file mode 100644 index 00000000000000..9a480ffd724920 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_portuguese_lenerbr_alynneoya_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_cased_portuguese_lenerbr_alynneoya BertSentenceEmbeddings from alynneoya +author: John Snow Labs +name: sent_bert_base_cased_portuguese_lenerbr_alynneoya +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_cased_portuguese_lenerbr_alynneoya` is a English model originally trained by alynneoya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_portuguese_lenerbr_alynneoya_en_5.5.0_3.0_1727113885461.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_portuguese_lenerbr_alynneoya_en_5.5.0_3.0_1727113885461.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_cased_portuguese_lenerbr_alynneoya","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_cased_portuguese_lenerbr_alynneoya","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_cased_portuguese_lenerbr_alynneoya| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alynneoya/bert-base-cased-pt-lenerbr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_portuguese_lenerbr_alynneoya_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_portuguese_lenerbr_alynneoya_pipeline_en.md new file mode 100644 index 00000000000000..b2bdac35728ca1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_portuguese_lenerbr_alynneoya_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_cased_portuguese_lenerbr_alynneoya_pipeline pipeline BertSentenceEmbeddings from alynneoya +author: John Snow Labs +name: sent_bert_base_cased_portuguese_lenerbr_alynneoya_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_cased_portuguese_lenerbr_alynneoya_pipeline` is a English model originally trained by alynneoya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_portuguese_lenerbr_alynneoya_pipeline_en_5.5.0_3.0_1727113905335.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_portuguese_lenerbr_alynneoya_pipeline_en_5.5.0_3.0_1727113905335.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("sent_bert_base_cased_portuguese_lenerbr_alynneoya_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("sent_bert_base_cased_portuguese_lenerbr_alynneoya_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_cased_portuguese_lenerbr_alynneoya_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.5 MB| + +## References + +https://huggingface.co/alynneoya/bert-base-cased-pt-lenerbr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_scmedium_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_scmedium_en.md new file mode 100644 index 00000000000000..adac6a70ce43d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_scmedium_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_cased_scmedium BertSentenceEmbeddings from CambridgeMolecularEngineering +author: John Snow Labs +name: sent_bert_base_cased_scmedium +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_cased_scmedium` is a English model originally trained by CambridgeMolecularEngineering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_scmedium_en_5.5.0_3.0_1727113684994.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_scmedium_en_5.5.0_3.0_1727113684994.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_cased_scmedium","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_cased_scmedium","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_cased_scmedium| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/CambridgeMolecularEngineering/bert-base-cased-scmedium \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_scmedium_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_scmedium_pipeline_en.md new file mode 100644 index 00000000000000..6405e0a58ceb0c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_scmedium_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_cased_scmedium_pipeline pipeline BertSentenceEmbeddings from CambridgeMolecularEngineering +author: John Snow Labs +name: sent_bert_base_cased_scmedium_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_cased_scmedium_pipeline` is a English model originally trained by CambridgeMolecularEngineering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_scmedium_pipeline_en_5.5.0_3.0_1727113704205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_scmedium_pipeline_en_5.5.0_3.0_1727113704205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("sent_bert_base_cased_scmedium_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("sent_bert_base_cased_scmedium_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_cased_scmedium_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.2 MB| + +## References + +https://huggingface.co/CambridgeMolecularEngineering/bert-base-cased-scmedium + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_dutch_cased_finetuned_manx_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_dutch_cased_finetuned_manx_en.md new file mode 100644 index 00000000000000..e28429767e4a52 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_dutch_cased_finetuned_manx_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_dutch_cased_finetuned_manx BertSentenceEmbeddings from Pyjay +author: John Snow Labs +name: sent_bert_base_dutch_cased_finetuned_manx +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_dutch_cased_finetuned_manx` is a English model originally trained by Pyjay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_dutch_cased_finetuned_manx_en_5.5.0_3.0_1727109962610.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_dutch_cased_finetuned_manx_en_5.5.0_3.0_1727109962610.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_dutch_cased_finetuned_manx","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_dutch_cased_finetuned_manx","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_dutch_cased_finetuned_manx| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/Pyjay/bert-base-dutch-cased-finetuned-gv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_dutch_cased_finetuned_manx_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_dutch_cased_finetuned_manx_pipeline_en.md new file mode 100644 index 00000000000000..1956b1825e2100 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_dutch_cased_finetuned_manx_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_dutch_cased_finetuned_manx_pipeline pipeline BertSentenceEmbeddings from Pyjay +author: John Snow Labs +name: sent_bert_base_dutch_cased_finetuned_manx_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_dutch_cased_finetuned_manx_pipeline` is a English model originally trained by Pyjay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_dutch_cased_finetuned_manx_pipeline_en_5.5.0_3.0_1727109982428.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_dutch_cased_finetuned_manx_pipeline_en_5.5.0_3.0_1727109982428.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("sent_bert_base_dutch_cased_finetuned_manx_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("sent_bert_base_dutch_cased_finetuned_manx_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_dutch_cased_finetuned_manx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.3 MB| + +## References + +https://huggingface.co/Pyjay/bert-base-dutch-cased-finetuned-gv + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_german_cased_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_german_cased_en.md new file mode 100644 index 00000000000000..34b0f53282efb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_german_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_german_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_german_cased +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_german_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_german_cased_en_5.5.0_3.0_1727090987713.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_german_cased_en_5.5.0_3.0_1727090987713.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_german_cased","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_german_cased","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_german_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|421.8 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-de-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_german_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_german_cased_pipeline_en.md new file mode 100644 index 00000000000000..175a53b8dbbc4e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_german_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_german_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_german_cased_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_german_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_german_cased_pipeline_en_5.5.0_3.0_1727091008286.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_german_cased_pipeline_en_5.5.0_3.0_1727091008286.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Input DataFrame with a "text" column to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("sent_bert_base_english_german_cased_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Input DataFrame with a "text" column to annotate
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("sent_bert_base_english_german_cased_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_german_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|422.4 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-de-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_japanese_cased_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_japanese_cased_en.md new file mode 100644 index 00000000000000..11049c42709e03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_japanese_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_japanese_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_japanese_cased +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_japanese_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_japanese_cased_en_5.5.0_3.0_1727104978654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_japanese_cased_en_5.5.0_3.0_1727104978654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_japanese_cased","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_japanese_cased","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_japanese_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|416.3 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-ja-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_japanese_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_japanese_cased_pipeline_en.md new file mode 100644 index 00000000000000..a2322514fb77fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_japanese_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_japanese_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_japanese_cased_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_japanese_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_japanese_cased_pipeline_en_5.5.0_3.0_1727104998249.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_japanese_cased_pipeline_en_5.5.0_3.0_1727104998249.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Download the pretrained pipeline and annotate a DataFrame with a "text" column
+pipeline = PretrainedPipeline("sent_bert_base_english_japanese_cased_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Download the pretrained pipeline and annotate a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("sent_bert_base_english_japanese_cased_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_japanese_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|416.9 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-ja-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_lithuanian_cased_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_lithuanian_cased_en.md new file mode 100644 index 00000000000000..f94c750ec1651f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_lithuanian_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_lithuanian_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_lithuanian_cased +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_lithuanian_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_lithuanian_cased_en_5.5.0_3.0_1727101824083.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_lithuanian_cased_en_5.5.0_3.0_1727101824083.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_lithuanian_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_lithuanian_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
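+
+The code above relies on the standard Spark NLP imports (`sparknlp.base`, `sparknlp.annotator`, `pyspark.ml.Pipeline`) being in scope. A short follow-up sketch, assuming the fitted pipeline from above, shows one way to unpack the sentence vectors produced by `sent_bert_base_english_lithuanian_cased`:
+
+```python
+# Hypothetical inspection step: one row per detected sentence with its sentence vector
+exploded = pipelineDF.selectExpr("explode(embeddings) AS ann")
+exploded.selectExpr("ann.result AS sentence", "ann.embeddings AS vector").show(truncate=80)
+```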
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_lithuanian_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|412.0 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-lt-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_lithuanian_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_lithuanian_cased_pipeline_en.md new file mode 100644 index 00000000000000..0a1a79502b6ff9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_lithuanian_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_lithuanian_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_lithuanian_cased_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_lithuanian_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_lithuanian_cased_pipeline_en_5.5.0_3.0_1727101842957.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_lithuanian_cased_pipeline_en_5.5.0_3.0_1727101842957.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Download the pretrained pipeline and annotate a DataFrame with a "text" column
+pipeline = PretrainedPipeline("sent_bert_base_english_lithuanian_cased_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Download the pretrained pipeline and annotate a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("sent_bert_base_english_lithuanian_cased_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_lithuanian_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.5 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-lt-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_norwegian_cased_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_norwegian_cased_en.md new file mode 100644 index 00000000000000..0024110692635e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_norwegian_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_norwegian_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_norwegian_cased +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_norwegian_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_norwegian_cased_en_5.5.0_3.0_1727105450905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_norwegian_cased_en_5.5.0_3.0_1727105450905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_norwegian_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_norwegian_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
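+
+The snippet above assumes the usual Spark NLP imports (`sparknlp.base`, `sparknlp.annotator`, `pyspark.ml.Pipeline`) and an active Spark session. As a minimal sketch of reading the output of `sent_bert_base_english_norwegian_cased`, each annotation in the `embeddings` column carries the sentence text in `result` and the vector in `embeddings`:
+
+```python
+# Hypothetical inspection step: one row per detected sentence with its sentence vector
+exploded = pipelineDF.selectExpr("explode(embeddings) AS ann")
+exploded.selectExpr("ann.result AS sentence", "ann.embeddings AS vector").show(truncate=80)
+```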
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_norwegian_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|415.6 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-no-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_norwegian_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_norwegian_cased_pipeline_en.md new file mode 100644 index 00000000000000..8a0812407812e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_norwegian_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_norwegian_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_norwegian_cased_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_norwegian_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_norwegian_cased_pipeline_en_5.5.0_3.0_1727105470030.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_norwegian_cased_pipeline_en_5.5.0_3.0_1727105470030.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Download the pretrained pipeline and annotate a DataFrame with a "text" column
+pipeline = PretrainedPipeline("sent_bert_base_english_norwegian_cased_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Download the pretrained pipeline and annotate a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("sent_bert_base_english_norwegian_cased_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_norwegian_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|416.1 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-no-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_portuguese_cased_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_portuguese_cased_en.md new file mode 100644 index 00000000000000..6b43fb50c702d4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_portuguese_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_portuguese_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_portuguese_cased +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_portuguese_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_portuguese_cased_en_5.5.0_3.0_1727105325121.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_portuguese_cased_en_5.5.0_3.0_1727105325121.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_portuguese_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_portuguese_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
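+
+The code above relies on the standard Spark NLP imports (`sparknlp.base`, `sparknlp.annotator`, `pyspark.ml.Pipeline`) being in scope. A short follow-up sketch, assuming the fitted pipeline from above, shows one way to unpack the sentence vectors produced by `sent_bert_base_english_portuguese_cased`:
+
+```python
+# Hypothetical inspection step: one row per detected sentence with its sentence vector
+exploded = pipelineDF.selectExpr("explode(embeddings) AS ann")
+exploded.selectExpr("ann.result AS sentence", "ann.embeddings AS vector").show(truncate=80)
+```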
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_portuguese_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|419.2 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-pt-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_spanish_portuguese_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_spanish_portuguese_cased_pipeline_en.md new file mode 100644 index 00000000000000..26cfe1bea9379f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_spanish_portuguese_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_spanish_portuguese_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_spanish_portuguese_cased_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_spanish_portuguese_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_spanish_portuguese_cased_pipeline_en_5.5.0_3.0_1727091120598.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_spanish_portuguese_cased_pipeline_en_5.5.0_3.0_1727091120598.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Download the pretrained pipeline and annotate a DataFrame with a "text" column
+pipeline = PretrainedPipeline("sent_bert_base_english_spanish_portuguese_cased_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Download the pretrained pipeline and annotate a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("sent_bert_base_english_spanish_portuguese_cased_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_spanish_portuguese_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|428.5 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-es-pt-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_turkish_cased_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_turkish_cased_en.md new file mode 100644 index 00000000000000..32aeb3073bca39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_turkish_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_turkish_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_turkish_cased +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_turkish_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_turkish_cased_en_5.5.0_3.0_1727109881858.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_turkish_cased_en_5.5.0_3.0_1727109881858.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_turkish_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_turkish_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
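+
+The snippet above assumes the usual Spark NLP imports (`sparknlp.base`, `sparknlp.annotator`, `pyspark.ml.Pipeline`) and an active Spark session. As a minimal sketch of reading the output of `sent_bert_base_english_turkish_cased`, each annotation in the `embeddings` column carries the sentence text in `result` and the vector in `embeddings`:
+
+```python
+# Hypothetical inspection step: one row per detected sentence with its sentence vector
+exploded = pipelineDF.selectExpr("explode(embeddings) AS ann")
+exploded.selectExpr("ann.result AS sentence", "ann.embeddings AS vector").show(truncate=80)
+```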
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_turkish_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|410.7 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-tr-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_turkish_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_turkish_cased_pipeline_en.md new file mode 100644 index 00000000000000..626e6fdd6efe40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_turkish_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_turkish_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_turkish_cased_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_turkish_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_turkish_cased_pipeline_en_5.5.0_3.0_1727109901413.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_turkish_cased_pipeline_en_5.5.0_3.0_1727109901413.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Download the pretrained pipeline and annotate a DataFrame with a "text" column
+pipeline = PretrainedPipeline("sent_bert_base_english_turkish_cased_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Download the pretrained pipeline and annotate a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("sent_bert_base_english_turkish_cased_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_turkish_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.2 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-tr-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v2_finetuned_polylex_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v2_finetuned_polylex_en.md new file mode 100644 index 00000000000000..ef7acea9aa1a52 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v2_finetuned_polylex_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_greek_uncased_v2_finetuned_polylex BertSentenceEmbeddings from snousias +author: John Snow Labs +name: sent_bert_base_greek_uncased_v2_finetuned_polylex +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_greek_uncased_v2_finetuned_polylex` is a English model originally trained by snousias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v2_finetuned_polylex_en_5.5.0_3.0_1727113449962.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v2_finetuned_polylex_en_5.5.0_3.0_1727113449962.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_greek_uncased_v2_finetuned_polylex","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_greek_uncased_v2_finetuned_polylex","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
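+
+The code above relies on the standard Spark NLP imports (`sparknlp.base`, `sparknlp.annotator`, `pyspark.ml.Pipeline`) being in scope. A short follow-up sketch, assuming the fitted pipeline from above, shows one way to unpack the sentence vectors produced by `sent_bert_base_greek_uncased_v2_finetuned_polylex`:
+
+```python
+# Hypothetical inspection step: one row per detected sentence with its sentence vector
+exploded = pipelineDF.selectExpr("explode(embeddings) AS ann")
+exploded.selectExpr("ann.result AS sentence", "ann.embeddings AS vector").show(truncate=80)
+```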
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_greek_uncased_v2_finetuned_polylex| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|421.1 MB| + +## References + +https://huggingface.co/snousias/bert-base-greek-uncased-v2-finetuned-polylex \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v2_finetuned_polylex_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v2_finetuned_polylex_pipeline_en.md new file mode 100644 index 00000000000000..6047432cb57906 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v2_finetuned_polylex_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_greek_uncased_v2_finetuned_polylex_pipeline pipeline BertSentenceEmbeddings from snousias +author: John Snow Labs +name: sent_bert_base_greek_uncased_v2_finetuned_polylex_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_greek_uncased_v2_finetuned_polylex_pipeline` is a English model originally trained by snousias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v2_finetuned_polylex_pipeline_en_5.5.0_3.0_1727113469909.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v2_finetuned_polylex_pipeline_en_5.5.0_3.0_1727113469909.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Download the pretrained pipeline and annotate a DataFrame with a "text" column
+pipeline = PretrainedPipeline("sent_bert_base_greek_uncased_v2_finetuned_polylex_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Download the pretrained pipeline and annotate a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("sent_bert_base_greek_uncased_v2_finetuned_polylex_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_greek_uncased_v2_finetuned_polylex_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|421.7 MB| + +## References + +https://huggingface.co/snousias/bert-base-greek-uncased-v2-finetuned-polylex + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_en.md new file mode 100644 index 00000000000000..aafb6bc4e960cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy BertSentenceEmbeddings from polylexmg +author: John Snow Labs +name: sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy` is a English model originally trained by polylexmg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_en_5.5.0_3.0_1727110158116.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_en_5.5.0_3.0_1727110158116.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
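+
+The snippet above assumes the usual Spark NLP imports (`sparknlp.base`, `sparknlp.annotator`, `pyspark.ml.Pipeline`) and an active Spark session. As a minimal sketch of reading the output of `sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy`, each annotation in the `embeddings` column carries the sentence text in `result` and the vector in `embeddings`:
+
+```python
+# Hypothetical inspection step: one row per detected sentence with its sentence vector
+exploded = pipelineDF.selectExpr("explode(embeddings) AS ann")
+exploded.selectExpr("ann.result AS sentence", "ann.embeddings AS vector").show(truncate=80)
+```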
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|421.1 MB| + +## References + +https://huggingface.co/polylexmg/bert-base-greek-uncased-v6-finetuned-polylex-mg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_pipeline_en.md new file mode 100644 index 00000000000000..a4d935d0d29889 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_pipeline pipeline BertSentenceEmbeddings from polylexmg +author: John Snow Labs +name: sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_pipeline` is a English model originally trained by polylexmg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_pipeline_en_5.5.0_3.0_1727110178153.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_pipeline_en_5.5.0_3.0_1727110178153.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Download the pretrained pipeline and annotate a DataFrame with a "text" column
+pipeline = PretrainedPipeline("sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Download the pretrained pipeline and annotate a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|421.7 MB| + +## References + +https://huggingface.co/polylexmg/bert-base-greek-uncased-v6-finetuned-polylex-mg + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_stackoverflow_comments_1m_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_stackoverflow_comments_1m_en.md new file mode 100644 index 00000000000000..1b0e5508739676 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_stackoverflow_comments_1m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_stackoverflow_comments_1m BertSentenceEmbeddings from giganticode +author: John Snow Labs +name: sent_bert_base_stackoverflow_comments_1m +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_stackoverflow_comments_1m` is a English model originally trained by giganticode. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_stackoverflow_comments_1m_en_5.5.0_3.0_1727122964051.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_stackoverflow_comments_1m_en_5.5.0_3.0_1727122964051.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_stackoverflow_comments_1m","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_stackoverflow_comments_1m","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
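+
+The code above relies on the standard Spark NLP imports (`sparknlp.base`, `sparknlp.annotator`, `pyspark.ml.Pipeline`) being in scope. A short follow-up sketch, assuming the fitted pipeline from above, shows one way to unpack the sentence vectors produced by `sent_bert_base_stackoverflow_comments_1m`:
+
+```python
+# Hypothetical inspection step: one row per detected sentence with its sentence vector
+exploded = pipelineDF.selectExpr("explode(embeddings) AS ann")
+exploded.selectExpr("ann.result AS sentence", "ann.embeddings AS vector").show(truncate=80)
+```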
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_stackoverflow_comments_1m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.0 MB| + +## References + +https://huggingface.co/giganticode/bert-base-StackOverflow-comments_1M \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_stackoverflow_comments_1m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_stackoverflow_comments_1m_pipeline_en.md new file mode 100644 index 00000000000000..37648299a88fa5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_stackoverflow_comments_1m_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_stackoverflow_comments_1m_pipeline pipeline BertSentenceEmbeddings from giganticode +author: John Snow Labs +name: sent_bert_base_stackoverflow_comments_1m_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_stackoverflow_comments_1m_pipeline` is a English model originally trained by giganticode. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_stackoverflow_comments_1m_pipeline_en_5.5.0_3.0_1727122982883.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_stackoverflow_comments_1m_pipeline_en_5.5.0_3.0_1727122982883.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Download the pretrained pipeline and annotate a DataFrame with a "text" column
+pipeline = PretrainedPipeline("sent_bert_base_stackoverflow_comments_1m_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Download the pretrained pipeline and annotate a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("sent_bert_base_stackoverflow_comments_1m_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_stackoverflow_comments_1m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.5 MB| + +## References + +https://huggingface.co/giganticode/bert-base-StackOverflow-comments_1M + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_theseus_bulgarian_bg.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_theseus_bulgarian_bg.md new file mode 100644 index 00000000000000..650432e47095f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_theseus_bulgarian_bg.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Bulgarian sent_bert_base_theseus_bulgarian BertSentenceEmbeddings from rmihaylov +author: John Snow Labs +name: sent_bert_base_theseus_bulgarian +date: 2024-09-23 +tags: [bg, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: bg +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_theseus_bulgarian` is a Bulgarian model originally trained by rmihaylov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_theseus_bulgarian_bg_5.5.0_3.0_1727109588426.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_theseus_bulgarian_bg_5.5.0_3.0_1727109588426.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_theseus_bulgarian","bg") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_theseus_bulgarian","bg") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
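+
+The snippet above assumes the usual Spark NLP imports (`sparknlp.base`, `sparknlp.annotator`, `pyspark.ml.Pipeline`) and an active Spark session. As a minimal sketch of reading the output of `sent_bert_base_theseus_bulgarian`, each annotation in the `embeddings` column carries the sentence text in `result` and the vector in `embeddings`:
+
+```python
+# Hypothetical inspection step: one row per detected sentence with its sentence vector
+exploded = pipelineDF.selectExpr("explode(embeddings) AS ann")
+exploded.selectExpr("ann.result AS sentence", "ann.embeddings AS vector").show(truncate=80)
+```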
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_theseus_bulgarian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|bg| +|Size:|505.4 MB| + +## References + +https://huggingface.co/rmihaylov/bert-base-theseus-bg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_theseus_bulgarian_pipeline_bg.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_theseus_bulgarian_pipeline_bg.md new file mode 100644 index 00000000000000..7eecb91315b7c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_theseus_bulgarian_pipeline_bg.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Bulgarian sent_bert_base_theseus_bulgarian_pipeline pipeline BertSentenceEmbeddings from rmihaylov +author: John Snow Labs +name: sent_bert_base_theseus_bulgarian_pipeline +date: 2024-09-23 +tags: [bg, open_source, pipeline, onnx] +task: Embeddings +language: bg +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_theseus_bulgarian_pipeline` is a Bulgarian model originally trained by rmihaylov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_theseus_bulgarian_pipeline_bg_5.5.0_3.0_1727109613013.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_theseus_bulgarian_pipeline_bg_5.5.0_3.0_1727109613013.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Download the pretrained pipeline and annotate a DataFrame with a "text" column
+pipeline = PretrainedPipeline("sent_bert_base_theseus_bulgarian_pipeline", lang = "bg")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Download the pretrained pipeline and annotate a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("sent_bert_base_theseus_bulgarian_pipeline", lang = "bg")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_theseus_bulgarian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bg| +|Size:|506.0 MB| + +## References + +https://huggingface.co/rmihaylov/bert-base-theseus-bg + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_turkish_uncased_offensive_mlm_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_turkish_uncased_offensive_mlm_pipeline_tr.md new file mode 100644 index 00000000000000..0f1bd259718628 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_turkish_uncased_offensive_mlm_pipeline_tr.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Turkish sent_bert_base_turkish_uncased_offensive_mlm_pipeline pipeline BertSentenceEmbeddings from Overfit-GM +author: John Snow Labs +name: sent_bert_base_turkish_uncased_offensive_mlm_pipeline +date: 2024-09-23 +tags: [tr, open_source, pipeline, onnx] +task: Embeddings +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_turkish_uncased_offensive_mlm_pipeline` is a Turkish model originally trained by Overfit-GM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_turkish_uncased_offensive_mlm_pipeline_tr_5.5.0_3.0_1727109586947.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_turkish_uncased_offensive_mlm_pipeline_tr_5.5.0_3.0_1727109586947.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Download the pretrained pipeline and annotate a DataFrame with a "text" column
+pipeline = PretrainedPipeline("sent_bert_base_turkish_uncased_offensive_mlm_pipeline", lang = "tr")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Download the pretrained pipeline and annotate a DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("sent_bert_base_turkish_uncased_offensive_mlm_pipeline", lang = "tr")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_turkish_uncased_offensive_mlm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|413.0 MB| + +## References + +https://huggingface.co/Overfit-GM/bert-base-turkish-uncased-offensive-mlm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_turkish_uncased_offensive_mlm_tr.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_turkish_uncased_offensive_mlm_tr.md new file mode 100644 index 00000000000000..57b0a491dcffdc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_turkish_uncased_offensive_mlm_tr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Turkish sent_bert_base_turkish_uncased_offensive_mlm BertSentenceEmbeddings from Overfit-GM +author: John Snow Labs +name: sent_bert_base_turkish_uncased_offensive_mlm +date: 2024-09-23 +tags: [tr, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_turkish_uncased_offensive_mlm` is a Turkish model originally trained by Overfit-GM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_turkish_uncased_offensive_mlm_tr_5.5.0_3.0_1727109566445.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_turkish_uncased_offensive_mlm_tr_5.5.0_3.0_1727109566445.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_turkish_uncased_offensive_mlm","tr") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_turkish_uncased_offensive_mlm","tr") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
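+
+The code above relies on the standard Spark NLP imports (`sparknlp.base`, `sparknlp.annotator`, `pyspark.ml.Pipeline`) being in scope. A short follow-up sketch, assuming the fitted pipeline from above, shows one way to unpack the sentence vectors produced by `sent_bert_base_turkish_uncased_offensive_mlm`:
+
+```python
+# Hypothetical inspection step: one row per detected sentence with its sentence vector
+exploded = pipelineDF.selectExpr("explode(embeddings) AS ann")
+exploded.selectExpr("ann.result AS sentence", "ann.embeddings AS vector").show(truncate=80)
+```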
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_turkish_uncased_offensive_mlm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|tr| +|Size:|412.5 MB| + +## References + +https://huggingface.co/Overfit-GM/bert-base-turkish-uncased-offensive-mlm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_1802_r2_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_1802_r2_en.md new file mode 100644 index 00000000000000..35fb09eba1e2af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_1802_r2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_1802_r2 BertSentenceEmbeddings from JamesKim +author: John Snow Labs +name: sent_bert_base_uncased_1802_r2 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_1802_r2` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_r2_en_5.5.0_3.0_1727123126737.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_r2_en_5.5.0_3.0_1727123126737.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_1802_r2","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_1802_r2","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
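To inspect the vectors produced by the example above, the annotation structs in the `embeddings` column can be exploded. The sketch below assumes the standard Spark NLP annotation schema (the vector is stored in each annotation's `embeddings` field), which is not spelled out in this card.

```python
from pyspark.sql import functions as F

# Each row of "embeddings" holds one annotation per detected sentence;
# the sentence text is in "result" and the vector in "embeddings".
(pipelineDF
    .select(F.explode("embeddings").alias("ann"))
    .select(F.col("ann.result").alias("sentence"),
            F.col("ann.embeddings").alias("vector"))
    .show(truncate=80))
```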
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_1802_r2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/JamesKim/bert-base-uncased_1802_r2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_1802_r2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_1802_r2_pipeline_en.md new file mode 100644 index 00000000000000..8df58e3d45ece0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_1802_r2_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_1802_r2_pipeline pipeline BertSentenceEmbeddings from JamesKim +author: John Snow Labs +name: sent_bert_base_uncased_1802_r2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_1802_r2_pipeline` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_r2_pipeline_en_5.5.0_3.0_1727123146014.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_r2_pipeline_en_5.5.0_3.0_1727123146014.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_1802_r2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_1802_r2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
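The snippet above assumes an existing DataFrame `df` with a `text` column. A minimal sketch of how it might be built and scored is shown below; the sample sentence and session setup are illustrative only.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# The pipeline expects the raw text in a column named "text".
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_uncased_1802_r2_pipeline", lang="en")
annotations = pipeline.transform(df)

# Output column names come from the bundled pipeline stages.
annotations.printSchema()
```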
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_1802_r2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/JamesKim/bert-base-uncased_1802_r2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_2022_nvidia_test_3_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_2022_nvidia_test_3_en.md new file mode 100644 index 00000000000000..99b8ddc7fde1d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_2022_nvidia_test_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_2022_nvidia_test_3 BertSentenceEmbeddings from philschmid +author: John Snow Labs +name: sent_bert_base_uncased_2022_nvidia_test_3 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_2022_nvidia_test_3` is a English model originally trained by philschmid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_2022_nvidia_test_3_en_5.5.0_3.0_1727113940265.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_2022_nvidia_test_3_en_5.5.0_3.0_1727113940265.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_2022_nvidia_test_3","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_2022_nvidia_test_3","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
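When embedding larger corpora, throughput and memory use can usually be traded off through the annotator's batch size and maximum sentence length. The values below are illustrative assumptions, not recommendations from this card.

```python
from sparknlp.annotator import BertSentenceEmbeddings

# Illustrative settings only: larger batches raise throughput at the cost of
# memory; sentences longer than the maximum length are truncated.
embeddings = BertSentenceEmbeddings.pretrained(
        "sent_bert_base_uncased_2022_nvidia_test_3", "en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings") \
    .setBatchSize(8) \
    .setMaxSentenceLength(128)
```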
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_2022_nvidia_test_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|412.1 MB| + +## References + +https://huggingface.co/philschmid/bert-base-uncased-2022-nvidia-test-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_danish_da.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_danish_da.md new file mode 100644 index 00000000000000..f5ad3f44d21eb6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_danish_da.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Danish sent_bert_base_uncased_danish BertSentenceEmbeddings from KennethTM +author: John Snow Labs +name: sent_bert_base_uncased_danish +date: 2024-09-23 +tags: [da, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: da +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_danish` is a Danish model originally trained by KennethTM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_danish_da_5.5.0_3.0_1727090892590.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_danish_da_5.5.0_3.0_1727090892590.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_danish","da") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_danish","da") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
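Because the annotator emits one fixed-length vector per sentence, the output can be compared directly, for example with cosine similarity. The sketch below reuses the `pipeline` and `spark` objects from the example above, collects two vectors to the driver (small data only), and uses illustrative Danish sentences.

```python
import numpy as np

data = spark.createDataFrame([["Jeg elsker spark-nlp"], ["Jeg elsker NLP"]]).toDF("text")
result = pipeline.fit(data).transform(data)

# Pull one vector per row; field names follow the standard annotation schema.
vectors = [np.array(row.embeddings[0].embeddings)
           for row in result.select("embeddings").collect()]
cosine = float(np.dot(vectors[0], vectors[1]) /
               (np.linalg.norm(vectors[0]) * np.linalg.norm(vectors[1])))
print(f"cosine similarity: {cosine:.3f}")
```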
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_danish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|da| +|Size:|408.0 MB| + +## References + +https://huggingface.co/KennethTM/bert-base-uncased-danish \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_danish_pipeline_da.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_danish_pipeline_da.md new file mode 100644 index 00000000000000..bf933cfb0a4583 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_danish_pipeline_da.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Danish sent_bert_base_uncased_danish_pipeline pipeline BertSentenceEmbeddings from KennethTM +author: John Snow Labs +name: sent_bert_base_uncased_danish_pipeline +date: 2024-09-23 +tags: [da, open_source, pipeline, onnx] +task: Embeddings +language: da +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_danish_pipeline` is a Danish model originally trained by KennethTM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_danish_pipeline_da_5.5.0_3.0_1727090911732.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_danish_pipeline_da_5.5.0_3.0_1727090911732.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_danish_pipeline", lang = "da") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_danish_pipeline", lang = "da") +val annotations = pipeline.transform(df) + +``` +
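For quick checks on a handful of strings, a `PretrainedPipeline` can also be used without building a DataFrame first. A minimal sketch, assuming the same pipeline name as above and an illustrative Danish sentence, is shown below.

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("sent_bert_base_uncased_danish_pipeline", lang="da")

# annotate() returns a dict of output columns for a single text;
# fullAnnotate() keeps the full annotation objects, including the vectors.
light_result = pipeline.annotate("Jeg elsker spark-nlp")
full_result = pipeline.fullAnnotate("Jeg elsker spark-nlp")
print(light_result.keys())
```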
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_danish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|da| +|Size:|408.6 MB| + +## References + +https://huggingface.co/KennethTM/bert-base-uncased-danish + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_dish_descriptions_128_0_5m_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_dish_descriptions_128_0_5m_en.md new file mode 100644 index 00000000000000..0b8dc1be841312 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_dish_descriptions_128_0_5m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_dish_descriptions_128_0_5m BertSentenceEmbeddings from abhilashawasthi +author: John Snow Labs +name: sent_bert_base_uncased_dish_descriptions_128_0_5m +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_dish_descriptions_128_0_5m` is a English model originally trained by abhilashawasthi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_dish_descriptions_128_0_5m_en_5.5.0_3.0_1727113419882.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_dish_descriptions_128_0_5m_en_5.5.0_3.0_1727113419882.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_dish_descriptions_128_0_5m","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_dish_descriptions_128_0_5m","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
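A fitted pipeline like the one above is a regular Spark ML `PipelineModel`, so it can be persisted once and reloaded later without downloading the model again. The path below is an arbitrary example.

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline from the example above to an example path...
pipelineModel.write().overwrite().save("/tmp/dish_descriptions_embeddings_pipeline")

# ...and reload it later to score new data.
reloaded = PipelineModel.load("/tmp/dish_descriptions_embeddings_pipeline")
reloaded.transform(data).select("embeddings").show()
```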
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_dish_descriptions_128_0_5m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/abhilashawasthi/bert-base-uncased_dish_descriptions_128_0.5M \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_dish_descriptions_128_0_5m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_dish_descriptions_128_0_5m_pipeline_en.md new file mode 100644 index 00000000000000..4197ab515e7404 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_dish_descriptions_128_0_5m_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_dish_descriptions_128_0_5m_pipeline pipeline BertSentenceEmbeddings from abhilashawasthi +author: John Snow Labs +name: sent_bert_base_uncased_dish_descriptions_128_0_5m_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_dish_descriptions_128_0_5m_pipeline` is a English model originally trained by abhilashawasthi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_dish_descriptions_128_0_5m_pipeline_en_5.5.0_3.0_1727113439605.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_dish_descriptions_128_0_5m_pipeline_en_5.5.0_3.0_1727113439605.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_dish_descriptions_128_0_5m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_dish_descriptions_128_0_5m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_dish_descriptions_128_0_5m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.6 MB| + +## References + +https://huggingface.co/abhilashawasthi/bert-base-uncased_dish_descriptions_128_0.5M + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_duplicate_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_duplicate_en.md new file mode 100644 index 00000000000000..3fb8c6933b0661 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_duplicate_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_duplicate BertSentenceEmbeddings from julien-c +author: John Snow Labs +name: sent_bert_base_uncased_duplicate +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_duplicate` is a English model originally trained by julien-c. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_duplicate_en_5.5.0_3.0_1727105106174.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_duplicate_en_5.5.0_3.0_1727105106174.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_duplicate","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_duplicate","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
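In an environment without internet access, the archive behind the Download/Copy S3 URI buttons above can be fetched once, extracted, and loaded from disk instead of calling `pretrained()`. The local path below is a hypothetical example.

```python
from sparknlp.annotator import BertSentenceEmbeddings

# Assumes the downloaded archive was extracted to this (hypothetical) folder.
embeddings = BertSentenceEmbeddings.load("/models/sent_bert_base_uncased_duplicate_en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")
```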
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_duplicate| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/julien-c/bert-base-uncased-duplicate \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_duplicate_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_duplicate_pipeline_en.md new file mode 100644 index 00000000000000..08b9729e4e7328 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_duplicate_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_duplicate_pipeline pipeline BertSentenceEmbeddings from julien-c +author: John Snow Labs +name: sent_bert_base_uncased_duplicate_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_duplicate_pipeline` is a English model originally trained by julien-c. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_duplicate_pipeline_en_5.5.0_3.0_1727105125107.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_duplicate_pipeline_en_5.5.0_3.0_1727105125107.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_duplicate_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_duplicate_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_duplicate_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/julien-c/bert-base-uncased-duplicate + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_imdb_medhabi_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_imdb_medhabi_en.md new file mode 100644 index 00000000000000..f6f4cd62323a3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_imdb_medhabi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_imdb_medhabi BertSentenceEmbeddings from medhabi +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_imdb_medhabi +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_imdb_medhabi` is a English model originally trained by medhabi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_imdb_medhabi_en_5.5.0_3.0_1727113933512.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_imdb_medhabi_en_5.5.0_3.0_1727113933512.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_imdb_medhabi","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_imdb_medhabi","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
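If downstream Spark ML stages need plain vector columns rather than annotation structs, an `EmbeddingsFinisher` can be appended to the pipeline from the example above. This is a sketch reusing the `documentAssembler`, `sentenceDL`, and `embeddings` stages defined there; the output column name is illustrative.

```python
from sparknlp.base import EmbeddingsFinisher
from pyspark.ml import Pipeline

# Converts the annotation structs into Spark ML vectors for downstream stages.
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings, finisher])
```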
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_imdb_medhabi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/medhabi/bert-base-uncased-finetuned-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_imdb_medhabi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_imdb_medhabi_pipeline_en.md new file mode 100644 index 00000000000000..4282aa1a9a4dc2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_imdb_medhabi_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_imdb_medhabi_pipeline pipeline BertSentenceEmbeddings from medhabi +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_imdb_medhabi_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_imdb_medhabi_pipeline` is a English model originally trained by medhabi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_imdb_medhabi_pipeline_en_5.5.0_3.0_1727113952916.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_imdb_medhabi_pipeline_en_5.5.0_3.0_1727113952916.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_imdb_medhabi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_imdb_medhabi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_imdb_medhabi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/medhabi/bert-base-uncased-finetuned-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_en.md new file mode 100644 index 00000000000000..7f1b4e3bf833d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies BertSentenceEmbeddings from ietz +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies` is a English model originally trained by ietz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_en_5.5.0_3.0_1727113803766.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_en_5.5.0_3.0_1727113803766.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/ietz/bert-base-uncased-finetuned-jira-hyperledger-issue-titles-and-bodies \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_pipeline_en.md new file mode 100644 index 00000000000000..38c8e19d798bac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_pipeline pipeline BertSentenceEmbeddings from ietz +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_pipeline` is a English model originally trained by ietz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_pipeline_en_5.5.0_3.0_1727113822958.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_pipeline_en_5.5.0_3.0_1727113822958.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/ietz/bert-base-uncased-finetuned-jira-hyperledger-issue-titles-and-bodies + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_news_1929_1932_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_news_1929_1932_en.md new file mode 100644 index 00000000000000..6c39d5086a6444 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_news_1929_1932_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_news_1929_1932 BertSentenceEmbeddings from sally9805 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_news_1929_1932 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_news_1929_1932` is a English model originally trained by sally9805. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_news_1929_1932_en_5.5.0_3.0_1727109834343.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_news_1929_1932_en_5.5.0_3.0_1727109834343.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_news_1929_1932","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_news_1929_1932","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_news_1929_1932| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sally9805/bert-base-uncased-finetuned-news-1929-1932 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_news_1929_1932_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_news_1929_1932_pipeline_en.md new file mode 100644 index 00000000000000..6869ae37e23261 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_news_1929_1932_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_news_1929_1932_pipeline pipeline BertSentenceEmbeddings from sally9805 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_news_1929_1932_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_news_1929_1932_pipeline` is a English model originally trained by sally9805. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_news_1929_1932_pipeline_en_5.5.0_3.0_1727109854077.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_news_1929_1932_pipeline_en_5.5.0_3.0_1727109854077.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_news_1929_1932_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_news_1929_1932_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_news_1929_1932_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/sally9805/bert-base-uncased-finetuned-news-1929-1932 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_en.md new file mode 100644 index 00000000000000..56af94080fd2d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower BertSentenceEmbeddings from btamm12 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_en_5.5.0_3.0_1727109868052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_en_5.5.0_3.0_1727109868052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/btamm12/bert-base-uncased-finetuned-wls-manual-2ep-lower \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_pipeline_en.md new file mode 100644 index 00000000000000..1d376c78d9b557 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_pipeline pipeline BertSentenceEmbeddings from btamm12 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_pipeline_en_5.5.0_3.0_1727109887680.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_pipeline_en_5.5.0_3.0_1727109887680.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/btamm12/bert-base-uncased-finetuned-wls-manual-2ep-lower + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_bh8648_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_bh8648_en.md new file mode 100644 index 00000000000000..976ba9d03bafeb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_bh8648_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_bh8648 BertSentenceEmbeddings from bh8648 +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_bh8648 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_bh8648` is a English model originally trained by bh8648. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_bh8648_en_5.5.0_3.0_1727104976102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_bh8648_en_5.5.0_3.0_1727104976102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_bh8648","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_bh8648","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_bh8648| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/bh8648/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_bh8648_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_bh8648_pipeline_en.md new file mode 100644 index 00000000000000..5b261e3f18fc6b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_bh8648_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_bh8648_pipeline pipeline BertSentenceEmbeddings from bh8648 +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_bh8648_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_bh8648_pipeline` is a English model originally trained by bh8648. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_bh8648_pipeline_en_5.5.0_3.0_1727104995723.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_bh8648_pipeline_en_5.5.0_3.0_1727104995723.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_bh8648_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_bh8648_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_bh8648_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/bh8648/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_igory1999_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_igory1999_en.md new file mode 100644 index 00000000000000..ab1459e237d3a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_igory1999_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_igory1999 BertSentenceEmbeddings from igory1999 +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_igory1999 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_igory1999` is a English model originally trained by igory1999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_igory1999_en_5.5.0_3.0_1727105301459.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_igory1999_en_5.5.0_3.0_1727105301459.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_igory1999","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_igory1999","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
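The `embeddings` column written by the snippet above holds Spark NLP annotation structs rather than bare vectors. Below is a minimal sketch of unpacking them from `pipelineDF`, assuming the standard annotation schema in which the float array sits in the `embeddings` field of each annotation:

```python
from pyspark.sql import functions as F

# One annotation per detected sentence; explode to get a row per sentence.
vectors = (
    pipelineDF
    .select(F.explode("embeddings").alias("annotation"))
    .select(
        F.col("annotation.result").alias("sentence"),
        F.col("annotation.embeddings").alias("vector"),
    )
)

vectors.show(truncate=False)
```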
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_igory1999| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/igory1999/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_igory1999_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_igory1999_pipeline_en.md new file mode 100644 index 00000000000000..5659f4473bc823 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_igory1999_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_igory1999_pipeline pipeline BertSentenceEmbeddings from igory1999 +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_igory1999_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_igory1999_pipeline` is a English model originally trained by igory1999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_igory1999_pipeline_en_5.5.0_3.0_1727105321183.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_igory1999_pipeline_en_5.5.0_3.0_1727105321183.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_igory1999_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_igory1999_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
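To confirm that the download matches the "Included Models" list further down, the stages of the wrapped Spark `PipelineModel` can be printed. This is a small sketch, assuming `PretrainedPipeline` exposes the underlying model through its `model` attribute, as in current Spark NLP releases:

```python
# Print the class name of every stage bundled in the pretrained pipeline.
for stage in pipeline.model.stages:
    print(type(stage).__name__)
```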
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_igory1999_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/igory1999/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_pensuke_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_pensuke_en.md new file mode 100644 index 00000000000000..7739d5b407fdeb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_pensuke_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_pensuke BertSentenceEmbeddings from pensuke +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_pensuke +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_pensuke` is a English model originally trained by pensuke. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_pensuke_en_5.5.0_3.0_1727123265957.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_pensuke_en_5.5.0_3.0_1727123265957.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_pensuke","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_pensuke","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
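Sentence embeddings such as these are usually fed into a downstream similarity or classification step. As an illustration only, here is a hedged sketch of comparing two sentences with cosine similarity, reusing the `pipelineModel` fitted above; the `embed` helper is ad hoc and not part of Spark NLP:

```python
import numpy as np

def embed(text, model):
    """Return the first sentence vector produced by the fitted pipeline for `text`."""
    df = spark.createDataFrame([[text]]).toDF("text")
    first_row = model.transform(df).select("embeddings").first()
    return np.array(first_row["embeddings"][0]["embeddings"])

a = embed("I love spark-nlp", pipelineModel)
b = embed("Spark NLP makes distributed NLP easy", pipelineModel)

cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {cosine:.3f}")
```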
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_pensuke| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/pensuke/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_pensuke_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_pensuke_pipeline_en.md new file mode 100644 index 00000000000000..3b08fb8627d860 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_pensuke_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_pensuke_pipeline pipeline BertSentenceEmbeddings from pensuke +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_pensuke_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_pensuke_pipeline` is a English model originally trained by pensuke. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_pensuke_pipeline_en_5.5.0_3.0_1727123285915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_pensuke_pipeline_en_5.5.0_3.0_1727123285915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_pensuke_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_pensuke_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_pensuke_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/pensuke/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_seddiktrk_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_seddiktrk_en.md new file mode 100644 index 00000000000000..52c89daafd17db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_seddiktrk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_seddiktrk BertSentenceEmbeddings from seddiktrk +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_seddiktrk +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_seddiktrk` is a English model originally trained by seddiktrk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_seddiktrk_en_5.5.0_3.0_1727105288815.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_seddiktrk_en_5.5.0_3.0_1727105288815.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_seddiktrk","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_seddiktrk","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_seddiktrk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/seddiktrk/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_seddiktrk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_seddiktrk_pipeline_en.md new file mode 100644 index 00000000000000..263c8247a6093d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_seddiktrk_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_seddiktrk_pipeline pipeline BertSentenceEmbeddings from seddiktrk +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_seddiktrk_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_seddiktrk_pipeline` is a English model originally trained by seddiktrk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_seddiktrk_pipeline_en_5.5.0_3.0_1727105308520.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_seddiktrk_pipeline_en_5.5.0_3.0_1727105308520.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_seddiktrk_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_seddiktrk_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_seddiktrk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/seddiktrk/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_kinyarwanda_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_kinyarwanda_finetuned_en.md new file mode 100644 index 00000000000000..143c39be274309 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_kinyarwanda_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_kinyarwanda_finetuned BertSentenceEmbeddings from RogerB +author: John Snow Labs +name: sent_bert_base_uncased_kinyarwanda_finetuned +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_kinyarwanda_finetuned` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_kinyarwanda_finetuned_en_5.5.0_3.0_1727109798634.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_kinyarwanda_finetuned_en_5.5.0_3.0_1727109798634.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_kinyarwanda_finetuned","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_kinyarwanda_finetuned","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_kinyarwanda_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/RogerB/bert-base-uncased-kinyarwanda-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_kinyarwanda_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_kinyarwanda_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..0eeeb3cf0acf76 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_kinyarwanda_finetuned_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_kinyarwanda_finetuned_pipeline pipeline BertSentenceEmbeddings from RogerB +author: John Snow Labs +name: sent_bert_base_uncased_kinyarwanda_finetuned_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_kinyarwanda_finetuned_pipeline` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_kinyarwanda_finetuned_pipeline_en_5.5.0_3.0_1727109818125.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_kinyarwanda_finetuned_pipeline_en_5.5.0_3.0_1727109818125.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_uncased_kinyarwanda_finetuned_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_uncased_kinyarwanda_finetuned_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_kinyarwanda_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/RogerB/bert-base-uncased-kinyarwanda-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_mnli_sparse_70_unstructured_norwegian_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_mnli_sparse_70_unstructured_norwegian_classifier_pipeline_en.md new file mode 100644 index 00000000000000..b8046eeb124abe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_mnli_sparse_70_unstructured_norwegian_classifier_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_mnli_sparse_70_unstructured_norwegian_classifier_pipeline pipeline BertSentenceEmbeddings from Intel +author: John Snow Labs +name: sent_bert_base_uncased_mnli_sparse_70_unstructured_norwegian_classifier_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_mnli_sparse_70_unstructured_norwegian_classifier_pipeline` is a English model originally trained by Intel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_mnli_sparse_70_unstructured_norwegian_classifier_pipeline_en_5.5.0_3.0_1727105153242.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_mnli_sparse_70_unstructured_norwegian_classifier_pipeline_en_5.5.0_3.0_1727105153242.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_uncased_mnli_sparse_70_unstructured_norwegian_classifier_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_uncased_mnli_sparse_70_unstructured_norwegian_classifier_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_mnli_sparse_70_unstructured_norwegian_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|226.5 MB| + +## References + +https://huggingface.co/Intel/bert-base-uncased-mnli-sparse-70-unstructured-no-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_finetuning_test_xiejiafang_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_finetuning_test_xiejiafang_en.md new file mode 100644 index 00000000000000..a125724a156c63 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_finetuning_test_xiejiafang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_finetuning_test_xiejiafang BertSentenceEmbeddings from xiejiafang +author: John Snow Labs +name: sent_bert_finetuning_test_xiejiafang +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_finetuning_test_xiejiafang` is a English model originally trained by xiejiafang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_finetuning_test_xiejiafang_en_5.5.0_3.0_1727114049947.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_finetuning_test_xiejiafang_en_5.5.0_3.0_1727114049947.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_bert_finetuning_test_xiejiafang","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_finetuning_test_xiejiafang","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_finetuning_test_xiejiafang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/xiejiafang/bert_finetuning_test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_finetuning_test_xiejiafang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_finetuning_test_xiejiafang_pipeline_en.md new file mode 100644 index 00000000000000..fba687b1fdcb41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_finetuning_test_xiejiafang_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_finetuning_test_xiejiafang_pipeline pipeline BertSentenceEmbeddings from xiejiafang +author: John Snow Labs +name: sent_bert_finetuning_test_xiejiafang_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_finetuning_test_xiejiafang_pipeline` is a English model originally trained by xiejiafang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_finetuning_test_xiejiafang_pipeline_en_5.5.0_3.0_1727114068670.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_finetuning_test_xiejiafang_pipeline_en_5.5.0_3.0_1727114068670.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_finetuning_test_xiejiafang_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_finetuning_test_xiejiafang_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_finetuning_test_xiejiafang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/xiejiafang/bert_finetuning_test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_hinglish_big_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_hinglish_big_en.md new file mode 100644 index 00000000000000..8820f7ecdebe46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_hinglish_big_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_hinglish_big BertSentenceEmbeddings from aditeyabaral +author: John Snow Labs +name: sent_bert_hinglish_big +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_hinglish_big` is a English model originally trained by aditeyabaral. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_hinglish_big_en_5.5.0_3.0_1727109552454.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_hinglish_big_en_5.5.0_3.0_1727109552454.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_bert_hinglish_big","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_hinglish_big","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_hinglish_big| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|249.0 MB| + +## References + +https://huggingface.co/aditeyabaral/bert-hinglish-big \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_hinglish_big_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_hinglish_big_pipeline_en.md new file mode 100644 index 00000000000000..5d27c9860cb391 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_hinglish_big_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_hinglish_big_pipeline pipeline BertSentenceEmbeddings from aditeyabaral +author: John Snow Labs +name: sent_bert_hinglish_big_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_hinglish_big_pipeline` is a English model originally trained by aditeyabaral. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_hinglish_big_pipeline_en_5.5.0_3.0_1727109564269.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_hinglish_big_pipeline_en_5.5.0_3.0_1727109564269.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_hinglish_big_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_hinglish_big_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_hinglish_big_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aditeyabaral/bert-hinglish-big + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_9_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_9_pipeline_en.md new file mode 100644 index 00000000000000..bf18cb815abb3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_9_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_9_pipeline pipeline BertSentenceEmbeddings from jojoUla +author: John Snow Labs +name: sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_9_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_9_pipeline` is a English model originally trained by jojoUla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_9_pipeline_en_5.5.0_3.0_1727113541166.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_9_pipeline_en_5.5.0_3.0_1727113541166.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_9_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_9_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_9_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/jojoUla/bert-large-cased-sigir-support-refute-no-label-40-2nd-test-LR10-8-9 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_14_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_14_en.md new file mode 100644 index 00000000000000..a49b7e1fe10f97 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_14_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_14 BertSentenceEmbeddings from jojoUla +author: John Snow Labs +name: sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_14 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_14` is a English model originally trained by jojoUla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_14_en_5.5.0_3.0_1727122909698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_14_en_5.5.0_3.0_1727122909698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_14","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_14","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_14| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/jojoUla/bert-large-cased-sigir-support-refute-no-label-40-2nd-test-LR10-8-fast-14 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_en.md new file mode 100644 index 00000000000000..b6ff98fb448791 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4 BertSentenceEmbeddings from jojoUla +author: John Snow Labs +name: sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4` is a English model originally trained by jojoUla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_en_5.5.0_3.0_1727105145171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_en_5.5.0_3.0_1727105145171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/jojoUla/bert-large-cased-sigir-support-refute-no-label-40-2nd-test-LR10-8-fast-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_pipeline_en.md new file mode 100644 index 00000000000000..44caa589ecf81a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_pipeline pipeline BertSentenceEmbeddings from jojoUla +author: John Snow Labs +name: sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_pipeline` is a English model originally trained by jojoUla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_pipeline_en_5.5.0_3.0_1727105203785.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_pipeline_en_5.5.0_3.0_1727105203785.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/jojoUla/bert-large-cased-sigir-support-refute-no-label-40-2nd-test-LR10-8-fast-4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_en.md new file mode 100644 index 00000000000000..794ca55f74d47f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8 BertSentenceEmbeddings from jojoUla +author: John Snow Labs +name: sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8` is a English model originally trained by jojoUla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_en_5.5.0_3.0_1727113756633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_en_5.5.0_3.0_1727113756633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/jojoUla/bert-large-cased-sigir-support-refute-no-label-40-2nd-test-LR10-8-fast-8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_pipeline_en.md new file mode 100644 index 00000000000000..6a9caf0d6d2f89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_pipeline pipeline BertSentenceEmbeddings from jojoUla +author: John Snow Labs +name: sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_pipeline` is a English model originally trained by jojoUla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_pipeline_en_5.5.0_3.0_1727113815073.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_pipeline_en_5.5.0_3.0_1727113815073.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/jojoUla/bert-large-cased-sigir-support-refute-no-label-40-2nd-test-LR10-8-fast-8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_9_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_9_pipeline_en.md new file mode 100644 index 00000000000000..8044371e975b06 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_9_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_9_pipeline pipeline BertSentenceEmbeddings from jojoUla +author: John Snow Labs +name: sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_9_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_9_pipeline` is a English model originally trained by jojoUla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_9_pipeline_en_5.5.0_3.0_1727102339622.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_9_pipeline_en_5.5.0_3.0_1727102339622.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session started with sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_9_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Assumes an active SparkSession with Spark NLP on the classpath
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_9_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_9_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/jojoUla/bert-large-cased-sigir-support-refute-no-label-40-2nd-test-LR10-8-fast-9 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_ltrc_telugu_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_ltrc_telugu_en.md new file mode 100644 index 00000000000000..b88113bb98fa48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_ltrc_telugu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_ltrc_telugu BertSentenceEmbeddings from ltrctelugu +author: John Snow Labs +name: sent_bert_ltrc_telugu +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_ltrc_telugu` is a English model originally trained by ltrctelugu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_ltrc_telugu_en_5.5.0_3.0_1727109713843.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_ltrc_telugu_en_5.5.0_3.0_1727109713843.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_ltrc_telugu","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_ltrc_telugu","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
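
The snippets above leave each sentence vector inside a Spark NLP annotation struct. As a minimal follow-up sketch (assuming the `pipelineDF` and the `embeddings` column produced by the Python example above), an `EmbeddingsFinisher` can flatten those annotations into plain Spark ML vectors for downstream steps:

```python
from sparknlp.base import EmbeddingsFinisher

# Flatten the "embeddings" annotations into Spark ML vectors (one per detected sentence)
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["sentence_vector"]) \
    .setOutputAsVector(True)

finisher.transform(pipelineDF) \
    .selectExpr("explode(sentence_vector) as vector") \
    .show(truncate=80)
```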
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_ltrc_telugu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|403.4 MB| + +## References + +https://huggingface.co/ltrctelugu/bert_ltrc_telugu \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_ltrc_telugu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_ltrc_telugu_pipeline_en.md new file mode 100644 index 00000000000000..2b231e63a03d89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_ltrc_telugu_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_ltrc_telugu_pipeline pipeline BertSentenceEmbeddings from ltrctelugu +author: John Snow Labs +name: sent_bert_ltrc_telugu_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_ltrc_telugu_pipeline` is a English model originally trained by ltrctelugu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_ltrc_telugu_pipeline_en_5.5.0_3.0_1727109732931.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_ltrc_telugu_pipeline_en_5.5.0_3.0_1727109732931.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session started with sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_ltrc_telugu_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Assumes an active SparkSession with Spark NLP on the classpath
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_ltrc_telugu_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_ltrc_telugu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.9 MB| + +## References + +https://huggingface.co/ltrctelugu/bert_ltrc_telugu + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_mini_domain_adapted_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_mini_domain_adapted_imdb_en.md new file mode 100644 index 00000000000000..c6a865565b5521 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_mini_domain_adapted_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_mini_domain_adapted_imdb BertSentenceEmbeddings from rasyosef +author: John Snow Labs +name: sent_bert_mini_domain_adapted_imdb +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_mini_domain_adapted_imdb` is a English model originally trained by rasyosef. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_mini_domain_adapted_imdb_en_5.5.0_3.0_1727122774835.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_mini_domain_adapted_imdb_en_5.5.0_3.0_1727122774835.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_mini_domain_adapted_imdb","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_mini_domain_adapted_imdb","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_mini_domain_adapted_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|41.9 MB| + +## References + +https://huggingface.co/rasyosef/bert-mini-domain-adapted-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_mini_domain_adapted_imdb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_mini_domain_adapted_imdb_pipeline_en.md new file mode 100644 index 00000000000000..c325666c914aa9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_mini_domain_adapted_imdb_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_mini_domain_adapted_imdb_pipeline pipeline BertSentenceEmbeddings from rasyosef +author: John Snow Labs +name: sent_bert_mini_domain_adapted_imdb_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_mini_domain_adapted_imdb_pipeline` is a English model originally trained by rasyosef. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_mini_domain_adapted_imdb_pipeline_en_5.5.0_3.0_1727122777219.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_mini_domain_adapted_imdb_pipeline_en_5.5.0_3.0_1727122777219.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session started with sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_mini_domain_adapted_imdb_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Assumes an active SparkSession with Spark NLP on the classpath
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_mini_domain_adapted_imdb_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_mini_domain_adapted_imdb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|42.4 MB| + +## References + +https://huggingface.co/rasyosef/bert-mini-domain-adapted-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_persian_farsi_base_uncased_nlp_course_hw2_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_persian_farsi_base_uncased_nlp_course_hw2_en.md new file mode 100644 index 00000000000000..676a38de19e491 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_persian_farsi_base_uncased_nlp_course_hw2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_persian_farsi_base_uncased_nlp_course_hw2 BertSentenceEmbeddings from iMahdiGhazavi +author: John Snow Labs +name: sent_bert_persian_farsi_base_uncased_nlp_course_hw2 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_persian_farsi_base_uncased_nlp_course_hw2` is a English model originally trained by iMahdiGhazavi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_persian_farsi_base_uncased_nlp_course_hw2_en_5.5.0_3.0_1727102030567.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_persian_farsi_base_uncased_nlp_course_hw2_en_5.5.0_3.0_1727102030567.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_persian_farsi_base_uncased_nlp_course_hw2","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_persian_farsi_base_uncased_nlp_course_hw2","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
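
Sentence embeddings such as these are usually consumed by a similarity, clustering, or retrieval step. The fragment below is only an illustration: it assumes the fitted `pipelineModel` from the example above and uses NumPy, which is not part of this card.

```python
import numpy as np

# Compare two sentences with the embeddings produced by the pipeline above
pairs = spark.createDataFrame([["I love Spark NLP"], ["Spark NLP is great"]]).toDF("text")
rows = pipelineModel.transform(pairs) \
    .selectExpr("embeddings.embeddings[0] as vec") \
    .collect()

a, b = np.array(rows[0]["vec"]), np.array(rows[1]["vec"])
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {cosine:.3f}")
```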
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_persian_farsi_base_uncased_nlp_course_hw2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|605.8 MB| + +## References + +https://huggingface.co/iMahdiGhazavi/bert-fa-base-uncased-nlp-course-hw2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline_en.md new file mode 100644 index 00000000000000..7744f199de7266 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline pipeline BertSentenceEmbeddings from iMahdiGhazavi +author: John Snow Labs +name: sent_bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline` is a English model originally trained by iMahdiGhazavi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline_en_5.5.0_3.0_1727102060003.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline_en_5.5.0_3.0_1727102060003.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session started with sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Assumes an active SparkSession with Spark NLP on the classpath
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|606.3 MB| + +## References + +https://huggingface.co/iMahdiGhazavi/bert-fa-base-uncased-nlp-course-hw2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_political_election2020_twitter_mlm_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_political_election2020_twitter_mlm_en.md new file mode 100644 index 00000000000000..fe20d5d7b03663 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_political_election2020_twitter_mlm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_political_election2020_twitter_mlm BertSentenceEmbeddings from kornosk +author: John Snow Labs +name: sent_bert_political_election2020_twitter_mlm +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_political_election2020_twitter_mlm` is a English model originally trained by kornosk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_political_election2020_twitter_mlm_en_5.5.0_3.0_1727113638727.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_political_election2020_twitter_mlm_en_5.5.0_3.0_1727113638727.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_political_election2020_twitter_mlm","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_political_election2020_twitter_mlm","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_political_election2020_twitter_mlm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.6 MB| + +## References + +https://huggingface.co/kornosk/bert-political-election2020-twitter-mlm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_political_election2020_twitter_mlm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_political_election2020_twitter_mlm_pipeline_en.md new file mode 100644 index 00000000000000..8377db46395c01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_political_election2020_twitter_mlm_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_political_election2020_twitter_mlm_pipeline pipeline BertSentenceEmbeddings from kornosk +author: John Snow Labs +name: sent_bert_political_election2020_twitter_mlm_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_political_election2020_twitter_mlm_pipeline` is a English model originally trained by kornosk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_political_election2020_twitter_mlm_pipeline_en_5.5.0_3.0_1727113658162.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_political_election2020_twitter_mlm_pipeline_en_5.5.0_3.0_1727113658162.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session started with sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_political_election2020_twitter_mlm_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Assumes an active SparkSession with Spark NLP on the classpath
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_political_election2020_twitter_mlm_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
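
For quick, single-string experiments the downloaded pipeline can also be used without building a DataFrame first. This is a sketch under the same assumptions as the snippet above (the pipeline name from this card and an active Spark NLP session); the exact output keys depend on how the pipeline names its stages.

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("sent_bert_political_election2020_twitter_mlm_pipeline", lang = "en")

# fullAnnotate returns one dictionary per input text, keyed by the pipeline's output columns
result = pipeline.fullAnnotate("I love Spark NLP")[0]
for column, annotations in result.items():
    print(column, len(annotations))
```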
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_political_election2020_twitter_mlm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.1 MB| + +## References + +https://huggingface.co/kornosk/bert-political-election2020-twitter-mlm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_pretrained_wikitext_2_raw_v1_dimpo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_pretrained_wikitext_2_raw_v1_dimpo_pipeline_en.md new file mode 100644 index 00000000000000..bd2513f90b88b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_pretrained_wikitext_2_raw_v1_dimpo_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_pretrained_wikitext_2_raw_v1_dimpo_pipeline pipeline BertSentenceEmbeddings from dimpo +author: John Snow Labs +name: sent_bert_pretrained_wikitext_2_raw_v1_dimpo_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_pretrained_wikitext_2_raw_v1_dimpo_pipeline` is a English model originally trained by dimpo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_pretrained_wikitext_2_raw_v1_dimpo_pipeline_en_5.5.0_3.0_1727110082331.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_pretrained_wikitext_2_raw_v1_dimpo_pipeline_en_5.5.0_3.0_1727110082331.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session started with sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_pretrained_wikitext_2_raw_v1_dimpo_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Assumes an active SparkSession with Spark NLP on the classpath
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_pretrained_wikitext_2_raw_v1_dimpo_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_pretrained_wikitext_2_raw_v1_dimpo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.4 MB| + +## References + +https://huggingface.co/dimpo/bert-pretrained-wikitext-2-raw-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_pretraining_gaudi_2_batch_size_64_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_pretraining_gaudi_2_batch_size_64_en.md new file mode 100644 index 00000000000000..c4d505b28cd4aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_pretraining_gaudi_2_batch_size_64_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_pretraining_gaudi_2_batch_size_64 BertSentenceEmbeddings from regisss +author: John Snow Labs +name: sent_bert_pretraining_gaudi_2_batch_size_64 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_pretraining_gaudi_2_batch_size_64` is a English model originally trained by regisss. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_pretraining_gaudi_2_batch_size_64_en_5.5.0_3.0_1727122873673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_pretraining_gaudi_2_batch_size_64_en_5.5.0_3.0_1727122873673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_pretraining_gaudi_2_batch_size_64","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_pretraining_gaudi_2_batch_size_64","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
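
When embedding larger corpora, the annotator's batching parameters are usually worth tuning. A possible variation of the embeddings stage above is sketched below; the parameter values are only illustrative, not recommendations from the model authors.

```python
from sparknlp.annotator import BertSentenceEmbeddings

# Larger batches generally improve GPU throughput; setMaxSentenceLength caps how many
# word-piece tokens of each sentence are encoded
embeddings = BertSentenceEmbeddings.pretrained("sent_bert_pretraining_gaudi_2_batch_size_64", "en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings") \
    .setBatchSize(16) \
    .setMaxSentenceLength(128)
```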
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_pretraining_gaudi_2_batch_size_64| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|412.4 MB| + +## References + +https://huggingface.co/regisss/bert-pretraining-gaudi-2-batch-size-64 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_pretraining_gaudi_2_batch_size_64_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_pretraining_gaudi_2_batch_size_64_pipeline_en.md new file mode 100644 index 00000000000000..8d57e4eeabb972 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_pretraining_gaudi_2_batch_size_64_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_pretraining_gaudi_2_batch_size_64_pipeline pipeline BertSentenceEmbeddings from regisss +author: John Snow Labs +name: sent_bert_pretraining_gaudi_2_batch_size_64_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_pretraining_gaudi_2_batch_size_64_pipeline` is a English model originally trained by regisss. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_pretraining_gaudi_2_batch_size_64_pipeline_en_5.5.0_3.0_1727122893192.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_pretraining_gaudi_2_batch_size_64_pipeline_en_5.5.0_3.0_1727122893192.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session started with sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_pretraining_gaudi_2_batch_size_64_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Assumes an active SparkSession with Spark NLP on the classpath
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_pretraining_gaudi_2_batch_size_64_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_pretraining_gaudi_2_batch_size_64_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.9 MB| + +## References + +https://huggingface.co/regisss/bert-pretraining-gaudi-2-batch-size-64 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_small_finetuned_legal_contracts10train10val_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_small_finetuned_legal_contracts10train10val_en.md new file mode 100644 index 00000000000000..0f88a96527b3f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_small_finetuned_legal_contracts10train10val_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_small_finetuned_legal_contracts10train10val BertSentenceEmbeddings from muhtasham +author: John Snow Labs +name: sent_bert_small_finetuned_legal_contracts10train10val +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_small_finetuned_legal_contracts10train10val` is a English model originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_small_finetuned_legal_contracts10train10val_en_5.5.0_3.0_1727110088145.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_small_finetuned_legal_contracts10train10val_en_5.5.0_3.0_1727110088145.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_small_finetuned_legal_contracts10train10val","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_small_finetuned_legal_contracts10train10val","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_small_finetuned_legal_contracts10train10val| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|107.0 MB| + +## References + +https://huggingface.co/muhtasham/bert-small-finetuned-legal-contracts10train10val \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_small_finetuned_legal_contracts10train10val_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_small_finetuned_legal_contracts10train10val_pipeline_en.md new file mode 100644 index 00000000000000..6635eb2dc78462 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_small_finetuned_legal_contracts10train10val_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_small_finetuned_legal_contracts10train10val_pipeline pipeline BertSentenceEmbeddings from muhtasham +author: John Snow Labs +name: sent_bert_small_finetuned_legal_contracts10train10val_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_small_finetuned_legal_contracts10train10val_pipeline` is a English model originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_small_finetuned_legal_contracts10train10val_pipeline_en_5.5.0_3.0_1727110093220.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_small_finetuned_legal_contracts10train10val_pipeline_en_5.5.0_3.0_1727110093220.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session started with sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_small_finetuned_legal_contracts10train10val_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Assumes an active SparkSession with Spark NLP on the classpath
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_small_finetuned_legal_contracts10train10val_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
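
Because the output column names of a downloaded pipeline are easiest to discover at runtime, a quick schema inspection helps before selecting anything. This sketch assumes the `annotations` DataFrame from the example above:

```python
# List the annotation columns the pretrained pipeline added to the input DataFrame
annotations.printSchema()

# Embedding-type columns keep the vector in a nested "embeddings" field, for example:
# annotations.selectExpr("explode(<embeddings_column>.embeddings) as vector").show(1)
```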
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_small_finetuned_legal_contracts10train10val_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|107.5 MB| + +## References + +https://huggingface.co/muhtasham/bert-small-finetuned-legal-contracts10train10val + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_tiny_finetuned_legal_definitions_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_tiny_finetuned_legal_definitions_en.md new file mode 100644 index 00000000000000..477041cd34c380 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_tiny_finetuned_legal_definitions_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_tiny_finetuned_legal_definitions BertSentenceEmbeddings from muhtasham +author: John Snow Labs +name: sent_bert_tiny_finetuned_legal_definitions +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_tiny_finetuned_legal_definitions` is a English model originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_tiny_finetuned_legal_definitions_en_5.5.0_3.0_1727113417295.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_tiny_finetuned_legal_definitions_en_5.5.0_3.0_1727113417295.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_tiny_finetuned_legal_definitions","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_tiny_finetuned_legal_definitions","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_tiny_finetuned_legal_definitions| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|16.7 MB| + +## References + +https://huggingface.co/muhtasham/bert-tiny-finetuned-legal-definitions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_tiny_finetuned_legal_definitions_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_tiny_finetuned_legal_definitions_pipeline_en.md new file mode 100644 index 00000000000000..25f1457ac4434e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_tiny_finetuned_legal_definitions_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_tiny_finetuned_legal_definitions_pipeline pipeline BertSentenceEmbeddings from muhtasham +author: John Snow Labs +name: sent_bert_tiny_finetuned_legal_definitions_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_tiny_finetuned_legal_definitions_pipeline` is a English model originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_tiny_finetuned_legal_definitions_pipeline_en_5.5.0_3.0_1727113418618.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_tiny_finetuned_legal_definitions_pipeline_en_5.5.0_3.0_1727113418618.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session started with sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_tiny_finetuned_legal_definitions_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Assumes an active SparkSession with Spark NLP on the classpath
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_tiny_finetuned_legal_definitions_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_tiny_finetuned_legal_definitions_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|17.2 MB| + +## References + +https://huggingface.co/muhtasham/bert-tiny-finetuned-legal-definitions + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_uncased_l_10_h_512_a_8_cord19_200616_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_uncased_l_10_h_512_a_8_cord19_200616_en.md new file mode 100644 index 00000000000000..aa24732c258a47 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_uncased_l_10_h_512_a_8_cord19_200616_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_uncased_l_10_h_512_a_8_cord19_200616 BertSentenceEmbeddings from aodiniz +author: John Snow Labs +name: sent_bert_uncased_l_10_h_512_a_8_cord19_200616 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_uncased_l_10_h_512_a_8_cord19_200616` is a English model originally trained by aodiniz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_uncased_l_10_h_512_a_8_cord19_200616_en_5.5.0_3.0_1727102076204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_uncased_l_10_h_512_a_8_cord19_200616_en_5.5.0_3.0_1727102076204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_uncased_l_10_h_512_a_8_cord19_200616","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_uncased_l_10_h_512_a_8_cord19_200616","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_uncased_l_10_h_512_a_8_cord19_200616| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|177.4 MB| + +## References + +https://huggingface.co/aodiniz/bert_uncased_L-10_H-512_A-8_cord19-200616 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_uncased_l_10_h_512_a_8_cord19_200616_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_uncased_l_10_h_512_a_8_cord19_200616_pipeline_en.md new file mode 100644 index 00000000000000..0073eb0aea6970 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_uncased_l_10_h_512_a_8_cord19_200616_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_uncased_l_10_h_512_a_8_cord19_200616_pipeline pipeline BertSentenceEmbeddings from aodiniz +author: John Snow Labs +name: sent_bert_uncased_l_10_h_512_a_8_cord19_200616_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_uncased_l_10_h_512_a_8_cord19_200616_pipeline` is a English model originally trained by aodiniz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_uncased_l_10_h_512_a_8_cord19_200616_pipeline_en_5.5.0_3.0_1727102084641.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_uncased_l_10_h_512_a_8_cord19_200616_pipeline_en_5.5.0_3.0_1727102084641.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session started with sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_uncased_l_10_h_512_a_8_cord19_200616_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Assumes an active SparkSession with Spark NLP on the classpath
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_uncased_l_10_h_512_a_8_cord19_200616_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_uncased_l_10_h_512_a_8_cord19_200616_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|178.0 MB| + +## References + +https://huggingface.co/aodiniz/bert_uncased_L-10_H-512_A-8_cord19-200616 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_uncased_l_6_h_128_a_2_cord19_200616_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_uncased_l_6_h_128_a_2_cord19_200616_en.md new file mode 100644 index 00000000000000..183d49bb9ca0ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_uncased_l_6_h_128_a_2_cord19_200616_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_uncased_l_6_h_128_a_2_cord19_200616 BertSentenceEmbeddings from aodiniz +author: John Snow Labs +name: sent_bert_uncased_l_6_h_128_a_2_cord19_200616 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_uncased_l_6_h_128_a_2_cord19_200616` is a English model originally trained by aodiniz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_uncased_l_6_h_128_a_2_cord19_200616_en_5.5.0_3.0_1727122852915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_uncased_l_6_h_128_a_2_cord19_200616_en_5.5.0_3.0_1727122852915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_uncased_l_6_h_128_a_2_cord19_200616","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_uncased_l_6_h_128_a_2_cord19_200616","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_uncased_l_6_h_128_a_2_cord19_200616| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|19.6 MB| + +## References + +https://huggingface.co/aodiniz/bert_uncased_L-6_H-128_A-2_cord19-200616 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bertbase_uyghur_3e_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bertbase_uyghur_3e_en.md new file mode 100644 index 00000000000000..1326deaf12682b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bertbase_uyghur_3e_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bertbase_uyghur_3e BertSentenceEmbeddings from TurkLangsTeamURFU +author: John Snow Labs +name: sent_bertbase_uyghur_3e +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertbase_uyghur_3e` is a English model originally trained by TurkLangsTeamURFU. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertbase_uyghur_3e_en_5.5.0_3.0_1727123195451.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertbase_uyghur_3e_en_5.5.0_3.0_1727123195451.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bertbase_uyghur_3e","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bertbase_uyghur_3e","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertbase_uyghur_3e| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|492.9 MB| + +## References + +https://huggingface.co/TurkLangsTeamURFU/BertBase_UG_3e \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bertbased_hatespeech_pretrain_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bertbased_hatespeech_pretrain_en.md new file mode 100644 index 00000000000000..28e8a3e74d021b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bertbased_hatespeech_pretrain_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bertbased_hatespeech_pretrain BertSentenceEmbeddings from agvidit1 +author: John Snow Labs +name: sent_bertbased_hatespeech_pretrain +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertbased_hatespeech_pretrain` is a English model originally trained by agvidit1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertbased_hatespeech_pretrain_en_5.5.0_3.0_1727090966164.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertbased_hatespeech_pretrain_en_5.5.0_3.0_1727090966164.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_bertbased_hatespeech_pretrain","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_bertbased_hatespeech_pretrain","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertbased_hatespeech_pretrain| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/agvidit1/BertBased_HateSpeech_pretrain \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bertbased_hatespeech_pretrain_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bertbased_hatespeech_pretrain_pipeline_en.md new file mode 100644 index 00000000000000..86d566c40c8a5f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bertbased_hatespeech_pretrain_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bertbased_hatespeech_pretrain_pipeline pipeline BertSentenceEmbeddings from agvidit1 +author: John Snow Labs +name: sent_bertbased_hatespeech_pretrain_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertbased_hatespeech_pretrain_pipeline` is a English model originally trained by agvidit1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertbased_hatespeech_pretrain_pipeline_en_5.5.0_3.0_1727090985489.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertbased_hatespeech_pretrain_pipeline_en_5.5.0_3.0_1727090985489.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pretrained pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bertbased_hatespeech_pretrain_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// The pretrained pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bertbased_hatespeech_pretrain_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
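For quick experiments, the same pretrained pipeline can also be applied to raw strings through its light-pipeline interface instead of a DataFrame. A minimal sketch (illustrative only; the exact output keys depend on the stages bundled in the pipeline):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("sent_bertbased_hatespeech_pretrain_pipeline", lang="en")

# fullAnnotate returns one result dictionary per input string,
# keyed by the output column of each stage in the pipeline.
result = pipeline.fullAnnotate("I love spark-nlp")[0]
for column, annotations in result.items():
    print(column, len(annotations))
```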
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertbased_hatespeech_pretrain_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/agvidit1/BertBased_HateSpeech_pretrain + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bertinho_galician_base_cased_gl.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bertinho_galician_base_cased_gl.md new file mode 100644 index 00000000000000..0669df15547b49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bertinho_galician_base_cased_gl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Galician sent_bertinho_galician_base_cased BertSentenceEmbeddings from dvilares +author: John Snow Labs +name: sent_bertinho_galician_base_cased +date: 2024-09-23 +tags: [gl, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: gl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertinho_galician_base_cased` is a Galician model originally trained by dvilares. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertinho_galician_base_cased_gl_5.5.0_3.0_1727105071248.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertinho_galician_base_cased_gl_5.5.0_3.0_1727105071248.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_bertinho_galician_base_cased","gl") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_bertinho_galician_base_cased","gl")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertinho_galician_base_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|gl| +|Size:|405.3 MB| + +## References + +https://huggingface.co/dvilares/bertinho-gl-base-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bertinho_galician_base_cased_pipeline_gl.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bertinho_galician_base_cased_pipeline_gl.md new file mode 100644 index 00000000000000..8103076f01f277 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bertinho_galician_base_cased_pipeline_gl.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Galician sent_bertinho_galician_base_cased_pipeline pipeline BertSentenceEmbeddings from dvilares +author: John Snow Labs +name: sent_bertinho_galician_base_cased_pipeline +date: 2024-09-23 +tags: [gl, open_source, pipeline, onnx] +task: Embeddings +language: gl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertinho_galician_base_cased_pipeline` is a Galician model originally trained by dvilares. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertinho_galician_base_cased_pipeline_gl_5.5.0_3.0_1727105091575.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertinho_galician_base_cased_pipeline_gl_5.5.0_3.0_1727105091575.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pretrained pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bertinho_galician_base_cased_pipeline", lang = "gl")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// The pretrained pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bertinho_galician_base_cased_pipeline", lang = "gl")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertinho_galician_base_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|gl| +|Size:|405.8 MB| + +## References + +https://huggingface.co/dvilares/bertinho-gl-base-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bio_mobilebert_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bio_mobilebert_en.md new file mode 100644 index 00000000000000..4ac90ece21c39f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bio_mobilebert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bio_mobilebert BertSentenceEmbeddings from nlpie +author: John Snow Labs +name: sent_bio_mobilebert +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bio_mobilebert` is a English model originally trained by nlpie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bio_mobilebert_en_5.5.0_3.0_1727105328224.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bio_mobilebert_en_5.5.0_3.0_1727105328224.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_bio_mobilebert","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_bio_mobilebert","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
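Sentence embeddings such as these are usually fed into a downstream similarity, clustering, or classification step. The sketch below is an illustrative addition: it assumes a `pipelineDF` built as in the example above (ideally over text containing at least two sentences) and computes a cosine similarity on the driver, which is only appropriate for small samples:

```python
import numpy as np

# Pull the raw float vectors out of the annotation structs (small data only).
rows = pipelineDF.selectExpr("explode(embeddings) as ann") \
    .select("ann.embeddings").collect()
vectors = [np.array(r["embeddings"]) for r in rows]

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# With more than one detected sentence, compare the first two embeddings.
if len(vectors) > 1:
    print(cosine(vectors[0], vectors[1]))
```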
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bio_mobilebert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|92.5 MB| + +## References + +https://huggingface.co/nlpie/bio-mobilebert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bio_mobilebert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bio_mobilebert_pipeline_en.md new file mode 100644 index 00000000000000..a89900a86d228b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bio_mobilebert_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bio_mobilebert_pipeline pipeline BertSentenceEmbeddings from nlpie +author: John Snow Labs +name: sent_bio_mobilebert_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bio_mobilebert_pipeline` is a English model originally trained by nlpie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bio_mobilebert_pipeline_en_5.5.0_3.0_1727105334524.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bio_mobilebert_pipeline_en_5.5.0_3.0_1727105334524.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pretrained pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bio_mobilebert_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// The pretrained pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bio_mobilebert_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bio_mobilebert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|93.1 MB| + +## References + +https://huggingface.co/nlpie/bio-mobilebert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_clinical_pubmed_bert_base_128_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_clinical_pubmed_bert_base_128_en.md new file mode 100644 index 00000000000000..65e0528c34838e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_clinical_pubmed_bert_base_128_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_clinical_pubmed_bert_base_128 BertSentenceEmbeddings from Tsubasaz +author: John Snow Labs +name: sent_clinical_pubmed_bert_base_128 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_clinical_pubmed_bert_base_128` is a English model originally trained by Tsubasaz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_clinical_pubmed_bert_base_128_en_5.5.0_3.0_1727101752096.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_clinical_pubmed_bert_base_128_en_5.5.0_3.0_1727101752096.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_clinical_pubmed_bert_base_128","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_clinical_pubmed_bert_base_128","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_clinical_pubmed_bert_base_128| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|408.0 MB| + +## References + +https://huggingface.co/Tsubasaz/clinical-pubmed-bert-base-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_clinical_pubmed_bert_base_128_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_clinical_pubmed_bert_base_128_pipeline_en.md new file mode 100644 index 00000000000000..31f2ba2b301d16 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_clinical_pubmed_bert_base_128_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_clinical_pubmed_bert_base_128_pipeline pipeline BertSentenceEmbeddings from Tsubasaz +author: John Snow Labs +name: sent_clinical_pubmed_bert_base_128_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_clinical_pubmed_bert_base_128_pipeline` is a English model originally trained by Tsubasaz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_clinical_pubmed_bert_base_128_pipeline_en_5.5.0_3.0_1727101771083.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_clinical_pubmed_bert_base_128_pipeline_en_5.5.0_3.0_1727101771083.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pretrained pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_clinical_pubmed_bert_base_128_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// The pretrained pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_clinical_pubmed_bert_base_128_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
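The downloaded artifact is a regular Spark `PipelineModel`, so its bundled stages (listed under Included Models further down) can be inspected programmatically. A minimal sketch:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("sent_clinical_pubmed_bert_base_128_pipeline", lang="en")

# The wrapped PipelineModel exposes the individual annotator stages
# (DocumentAssembler, TokenizerModel, SentenceDetectorDLModel, BertSentenceEmbeddings).
for stage in pipeline.model.stages:
    print(type(stage).__name__)
```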
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_clinical_pubmed_bert_base_128_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.6 MB| + +## References + +https://huggingface.co/Tsubasaz/clinical-pubmed-bert-base-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_clr_pretrained_bert_base_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_clr_pretrained_bert_base_uncased_en.md new file mode 100644 index 00000000000000..840f1c04d0804f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_clr_pretrained_bert_base_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_clr_pretrained_bert_base_uncased BertSentenceEmbeddings from SauravMaheshkar +author: John Snow Labs +name: sent_clr_pretrained_bert_base_uncased +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_clr_pretrained_bert_base_uncased` is a English model originally trained by SauravMaheshkar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_clr_pretrained_bert_base_uncased_en_5.5.0_3.0_1727113811922.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_clr_pretrained_bert_base_uncased_en_5.5.0_3.0_1727113811922.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_clr_pretrained_bert_base_uncased","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_clr_pretrained_bert_base_uncased","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_clr_pretrained_bert_base_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/SauravMaheshkar/clr-pretrained-bert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_clr_pretrained_bert_base_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_clr_pretrained_bert_base_uncased_pipeline_en.md new file mode 100644 index 00000000000000..52ddeb21d50207 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_clr_pretrained_bert_base_uncased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_clr_pretrained_bert_base_uncased_pipeline pipeline BertSentenceEmbeddings from SauravMaheshkar +author: John Snow Labs +name: sent_clr_pretrained_bert_base_uncased_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_clr_pretrained_bert_base_uncased_pipeline` is a English model originally trained by SauravMaheshkar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_clr_pretrained_bert_base_uncased_pipeline_en_5.5.0_3.0_1727113830946.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_clr_pretrained_bert_base_uncased_pipeline_en_5.5.0_3.0_1727113830946.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pretrained pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_clr_pretrained_bert_base_uncased_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// The pretrained pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_clr_pretrained_bert_base_uncased_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_clr_pretrained_bert_base_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/SauravMaheshkar/clr-pretrained-bert-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_dajobbert_base_uncased_da.md b/docs/_posts/ahmedlone127/2024-09-23-sent_dajobbert_base_uncased_da.md new file mode 100644 index 00000000000000..2241cae753e33b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_dajobbert_base_uncased_da.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Danish sent_dajobbert_base_uncased BertSentenceEmbeddings from jjzha +author: John Snow Labs +name: sent_dajobbert_base_uncased +date: 2024-09-23 +tags: [da, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: da +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_dajobbert_base_uncased` is a Danish model originally trained by jjzha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_dajobbert_base_uncased_da_5.5.0_3.0_1727109759877.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_dajobbert_base_uncased_da_5.5.0_3.0_1727109759877.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_dajobbert_base_uncased","da") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_dajobbert_base_uncased","da")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_dajobbert_base_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|da| +|Size:|411.3 MB| + +## References + +https://huggingface.co/jjzha/dajobbert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_dajobbert_base_uncased_pipeline_da.md b/docs/_posts/ahmedlone127/2024-09-23-sent_dajobbert_base_uncased_pipeline_da.md new file mode 100644 index 00000000000000..0941fb4657f0ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_dajobbert_base_uncased_pipeline_da.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Danish sent_dajobbert_base_uncased_pipeline pipeline BertSentenceEmbeddings from jjzha +author: John Snow Labs +name: sent_dajobbert_base_uncased_pipeline +date: 2024-09-23 +tags: [da, open_source, pipeline, onnx] +task: Embeddings +language: da +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_dajobbert_base_uncased_pipeline` is a Danish model originally trained by jjzha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_dajobbert_base_uncased_pipeline_da_5.5.0_3.0_1727109779805.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_dajobbert_base_uncased_pipeline_da_5.5.0_3.0_1727109779805.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pretrained pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_dajobbert_base_uncased_pipeline", lang = "da")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// The pretrained pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_dajobbert_base_uncased_pipeline", lang = "da")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_dajobbert_base_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|da| +|Size:|411.9 MB| + +## References + +https://huggingface.co/jjzha/dajobbert-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_danish_legal_bert_base_da.md b/docs/_posts/ahmedlone127/2024-09-23-sent_danish_legal_bert_base_da.md new file mode 100644 index 00000000000000..b2189163b2ed27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_danish_legal_bert_base_da.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Danish sent_danish_legal_bert_base BertSentenceEmbeddings from coastalcph +author: John Snow Labs +name: sent_danish_legal_bert_base +date: 2024-09-23 +tags: [da, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: da +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_danish_legal_bert_base` is a Danish model originally trained by coastalcph. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_danish_legal_bert_base_da_5.5.0_3.0_1727123277252.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_danish_legal_bert_base_da_5.5.0_3.0_1727123277252.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_danish_legal_bert_base","da") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_danish_legal_bert_base","da")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_danish_legal_bert_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|da| +|Size:|411.6 MB| + +## References + +https://huggingface.co/coastalcph/danish-legal-bert-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_danish_legal_bert_base_pipeline_da.md b/docs/_posts/ahmedlone127/2024-09-23-sent_danish_legal_bert_base_pipeline_da.md new file mode 100644 index 00000000000000..0b5dd7512b7477 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_danish_legal_bert_base_pipeline_da.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Danish sent_danish_legal_bert_base_pipeline pipeline BertSentenceEmbeddings from coastalcph +author: John Snow Labs +name: sent_danish_legal_bert_base_pipeline +date: 2024-09-23 +tags: [da, open_source, pipeline, onnx] +task: Embeddings +language: da +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_danish_legal_bert_base_pipeline` is a Danish model originally trained by coastalcph. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_danish_legal_bert_base_pipeline_da_5.5.0_3.0_1727123296342.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_danish_legal_bert_base_pipeline_da_5.5.0_3.0_1727123296342.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pretrained pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_danish_legal_bert_base_pipeline", lang = "da")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// The pretrained pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_danish_legal_bert_base_pipeline", lang = "da")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_danish_legal_bert_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|da| +|Size:|412.1 MB| + +## References + +https://huggingface.co/coastalcph/danish-legal-bert-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_defsent_bert_base_uncased_max_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_defsent_bert_base_uncased_max_en.md new file mode 100644 index 00000000000000..e4319171a03fe5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_defsent_bert_base_uncased_max_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_defsent_bert_base_uncased_max BertSentenceEmbeddings from cl-nagoya +author: John Snow Labs +name: sent_defsent_bert_base_uncased_max +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_defsent_bert_base_uncased_max` is a English model originally trained by cl-nagoya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_defsent_bert_base_uncased_max_en_5.5.0_3.0_1727101689855.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_defsent_bert_base_uncased_max_en_5.5.0_3.0_1727101689855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_defsent_bert_base_uncased_max","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_defsent_bert_base_uncased_max","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_defsent_bert_base_uncased_max| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/cl-nagoya/defsent-bert-base-uncased-max \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_defsent_bert_base_uncased_max_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_defsent_bert_base_uncased_max_pipeline_en.md new file mode 100644 index 00000000000000..03b8f0c42c1d0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_defsent_bert_base_uncased_max_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_defsent_bert_base_uncased_max_pipeline pipeline BertSentenceEmbeddings from cl-nagoya +author: John Snow Labs +name: sent_defsent_bert_base_uncased_max_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_defsent_bert_base_uncased_max_pipeline` is a English model originally trained by cl-nagoya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_defsent_bert_base_uncased_max_pipeline_en_5.5.0_3.0_1727101710915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_defsent_bert_base_uncased_max_pipeline_en_5.5.0_3.0_1727101710915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pretrained pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_defsent_bert_base_uncased_max_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// The pretrained pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_defsent_bert_base_uncased_max_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_defsent_bert_base_uncased_max_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/cl-nagoya/defsent-bert-base-uncased-max + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_german_english_code_switching_bert_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_german_english_code_switching_bert_en.md new file mode 100644 index 00000000000000..c5c12bf9677949 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_german_english_code_switching_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_german_english_code_switching_bert BertSentenceEmbeddings from igorsterner +author: John Snow Labs +name: sent_german_english_code_switching_bert +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_german_english_code_switching_bert` is a English model originally trained by igorsterner. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_german_english_code_switching_bert_en_5.5.0_3.0_1727109972427.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_german_english_code_switching_bert_en_5.5.0_3.0_1727109972427.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_german_english_code_switching_bert","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_german_english_code_switching_bert","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_german_english_code_switching_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|664.7 MB| + +## References + +https://huggingface.co/igorsterner/german-english-code-switching-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_german_english_code_switching_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_german_english_code_switching_bert_pipeline_en.md new file mode 100644 index 00000000000000..9fb37c3a37cb95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_german_english_code_switching_bert_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_german_english_code_switching_bert_pipeline pipeline BertSentenceEmbeddings from igorsterner +author: John Snow Labs +name: sent_german_english_code_switching_bert_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_german_english_code_switching_bert_pipeline` is a English model originally trained by igorsterner. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_german_english_code_switching_bert_pipeline_en_5.5.0_3.0_1727110003933.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_german_english_code_switching_bert_pipeline_en_5.5.0_3.0_1727110003933.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pretrained pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_german_english_code_switching_bert_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// The pretrained pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_german_english_code_switching_bert_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_german_english_code_switching_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|665.3 MB| + +## References + +https://huggingface.co/igorsterner/german-english-code-switching-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_gujarati_bert_gu.md b/docs/_posts/ahmedlone127/2024-09-23-sent_gujarati_bert_gu.md new file mode 100644 index 00000000000000..7bb3bf0fe5192d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_gujarati_bert_gu.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Gujarati sent_gujarati_bert BertSentenceEmbeddings from l3cube-pune +author: John Snow Labs +name: sent_gujarati_bert +date: 2024-09-23 +tags: [gu, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: gu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_gujarati_bert` is a Gujarati model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_gujarati_bert_gu_5.5.0_3.0_1727101739126.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_gujarati_bert_gu_5.5.0_3.0_1727101739126.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_gujarati_bert","gu") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_gujarati_bert","gu")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_gujarati_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|gu| +|Size:|890.5 MB| + +## References + +https://huggingface.co/l3cube-pune/gujarati-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_gujarati_bert_pipeline_gu.md b/docs/_posts/ahmedlone127/2024-09-23-sent_gujarati_bert_pipeline_gu.md new file mode 100644 index 00000000000000..e0f1abd75a6c5e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_gujarati_bert_pipeline_gu.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Gujarati sent_gujarati_bert_pipeline pipeline BertSentenceEmbeddings from l3cube-pune +author: John Snow Labs +name: sent_gujarati_bert_pipeline +date: 2024-09-23 +tags: [gu, open_source, pipeline, onnx] +task: Embeddings +language: gu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_gujarati_bert_pipeline` is a Gujarati model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_gujarati_bert_pipeline_gu_5.5.0_3.0_1727101781243.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_gujarati_bert_pipeline_gu_5.5.0_3.0_1727101781243.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pretrained pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_gujarati_bert_pipeline", lang = "gu")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// The pretrained pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_gujarati_bert_pipeline", lang = "gu")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_gujarati_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|gu| +|Size:|891.0 MB| + +## References + +https://huggingface.co/l3cube-pune/gujarati-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_hindi_bert_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_hindi_bert_en.md new file mode 100644 index 00000000000000..93b32d00058ea7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_hindi_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_hindi_bert BertSentenceEmbeddings from sukritin +author: John Snow Labs +name: sent_hindi_bert +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hindi_bert` is a English model originally trained by sukritin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hindi_bert_en_5.5.0_3.0_1727110200426.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hindi_bert_en_5.5.0_3.0_1727110200426.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_hindi_bert","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_hindi_bert","en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
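For low-latency inference on a handful of texts, the fitted model can be wrapped in a `LightPipeline` rather than calling `transform` on a DataFrame. A sketch assuming the `pipelineModel` fitted in the example above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# fullAnnotate keeps the embedding vectors on the returned annotations.
result = light.fullAnnotate("I love spark-nlp")[0]
print(len(result["embeddings"]))               # one annotation per detected sentence
print(result["embeddings"][0].embeddings[:5])  # first few dimensions of the vector
```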
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hindi_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|609.3 MB| + +## References + +https://huggingface.co/sukritin/hindi-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_hindi_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_hindi_bert_pipeline_en.md new file mode 100644 index 00000000000000..9363fb2b4b0b64 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_hindi_bert_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_hindi_bert_pipeline pipeline BertSentenceEmbeddings from sukritin +author: John Snow Labs +name: sent_hindi_bert_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hindi_bert_pipeline` is a English model originally trained by sukritin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hindi_bert_pipeline_en_5.5.0_3.0_1727110229750.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hindi_bert_pipeline_en_5.5.0_3.0_1727110229750.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_hindi_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_hindi_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hindi_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|609.8 MB| + +## References + +https://huggingface.co/sukritin/hindi-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_malay_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_malay_bert_pipeline_en.md new file mode 100644 index 00000000000000..dcc9b4b67d8da0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_malay_bert_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_malay_bert_pipeline pipeline BertSentenceEmbeddings from NLP4H +author: John Snow Labs +name: sent_malay_bert_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_malay_bert_pipeline` is a English model originally trained by NLP4H. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_malay_bert_pipeline_en_5.5.0_3.0_1727101712886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_malay_bert_pipeline_en_5.5.0_3.0_1727101712886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_malay_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_malay_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_malay_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.8 MB| + +## References + +https://huggingface.co/NLP4H/ms_bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_mobilebert_sanskrit_saskta_pre_training_complete_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_mobilebert_sanskrit_saskta_pre_training_complete_en.md new file mode 100644 index 00000000000000..f57757f8bad17c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_mobilebert_sanskrit_saskta_pre_training_complete_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_mobilebert_sanskrit_saskta_pre_training_complete BertSentenceEmbeddings from gokuls +author: John Snow Labs +name: sent_mobilebert_sanskrit_saskta_pre_training_complete +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_mobilebert_sanskrit_saskta_pre_training_complete` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_mobilebert_sanskrit_saskta_pre_training_complete_en_5.5.0_3.0_1727105588822.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_mobilebert_sanskrit_saskta_pre_training_complete_en_5.5.0_3.0_1727105588822.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_mobilebert_sanskrit_saskta_pre_training_complete","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_mobilebert_sanskrit_saskta_pre_training_complete","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
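
To work with raw sentence vectors rather than annotation structs, the annotation column can be exploded. A sketch, assuming the `pipelineDF` produced above and Spark NLP's standard annotation schema:

```python
# Sketch: extract the raw sentence vectors from the "embeddings" annotation column
# (assumes pipelineDF from the example above; the alias names are illustrative)
vectors = (
    pipelineDF
    .selectExpr("explode(embeddings) as annotation")
    .selectExpr("annotation.embeddings as sentence_vector")
)
vectors.show(truncate = 80)
```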
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_mobilebert_sanskrit_saskta_pre_training_complete| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|92.5 MB| + +## References + +https://huggingface.co/gokuls/mobilebert_sa_pre-training-complete \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_mobilebert_sanskrit_saskta_pre_training_complete_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_mobilebert_sanskrit_saskta_pre_training_complete_pipeline_en.md new file mode 100644 index 00000000000000..5e302e980ffa19 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_mobilebert_sanskrit_saskta_pre_training_complete_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_mobilebert_sanskrit_saskta_pre_training_complete_pipeline pipeline BertSentenceEmbeddings from gokuls +author: John Snow Labs +name: sent_mobilebert_sanskrit_saskta_pre_training_complete_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_mobilebert_sanskrit_saskta_pre_training_complete_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_mobilebert_sanskrit_saskta_pre_training_complete_pipeline_en_5.5.0_3.0_1727105593324.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_mobilebert_sanskrit_saskta_pre_training_complete_pipeline_en_5.5.0_3.0_1727105593324.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_mobilebert_sanskrit_saskta_pre_training_complete_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_mobilebert_sanskrit_saskta_pre_training_complete_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_mobilebert_sanskrit_saskta_pre_training_complete_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|93.1 MB| + +## References + +https://huggingface.co/gokuls/mobilebert_sa_pre-training-complete + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_roboust_nlp_xlmr_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_roboust_nlp_xlmr_en.md new file mode 100644 index 00000000000000..03fd992aef5b88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_roboust_nlp_xlmr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_roboust_nlp_xlmr XlmRoBertaSentenceEmbeddings from Blue7Bird +author: John Snow Labs +name: sent_roboust_nlp_xlmr +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, xlm_roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_roboust_nlp_xlmr` is a English model originally trained by Blue7Bird. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_roboust_nlp_xlmr_en_5.5.0_3.0_1727062754592.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_roboust_nlp_xlmr_en_5.5.0_3.0_1727062754592.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_roboust_nlp_xlmr","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_roboust_nlp_xlmr","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_roboust_nlp_xlmr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Blue7Bird/Roboust_nlp_xlmr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_roboust_nlp_xlmr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_roboust_nlp_xlmr_pipeline_en.md new file mode 100644 index 00000000000000..709d66a9348a9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_roboust_nlp_xlmr_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_roboust_nlp_xlmr_pipeline pipeline XlmRoBertaSentenceEmbeddings from Blue7Bird +author: John Snow Labs +name: sent_roboust_nlp_xlmr_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_roboust_nlp_xlmr_pipeline` is a English model originally trained by Blue7Bird. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_roboust_nlp_xlmr_pipeline_en_5.5.0_3.0_1727062803456.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_roboust_nlp_xlmr_pipeline_en_5.5.0_3.0_1727062803456.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_roboust_nlp_xlmr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_roboust_nlp_xlmr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_roboust_nlp_xlmr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Blue7Bird/Roboust_nlp_xlmr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- XlmRoBertaSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_small_mlm_rotten_tomatoes_custom_tokenizer_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_small_mlm_rotten_tomatoes_custom_tokenizer_en.md new file mode 100644 index 00000000000000..e134effc07c094 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_small_mlm_rotten_tomatoes_custom_tokenizer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_small_mlm_rotten_tomatoes_custom_tokenizer BertSentenceEmbeddings from muhtasham +author: John Snow Labs +name: sent_small_mlm_rotten_tomatoes_custom_tokenizer +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_small_mlm_rotten_tomatoes_custom_tokenizer` is a English model originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_small_mlm_rotten_tomatoes_custom_tokenizer_en_5.5.0_3.0_1727109900453.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_small_mlm_rotten_tomatoes_custom_tokenizer_en_5.5.0_3.0_1727109900453.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_small_mlm_rotten_tomatoes_custom_tokenizer","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_small_mlm_rotten_tomatoes_custom_tokenizer","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_small_mlm_rotten_tomatoes_custom_tokenizer| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|106.9 MB| + +## References + +https://huggingface.co/muhtasham/small-mlm-rotten_tomatoes-custom-tokenizer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_small_mlm_rotten_tomatoes_custom_tokenizer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_small_mlm_rotten_tomatoes_custom_tokenizer_pipeline_en.md new file mode 100644 index 00000000000000..6c8921cd40d16e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_small_mlm_rotten_tomatoes_custom_tokenizer_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_small_mlm_rotten_tomatoes_custom_tokenizer_pipeline pipeline BertSentenceEmbeddings from muhtasham +author: John Snow Labs +name: sent_small_mlm_rotten_tomatoes_custom_tokenizer_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_small_mlm_rotten_tomatoes_custom_tokenizer_pipeline` is a English model originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_small_mlm_rotten_tomatoes_custom_tokenizer_pipeline_en_5.5.0_3.0_1727109905716.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_small_mlm_rotten_tomatoes_custom_tokenizer_pipeline_en_5.5.0_3.0_1727109905716.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_small_mlm_rotten_tomatoes_custom_tokenizer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_small_mlm_rotten_tomatoes_custom_tokenizer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_small_mlm_rotten_tomatoes_custom_tokenizer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|107.5 MB| + +## References + +https://huggingface.co/muhtasham/small-mlm-rotten_tomatoes-custom-tokenizer + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_test_bert_base_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_test_bert_base_uncased_en.md new file mode 100644 index 00000000000000..37382acfb8e303 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_test_bert_base_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_test_bert_base_uncased BertSentenceEmbeddings from kkkzzzkkk +author: John Snow Labs +name: sent_test_bert_base_uncased +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_test_bert_base_uncased` is a English model originally trained by kkkzzzkkk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_test_bert_base_uncased_en_5.5.0_3.0_1727123025663.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_test_bert_base_uncased_en_5.5.0_3.0_1727123025663.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_test_bert_base_uncased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_test_bert_base_uncased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
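
If downstream Spark ML stages need plain vectors, an `EmbeddingsFinisher` can be appended to the same pipeline. A sketch, reusing the stage and column names from the example above:

```python
# Sketch: append an EmbeddingsFinisher to obtain Spark ML vectors
# (reuses documentAssembler, sentenceDL, embeddings and data from the example above)
from sparknlp.base import EmbeddingsFinisher

finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings, finisher])
result = pipeline.fit(data).transform(data)
result.select("finished_embeddings").show(truncate = 80)
```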
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_test_bert_base_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/kkkzzzkkk/test_bert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_test_bert_base_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_test_bert_base_uncased_pipeline_en.md new file mode 100644 index 00000000000000..49d33f07c3fb43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_test_bert_base_uncased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_test_bert_base_uncased_pipeline pipeline BertSentenceEmbeddings from kkkzzzkkk +author: John Snow Labs +name: sent_test_bert_base_uncased_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_test_bert_base_uncased_pipeline` is a English model originally trained by kkkzzzkkk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_test_bert_base_uncased_pipeline_en_5.5.0_3.0_1727123045148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_test_bert_base_uncased_pipeline_en_5.5.0_3.0_1727123045148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_test_bert_base_uncased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_test_bert_base_uncased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_test_bert_base_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/kkkzzzkkk/test_bert-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_tiny_mlm_glue_mrpc_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_tiny_mlm_glue_mrpc_en.md new file mode 100644 index 00000000000000..4ada972f6b36e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_tiny_mlm_glue_mrpc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_tiny_mlm_glue_mrpc BertSentenceEmbeddings from muhtasham +author: John Snow Labs +name: sent_tiny_mlm_glue_mrpc +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_tiny_mlm_glue_mrpc` is a English model originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_tiny_mlm_glue_mrpc_en_5.5.0_3.0_1727105587312.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_tiny_mlm_glue_mrpc_en_5.5.0_3.0_1727105587312.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_tiny_mlm_glue_mrpc","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_tiny_mlm_glue_mrpc","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_tiny_mlm_glue_mrpc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|16.7 MB| + +## References + +https://huggingface.co/muhtasham/tiny-mlm-glue-mrpc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_tiny_mlm_glue_mrpc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_tiny_mlm_glue_mrpc_pipeline_en.md new file mode 100644 index 00000000000000..d6244a9bff55ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_tiny_mlm_glue_mrpc_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_tiny_mlm_glue_mrpc_pipeline pipeline BertSentenceEmbeddings from muhtasham +author: John Snow Labs +name: sent_tiny_mlm_glue_mrpc_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_tiny_mlm_glue_mrpc_pipeline` is a English model originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_tiny_mlm_glue_mrpc_pipeline_en_5.5.0_3.0_1727105588548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_tiny_mlm_glue_mrpc_pipeline_en_5.5.0_3.0_1727105588548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_tiny_mlm_glue_mrpc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_tiny_mlm_glue_mrpc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_tiny_mlm_glue_mrpc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|17.2 MB| + +## References + +https://huggingface.co/muhtasham/tiny-mlm-glue-mrpc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_twitch_bert_base_cased_pytorch_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_twitch_bert_base_cased_pytorch_en.md new file mode 100644 index 00000000000000..96fcfc50384550 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_twitch_bert_base_cased_pytorch_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_twitch_bert_base_cased_pytorch BertSentenceEmbeddings from veb +author: John Snow Labs +name: sent_twitch_bert_base_cased_pytorch +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_twitch_bert_base_cased_pytorch` is a English model originally trained by veb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_twitch_bert_base_cased_pytorch_en_5.5.0_3.0_1727113974443.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_twitch_bert_base_cased_pytorch_en_5.5.0_3.0_1727113974443.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_twitch_bert_base_cased_pytorch","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_twitch_bert_base_cased_pytorch","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
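
A common use of sentence embeddings is semantic similarity. A sketch using the fitted `pipelineModel` from above, with cosine similarity computed in NumPy (the example sentences and one-sentence-per-row assumption are illustrative):

```python
# Sketch: cosine similarity between two sentences using the fitted pipelineModel above
# (assumes one sentence per input row; NumPy is used only for the final arithmetic)
import numpy as np

pairs = spark.createDataFrame([["I love spark-nlp"], ["Spark NLP is great"]]).toDF("text")
rows = (
    pipelineModel.transform(pairs)
    .selectExpr("explode(embeddings) as ann")
    .selectExpr("ann.embeddings as vector")
    .collect()
)
v1, v2 = np.array(rows[0]["vector"]), np.array(rows[1]["vector"])
print(float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))))
```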
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_twitch_bert_base_cased_pytorch| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/veb/twitch-bert-base-cased-pytorch \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_twitch_bert_base_cased_pytorch_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_twitch_bert_base_cased_pytorch_pipeline_en.md new file mode 100644 index 00000000000000..b89a25e39b2199 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_twitch_bert_base_cased_pytorch_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_twitch_bert_base_cased_pytorch_pipeline pipeline BertSentenceEmbeddings from veb +author: John Snow Labs +name: sent_twitch_bert_base_cased_pytorch_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_twitch_bert_base_cased_pytorch_pipeline` is a English model originally trained by veb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_twitch_bert_base_cased_pytorch_pipeline_en_5.5.0_3.0_1727113993094.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_twitch_bert_base_cased_pytorch_pipeline_en_5.5.0_3.0_1727113993094.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_twitch_bert_base_cased_pytorch_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_twitch_bert_base_cased_pytorch_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_twitch_bert_base_cased_pytorch_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/veb/twitch-bert-base-cased-pytorch + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_viz_wiz_bert_base_uncased_f16_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_viz_wiz_bert_base_uncased_f16_en.md new file mode 100644 index 00000000000000..fda2175d5086cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_viz_wiz_bert_base_uncased_f16_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_viz_wiz_bert_base_uncased_f16 BertSentenceEmbeddings from eisenjulian +author: John Snow Labs +name: sent_viz_wiz_bert_base_uncased_f16 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_viz_wiz_bert_base_uncased_f16` is a English model originally trained by eisenjulian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_viz_wiz_bert_base_uncased_f16_en_5.5.0_3.0_1727091386839.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_viz_wiz_bert_base_uncased_f16_en_5.5.0_3.0_1727091386839.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_viz_wiz_bert_base_uncased_f16","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_viz_wiz_bert_base_uncased_f16","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_viz_wiz_bert_base_uncased_f16| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/eisenjulian/viz-wiz-bert-base-uncased_f16 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_viz_wiz_bert_base_uncased_f16_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_viz_wiz_bert_base_uncased_f16_pipeline_en.md new file mode 100644 index 00000000000000..83038442a9601c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_viz_wiz_bert_base_uncased_f16_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_viz_wiz_bert_base_uncased_f16_pipeline pipeline BertSentenceEmbeddings from eisenjulian +author: John Snow Labs +name: sent_viz_wiz_bert_base_uncased_f16_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_viz_wiz_bert_base_uncased_f16_pipeline` is a English model originally trained by eisenjulian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_viz_wiz_bert_base_uncased_f16_pipeline_en_5.5.0_3.0_1727091406148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_viz_wiz_bert_base_uncased_f16_pipeline_en_5.5.0_3.0_1727091406148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_viz_wiz_bert_base_uncased_f16_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_viz_wiz_bert_base_uncased_f16_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_viz_wiz_bert_base_uncased_f16_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/eisenjulian/viz-wiz-bert-base-uncased_f16 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_en.md new file mode 100644 index 00000000000000..093750a3cdc044 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_model RoBertaForSequenceClassification from fusersam +author: John Snow Labs +name: sentiment_analysis_model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_model` is a English model originally trained by fusersam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_en_5.5.0_3.0_1727054807998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_en_5.5.0_3.0_1727054807998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_analysis_model","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_analysis_model", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
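
The predicted label for each row ends up in the `result` field of the `class` annotation column; for example:

```python
# Read the predicted label(s) from the classifier output defined above
pipelineDF.select("text", "class.result").show(truncate = False)
```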
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|436.3 MB| + +## References + +https://huggingface.co/fusersam/Sentiment-Analysis-Model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_kiranwood_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_kiranwood_en.md new file mode 100644 index 00000000000000..219fdb1a8436fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_kiranwood_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_model_kiranwood DistilBertForSequenceClassification from KiranWood +author: John Snow Labs +name: sentiment_analysis_model_kiranwood +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_model_kiranwood` is a English model originally trained by KiranWood. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_kiranwood_en_5.5.0_3.0_1727073619345.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_kiranwood_en_5.5.0_3.0_1727073619345.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_model_kiranwood","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_model_kiranwood", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_model_kiranwood| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KiranWood/sentiment-analysis-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_kiranwood_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_kiranwood_pipeline_en.md new file mode 100644 index 00000000000000..af3588a4d24765 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_kiranwood_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_model_kiranwood_pipeline pipeline DistilBertForSequenceClassification from KiranWood +author: John Snow Labs +name: sentiment_analysis_model_kiranwood_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_model_kiranwood_pipeline` is a English model originally trained by KiranWood. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_kiranwood_pipeline_en_5.5.0_3.0_1727073631451.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_kiranwood_pipeline_en_5.5.0_3.0_1727073631451.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_analysis_model_kiranwood_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_analysis_model_kiranwood_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
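
For quick spot checks on a single string, `PretrainedPipeline` also exposes `annotate()`. A sketch (the exact keys of the returned dict depend on the pipeline's stages):

```python
# Sketch: one-off inference with annotate() instead of a DataFrame
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("sentiment_analysis_model_kiranwood_pipeline", lang = "en")
print(pipeline.annotate("I love spark-nlp"))
```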
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_model_kiranwood_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KiranWood/sentiment-analysis-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_pipeline_en.md new file mode 100644 index 00000000000000..6ee479742695e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_model_pipeline pipeline RoBertaForSequenceClassification from fusersam +author: John Snow Labs +name: sentiment_analysis_model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_model_pipeline` is a English model originally trained by fusersam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_pipeline_en_5.5.0_3.0_1727054841069.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_pipeline_en_5.5.0_3.0_1727054841069.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_analysis_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_analysis_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|436.3 MB| + +## References + +https://huggingface.co/fusersam/Sentiment-Analysis-Model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_team_28_a01794830_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_team_28_a01794830_en.md new file mode 100644 index 00000000000000..c9283df5ab0dc2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_team_28_a01794830_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_model_team_28_a01794830 DistilBertForSequenceClassification from a01794830 +author: John Snow Labs +name: sentiment_analysis_model_team_28_a01794830 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_model_team_28_a01794830` is a English model originally trained by a01794830. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_team_28_a01794830_en_5.5.0_3.0_1727094145191.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_team_28_a01794830_en_5.5.0_3.0_1727094145191.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_model_team_28_a01794830","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_model_team_28_a01794830", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
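
Once the pipeline above has been fit and applied, the predicted labels can be read out of the annotation column. A short follow-up sketch, using only the column names already set in the snippet (`text` and `class`):

```python
# Each row of "class" holds an array of annotations; "class.result"
# extracts just the predicted label strings.
pipelineDF.select("text", "class.result").show(truncate=False)
```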
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_model_team_28_a01794830| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/a01794830/sentiment-analysis-model-team-28 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_team_28_a01794830_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_team_28_a01794830_pipeline_en.md new file mode 100644 index 00000000000000..8836a9e7f51053 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_team_28_a01794830_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_model_team_28_a01794830_pipeline pipeline DistilBertForSequenceClassification from a01794830 +author: John Snow Labs +name: sentiment_analysis_model_team_28_a01794830_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_model_team_28_a01794830_pipeline` is a English model originally trained by a01794830. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_team_28_a01794830_pipeline_en_5.5.0_3.0_1727094156564.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_team_28_a01794830_pipeline_en_5.5.0_3.0_1727094156564.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame with the input text (a column named "text" is the usual convention)
pipeline = PretrainedPipeline("sentiment_analysis_model_team_28_a01794830_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with the input text (a column named "text" is the usual convention)
val pipeline = new PretrainedPipeline("sentiment_analysis_model_team_28_a01794830_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_model_team_28_a01794830_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/a01794830/sentiment-analysis-model-team-28 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_roberta_latest_e8_b16_data2_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_roberta_latest_e8_b16_data2_en.md new file mode 100644 index 00000000000000..b06338172cfc45 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_roberta_latest_e8_b16_data2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_roberta_latest_e8_b16_data2 RoBertaForSequenceClassification from JerryYanJiang +author: John Snow Labs +name: sentiment_roberta_latest_e8_b16_data2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_roberta_latest_e8_b16_data2` is a English model originally trained by JerryYanJiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_roberta_latest_e8_b16_data2_en_5.5.0_3.0_1727054731811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_roberta_latest_e8_b16_data2_en_5.5.0_3.0_1727054731811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_roberta_latest_e8_b16_data2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_roberta_latest_e8_b16_data2", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_roberta_latest_e8_b16_data2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/JerryYanJiang/sentiment-roberta-latest-e8-b16-data2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_roberta_latest_e8_b16_data2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_roberta_latest_e8_b16_data2_pipeline_en.md new file mode 100644 index 00000000000000..baf2ef047bdc10 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_roberta_latest_e8_b16_data2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_roberta_latest_e8_b16_data2_pipeline pipeline RoBertaForSequenceClassification from JerryYanJiang +author: John Snow Labs +name: sentiment_roberta_latest_e8_b16_data2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_roberta_latest_e8_b16_data2_pipeline` is a English model originally trained by JerryYanJiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_roberta_latest_e8_b16_data2_pipeline_en_5.5.0_3.0_1727054758471.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_roberta_latest_e8_b16_data2_pipeline_en_5.5.0_3.0_1727054758471.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame with the input text (a column named "text" is the usual convention)
pipeline = PretrainedPipeline("sentiment_roberta_latest_e8_b16_data2_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with the input text (a column named "text" is the usual convention)
val pipeline = new PretrainedPipeline("sentiment_roberta_latest_e8_b16_data2_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_roberta_latest_e8_b16_data2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/JerryYanJiang/sentiment-roberta-latest-e8-b16-data2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_bernice_en.md new file mode 100644 index 00000000000000..f26658fe5039e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_sentiment_small_random3_seed0_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random3_seed0_bernice +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random3_seed0_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_bernice_en_5.5.0_3.0_1727100070152.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_bernice_en_5.5.0_3.0_1727100070152.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random3_seed0_bernice","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random3_seed0_bernice", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random3_seed0_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|790.3 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random3_seed0-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_bernice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_bernice_pipeline_en.md new file mode 100644 index 00000000000000..ed1c5539ee25ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_bernice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_sentiment_small_random3_seed0_bernice_pipeline pipeline XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random3_seed0_bernice_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random3_seed0_bernice_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_bernice_pipeline_en_5.5.0_3.0_1727100208066.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_bernice_pipeline_en_5.5.0_3.0_1727100208066.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame with the input text (a column named "text" is the usual convention)
pipeline = PretrainedPipeline("sentiment_sentiment_small_random3_seed0_bernice_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with the input text (a column named "text" is the usual convention)
val pipeline = new PretrainedPipeline("sentiment_sentiment_small_random3_seed0_bernice_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random3_seed0_bernice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|790.3 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random3_seed0-bernice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_en.md new file mode 100644 index 00000000000000..bf31d3594cb2cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1727054732920.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1727054732920.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random3_seed0-twitter-roberta-base-2022-154m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_pipeline_en.md new file mode 100644 index 00000000000000..bf3dbe70fe212d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1727054758620.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1727054758620.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame with the input text (a column named "text" is the usual convention)
pipeline = PretrainedPipeline("sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with the input text (a column named "text" is the usual convention)
val pipeline = new PretrainedPipeline("sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random3_seed0-twitter-roberta-base-2022-154m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentimentanalysis_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentimentanalysis_imdb_en.md new file mode 100644 index 00000000000000..9ccb25733a2d7a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentimentanalysis_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentimentanalysis_imdb DistilBertForSequenceClassification from johnchangbviwit +author: John Snow Labs +name: sentimentanalysis_imdb +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentimentanalysis_imdb` is a English model originally trained by johnchangbviwit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentimentanalysis_imdb_en_5.5.0_3.0_1727059885759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentimentanalysis_imdb_en_5.5.0_3.0_1727059885759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentimentanalysis_imdb","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentimentanalysis_imdb", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentimentanalysis_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/johnchangbviwit/sentimentanalysis-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentimentanalysis_imdb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentimentanalysis_imdb_pipeline_en.md new file mode 100644 index 00000000000000..181aea145ee75a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentimentanalysis_imdb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentimentanalysis_imdb_pipeline pipeline DistilBertForSequenceClassification from johnchangbviwit +author: John Snow Labs +name: sentimentanalysis_imdb_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentimentanalysis_imdb_pipeline` is a English model originally trained by johnchangbviwit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentimentanalysis_imdb_pipeline_en_5.5.0_3.0_1727059897505.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentimentanalysis_imdb_pipeline_en_5.5.0_3.0_1727059897505.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame with the input text (a column named "text" is the usual convention)
pipeline = PretrainedPipeline("sentimentanalysis_imdb_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with the input text (a column named "text" is the usual convention)
val pipeline = new PretrainedPipeline("sentimentanalysis_imdb_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentimentanalysis_imdb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/johnchangbviwit/sentimentanalysis-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiments_analysis_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiments_analysis_roberta_en.md new file mode 100644 index 00000000000000..abe88620dd813d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiments_analysis_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiments_analysis_roberta RoBertaForSequenceClassification from Junr-syl +author: John Snow Labs +name: sentiments_analysis_roberta +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiments_analysis_roberta` is a English model originally trained by Junr-syl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiments_analysis_roberta_en_5.5.0_3.0_1727086030953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiments_analysis_roberta_en_5.5.0_3.0_1727086030953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiments_analysis_roberta","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiments_analysis_roberta", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiments_analysis_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|438.9 MB| + +## References + +https://huggingface.co/Junr-syl/sentiments_analysis_Roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiments_analysis_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiments_analysis_roberta_pipeline_en.md new file mode 100644 index 00000000000000..ecadd18baf6512 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiments_analysis_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiments_analysis_roberta_pipeline pipeline RoBertaForSequenceClassification from Junr-syl +author: John Snow Labs +name: sentiments_analysis_roberta_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiments_analysis_roberta_pipeline` is a English model originally trained by Junr-syl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiments_analysis_roberta_pipeline_en_5.5.0_3.0_1727086055443.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiments_analysis_roberta_pipeline_en_5.5.0_3.0_1727086055443.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame with the input text (a column named "text" is the usual convention)
pipeline = PretrainedPipeline("sentiments_analysis_roberta_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with the input text (a column named "text" is the usual convention)
val pipeline = new PretrainedPipeline("sentiments_analysis_roberta_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiments_analysis_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|439.0 MB| + +## References + +https://huggingface.co/Junr-syl/sentiments_analysis_Roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-slovakbert_upos_en.md b/docs/_posts/ahmedlone127/2024-09-23-slovakbert_upos_en.md new file mode 100644 index 00000000000000..937bf0339e2dab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-slovakbert_upos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English slovakbert_upos RoBertaForTokenClassification from crabz +author: John Snow Labs +name: slovakbert_upos +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`slovakbert_upos` is a English model originally trained by crabz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/slovakbert_upos_en_5.5.0_3.0_1727072905424.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/slovakbert_upos_en_5.5.0_3.0_1727072905424.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("slovakbert_upos","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("slovakbert_upos", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
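
For this part-of-speech model, every token in the input receives its own tag, so the `token` and `ner` output columns line up element-wise. A small follow-up sketch for inspecting them side by side, using only the column names set in the snippet above:

```python
# "token.result" holds the tokens and "ner.result" the predicted UPOS tags,
# in the same order, so the two arrays can be read together row by row.
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```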
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|slovakbert_upos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|434.4 MB| + +## References + +https://huggingface.co/crabz/slovakbert-upos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-slovakbert_upos_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-slovakbert_upos_pipeline_en.md new file mode 100644 index 00000000000000..314e307e74efec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-slovakbert_upos_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English slovakbert_upos_pipeline pipeline RoBertaForTokenClassification from crabz +author: John Snow Labs +name: slovakbert_upos_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`slovakbert_upos_pipeline` is a English model originally trained by crabz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/slovakbert_upos_pipeline_en_5.5.0_3.0_1727072937547.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/slovakbert_upos_pipeline_en_5.5.0_3.0_1727072937547.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame with the input text (a column named "text" is the usual convention)
pipeline = PretrainedPipeline("slovakbert_upos_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with the input text (a column named "text" is the usual convention)
val pipeline = new PretrainedPipeline("slovakbert_upos_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|slovakbert_upos_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|434.4 MB| + +## References + +https://huggingface.co/crabz/slovakbert-upos + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-small_whisper_en.md b/docs/_posts/ahmedlone127/2024-09-23-small_whisper_en.md new file mode 100644 index 00000000000000..fda026bdf69b6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-small_whisper_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English small_whisper WhisperForCTC from sanjitaa +author: John Snow Labs +name: small_whisper +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`small_whisper` is a English model originally trained by sanjitaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/small_whisper_en_5.5.0_3.0_1727051286565.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/small_whisper_en_5.5.0_3.0_1727051286565.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# data: a DataFrame holding raw audio floats in a column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("small_whisper","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// data: a DataFrame holding raw audio floats in a column named "audio_content"
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("small_whisper", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
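
The `data` DataFrame referenced above must contain the raw audio as an array of floats. One hypothetical way to build it (assuming `librosa` is available on the driver and the audio is resampled to the 16 kHz rate Whisper models typically expect):

```python
import librosa

# Load a local audio file as a float array at 16 kHz (an assumption about the
# expected sample rate), then wrap it in a single-column Spark DataFrame.
raw_floats, _ = librosa.load("path/to/audio.wav", sr=16000)
data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")
```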
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|small_whisper| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/sanjitaa/small-whisper \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-small_whisper_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-small_whisper_pipeline_en.md new file mode 100644 index 00000000000000..095de513ea109a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-small_whisper_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English small_whisper_pipeline pipeline WhisperForCTC from sanjitaa +author: John Snow Labs +name: small_whisper_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`small_whisper_pipeline` is a English model originally trained by sanjitaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/small_whisper_pipeline_en_5.5.0_3.0_1727051373275.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/small_whisper_pipeline_en_5.5.0_3.0_1727051373275.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame with raw audio floats (a column named "audio_content" is the usual convention)
pipeline = PretrainedPipeline("small_whisper_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with raw audio floats (a column named "audio_content" is the usual convention)
val pipeline = new PretrainedPipeline("small_whisper_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|small_whisper_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/sanjitaa/small-whisper + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-social_media_sanskrit_saskta_finetuned_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-social_media_sanskrit_saskta_finetuned_2_pipeline_en.md new file mode 100644 index 00000000000000..f78cbc2e5eb7bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-social_media_sanskrit_saskta_finetuned_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English social_media_sanskrit_saskta_finetuned_2_pipeline pipeline DistilBertForSequenceClassification from Kwaku +author: John Snow Labs +name: social_media_sanskrit_saskta_finetuned_2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`social_media_sanskrit_saskta_finetuned_2_pipeline` is a English model originally trained by Kwaku. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/social_media_sanskrit_saskta_finetuned_2_pipeline_en_5.5.0_3.0_1727093632102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/social_media_sanskrit_saskta_finetuned_2_pipeline_en_5.5.0_3.0_1727093632102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame with the input text (a column named "text" is the usual convention)
pipeline = PretrainedPipeline("social_media_sanskrit_saskta_finetuned_2_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with the input text (a column named "text" is the usual convention)
val pipeline = new PretrainedPipeline("social_media_sanskrit_saskta_finetuned_2_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|social_media_sanskrit_saskta_finetuned_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Kwaku/social_media_sa_finetuned_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-somd_xlm_3stage_stage0_pre_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-somd_xlm_3stage_stage0_pre_v1_pipeline_en.md new file mode 100644 index 00000000000000..4baaaf76e14cfd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-somd_xlm_3stage_stage0_pre_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English somd_xlm_3stage_stage0_pre_v1_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: somd_xlm_3stage_stage0_pre_v1_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`somd_xlm_3stage_stage0_pre_v1_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/somd_xlm_3stage_stage0_pre_v1_pipeline_en_5.5.0_3.0_1727126203217.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/somd_xlm_3stage_stage0_pre_v1_pipeline_en_5.5.0_3.0_1727126203217.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame with the input text (a column named "text" is the usual convention)
pipeline = PretrainedPipeline("somd_xlm_3stage_stage0_pre_v1_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with the input text (a column named "text" is the usual convention)
val pipeline = new PretrainedPipeline("somd_xlm_3stage_stage0_pre_v1_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|somd_xlm_3stage_stage0_pre_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|791.4 MB| + +## References + +https://huggingface.co/ThuyNT03/SOMD-xlm-3stage-stage0-pre-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sst2_padding90model_en.md b/docs/_posts/ahmedlone127/2024-09-23-sst2_padding90model_en.md new file mode 100644 index 00000000000000..c55cdb41e96f07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sst2_padding90model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sst2_padding90model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: sst2_padding90model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sst2_padding90model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sst2_padding90model_en_5.5.0_3.0_1727082112010.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sst2_padding90model_en_5.5.0_3.0_1727082112010.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("sst2_padding90model","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sst2_padding90model", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sst2_padding90model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/sst2_padding90model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sst2_padding90model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sst2_padding90model_pipeline_en.md new file mode 100644 index 00000000000000..a897e27a9629d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sst2_padding90model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sst2_padding90model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: sst2_padding90model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sst2_padding90model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sst2_padding90model_pipeline_en_5.5.0_3.0_1727082128279.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sst2_padding90model_pipeline_en_5.5.0_3.0_1727082128279.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a DataFrame with the input text (a column named "text" is the usual convention)
pipeline = PretrainedPipeline("sst2_padding90model_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame with the input text (a column named "text" is the usual convention)
val pipeline = new PretrainedPipeline("sst2_padding90model_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sst2_padding90model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/sst2_padding90model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sst5_padding20model_en.md b/docs/_posts/ahmedlone127/2024-09-23-sst5_padding20model_en.md new file mode 100644 index 00000000000000..aebb31cf4b9f56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sst5_padding20model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sst5_padding20model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: sst5_padding20model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sst5_padding20model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sst5_padding20model_en_5.5.0_3.0_1727059549103.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sst5_padding20model_en_5.5.0_3.0_1727059549103.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("sst5_padding20model","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sst5_padding20model", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sst5_padding20model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/sst5_padding20model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sst5_padding20model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sst5_padding20model_pipeline_en.md new file mode 100644 index 00000000000000..d58bb089b16ea9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sst5_padding20model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sst5_padding20model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: sst5_padding20model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sst5_padding20model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sst5_padding20model_pipeline_en_5.5.0_3.0_1727059561499.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sst5_padding20model_pipeline_en_5.5.0_3.0_1727059561499.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("sst5_padding20model_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("sst5_padding20model_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
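+
+For quick checks on a few strings, the pretrained pipeline can also be queried without building a DataFrame first. This is a small sketch assuming the `pipeline` object from the Python example above; `annotate` returns a plain dictionary keyed by the pipeline's output columns:
+
+```python
+# Annotate a single string in memory; the predicted label is expected under the
+# classifier's output column (assumed here to be "class").
+result = pipeline.annotate("I love spark-nlp")
+print(result["class"])
+```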
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sst5_padding20model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/sst5_padding20model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_10_2024_07_26_12_23_45_en.md b/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_10_2024_07_26_12_23_45_en.md new file mode 100644 index 00000000000000..cf7c174dd480e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_10_2024_07_26_12_23_45_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_10_2024_07_26_12_23_45 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_10_2024_07_26_12_23_45 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_10_2024_07_26_12_23_45` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_10_2024_07_26_12_23_45_en_5.5.0_3.0_1727082506338.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_10_2024_07_26_12_23_45_en_5.5.0_3.0_1727082506338.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_10_2024_07_26_12_23_45","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_10_2024_07_26_12_23_45", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_10_2024_07_26_12_23_45| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-10-2024-07-26_12-23-45 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_en.md b/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_en.md new file mode 100644 index 00000000000000..1681e4a2f6a361 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_en_5.5.0_3.0_1727110649602.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_en_5.5.0_3.0_1727110649602.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-10-2024-07-26_16-19-31 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_pipeline_en.md new file mode 100644 index 00000000000000..cb8e9b3923543b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_pipeline_en_5.5.0_3.0_1727110661338.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_pipeline_en_5.5.0_3.0_1727110661338.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-10-2024-07-26_16-19-31 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_en.md b/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_en.md new file mode 100644 index 00000000000000..81ec2205d84943 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_en_5.5.0_3.0_1727059122330.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_en_5.5.0_3.0_1727059122330.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-20-2024-07-26_12-23-45 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_pipeline_en.md new file mode 100644 index 00000000000000..0767d5f20e94f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_pipeline_en_5.5.0_3.0_1727059135535.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_pipeline_en_5.5.0_3.0_1727059135535.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-20-2024-07-26_12-23-45 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-suicide_bert_en.md b/docs/_posts/ahmedlone127/2024-09-23-suicide_bert_en.md new file mode 100644 index 00000000000000..fdbc5fdecd657d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-suicide_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English suicide_bert RoBertaForSequenceClassification from vishalp23 +author: John Snow Labs +name: suicide_bert +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`suicide_bert` is a English model originally trained by vishalp23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/suicide_bert_en_5.5.0_3.0_1727085371437.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/suicide_bert_en_5.5.0_3.0_1727085371437.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("suicide_bert","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("suicide_bert", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
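+
+When serving individual texts rather than large DataFrames, the fitted pipeline can be wrapped in a `LightPipeline` for lower-latency inference. A minimal sketch, assuming `pipelineModel` from the Python example above:
+
+```python
+from sparknlp.base import LightPipeline
+
+# LightPipeline runs the same stages on plain Python strings, avoiding Spark job overhead.
+light = LightPipeline(pipelineModel)
+annotations = light.annotate("I love spark-nlp")
+print(annotations["class"])  # label(s) predicted by the sequence classifier
+```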
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|suicide_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|467.5 MB| + +## References + +https://huggingface.co/vishalp23/suicide-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-suicide_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-suicide_bert_pipeline_en.md new file mode 100644 index 00000000000000..e0dedc9b143df3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-suicide_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English suicide_bert_pipeline pipeline RoBertaForSequenceClassification from vishalp23 +author: John Snow Labs +name: suicide_bert_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`suicide_bert_pipeline` is a English model originally trained by vishalp23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/suicide_bert_pipeline_en_5.5.0_3.0_1727085396182.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/suicide_bert_pipeline_en_5.5.0_3.0_1727085396182.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("suicide_bert_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("suicide_bert_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|suicide_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|467.5 MB| + +## References + +https://huggingface.co/vishalp23/suicide-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-suicide_distilbert_6_5_en.md b/docs/_posts/ahmedlone127/2024-09-23-suicide_distilbert_6_5_en.md new file mode 100644 index 00000000000000..a7e10d308d2441 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-suicide_distilbert_6_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English suicide_distilbert_6_5 DistilBertForSequenceClassification from cuadron11 +author: John Snow Labs +name: suicide_distilbert_6_5 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`suicide_distilbert_6_5` is a English model originally trained by cuadron11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/suicide_distilbert_6_5_en_5.5.0_3.0_1727073775043.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/suicide_distilbert_6_5_en_5.5.0_3.0_1727073775043.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("suicide_distilbert_6_5","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("suicide_distilbert_6_5", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|suicide_distilbert_6_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cuadron11/suicide-distilbert-6-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-suicide_distilbert_6_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-suicide_distilbert_6_5_pipeline_en.md new file mode 100644 index 00000000000000..0659e5ae3d6768 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-suicide_distilbert_6_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English suicide_distilbert_6_5_pipeline pipeline DistilBertForSequenceClassification from cuadron11 +author: John Snow Labs +name: suicide_distilbert_6_5_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`suicide_distilbert_6_5_pipeline` is a English model originally trained by cuadron11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/suicide_distilbert_6_5_pipeline_en_5.5.0_3.0_1727073786779.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/suicide_distilbert_6_5_pipeline_en_5.5.0_3.0_1727073786779.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("suicide_distilbert_6_5_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("suicide_distilbert_6_5_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|suicide_distilbert_6_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cuadron11/suicide-distilbert-6-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sungbeom_whisper_small_korean_set31_ko.md b/docs/_posts/ahmedlone127/2024-09-23-sungbeom_whisper_small_korean_set31_ko.md new file mode 100644 index 00000000000000..55678ddb124889 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sungbeom_whisper_small_korean_set31_ko.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Korean sungbeom_whisper_small_korean_set31 WhisperForCTC from maxseats +author: John Snow Labs +name: sungbeom_whisper_small_korean_set31 +date: 2024-09-23 +tags: [ko, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sungbeom_whisper_small_korean_set31` is a Korean model originally trained by maxseats. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sungbeom_whisper_small_korean_set31_ko_5.5.0_3.0_1727116366905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sungbeom_whisper_small_korean_set31_ko_5.5.0_3.0_1727116366905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("sungbeom_whisper_small_korean_set31","ko") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+# `data` is a DataFrame with the raw audio samples (an array of floats) in a column named "audio_content"
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("sungbeom_whisper_small_korean_set31", "ko")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+// `data` is a DataFrame with the raw audio samples (an array of floats) in a column named "audio_content"
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
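+
+The examples above assume a DataFrame `data` that already contains the raw audio samples. One way to build it, shown here only as an illustrative sketch (it assumes the `librosa` library is installed and that a local file named `sample.wav` exists), is to load the waveform as 16 kHz floats and wrap it in a single-column DataFrame:
+
+```python
+import librosa
+
+# Whisper models expect 16 kHz mono audio provided as an array of floats.
+waveform, sampling_rate = librosa.load("sample.wav", sr=16000)
+data = spark.createDataFrame([[waveform.tolist()]]).toDF("audio_content")
+
+# After running the pipeline, the transcription can be read from the "text" column:
+# pipelineDF.select("text.result").show(truncate=False)
+```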
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sungbeom_whisper_small_korean_set31| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ko| +|Size:|1.7 GB| + +## References + +https://huggingface.co/maxseats/SungBeom-whisper-small-ko-set31 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-supernatural_distilbert_prod_en.md b/docs/_posts/ahmedlone127/2024-09-23-supernatural_distilbert_prod_en.md new file mode 100644 index 00000000000000..47035c67e9faef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-supernatural_distilbert_prod_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English supernatural_distilbert_prod DistilBertForSequenceClassification from banhabang +author: John Snow Labs +name: supernatural_distilbert_prod +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`supernatural_distilbert_prod` is a English model originally trained by banhabang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/supernatural_distilbert_prod_en_5.5.0_3.0_1727073931444.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/supernatural_distilbert_prod_en_5.5.0_3.0_1727073931444.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("supernatural_distilbert_prod","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("supernatural_distilbert_prod", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|supernatural_distilbert_prod| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/banhabang/Supernatural-distilbert-Prod \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-supernatural_distilbert_prod_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-supernatural_distilbert_prod_pipeline_en.md new file mode 100644 index 00000000000000..6ecbe82fa60a0a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-supernatural_distilbert_prod_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English supernatural_distilbert_prod_pipeline pipeline DistilBertForSequenceClassification from banhabang +author: John Snow Labs +name: supernatural_distilbert_prod_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`supernatural_distilbert_prod_pipeline` is a English model originally trained by banhabang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/supernatural_distilbert_prod_pipeline_en_5.5.0_3.0_1727073944110.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/supernatural_distilbert_prod_pipeline_en_5.5.0_3.0_1727073944110.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("supernatural_distilbert_prod_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("supernatural_distilbert_prod_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|supernatural_distilbert_prod_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/banhabang/Supernatural-distilbert-Prod + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-swati_model_en.md b/docs/_posts/ahmedlone127/2024-09-23-swati_model_en.md new file mode 100644 index 00000000000000..1808a98eefb7ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-swati_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English swati_model DistilBertForSequenceClassification from anth0nyhak1m +author: John Snow Labs +name: swati_model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`swati_model` is a English model originally trained by anth0nyhak1m. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/swati_model_en_5.5.0_3.0_1727082168595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/swati_model_en_5.5.0_3.0_1727082168595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("swati_model","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("swati_model", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|swati_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/anth0nyhak1m/SS_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-swati_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-swati_model_pipeline_en.md new file mode 100644 index 00000000000000..2890991bad8153 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-swati_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English swati_model_pipeline pipeline DistilBertForSequenceClassification from anth0nyhak1m +author: John Snow Labs +name: swati_model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`swati_model_pipeline` is a English model originally trained by anth0nyhak1m. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/swati_model_pipeline_en_5.5.0_3.0_1727082180681.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/swati_model_pipeline_en_5.5.0_3.0_1727082180681.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("swati_model_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("swati_model_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|swati_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/anth0nyhak1m/SS_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-tamilroberta_en.md b/docs/_posts/ahmedlone127/2024-09-23-tamilroberta_en.md new file mode 100644 index 00000000000000..ff8151b3ca9772 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-tamilroberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tamilroberta RoBertaEmbeddings from apkbala107 +author: John Snow Labs +name: tamilroberta +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tamilroberta` is a English model originally trained by apkbala107. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tamilroberta_en_5.5.0_3.0_1727121707582.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tamilroberta_en_5.5.0_3.0_1727121707582.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("tamilroberta","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("tamilroberta","en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
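+
+The `embeddings` column produced above holds one annotation per token, each carrying the token text and its embedding vector. The optional snippet below is a small sketch for extracting those vectors, assuming `pipelineDF` from the Python example above:
+
+```python
+from pyspark.sql.functions import explode
+
+# One row per token: the token string and its RoBERTa embedding vector.
+pipelineDF.select(explode("embeddings").alias("emb")) \
+    .selectExpr("emb.result AS token", "emb.embeddings AS vector") \
+    .show(truncate=False)
+```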
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tamilroberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|310.2 MB| + +## References + +https://huggingface.co/apkbala107/tamilroberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-tamilroberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-tamilroberta_pipeline_en.md new file mode 100644 index 00000000000000..4017d593552e7c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-tamilroberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tamilroberta_pipeline pipeline RoBertaEmbeddings from apkbala107 +author: John Snow Labs +name: tamilroberta_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tamilroberta_pipeline` is a English model originally trained by apkbala107. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tamilroberta_pipeline_en_5.5.0_3.0_1727121723689.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tamilroberta_pipeline_en_5.5.0_3.0_1727121723689.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("tamilroberta_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("tamilroberta_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tamilroberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|310.2 MB| + +## References + +https://huggingface.co/apkbala107/tamilroberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-tamilroberto_en.md b/docs/_posts/ahmedlone127/2024-09-23-tamilroberto_en.md new file mode 100644 index 00000000000000..140eee095a1128 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-tamilroberto_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tamilroberto RoBertaEmbeddings from apkbala107 +author: John Snow Labs +name: tamilroberto +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tamilroberto` is a English model originally trained by apkbala107. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tamilroberto_en_5.5.0_3.0_1727056896281.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tamilroberto_en_5.5.0_3.0_1727056896281.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("tamilroberto","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("tamilroberto","en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tamilroberto| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|310.1 MB| + +## References + +https://huggingface.co/apkbala107/tamilroberto \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-tamilroberto_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-tamilroberto_pipeline_en.md new file mode 100644 index 00000000000000..c58ea078995535 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-tamilroberto_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tamilroberto_pipeline pipeline RoBertaEmbeddings from apkbala107 +author: John Snow Labs +name: tamilroberto_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tamilroberto_pipeline` is a English model originally trained by apkbala107. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tamilroberto_pipeline_en_5.5.0_3.0_1727056915857.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tamilroberto_pipeline_en_5.5.0_3.0_1727056915857.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("tamilroberto_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("tamilroberto_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tamilroberto_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|310.1 MB| + +## References + +https://huggingface.co/apkbala107/tamilroberto + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-test1_sss2000_en.md b/docs/_posts/ahmedlone127/2024-09-23-test1_sss2000_en.md new file mode 100644 index 00000000000000..58810dfa657338 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-test1_sss2000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test1_sss2000 DistilBertForSequenceClassification from sss2000 +author: John Snow Labs +name: test1_sss2000 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test1_sss2000` is a English model originally trained by sss2000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test1_sss2000_en_5.5.0_3.0_1727059348426.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test1_sss2000_en_5.5.0_3.0_1727059348426.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("test1_sss2000","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test1_sss2000", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test1_sss2000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sss2000/test1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-test1_sss2000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-test1_sss2000_pipeline_en.md new file mode 100644 index 00000000000000..67755238346eb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-test1_sss2000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test1_sss2000_pipeline pipeline DistilBertForSequenceClassification from sss2000 +author: John Snow Labs +name: test1_sss2000_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test1_sss2000_pipeline` is a English model originally trained by sss2000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test1_sss2000_pipeline_en_5.5.0_3.0_1727059360280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test1_sss2000_pipeline_en_5.5.0_3.0_1727059360280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("test1_sss2000_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("test1_sss2000_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test1_sss2000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sss2000/test1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-test_model_peace5544_en.md b/docs/_posts/ahmedlone127/2024-09-23-test_model_peace5544_en.md new file mode 100644 index 00000000000000..d60f8d2659afd1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-test_model_peace5544_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_model_peace5544 DistilBertForSequenceClassification from Peace5544 +author: John Snow Labs +name: test_model_peace5544 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_model_peace5544` is a English model originally trained by Peace5544. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_model_peace5544_en_5.5.0_3.0_1727110455138.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_model_peace5544_en_5.5.0_3.0_1727110455138.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_model_peace5544", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_model_peace5544", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
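+
+Once the pipeline has run, the predicted label for each row can be read from the `result` field of the `class` annotation column. A minimal sketch, assuming the `pipelineDF` produced above:
+
+```python
+from pyspark.sql import functions as F
+
+# Each row of "class" is an array of annotations; "result" holds the predicted label.
+pipelineDF.select(F.explode(F.col("class.result")).alias("predicted_label")).show(truncate=False)
+```
+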
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_model_peace5544| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.4 MB| + +## References + +https://huggingface.co/Peace5544/test_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-test_model_peace5544_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-test_model_peace5544_pipeline_en.md new file mode 100644 index 00000000000000..2bf2e2a1c38396 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-test_model_peace5544_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_model_peace5544_pipeline pipeline DistilBertForSequenceClassification from Peace5544 +author: John Snow Labs +name: test_model_peace5544_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_model_peace5544_pipeline` is a English model originally trained by Peace5544. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_model_peace5544_pipeline_en_5.5.0_3.0_1727110466791.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_model_peace5544_pipeline_en_5.5.0_3.0_1727110466791.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("test_model_peace5544_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("test_model_peace5544_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
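+
+For quick experiments on a handful of strings, the same pretrained pipeline can also be queried without building a DataFrame. A minimal sketch, assuming the `pipeline` object created above:
+
+```python
+# annotate() runs the pipeline on a single string and returns a dict whose keys
+# mirror the pipeline's output columns, each mapped to a list of results.
+prediction = pipeline.annotate("I love spark-nlp")
+print(prediction)
+```
+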
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_model_peace5544_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Peace5544/test_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-test_trainerb2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-test_trainerb2_pipeline_en.md new file mode 100644 index 00000000000000..b49bbdffffb4d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-test_trainerb2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_trainerb2_pipeline pipeline DistilBertForSequenceClassification from SimoneJLaudani +author: John Snow Labs +name: test_trainerb2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_trainerb2_pipeline` is a English model originally trained by SimoneJLaudani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_trainerb2_pipeline_en_5.5.0_3.0_1727110753420.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_trainerb2_pipeline_en_5.5.0_3.0_1727110753420.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("test_trainerb2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("test_trainerb2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_trainerb2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/SimoneJLaudani/test_trainerb2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-testing_model_jim33282007_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-testing_model_jim33282007_pipeline_en.md new file mode 100644 index 00000000000000..a54067796383eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-testing_model_jim33282007_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English testing_model_jim33282007_pipeline pipeline DistilBertForSequenceClassification from jim33282007 +author: John Snow Labs +name: testing_model_jim33282007_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`testing_model_jim33282007_pipeline` is a English model originally trained by jim33282007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/testing_model_jim33282007_pipeline_en_5.5.0_3.0_1727082128409.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/testing_model_jim33282007_pipeline_en_5.5.0_3.0_1727082128409.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("testing_model_jim33282007_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("testing_model_jim33282007_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|testing_model_jim33282007_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jim33282007/testing_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-text_classification_model_elijahriley_en.md b/docs/_posts/ahmedlone127/2024-09-23-text_classification_model_elijahriley_en.md new file mode 100644 index 00000000000000..916430ae97e3ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-text_classification_model_elijahriley_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English text_classification_model_elijahriley DistilBertForSequenceClassification from elijahriley +author: John Snow Labs +name: text_classification_model_elijahriley +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_classification_model_elijahriley` is a English model originally trained by elijahriley. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_classification_model_elijahriley_en_5.5.0_3.0_1727073736648.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_classification_model_elijahriley_en_5.5.0_3.0_1727073736648.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_classification_model_elijahriley", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_classification_model_elijahriley", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_classification_model_elijahriley| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/elijahriley/text_classification_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-text_classification_model_elijahriley_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-text_classification_model_elijahriley_pipeline_en.md new file mode 100644 index 00000000000000..430f3638ee79ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-text_classification_model_elijahriley_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English text_classification_model_elijahriley_pipeline pipeline DistilBertForSequenceClassification from elijahriley +author: John Snow Labs +name: text_classification_model_elijahriley_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_classification_model_elijahriley_pipeline` is a English model originally trained by elijahriley. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_classification_model_elijahriley_pipeline_en_5.5.0_3.0_1727073748733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_classification_model_elijahriley_pipeline_en_5.5.0_3.0_1727073748733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("text_classification_model_elijahriley_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("text_classification_model_elijahriley_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_classification_model_elijahriley_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/elijahriley/text_classification_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-text_entity_recognigtion_en.md b/docs/_posts/ahmedlone127/2024-09-23-text_entity_recognigtion_en.md new file mode 100644 index 00000000000000..cf2d1d6af391ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-text_entity_recognigtion_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English text_entity_recognigtion DistilBertForSequenceClassification from Mohamedfasil +author: John Snow Labs +name: text_entity_recognigtion +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_entity_recognigtion` is a English model originally trained by Mohamedfasil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_entity_recognigtion_en_5.5.0_3.0_1727108447911.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_entity_recognigtion_en_5.5.0_3.0_1727108447911.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_entity_recognigtion", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_entity_recognigtion", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_entity_recognigtion| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Mohamedfasil/text-entity-recognigtion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-text_entity_recognigtion_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-text_entity_recognigtion_pipeline_en.md new file mode 100644 index 00000000000000..2eaeb6947958f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-text_entity_recognigtion_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English text_entity_recognigtion_pipeline pipeline DistilBertForSequenceClassification from Mohamedfasil +author: John Snow Labs +name: text_entity_recognigtion_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_entity_recognigtion_pipeline` is a English model originally trained by Mohamedfasil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_entity_recognigtion_pipeline_en_5.5.0_3.0_1727108459924.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_entity_recognigtion_pipeline_en_5.5.0_3.0_1727108459924.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("text_entity_recognigtion_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("text_entity_recognigtion_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_entity_recognigtion_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Mohamedfasil/text-entity-recognigtion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_en.md b/docs/_posts/ahmedlone127/2024-09-23-text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_en.md new file mode 100644 index 00000000000000..819bb011d0ff88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg DistilBertForSequenceClassification from acuvity +author: John Snow Labs +name: text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg` is a English model originally trained by acuvity. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_en_5.5.0_3.0_1727087011727.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_en_5.5.0_3.0_1727087011727.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/acuvity/text-subject_classification-distilbert-base-uncased-single_label-mgd_textbooks-zg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_pipeline_en.md new file mode 100644 index 00000000000000..d04c09e099c952 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_pipeline pipeline DistilBertForSequenceClassification from acuvity +author: John Snow Labs +name: text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_pipeline` is a English model originally trained by acuvity. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_pipeline_en_5.5.0_3.0_1727087023830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_pipeline_en_5.5.0_3.0_1727087023830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/acuvity/text-subject_classification-distilbert-base-uncased-single_label-mgd_textbooks-zg + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-topic_topic_random1_seed2_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-23-topic_topic_random1_seed2_bernice_en.md new file mode 100644 index 00000000000000..e16e2aecf39585 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-topic_topic_random1_seed2_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English topic_topic_random1_seed2_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random1_seed2_bernice +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random1_seed2_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random1_seed2_bernice_en_5.5.0_3.0_1727088493478.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random1_seed2_bernice_en_5.5.0_3.0_1727088493478.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("topic_topic_random1_seed2_bernice", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("topic_topic_random1_seed2_bernice", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
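+
+Besides the predicted topic, each annotation carries per-label scores in its metadata. A minimal sketch for inspecting both, assuming the `pipelineDF` produced above:
+
+```python
+from pyspark.sql import functions as F
+
+# Explode the "class" annotations; "result" is the winning topic and
+# "metadata" maps every candidate label to its score.
+pipelineDF.select(F.explode(F.col("class")).alias("ann")) \
+    .select(F.col("ann.result").alias("topic"), F.col("ann.metadata").alias("scores")) \
+    .show(truncate=False)
+```
+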
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random1_seed2_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|805.6 MB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random1_seed2-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-topic_topic_random1_seed2_bernice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-topic_topic_random1_seed2_bernice_pipeline_en.md new file mode 100644 index 00000000000000..466d4276ed507f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-topic_topic_random1_seed2_bernice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English topic_topic_random1_seed2_bernice_pipeline pipeline XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random1_seed2_bernice_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random1_seed2_bernice_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random1_seed2_bernice_pipeline_en_5.5.0_3.0_1727088633000.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random1_seed2_bernice_pipeline_en_5.5.0_3.0_1727088633000.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("topic_topic_random1_seed2_bernice_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("topic_topic_random1_seed2_bernice_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random1_seed2_bernice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|805.6 MB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random1_seed2-bernice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-trialz_en.md b/docs/_posts/ahmedlone127/2024-09-23-trialz_en.md new file mode 100644 index 00000000000000..c3dc75a74b0b87 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-trialz_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trialz RoBertaEmbeddings from JoAmps +author: John Snow Labs +name: trialz +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trialz` is a English model originally trained by JoAmps. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trialz_en_5.5.0_3.0_1727056713394.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trialz_en_5.5.0_3.0_1727056713394.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("trialz","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("trialz","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
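+
+Each token in the input receives one embedding annotation; the vector itself sits in the `embeddings` field of that annotation. A minimal sketch for pulling the vectors out, assuming the `pipelineDF` produced above:
+
+```python
+from pyspark.sql import functions as F
+
+# "result" holds the token text and "embeddings" the corresponding float vector.
+pipelineDF.select(F.explode(F.col("embeddings")).alias("ann")) \
+    .select(F.col("ann.result").alias("token"), F.col("ann.embeddings").alias("vector")) \
+    .show(5, truncate=80)
+```
+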
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trialz| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/JoAmps/trialz \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-trialz_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-trialz_pipeline_en.md new file mode 100644 index 00000000000000..5604bdb16840b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-trialz_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trialz_pipeline pipeline RoBertaEmbeddings from JoAmps +author: John Snow Labs +name: trialz_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trialz_pipeline` is a English model originally trained by JoAmps. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trialz_pipeline_en_5.5.0_3.0_1727056728080.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trialz_pipeline_en_5.5.0_3.0_1727056728080.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("trialz_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("trialz_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trialz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/JoAmps/trialz + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-twitterfin_padding90model_en.md b/docs/_posts/ahmedlone127/2024-09-23-twitterfin_padding90model_en.md new file mode 100644 index 00000000000000..9c7674f68f3976 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-twitterfin_padding90model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twitterfin_padding90model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: twitterfin_padding90model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitterfin_padding90model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitterfin_padding90model_en_5.5.0_3.0_1727074141408.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitterfin_padding90model_en_5.5.0_3.0_1727074141408.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("twitterfin_padding90model", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("twitterfin_padding90model", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitterfin_padding90model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/twitterfin_padding90model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-twitterfin_padding90model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-twitterfin_padding90model_pipeline_en.md new file mode 100644 index 00000000000000..1e6e42ace673b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-twitterfin_padding90model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitterfin_padding90model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: twitterfin_padding90model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitterfin_padding90model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitterfin_padding90model_pipeline_en_5.5.0_3.0_1727074153808.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitterfin_padding90model_pipeline_en_5.5.0_3.0_1727074153808.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("twitterfin_padding90model_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("twitterfin_padding90model_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitterfin_padding90model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/twitterfin_padding90model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-user_476da26872df492f830a65925d422651_model_pipeline_ja.md b/docs/_posts/ahmedlone127/2024-09-23-user_476da26872df492f830a65925d422651_model_pipeline_ja.md new file mode 100644 index 00000000000000..d05217b9b94359 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-user_476da26872df492f830a65925d422651_model_pipeline_ja.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Japanese user_476da26872df492f830a65925d422651_model_pipeline pipeline WhisperForCTC from hoangvanvietanh +author: John Snow Labs +name: user_476da26872df492f830a65925d422651_model_pipeline +date: 2024-09-23 +tags: [ja, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ja +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`user_476da26872df492f830a65925d422651_model_pipeline` is a Japanese model originally trained by hoangvanvietanh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/user_476da26872df492f830a65925d422651_model_pipeline_ja_5.5.0_3.0_1727076537830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/user_476da26872df492f830a65925d422651_model_pipeline_ja_5.5.0_3.0_1727076537830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# df must already hold the raw audio samples in the column read by the
+# pipeline's AudioAssembler stage (typically "audio_content").
+pipeline = PretrainedPipeline("user_476da26872df492f830a65925d422651_model_pipeline", lang = "ja")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df must already hold the raw audio samples in the column read by the
+// pipeline's AudioAssembler stage (typically "audio_content").
+val pipeline = new PretrainedPipeline("user_476da26872df492f830a65925d422651_model_pipeline", lang = "ja")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|user_476da26872df492f830a65925d422651_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ja| +|Size:|1.7 GB| + +## References + +https://huggingface.co/hoangvanvietanh/user_476da26872df492f830a65925d422651_model + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-uztext_568mb_roberta_bpe_en.md b/docs/_posts/ahmedlone127/2024-09-23-uztext_568mb_roberta_bpe_en.md new file mode 100644 index 00000000000000..2a8371eaa64dea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-uztext_568mb_roberta_bpe_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English uztext_568mb_roberta_bpe RoBertaEmbeddings from rifkat +author: John Snow Labs +name: uztext_568mb_roberta_bpe +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`uztext_568mb_roberta_bpe` is a English model originally trained by rifkat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/uztext_568mb_roberta_bpe_en_5.5.0_3.0_1727121549174.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/uztext_568mb_roberta_bpe_en_5.5.0_3.0_1727121549174.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("uztext_568mb_roberta_bpe","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("uztext_568mb_roberta_bpe","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|uztext_568mb_roberta_bpe| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.9 MB| + +## References + +https://huggingface.co/rifkat/uztext_568Mb_Roberta_BPE \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-viz_wiz_bert_base_uncased_f16_en.md b/docs/_posts/ahmedlone127/2024-09-23-viz_wiz_bert_base_uncased_f16_en.md new file mode 100644 index 00000000000000..30d62a5f02ab79 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-viz_wiz_bert_base_uncased_f16_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English viz_wiz_bert_base_uncased_f16 BertEmbeddings from eisenjulian +author: John Snow Labs +name: viz_wiz_bert_base_uncased_f16 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`viz_wiz_bert_base_uncased_f16` is a English model originally trained by eisenjulian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/viz_wiz_bert_base_uncased_f16_en_5.5.0_3.0_1727107587086.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/viz_wiz_bert_base_uncased_f16_en_5.5.0_3.0_1727107587086.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("viz_wiz_bert_base_uncased_f16","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("viz_wiz_bert_base_uncased_f16","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
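+
+If plain vector columns are easier to work with downstream than annotation structs, an `EmbeddingsFinisher` stage can be appended after the embeddings annotator. A minimal sketch, assuming the `pipelineDF` and column names used above:
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+# Converts the "embeddings" annotations into a plain column of vectors.
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+finished = finisher.transform(pipelineDF)
+finished.select("finished_embeddings").show(1, truncate=80)
+```
+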
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|viz_wiz_bert_base_uncased_f16| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/eisenjulian/viz-wiz-bert-base-uncased_f16 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-viz_wiz_bert_base_uncased_f16_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-viz_wiz_bert_base_uncased_f16_pipeline_en.md new file mode 100644 index 00000000000000..ed371f2a543486 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-viz_wiz_bert_base_uncased_f16_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English viz_wiz_bert_base_uncased_f16_pipeline pipeline BertEmbeddings from eisenjulian +author: John Snow Labs +name: viz_wiz_bert_base_uncased_f16_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`viz_wiz_bert_base_uncased_f16_pipeline` is a English model originally trained by eisenjulian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/viz_wiz_bert_base_uncased_f16_pipeline_en_5.5.0_3.0_1727107606457.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/viz_wiz_bert_base_uncased_f16_pipeline_en_5.5.0_3.0_1727107606457.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("viz_wiz_bert_base_uncased_f16_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("viz_wiz_bert_base_uncased_f16_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|viz_wiz_bert_base_uncased_f16_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/eisenjulian/viz-wiz-bert-base-uncased_f16 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whipser_small_r2_en.md b/docs/_posts/ahmedlone127/2024-09-23-whipser_small_r2_en.md new file mode 100644 index 00000000000000..8a3b998e8f77c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whipser_small_r2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whipser_small_r2 WhisperForCTC from spsither +author: John Snow Labs +name: whipser_small_r2 +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whipser_small_r2` is a English model originally trained by spsither. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whipser_small_r2_en_5.5.0_3.0_1727053944271.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whipser_small_r2_en_5.5.0_3.0_1727053944271.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# data: a DataFrame with the raw audio samples (array of floats) in a column named "audio_content"
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whipser_small_r2", "en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+// data: a DataFrame with the raw audio samples (array of floats) in a column named "audio_content"
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whipser_small_r2", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
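+
+The `data` DataFrame referenced above is expected to carry the raw audio samples in a column named `audio_content`. One way to build it (the `librosa` dependency and the file path are assumptions, not part of this model card) could look like:
+
+```python
+import librosa
+
+# Whisper models expect 16 kHz mono audio; librosa resamples on load.
+audio_path = "path/to/recording.wav"  # hypothetical file
+samples, _ = librosa.load(audio_path, sr=16000)
+
+data = spark.createDataFrame([[samples.tolist()]]).toDF("audio_content")
+```
+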
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whipser_small_r2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/spsither/whipser-small-r2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whipser_small_r2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whipser_small_r2_pipeline_en.md new file mode 100644 index 00000000000000..ffda878398ec3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whipser_small_r2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whipser_small_r2_pipeline pipeline WhisperForCTC from spsither +author: John Snow Labs +name: whipser_small_r2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whipser_small_r2_pipeline` is a English model originally trained by spsither. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whipser_small_r2_pipeline_en_5.5.0_3.0_1727054035387.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whipser_small_r2_pipeline_en_5.5.0_3.0_1727054035387.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+pipeline = PretrainedPipeline("whipser_small_r2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+val pipeline = new PretrainedPipeline("whipser_small_r2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
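+
+Because `df` is likewise undefined in the snippet above, a hedged usage sketch follows: it assumes the 16 kHz samples are already available as a Python list (`raw_floats` is a placeholder name) and that the pipeline's Whisper stage writes its output to a `text` column, as in the corresponding model card.
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Wrap the raw samples in the "audio_content" column the pipeline consumes.
+df = spark.createDataFrame([[raw_floats]], ["audio_content"])
+
+pipeline = PretrainedPipeline("whipser_small_r2_pipeline", lang = "en")
+result = pipeline.transform(df)
+
+# Pull the recognized strings out of the annotation structs.
+result.select("text.result").show(truncate = False)
+```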
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whipser_small_r2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/spsither/whipser-small-r2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_6e_4_clean_legion_v2_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_6e_4_clean_legion_v2_en.md new file mode 100644 index 00000000000000..8024a843184316 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_6e_4_clean_legion_v2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_6e_4_clean_legion_v2 WhisperForCTC from yusufagung29 +author: John Snow Labs +name: whisper_6e_4_clean_legion_v2 +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_6e_4_clean_legion_v2` is a English model originally trained by yusufagung29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_6e_4_clean_legion_v2_en_5.5.0_3.0_1727076301568.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_6e_4_clean_legion_v2_en_5.5.0_3.0_1727076301568.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_6e_4_clean_legion_v2","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_6e_4_clean_legion_v2", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_6e_4_clean_legion_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/yusufagung29/whisper_6e-4_clean_legion_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_6e_4_clean_legion_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_6e_4_clean_legion_v2_pipeline_en.md new file mode 100644 index 00000000000000..81529e9603363a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_6e_4_clean_legion_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_6e_4_clean_legion_v2_pipeline pipeline WhisperForCTC from yusufagung29 +author: John Snow Labs +name: whisper_6e_4_clean_legion_v2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_6e_4_clean_legion_v2_pipeline` is a English model originally trained by yusufagung29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_6e_4_clean_legion_v2_pipeline_en_5.5.0_3.0_1727076325115.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_6e_4_clean_legion_v2_pipeline_en_5.5.0_3.0_1727076325115.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+pipeline = PretrainedPipeline("whisper_6e_4_clean_legion_v2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+val pipeline = new PretrainedPipeline("whisper_6e_4_clean_legion_v2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_6e_4_clean_legion_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/yusufagung29/whisper_6e-4_clean_legion_v2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_ai_nomi_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_ai_nomi_en.md new file mode 100644 index 00000000000000..178e8b06b0d450 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_ai_nomi_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_ai_nomi WhisperForCTC from susmitabhatt +author: John Snow Labs +name: whisper_ai_nomi +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_ai_nomi` is a English model originally trained by susmitabhatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_ai_nomi_en_5.5.0_3.0_1727117464662.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_ai_nomi_en_5.5.0_3.0_1727117464662.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_ai_nomi","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_ai_nomi", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_ai_nomi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/susmitabhatt/whisper-ai-nomi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_ai_nomi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_ai_nomi_pipeline_en.md new file mode 100644 index 00000000000000..bc3ba68c52fe72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_ai_nomi_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_ai_nomi_pipeline pipeline WhisperForCTC from susmitabhatt +author: John Snow Labs +name: whisper_ai_nomi_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_ai_nomi_pipeline` is a English model originally trained by susmitabhatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_ai_nomi_pipeline_en_5.5.0_3.0_1727117563862.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_ai_nomi_pipeline_en_5.5.0_3.0_1727117563862.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+pipeline = PretrainedPipeline("whisper_ai_nomi_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+val pipeline = new PretrainedPipeline("whisper_ai_nomi_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_ai_nomi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/susmitabhatt/whisper-ai-nomi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_base_english_tonga_tonga_islands_myst55h_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_english_tonga_tonga_islands_myst55h_en.md new file mode 100644 index 00000000000000..da4ebdfa642f11 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_english_tonga_tonga_islands_myst55h_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_english_tonga_tonga_islands_myst55h WhisperForCTC from rishabhjain16 +author: John Snow Labs +name: whisper_base_english_tonga_tonga_islands_myst55h +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_english_tonga_tonga_islands_myst55h` is a English model originally trained by rishabhjain16. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_english_tonga_tonga_islands_myst55h_en_5.5.0_3.0_1727051397134.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_english_tonga_tonga_islands_myst55h_en_5.5.0_3.0_1727051397134.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_english_tonga_tonga_islands_myst55h","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_english_tonga_tonga_islands_myst55h", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_english_tonga_tonga_islands_myst55h| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|646.7 MB| + +## References + +https://huggingface.co/rishabhjain16/whisper_base_en_to_myst55h \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_base_english_tonga_tonga_islands_myst55h_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_english_tonga_tonga_islands_myst55h_pipeline_en.md new file mode 100644 index 00000000000000..13d865f7b7feb7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_english_tonga_tonga_islands_myst55h_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_english_tonga_tonga_islands_myst55h_pipeline pipeline WhisperForCTC from rishabhjain16 +author: John Snow Labs +name: whisper_base_english_tonga_tonga_islands_myst55h_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_english_tonga_tonga_islands_myst55h_pipeline` is a English model originally trained by rishabhjain16. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_english_tonga_tonga_islands_myst55h_pipeline_en_5.5.0_3.0_1727051433286.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_english_tonga_tonga_islands_myst55h_pipeline_en_5.5.0_3.0_1727051433286.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+pipeline = PretrainedPipeline("whisper_base_english_tonga_tonga_islands_myst55h_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+val pipeline = new PretrainedPipeline("whisper_base_english_tonga_tonga_islands_myst55h_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_english_tonga_tonga_islands_myst55h_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|646.7 MB| + +## References + +https://huggingface.co/rishabhjain16/whisper_base_en_to_myst55h + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_base_pashto_ihanif_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_pashto_ihanif_en.md new file mode 100644 index 00000000000000..df4d122b5cacd8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_pashto_ihanif_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_pashto_ihanif WhisperForCTC from ihanif +author: John Snow Labs +name: whisper_base_pashto_ihanif +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_pashto_ihanif` is a English model originally trained by ihanif. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_pashto_ihanif_en_5.5.0_3.0_1727050761013.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_pashto_ihanif_en_5.5.0_3.0_1727050761013.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_pashto_ihanif","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_pashto_ihanif", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_pashto_ihanif| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|643.8 MB| + +## References + +https://huggingface.co/ihanif/whisper-base-pashto \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_base_pashto_ihanif_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_pashto_ihanif_pipeline_en.md new file mode 100644 index 00000000000000..3d445b31501cf8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_pashto_ihanif_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_pashto_ihanif_pipeline pipeline WhisperForCTC from ihanif +author: John Snow Labs +name: whisper_base_pashto_ihanif_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_pashto_ihanif_pipeline` is a English model originally trained by ihanif. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_pashto_ihanif_pipeline_en_5.5.0_3.0_1727050794305.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_pashto_ihanif_pipeline_en_5.5.0_3.0_1727050794305.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+pipeline = PretrainedPipeline("whisper_base_pashto_ihanif_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+val pipeline = new PretrainedPipeline("whisper_base_pashto_ihanif_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_pashto_ihanif_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|643.9 MB| + +## References + +https://huggingface.co/ihanif/whisper-base-pashto + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_base_tamil_parambharat_pipeline_ta.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_tamil_parambharat_pipeline_ta.md new file mode 100644 index 00000000000000..8f534695b74d93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_tamil_parambharat_pipeline_ta.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Tamil whisper_base_tamil_parambharat_pipeline pipeline WhisperForCTC from parambharat +author: John Snow Labs +name: whisper_base_tamil_parambharat_pipeline +date: 2024-09-23 +tags: [ta, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ta +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_tamil_parambharat_pipeline` is a Tamil model originally trained by parambharat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_tamil_parambharat_pipeline_ta_5.5.0_3.0_1727052624577.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_tamil_parambharat_pipeline_ta_5.5.0_3.0_1727052624577.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+pipeline = PretrainedPipeline("whisper_base_tamil_parambharat_pipeline", lang = "ta")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+val pipeline = new PretrainedPipeline("whisper_base_tamil_parambharat_pipeline", lang = "ta")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_tamil_parambharat_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ta| +|Size:|643.5 MB| + +## References + +https://huggingface.co/parambharat/whisper-base-ta + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_base_tamil_parambharat_ta.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_tamil_parambharat_ta.md new file mode 100644 index 00000000000000..db4d66840c346a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_tamil_parambharat_ta.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Tamil whisper_base_tamil_parambharat WhisperForCTC from parambharat +author: John Snow Labs +name: whisper_base_tamil_parambharat +date: 2024-09-23 +tags: [ta, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ta +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_tamil_parambharat` is a Tamil model originally trained by parambharat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_tamil_parambharat_ta_5.5.0_3.0_1727052588661.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_tamil_parambharat_ta_5.5.0_3.0_1727052588661.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_tamil_parambharat","ta") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_tamil_parambharat", "ta")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_tamil_parambharat| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ta| +|Size:|643.5 MB| + +## References + +https://huggingface.co/parambharat/whisper-base-ta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_base_thai_der_1_pipeline_th.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_thai_der_1_pipeline_th.md new file mode 100644 index 00000000000000..c7fa898311ade9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_thai_der_1_pipeline_th.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Thai whisper_base_thai_der_1_pipeline pipeline WhisperForCTC from arun100 +author: John Snow Labs +name: whisper_base_thai_der_1_pipeline +date: 2024-09-23 +tags: [th, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: th +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_thai_der_1_pipeline` is a Thai model originally trained by arun100. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_thai_der_1_pipeline_th_5.5.0_3.0_1727077846599.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_thai_der_1_pipeline_th_5.5.0_3.0_1727077846599.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+pipeline = PretrainedPipeline("whisper_base_thai_der_1_pipeline", lang = "th")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+val pipeline = new PretrainedPipeline("whisper_base_thai_der_1_pipeline", lang = "th")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_thai_der_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|th| +|Size:|642.6 MB| + +## References + +https://huggingface.co/arun100/whisper-base-thai-der-1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_base_thai_der_1_th.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_thai_der_1_th.md new file mode 100644 index 00000000000000..296980ad059a55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_thai_der_1_th.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Thai whisper_base_thai_der_1 WhisperForCTC from arun100 +author: John Snow Labs +name: whisper_base_thai_der_1 +date: 2024-09-23 +tags: [th, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: th +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_thai_der_1` is a Thai model originally trained by arun100. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_thai_der_1_th_5.5.0_3.0_1727077814185.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_thai_der_1_th_5.5.0_3.0_1727077814185.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_thai_der_1","th") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_thai_der_1", "th")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_thai_der_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|th| +|Size:|642.6 MB| + +## References + +https://huggingface.co/arun100/whisper-base-thai-der-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_base_v3_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_v3_en.md new file mode 100644 index 00000000000000..e8bbd9c471229b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_v3_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_v3 WhisperForCTC from raiyan007 +author: John Snow Labs +name: whisper_base_v3 +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_v3` is a English model originally trained by raiyan007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_v3_en_5.5.0_3.0_1727117972210.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_v3_en_5.5.0_3.0_1727117972210.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_v3","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_v3", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_v3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|642.8 MB| + +## References + +https://huggingface.co/raiyan007/whisper-base-v3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_base_v3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_v3_pipeline_en.md new file mode 100644 index 00000000000000..ef3dc8544a050e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_v3_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_v3_pipeline pipeline WhisperForCTC from raiyan007 +author: John Snow Labs +name: whisper_base_v3_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_v3_pipeline` is a English model originally trained by raiyan007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_v3_pipeline_en_5.5.0_3.0_1727118005617.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_v3_pipeline_en_5.5.0_3.0_1727118005617.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+pipeline = PretrainedPipeline("whisper_base_v3_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+val pipeline = new PretrainedPipeline("whisper_base_v3_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_v3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|642.8 MB| + +## References + +https://huggingface.co/raiyan007/whisper-base-v3 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_arabic_original_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_arabic_original_en.md new file mode 100644 index 00000000000000..dfdeecc7608b35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_arabic_original_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_medium_arabic_original WhisperForCTC from aghannam +author: John Snow Labs +name: whisper_medium_arabic_original +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_arabic_original` is a English model originally trained by aghannam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_arabic_original_en_5.5.0_3.0_1727117875987.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_arabic_original_en_5.5.0_3.0_1727117875987.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_medium_arabic_original","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_medium_arabic_original", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_arabic_original| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/aghannam/whisper-medium-ar-original \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_cantonese_cm_voice_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_cantonese_cm_voice_en.md new file mode 100644 index 00000000000000..7b6630754d8538 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_cantonese_cm_voice_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_medium_cantonese_cm_voice WhisperForCTC from jed351 +author: John Snow Labs +name: whisper_medium_cantonese_cm_voice +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_cantonese_cm_voice` is a English model originally trained by jed351. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_cantonese_cm_voice_en_5.5.0_3.0_1727078780314.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_cantonese_cm_voice_en_5.5.0_3.0_1727078780314.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_medium_cantonese_cm_voice","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_medium_cantonese_cm_voice", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_cantonese_cm_voice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/jed351/whisper_medium_cantonese_cm_voice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_malay_augmented_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_malay_augmented_en.md new file mode 100644 index 00000000000000..a656ba665fe8b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_malay_augmented_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_medium_malay_augmented WhisperForCTC from Scrya +author: John Snow Labs +name: whisper_medium_malay_augmented +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_malay_augmented` is a English model originally trained by Scrya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_malay_augmented_en_5.5.0_3.0_1727054306177.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_malay_augmented_en_5.5.0_3.0_1727054306177.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_medium_malay_augmented","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_medium_malay_augmented", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_malay_augmented| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/Scrya/whisper-medium-ms-augmented \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_portuguese_cv16_fleurs2_lr_wu_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_portuguese_cv16_fleurs2_lr_wu_en.md new file mode 100644 index 00000000000000..2c0540aada703f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_portuguese_cv16_fleurs2_lr_wu_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_medium_portuguese_cv16_fleurs2_lr_wu WhisperForCTC from fsicoli +author: John Snow Labs +name: whisper_medium_portuguese_cv16_fleurs2_lr_wu +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_portuguese_cv16_fleurs2_lr_wu` is a English model originally trained by fsicoli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_portuguese_cv16_fleurs2_lr_wu_en_5.5.0_3.0_1727079959850.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_portuguese_cv16_fleurs2_lr_wu_en_5.5.0_3.0_1727079959850.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_medium_portuguese_cv16_fleurs2_lr_wu","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_medium_portuguese_cv16_fleurs2_lr_wu", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_portuguese_cv16_fleurs2_lr_wu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/fsicoli/whisper-medium-pt-cv16-fleurs2-lr-wu \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_portuguese_cv16_fleurs2_lr_wu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_portuguese_cv16_fleurs2_lr_wu_pipeline_en.md new file mode 100644 index 00000000000000..9598d993d0fa50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_portuguese_cv16_fleurs2_lr_wu_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_medium_portuguese_cv16_fleurs2_lr_wu_pipeline pipeline WhisperForCTC from fsicoli +author: John Snow Labs +name: whisper_medium_portuguese_cv16_fleurs2_lr_wu_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_portuguese_cv16_fleurs2_lr_wu_pipeline` is a English model originally trained by fsicoli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_portuguese_cv16_fleurs2_lr_wu_pipeline_en_5.5.0_3.0_1727080163618.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_portuguese_cv16_fleurs2_lr_wu_pipeline_en_5.5.0_3.0_1727080163618.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+pipeline = PretrainedPipeline("whisper_medium_portuguese_cv16_fleurs2_lr_wu_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+val pipeline = new PretrainedPipeline("whisper_medium_portuguese_cv16_fleurs2_lr_wu_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_portuguese_cv16_fleurs2_lr_wu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/fsicoli/whisper-medium-pt-cv16-fleurs2-lr-wu + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_sango_50_2_30_part2_30_2_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_sango_50_2_30_part2_30_2_en.md new file mode 100644 index 00000000000000..bc3313cfd2ac48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_sango_50_2_30_part2_30_2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_medium_sango_50_2_30_part2_30_2 WhisperForCTC from eighty88 +author: John Snow Labs +name: whisper_medium_sango_50_2_30_part2_30_2 +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_sango_50_2_30_part2_30_2` is a English model originally trained by eighty88. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_sango_50_2_30_part2_30_2_en_5.5.0_3.0_1727119418541.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_sango_50_2_30_part2_30_2_en_5.5.0_3.0_1727119418541.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_medium_sango_50_2_30_part2_30_2","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_medium_sango_50_2_30_part2_30_2", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_sango_50_2_30_part2_30_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/eighty88/whisper-medium-sg-50-2-30-part2-30-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_ar2_ar.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_ar2_ar.md new file mode 100644 index 00000000000000..ba1fc3f4023bd6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_ar2_ar.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Arabic whisper_small_ar2 WhisperForCTC from whitefox123 +author: John Snow Labs +name: whisper_small_ar2 +date: 2024-09-23 +tags: [ar, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_ar2` is a Arabic model originally trained by whitefox123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_ar2_ar_5.5.0_3.0_1727119299746.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_ar2_ar_5.5.0_3.0_1727119299746.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_ar2","ar") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_ar2", "ar")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_ar2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/whitefox123/whisper-small-ar2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_kecil_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_kecil_en.md new file mode 100644 index 00000000000000..2704a0664fe06e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_kecil_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_arabic_kecil WhisperForCTC from mujadid-syahbana +author: John Snow Labs +name: whisper_small_arabic_kecil +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_kecil` is a English model originally trained by mujadid-syahbana. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_kecil_en_5.5.0_3.0_1727076598671.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_kecil_en_5.5.0_3.0_1727076598671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_arabic_kecil","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+// "data": a DataFrame with an "audio_content" column of raw 16 kHz audio floats
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_arabic_kecil", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_kecil| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.6 MB| + +## References + +https://huggingface.co/mujadid-syahbana/whisper-small-ar-kecil \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_kecil_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_kecil_pipeline_en.md new file mode 100644 index 00000000000000..6769a47d69fc89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_kecil_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_arabic_kecil_pipeline pipeline WhisperForCTC from mujadid-syahbana +author: John Snow Labs +name: whisper_small_arabic_kecil_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_kecil_pipeline` is a English model originally trained by mujadid-syahbana. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_kecil_pipeline_en_5.5.0_3.0_1727076618022.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_kecil_pipeline_en_5.5.0_3.0_1727076618022.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_arabic_kecil_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_arabic_kecil_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
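
The snippet above assumes an existing DataFrame `df` with the audio samples in a float-array column named `audio_content`. For a quick end-to-end check, the sketch below runs the pretrained pipeline on a single audio file and prints the transcription; the file name, the use of `librosa`, and the `text` output column (taken from the underlying model card) are assumptions rather than guarantees.

```python
# Minimal end-to-end sketch; file name, librosa usage, and the "text" output column are assumptions.
import sparknlp
from sparknlp.pretrained import PretrainedPipeline
import librosa

spark = sparknlp.start()

# Load one audio file and resample it to the 16 kHz mono input Whisper expects.
waveform, sampling_rate = librosa.load("sample.wav", sr=16000)
df = spark.createDataFrame([[waveform.tolist()]]).toDF("audio_content")

pipeline = PretrainedPipeline("whisper_small_arabic_kecil_pipeline", lang = "en")
result = pipeline.transform(df)

# The transcription is expected in the "text" annotation column produced by WhisperForCTC.
result.select("text.result").show(truncate = False)
```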
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_kecil_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.6 MB| + +## References + +https://huggingface.co/mujadid-syahbana/whisper-small-ar-kecil + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_v2_ar.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_v2_ar.md new file mode 100644 index 00000000000000..64af8f0b360f2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_v2_ar.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Arabic whisper_small_arabic_v2 WhisperForCTC from ymoslem +author: John Snow Labs +name: whisper_small_arabic_v2 +date: 2024-09-23 +tags: [ar, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_v2` is a Arabic model originally trained by ymoslem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_v2_ar_5.5.0_3.0_1727051664606.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_v2_ar_5.5.0_3.0_1727051664606.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_arabic_v2","ar") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
// `data` is assumed to be a DataFrame with the raw audio as a float array in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_arabic_v2", "ar")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ymoslem/whisper-small-ar-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_v2_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_v2_pipeline_ar.md new file mode 100644 index 00000000000000..02c3dbf5a36926 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_v2_pipeline_ar.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Arabic whisper_small_arabic_v2_pipeline pipeline WhisperForCTC from ymoslem +author: John Snow Labs +name: whisper_small_arabic_v2_pipeline +date: 2024-09-23 +tags: [ar, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_v2_pipeline` is a Arabic model originally trained by ymoslem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_v2_pipeline_ar_5.5.0_3.0_1727051748920.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_v2_pipeline_ar_5.5.0_3.0_1727051748920.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_arabic_v2_pipeline", lang = "ar")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_arabic_v2_pipeline", lang = "ar")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ymoslem/whisper-small-ar-v2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_bangla_bn.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_bangla_bn.md new file mode 100644 index 00000000000000..c66b7e8071fd69 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_bangla_bn.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Bengali whisper_small_bangla WhisperForCTC from ashrafulparan +author: John Snow Labs +name: whisper_small_bangla +date: 2024-09-23 +tags: [bn, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_bangla` is a Bengali model originally trained by ashrafulparan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_bangla_bn_5.5.0_3.0_1727051052841.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_bangla_bn_5.5.0_3.0_1727051052841.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_bangla","bn") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
// `data` is assumed to be a DataFrame with the raw audio as a float array in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_bangla", "bn")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_bangla| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|bn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ashrafulparan/whisper-small-bangla \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_bangla_pipeline_bn.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_bangla_pipeline_bn.md new file mode 100644 index 00000000000000..85787898720740 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_bangla_pipeline_bn.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Bengali whisper_small_bangla_pipeline pipeline WhisperForCTC from ashrafulparan +author: John Snow Labs +name: whisper_small_bangla_pipeline +date: 2024-09-23 +tags: [bn, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_bangla_pipeline` is a Bengali model originally trained by ashrafulparan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_bangla_pipeline_bn_5.5.0_3.0_1727051140208.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_bangla_pipeline_bn_5.5.0_3.0_1727051140208.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_bangla_pipeline", lang = "bn")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_bangla_pipeline", lang = "bn")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_bangla_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ashrafulparan/whisper-small-bangla + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_basque_xezpeleta_eu.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_basque_xezpeleta_eu.md new file mode 100644 index 00000000000000..d12d07f372432d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_basque_xezpeleta_eu.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Basque whisper_small_basque_xezpeleta WhisperForCTC from xezpeleta +author: John Snow Labs +name: whisper_small_basque_xezpeleta +date: 2024-09-23 +tags: [eu, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: eu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_basque_xezpeleta` is a Basque model originally trained by xezpeleta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_basque_xezpeleta_eu_5.5.0_3.0_1727118123821.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_basque_xezpeleta_eu_5.5.0_3.0_1727118123821.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_basque_xezpeleta","eu") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
// `data` is assumed to be a DataFrame with the raw audio as a float array in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_basque_xezpeleta", "eu")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_basque_xezpeleta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|eu| +|Size:|1.7 GB| + +## References + +https://huggingface.co/xezpeleta/whisper-small-eu \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_basque_xezpeleta_pipeline_eu.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_basque_xezpeleta_pipeline_eu.md new file mode 100644 index 00000000000000..c2757750e74b68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_basque_xezpeleta_pipeline_eu.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Basque whisper_small_basque_xezpeleta_pipeline pipeline WhisperForCTC from xezpeleta +author: John Snow Labs +name: whisper_small_basque_xezpeleta_pipeline +date: 2024-09-23 +tags: [eu, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: eu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_basque_xezpeleta_pipeline` is a Basque model originally trained by xezpeleta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_basque_xezpeleta_pipeline_eu_5.5.0_3.0_1727118205773.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_basque_xezpeleta_pipeline_eu_5.5.0_3.0_1727118205773.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_basque_xezpeleta_pipeline", lang = "eu")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_basque_xezpeleta_pipeline", lang = "eu")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_basque_xezpeleta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|eu| +|Size:|1.7 GB| + +## References + +https://huggingface.co/xezpeleta/whisper-small-eu + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_child50k_timestretch_steplr_ko.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_child50k_timestretch_steplr_ko.md new file mode 100644 index 00000000000000..d46d05438f0d99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_child50k_timestretch_steplr_ko.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Korean whisper_small_child50k_timestretch_steplr WhisperForCTC from haseong8012 +author: John Snow Labs +name: whisper_small_child50k_timestretch_steplr +date: 2024-09-23 +tags: [ko, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_child50k_timestretch_steplr` is a Korean model originally trained by haseong8012. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_child50k_timestretch_steplr_ko_5.5.0_3.0_1727052144778.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_child50k_timestretch_steplr_ko_5.5.0_3.0_1727052144778.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_child50k_timestretch_steplr","ko") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
// `data` is assumed to be a DataFrame with the raw audio as a float array in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_child50k_timestretch_steplr", "ko")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_child50k_timestretch_steplr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ko| +|Size:|1.7 GB| + +## References + +https://huggingface.co/haseong8012/whisper-small_child50K_timestretch_stepLR \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_child50k_timestretch_steplr_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_child50k_timestretch_steplr_pipeline_ko.md new file mode 100644 index 00000000000000..be11e88b09da41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_child50k_timestretch_steplr_pipeline_ko.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Korean whisper_small_child50k_timestretch_steplr_pipeline pipeline WhisperForCTC from haseong8012 +author: John Snow Labs +name: whisper_small_child50k_timestretch_steplr_pipeline +date: 2024-09-23 +tags: [ko, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_child50k_timestretch_steplr_pipeline` is a Korean model originally trained by haseong8012. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_child50k_timestretch_steplr_pipeline_ko_5.5.0_3.0_1727052228583.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_child50k_timestretch_steplr_pipeline_ko_5.5.0_3.0_1727052228583.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_child50k_timestretch_steplr_pipeline", lang = "ko")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_child50k_timestretch_steplr_pipeline", lang = "ko")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_child50k_timestretch_steplr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|1.7 GB| + +## References + +https://huggingface.co/haseong8012/whisper-small_child50K_timestretch_stepLR + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chinese_cn_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chinese_cn_pipeline_zh.md new file mode 100644 index 00000000000000..5ee5d7aa8254c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chinese_cn_pipeline_zh.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Chinese whisper_small_chinese_cn_pipeline pipeline WhisperForCTC from JunSir +author: John Snow Labs +name: whisper_small_chinese_cn_pipeline +date: 2024-09-23 +tags: [zh, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_chinese_cn_pipeline` is a Chinese model originally trained by JunSir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_chinese_cn_pipeline_zh_5.5.0_3.0_1727053166922.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_chinese_cn_pipeline_zh_5.5.0_3.0_1727053166922.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_chinese_cn_pipeline", lang = "zh")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_chinese_cn_pipeline", lang = "zh")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_chinese_cn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|1.7 GB| + +## References + +https://huggingface.co/JunSir/whisper-small-zh-CN + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chinese_cn_zh.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chinese_cn_zh.md new file mode 100644 index 00000000000000..96c19fcf28fe38 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chinese_cn_zh.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Chinese whisper_small_chinese_cn WhisperForCTC from JunSir +author: John Snow Labs +name: whisper_small_chinese_cn +date: 2024-09-23 +tags: [zh, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_chinese_cn` is a Chinese model originally trained by JunSir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_chinese_cn_zh_5.5.0_3.0_1727053083927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_chinese_cn_zh_5.5.0_3.0_1727053083927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_chinese_cn","zh") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
// `data` is assumed to be a DataFrame with the raw audio as a float array in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_chinese_cn", "zh")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_chinese_cn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|zh| +|Size:|1.7 GB| + +## References + +https://huggingface.co/JunSir/whisper-small-zh-CN \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chuvash_43_freeze_encoder_hi.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chuvash_43_freeze_encoder_hi.md new file mode 100644 index 00000000000000..6c3cdd3fb48167 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chuvash_43_freeze_encoder_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_small_chuvash_43_freeze_encoder WhisperForCTC from alikanakar +author: John Snow Labs +name: whisper_small_chuvash_43_freeze_encoder +date: 2024-09-23 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_chuvash_43_freeze_encoder` is a Hindi model originally trained by alikanakar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_chuvash_43_freeze_encoder_hi_5.5.0_3.0_1727117372970.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_chuvash_43_freeze_encoder_hi_5.5.0_3.0_1727117372970.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_chuvash_43_freeze_encoder","hi") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
// `data` is assumed to be a DataFrame with the raw audio as a float array in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_chuvash_43_freeze_encoder", "hi")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_chuvash_43_freeze_encoder| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/alikanakar/whisper-small-CV-43-freeze-encoder \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chuvash_43_freeze_encoder_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chuvash_43_freeze_encoder_pipeline_hi.md new file mode 100644 index 00000000000000..62b57dd04e107d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chuvash_43_freeze_encoder_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_small_chuvash_43_freeze_encoder_pipeline pipeline WhisperForCTC from alikanakar +author: John Snow Labs +name: whisper_small_chuvash_43_freeze_encoder_pipeline +date: 2024-09-23 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_chuvash_43_freeze_encoder_pipeline` is a Hindi model originally trained by alikanakar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_chuvash_43_freeze_encoder_pipeline_hi_5.5.0_3.0_1727117456078.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_chuvash_43_freeze_encoder_pipeline_hi_5.5.0_3.0_1727117456078.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_chuvash_43_freeze_encoder_pipeline", lang = "hi")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_chuvash_43_freeze_encoder_pipeline", lang = "hi")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_chuvash_43_freeze_encoder_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/alikanakar/whisper-small-CV-43-freeze-encoder + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_danish_wasurats_da.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_danish_wasurats_da.md new file mode 100644 index 00000000000000..9ce5aa48c20a8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_danish_wasurats_da.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Danish whisper_small_danish_wasurats WhisperForCTC from WasuratS +author: John Snow Labs +name: whisper_small_danish_wasurats +date: 2024-09-23 +tags: [da, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: da +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_danish_wasurats` is a Danish model originally trained by WasuratS. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_danish_wasurats_da_5.5.0_3.0_1727075824331.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_danish_wasurats_da_5.5.0_3.0_1727075824331.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_danish_wasurats","da") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
// `data` is assumed to be a DataFrame with the raw audio as a float array in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_danish_wasurats", "da")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_danish_wasurats| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|da| +|Size:|1.7 GB| + +## References + +https://huggingface.co/WasuratS/whisper-small-da \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_danish_wasurats_pipeline_da.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_danish_wasurats_pipeline_da.md new file mode 100644 index 00000000000000..a653f620c628ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_danish_wasurats_pipeline_da.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Danish whisper_small_danish_wasurats_pipeline pipeline WhisperForCTC from WasuratS +author: John Snow Labs +name: whisper_small_danish_wasurats_pipeline +date: 2024-09-23 +tags: [da, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: da +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_danish_wasurats_pipeline` is a Danish model originally trained by WasuratS. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_danish_wasurats_pipeline_da_5.5.0_3.0_1727075909323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_danish_wasurats_pipeline_da_5.5.0_3.0_1727075909323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_danish_wasurats_pipeline", lang = "da")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_danish_wasurats_pipeline", lang = "da")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_danish_wasurats_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|da| +|Size:|1.7 GB| + +## References + +https://huggingface.co/WasuratS/whisper-small-da + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_divehi_shahukareem_dv.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_divehi_shahukareem_dv.md new file mode 100644 index 00000000000000..af3b887395b63a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_divehi_shahukareem_dv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_shahukareem WhisperForCTC from shahukareem +author: John Snow Labs +name: whisper_small_divehi_shahukareem +date: 2024-09-23 +tags: [dv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_shahukareem` is a Dhivehi, Divehi, Maldivian model originally trained by shahukareem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_shahukareem_dv_5.5.0_3.0_1727117023387.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_shahukareem_dv_5.5.0_3.0_1727117023387.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_divehi_shahukareem","dv") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
// `data` is assumed to be a DataFrame with the raw audio as a float array in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_divehi_shahukareem", "dv")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_shahukareem| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/shahukareem/whisper-small-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_divehi_shahukareem_pipeline_dv.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_divehi_shahukareem_pipeline_dv.md new file mode 100644 index 00000000000000..b78371767deb43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_divehi_shahukareem_pipeline_dv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_shahukareem_pipeline pipeline WhisperForCTC from shahukareem +author: John Snow Labs +name: whisper_small_divehi_shahukareem_pipeline +date: 2024-09-23 +tags: [dv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_shahukareem_pipeline` is a Dhivehi, Divehi, Maldivian model originally trained by shahukareem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_shahukareem_pipeline_dv_5.5.0_3.0_1727117112946.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_shahukareem_pipeline_dv_5.5.0_3.0_1727117112946.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_divehi_shahukareem_pipeline", lang = "dv")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_divehi_shahukareem_pipeline", lang = "dv")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_shahukareem_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/shahukareem/whisper-small-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_dutch_vl_nl.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_dutch_vl_nl.md new file mode 100644 index 00000000000000..4274befb2deeb1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_dutch_vl_nl.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dutch, Flemish whisper_small_dutch_vl WhisperForCTC from fibleep +author: John Snow Labs +name: whisper_small_dutch_vl +date: 2024-09-23 +tags: [nl, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_dutch_vl` is a Dutch, Flemish model originally trained by fibleep. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_dutch_vl_nl_5.5.0_3.0_1727116375931.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_dutch_vl_nl_5.5.0_3.0_1727116375931.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_dutch_vl","nl") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
// `data` is assumed to be a DataFrame with the raw audio as a float array in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_dutch_vl", "nl")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_dutch_vl| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|nl| +|Size:|1.7 GB| + +## References + +https://huggingface.co/fibleep/whisper-small-nl-vl \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_dutch_vl_pipeline_nl.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_dutch_vl_pipeline_nl.md new file mode 100644 index 00000000000000..9d1672966168e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_dutch_vl_pipeline_nl.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Dutch, Flemish whisper_small_dutch_vl_pipeline pipeline WhisperForCTC from fibleep +author: John Snow Labs +name: whisper_small_dutch_vl_pipeline +date: 2024-09-23 +tags: [nl, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_dutch_vl_pipeline` is a Dutch, Flemish model originally trained by fibleep. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_dutch_vl_pipeline_nl_5.5.0_3.0_1727116471808.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_dutch_vl_pipeline_nl_5.5.0_3.0_1727116471808.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_dutch_vl_pipeline", lang = "nl")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_dutch_vl_pipeline", lang = "nl")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_dutch_vl_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|nl| +|Size:|1.7 GB| + +## References + +https://huggingface.co/fibleep/whisper-small-nl-vl + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_hindi_auro_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_hindi_auro_pipeline_hi.md new file mode 100644 index 00000000000000..910dfddff375f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_hindi_auro_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_small_hindi_auro_pipeline pipeline WhisperForCTC from auro +author: John Snow Labs +name: whisper_small_hindi_auro_pipeline +date: 2024-09-23 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_auro_pipeline` is a Hindi model originally trained by auro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_auro_pipeline_hi_5.5.0_3.0_1727078576756.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_auro_pipeline_hi_5.5.0_3.0_1727078576756.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_hindi_auro_pipeline", lang = "hi")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_hindi_auro_pipeline", lang = "hi")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_auro_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/auro/whisper-small-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_hindi_reach_vb_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_hindi_reach_vb_en.md new file mode 100644 index 00000000000000..47ff0168b394c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_hindi_reach_vb_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_hindi_reach_vb WhisperForCTC from reach-vb +author: John Snow Labs +name: whisper_small_hindi_reach_vb +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_reach_vb` is a English model originally trained by reach-vb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_reach_vb_en_5.5.0_3.0_1727116374943.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_reach_vb_en_5.5.0_3.0_1727116374943.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_hindi_reach_vb","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
// `data` is assumed to be a DataFrame with the raw audio as a float array in a column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_reach_vb", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_reach_vb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/reach-vb/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_hindi_reach_vb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_hindi_reach_vb_pipeline_en.md new file mode 100644 index 00000000000000..7000601151cc12 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_hindi_reach_vb_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_hindi_reach_vb_pipeline pipeline WhisperForCTC from reach-vb +author: John Snow Labs +name: whisper_small_hindi_reach_vb_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_reach_vb_pipeline` is a English model originally trained by reach-vb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_reach_vb_pipeline_en_5.5.0_3.0_1727116465843.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_reach_vb_pipeline_en_5.5.0_3.0_1727116465843.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
pipeline = PretrainedPipeline("whisper_small_hindi_reach_vb_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a Spark DataFrame with the raw audio as a float array in a column named "audio_content"
val pipeline = new PretrainedPipeline("whisper_small_hindi_reach_vb_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_reach_vb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/reach-vb/whisper-small-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_kdn_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_kdn_en.md new file mode 100644 index 00000000000000..f48e7e7f2cd81c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_kdn_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_kdn WhisperForCTC from susmitabhatt +author: John Snow Labs +name: whisper_small_kdn +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_kdn` is a English model originally trained by susmitabhatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_kdn_en_5.5.0_3.0_1727052386799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_kdn_en_5.5.0_3.0_1727052386799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_kdn","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_kdn", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
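The Python and Scala snippets above expect a `data` DataFrame that carries the raw audio samples as a float array in an `audio_content` column; building it is not shown in the card. One possible way to prepare it, assuming `librosa` is installed and `path/to/audio.wav` stands in for a real file:

```python
# Sketch only: building the `data` DataFrame the snippets above expect.
import sparknlp
import librosa
from pyspark.sql.types import ArrayType, FloatType, StructField, StructType

spark = sparknlp.start()

# Whisper checkpoints operate on 16 kHz audio, so resample on load.
samples, sr = librosa.load("path/to/audio.wav", sr=16000)

schema = StructType([StructField("audio_content", ArrayType(FloatType()))])
data = spark.createDataFrame([(samples.tolist(),)], schema=schema)
```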
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_kdn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/susmitabhatt/whisper-small-kdn \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_kdn_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_kdn_pipeline_en.md new file mode 100644 index 00000000000000..4726c6b3b1455d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_kdn_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_kdn_pipeline pipeline WhisperForCTC from susmitabhatt +author: John Snow Labs +name: whisper_small_kdn_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_kdn_pipeline` is a English model originally trained by susmitabhatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_kdn_pipeline_en_5.5.0_3.0_1727052472835.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_kdn_pipeline_en_5.5.0_3.0_1727052472835.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_kdn_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_kdn_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_kdn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/susmitabhatt/whisper-small-kdn + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_11_mn.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_11_mn.md new file mode 100644 index 00000000000000..b73a92e6225edc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_11_mn.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Mongolian whisper_small_mongolian_11 WhisperForCTC from bayartsogt +author: John Snow Labs +name: whisper_small_mongolian_11 +date: 2024-09-23 +tags: [mn, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_mongolian_11` is a Mongolian model originally trained by bayartsogt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_mongolian_11_mn_5.5.0_3.0_1727053943998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_mongolian_11_mn_5.5.0_3.0_1727053943998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_mongolian_11","mn") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_mongolian_11", "mn")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
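The `pipelineDF` produced above keeps the recognized text inside Spark NLP annotation structs in the `text` column (the output column set on the model). A short sketch of pulling out just the transcription strings:

```python
# Sketch: extract the plain transcription strings from the annotation column.
pipelineDF.selectExpr("explode(text.result) as transcription").show(truncate=False)
```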
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_mongolian_11| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|mn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/bayartsogt/whisper-small-mn-11 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_11_pipeline_mn.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_11_pipeline_mn.md new file mode 100644 index 00000000000000..be6d68fee37a77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_11_pipeline_mn.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Mongolian whisper_small_mongolian_11_pipeline pipeline WhisperForCTC from bayartsogt +author: John Snow Labs +name: whisper_small_mongolian_11_pipeline +date: 2024-09-23 +tags: [mn, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_mongolian_11_pipeline` is a Mongolian model originally trained by bayartsogt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_mongolian_11_pipeline_mn_5.5.0_3.0_1727054037542.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_mongolian_11_pipeline_mn_5.5.0_3.0_1727054037542.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_mongolian_11_pipeline", lang = "mn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_mongolian_11_pipeline", lang = "mn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_mongolian_11_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/bayartsogt/whisper-small-mn-11 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_cafet_mn.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_cafet_mn.md new file mode 100644 index 00000000000000..c55df9efe1286d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_cafet_mn.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Mongolian whisper_small_mongolian_cafet WhisperForCTC from Cafet +author: John Snow Labs +name: whisper_small_mongolian_cafet +date: 2024-09-23 +tags: [mn, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_mongolian_cafet` is a Mongolian model originally trained by Cafet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_mongolian_cafet_mn_5.5.0_3.0_1727118411098.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_mongolian_cafet_mn_5.5.0_3.0_1727118411098.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_mongolian_cafet","mn") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_mongolian_cafet", "mn")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_mongolian_cafet| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|mn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Cafet/whisper-small-mongolian \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_cafet_pipeline_mn.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_cafet_pipeline_mn.md new file mode 100644 index 00000000000000..3e7079cc0de108 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_cafet_pipeline_mn.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Mongolian whisper_small_mongolian_cafet_pipeline pipeline WhisperForCTC from Cafet +author: John Snow Labs +name: whisper_small_mongolian_cafet_pipeline +date: 2024-09-23 +tags: [mn, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_mongolian_cafet_pipeline` is a Mongolian model originally trained by Cafet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_mongolian_cafet_pipeline_mn_5.5.0_3.0_1727118496606.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_mongolian_cafet_pipeline_mn_5.5.0_3.0_1727118496606.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_mongolian_cafet_pipeline", lang = "mn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_mongolian_cafet_pipeline", lang = "mn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_mongolian_cafet_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Cafet/whisper-small-mongolian + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_nepal_bhasa_hindi_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_nepal_bhasa_hindi_en.md new file mode 100644 index 00000000000000..3ec9ee25158184 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_nepal_bhasa_hindi_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_nepal_bhasa_hindi WhisperForCTC from RamNaamSatyaHai +author: John Snow Labs +name: whisper_small_nepal_bhasa_hindi +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_nepal_bhasa_hindi` is a English model originally trained by RamNaamSatyaHai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_nepal_bhasa_hindi_en_5.5.0_3.0_1727075925615.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_nepal_bhasa_hindi_en_5.5.0_3.0_1727075925615.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_nepal_bhasa_hindi","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_nepal_bhasa_hindi", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_nepal_bhasa_hindi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/RamNaamSatyaHai/whisper-small_new-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_nepal_bhasa_hindi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_nepal_bhasa_hindi_pipeline_en.md new file mode 100644 index 00000000000000..3911b7dbd9650d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_nepal_bhasa_hindi_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_nepal_bhasa_hindi_pipeline pipeline WhisperForCTC from RamNaamSatyaHai +author: John Snow Labs +name: whisper_small_nepal_bhasa_hindi_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_nepal_bhasa_hindi_pipeline` is a English model originally trained by RamNaamSatyaHai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_nepal_bhasa_hindi_pipeline_en_5.5.0_3.0_1727076015404.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_nepal_bhasa_hindi_pipeline_en_5.5.0_3.0_1727076015404.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_nepal_bhasa_hindi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_nepal_bhasa_hindi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_nepal_bhasa_hindi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/RamNaamSatyaHai/whisper-small_new-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_np_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_np_en.md new file mode 100644 index 00000000000000..5436850458d08f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_np_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_np WhisperForCTC from bhimrazy +author: John Snow Labs +name: whisper_small_np +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_np` is a English model originally trained by bhimrazy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_np_en_5.5.0_3.0_1727116601657.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_np_en_5.5.0_3.0_1727116601657.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_np","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_np", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_np| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/bhimrazy/whisper-small-np \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_np_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_np_pipeline_en.md new file mode 100644 index 00000000000000..ab6b4664760db0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_np_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_np_pipeline pipeline WhisperForCTC from bhimrazy +author: John Snow Labs +name: whisper_small_np_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_np_pipeline` is a English model originally trained by bhimrazy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_np_pipeline_en_5.5.0_3.0_1727116687886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_np_pipeline_en_5.5.0_3.0_1727116687886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_np_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_np_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_np_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/bhimrazy/whisper-small-np + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_romanian_yehoward_pipeline_ro.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_romanian_yehoward_pipeline_ro.md new file mode 100644 index 00000000000000..6a07678ff76a14 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_romanian_yehoward_pipeline_ro.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Moldavian, Moldovan, Romanian whisper_small_romanian_yehoward_pipeline pipeline WhisperForCTC from Yehoward +author: John Snow Labs +name: whisper_small_romanian_yehoward_pipeline +date: 2024-09-23 +tags: [ro, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ro +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_romanian_yehoward_pipeline` is a Moldavian, Moldovan, Romanian model originally trained by Yehoward. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_romanian_yehoward_pipeline_ro_5.5.0_3.0_1727079098548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_romanian_yehoward_pipeline_ro_5.5.0_3.0_1727079098548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_romanian_yehoward_pipeline", lang = "ro") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_romanian_yehoward_pipeline", lang = "ro") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_romanian_yehoward_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ro| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Yehoward/whisper-small-ro + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_romanian_yehoward_ro.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_romanian_yehoward_ro.md new file mode 100644 index 00000000000000..d3b0a43585c36a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_romanian_yehoward_ro.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Moldavian, Moldovan, Romanian whisper_small_romanian_yehoward WhisperForCTC from Yehoward +author: John Snow Labs +name: whisper_small_romanian_yehoward +date: 2024-09-23 +tags: [ro, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ro +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_romanian_yehoward` is a Moldavian, Moldovan, Romanian model originally trained by Yehoward. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_romanian_yehoward_ro_5.5.0_3.0_1727079009473.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_romanian_yehoward_ro_5.5.0_3.0_1727079009473.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_romanian_yehoward","ro") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_romanian_yehoward", "ro")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
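Because `fit` here only assembles the two stages, the fitted pipeline can be saved once and reloaded for serving with standard Spark ML persistence. A sketch, with the output path as a placeholder:

```python
# Sketch only: persist and reload the fitted pipeline from the snippet above.
from pyspark.ml import PipelineModel

pipelineModel.write().overwrite().save("/tmp/whisper_small_romanian_yehoward_pm")
reloaded = PipelineModel.load("/tmp/whisper_small_romanian_yehoward_pm")
reloaded.transform(data).selectExpr("text.result").show(truncate=False)
```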
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_romanian_yehoward| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ro| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Yehoward/whisper-small-ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_russian_ord_4_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_russian_ord_4_en.md new file mode 100644 index 00000000000000..689d90d1365a99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_russian_ord_4_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_russian_ord_4 WhisperForCTC from mizoru +author: John Snow Labs +name: whisper_small_russian_ord_4 +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_russian_ord_4` is a English model originally trained by mizoru. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_russian_ord_4_en_5.5.0_3.0_1727051982000.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_russian_ord_4_en_5.5.0_3.0_1727051982000.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_russian_ord_4","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_russian_ord_4", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_russian_ord_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/mizoru/whisper-small-ru-ORD_4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_russian_ord_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_russian_ord_4_pipeline_en.md new file mode 100644 index 00000000000000..39129523929de8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_russian_ord_4_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_russian_ord_4_pipeline pipeline WhisperForCTC from mizoru +author: John Snow Labs +name: whisper_small_russian_ord_4_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_russian_ord_4_pipeline` is a English model originally trained by mizoru. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_russian_ord_4_pipeline_en_5.5.0_3.0_1727052078189.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_russian_ord_4_pipeline_en_5.5.0_3.0_1727052078189.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_russian_ord_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_russian_ord_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
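To confirm the downloaded pipeline contains the stages listed under "Included Models" below, its underlying `PipelineModel` can be inspected; this sketch assumes the Python `PretrainedPipeline` exposes the fitted model through its `model` attribute.

```python
# Sketch: list the annotator classes inside the downloaded pipeline.
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("whisper_small_russian_ord_4_pipeline", lang = "en")
print([type(stage).__name__ for stage in pipeline.model.stages])
```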
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_russian_ord_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/mizoru/whisper-small-ru-ORD_4 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_saudi_podcasts_asr_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_saudi_podcasts_asr_en.md new file mode 100644 index 00000000000000..6ead3048eaf9b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_saudi_podcasts_asr_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_saudi_podcasts_asr WhisperForCTC from HuggingPanda +author: John Snow Labs +name: whisper_small_saudi_podcasts_asr +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_saudi_podcasts_asr` is a English model originally trained by HuggingPanda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_saudi_podcasts_asr_en_5.5.0_3.0_1727079199652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_saudi_podcasts_asr_en_5.5.0_3.0_1727079199652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_saudi_podcasts_asr","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_saudi_podcasts_asr", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_saudi_podcasts_asr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/HuggingPanda/whisper-small-Saudi-Podcasts-ASR \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tamil_carlfeynman_pipeline_ta.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tamil_carlfeynman_pipeline_ta.md new file mode 100644 index 00000000000000..3268c96f5820ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tamil_carlfeynman_pipeline_ta.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Tamil whisper_small_tamil_carlfeynman_pipeline pipeline WhisperForCTC from carlfeynman +author: John Snow Labs +name: whisper_small_tamil_carlfeynman_pipeline +date: 2024-09-23 +tags: [ta, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ta +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_tamil_carlfeynman_pipeline` is a Tamil model originally trained by carlfeynman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_tamil_carlfeynman_pipeline_ta_5.5.0_3.0_1727077144726.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_tamil_carlfeynman_pipeline_ta_5.5.0_3.0_1727077144726.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_tamil_carlfeynman_pipeline", lang = "ta") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_tamil_carlfeynman_pipeline", lang = "ta") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_tamil_carlfeynman_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ta| +|Size:|1.7 GB| + +## References + +https://huggingface.co/carlfeynman/whisper-small-tamil + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tamil_carlfeynman_ta.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tamil_carlfeynman_ta.md new file mode 100644 index 00000000000000..976a862258eab0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tamil_carlfeynman_ta.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Tamil whisper_small_tamil_carlfeynman WhisperForCTC from carlfeynman +author: John Snow Labs +name: whisper_small_tamil_carlfeynman +date: 2024-09-23 +tags: [ta, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ta +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_tamil_carlfeynman` is a Tamil model originally trained by carlfeynman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_tamil_carlfeynman_ta_5.5.0_3.0_1727077063936.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_tamil_carlfeynman_ta_5.5.0_3.0_1727077063936.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_tamil_carlfeynman","ta") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_tamil_carlfeynman", "ta")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_tamil_carlfeynman| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ta| +|Size:|1.7 GB| + +## References + +https://huggingface.co/carlfeynman/whisper-small-tamil \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tonga_tonga_islands_chuvash_albanian_pipeline_sq.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tonga_tonga_islands_chuvash_albanian_pipeline_sq.md new file mode 100644 index 00000000000000..c44397e357b172 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tonga_tonga_islands_chuvash_albanian_pipeline_sq.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Albanian whisper_small_tonga_tonga_islands_chuvash_albanian_pipeline pipeline WhisperForCTC from rishabhjain16 +author: John Snow Labs +name: whisper_small_tonga_tonga_islands_chuvash_albanian_pipeline +date: 2024-09-23 +tags: [sq, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: sq +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_tonga_tonga_islands_chuvash_albanian_pipeline` is a Albanian model originally trained by rishabhjain16. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_tonga_tonga_islands_chuvash_albanian_pipeline_sq_5.5.0_3.0_1727117970737.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_tonga_tonga_islands_chuvash_albanian_pipeline_sq_5.5.0_3.0_1727117970737.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_tonga_tonga_islands_chuvash_albanian_pipeline", lang = "sq") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_tonga_tonga_islands_chuvash_albanian_pipeline", lang = "sq") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_tonga_tonga_islands_chuvash_albanian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sq| +|Size:|1.7 GB| + +## References + +https://huggingface.co/rishabhjain16/whisper-small_to_cv_albanian + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tonga_tonga_islands_chuvash_albanian_sq.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tonga_tonga_islands_chuvash_albanian_sq.md new file mode 100644 index 00000000000000..8bbeee9d48e8ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tonga_tonga_islands_chuvash_albanian_sq.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Albanian whisper_small_tonga_tonga_islands_chuvash_albanian WhisperForCTC from rishabhjain16 +author: John Snow Labs +name: whisper_small_tonga_tonga_islands_chuvash_albanian +date: 2024-09-23 +tags: [sq, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: sq +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_tonga_tonga_islands_chuvash_albanian` is a Albanian model originally trained by rishabhjain16. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_tonga_tonga_islands_chuvash_albanian_sq_5.5.0_3.0_1727117888841.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_tonga_tonga_islands_chuvash_albanian_sq_5.5.0_3.0_1727117888841.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_tonga_tonga_islands_chuvash_albanian","sq") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_tonga_tonga_islands_chuvash_albanian", "sq")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_tonga_tonga_islands_chuvash_albanian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|sq| +|Size:|1.7 GB| + +## References + +https://huggingface.co/rishabhjain16/whisper-small_to_cv_albanian \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_sudanese_dialect_tiny_ayman_kagglee_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_sudanese_dialect_tiny_ayman_kagglee_en.md new file mode 100644 index 00000000000000..2ac9c0a23c4009 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_sudanese_dialect_tiny_ayman_kagglee_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_sudanese_dialect_tiny_ayman_kagglee WhisperForCTC from AymanMansour +author: John Snow Labs +name: whisper_sudanese_dialect_tiny_ayman_kagglee +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_sudanese_dialect_tiny_ayman_kagglee` is a English model originally trained by AymanMansour. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_sudanese_dialect_tiny_ayman_kagglee_en_5.5.0_3.0_1727117364896.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_sudanese_dialect_tiny_ayman_kagglee_en_5.5.0_3.0_1727117364896.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_sudanese_dialect_tiny_ayman_kagglee","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_sudanese_dialect_tiny_ayman_kagglee", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_sudanese_dialect_tiny_ayman_kagglee| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.7 MB| + +## References + +https://huggingface.co/AymanMansour/Whisper-Sudanese-Dialect-tiny-ayman-kagglee \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_synthesized_turkish_8_hour_hlr_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_synthesized_turkish_8_hour_hlr_en.md new file mode 100644 index 00000000000000..f8dc886ac5bc46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_synthesized_turkish_8_hour_hlr_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_synthesized_turkish_8_hour_hlr WhisperForCTC from alikanakar +author: John Snow Labs +name: whisper_synthesized_turkish_8_hour_hlr +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_synthesized_turkish_8_hour_hlr` is a English model originally trained by alikanakar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_synthesized_turkish_8_hour_hlr_en_5.5.0_3.0_1727079416083.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_synthesized_turkish_8_hour_hlr_en_5.5.0_3.0_1727079416083.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_synthesized_turkish_8_hour_hlr","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_synthesized_turkish_8_hour_hlr", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_synthesized_turkish_8_hour_hlr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/alikanakar/whisper-synthesized-turkish-8-hour-hlr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_synthesized_turkish_8_hour_hlr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_synthesized_turkish_8_hour_hlr_pipeline_en.md new file mode 100644 index 00000000000000..12c3218f54b2a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_synthesized_turkish_8_hour_hlr_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_synthesized_turkish_8_hour_hlr_pipeline pipeline WhisperForCTC from alikanakar +author: John Snow Labs +name: whisper_synthesized_turkish_8_hour_hlr_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_synthesized_turkish_8_hour_hlr_pipeline` is a English model originally trained by alikanakar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_synthesized_turkish_8_hour_hlr_pipeline_en_5.5.0_3.0_1727079496940.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_synthesized_turkish_8_hour_hlr_pipeline_en_5.5.0_3.0_1727079496940.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_synthesized_turkish_8_hour_hlr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_synthesized_turkish_8_hour_hlr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_synthesized_turkish_8_hour_hlr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/alikanakar/whisper-synthesized-turkish-8-hour-hlr + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_aug_on_fly_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_aug_on_fly_en.md new file mode 100644 index 00000000000000..c150301b8f6753 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_aug_on_fly_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_aug_on_fly WhisperForCTC from thanhduycao +author: John Snow Labs +name: whisper_tiny_aug_on_fly +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_aug_on_fly` is a English model originally trained by thanhduycao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_aug_on_fly_en_5.5.0_3.0_1727116764205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_aug_on_fly_en_5.5.0_3.0_1727116764205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_aug_on_fly","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_aug_on_fly", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_aug_on_fly| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|379.0 MB| + +## References + +https://huggingface.co/thanhduycao/whisper-tiny-aug-on-fly \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_aug_on_fly_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_aug_on_fly_pipeline_en.md new file mode 100644 index 00000000000000..db120b08a8d3c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_aug_on_fly_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_aug_on_fly_pipeline pipeline WhisperForCTC from thanhduycao +author: John Snow Labs +name: whisper_tiny_aug_on_fly_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_aug_on_fly_pipeline` is a English model originally trained by thanhduycao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_aug_on_fly_pipeline_en_5.5.0_3.0_1727116789679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_aug_on_fly_pipeline_en_5.5.0_3.0_1727116789679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_aug_on_fly_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_aug_on_fly_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_aug_on_fly_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|379.0 MB| + +## References + +https://huggingface.co/thanhduycao/whisper-tiny-aug-on-fly + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_cb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_cb_pipeline_en.md new file mode 100644 index 00000000000000..bfd527914ba230 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_cb_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_cb_pipeline pipeline WhisperForCTC from mikemason +author: John Snow Labs +name: whisper_tiny_cb_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_cb_pipeline` is a English model originally trained by mikemason. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_cb_pipeline_en_5.5.0_3.0_1727077622726.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_cb_pipeline_en_5.5.0_3.0_1727077622726.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_cb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_cb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_cb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|394.9 MB| + +## References + +https://huggingface.co/mikemason/whisper-tiny-cb + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_chinese_cn_lr4_3600_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_chinese_cn_lr4_3600_pipeline_zh.md new file mode 100644 index 00000000000000..f0928e31a743f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_chinese_cn_lr4_3600_pipeline_zh.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Chinese whisper_tiny_chinese_cn_lr4_3600_pipeline pipeline WhisperForCTC from VingeNie +author: John Snow Labs +name: whisper_tiny_chinese_cn_lr4_3600_pipeline +date: 2024-09-23 +tags: [zh, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_chinese_cn_lr4_3600_pipeline` is a Chinese model originally trained by VingeNie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_chinese_cn_lr4_3600_pipeline_zh_5.5.0_3.0_1727117802346.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_chinese_cn_lr4_3600_pipeline_zh_5.5.0_3.0_1727117802346.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_chinese_cn_lr4_3600_pipeline", lang = "zh") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_chinese_cn_lr4_3600_pipeline", lang = "zh") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_chinese_cn_lr4_3600_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|389.2 MB| + +## References + +https://huggingface.co/VingeNie/whisper-tiny-zh_CN_lr4_3600 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_chinese_cn_lr4_3600_zh.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_chinese_cn_lr4_3600_zh.md new file mode 100644 index 00000000000000..db56394993a27c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_chinese_cn_lr4_3600_zh.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Chinese whisper_tiny_chinese_cn_lr4_3600 WhisperForCTC from VingeNie +author: John Snow Labs +name: whisper_tiny_chinese_cn_lr4_3600 +date: 2024-09-23 +tags: [zh, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_chinese_cn_lr4_3600` is a Chinese model originally trained by VingeNie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_chinese_cn_lr4_3600_zh_5.5.0_3.0_1727117779492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_chinese_cn_lr4_3600_zh_5.5.0_3.0_1727117779492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_chinese_cn_lr4_3600","zh") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# `data` is assumed to be a DataFrame with raw audio floats in an "audio_content" column
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_chinese_cn_lr4_3600", "zh")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

// `data` is assumed to be a DataFrame with raw audio floats in an "audio_content" column
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_chinese_cn_lr4_3600| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|zh| +|Size:|389.2 MB| + +## References + +https://huggingface.co/VingeNie/whisper-tiny-zh_CN_lr4_3600 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_ft_tts_english_welsh_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_ft_tts_english_welsh_en.md new file mode 100644 index 00000000000000..0e08d416b8583a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_ft_tts_english_welsh_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_ft_tts_english_welsh WhisperForCTC from DewiBrynJones +author: John Snow Labs +name: whisper_tiny_ft_tts_english_welsh +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_ft_tts_english_welsh` is a English model originally trained by DewiBrynJones. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_ft_tts_english_welsh_en_5.5.0_3.0_1727079068885.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_ft_tts_english_welsh_en_5.5.0_3.0_1727079068885.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_ft_tts_english_welsh","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# `data` is assumed to be a DataFrame with raw audio floats in an "audio_content" column
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_ft_tts_english_welsh", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

// `data` is assumed to be a DataFrame with raw audio floats in an "audio_content" column
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_ft_tts_english_welsh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.8 MB| + +## References + +https://huggingface.co/DewiBrynJones/whisper-tiny-ft-tts-en-cy \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_ft_tts_english_welsh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_ft_tts_english_welsh_pipeline_en.md new file mode 100644 index 00000000000000..e141bc659d805a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_ft_tts_english_welsh_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_ft_tts_english_welsh_pipeline pipeline WhisperForCTC from DewiBrynJones +author: John Snow Labs +name: whisper_tiny_ft_tts_english_welsh_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_ft_tts_english_welsh_pipeline` is a English model originally trained by DewiBrynJones. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_ft_tts_english_welsh_pipeline_en_5.5.0_3.0_1727079089735.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_ft_tts_english_welsh_pipeline_en_5.5.0_3.0_1727079089735.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_ft_tts_english_welsh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_ft_tts_english_welsh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_ft_tts_english_welsh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.8 MB| + +## References + +https://huggingface.co/DewiBrynJones/whisper-tiny-ft-tts-en-cy + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hebrew_modern_2_he.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hebrew_modern_2_he.md new file mode 100644 index 00000000000000..070c8301888742 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hebrew_modern_2_he.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hebrew whisper_tiny_hebrew_modern_2 WhisperForCTC from NS-Y +author: John Snow Labs +name: whisper_tiny_hebrew_modern_2 +date: 2024-09-23 +tags: [he, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_hebrew_modern_2` is a Hebrew model originally trained by NS-Y. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_hebrew_modern_2_he_5.5.0_3.0_1727117873987.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_hebrew_modern_2_he_5.5.0_3.0_1727117873987.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_hebrew_modern_2","he") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# `data` is assumed to be a DataFrame with raw audio floats in an "audio_content" column
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_hebrew_modern_2", "he")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

// `data` is assumed to be a DataFrame with raw audio floats in an "audio_content" column
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_hebrew_modern_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|he| +|Size:|242.8 MB| + +## References + +https://huggingface.co/NS-Y/whisper-tiny-he-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hebrew_modern_2_pipeline_he.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hebrew_modern_2_pipeline_he.md new file mode 100644 index 00000000000000..1b37d62c91beef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hebrew_modern_2_pipeline_he.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hebrew whisper_tiny_hebrew_modern_2_pipeline pipeline WhisperForCTC from NS-Y +author: John Snow Labs +name: whisper_tiny_hebrew_modern_2_pipeline +date: 2024-09-23 +tags: [he, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_hebrew_modern_2_pipeline` is a Hebrew model originally trained by NS-Y. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_hebrew_modern_2_pipeline_he_5.5.0_3.0_1727117938886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_hebrew_modern_2_pipeline_he_5.5.0_3.0_1727117938886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_hebrew_modern_2_pipeline", lang = "he") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_hebrew_modern_2_pipeline", lang = "he") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_hebrew_modern_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|he| +|Size:|242.9 MB| + +## References + +https://huggingface.co/NS-Y/whisper-tiny-he-2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hindi_alexao_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hindi_alexao_en.md new file mode 100644 index 00000000000000..9cef0d3c4fb48a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hindi_alexao_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_hindi_alexao WhisperForCTC from Alexao +author: John Snow Labs +name: whisper_tiny_hindi_alexao +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_hindi_alexao` is a English model originally trained by Alexao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_hindi_alexao_en_5.5.0_3.0_1727118439012.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_hindi_alexao_en_5.5.0_3.0_1727118439012.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_hindi_alexao","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# `data` is assumed to be a DataFrame with raw audio floats in an "audio_content" column
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_hindi_alexao", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

// `data` is assumed to be a DataFrame with raw audio floats in an "audio_content" column
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_hindi_alexao| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.8 MB| + +## References + +https://huggingface.co/Alexao/whisper-tiny-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hindi_alexao_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hindi_alexao_pipeline_en.md new file mode 100644 index 00000000000000..eff27d107c1179 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hindi_alexao_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_hindi_alexao_pipeline pipeline WhisperForCTC from Alexao +author: John Snow Labs +name: whisper_tiny_hindi_alexao_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_hindi_alexao_pipeline` is a English model originally trained by Alexao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_hindi_alexao_pipeline_en_5.5.0_3.0_1727118460972.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_hindi_alexao_pipeline_en_5.5.0_3.0_1727118460972.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_hindi_alexao_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_hindi_alexao_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_hindi_alexao_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.8 MB| + +## References + +https://huggingface.co/Alexao/whisper-tiny-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_english_us_b_koopman_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_english_us_b_koopman_en.md new file mode 100644 index 00000000000000..bfcfc6958bf703 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_english_us_b_koopman_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_minds14_english_us_b_koopman WhisperForCTC from b-koopman +author: John Snow Labs +name: whisper_tiny_minds14_english_us_b_koopman +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_english_us_b_koopman` is a English model originally trained by b-koopman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_us_b_koopman_en_5.5.0_3.0_1727051409895.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_us_b_koopman_en_5.5.0_3.0_1727051409895.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_english_us_b_koopman","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# `data` is assumed to be a DataFrame with raw audio floats in an "audio_content" column
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_english_us_b_koopman", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

// `data` is assumed to be a DataFrame with raw audio floats in an "audio_content" column
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_english_us_b_koopman| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/b-koopman/whisper-tiny-minds14-en-US \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_english_us_b_koopman_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_english_us_b_koopman_pipeline_en.md new file mode 100644 index 00000000000000..0923de373427e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_english_us_b_koopman_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_minds14_english_us_b_koopman_pipeline pipeline WhisperForCTC from b-koopman +author: John Snow Labs +name: whisper_tiny_minds14_english_us_b_koopman_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_english_us_b_koopman_pipeline` is a English model originally trained by b-koopman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_us_b_koopman_pipeline_en_5.5.0_3.0_1727051433077.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_us_b_koopman_pipeline_en_5.5.0_3.0_1727051433077.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_minds14_english_us_b_koopman_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_minds14_english_us_b_koopman_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_english_us_b_koopman_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/b-koopman/whisper-tiny-minds14-en-US + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_sjdata_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_sjdata_en.md new file mode 100644 index 00000000000000..26074bbb0ff2e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_sjdata_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_minds14_sjdata WhisperForCTC from sjdata +author: John Snow Labs +name: whisper_tiny_minds14_sjdata +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_sjdata` is a English model originally trained by sjdata. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_sjdata_en_5.5.0_3.0_1727076889654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_sjdata_en_5.5.0_3.0_1727076889654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_sjdata","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# `data` is assumed to be a DataFrame with raw audio floats in an "audio_content" column
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_sjdata", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

// `data` is assumed to be a DataFrame with raw audio floats in an "audio_content" column
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_sjdata| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/sjdata/whisper-tiny-minds14 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_sjdata_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_sjdata_pipeline_en.md new file mode 100644 index 00000000000000..e003c7623b2018 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_sjdata_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_minds14_sjdata_pipeline pipeline WhisperForCTC from sjdata +author: John Snow Labs +name: whisper_tiny_minds14_sjdata_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_sjdata_pipeline` is a English model originally trained by sjdata. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_sjdata_pipeline_en_5.5.0_3.0_1727076909305.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_sjdata_pipeline_en_5.5.0_3.0_1727076909305.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_minds14_sjdata_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_minds14_sjdata_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_sjdata_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/sjdata/whisper-tiny-minds14 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_en.md new file mode 100644 index 00000000000000..ee46a06127f2ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11 WhisperForCTC from sgonzalezsilot +author: John Snow Labs +name: whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11 +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11` is a English model originally trained by sgonzalezsilot. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_en_5.5.0_3.0_1727117371147.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_en_5.5.0_3.0_1727117371147.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# `data` is assumed to be a DataFrame with raw audio floats in an "audio_content" column
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

// `data` is assumed to be a DataFrame with raw audio floats in an "audio_content" column
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.6 MB| + +## References + +https://huggingface.co/sgonzalezsilot/whisper-tiny-spanish-es-Nemo_unified_2024-06-26_09-12-11 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_pipeline_en.md new file mode 100644 index 00000000000000..455302d3cd4799 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_pipeline pipeline WhisperForCTC from sgonzalezsilot +author: John Snow Labs +name: whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_pipeline` is a English model originally trained by sgonzalezsilot. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_pipeline_en_5.5.0_3.0_1727117390711.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_pipeline_en_5.5.0_3.0_1727117390711.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.6 MB| + +## References + +https://huggingface.co/sgonzalezsilot/whisper-tiny-spanish-es-Nemo_unified_2024-06-26_09-12-11 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whispercheckpoints3_pipeline_sv.md b/docs/_posts/ahmedlone127/2024-09-23-whispercheckpoints3_pipeline_sv.md new file mode 100644 index 00000000000000..2324a2ad3413b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whispercheckpoints3_pipeline_sv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Swedish whispercheckpoints3_pipeline pipeline WhisperForCTC from Yulle +author: John Snow Labs +name: whispercheckpoints3_pipeline +date: 2024-09-23 +tags: [sv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whispercheckpoints3_pipeline` is a Swedish model originally trained by Yulle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whispercheckpoints3_pipeline_sv_5.5.0_3.0_1727053110214.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whispercheckpoints3_pipeline_sv_5.5.0_3.0_1727053110214.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whispercheckpoints3_pipeline", lang = "sv") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whispercheckpoints3_pipeline", lang = "sv") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whispercheckpoints3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Yulle/WhisperCheckpoints3 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whispercheckpoints3_sv.md b/docs/_posts/ahmedlone127/2024-09-23-whispercheckpoints3_sv.md new file mode 100644 index 00000000000000..e9e54b4ffa0a5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whispercheckpoints3_sv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Swedish whispercheckpoints3 WhisperForCTC from Yulle +author: John Snow Labs +name: whispercheckpoints3 +date: 2024-09-23 +tags: [sv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whispercheckpoints3` is a Swedish model originally trained by Yulle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whispercheckpoints3_sv_5.5.0_3.0_1727053029604.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whispercheckpoints3_sv_5.5.0_3.0_1727053029604.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whispercheckpoints3","sv") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

# `data` is assumed to be a DataFrame with raw audio floats in an "audio_content" column
pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whispercheckpoints3", "sv")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

// `data` is assumed to be a DataFrame with raw audio floats in an "audio_content" column
val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whispercheckpoints3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|sv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Yulle/WhisperCheckpoints3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-withinapps_ndd_mantisbt_test_tags_en.md b/docs/_posts/ahmedlone127/2024-09-23-withinapps_ndd_mantisbt_test_tags_en.md new file mode 100644 index 00000000000000..9b6b4cf4a81cc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-withinapps_ndd_mantisbt_test_tags_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English withinapps_ndd_mantisbt_test_tags DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_mantisbt_test_tags +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_mantisbt_test_tags` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mantisbt_test_tags_en_5.5.0_3.0_1727108288657.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mantisbt_test_tags_en_5.5.0_3.0_1727108288657.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_mantisbt_test_tags","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_mantisbt_test_tags", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
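
Once the pipeline has run, the predicted label can be read from the `class` column, or the fitted model can be wrapped in a `LightPipeline` for quick, driver-side inference. A short sketch (label names depend on the fine-tuned model and are not shown here):

```python
from sparknlp.base import LightPipeline

# Predicted label(s) for each input row
pipelineDF.select("text", "class.result").show(truncate = False)

# Lightweight single-machine inference without building a DataFrame
light = LightPipeline(pipelineModel)
print(light.annotate("I love spark-nlp")["class"])
```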
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_mantisbt_test_tags| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-mantisbt_test-tags \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-withinapps_ndd_mantisbt_test_tags_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-withinapps_ndd_mantisbt_test_tags_pipeline_en.md new file mode 100644 index 00000000000000..a617f9422d18b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-withinapps_ndd_mantisbt_test_tags_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English withinapps_ndd_mantisbt_test_tags_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_mantisbt_test_tags_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_mantisbt_test_tags_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mantisbt_test_tags_pipeline_en_5.5.0_3.0_1727108302988.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mantisbt_test_tags_pipeline_en_5.5.0_3.0_1727108302988.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("withinapps_ndd_mantisbt_test_tags_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("withinapps_ndd_mantisbt_test_tags_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_mantisbt_test_tags_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-mantisbt_test-tags + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-wolof_maskedmodel_en.md b/docs/_posts/ahmedlone127/2024-09-23-wolof_maskedmodel_en.md new file mode 100644 index 00000000000000..ecc4cf6b9a7bcc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-wolof_maskedmodel_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English wolof_maskedmodel RoBertaEmbeddings from gjonesQ02 +author: John Snow Labs +name: wolof_maskedmodel +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wolof_maskedmodel` is a English model originally trained by gjonesQ02. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wolof_maskedmodel_en_5.5.0_3.0_1727080695069.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wolof_maskedmodel_en_5.5.0_3.0_1727080695069.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("wolof_maskedmodel","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("wolof_maskedmodel","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
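
To work with the produced vectors directly, each token-level annotation in the `embeddings` output column carries its vector in the `embeddings` field. A minimal sketch of flattening the output, one row per token:

```python
from pyspark.sql.functions import explode, col

pipelineDF.select(explode(col("embeddings")).alias("emb")) \
    .select(col("emb.result").alias("token"), col("emb.embeddings").alias("vector")) \
    .show(truncate = 80)
```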
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wolof_maskedmodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/gjonesQ02/WO_MaskedModel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-wolof_maskedmodel_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-wolof_maskedmodel_pipeline_en.md new file mode 100644 index 00000000000000..fd25b10a4f5fbd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-wolof_maskedmodel_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English wolof_maskedmodel_pipeline pipeline RoBertaEmbeddings from gjonesQ02 +author: John Snow Labs +name: wolof_maskedmodel_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wolof_maskedmodel_pipeline` is a English model originally trained by gjonesQ02. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wolof_maskedmodel_pipeline_en_5.5.0_3.0_1727080710145.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wolof_maskedmodel_pipeline_en_5.5.0_3.0_1727080710145.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("wolof_maskedmodel_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("wolof_maskedmodel_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wolof_maskedmodel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/gjonesQ02/WO_MaskedModel + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_r_galen_livingner3_es.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_r_galen_livingner3_es.md new file mode 100644 index 00000000000000..fde18acd4c0c84 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_r_galen_livingner3_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish xlm_r_galen_livingner3 XlmRoBertaForSequenceClassification from IIC +author: John Snow Labs +name: xlm_r_galen_livingner3 +date: 2024-09-23 +tags: [es, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_r_galen_livingner3` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_r_galen_livingner3_es_5.5.0_3.0_1727099436320.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_r_galen_livingner3_es_5.5.0_3.0_1727099436320.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_r_galen_livingner3","es") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_r_galen_livingner3", "es")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
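
The predicted label for each input row is stored in the `class` output column as a Spark NLP annotation. As a minimal sketch, assuming the `pipelineDF` DataFrame produced by the Python example above, the labels can be read back with a plain `select`:

```python
# "class.result" is an array with the predicted label(s) for each row.
pipelineDF.select("text", "class.result").show(truncate=False)
```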
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_r_galen_livingner3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|es| +|Size:|1.0 GB| + +## References + +https://huggingface.co/IIC/XLM_R_Galen-livingner3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_r_galen_livingner3_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_r_galen_livingner3_pipeline_es.md new file mode 100644 index 00000000000000..0171a39f1c9a5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_r_galen_livingner3_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish xlm_r_galen_livingner3_pipeline pipeline XlmRoBertaForSequenceClassification from IIC +author: John Snow Labs +name: xlm_r_galen_livingner3_pipeline +date: 2024-09-23 +tags: [es, open_source, pipeline, onnx] +task: Text Classification +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_r_galen_livingner3_pipeline` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_r_galen_livingner3_pipeline_es_5.5.0_3.0_1727099485645.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_r_galen_livingner3_pipeline_es_5.5.0_3.0_1727099485645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_r_galen_livingner3_pipeline", lang = "es")

# any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_r_galen_livingner3_pipeline", lang = "es")

// any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_r_galen_livingner3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|1.0 GB| + +## References + +https://huggingface.co/IIC/XLM_R_Galen-livingner3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_english_sentweet_targeted_insult_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_english_sentweet_targeted_insult_en.md new file mode 100644 index 00000000000000..50b80e4fb07be0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_english_sentweet_targeted_insult_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_english_sentweet_targeted_insult XlmRoBertaForSequenceClassification from jayanta +author: John Snow Labs +name: xlm_roberta_base_english_sentweet_targeted_insult +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_english_sentweet_targeted_insult` is a English model originally trained by jayanta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_english_sentweet_targeted_insult_en_5.5.0_3.0_1727089332791.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_english_sentweet_targeted_insult_en_5.5.0_3.0_1727089332791.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_english_sentweet_targeted_insult","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_english_sentweet_targeted_insult", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_english_sentweet_targeted_insult| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|785.8 MB| + +## References + +https://huggingface.co/jayanta/xlm-roberta-base-english-sentweet-targeted-insult \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_english_sentweet_targeted_insult_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_english_sentweet_targeted_insult_pipeline_en.md new file mode 100644 index 00000000000000..53615b5f53a68e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_english_sentweet_targeted_insult_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_english_sentweet_targeted_insult_pipeline pipeline XlmRoBertaForSequenceClassification from jayanta +author: John Snow Labs +name: xlm_roberta_base_english_sentweet_targeted_insult_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_english_sentweet_targeted_insult_pipeline` is a English model originally trained by jayanta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_english_sentweet_targeted_insult_pipeline_en_5.5.0_3.0_1727089470422.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_english_sentweet_targeted_insult_pipeline_en_5.5.0_3.0_1727089470422.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_english_sentweet_targeted_insult_pipeline", lang = "en")

# any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_english_sentweet_targeted_insult_pipeline", lang = "en")

// any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_english_sentweet_targeted_insult_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|785.8 MB| + +## References + +https://huggingface.co/jayanta/xlm-roberta-base-english-sentweet-targeted-insult + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_final_mixed_train_1_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_final_mixed_train_1_en.md new file mode 100644 index 00000000000000..ddba5e8ae4b112 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_final_mixed_train_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_final_mixed_train_1 XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_final_mixed_train_1 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_final_mixed_train_1` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_train_1_en_5.5.0_3.0_1727126747007.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_train_1_en_5.5.0_3.0_1727126747007.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_final_mixed_train_1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_final_mixed_train_1", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_final_mixed_train_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|795.0 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Final_Mixed-train-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_final_mixed_train_2_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_final_mixed_train_2_en.md new file mode 100644 index 00000000000000..bb05dcd41c03eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_final_mixed_train_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_final_mixed_train_2 XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_final_mixed_train_2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_final_mixed_train_2` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_train_2_en_5.5.0_3.0_1727099667830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_train_2_en_5.5.0_3.0_1727099667830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_final_mixed_train_2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_final_mixed_train_2", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_final_mixed_train_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|795.0 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Final_Mixed-train-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_final_mixed_train_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_final_mixed_train_2_pipeline_en.md new file mode 100644 index 00000000000000..01a851a7e0eb3d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_final_mixed_train_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_final_mixed_train_2_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_final_mixed_train_2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_final_mixed_train_2_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_train_2_pipeline_en_5.5.0_3.0_1727099797245.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_train_2_pipeline_en_5.5.0_3.0_1727099797245.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_final_mixed_train_2_pipeline", lang = "en")

# any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_final_mixed_train_2_pipeline", lang = "en")

// any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_final_mixed_train_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|795.0 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Final_Mixed-train-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_en.md new file mode 100644 index 00000000000000..c1ea13115ebd99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2 XlmRoBertaForSequenceClassification from RogerB +author: John Snow Labs +name: xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_en_5.5.0_3.0_1727099771063.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_en_5.5.0_3.0_1727099771063.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/RogerB/xlm-roberta-base-finetuned-kinyarwanda-kin-sent2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_pipeline_en.md new file mode 100644 index 00000000000000..0209979d5b3c17 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_pipeline pipeline XlmRoBertaForSequenceClassification from RogerB +author: John Snow Labs +name: xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_pipeline` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_pipeline_en_5.5.0_3.0_1727099820980.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_pipeline_en_5.5.0_3.0_1727099820980.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_pipeline", lang = "en")

# any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_pipeline", lang = "en")

// any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/RogerB/xlm-roberta-base-finetuned-kinyarwanda-kin-sent2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_marc_clp_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_marc_clp_en.md new file mode 100644 index 00000000000000..f00eff15aea086 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_marc_clp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_marc_clp XlmRoBertaForSequenceClassification from clp +author: John Snow Labs +name: xlm_roberta_base_finetuned_marc_clp +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_marc_clp` is a English model originally trained by clp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_clp_en_5.5.0_3.0_1727088208546.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_clp_en_5.5.0_3.0_1727088208546.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_marc_clp","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_marc_clp", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_marc_clp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|835.1 MB| + +## References + +https://huggingface.co/clp/xlm-roberta-base-finetuned-marc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_marc_clp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_marc_clp_pipeline_en.md new file mode 100644 index 00000000000000..c871838f7ef236 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_marc_clp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_marc_clp_pipeline pipeline XlmRoBertaForSequenceClassification from clp +author: John Snow Labs +name: xlm_roberta_base_finetuned_marc_clp_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_marc_clp_pipeline` is a English model originally trained by clp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_clp_pipeline_en_5.5.0_3.0_1727088289384.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_clp_pipeline_en_5.5.0_3.0_1727088289384.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_marc_clp_pipeline", lang = "en")

# any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_marc_clp_pipeline", lang = "en")

// any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_marc_clp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|835.1 MB| + +## References + +https://huggingface.co/clp/xlm-roberta-base-finetuned-marc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_all_jx7789_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_all_jx7789_pipeline_en.md new file mode 100644 index 00000000000000..d24e41e094ad29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_all_jx7789_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_jx7789_pipeline pipeline XlmRoBertaForTokenClassification from jx7789 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_jx7789_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_jx7789_pipeline` is a English model originally trained by jx7789. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_jx7789_pipeline_en_5.5.0_3.0_1727061706795.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_jx7789_pipeline_en_5.5.0_3.0_1727061706795.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_jx7789_pipeline", lang = "en")

# any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_jx7789_pipeline", lang = "en")

// any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_jx7789_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/jx7789/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_all_ligerre_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_all_ligerre_en.md new file mode 100644 index 00000000000000..cae8867c22f489 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_all_ligerre_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_ligerre XlmRoBertaForTokenClassification from ligerre +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_ligerre +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_ligerre` is a English model originally trained by ligerre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ligerre_en_5.5.0_3.0_1727062079218.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ligerre_en_5.5.0_3.0_1727062079218.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_ligerre","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_ligerre", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
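
The `ner` output column holds one annotation per token, aligned with the `token` column, so the predicted tags can be inspected side by side. A minimal sketch, assuming the `pipelineDF` DataFrame from the Python example above:

```python
# "token.result" and "ner.result" are parallel arrays: the tokens and their predicted entity tags.
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```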
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_ligerre| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/ligerre/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_all_ligerre_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_all_ligerre_pipeline_en.md new file mode 100644 index 00000000000000..674ece68453120 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_all_ligerre_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_ligerre_pipeline pipeline XlmRoBertaForTokenClassification from ligerre +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_ligerre_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_ligerre_pipeline` is a English model originally trained by ligerre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ligerre_pipeline_en_5.5.0_3.0_1727062144173.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ligerre_pipeline_en_5.5.0_3.0_1727062144173.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_ligerre_pipeline", lang = "en")

# any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_ligerre_pipeline", lang = "en")

// any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_ligerre_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.1 MB| + +## References + +https://huggingface.co/ligerre/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_arabic_zaina01_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_arabic_zaina01_en.md new file mode 100644 index 00000000000000..d85d4e00e14127 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_arabic_zaina01_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_arabic_zaina01 XlmRoBertaForTokenClassification from zaina01 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_arabic_zaina01 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_arabic_zaina01` is a English model originally trained by zaina01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_zaina01_en_5.5.0_3.0_1727132561127.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_zaina01_en_5.5.0_3.0_1727132561127.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_arabic_zaina01","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_arabic_zaina01", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_arabic_zaina01| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.6 MB| + +## References + +https://huggingface.co/zaina01/xlm-roberta-base-finetuned-panx-ar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_arabic_zaina01_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_arabic_zaina01_pipeline_en.md new file mode 100644 index 00000000000000..7715cfbe0f311b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_arabic_zaina01_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_arabic_zaina01_pipeline pipeline XlmRoBertaForTokenClassification from zaina01 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_arabic_zaina01_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_arabic_zaina01_pipeline` is a English model originally trained by zaina01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_zaina01_pipeline_en_5.5.0_3.0_1727132633542.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_zaina01_pipeline_en_5.5.0_3.0_1727132633542.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_arabic_zaina01_pipeline", lang = "en")

# any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_arabic_zaina01_pipeline", lang = "en")

// any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_arabic_zaina01_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.7 MB| + +## References + +https://huggingface.co/zaina01/xlm-roberta-base-finetuned-panx-ar + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_english_juhyun76_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_english_juhyun76_en.md new file mode 100644 index 00000000000000..c0b9f4a6b62ad2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_english_juhyun76_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_juhyun76 XlmRoBertaForTokenClassification from juhyun76 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_juhyun76 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_juhyun76` is a English model originally trained by juhyun76. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_juhyun76_en_5.5.0_3.0_1727132867349.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_juhyun76_en_5.5.0_3.0_1727132867349.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_juhyun76","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_juhyun76", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_juhyun76| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.7 MB| + +## References + +https://huggingface.co/juhyun76/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_english_juhyun76_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_english_juhyun76_pipeline_en.md new file mode 100644 index 00000000000000..448cc027a31882 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_english_juhyun76_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_juhyun76_pipeline pipeline XlmRoBertaForTokenClassification from juhyun76 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_juhyun76_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_juhyun76_pipeline` is a English model originally trained by juhyun76. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_juhyun76_pipeline_en_5.5.0_3.0_1727132971232.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_juhyun76_pipeline_en_5.5.0_3.0_1727132971232.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_juhyun76_pipeline", lang = "en")

# any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_juhyun76_pipeline", lang = "en")

// any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_juhyun76_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.7 MB| + +## References + +https://huggingface.co/juhyun76/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_clboetticher_school_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_clboetticher_school_en.md new file mode 100644 index 00000000000000..5dfe5d08839c0c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_clboetticher_school_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_clboetticher_school XlmRoBertaForTokenClassification from clboetticher-school +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_clboetticher_school +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_clboetticher_school` is a English model originally trained by clboetticher-school. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_clboetticher_school_en_5.5.0_3.0_1727132540994.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_clboetticher_school_en_5.5.0_3.0_1727132540994.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_clboetticher_school","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_clboetticher_school", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_clboetticher_school| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/clboetticher-school/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_clboetticher_school_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_clboetticher_school_pipeline_en.md new file mode 100644 index 00000000000000..88093567d5b1a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_clboetticher_school_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_clboetticher_school_pipeline pipeline XlmRoBertaForTokenClassification from clboetticher-school +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_clboetticher_school_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_clboetticher_school_pipeline` is a English model originally trained by clboetticher-school. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_clboetticher_school_pipeline_en_5.5.0_3.0_1727132628814.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_clboetticher_school_pipeline_en_5.5.0_3.0_1727132628814.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_clboetticher_school_pipeline", lang = "en")

# any DataFrame with a "text" column can be used as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_clboetticher_school_pipeline", lang = "en")

// any DataFrame with a "text" column can be used as input
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_clboetticher_school_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/clboetticher-school/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_esperesa_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_esperesa_en.md new file mode 100644 index 00000000000000..f08cf7a2468d24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_esperesa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_esperesa XlmRoBertaForTokenClassification from esperesa +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_esperesa +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_esperesa` is a English model originally trained by esperesa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_esperesa_en_5.5.0_3.0_1727132067738.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_esperesa_en_5.5.0_3.0_1727132067738.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_esperesa","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_esperesa", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_esperesa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/esperesa/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_esperesa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_esperesa_pipeline_en.md new file mode 100644 index 00000000000000..892eb990fc9dcc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_esperesa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_esperesa_pipeline pipeline XlmRoBertaForTokenClassification from esperesa +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_esperesa_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_esperesa_pipeline` is a English model originally trained by esperesa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_esperesa_pipeline_en_5.5.0_3.0_1727132152169.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_esperesa_pipeline_en_5.5.0_3.0_1727132152169.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_esperesa_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_esperesa_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_esperesa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/esperesa/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_en.md new file mode 100644 index 00000000000000..73f506ed4d67d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan XlmRoBertaForTokenClassification from Arnaudmkonan +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan` is a English model originally trained by Arnaudmkonan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_en_5.5.0_3.0_1727132035379.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_en_5.5.0_3.0_1727132035379.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/Arnaudmkonan/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_pipeline_en.md new file mode 100644 index 00000000000000..4393b757a52f55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_pipeline pipeline XlmRoBertaForTokenClassification from Arnaudmkonan +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_pipeline` is a English model originally trained by Arnaudmkonan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_pipeline_en_5.5.0_3.0_1727132101751.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_pipeline_en_5.5.0_3.0_1727132101751.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/Arnaudmkonan/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_edwardjross_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_edwardjross_en.md new file mode 100644 index 00000000000000..914b8d3e7dc8a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_edwardjross_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_edwardjross XlmRoBertaForTokenClassification from edwardjross +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_edwardjross +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_edwardjross` is a English model originally trained by edwardjross. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_edwardjross_en_5.5.0_3.0_1727132220889.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_edwardjross_en_5.5.0_3.0_1727132220889.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_edwardjross","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_edwardjross", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
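+
+If the same model is reused across jobs, the fitted pipeline can be saved and reloaded with the standard Spark ML writer; a minimal sketch, where the path is an arbitrary example:
+
+```python
+# hedged sketch: persist the fitted pipeline so the model is not re-downloaded
+from pyspark.ml import PipelineModel
+
+pipelineModel.write().overwrite().save("/tmp/panx_ner_pipeline")
+reloaded = PipelineModel.load("/tmp/panx_ner_pipeline")
+reloaded.transform(data).select("ner.result").show(truncate=False)
+```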
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_edwardjross| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/edwardjross/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_edwardjross_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_edwardjross_pipeline_en.md new file mode 100644 index 00000000000000..5fa86f95fdef6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_edwardjross_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_edwardjross_pipeline pipeline XlmRoBertaForTokenClassification from edwardjross +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_edwardjross_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_edwardjross_pipeline` is a English model originally trained by edwardjross. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_edwardjross_pipeline_en_5.5.0_3.0_1727132284996.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_edwardjross_pipeline_en_5.5.0_3.0_1727132284996.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_edwardjross_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_edwardjross_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_edwardjross_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/edwardjross/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_lee_soha_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_lee_soha_en.md new file mode 100644 index 00000000000000..7b5d02aed12ce4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_lee_soha_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_lee_soha XlmRoBertaForTokenClassification from Lee-soha +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_lee_soha +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_lee_soha` is a English model originally trained by Lee-soha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_lee_soha_en_5.5.0_3.0_1727062311153.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_lee_soha_en_5.5.0_3.0_1727062311153.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_lee_soha","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_lee_soha", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_lee_soha| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/Lee-soha/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_lee_soha_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_lee_soha_pipeline_en.md new file mode 100644 index 00000000000000..4bfc4638d2f7ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_lee_soha_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_lee_soha_pipeline pipeline XlmRoBertaForTokenClassification from Lee-soha +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_lee_soha_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_lee_soha_pipeline` is a English model originally trained by Lee-soha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_lee_soha_pipeline_en_5.5.0_3.0_1727062395510.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_lee_soha_pipeline_en_5.5.0_3.0_1727062395510.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_lee_soha_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_lee_soha_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_lee_soha_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/Lee-soha/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_ligerre_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_ligerre_pipeline_en.md new file mode 100644 index 00000000000000..8b994b72e838ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_ligerre_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_ligerre_pipeline pipeline XlmRoBertaForTokenClassification from ligerre +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_ligerre_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_ligerre_pipeline` is a English model originally trained by ligerre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_ligerre_pipeline_en_5.5.0_3.0_1727132729786.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_ligerre_pipeline_en_5.5.0_3.0_1727132729786.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_ligerre_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_ligerre_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_ligerre_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/ligerre/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_misterneil_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_misterneil_en.md new file mode 100644 index 00000000000000..03813ab53e91a1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_misterneil_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_misterneil XlmRoBertaForTokenClassification from misterneil +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_misterneil +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_misterneil` is a English model originally trained by misterneil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_misterneil_en_5.5.0_3.0_1727133168862.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_misterneil_en_5.5.0_3.0_1727133168862.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_misterneil","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_misterneil", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_misterneil| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/misterneil/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_misterneil_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_misterneil_pipeline_en.md new file mode 100644 index 00000000000000..42300f7a29d902 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_misterneil_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_misterneil_pipeline pipeline XlmRoBertaForTokenClassification from misterneil +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_misterneil_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_misterneil_pipeline` is a English model originally trained by misterneil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_misterneil_pipeline_en_5.5.0_3.0_1727133242943.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_misterneil_pipeline_en_5.5.0_3.0_1727133242943.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_misterneil_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_misterneil_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_misterneil_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/misterneil/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_mjqing_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_mjqing_en.md new file mode 100644 index 00000000000000..3f0643eda98851 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_mjqing_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_mjqing XlmRoBertaForTokenClassification from MJQing +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_mjqing +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_mjqing` is a English model originally trained by MJQing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_mjqing_en_5.5.0_3.0_1727061524790.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_mjqing_en_5.5.0_3.0_1727061524790.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_mjqing","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_mjqing", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_mjqing| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/MJQing/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_mjqing_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_mjqing_pipeline_en.md new file mode 100644 index 00000000000000..1f9280c19a32f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_mjqing_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_mjqing_pipeline pipeline XlmRoBertaForTokenClassification from MJQing +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_mjqing_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_mjqing_pipeline` is a English model originally trained by MJQing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_mjqing_pipeline_en_5.5.0_3.0_1727061612868.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_mjqing_pipeline_en_5.5.0_3.0_1727061612868.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_mjqing_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_mjqing_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_mjqing_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/MJQing/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_sh_zheng_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_sh_zheng_en.md new file mode 100644 index 00000000000000..942ebdc317b629 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_sh_zheng_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_sh_zheng XlmRoBertaForTokenClassification from sh-zheng +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_sh_zheng +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_sh_zheng` is a English model originally trained by sh-zheng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_sh_zheng_en_5.5.0_3.0_1727061887846.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_sh_zheng_en_5.5.0_3.0_1727061887846.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_sh_zheng","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_sh_zheng", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_sh_zheng| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/sh-zheng/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_sh_zheng_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_sh_zheng_pipeline_en.md new file mode 100644 index 00000000000000..8572ba7a93a9e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_sh_zheng_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_sh_zheng_pipeline pipeline XlmRoBertaForTokenClassification from sh-zheng +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_sh_zheng_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_sh_zheng_pipeline` is a English model originally trained by sh-zheng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_sh_zheng_pipeline_en_5.5.0_3.0_1727061975805.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_sh_zheng_pipeline_en_5.5.0_3.0_1727061975805.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_sh_zheng_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_sh_zheng_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_sh_zheng_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/sh-zheng/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_jiogenes_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_jiogenes_en.md new file mode 100644 index 00000000000000..cf16ede2f0a9d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_jiogenes_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_jiogenes XlmRoBertaForTokenClassification from jiogenes +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_jiogenes +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_jiogenes` is a English model originally trained by jiogenes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jiogenes_en_5.5.0_3.0_1727061753198.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jiogenes_en_5.5.0_3.0_1727061753198.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_jiogenes","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_jiogenes", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_jiogenes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|839.7 MB| + +## References + +https://huggingface.co/jiogenes/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_penguinman73_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_penguinman73_en.md new file mode 100644 index 00000000000000..cd998884554207 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_penguinman73_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_penguinman73 XlmRoBertaForTokenClassification from penguinman73 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_penguinman73 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_penguinman73` is a English model originally trained by penguinman73. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_penguinman73_en_5.5.0_3.0_1727062023810.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_penguinman73_en_5.5.0_3.0_1727062023810.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_penguinman73","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_penguinman73", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
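+
+For single sentences, the fitted pipeline can also be wrapped in a `LightPipeline`, which avoids building a DataFrame; a small sketch assuming the Python pipeline above was fitted:
+
+```python
+# hedged sketch: annotate a plain string with the fitted pipeline
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+print(light.annotate("I love spark-nlp")["ner"])
+```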
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_penguinman73| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|827.0 MB| + +## References + +https://huggingface.co/penguinman73/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_penguinman73_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_penguinman73_pipeline_en.md new file mode 100644 index 00000000000000..df240949465c3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_penguinman73_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_penguinman73_pipeline pipeline XlmRoBertaForTokenClassification from penguinman73 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_penguinman73_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_penguinman73_pipeline` is a English model originally trained by penguinman73. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_penguinman73_pipeline_en_5.5.0_3.0_1727062120759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_penguinman73_pipeline_en_5.5.0_3.0_1727062120759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_penguinman73_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_penguinman73_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_penguinman73_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.0 MB| + +## References + +https://huggingface.co/penguinman73/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_thucdangvan020999_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_thucdangvan020999_en.md new file mode 100644 index 00000000000000..7d69996a0b52df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_thucdangvan020999_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_thucdangvan020999 XlmRoBertaForTokenClassification from thucdangvan020999 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_thucdangvan020999 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_thucdangvan020999` is a English model originally trained by thucdangvan020999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_thucdangvan020999_en_5.5.0_3.0_1727132679594.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_thucdangvan020999_en_5.5.0_3.0_1727132679594.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# input column names must match the outputs of the stages above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_thucdangvan020999","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_thucdangvan020999", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

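Once the pipeline above has been applied, the predicted tag for every token sits in the `ner` column next to the `token` column. A minimal way to inspect them side by side (a generic Spark selection added here for illustration, not part of the original card):

```python
# each column is an array of annotations; ".result" pulls out the string values
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```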
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_thucdangvan020999| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/thucdangvan020999/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_thucdangvan020999_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_thucdangvan020999_pipeline_en.md new file mode 100644 index 00000000000000..842b11eeed404f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_thucdangvan020999_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_thucdangvan020999_pipeline pipeline XlmRoBertaForTokenClassification from thucdangvan020999 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_thucdangvan020999_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_thucdangvan020999_pipeline` is a English model originally trained by thucdangvan020999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_thucdangvan020999_pipeline_en_5.5.0_3.0_1727132751090.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_thucdangvan020999_pipeline_en_5.5.0_3.0_1727132751090.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_thucdangvan020999_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_thucdangvan020999_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_thucdangvan020999_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/thucdangvan020999/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_agvelu_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_agvelu_en.md new file mode 100644 index 00000000000000..1b998610d0544e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_agvelu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_agvelu XlmRoBertaForTokenClassification from agvelu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_agvelu +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_agvelu` is a English model originally trained by agvelu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_agvelu_en_5.5.0_3.0_1727061995703.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_agvelu_en_5.5.0_3.0_1727061995703.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# input column names must match the outputs of the stages above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_agvelu","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_agvelu", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_agvelu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/agvelu/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_agvelu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_agvelu_pipeline_en.md new file mode 100644 index 00000000000000..7a1b18210fd36c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_agvelu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_agvelu_pipeline pipeline XlmRoBertaForTokenClassification from agvelu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_agvelu_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_agvelu_pipeline` is a English model originally trained by agvelu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_agvelu_pipeline_en_5.5.0_3.0_1727062084285.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_agvelu_pipeline_en_5.5.0_3.0_1727062084285.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_agvelu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_agvelu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_agvelu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.7 MB| + +## References + +https://huggingface.co/agvelu/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_hcy5561_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_hcy5561_en.md new file mode 100644 index 00000000000000..e9c417a3dfe579 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_hcy5561_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_hcy5561 XlmRoBertaForTokenClassification from hcy5561 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_hcy5561 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_hcy5561` is a English model originally trained by hcy5561. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_hcy5561_en_5.5.0_3.0_1727061541689.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_hcy5561_en_5.5.0_3.0_1727061541689.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# input column names must match the outputs of the stages above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_hcy5561","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_hcy5561", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_hcy5561| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|816.7 MB| + +## References + +https://huggingface.co/hcy5561/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_hcy5561_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_hcy5561_pipeline_en.md new file mode 100644 index 00000000000000..88f737d77de33a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_hcy5561_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_hcy5561_pipeline pipeline XlmRoBertaForTokenClassification from hcy5561 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_hcy5561_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_hcy5561_pipeline` is a English model originally trained by hcy5561. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_hcy5561_pipeline_en_5.5.0_3.0_1727061647734.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_hcy5561_pipeline_en_5.5.0_3.0_1727061647734.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_hcy5561_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_hcy5561_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_hcy5561_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|816.8 MB| + +## References + +https://huggingface.co/hcy5561/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_henryjiang_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_henryjiang_en.md new file mode 100644 index 00000000000000..3484cb9b8439cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_henryjiang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_henryjiang XlmRoBertaForTokenClassification from henryjiang +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_henryjiang +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_henryjiang` is a English model originally trained by henryjiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_henryjiang_en_5.5.0_3.0_1727132112453.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_henryjiang_en_5.5.0_3.0_1727132112453.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# input column names must match the outputs of the stages above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_henryjiang","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_henryjiang", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

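For interactive, single-sentence checks it can be handier to wrap the fitted `pipelineModel` from above in a `LightPipeline`, which runs the annotators without building a DataFrame. A minimal sketch; the Italian sample sentence is a made-up example:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# annotate() returns a dict keyed by output column name ("document", "token", "ner")
preds = light.annotate("Giuseppe lavora per la Ferrari a Maranello.")
print(list(zip(preds["token"], preds["ner"])))
```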
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_henryjiang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|833.1 MB| + +## References + +https://huggingface.co/henryjiang/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_henryjiang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_henryjiang_pipeline_en.md new file mode 100644 index 00000000000000..bcb15a29258064 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_henryjiang_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_henryjiang_pipeline pipeline XlmRoBertaForTokenClassification from henryjiang +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_henryjiang_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_henryjiang_pipeline` is a English model originally trained by henryjiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_henryjiang_pipeline_en_5.5.0_3.0_1727132195280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_henryjiang_pipeline_en_5.5.0_3.0_1727132195280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_henryjiang_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_henryjiang_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_henryjiang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|833.1 MB| + +## References + +https://huggingface.co/henryjiang/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_ryatora_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_ryatora_en.md new file mode 100644 index 00000000000000..9a0e19c851ee98 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_ryatora_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_ryatora XlmRoBertaForTokenClassification from ryatora +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_ryatora +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_ryatora` is a English model originally trained by ryatora. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_ryatora_en_5.5.0_3.0_1727133268595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_ryatora_en_5.5.0_3.0_1727133268595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# input column names must match the outputs of the stages above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_ryatora","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_ryatora", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_ryatora| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/ryatora/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_seobak_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_seobak_en.md new file mode 100644 index 00000000000000..8fe9b3cc5da1ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_seobak_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_seobak XlmRoBertaForTokenClassification from seobak +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_seobak +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_seobak` is a English model originally trained by seobak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_seobak_en_5.5.0_3.0_1727133052888.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_seobak_en_5.5.0_3.0_1727133052888.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# input column names must match the outputs of the stages above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_seobak","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_seobak", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_seobak| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|816.7 MB| + +## References + +https://huggingface.co/seobak/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_seobak_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_seobak_pipeline_en.md new file mode 100644 index 00000000000000..7b434c1bb642fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_seobak_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_seobak_pipeline pipeline XlmRoBertaForTokenClassification from seobak +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_seobak_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_seobak_pipeline` is a English model originally trained by seobak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_seobak_pipeline_en_5.5.0_3.0_1727133155549.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_seobak_pipeline_en_5.5.0_3.0_1727133155549.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_seobak_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_seobak_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_seobak_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|816.8 MB| + +## References + +https://huggingface.co/seobak/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_zardian_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_zardian_en.md new file mode 100644 index 00000000000000..e9226da2034f01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_zardian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_zardian XlmRoBertaForTokenClassification from Zardian +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_zardian +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_zardian` is a English model originally trained by Zardian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_zardian_en_5.5.0_3.0_1727133058148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_zardian_en_5.5.0_3.0_1727133058148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# input column names must match the outputs of the stages above
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_zardian","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_zardian", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_zardian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|816.7 MB| + +## References + +https://huggingface.co/Zardian/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_zardian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_zardian_pipeline_en.md new file mode 100644 index 00000000000000..83632726c915b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_zardian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_zardian_pipeline pipeline XlmRoBertaForTokenClassification from Zardian +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_zardian_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_zardian_pipeline` is a English model originally trained by Zardian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_zardian_pipeline_en_5.5.0_3.0_1727133160822.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_zardian_pipeline_en_5.5.0_3.0_1727133160822.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_zardian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_zardian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_zardian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|816.8 MB| + +## References + +https://huggingface.co/Zardian/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_hate_speech_ben_hin_bn.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_hate_speech_ben_hin_bn.md new file mode 100644 index 00000000000000..a657a2036bacb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_hate_speech_ben_hin_bn.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Bengali xlm_roberta_base_hate_speech_ben_hin XlmRoBertaForSequenceClassification from kingshukroy +author: John Snow Labs +name: xlm_roberta_base_hate_speech_ben_hin +date: 2024-09-23 +tags: [bn, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_hate_speech_ben_hin` is a Bengali model originally trained by kingshukroy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_hate_speech_ben_hin_bn_5.5.0_3.0_1727089310552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_hate_speech_ben_hin_bn_5.5.0_3.0_1727089310552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# input column names must match the outputs of the stages above
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_hate_speech_ben_hin","bn") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_hate_speech_ben_hin", "bn")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

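The sequence classifier above writes one prediction per input row into the `class` column. A brief, illustrative way to look at the label assigned to each text (a generic Spark selection, not part of the original card):

```python
# "class.result" holds the predicted label for each input row
pipelineDF.select("text", "class.result").show(truncate=False)
```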
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_hate_speech_ben_hin| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|bn| +|Size:|791.2 MB| + +## References + +https://huggingface.co/kingshukroy/xlm-roberta-base-hate-speech-ben-hin \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_hate_speech_ben_hin_pipeline_bn.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_hate_speech_ben_hin_pipeline_bn.md new file mode 100644 index 00000000000000..1f668c4c57d2e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_hate_speech_ben_hin_pipeline_bn.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Bengali xlm_roberta_base_hate_speech_ben_hin_pipeline pipeline XlmRoBertaForSequenceClassification from kingshukroy +author: John Snow Labs +name: xlm_roberta_base_hate_speech_ben_hin_pipeline +date: 2024-09-23 +tags: [bn, open_source, pipeline, onnx] +task: Text Classification +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_hate_speech_ben_hin_pipeline` is a Bengali model originally trained by kingshukroy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_hate_speech_ben_hin_pipeline_bn_5.5.0_3.0_1727089444477.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_hate_speech_ben_hin_pipeline_bn_5.5.0_3.0_1727089444477.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_hate_speech_ben_hin_pipeline", lang = "bn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_hate_speech_ben_hin_pipeline", lang = "bn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_hate_speech_ben_hin_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bn| +|Size:|791.2 MB| + +## References + +https://huggingface.co/kingshukroy/xlm-roberta-base-hate-speech-ben-hin + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_language_detection_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_language_detection_finetuned_en.md new file mode 100644 index 00000000000000..4d892cce314608 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_language_detection_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_language_detection_finetuned XlmRoBertaForSequenceClassification from RonTon05 +author: John Snow Labs +name: xlm_roberta_base_language_detection_finetuned +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_language_detection_finetuned` is a English model originally trained by RonTon05. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_language_detection_finetuned_en_5.5.0_3.0_1727088733158.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_language_detection_finetuned_en_5.5.0_3.0_1727088733158.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# input column names must match the outputs of the stages above
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_language_detection_finetuned","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_language_detection_finetuned", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

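Because this model is a single-label classifier, a `LightPipeline` is a convenient way to score a handful of strings at once without building a DataFrame. A minimal sketch that reuses the `pipelineModel` fitted above; the sample sentences are hypothetical:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# annotate() accepts a list of strings and returns one result dict per input
for result in light.annotate(["I love spark-nlp", "Me encanta spark-nlp"]):
    print(result["class"])
```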
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_language_detection_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|890.4 MB| + +## References + +https://huggingface.co/RonTon05/xlm-roberta-base-language-detection-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_language_detection_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_language_detection_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..1ec17258b44315 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_language_detection_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_language_detection_finetuned_pipeline pipeline XlmRoBertaForSequenceClassification from RonTon05 +author: John Snow Labs +name: xlm_roberta_base_language_detection_finetuned_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_language_detection_finetuned_pipeline` is a English model originally trained by RonTon05. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_language_detection_finetuned_pipeline_en_5.5.0_3.0_1727088824556.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_language_detection_finetuned_pipeline_en_5.5.0_3.0_1727088824556.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_language_detection_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_language_detection_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_language_detection_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|890.4 MB| + +## References + +https://huggingface.co/RonTon05/xlm-roberta-base-language-detection-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_en.md new file mode 100644 index 00000000000000..0beeff250764aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_en_5.5.0_3.0_1727125835133.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_en_5.5.0_3.0_1727125835133.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# input column names must match the outputs of the stages above
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

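If the fitted pipeline above is going to be reused, it can be persisted like any Spark ML `PipelineModel` rather than refit each time. A short sketch; the output path is a placeholder:

```python
from pyspark.ml import PipelineModel

# persist the fitted pipeline and load it back later without refitting
pipelineModel.write().overwrite().save("/tmp/xlmr_seq_cls_pipeline")
restored = PipelineModel.load("/tmp/xlmr_seq_cls_pipeline")
restored.transform(data).select("class.result").show(truncate=False)
```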
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|804.8 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr5e-06_seed42_basic_original_amh-esp-eng_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_pipeline_en.md new file mode 100644 index 00000000000000..faf39b0fefca5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_pipeline pipeline XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_pipeline` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_pipeline_en_5.5.0_3.0_1727125963604.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_pipeline_en_5.5.0_3.0_1727125963604.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|804.8 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr5e-06_seed42_basic_original_amh-esp-eng_train + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_seed42_amh_esp_eng_train_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_seed42_amh_esp_eng_train_en.md new file mode 100644 index 00000000000000..5839ea1fd16215 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_seed42_amh_esp_eng_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_seed42_amh_esp_eng_train XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_seed42_amh_esp_eng_train +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_seed42_amh_esp_eng_train` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_seed42_amh_esp_eng_train_en_5.5.0_3.0_1727099233198.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_seed42_amh_esp_eng_train_en_5.5.0_3.0_1727099233198.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_seed42_amh_esp_eng_train","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_seed42_amh_esp_eng_train", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
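To check what the classifier actually returned, read the `result` field of the `class` annotations. This is a small illustrative sketch, assuming the `pipelineDF` DataFrame produced by the Python snippet above:

```python
# Show the input text next to the predicted label(s).
# "class" is the annotator output column set with setOutputCol("class").
pipelineDF.select("text", "class.result").show(truncate=False)

# Per-label confidence scores are kept in each annotation's metadata map.
pipelineDF.select("class.metadata").show(truncate=False)
```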
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_seed42_amh_esp_eng_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|810.2 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_seed42_amh-esp-eng_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_seed42_amh_esp_eng_train_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_seed42_amh_esp_eng_train_pipeline_en.md new file mode 100644 index 00000000000000..6d35e264979520 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_seed42_amh_esp_eng_train_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_seed42_amh_esp_eng_train_pipeline pipeline XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_seed42_amh_esp_eng_train_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_seed42_amh_esp_eng_train_pipeline` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_seed42_amh_esp_eng_train_pipeline_en_5.5.0_3.0_1727099363649.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_seed42_amh_esp_eng_train_pipeline_en_5.5.0_3.0_1727099363649.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_seed42_amh_esp_eng_train_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_seed42_amh_esp_eng_train_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
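The snippet above assumes that a Spark NLP session is running and that `df` is an existing DataFrame with a `text` column. A self-contained sketch of the same call, with the setup steps spelled out (the example sentence is only illustrative):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start (or attach to) a Spark session with Spark NLP on the classpath.
spark = sparknlp.start()

pipeline = PretrainedPipeline("xlm_roberta_base_seed42_amh_esp_eng_train_pipeline", lang="en")

# The pipeline's DocumentAssembler reads raw text from a column named "text".
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

annotations = pipeline.transform(df)
annotations.select("class.result").show(truncate=False)
```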
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_seed42_amh_esp_eng_train_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|810.2 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_seed42_amh-esp-eng_train + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_text_classification_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_text_classification_en.md new file mode 100644 index 00000000000000..f08a63f29aada7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_text_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_text_classification XlmRoBertaForSequenceClassification from CeroShrijver +author: John Snow Labs +name: xlm_roberta_base_text_classification +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_text_classification` is a English model originally trained by CeroShrijver. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_text_classification_en_5.5.0_3.0_1727088644804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_text_classification_en_5.5.0_3.0_1727088644804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_text_classification","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_text_classification", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_text_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|838.1 MB| + +## References + +https://huggingface.co/CeroShrijver/xlm-roberta-base-text-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_text_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_text_classification_pipeline_en.md new file mode 100644 index 00000000000000..2bc84da58c1ccf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_text_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_text_classification_pipeline pipeline XlmRoBertaForSequenceClassification from CeroShrijver +author: John Snow Labs +name: xlm_roberta_base_text_classification_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_text_classification_pipeline` is a English model originally trained by CeroShrijver. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_text_classification_pipeline_en_5.5.0_3.0_1727088729153.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_text_classification_pipeline_en_5.5.0_3.0_1727088729153.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_text_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_text_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
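If you want to confirm what the downloaded pipeline contains, you can list the stages of the underlying `PipelineModel`; they should line up with the "Included Models" list further down this page. A minimal sketch, assuming a running Spark NLP session:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_text_classification_pipeline", lang="en")

# Print the class name of each stage in the wrapped PipelineModel.
for stage in pipeline.model.stages:
    print(stage.__class__.__name__)
```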
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_text_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|838.1 MB| + +## References + +https://huggingface.co/CeroShrijver/xlm-roberta-base-text-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_english_tweet_sentiment_english_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_english_tweet_sentiment_english_en.md new file mode 100644 index 00000000000000..354e7c91881b33 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_english_tweet_sentiment_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_english_tweet_sentiment_english XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_english_tweet_sentiment_english +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_english_tweet_sentiment_english` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_tweet_sentiment_english_en_5.5.0_3.0_1727088551061.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_tweet_sentiment_english_en_5.5.0_3.0_1727088551061.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_english_tweet_sentiment_english","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_english_tweet_sentiment_english", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
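For quick experiments on a handful of strings, a `LightPipeline` avoids building a DataFrame at all. A minimal sketch, assuming the `pipelineModel` fitted in the Python snippet above (the example sentence is ours, not from the upstream card):

```python
from sparknlp.base import LightPipeline

# Wrap the fitted PipelineModel for fast, single-machine inference.
light = LightPipeline(pipelineModel)

# annotate() returns plain Python lists keyed by output column name.
print(light.annotate("What a great game last night!")["class"])
```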
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_english_tweet_sentiment_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|647.0 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-en-tweet-sentiment-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_english_tweet_sentiment_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_english_tweet_sentiment_english_pipeline_en.md new file mode 100644 index 00000000000000..6d6d5465f86b92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_english_tweet_sentiment_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_english_tweet_sentiment_english_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_english_tweet_sentiment_english_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_english_tweet_sentiment_english_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_tweet_sentiment_english_pipeline_en_5.5.0_3.0_1727088654178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_tweet_sentiment_english_pipeline_en_5.5.0_3.0_1727088654178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_trimmed_english_tweet_sentiment_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_trimmed_english_tweet_sentiment_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_english_tweet_sentiment_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|647.0 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-en-tweet-sentiment-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_german_xnli_german_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_german_xnli_german_en.md new file mode 100644 index 00000000000000..1c76ddea6c1884 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_german_xnli_german_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_german_xnli_german XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_german_xnli_german +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_german_xnli_german` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_german_xnli_german_en_5.5.0_3.0_1727126538849.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_german_xnli_german_en_5.5.0_3.0_1727126538849.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_german_xnli_german","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_german_xnli_german", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_german_xnli_german| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|528.9 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-de-xnli-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_german_xnli_german_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_german_xnli_german_pipeline_en.md new file mode 100644 index 00000000000000..3acc64065564bf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_german_xnli_german_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_german_xnli_german_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_german_xnli_german_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_german_xnli_german_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_german_xnli_german_pipeline_en_5.5.0_3.0_1727126586159.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_german_xnli_german_pipeline_en_5.5.0_3.0_1727126586159.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_trimmed_german_xnli_german_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_trimmed_german_xnli_german_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_german_xnli_german_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|528.9 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-de-xnli-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_italian_60000_tweet_sentiment_italian_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_italian_60000_tweet_sentiment_italian_en.md new file mode 100644 index 00000000000000..3c67124b9bb52b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_italian_60000_tweet_sentiment_italian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_italian_60000_tweet_sentiment_italian XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_italian_60000_tweet_sentiment_italian +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_italian_60000_tweet_sentiment_italian` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_italian_60000_tweet_sentiment_italian_en_5.5.0_3.0_1727126717646.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_italian_60000_tweet_sentiment_italian_en_5.5.0_3.0_1727126717646.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_italian_60000_tweet_sentiment_italian","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_italian_60000_tweet_sentiment_italian", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_italian_60000_tweet_sentiment_italian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|443.4 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-it-60000-tweet-sentiment-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_en.md new file mode 100644 index 00000000000000..f946bf281507eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_italian_tweet_sentiment_italian XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_italian_tweet_sentiment_italian +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_italian_tweet_sentiment_italian` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_en_5.5.0_3.0_1727099902553.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_en_5.5.0_3.0_1727099902553.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_italian_tweet_sentiment_italian","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_italian_tweet_sentiment_italian", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_italian_tweet_sentiment_italian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|457.4 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-it-tweet-sentiment-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_pipeline_en.md new file mode 100644 index 00000000000000..c27e8e7562f8b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_pipeline_en_5.5.0_3.0_1727099948457.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_pipeline_en_5.5.0_3.0_1727099948457.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
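`PretrainedPipeline` also exposes `annotate()` for single strings, which is handy for spot checks. An illustrative sketch, assuming a running Spark NLP session (the Italian sentence is our own example, chosen because this checkpoint targets Italian tweet sentiment):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_pipeline", lang="en")

# annotate() runs the whole pipeline on one string and returns a dict of results.
result = pipeline.annotate("Che bella giornata, sono davvero contento!")
print(result["class"])
```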
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|457.4 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-it-tweet-sentiment-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_tweet_sentiment_french_trimmed_french_10000_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_tweet_sentiment_french_trimmed_french_10000_en.md new file mode 100644 index 00000000000000..0f73d9e1ef2359 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_tweet_sentiment_french_trimmed_french_10000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_tweet_sentiment_french_trimmed_french_10000 XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_tweet_sentiment_french_trimmed_french_10000 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_tweet_sentiment_french_trimmed_french_10000` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_french_trimmed_french_10000_en_5.5.0_3.0_1727099650809.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_french_trimmed_french_10000_en_5.5.0_3.0_1727099650809.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_tweet_sentiment_french_trimmed_french_10000","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_tweet_sentiment_french_trimmed_french_10000", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
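The template snippet uses an English sentence, but this checkpoint was fine-tuned on French tweet sentiment, so a French input is a more representative smoke test. A short sketch, assuming the `pipelineModel` fitted in the snippet above (the sentence is only an illustration):

```python
# Score an illustrative French sentence with the fitted pipeline.
test_df = spark.createDataFrame([["J'adore ce film, vraiment génial !"]]).toDF("text")
pipelineModel.transform(test_df).select("text", "class.result").show(truncate=False)
```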
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_tweet_sentiment_french_trimmed_french_10000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|349.5 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-tweet-sentiment-fr-trimmed-fr-10000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_tweet_sentiment_italian_trimmed_italian_15000_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_tweet_sentiment_italian_trimmed_italian_15000_en.md new file mode 100644 index 00000000000000..46a928bb376630 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_tweet_sentiment_italian_trimmed_italian_15000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_tweet_sentiment_italian_trimmed_italian_15000 XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_tweet_sentiment_italian_trimmed_italian_15000 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_tweet_sentiment_italian_trimmed_italian_15000` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_italian_trimmed_italian_15000_en_5.5.0_3.0_1727126677799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_italian_trimmed_italian_15000_en_5.5.0_3.0_1727126677799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_tweet_sentiment_italian_trimmed_italian_15000","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_tweet_sentiment_italian_trimmed_italian_15000", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_tweet_sentiment_italian_trimmed_italian_15000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|360.2 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-tweet-sentiment-it-trimmed-it-15000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_vaxxstance_spanish_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_vaxxstance_spanish_en.md new file mode 100644 index 00000000000000..e54cf92d4d45ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_vaxxstance_spanish_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_vaxxstance_spanish XlmRoBertaForSequenceClassification from nouman-10 +author: John Snow Labs +name: xlm_roberta_base_vaxxstance_spanish +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_vaxxstance_spanish` is a English model originally trained by nouman-10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vaxxstance_spanish_en_5.5.0_3.0_1727125849546.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vaxxstance_spanish_en_5.5.0_3.0_1727125849546.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_vaxxstance_spanish","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_vaxxstance_spanish", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
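The upstream card does not document the label set for this stance model, but you can ask the loaded annotator directly. A one-line sketch, assuming the `sequenceClassifier` loaded in the snippet above:

```python
# Print the labels this checkpoint was fine-tuned with (e.g. the stance classes).
print(sequenceClassifier.getClasses())
```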
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_vaxxstance_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/nouman-10/xlm-roberta-base_vaxxstance_spanish \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_vaxxstance_spanish_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_vaxxstance_spanish_pipeline_en.md new file mode 100644 index 00000000000000..c430cce4bb8ae9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_vaxxstance_spanish_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_vaxxstance_spanish_pipeline pipeline XlmRoBertaForSequenceClassification from nouman-10 +author: John Snow Labs +name: xlm_roberta_base_vaxxstance_spanish_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_vaxxstance_spanish_pipeline` is a English model originally trained by nouman-10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vaxxstance_spanish_pipeline_en_5.5.0_3.0_1727125931087.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vaxxstance_spanish_pipeline_en_5.5.0_3.0_1727125931087.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_vaxxstance_spanish_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_vaxxstance_spanish_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_vaxxstance_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|831.3 MB| + +## References + +https://huggingface.co/nouman-10/xlm-roberta-base_vaxxstance_spanish + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_en.md new file mode 100644 index 00000000000000..9382428145254a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_xnli_spanish_trimmed_spanish_10000 XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_xnli_spanish_trimmed_spanish_10000 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_xnli_spanish_trimmed_spanish_10000` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_en_5.5.0_3.0_1727089318019.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_en_5.5.0_3.0_1727089318019.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_spanish_trimmed_spanish_10000","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_spanish_trimmed_spanish_10000", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_xnli_spanish_trimmed_spanish_10000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|353.6 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-xnli-es-trimmed-es-10000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_pipeline_en.md new file mode 100644 index 00000000000000..1ed0a2c68f078e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_pipeline_en_5.5.0_3.0_1727089335088.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_pipeline_en_5.5.0_3.0_1727089335088.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|353.6 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-xnli-es-trimmed-es-10000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlmr_estonian_english_all_shuffled_2020_test1000_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlmr_estonian_english_all_shuffled_2020_test1000_en.md new file mode 100644 index 00000000000000..4446d63beeaf57 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlmr_estonian_english_all_shuffled_2020_test1000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_estonian_english_all_shuffled_2020_test1000 XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_estonian_english_all_shuffled_2020_test1000 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_estonian_english_all_shuffled_2020_test1000` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_estonian_english_all_shuffled_2020_test1000_en_5.5.0_3.0_1727099510614.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_estonian_english_all_shuffled_2020_test1000_en_5.5.0_3.0_1727099510614.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_estonian_english_all_shuffled_2020_test1000","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_estonian_english_all_shuffled_2020_test1000", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
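If you need the per-label scores rather than just the top prediction, they live in each annotation's metadata. A sketch, assuming the `pipelineDF` produced by the Python snippet above:

```python
from pyspark.sql import functions as F

# One row per prediction, with the raw per-label scores kept in the metadata map.
(pipelineDF
    .select(F.explode("class").alias("ann"))
    .select(F.col("ann.result").alias("label"), F.col("ann.metadata").alias("scores"))
    .show(truncate=False))
```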
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_estonian_english_all_shuffled_2020_test1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|819.1 MB| + +## References + +https://huggingface.co/patpizio/xlmr-et-en-all_shuffled-2020-test1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlmr_estonian_english_all_shuffled_2020_test1000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlmr_estonian_english_all_shuffled_2020_test1000_pipeline_en.md new file mode 100644 index 00000000000000..8fd3a51adaa510 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlmr_estonian_english_all_shuffled_2020_test1000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_estonian_english_all_shuffled_2020_test1000_pipeline pipeline XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_estonian_english_all_shuffled_2020_test1000_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_estonian_english_all_shuffled_2020_test1000_pipeline` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_estonian_english_all_shuffled_2020_test1000_pipeline_en_5.5.0_3.0_1727099627897.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_estonian_english_all_shuffled_2020_test1000_pipeline_en_5.5.0_3.0_1727099627897.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmr_estonian_english_all_shuffled_2020_test1000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmr_estonian_english_all_shuffled_2020_test1000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_estonian_english_all_shuffled_2020_test1000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|819.1 MB| + +## References + +https://huggingface.co/patpizio/xlmr-et-en-all_shuffled-2020-test1000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlmr_romanian_english_all_shuffled_764_test1000_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlmr_romanian_english_all_shuffled_764_test1000_en.md new file mode 100644 index 00000000000000..f6818a7086d018 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlmr_romanian_english_all_shuffled_764_test1000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_romanian_english_all_shuffled_764_test1000 XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_romanian_english_all_shuffled_764_test1000 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_romanian_english_all_shuffled_764_test1000` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_romanian_english_all_shuffled_764_test1000_en_5.5.0_3.0_1727099124966.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_romanian_english_all_shuffled_764_test1000_en_5.5.0_3.0_1727099124966.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_romanian_english_all_shuffled_764_test1000","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_romanian_english_all_shuffled_764_test1000", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_romanian_english_all_shuffled_764_test1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|820.4 MB| + +## References + +https://huggingface.co/patpizio/xlmr-ro-en-all_shuffled-764-test1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlmr_romanian_english_all_shuffled_764_test1000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlmr_romanian_english_all_shuffled_764_test1000_pipeline_en.md new file mode 100644 index 00000000000000..76d961336dbea0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlmr_romanian_english_all_shuffled_764_test1000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_romanian_english_all_shuffled_764_test1000_pipeline pipeline XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_romanian_english_all_shuffled_764_test1000_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_romanian_english_all_shuffled_764_test1000_pipeline` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_romanian_english_all_shuffled_764_test1000_pipeline_en_5.5.0_3.0_1727099236724.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_romanian_english_all_shuffled_764_test1000_pipeline_en_5.5.0_3.0_1727099236724.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmr_romanian_english_all_shuffled_764_test1000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmr_romanian_english_all_shuffled_764_test1000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_romanian_english_all_shuffled_764_test1000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|820.4 MB| + +## References + +https://huggingface.co/patpizio/xlmr-ro-en-all_shuffled-764-test1000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlmrobertabaseforpawsx_english_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlmrobertabaseforpawsx_english_en.md new file mode 100644 index 00000000000000..976c70f563c132 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlmrobertabaseforpawsx_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmrobertabaseforpawsx_english XlmRoBertaForSequenceClassification from ziqingyang +author: John Snow Labs +name: xlmrobertabaseforpawsx_english +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmrobertabaseforpawsx_english` is a English model originally trained by ziqingyang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmrobertabaseforpawsx_english_en_5.5.0_3.0_1727088286342.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmrobertabaseforpawsx_english_en_5.5.0_3.0_1727088286342.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmrobertabaseforpawsx_english","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmrobertabaseforpawsx_english", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmrobertabaseforpawsx_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|860.0 MB| + +## References + +https://huggingface.co/ziqingyang/XLMRobertaBaseForPAWSX-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlmrobertabaseforpawsx_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlmrobertabaseforpawsx_english_pipeline_en.md new file mode 100644 index 00000000000000..39257900f36c04 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlmrobertabaseforpawsx_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmrobertabaseforpawsx_english_pipeline pipeline XlmRoBertaForSequenceClassification from ziqingyang +author: John Snow Labs +name: xlmrobertabaseforpawsx_english_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmrobertabaseforpawsx_english_pipeline` is a English model originally trained by ziqingyang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmrobertabaseforpawsx_english_pipeline_en_5.5.0_3.0_1727088356190.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmrobertabaseforpawsx_english_pipeline_en_5.5.0_3.0_1727088356190.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmrobertabaseforpawsx_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmrobertabaseforpawsx_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmrobertabaseforpawsx_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|860.1 MB| + +## References + +https://huggingface.co/ziqingyang/XLMRobertaBaseForPAWSX-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xmlr_roberta_base_finetuned_panx_korean_en.md b/docs/_posts/ahmedlone127/2024-09-23-xmlr_roberta_base_finetuned_panx_korean_en.md new file mode 100644 index 00000000000000..43383a8aefb96f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xmlr_roberta_base_finetuned_panx_korean_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xmlr_roberta_base_finetuned_panx_korean XlmRoBertaForTokenClassification from ghks4861 +author: John Snow Labs +name: xmlr_roberta_base_finetuned_panx_korean +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xmlr_roberta_base_finetuned_panx_korean` is a English model originally trained by ghks4861. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xmlr_roberta_base_finetuned_panx_korean_en_5.5.0_3.0_1727132271934.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xmlr_roberta_base_finetuned_panx_korean_en_5.5.0_3.0_1727132271934.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xmlr_roberta_base_finetuned_panx_korean","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xmlr_roberta_base_finetuned_panx_korean", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
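+
+The `ner` column holds token-level tags. If entity chunks are more convenient than per-token labels, a `NerConverter` can be appended to the same pipeline; a hedged sketch continuing the Python example above:
+
+```python
+from sparknlp.annotator import NerConverter
+
+# Groups consecutive B-/I- tags into entity chunks.
+nerConverter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, nerConverter])
+result = pipeline.fit(data).transform(data)
+result.select("ner_chunk.result").show(truncate=False)
+```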
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xmlr_roberta_base_finetuned_panx_korean| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/ghks4861/xmlr-roberta-base-finetuned-panx-ko \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xnli_xlm_r_only_bulgarian_en.md b/docs/_posts/ahmedlone127/2024-09-23-xnli_xlm_r_only_bulgarian_en.md new file mode 100644 index 00000000000000..0b173a804d5fca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xnli_xlm_r_only_bulgarian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xnli_xlm_r_only_bulgarian XlmRoBertaForSequenceClassification from semindan +author: John Snow Labs +name: xnli_xlm_r_only_bulgarian +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xnli_xlm_r_only_bulgarian` is a English model originally trained by semindan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_bulgarian_en_5.5.0_3.0_1727126414636.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_bulgarian_en_5.5.0_3.0_1727126414636.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xnli_xlm_r_only_bulgarian","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xnli_xlm_r_only_bulgarian", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xnli_xlm_r_only_bulgarian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|803.0 MB| + +## References + +https://huggingface.co/semindan/xnli_xlm_r_only_bg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-0_0000005_0_999_rose_e_wang_en.md b/docs/_posts/ahmedlone127/2024-09-24-0_0000005_0_999_rose_e_wang_en.md new file mode 100644 index 00000000000000..360018005e6bcd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-0_0000005_0_999_rose_e_wang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 0_0000005_0_999_rose_e_wang RoBertaForSequenceClassification from rose-e-wang +author: John Snow Labs +name: 0_0000005_0_999_rose_e_wang +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`0_0000005_0_999_rose_e_wang` is a English model originally trained by rose-e-wang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/0_0000005_0_999_rose_e_wang_en_5.5.0_3.0_1727171744851.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/0_0000005_0_999_rose_e_wang_en_5.5.0_3.0_1727171744851.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("0_0000005_0_999_rose_e_wang","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("0_0000005_0_999_rose_e_wang", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
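+
+Once fitted, the pipeline can be persisted and reloaded like any Spark ML model, which avoids re-downloading the pretrained weights on every run. A sketch continuing the Python example above (the path is a placeholder):
+
+```python
+from pyspark.ml import PipelineModel
+
+# Save the fitted pipeline to durable storage (local path, HDFS, S3, ...).
+pipelineModel.write().overwrite().save("/tmp/rose_e_wang_classifier_pipeline")
+
+# Reload it later and reuse it for inference.
+restored = PipelineModel.load("/tmp/rose_e_wang_classifier_pipeline")
+restored.transform(data).select("class.result").show(truncate=False)
+```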
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|0_0000005_0_999_rose_e_wang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/rose-e-wang/0.0000005_0.999 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-0_0000005_0_999_rose_e_wang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-0_0000005_0_999_rose_e_wang_pipeline_en.md new file mode 100644 index 00000000000000..4824866730192c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-0_0000005_0_999_rose_e_wang_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 0_0000005_0_999_rose_e_wang_pipeline pipeline RoBertaForSequenceClassification from rose-e-wang +author: John Snow Labs +name: 0_0000005_0_999_rose_e_wang_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`0_0000005_0_999_rose_e_wang_pipeline` is a English model originally trained by rose-e-wang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/0_0000005_0_999_rose_e_wang_pipeline_en_5.5.0_3.0_1727171846773.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/0_0000005_0_999_rose_e_wang_pipeline_en_5.5.0_3.0_1727171846773.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("0_0000005_0_999_rose_e_wang_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("0_0000005_0_999_rose_e_wang_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|0_0000005_0_999_rose_e_wang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/rose-e-wang/0.0000005_0.999 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-1030_1_en.md b/docs/_posts/ahmedlone127/2024-09-24-1030_1_en.md new file mode 100644 index 00000000000000..49cf9b5f541d0a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-1030_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 1030_1 DistilBertForSequenceClassification from tingchih +author: John Snow Labs +name: 1030_1 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`1030_1` is a English model originally trained by tingchih. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/1030_1_en_5.5.0_3.0_1727154388487.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/1030_1_en_5.5.0_3.0_1727154388487.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("1030_1","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("1030_1", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
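+
+For low-latency inference on a handful of texts, the fitted pipeline can be wrapped in a `LightPipeline`, which runs the annotators without launching a full Spark job. A hedged sketch continuing the Python example above:
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+
+# annotate() works directly on strings, no DataFrame needed.
+print(light.annotate("I love spark-nlp")["class"])
+```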
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|1030_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tingchih/1030-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-1030_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-1030_1_pipeline_en.md new file mode 100644 index 00000000000000..64ac2aa2efe866 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-1030_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 1030_1_pipeline pipeline DistilBertForSequenceClassification from tingchih +author: John Snow Labs +name: 1030_1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`1030_1_pipeline` is a English model originally trained by tingchih. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/1030_1_pipeline_en_5.5.0_3.0_1727154406470.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/1030_1_pipeline_en_5.5.0_3.0_1727154406470.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("1030_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("1030_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|1030_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tingchih/1030-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-2020_q4_50p_filtered_random_prog_from_q3_en.md b/docs/_posts/ahmedlone127/2024-09-24-2020_q4_50p_filtered_random_prog_from_q3_en.md new file mode 100644 index 00000000000000..5400fc166749a3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-2020_q4_50p_filtered_random_prog_from_q3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2020_q4_50p_filtered_random_prog_from_q3 RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q4_50p_filtered_random_prog_from_q3 +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q4_50p_filtered_random_prog_from_q3` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q4_50p_filtered_random_prog_from_q3_en_5.5.0_3.0_1727168989425.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q4_50p_filtered_random_prog_from_q3_en_5.5.0_3.0_1727168989425.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("2020_q4_50p_filtered_random_prog_from_q3","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("2020_q4_50p_filtered_random_prog_from_q3","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
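+
+The `embeddings` column contains token-level vectors wrapped in Spark NLP annotations. To hand them to downstream Spark ML stages as plain vectors, an `EmbeddingsFinisher` can be appended; a sketch continuing the Python example above:
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+from pyspark.ml import Pipeline
+
+# Converts annotation embeddings into Spark ML vectors.
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, finisher])
+result = pipeline.fit(data).transform(data)
+result.selectExpr("explode(finished_embeddings) as vector").show(5)
+```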
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q4_50p_filtered_random_prog_from_q3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q4-50p-filtered-random-prog_from_Q3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-2020_q4_50p_filtered_random_prog_from_q3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-2020_q4_50p_filtered_random_prog_from_q3_pipeline_en.md new file mode 100644 index 00000000000000..5b20b93f1bbd08 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-2020_q4_50p_filtered_random_prog_from_q3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2020_q4_50p_filtered_random_prog_from_q3_pipeline pipeline RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q4_50p_filtered_random_prog_from_q3_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q4_50p_filtered_random_prog_from_q3_pipeline` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q4_50p_filtered_random_prog_from_q3_pipeline_en_5.5.0_3.0_1727169013666.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q4_50p_filtered_random_prog_from_q3_pipeline_en_5.5.0_3.0_1727169013666.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("2020_q4_50p_filtered_random_prog_from_q3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("2020_q4_50p_filtered_random_prog_from_q3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q4_50p_filtered_random_prog_from_q3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q4-50p-filtered-random-prog_from_Q3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-activitat3_en.md b/docs/_posts/ahmedlone127/2024-09-24-activitat3_en.md new file mode 100644 index 00000000000000..e7b5fc5a3c8915 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-activitat3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English activitat3 RoBertaForSequenceClassification from rcodina +author: John Snow Labs +name: activitat3 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`activitat3` is a English model originally trained by rcodina. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/activitat3_en_5.5.0_3.0_1727171075907.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/activitat3_en_5.5.0_3.0_1727171075907.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("activitat3","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("activitat3", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|activitat3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|426.4 MB| + +## References + +https://huggingface.co/rcodina/activitat3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-activitat3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-activitat3_pipeline_en.md new file mode 100644 index 00000000000000..3b769764772f00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-activitat3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English activitat3_pipeline pipeline RoBertaForSequenceClassification from rcodina +author: John Snow Labs +name: activitat3_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`activitat3_pipeline` is a English model originally trained by rcodina. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/activitat3_pipeline_en_5.5.0_3.0_1727171109140.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/activitat3_pipeline_en_5.5.0_3.0_1727171109140.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("activitat3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("activitat3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|activitat3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|426.4 MB| + +## References + +https://huggingface.co/rcodina/activitat3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-afro_xlmr_base_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-24-afro_xlmr_base_pipeline_xx.md new file mode 100644 index 00000000000000..32f732b7fb4b78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-afro_xlmr_base_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual afro_xlmr_base_pipeline pipeline XlmRoBertaEmbeddings from Davlan +author: John Snow Labs +name: afro_xlmr_base_pipeline +date: 2024-09-24 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`afro_xlmr_base_pipeline` is a Multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_pipeline_xx_5.5.0_3.0_1727209726819.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_pipeline_xx_5.5.0_3.0_1727209726819.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("afro_xlmr_base_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("afro_xlmr_base_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afro_xlmr_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Davlan/afro-xlmr-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-afro_xlmr_base_xx.md b/docs/_posts/ahmedlone127/2024-09-24-afro_xlmr_base_xx.md new file mode 100644 index 00000000000000..d9d92e0e59322c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-afro_xlmr_base_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual afro_xlmr_base XlmRoBertaEmbeddings from Davlan +author: John Snow Labs +name: afro_xlmr_base +date: 2024-09-24 +tags: [xx, open_source, onnx, embeddings, xlm_roberta] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`afro_xlmr_base` is a Multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_xx_5.5.0_3.0_1727209666721.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_xx_5.5.0_3.0_1727209666721.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = XlmRoBertaEmbeddings.pretrained("afro_xlmr_base","xx") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = XlmRoBertaEmbeddings.pretrained("afro_xlmr_base","xx") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
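+
+For tasks that need one vector per document rather than per token, the token embeddings can be average-pooled with the `SentenceEmbeddings` annotator; a hedged sketch continuing the Python example above:
+
+```python
+from sparknlp.annotator import SentenceEmbeddings
+from pyspark.ml import Pipeline
+
+# Average-pools the token vectors into a single document-level embedding.
+sentenceEmbeddings = SentenceEmbeddings() \
+    .setInputCols(["document", "embeddings"]) \
+    .setOutputCol("sentence_embeddings") \
+    .setPoolingStrategy("AVERAGE")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, sentenceEmbeddings])
+result = pipeline.fit(data).transform(data)
+result.select("sentence_embeddings.embeddings").show(truncate=80)
+```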
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afro_xlmr_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[xlm_roberta]| +|Language:|xx| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Davlan/afro-xlmr-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-albert_base_japanese_v1_ja.md b/docs/_posts/ahmedlone127/2024-09-24-albert_base_japanese_v1_ja.md new file mode 100644 index 00000000000000..b04afdabf53d66 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-albert_base_japanese_v1_ja.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Japanese albert_base_japanese_v1 AlbertEmbeddings from ken11 +author: John Snow Labs +name: albert_base_japanese_v1 +date: 2024-09-24 +tags: [ja, open_source, onnx, embeddings, albert] +task: Embeddings +language: ja +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_base_japanese_v1` is a Japanese model originally trained by ken11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_base_japanese_v1_ja_5.5.0_3.0_1727220084075.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_base_japanese_v1_ja_5.5.0_3.0_1727220084075.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = AlbertEmbeddings.pretrained("albert_base_japanese_v1","ja") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = AlbertEmbeddings.pretrained("albert_base_japanese_v1","ja") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_base_japanese_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence, token]| +|Output Labels:|[albert]| +|Language:|ja| +|Size:|42.8 MB| + +## References + +https://huggingface.co/ken11/albert-base-japanese-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-albert_base_japanese_v1_pipeline_ja.md b/docs/_posts/ahmedlone127/2024-09-24-albert_base_japanese_v1_pipeline_ja.md new file mode 100644 index 00000000000000..a7936abbc8b2d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-albert_base_japanese_v1_pipeline_ja.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Japanese albert_base_japanese_v1_pipeline pipeline AlbertEmbeddings from ken11 +author: John Snow Labs +name: albert_base_japanese_v1_pipeline +date: 2024-09-24 +tags: [ja, open_source, pipeline, onnx] +task: Embeddings +language: ja +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_base_japanese_v1_pipeline` is a Japanese model originally trained by ken11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_base_japanese_v1_pipeline_ja_5.5.0_3.0_1727220086439.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_base_japanese_v1_pipeline_ja_5.5.0_3.0_1727220086439.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("albert_base_japanese_v1_pipeline", lang = "ja") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("albert_base_japanese_v1_pipeline", lang = "ja") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_base_japanese_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ja| +|Size:|42.8 MB| + +## References + +https://huggingface.co/ken11/albert-base-japanese-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-albert_japanese_en.md b/docs/_posts/ahmedlone127/2024-09-24-albert_japanese_en.md new file mode 100644 index 00000000000000..b422d66c7930b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-albert_japanese_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English albert_japanese AlbertEmbeddings from ALINEAR +author: John Snow Labs +name: albert_japanese +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, albert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_japanese` is a English model originally trained by ALINEAR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_japanese_en_5.5.0_3.0_1727220080203.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_japanese_en_5.5.0_3.0_1727220080203.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = AlbertEmbeddings.pretrained("albert_japanese","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = AlbertEmbeddings.pretrained("albert_japanese","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_japanese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence, token]| +|Output Labels:|[albert]| +|Language:|en| +|Size:|42.9 MB| + +## References + +https://huggingface.co/ALINEAR/albert-japanese \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-albert_japanese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-albert_japanese_pipeline_en.md new file mode 100644 index 00000000000000..3c45a989de2aab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-albert_japanese_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English albert_japanese_pipeline pipeline AlbertEmbeddings from ALINEAR +author: John Snow Labs +name: albert_japanese_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_japanese_pipeline` is a English model originally trained by ALINEAR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_japanese_pipeline_en_5.5.0_3.0_1727220082589.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_japanese_pipeline_en_5.5.0_3.0_1727220082589.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("albert_japanese_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("albert_japanese_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_japanese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|42.9 MB| + +## References + +https://huggingface.co/ALINEAR/albert-japanese + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-albert_news_classification_tw.md b/docs/_posts/ahmedlone127/2024-09-24-albert_news_classification_tw.md new file mode 100644 index 00000000000000..208cbbee2ead41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-albert_news_classification_tw.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Twi albert_news_classification BertForSequenceClassification from clhuang +author: John Snow Labs +name: albert_news_classification +date: 2024-09-24 +tags: [tw, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: tw +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_news_classification` is a Twi model originally trained by clhuang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_news_classification_tw_5.5.0_3.0_1727213606690.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_news_classification_tw_5.5.0_3.0_1727213606690.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# An active Spark NLP session (e.g. spark = sparknlp.start()) is assumed.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; its input columns must match the stages above.
sequenceClassifier = BertForSequenceClassification.pretrained("albert_news_classification","tw") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("albert_news_classification", "tw")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
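Once the pipeline above has been run, the predicted label can be read from the `class` output column; a short sketch using the `pipelineDF` produced in the example:

```python
# "class.result" holds the predicted label for each input row.
pipelineDF.select("text", "class.result").show(truncate = False)
```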
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_news_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|tw| +|Size:|39.8 MB| + +## References + +https://huggingface.co/clhuang/albert-news-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-albert_punctuation_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-albert_punctuation_pipeline_en.md new file mode 100644 index 00000000000000..5b8573b26c98b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-albert_punctuation_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English albert_punctuation_pipeline pipeline BertForTokenClassification from Wikidepia +author: John Snow Labs +name: albert_punctuation_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_punctuation_pipeline` is a English model originally trained by Wikidepia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_punctuation_pipeline_en_5.5.0_3.0_1727203077794.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_punctuation_pipeline_en_5.5.0_3.0_1727203077794.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("albert_punctuation_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("albert_punctuation_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_punctuation_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|42.0 MB| + +## References + +https://huggingface.co/Wikidepia/albert-punctuation + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-albert_small_kor_v1_en.md b/docs/_posts/ahmedlone127/2024-09-24-albert_small_kor_v1_en.md new file mode 100644 index 00000000000000..bb0f6c4d7e8f2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-albert_small_kor_v1_en.md @@ -0,0 +1,96 @@ +--- +layout: model +title: English albert_small_kor_v1 AlbertEmbeddings from bongsoo +author: John Snow Labs +name: albert_small_kor_v1 +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, albert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_small_kor_v1` is a English model originally trained by bongsoo. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_small_kor_v1_en_5.5.0_3.0_1727158725304.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_small_kor_v1_en_5.5.0_3.0_1727158725304.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = AlbertEmbeddings.pretrained("albert_small_kor_v1","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = AlbertEmbeddings.pretrained("albert_small_kor_v1","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
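To look at the token-level vectors produced by the example above, the annotation structs in the `embeddings` column can be exploded; a sketch assuming the `pipelineDF` from that example:

```python
from pyspark.sql.functions import explode

# One annotation per token: "result" is the token text, "embeddings" is its vector.
pipelineDF.select(explode("embeddings").alias("emb")) \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(truncate = 80)
```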
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_small_kor_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|41.7 MB| + +## References + +References + +https://huggingface.co/bongsoo/albert-small-kor-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-albert_small_kor_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-albert_small_kor_v1_pipeline_en.md new file mode 100644 index 00000000000000..4f14f1098269d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-albert_small_kor_v1_pipeline_en.md @@ -0,0 +1,72 @@ +--- +layout: model +title: English albert_small_kor_v1_pipeline pipeline AlbertEmbeddings from bongsoo +author: John Snow Labs +name: albert_small_kor_v1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_small_kor_v1_pipeline` is a English model originally trained by bongsoo. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_small_kor_v1_pipeline_en_5.5.0_3.0_1727158727845.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_small_kor_v1_pipeline_en_5.5.0_3.0_1727158727845.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +pipeline = PretrainedPipeline("albert_small_kor_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) +``` +```scala +val pipeline = new PretrainedPipeline("albert_small_kor_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_small_kor_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|41.7 MB| + +## References + +References + +https://huggingface.co/bongsoo/albert-small-kor-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_small_talk_5_16_5_en.md b/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_small_talk_5_16_5_en.md new file mode 100644 index 00000000000000..9a9d6abb2bcd06 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_small_talk_5_16_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_small_talk_5_16_5 RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_small_talk_5_16_5 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_small_talk_5_16_5` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_small_talk_5_16_5_en_5.5.0_3.0_1727167978436.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_small_talk_5_16_5_en_5.5.0_3.0_1727167978436.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# An active Spark NLP session (e.g. spark = sparknlp.start()) is assumed.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; its input columns must match the stages above.
sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_small_talk_5_16_5","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_small_talk_5_16_5", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_small_talk_5_16_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-small_talk-5-16-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_small_talk_5_16_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_small_talk_5_16_5_pipeline_en.md new file mode 100644 index 00000000000000..1d5e082ecf5cf0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_small_talk_5_16_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English all_roberta_large_v1_small_talk_5_16_5_pipeline pipeline RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_small_talk_5_16_5_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_small_talk_5_16_5_pipeline` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_small_talk_5_16_5_pipeline_en_5.5.0_3.0_1727168043789.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_small_talk_5_16_5_pipeline_en_5.5.0_3.0_1727168043789.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("all_roberta_large_v1_small_talk_5_16_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("all_roberta_large_v1_small_talk_5_16_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_small_talk_5_16_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-small_talk-5-16-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_work_3_16_5_en.md b/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_work_3_16_5_en.md new file mode 100644 index 00000000000000..b3677abc2d801d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_work_3_16_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_work_3_16_5 RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_work_3_16_5 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_work_3_16_5` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_work_3_16_5_en_5.5.0_3.0_1727172010747.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_work_3_16_5_en_5.5.0_3.0_1727172010747.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# An active Spark NLP session (e.g. spark = sparknlp.start()) is assumed.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; its input columns must match the stages above.
sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_work_3_16_5","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_work_3_16_5", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_work_3_16_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-work-3-16-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_work_3_16_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_work_3_16_5_pipeline_en.md new file mode 100644 index 00000000000000..da9db2db9867ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_work_3_16_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English all_roberta_large_v1_work_3_16_5_pipeline pipeline RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_work_3_16_5_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_work_3_16_5_pipeline` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_work_3_16_5_pipeline_en_5.5.0_3.0_1727172079312.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_work_3_16_5_pipeline_en_5.5.0_3.0_1727172079312.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("all_roberta_large_v1_work_3_16_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("all_roberta_large_v1_work_3_16_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_work_3_16_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-work-3-16-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-angela_untranslated_diacritics_eval_en.md b/docs/_posts/ahmedlone127/2024-09-24-angela_untranslated_diacritics_eval_en.md new file mode 100644 index 00000000000000..a19c520b19436f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-angela_untranslated_diacritics_eval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English angela_untranslated_diacritics_eval XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_untranslated_diacritics_eval +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`angela_untranslated_diacritics_eval` is a English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_untranslated_diacritics_eval_en_5.5.0_3.0_1727147416973.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_untranslated_diacritics_eval_en_5.5.0_3.0_1727147416973.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# An active Spark NLP session (e.g. spark = sparknlp.start()) is assumed.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained token classifier; its input columns must match the stages above.
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_untranslated_diacritics_eval","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_untranslated_diacritics_eval", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
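Token-level predictions can be paired with their tokens by zipping the `token` and `ner` output columns from the example above; a sketch assuming that `pipelineDF`:

```python
from pyspark.sql import functions as F

# Zip each token with its predicted tag; both columns are produced by the pipeline above.
pipelineDF.select(F.explode(F.arrays_zip(pipelineDF.token.result, pipelineDF.ner.result)).alias("cols")) \
    .select(F.expr("cols['0']").alias("token"), F.expr("cols['1']").alias("ner_label")) \
    .show(truncate = False)
```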
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_untranslated_diacritics_eval| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_untranslated_diacritics_eval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-angela_untranslated_diacritics_eval_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-angela_untranslated_diacritics_eval_pipeline_en.md new file mode 100644 index 00000000000000..cd8b3ebb075296 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-angela_untranslated_diacritics_eval_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English angela_untranslated_diacritics_eval_pipeline pipeline XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_untranslated_diacritics_eval_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`angela_untranslated_diacritics_eval_pipeline` is a English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_untranslated_diacritics_eval_pipeline_en_5.5.0_3.0_1727147468390.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_untranslated_diacritics_eval_pipeline_en_5.5.0_3.0_1727147468390.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("angela_untranslated_diacritics_eval_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("angela_untranslated_diacritics_eval_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_untranslated_diacritics_eval_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_untranslated_diacritics_eval + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-answer_equivalence_bert_en.md b/docs/_posts/ahmedlone127/2024-09-24-answer_equivalence_bert_en.md new file mode 100644 index 00000000000000..1e4489af0453f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-answer_equivalence_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English answer_equivalence_bert BertForSequenceClassification from zli12321 +author: John Snow Labs +name: answer_equivalence_bert +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`answer_equivalence_bert` is a English model originally trained by zli12321. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/answer_equivalence_bert_en_5.5.0_3.0_1727219409393.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/answer_equivalence_bert_en_5.5.0_3.0_1727219409393.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# An active Spark NLP session (e.g. spark = sparknlp.start()) is assumed.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; its input columns must match the stages above.
sequenceClassifier = BertForSequenceClassification.pretrained("answer_equivalence_bert","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("answer_equivalence_bert", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|answer_equivalence_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/zli12321/answer_equivalence_bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-autotrain_1_xlmr_rs_53879126771_en.md b/docs/_posts/ahmedlone127/2024-09-24-autotrain_1_xlmr_rs_53879126771_en.md new file mode 100644 index 00000000000000..04cd7cb03eba82 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-autotrain_1_xlmr_rs_53879126771_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English autotrain_1_xlmr_rs_53879126771 XlmRoBertaForTokenClassification from tinyYhorm +author: John Snow Labs +name: autotrain_1_xlmr_rs_53879126771 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_1_xlmr_rs_53879126771` is a English model originally trained by tinyYhorm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_1_xlmr_rs_53879126771_en_5.5.0_3.0_1727147790963.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_1_xlmr_rs_53879126771_en_5.5.0_3.0_1727147790963.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# An active Spark NLP session (e.g. spark = sparknlp.start()) is assumed.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained token classifier; its input columns must match the stages above.
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("autotrain_1_xlmr_rs_53879126771","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("autotrain_1_xlmr_rs_53879126771", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_1_xlmr_rs_53879126771| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|770.0 MB| + +## References + +https://huggingface.co/tinyYhorm/autotrain-1-xlmr-rs-53879126771 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-autotrain_1_xlmr_rs_53879126771_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-autotrain_1_xlmr_rs_53879126771_pipeline_en.md new file mode 100644 index 00000000000000..90f7f846f00e93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-autotrain_1_xlmr_rs_53879126771_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autotrain_1_xlmr_rs_53879126771_pipeline pipeline XlmRoBertaForTokenClassification from tinyYhorm +author: John Snow Labs +name: autotrain_1_xlmr_rs_53879126771_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_1_xlmr_rs_53879126771_pipeline` is a English model originally trained by tinyYhorm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_1_xlmr_rs_53879126771_pipeline_en_5.5.0_3.0_1727147952345.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_1_xlmr_rs_53879126771_pipeline_en_5.5.0_3.0_1727147952345.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("autotrain_1_xlmr_rs_53879126771_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("autotrain_1_xlmr_rs_53879126771_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_1_xlmr_rs_53879126771_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|770.0 MB| + +## References + +https://huggingface.co/tinyYhorm/autotrain-1-xlmr-rs-53879126771 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-autotrain_hindi_ner_xlmr_869827677_en.md b/docs/_posts/ahmedlone127/2024-09-24-autotrain_hindi_ner_xlmr_869827677_en.md new file mode 100644 index 00000000000000..a2e6fddf222c91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-autotrain_hindi_ner_xlmr_869827677_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English autotrain_hindi_ner_xlmr_869827677 XlmRoBertaForTokenClassification from pujaburman30 +author: John Snow Labs +name: autotrain_hindi_ner_xlmr_869827677 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_hindi_ner_xlmr_869827677` is a English model originally trained by pujaburman30. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_hindi_ner_xlmr_869827677_en_5.5.0_3.0_1727148036528.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_hindi_ner_xlmr_869827677_en_5.5.0_3.0_1727148036528.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# An active Spark NLP session (e.g. spark = sparknlp.start()) is assumed.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained token classifier; its input columns must match the stages above.
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("autotrain_hindi_ner_xlmr_869827677","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("autotrain_hindi_ner_xlmr_869827677", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_hindi_ner_xlmr_869827677| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|770.6 MB| + +## References + +https://huggingface.co/pujaburman30/autotrain-hi_ner_xlmr-869827677 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-autotrain_hindi_ner_xlmr_869827677_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-autotrain_hindi_ner_xlmr_869827677_pipeline_en.md new file mode 100644 index 00000000000000..d668d8ca96ccab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-autotrain_hindi_ner_xlmr_869827677_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autotrain_hindi_ner_xlmr_869827677_pipeline pipeline XlmRoBertaForTokenClassification from pujaburman30 +author: John Snow Labs +name: autotrain_hindi_ner_xlmr_869827677_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_hindi_ner_xlmr_869827677_pipeline` is a English model originally trained by pujaburman30. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_hindi_ner_xlmr_869827677_pipeline_en_5.5.0_3.0_1727148187756.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_hindi_ner_xlmr_869827677_pipeline_en_5.5.0_3.0_1727148187756.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("autotrain_hindi_ner_xlmr_869827677_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("autotrain_hindi_ner_xlmr_869827677_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_hindi_ner_xlmr_869827677_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|770.6 MB| + +## References + +https://huggingface.co/pujaburman30/autotrain-hi_ner_xlmr-869827677 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_ads_classification_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_ads_classification_en.md new file mode 100644 index 00000000000000..5bd40d8c6066d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_ads_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_ads_classification BertForSequenceClassification from bondarchukb +author: John Snow Labs +name: bert_ads_classification +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_ads_classification` is a English model originally trained by bondarchukb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ads_classification_en_5.5.0_3.0_1727213686943.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ads_classification_en_5.5.0_3.0_1727213686943.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# An active Spark NLP session (e.g. spark = sparknlp.start()) is assumed.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained classifier; its input columns must match the stages above.
sequenceClassifier = BertForSequenceClassification.pretrained("bert_ads_classification","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_ads_classification", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ads_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/bondarchukb/bert-ads-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_ads_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_ads_classification_pipeline_en.md new file mode 100644 index 00000000000000..f27dcd29b13863 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_ads_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_ads_classification_pipeline pipeline BertForSequenceClassification from bondarchukb +author: John Snow Labs +name: bert_ads_classification_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_ads_classification_pipeline` is a English model originally trained by bondarchukb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ads_classification_pipeline_en_5.5.0_3.0_1727213707589.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ads_classification_pipeline_en_5.5.0_3.0_1727213707589.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_ads_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_ads_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ads_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/bondarchukb/bert-ads-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_en.md new file mode 100644 index 00000000000000..9bbb1a20c70807 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_arabic_camelbert_catalan_99189_pretrain_resampled BertForQuestionAnswering from MatMulMan +author: John Snow Labs +name: bert_base_arabic_camelbert_catalan_99189_pretrain_resampled +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_arabic_camelbert_catalan_99189_pretrain_resampled` is a English model originally trained by MatMulMan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_en_5.5.0_3.0_1727206805653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_en_5.5.0_3.0_1727206805653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# An active Spark NLP session (e.g. spark = sparknlp.start()) is assumed.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

# Load the pretrained span classifier for extractive question answering.
spanClassifier = BertForQuestionAnswering.pretrained("bert_base_arabic_camelbert_catalan_99189_pretrain_resampled","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_arabic_camelbert_catalan_99189_pretrain_resampled", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
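The extracted answer span lands in the `answer` output column of the example above; a sketch assuming that `pipelineDF` (column names follow the example):

```python
# "answer.result" contains the predicted answer span for each question/context pair.
pipelineDF.select("question", "context", "answer.result").show(truncate = False)
```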
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_arabic_camelbert_catalan_99189_pretrain_resampled| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|406.6 MB| + +## References + +https://huggingface.co/MatMulMan/bert-base-arabic-camelbert-ca-99189-pretrain_resampled \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_pipeline_en.md new file mode 100644 index 00000000000000..b1223ff71a0ad7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_pipeline pipeline BertForQuestionAnswering from MatMulMan +author: John Snow Labs +name: bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_pipeline` is a English model originally trained by MatMulMan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_pipeline_en_5.5.0_3.0_1727206827761.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_pipeline_en_5.5.0_3.0_1727206827761.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.6 MB| + +## References + +https://huggingface.co/MatMulMan/bert-base-arabic-camelbert-ca-99189-pretrain_resampled + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_squad_v2_bosnian_16_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_squad_v2_bosnian_16_en.md new file mode 100644 index 00000000000000..c42409a1595927 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_squad_v2_bosnian_16_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_cased_finetuned_squad_v2_bosnian_16 BertForQuestionAnswering from lauraparra28 +author: John Snow Labs +name: bert_base_cased_finetuned_squad_v2_bosnian_16 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_squad_v2_bosnian_16` is a English model originally trained by lauraparra28. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_v2_bosnian_16_en_5.5.0_3.0_1727176007761.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_v2_bosnian_16_en_5.5.0_3.0_1727176007761.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# An active Spark NLP session (e.g. spark = sparknlp.start()) is assumed.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

# Load the pretrained span classifier for extractive question answering.
spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_finetuned_squad_v2_bosnian_16","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_finetuned_squad_v2_bosnian_16", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_squad_v2_bosnian_16| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/lauraparra28/bert-base-cased-finetuned-squad_v2-bs_16 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_squad_v2_bosnian_16_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_squad_v2_bosnian_16_pipeline_en.md new file mode 100644 index 00000000000000..a96b3b68128091 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_squad_v2_bosnian_16_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_cased_finetuned_squad_v2_bosnian_16_pipeline pipeline BertForQuestionAnswering from lauraparra28 +author: John Snow Labs +name: bert_base_cased_finetuned_squad_v2_bosnian_16_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_squad_v2_bosnian_16_pipeline` is a English model originally trained by lauraparra28. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_v2_bosnian_16_pipeline_en_5.5.0_3.0_1727176028261.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_v2_bosnian_16_pipeline_en_5.5.0_3.0_1727176028261.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_cased_finetuned_squad_v2_bosnian_16_pipeline", lang = "en")
# df is assumed to be an existing Spark DataFrame holding the raw text column(s)
# this pretrained pipeline expects.
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_base_cased_finetuned_squad_v2_bosnian_16_pipeline", lang = "en")
// df is assumed to be an existing Spark DataFrame holding the raw text column(s)
// this pretrained pipeline expects.
val annotations = pipeline.transform(df)
```
</div>
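
A minimal follow-up sketch, assuming the Python example above has been run and that this pretrained pipeline writes its predictions to an `answer` column (as in the standalone model card for this model):

```python
# Inspect the question-answering output; the `answer` column name is an assumption.
annotations.select("answer.result").show(truncate=False)
```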
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_squad_v2_bosnian_16_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/lauraparra28/bert-base-cased-finetuned-squad_v2-bs_16 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_stsb_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_stsb_en.md new file mode 100644 index 00000000000000..22b4775abcee67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_stsb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_cased_finetuned_stsb BertForSequenceClassification from gchhablani +author: John Snow Labs +name: bert_base_cased_finetuned_stsb +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_stsb` is a English model originally trained by gchhablani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_stsb_en_5.5.0_3.0_1727218707497.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_stsb_en_5.5.0_3.0_1727218707497.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_finetuned_stsb", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_finetuned_stsb", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
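
A minimal sketch for inspecting the predictions, assuming the Python example above has already been run: the predicted label is stored in the `result` field of the `class` annotations.

```python
# Show the input text next to the predicted label; assumes `pipelineDF` from the example above.
pipelineDF.select("text", "class.result").show(truncate=False)
```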
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_stsb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/gchhablani/bert-base-cased-finetuned-stsb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_stsb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_stsb_pipeline_en.md new file mode 100644 index 00000000000000..e559e53ad59c15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_stsb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_cased_finetuned_stsb_pipeline pipeline BertForSequenceClassification from gchhablani +author: John Snow Labs +name: bert_base_cased_finetuned_stsb_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_stsb_pipeline` is a English model originally trained by gchhablani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_stsb_pipeline_en_5.5.0_3.0_1727218728753.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_stsb_pipeline_en_5.5.0_3.0_1727218728753.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_cased_finetuned_stsb_pipeline", lang = "en")
# df is assumed to be an existing Spark DataFrame holding the raw text column(s)
# this pretrained pipeline expects.
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_base_cased_finetuned_stsb_pipeline", lang = "en")
// df is assumed to be an existing Spark DataFrame holding the raw text column(s)
// this pretrained pipeline expects.
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_stsb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/gchhablani/bert-base-cased-finetuned-stsb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_jennyc_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_jennyc_en.md new file mode 100644 index 00000000000000..e0f3dfd4812526 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_jennyc_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_cased_jennyc BertForQuestionAnswering from jennyc +author: John Snow Labs +name: bert_base_cased_jennyc +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_jennyc` is a English model originally trained by jennyc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_jennyc_en_5.5.0_3.0_1727175887390.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_jennyc_en_5.5.0_3.0_1727175887390.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question/context columns into document annotations
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_jennyc", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble the raw question/context columns into document annotations
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_jennyc", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_jennyc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/jennyc/bert-base-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_scmedium_scqa2_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_scmedium_scqa2_en.md new file mode 100644 index 00000000000000..3a6bfecb642986 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_scmedium_scqa2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_cased_scmedium_scqa2 BertForQuestionAnswering from CambridgeMolecularEngineering +author: John Snow Labs +name: bert_base_cased_scmedium_scqa2 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_scmedium_scqa2` is a English model originally trained by CambridgeMolecularEngineering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_scmedium_scqa2_en_5.5.0_3.0_1727175347706.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_scmedium_scqa2_en_5.5.0_3.0_1727175347706.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question/context columns into document annotations
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_scmedium_scqa2", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble the raw question/context columns into document annotations
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_scmedium_scqa2", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_scmedium_scqa2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/CambridgeMolecularEngineering/bert-base-cased-scmedium-scqa2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_scmedium_scqa2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_scmedium_scqa2_pipeline_en.md new file mode 100644 index 00000000000000..3dedc015585163 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_scmedium_scqa2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_cased_scmedium_scqa2_pipeline pipeline BertForQuestionAnswering from CambridgeMolecularEngineering +author: John Snow Labs +name: bert_base_cased_scmedium_scqa2_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_scmedium_scqa2_pipeline` is a English model originally trained by CambridgeMolecularEngineering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_scmedium_scqa2_pipeline_en_5.5.0_3.0_1727175369752.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_scmedium_scqa2_pipeline_en_5.5.0_3.0_1727175369752.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_cased_scmedium_scqa2_pipeline", lang = "en")
# df is assumed to be an existing Spark DataFrame holding the raw text column(s)
# this pretrained pipeline expects.
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_base_cased_scmedium_scqa2_pipeline", lang = "en")
// df is assumed to be an existing Spark DataFrame holding the raw text column(s)
// this pretrained pipeline expects.
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_scmedium_scqa2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/CambridgeMolecularEngineering/bert-base-cased-scmedium-scqa2 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_chinese_finetuned_question_answering_4_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_chinese_finetuned_question_answering_4_en.md new file mode 100644 index 00000000000000..96503d1d0fe3e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_chinese_finetuned_question_answering_4_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_chinese_finetuned_question_answering_4 BertForQuestionAnswering from jazzson +author: John Snow Labs +name: bert_base_chinese_finetuned_question_answering_4 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_chinese_finetuned_question_answering_4` is a English model originally trained by jazzson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_question_answering_4_en_5.5.0_3.0_1727217040537.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_question_answering_4_en_5.5.0_3.0_1727217040537.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question/context columns into document annotations
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_chinese_finetuned_question_answering_4", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble the raw question/context columns into document annotations
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_chinese_finetuned_question_answering_4", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_chinese_finetuned_question_answering_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|381.1 MB| + +## References + +https://huggingface.co/jazzson/bert-base-chinese-finetuned-question-answering-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_chinese_finetuned_question_answering_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_chinese_finetuned_question_answering_4_pipeline_en.md new file mode 100644 index 00000000000000..914b7a745201a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_chinese_finetuned_question_answering_4_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_chinese_finetuned_question_answering_4_pipeline pipeline BertForQuestionAnswering from jazzson +author: John Snow Labs +name: bert_base_chinese_finetuned_question_answering_4_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_chinese_finetuned_question_answering_4_pipeline` is a English model originally trained by jazzson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_question_answering_4_pipeline_en_5.5.0_3.0_1727217060142.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_question_answering_4_pipeline_en_5.5.0_3.0_1727217060142.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_chinese_finetuned_question_answering_4_pipeline", lang = "en")
# df is assumed to be an existing Spark DataFrame holding the raw text column(s)
# this pretrained pipeline expects.
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_base_chinese_finetuned_question_answering_4_pipeline", lang = "en")
// df is assumed to be an existing Spark DataFrame holding the raw text column(s)
// this pretrained pipeline expects.
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_chinese_finetuned_question_answering_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|381.1 MB| + +## References + +https://huggingface.co/jazzson/bert-base-chinese-finetuned-question-answering-4 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_chinese_finetuned_question_answering_6_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_chinese_finetuned_question_answering_6_en.md new file mode 100644 index 00000000000000..8fdbc4ebdbd95c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_chinese_finetuned_question_answering_6_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_chinese_finetuned_question_answering_6 BertForQuestionAnswering from jazzson +author: John Snow Labs +name: bert_base_chinese_finetuned_question_answering_6 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_chinese_finetuned_question_answering_6` is a English model originally trained by jazzson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_question_answering_6_en_5.5.0_3.0_1727216906697.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_question_answering_6_en_5.5.0_3.0_1727216906697.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question/context columns into document annotations
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_chinese_finetuned_question_answering_6", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble the raw question/context columns into document annotations
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_chinese_finetuned_question_answering_6", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_chinese_finetuned_question_answering_6| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|381.0 MB| + +## References + +https://huggingface.co/jazzson/bert-base-chinese-finetuned-question-answering-6 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_english_greek_modern_cased_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_english_greek_modern_cased_en.md new file mode 100644 index 00000000000000..aa807f51a017e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_english_greek_modern_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_english_greek_modern_cased BertEmbeddings from Geotrend +author: John Snow Labs +name: bert_base_english_greek_modern_cased +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_english_greek_modern_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_english_greek_modern_cased_en_5.5.0_3.0_1727162039629.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_english_greek_modern_cased_en_5.5.0_3.0_1727162039629.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("bert_base_english_greek_modern_cased", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("bert_base_english_greek_modern_cased", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
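
A minimal sketch for inspecting the embeddings, assuming the Python example above has already been run: each annotation carries the token text in `result` and its vector in `embeddings`.

```python
# Flatten the token-level annotations into (token, vector) rows; assumes `pipelineDF` from above.
pipelineDF.selectExpr("explode(embeddings) AS emb") \
    .selectExpr("emb.result AS token", "emb.embeddings AS vector") \
    .show(truncate=False)
```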
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_english_greek_modern_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|408.4 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-el-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_english_greek_modern_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_english_greek_modern_cased_pipeline_en.md new file mode 100644 index 00000000000000..531b76490a8e47 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_english_greek_modern_cased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_english_greek_modern_cased_pipeline pipeline BertEmbeddings from Geotrend +author: John Snow Labs +name: bert_base_english_greek_modern_cased_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_english_greek_modern_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_english_greek_modern_cased_pipeline_en_5.5.0_3.0_1727162061121.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_english_greek_modern_cased_pipeline_en_5.5.0_3.0_1727162061121.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_english_greek_modern_cased_pipeline", lang = "en")
# df is assumed to be an existing Spark DataFrame holding the raw text column(s)
# this pretrained pipeline expects.
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_base_english_greek_modern_cased_pipeline", lang = "en")
// df is assumed to be an existing Spark DataFrame holding the raw text column(s)
// this pretrained pipeline expects.
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_english_greek_modern_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.4 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-el-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_english_greek_modern_russian_cased_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_english_greek_modern_russian_cased_en.md new file mode 100644 index 00000000000000..399901617ca5e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_english_greek_modern_russian_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_english_greek_modern_russian_cased BertEmbeddings from Geotrend +author: John Snow Labs +name: bert_base_english_greek_modern_russian_cased +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_english_greek_modern_russian_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_english_greek_modern_russian_cased_en_5.5.0_3.0_1727161619829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_english_greek_modern_russian_cased_en_5.5.0_3.0_1727161619829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("bert_base_english_greek_modern_russian_cased", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("bert_base_english_greek_modern_russian_cased", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_english_greek_modern_russian_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|433.2 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-el-ru-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_based_encoder_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_based_encoder_pipeline_xx.md new file mode 100644 index 00000000000000..66843946d3f942 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_based_encoder_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_based_encoder_pipeline pipeline BertEmbeddings from shsha0110 +author: John Snow Labs +name: bert_base_multilingual_cased_based_encoder_pipeline +date: 2024-09-24 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_based_encoder_pipeline` is a Multilingual model originally trained by shsha0110. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_based_encoder_pipeline_xx_5.5.0_3.0_1727200645774.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_based_encoder_pipeline_xx_5.5.0_3.0_1727200645774.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_multilingual_cased_based_encoder_pipeline", lang = "xx")
# df is assumed to be an existing Spark DataFrame holding the raw text column(s)
# this pretrained pipeline expects.
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_base_multilingual_cased_based_encoder_pipeline", lang = "xx")
// df is assumed to be an existing Spark DataFrame holding the raw text column(s)
// this pretrained pipeline expects.
val annotations = pipeline.transform(df)
```
</div>
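
A minimal follow-up sketch, assuming the Python example above has been run and that this pretrained pipeline exposes its token vectors in an `embeddings` column (as in the standalone model card for this model):

```python
# Inspect the token embeddings; the `embeddings` column name is an assumption.
annotations.selectExpr("explode(embeddings) AS emb") \
    .selectExpr("emb.result AS token", "emb.embeddings AS vector") \
    .show(truncate=False)
```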
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_based_encoder_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|664.9 MB| + +## References + +https://huggingface.co/shsha0110/bert-base-multilingual-cased-based-encoder + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_based_encoder_xx.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_based_encoder_xx.md new file mode 100644 index 00000000000000..70723b56381235 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_based_encoder_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_based_encoder BertEmbeddings from shsha0110 +author: John Snow Labs +name: bert_base_multilingual_cased_based_encoder +date: 2024-09-24 +tags: [xx, open_source, onnx, embeddings, bert] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_based_encoder` is a Multilingual model originally trained by shsha0110. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_based_encoder_xx_5.5.0_3.0_1727200610130.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_based_encoder_xx_5.5.0_3.0_1727200610130.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("bert_base_multilingual_cased_based_encoder", "xx") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("bert_base_multilingual_cased_based_encoder", "xx")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_based_encoder| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|xx| +|Size:|664.9 MB| + +## References + +https://huggingface.co/shsha0110/bert-base-multilingual-cased-based-encoder \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_finetuned_rqa_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_finetuned_rqa_pipeline_xx.md new file mode 100644 index 00000000000000..21c4df879a1199 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_finetuned_rqa_pipeline_xx.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_finetuned_rqa_pipeline pipeline BertForQuestionAnswering from AsifAbrar6 +author: John Snow Labs +name: bert_base_multilingual_cased_finetuned_rqa_pipeline +date: 2024-09-24 +tags: [xx, open_source, pipeline, onnx] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_finetuned_rqa_pipeline` is a Multilingual model originally trained by AsifAbrar6. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_rqa_pipeline_xx_5.5.0_3.0_1727163292797.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_rqa_pipeline_xx_5.5.0_3.0_1727163292797.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_multilingual_cased_finetuned_rqa_pipeline", lang = "xx")
# df is assumed to be an existing Spark DataFrame holding the raw text column(s)
# this pretrained pipeline expects.
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_base_multilingual_cased_finetuned_rqa_pipeline", lang = "xx")
// df is assumed to be an existing Spark DataFrame holding the raw text column(s)
// this pretrained pipeline expects.
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_finetuned_rqa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/AsifAbrar6/bert-base-multilingual-cased-finetuned-RQA + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_finetuned_rqa_xx.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_finetuned_rqa_xx.md new file mode 100644 index 00000000000000..8e7f34e6ae6318 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_finetuned_rqa_xx.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_finetuned_rqa BertForQuestionAnswering from AsifAbrar6 +author: John Snow Labs +name: bert_base_multilingual_cased_finetuned_rqa +date: 2024-09-24 +tags: [xx, open_source, onnx, question_answering, bert] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_finetuned_rqa` is a Multilingual model originally trained by AsifAbrar6. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_rqa_xx_5.5.0_3.0_1727163256211.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_rqa_xx_5.5.0_3.0_1727163256211.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question/context columns into document annotations
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_multilingual_cased_finetuned_rqa", "xx") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble the raw question/context columns into document annotations
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_multilingual_cased_finetuned_rqa", "xx")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_finetuned_rqa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/AsifAbrar6/bert-base-multilingual-cased-finetuned-RQA \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_sv2_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_sv2_pipeline_xx.md new file mode 100644 index 00000000000000..573f25548a615b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_sv2_pipeline_xx.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_sv2_pipeline pipeline BertForQuestionAnswering from monakth +author: John Snow Labs +name: bert_base_multilingual_cased_sv2_pipeline +date: 2024-09-24 +tags: [xx, open_source, pipeline, onnx] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_sv2_pipeline` is a Multilingual model originally trained by monakth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_sv2_pipeline_xx_5.5.0_3.0_1727175396956.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_sv2_pipeline_xx_5.5.0_3.0_1727175396956.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_multilingual_cased_sv2_pipeline", lang = "xx")
# df is assumed to be an existing Spark DataFrame holding the raw text column(s)
# this pretrained pipeline expects.
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_base_multilingual_cased_sv2_pipeline", lang = "xx")
// df is assumed to be an existing Spark DataFrame holding the raw text column(s)
// this pretrained pipeline expects.
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_sv2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/monakth/bert-base-multilingual-cased-sv2 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_paws_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_paws_en.md new file mode 100644 index 00000000000000..16a482083b41d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_paws_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_paws BertForSequenceClassification from harouzie +author: John Snow Labs +name: bert_base_paws +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_paws` is a English model originally trained by harouzie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_paws_en_5.5.0_3.0_1727218976635.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_paws_en_5.5.0_3.0_1727218976635.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_paws", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_paws", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_paws| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/harouzie/bert-base-paws \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_en.md new file mode 100644 index 00000000000000..e419c295a5b567 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_en_5.5.0_3.0_1727216919379.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_en_5.5.0_3.0_1727216919379.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question/context columns into document annotations
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble the raw question/context columns into document annotations
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240914224146 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_pipeline_en.md new file mode 100644 index 00000000000000..b5b6b44ea335e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_pipeline_en_5.5.0_3.0_1727216940946.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_pipeline_en_5.5.0_3.0_1727216940946.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_pipeline", lang = "en")
# df is assumed to be an existing Spark DataFrame holding the raw text column(s)
# this pretrained pipeline expects.
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_pipeline", lang = "en")
// df is assumed to be an existing Spark DataFrame holding the raw text column(s)
// this pretrained pipeline expects.
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240914224146 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_emotion_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_emotion_en.md new file mode 100644 index 00000000000000..590d0e7c8f9723 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_emotion_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_emotion DistilBertForSequenceClassification from isom5240sp24 +author: John Snow Labs +name: bert_base_uncased_emotion +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_emotion` is a English model originally trained by isom5240sp24. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_emotion_en_5.5.0_3.0_1727205014862.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_emotion_en_5.5.0_3.0_1727205014862.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Classify each document using the tokenized text
sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_base_uncased_emotion","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_base_uncased_emotion", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
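After the pipeline runs, the predicted label for each row is stored as an annotation in the `class` column configured above. A small sketch for inspecting it (assuming the snippet above has been run):

```python
# Show each input text next to its predicted label
pipelineDF.select("text", "class.result").show(truncate=False)
```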
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_emotion| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/isom5240sp24/bert-base-uncased-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md new file mode 100644 index 00000000000000..14c5da998550f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727175500437.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727175500437.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question and context columns into annotated documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
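The answer span predicted by the model is returned as an annotation in the `answer` column configured above. A minimal sketch for pairing each question with its extracted answer (assuming the snippet above has been run):

```python
# Pair each question with the answer text extracted from its context
pipelineDF.select("document_question.result", "answer.result").show(truncate=False)
```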
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.0-b-32-lr-8e-07-dp-0.5-ss-0-st-True-fh-False-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..078bd909b5058c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727175521932.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727175521932.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input; column names are assumed to match the pipeline's expected inputs
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipeline = PretrainedPipeline("bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input; column names are assumed to match the pipeline's expected inputs
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipeline = new PretrainedPipeline("bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.0-b-32-lr-8e-07-dp-0.5-ss-0-st-True-fh-False-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..c8e9a8461e62ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_100_southern_sotho_false_fh_true_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_100_southern_sotho_false_fh_true_hs_0_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_100_southern_sotho_false_fh_true_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727176181678.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727176181678.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input; column names are assumed to match the pipeline's expected inputs
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipeline = PretrainedPipeline("bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_100_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input; column names are assumed to match the pipeline's expected inputs
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipeline = new PretrainedPipeline("bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_100_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_100_southern_sotho_false_fh_true_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.0-b-32-lr-8e-07-dp-0.5-ss-100-st-False-fh-True-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_en.md new file mode 100644 index 00000000000000..5cc6dd6f85dc90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727176070330.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727176070330.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question and context columns into annotated documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
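Like other transformer-based annotators in Spark NLP, the question-answering annotator exposes a few commonly tuned options. The sketch below is illustrative only; the parameter values are placeholders, not recommendations for this particular checkpoint:

```python
# Hypothetical configuration sketch: lower-case the inputs and cap the encoded sequence length
spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer") \
    .setCaseSensitive(False) \
    .setMaxSentenceLength(512)
```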
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.45-b-32-lr-4e-06-dp-0.1-ss-300-st-False-fh-True-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..b997466c26921e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727176091440.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727176091440.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input; column names are assumed to match the pipeline's expected inputs
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipeline = PretrainedPipeline("bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input; column names are assumed to match the pipeline's expected inputs
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipeline = new PretrainedPipeline("bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.45-b-32-lr-4e-06-dp-0.1-ss-300-st-False-fh-True-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_2_25_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_2_25_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md new file mode 100644 index 00000000000000..5ed2fe4095072d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_2_25_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_2_25_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_2_25_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_2_25_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_25_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727175918804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_25_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727175918804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question and context columns into annotated documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_2_25_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_2_25_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_2_25_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-2.25-b-32-lr-8e-07-dp-0.5-ss-0-st-True-fh-False-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_en.md new file mode 100644 index 00000000000000..c1f55cdc6675ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727163481076.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727163481076.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question and context columns into annotated documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-2.69-b-32-lr-4e-06-dp-0.1-ss-0-st-True-fh-False-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..64da160f2c69da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727163506920.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727163506920.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input; column names are assumed to match the pipeline's expected inputs
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipeline = PretrainedPipeline("bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input; column names are assumed to match the pipeline's expected inputs
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipeline = new PretrainedPipeline("bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-2.69-b-32-lr-4e-06-dp-0.1-ss-0-st-True-fh-False-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_en.md new file mode 100644 index 00000000000000..416a3de611bbdd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727175786858.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727175786858.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question and context columns into annotated documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-3.11-b-32-lr-8e-07-dp-0.5-ss-700-st-False-fh-True-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..8c27c535706bd9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727175807534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727175807534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input; column names are assumed to match the pipeline's expected inputs
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipeline = PretrainedPipeline("bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input; column names are assumed to match the pipeline's expected inputs
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipeline = new PretrainedPipeline("bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-3.11-b-32-lr-8e-07-dp-0.5-ss-700-st-False-fh-True-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_en.md new file mode 100644 index 00000000000000..714f6c8676e656 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_en_5.5.0_3.0_1727163199936.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_en_5.5.0_3.0_1727163199936.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question and context columns into annotated documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-3.44-b-32-lr-8e-07-dp-0.5-ss-0-st-False-fh-False-hs-800 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_pipeline_en.md new file mode 100644 index 00000000000000..ab2c99f6807bc2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_pipeline_en_5.5.0_3.0_1727163220743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_pipeline_en_5.5.0_3.0_1727163220743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input; column names are assumed to match the pipeline's expected inputs
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipeline = PretrainedPipeline("bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input; column names are assumed to match the pipeline's expected inputs
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipeline = new PretrainedPipeline("bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-3.44-b-32-lr-8e-07-dp-0.5-ss-0-st-False-fh-False-hs-800 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md new file mode 100644 index 00000000000000..35e46589b89d3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727163813410.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727163813410.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question and context columns into annotated documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-4.87-b-32-lr-4e-07-dp-0.5-ss-0-st-True-fh-False-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..848d56f9b9e697 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727163833689.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727163833689.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input; column names are assumed to match the pipeline's expected inputs
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipeline = PretrainedPipeline("bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input; column names are assumed to match the pipeline's expected inputs
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipeline = new PretrainedPipeline("bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-4.87-b-32-lr-4e-07-dp-0.5-ss-0-st-True-fh-False-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_en.md new file mode 100644 index 00000000000000..8cec5288f868cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6 BertForQuestionAnswering from anas-awadalla +author: John Snow Labs +name: bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6` is a English model originally trained by anas-awadalla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_en_5.5.0_3.0_1727206732278.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_en_5.5.0_3.0_1727206732278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question and context columns into annotated documents
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/anas-awadalla/bert-base-uncased-few-shot-k-64-finetuned-squad-seed-6 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_pipeline_en.md new file mode 100644 index 00000000000000..124f8c6705c672 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_pipeline pipeline BertForQuestionAnswering from anas-awadalla +author: John Snow Labs +name: bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_pipeline` is a English model originally trained by anas-awadalla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_pipeline_en_5.5.0_3.0_1727206753795.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_pipeline_en_5.5.0_3.0_1727206753795.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Example input; column names are assumed to match the pipeline's expected inputs
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")

pipeline = PretrainedPipeline("bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Example input; column names are assumed to match the pipeline's expected inputs
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")

val pipeline = new PretrainedPipeline("bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/anas-awadalla/bert-base-uncased-few-shot-k-64-finetuned-squad-seed-6 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_en.md new file mode 100644 index 00000000000000..920f39b1a2953d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_en_5.5.0_3.0_1727163793729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_en_5.5.0_3.0_1727163793729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question/context columns into Spark NLP document annotations.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble the raw question/context columns into Spark NLP document annotations.
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
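The `answer` column produced above holds Spark NLP annotations, so the predicted span has to be pulled out of the `result` field. A minimal sketch for inspecting the prediction, assuming only the column names used in the example (`document_question`, `answer`):

```python
from pyspark.sql import functions as F

# Each column is an array of annotations; `result` carries the text payload.
pipelineDF.select(
    F.expr("document_question.result").alias("question"),
    F.expr("answer.result").alias("answer"),
    F.expr("answer.metadata").alias("answer_metadata"),
).show(truncate=False)
```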
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-0.5-lr-1e-05-wd-0.001-dp-0.2-ss-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_pipeline_en.md new file mode 100644 index 00000000000000..dd2821e259194f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_pipeline_en_5.5.0_3.0_1727163816652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_pipeline_en_5.5.0_3.0_1727163816652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame holding the raw question/context text columns expected by the pipeline's first stage.
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame holding the raw question/context text columns expected by the pipeline's first stage.
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-0.5-lr-1e-05-wd-0.001-dp-0.2-ss-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_en.md new file mode 100644 index 00000000000000..73a80ded98edb9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_en_5.5.0_3.0_1727175641920.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_en_5.5.0_3.0_1727175641920.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question/context columns into Spark NLP document annotations.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble the raw question/context columns into Spark NLP document annotations.
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-0.9-lr-1e-06-wd-0.001-dp-0.99999-ss-160000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_pipeline_en.md new file mode 100644 index 00000000000000..3aa865f5649e2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_pipeline_en_5.5.0_3.0_1727175665424.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_pipeline_en_5.5.0_3.0_1727175665424.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame holding the raw question/context text columns expected by the pipeline's first stage.
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame holding the raw question/context text columns expected by the pipeline's first stage.
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-0.9-lr-1e-06-wd-0.001-dp-0.99999-ss-160000 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_en.md new file mode 100644 index 00000000000000..190936d8112520 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_en_5.5.0_3.0_1727175781415.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_en_5.5.0_3.0_1727175781415.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question/context columns into Spark NLP document annotations.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble the raw question/context columns into Spark NLP document annotations.
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-0.9-lr-1e-06-wd-0.001-dp-0.99999-ss-80000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_pipeline_en.md new file mode 100644 index 00000000000000..1715f6974faa7f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_pipeline_en_5.5.0_3.0_1727175802203.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_pipeline_en_5.5.0_3.0_1727175802203.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame holding the raw question/context text columns expected by the pipeline's first stage.
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame holding the raw question/context text columns expected by the pipeline's first stage.
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-0.9-lr-1e-06-wd-0.001-dp-0.99999-ss-80000 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_en.md new file mode 100644 index 00000000000000..c317e093abae86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_en_5.5.0_3.0_1727163262037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_en_5.5.0_3.0_1727163262037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question/context columns into Spark NLP document annotations.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble the raw question/context columns into Spark NLP document annotations.
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
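Since scoring happens through a regular Spark `transform`, the same fitted pipeline can answer many question/context pairs in one pass. A small sketch with a made-up two-row batch, reusing `pipelineModel` from the example above:

```python
# Hypothetical batch; every row is scored in a single distributed pass.
batch = spark.createDataFrame(
    [
        ("Which library does this card describe?", "This card documents a Spark NLP question answering model."),
        ("What task was the model tuned for?", "The checkpoint was fine-tuned on SQuAD for extractive question answering."),
    ],
    ["question", "context"],
)

pipelineModel.transform(batch).select("question", "answer.result").show(truncate=False)
```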
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-10.0-lr-4e-07-wd-1e-05-dp-1.0-ss-0-st-False-fh-False-hs-1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline_en.md new file mode 100644 index 00000000000000..e13467368a96e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline_en_5.5.0_3.0_1727163284975.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline_en_5.5.0_3.0_1727163284975.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame holding the raw question/context text columns expected by the pipeline's first stage.
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame holding the raw question/context text columns expected by the pipeline's first stage.
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-10.0-lr-4e-07-wd-1e-05-dp-1.0-ss-0-st-False-fh-False-hs-1000 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_en.md new file mode 100644 index 00000000000000..7610ab309acbd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_en_5.5.0_3.0_1727163461473.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_en_5.5.0_3.0_1727163461473.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question/context columns into Spark NLP document annotations.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble the raw question/context columns into Spark NLP document annotations.
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-05-wd-0.001-dp-0.2-ss-700-st-False-fh-True-hs-666 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_pipeline_en.md new file mode 100644 index 00000000000000..c279529dde5d78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_pipeline_en_5.5.0_3.0_1727163482205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_pipeline_en_5.5.0_3.0_1727163482205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame holding the raw question/context text columns expected by the pipeline's first stage.
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame holding the raw question/context text columns expected by the pipeline's first stage.
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-05-wd-0.001-dp-0.2-ss-700-st-False-fh-True-hs-666 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_en.md new file mode 100644 index 00000000000000..15dd0420b096d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_en_5.5.0_3.0_1727163926396.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_en_5.5.0_3.0_1727163926396.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question/context columns into Spark NLP document annotations.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble the raw question/context columns into Spark NLP document annotations.
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
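If the defaults do not suit your cluster, the loaded annotator can be tuned before it goes into the pipeline. A sketch assuming the standard `BertForQuestionAnswering` inference parameters (batch size, case sensitivity, maximum sequence length); the values shown are illustrative only:

```python
from sparknlp.annotator import BertForQuestionAnswering

# Reload the checkpoint and adjust the usual inference knobs:
#  - batchSize: rows scored per forward pass
#  - caseSensitive: this is an uncased checkpoint
#  - maxSentenceLength: BERT's 512-token limit
spanClassifier = (
    BertForQuestionAnswering.pretrained(
        "bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true", "en"
    )
    .setInputCols(["document_question", "document_context"])
    .setOutputCol("answer")
    .setBatchSize(8)
    .setCaseSensitive(False)
    .setMaxSentenceLength(512)
)
```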
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-06-wd-0.001-dp-0.2-ss-700-st-True-fh-True \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_pipeline_en.md new file mode 100644 index 00000000000000..c1964e88d0f231 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_pipeline_en_5.5.0_3.0_1727163946848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_pipeline_en_5.5.0_3.0_1727163946848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame holding the raw question/context text columns expected by the pipeline's first stage.
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame holding the raw question/context text columns expected by the pipeline's first stage.
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-06-wd-0.001-dp-0.2-ss-700-st-True-fh-True + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_en.md new file mode 100644 index 00000000000000..51dc228f9b10cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_en_5.5.0_3.0_1727163343938.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_en_5.5.0_3.0_1727163343938.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question/context columns into Spark NLP document annotations.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble the raw question/context columns into Spark NLP document annotations.
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.29-lr-4e-07-wd-1e-05-dp-1.0-ss-0-st-False-fh-False-hs-300 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_pipeline_en.md new file mode 100644 index 00000000000000..adc2a55a7e997d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_pipeline_en_5.5.0_3.0_1727163364437.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_pipeline_en_5.5.0_3.0_1727163364437.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is a DataFrame holding the raw question/context text columns expected by the pipeline's first stage.
pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is a DataFrame holding the raw question/context text columns expected by the pipeline's first stage.
val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.29-lr-4e-07-wd-1e-05-dp-1.0-ss-0-st-False-fh-False-hs-300 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_4_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_4_en.md new file mode 100644 index 00000000000000..23d3170e08ec80 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_4_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_4 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_4 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_4` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_4_en_5.5.0_3.0_1727175347800.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_4_en_5.5.0_3.0_1727175347800.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question/context columns into Spark NLP document annotations.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_4","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble the raw question/context columns into Spark NLP document annotations.
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_4", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-0.0001-wd-0.001-dp-0.4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_en.md new file mode 100644 index 00000000000000..e3cfcf21aca705 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_en_5.5.0_3.0_1727176190428.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_en_5.5.0_3.0_1727176190428.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble the raw question/context columns into Spark NLP document annotations.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Assemble the raw question/context columns into Spark NLP document annotations.
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-1e-06-wd-0.001-dp-0.99999-ss-50000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_pipeline_en.md new file mode 100644 index 00000000000000..a4dc4a1bd3b4e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_pipeline_en_5.5.0_3.0_1727176210563.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_pipeline_en_5.5.0_3.0_1727176210563.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
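+
+The `df` above is assumed to already contain the raw question and context text. A minimal, hypothetical sketch for building it and reading the predictions (the `question`/`context` input columns and the `answer` output column are assumptions based on the standalone model example) could be:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame; the pipeline's MultiDocumentAssembler is assumed
+# to read raw "question" and "context" columns.
+df = spark.createDataFrame(
+    [["What framework do I use?", "I use spark-nlp."]]
+).toDF("question", "context")
+
+pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+annotations.select("answer.result").show(truncate=False)
+```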
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-1e-06-wd-0.001-dp-0.99999-ss-50000 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_en.md new file mode 100644 index 00000000000000..f36ec8ae280b3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_en_5.5.0_3.0_1727163618337.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_en_5.5.0_3.0_1727163618337.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
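+
+For quick, single-example inference the fitted model can also be wrapped in a `LightPipeline`, which avoids building a DataFrame. This is only a sketch, continuing from the Python example above; passing the question and context as the two `fullAnnotate` arguments is an assumption about how the MultiDocumentAssembler inputs are mapped:
+
+```python
+from sparknlp.base import LightPipeline
+
+# Continues the Python example above: `pipelineModel` is the fitted pipeline.
+light = LightPipeline(pipelineModel)
+result = light.fullAnnotate("What framework do I use?", "I use spark-nlp.")
+
+# `result` is a list with one dict per input; the "answer" key holds the annotations.
+print(result[0]["answer"])
+```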
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-4e-05-wd-0.001-dp-0.999 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_pipeline_en.md new file mode 100644 index 00000000000000..8087939eaa1602 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_pipeline_en_5.5.0_3.0_1727163638907.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_pipeline_en_5.5.0_3.0_1727163638907.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-4e-05-wd-0.001-dp-0.999 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_06_wd_0_01_dp_0_2_swati_0_southern_sotho_true_fh_true_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_06_wd_0_01_dp_0_2_swati_0_southern_sotho_true_fh_true_pipeline_en.md new file mode 100644 index 00000000000000..4c90981886ae6b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_06_wd_0_01_dp_0_2_swati_0_southern_sotho_true_fh_true_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_4e_06_wd_0_01_dp_0_2_swati_0_southern_sotho_true_fh_true_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_4e_06_wd_0_01_dp_0_2_swati_0_southern_sotho_true_fh_true_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_4e_06_wd_0_01_dp_0_2_swati_0_southern_sotho_true_fh_true_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_4e_06_wd_0_01_dp_0_2_swati_0_southern_sotho_true_fh_true_pipeline_en_5.5.0_3.0_1727175524689.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_4e_06_wd_0_01_dp_0_2_swati_0_southern_sotho_true_fh_true_pipeline_en_5.5.0_3.0_1727175524689.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_4e_06_wd_0_01_dp_0_2_swati_0_southern_sotho_true_fh_true_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_4e_06_wd_0_01_dp_0_2_swati_0_southern_sotho_true_fh_true_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_4e_06_wd_0_01_dp_0_2_swati_0_southern_sotho_true_fh_true_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-4e-06-wd-0.01-dp-0.2-ss-0-st-True-fh-True + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_3_0_lr_1e_06_wd_0_001_dp_0_2_swati_8228_southern_sotho_false_fh_true_hs_666_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_3_0_lr_1e_06_wd_0_001_dp_0_2_swati_8228_southern_sotho_false_fh_true_hs_666_en.md new file mode 100644 index 00000000000000..e4074b65f8fa7e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_3_0_lr_1e_06_wd_0_001_dp_0_2_swati_8228_southern_sotho_false_fh_true_hs_666_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_3_0_lr_1e_06_wd_0_001_dp_0_2_swati_8228_southern_sotho_false_fh_true_hs_666 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_3_0_lr_1e_06_wd_0_001_dp_0_2_swati_8228_southern_sotho_false_fh_true_hs_666 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_3_0_lr_1e_06_wd_0_001_dp_0_2_swati_8228_southern_sotho_false_fh_true_hs_666` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_3_0_lr_1e_06_wd_0_001_dp_0_2_swati_8228_southern_sotho_false_fh_true_hs_666_en_5.5.0_3.0_1727175825323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_3_0_lr_1e_06_wd_0_001_dp_0_2_swati_8228_southern_sotho_false_fh_true_hs_666_en_5.5.0_3.0_1727175825323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_3_0_lr_1e_06_wd_0_001_dp_0_2_swati_8228_southern_sotho_false_fh_true_hs_666","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_3_0_lr_1e_06_wd_0_001_dp_0_2_swati_8228_southern_sotho_false_fh_true_hs_666", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_3_0_lr_1e_06_wd_0_001_dp_0_2_swati_8228_southern_sotho_false_fh_true_hs_666| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-3.0-lr-1e-06-wd-0.001-dp-0.2-ss-8228-st-False-fh-True-hs-666 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_800_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_800_en.md new file mode 100644 index 00000000000000..23778f1c4acec2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_800_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_800 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_800 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_800` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_800_en_5.5.0_3.0_1727175546565.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_800_en_5.5.0_3.0_1727175546565.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_800","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_800", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_800| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-3.44-lr-4e-07-wd-1e-05-dp-0.3-ss-0-st-False-fh-False-hs-800 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_imdb_shushuile_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_imdb_shushuile_en.md new file mode 100644 index 00000000000000..0cd1e8019f704b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_imdb_shushuile_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_imdb_shushuile BertEmbeddings from shushuile +author: John Snow Labs +name: bert_base_uncased_finetuned_imdb_shushuile +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_imdb_shushuile` is a English model originally trained by shushuile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_imdb_shushuile_en_5.5.0_3.0_1727201186154.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_imdb_shushuile_en_5.5.0_3.0_1727201186154.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_finetuned_imdb_shushuile","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_finetuned_imdb_shushuile","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
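+
+If plain Spark ML vectors are more convenient than annotation structs, an `EmbeddingsFinisher` stage can be appended to the pipeline. A minimal sketch, continuing from the Python example above:
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+# Converts the token-level embedding annotations into Spark ML vectors.
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, finisher])
+result = pipeline.fit(data).transform(data)
+result.selectExpr("explode(finished_embeddings) as token_vector").show(truncate=False)
+```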
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_imdb_shushuile| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/shushuile/bert-base-uncased-finetuned-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_imdb_shushuile_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_imdb_shushuile_pipeline_en.md new file mode 100644 index 00000000000000..02390c6e10d145 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_imdb_shushuile_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_imdb_shushuile_pipeline pipeline BertEmbeddings from shushuile +author: John Snow Labs +name: bert_base_uncased_finetuned_imdb_shushuile_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_imdb_shushuile_pipeline` is a English model originally trained by shushuile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_imdb_shushuile_pipeline_en_5.5.0_3.0_1727201208409.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_imdb_shushuile_pipeline_en_5.5.0_3.0_1727201208409.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_imdb_shushuile_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_imdb_shushuile_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
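+
+The `df` above is assumed to hold the raw input text. A short, hypothetical sketch (the `text` input column and the `embeddings` output column are assumptions based on the included DocumentAssembler and BertEmbeddings stages):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Hypothetical input DataFrame with a raw "text" column.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("bert_base_uncased_finetuned_imdb_shushuile_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+# One embedding vector per token, assuming the embeddings stage writes to "embeddings".
+annotations.selectExpr("explode(embeddings.embeddings) as token_vector").show(truncate=False)
+```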
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_imdb_shushuile_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/shushuile/bert-base-uncased-finetuned-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_news_2009_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_news_2009_en.md new file mode 100644 index 00000000000000..bbcb38d49d195b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_news_2009_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_news_2009 BertEmbeddings from sally9805 +author: John Snow Labs +name: bert_base_uncased_finetuned_news_2009 +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_news_2009` is a English model originally trained by sally9805. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_news_2009_en_5.5.0_3.0_1727177493599.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_news_2009_en_5.5.0_3.0_1727177493599.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_finetuned_news_2009","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_finetuned_news_2009","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_news_2009| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sally9805/bert-base-uncased-finetuned-news-2009 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_news_2009_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_news_2009_pipeline_en.md new file mode 100644 index 00000000000000..ce1d46add590f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_news_2009_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_news_2009_pipeline pipeline BertEmbeddings from sally9805 +author: John Snow Labs +name: bert_base_uncased_finetuned_news_2009_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_news_2009_pipeline` is a English model originally trained by sally9805. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_news_2009_pipeline_en_5.5.0_3.0_1727177514131.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_news_2009_pipeline_en_5.5.0_3.0_1727177514131.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_news_2009_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_news_2009_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_news_2009_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sally9805/bert-base-uncased-finetuned-news-2009 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_quac_nohistory_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_quac_nohistory_en.md new file mode 100644 index 00000000000000..1d20395d120dbd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_quac_nohistory_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_quac_nohistory BertForQuestionAnswering from Jellevdl +author: John Snow Labs +name: bert_base_uncased_finetuned_quac_nohistory +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_quac_nohistory` is a English model originally trained by Jellevdl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_quac_nohistory_en_5.5.0_3.0_1727163668616.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_quac_nohistory_en_5.5.0_3.0_1727163668616.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetuned_quac_nohistory","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetuned_quac_nohistory", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_quac_nohistory| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Jellevdl/bert-base-uncased-finetuned-quac-noHistory \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_quac_nohistory_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_quac_nohistory_pipeline_en.md new file mode 100644 index 00000000000000..48f8242af6f720 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_quac_nohistory_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_quac_nohistory_pipeline pipeline BertForQuestionAnswering from Jellevdl +author: John Snow Labs +name: bert_base_uncased_finetuned_quac_nohistory_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_quac_nohistory_pipeline` is a English model originally trained by Jellevdl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_quac_nohistory_pipeline_en_5.5.0_3.0_1727163689623.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_quac_nohistory_pipeline_en_5.5.0_3.0_1727163689623.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_quac_nohistory_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_quac_nohistory_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_quac_nohistory_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Jellevdl/bert-base-uncased-finetuned-quac-noHistory + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_quac_withouthistory_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_quac_withouthistory_v2_pipeline_en.md new file mode 100644 index 00000000000000..76f97aa2d5064c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_quac_withouthistory_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_quac_withouthistory_v2_pipeline pipeline BertForQuestionAnswering from Jellevdl +author: John Snow Labs +name: bert_base_uncased_finetuned_quac_withouthistory_v2_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_quac_withouthistory_v2_pipeline` is a English model originally trained by Jellevdl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_quac_withouthistory_v2_pipeline_en_5.5.0_3.0_1727163158239.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_quac_withouthistory_v2_pipeline_en_5.5.0_3.0_1727163158239.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_quac_withouthistory_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_quac_withouthistory_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_quac_withouthistory_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Jellevdl/bert-base-uncased-finetuned-quac-withoutHistory-v2 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_issues_128_igory1999_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_issues_128_igory1999_en.md new file mode 100644 index 00000000000000..3597c8b204c8c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_issues_128_igory1999_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_issues_128_igory1999 BertEmbeddings from igory1999 +author: John Snow Labs +name: bert_base_uncased_issues_128_igory1999 +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_issues_128_igory1999` is a English model originally trained by igory1999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_igory1999_en_5.5.0_3.0_1727173511657.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_igory1999_en_5.5.0_3.0_1727173511657.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_igory1999","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_igory1999","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_issues_128_igory1999| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/igory1999/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_issues_128_igory1999_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_issues_128_igory1999_pipeline_en.md new file mode 100644 index 00000000000000..2996d93762f8af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_issues_128_igory1999_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_issues_128_igory1999_pipeline pipeline BertEmbeddings from igory1999 +author: John Snow Labs +name: bert_base_uncased_issues_128_igory1999_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_issues_128_igory1999_pipeline` is a English model originally trained by igory1999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_igory1999_pipeline_en_5.5.0_3.0_1727173532770.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_igory1999_pipeline_en_5.5.0_3.0_1727173532770.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_issues_128_igory1999_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_issues_128_igory1999_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_issues_128_igory1999_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/igory1999/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_newscategoryclassification_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_newscategoryclassification_en.md new file mode 100644 index 00000000000000..ddbbba0216f3fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_newscategoryclassification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_newscategoryclassification DistilBertForSequenceClassification from akashmaggon +author: John Snow Labs +name: bert_base_uncased_newscategoryclassification +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_newscategoryclassification` is a English model originally trained by akashmaggon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_newscategoryclassification_en_5.5.0_3.0_1727204776004.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_newscategoryclassification_en_5.5.0_3.0_1727204776004.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_base_uncased_newscategoryclassification","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_base_uncased_newscategoryclassification", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
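+
+The predicted category lands in the `class` annotation column. A short sketch, continuing from the Python example above:
+
+```python
+# Continues the Python example above: show the input text next to the predicted label.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```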
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_newscategoryclassification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/akashmaggon/bert-base-uncased-newscategoryclassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_newscategoryclassification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_newscategoryclassification_pipeline_en.md new file mode 100644 index 00000000000000..f1d3f3fe9f1961 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_newscategoryclassification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_newscategoryclassification_pipeline pipeline DistilBertForSequenceClassification from akashmaggon +author: John Snow Labs +name: bert_base_uncased_newscategoryclassification_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_newscategoryclassification_pipeline` is a English model originally trained by akashmaggon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_newscategoryclassification_pipeline_en_5.5.0_3.0_1727204796937.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_newscategoryclassification_pipeline_en_5.5.0_3.0_1727204796937.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_newscategoryclassification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_newscategoryclassification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
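+
+For a single string, the pretrained pipeline can also be called through `annotate`. This is only a sketch; the `"class"` output key is an assumption based on the sequence-classifier stage included in this pipeline:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("bert_base_uncased_newscategoryclassification_pipeline", lang = "en")
+
+# `annotate` returns a dict keyed by output column; the predicted label is assumed
+# to sit under the "class" key.
+print(pipeline.annotate("I love spark-nlp")["class"])
+```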
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_newscategoryclassification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/akashmaggon/bert-base-uncased-newscategoryclassification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_qna_mlqa_dataset_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_qna_mlqa_dataset_en.md new file mode 100644 index 00000000000000..beaa6ad95f7e81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_qna_mlqa_dataset_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_qna_mlqa_dataset BertForQuestionAnswering from DunnBC22 +author: John Snow Labs +name: bert_base_uncased_qna_mlqa_dataset +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_qna_mlqa_dataset` is a English model originally trained by DunnBC22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qna_mlqa_dataset_en_5.5.0_3.0_1727163771530.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qna_mlqa_dataset_en_5.5.0_3.0_1727163771530.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_qna_mlqa_dataset","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_qna_mlqa_dataset", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_qna_mlqa_dataset| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/DunnBC22/bert-base-uncased-QnA-MLQA_Dataset \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_qna_mlqa_dataset_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_qna_mlqa_dataset_pipeline_en.md new file mode 100644 index 00000000000000..0d55d51eaced23 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_qna_mlqa_dataset_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_qna_mlqa_dataset_pipeline pipeline BertForQuestionAnswering from DunnBC22 +author: John Snow Labs +name: bert_base_uncased_qna_mlqa_dataset_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_qna_mlqa_dataset_pipeline` is a English model originally trained by DunnBC22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qna_mlqa_dataset_pipeline_en_5.5.0_3.0_1727163792312.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qna_mlqa_dataset_pipeline_en_5.5.0_3.0_1727163792312.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_qna_mlqa_dataset_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_qna_mlqa_dataset_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_qna_mlqa_dataset_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/DunnBC22/bert-base-uncased-QnA-MLQA_Dataset + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_scqa1_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_scqa1_en.md new file mode 100644 index 00000000000000..dbaa00d4a09aa4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_scqa1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_scqa1 BertForQuestionAnswering from CambridgeMolecularEngineering +author: John Snow Labs +name: bert_base_uncased_scqa1 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_scqa1` is a English model originally trained by CambridgeMolecularEngineering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_scqa1_en_5.5.0_3.0_1727163133858.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_scqa1_en_5.5.0_3.0_1727163133858.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark session, e.g. spark = sparknlp.start()
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_scqa1","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_scqa1", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
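The extracted answers land in the `answer` column configured above; a quick way to inspect them:

```python
pipelineDF.select("answer.result").show(truncate = False)
```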
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_scqa1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/CambridgeMolecularEngineering/bert-base-uncased-scqa1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_scqa1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_scqa1_pipeline_en.md new file mode 100644 index 00000000000000..119e3baaa3a316 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_scqa1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_scqa1_pipeline pipeline BertForQuestionAnswering from CambridgeMolecularEngineering +author: John Snow Labs +name: bert_base_uncased_scqa1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_scqa1_pipeline` is a English model originally trained by CambridgeMolecularEngineering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_scqa1_pipeline_en_5.5.0_3.0_1727163156836.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_scqa1_pipeline_en_5.5.0_3.0_1727163156836.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_scqa1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_scqa1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
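As above, `df` is assumed to already exist; one possible end-to-end sketch, where the `question`/`context` column names are assumptions (the pipeline starts with a MultiDocumentAssembler):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()  # active Spark NLP session

pipeline = PretrainedPipeline("bert_base_uncased_scqa1_pipeline", lang = "en")

# Assumed input schema: one question column and one context column
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipeline.transform(df).show(truncate = False)
```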
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_scqa1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/CambridgeMolecularEngineering/bert-base-uncased-scqa1 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_based_false_positive_secrets_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_based_false_positive_secrets_en.md new file mode 100644 index 00000000000000..b718bc58377ab5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_based_false_positive_secrets_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_based_false_positive_secrets DistilBertForSequenceClassification from harshvkarn +author: John Snow Labs +name: bert_based_false_positive_secrets +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_based_false_positive_secrets` is a English model originally trained by harshvkarn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_based_false_positive_secrets_en_5.5.0_3.0_1727204776135.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_based_false_positive_secrets_en_5.5.0_3.0_1727204776135.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_based_false_positive_secrets","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_based_false_positive_secrets", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
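The predicted labels land in the `class` column configured above; to inspect them:

```python
pipelineDF.select("class.result").show(truncate = False)
```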
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_based_false_positive_secrets| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/harshvkarn/bert-based-false-positive-secrets \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_based_false_positive_secrets_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_based_false_positive_secrets_pipeline_en.md new file mode 100644 index 00000000000000..9b2f9d9f53fcd3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_based_false_positive_secrets_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_based_false_positive_secrets_pipeline pipeline DistilBertForSequenceClassification from harshvkarn +author: John Snow Labs +name: bert_based_false_positive_secrets_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_based_false_positive_secrets_pipeline` is a English model originally trained by harshvkarn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_based_false_positive_secrets_pipeline_en_5.5.0_3.0_1727204796891.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_based_false_positive_secrets_pipeline_en_5.5.0_3.0_1727204796891.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_based_false_positive_secrets_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_based_false_positive_secrets_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
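The `df` above is assumed to exist; a minimal end-to-end sketch, assuming the pipeline's DocumentAssembler stage reads a `text` column (the sample sentence is only illustrative):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()  # active Spark NLP session

pipeline = PretrainedPipeline("bert_based_false_positive_secrets_pipeline", lang = "en")

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline.transform(df).show(truncate = False)
```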
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_based_false_positive_secrets_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/harshvkarn/bert-based-false-positive-secrets + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_finetuned_arxiv_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_finetuned_arxiv_en.md new file mode 100644 index 00000000000000..115972580d717b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_finetuned_arxiv_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_arxiv BertForSequenceClassification from AyoubChLin +author: John Snow Labs +name: bert_finetuned_arxiv +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_arxiv` is a English model originally trained by AyoubChLin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_arxiv_en_5.5.0_3.0_1727222221401.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_arxiv_en_5.5.0_3.0_1727222221401.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_finetuned_arxiv","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_finetuned_arxiv", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
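The predicted arXiv category labels land in the `class` column configured above; to inspect them:

```python
pipelineDF.select("class.result").show(truncate = False)
```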
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_arxiv| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/AyoubChLin/bert-finetuned-Arxiv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_finetuned_squad_delayedkarma_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_finetuned_squad_delayedkarma_en.md new file mode 100644 index 00000000000000..a8ea932cb033ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_finetuned_squad_delayedkarma_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_finetuned_squad_delayedkarma BertForQuestionAnswering from delayedkarma +author: John Snow Labs +name: bert_finetuned_squad_delayedkarma +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_squad_delayedkarma` is a English model originally trained by delayedkarma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_delayedkarma_en_5.5.0_3.0_1727175623331.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_delayedkarma_en_5.5.0_3.0_1727175623331.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark session, e.g. spark = sparknlp.start()
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_squad_delayedkarma","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_squad_delayedkarma", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
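The extracted answers land in the `answer` column configured above; a quick way to inspect them:

```python
pipelineDF.select("answer.result").show(truncate = False)
```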
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_squad_delayedkarma| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/delayedkarma/bert-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_finetuned_squad_delayedkarma_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_finetuned_squad_delayedkarma_pipeline_en.md new file mode 100644 index 00000000000000..e9dcb873d5d637 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_finetuned_squad_delayedkarma_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_finetuned_squad_delayedkarma_pipeline pipeline BertForQuestionAnswering from delayedkarma +author: John Snow Labs +name: bert_finetuned_squad_delayedkarma_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_squad_delayedkarma_pipeline` is a English model originally trained by delayedkarma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_delayedkarma_pipeline_en_5.5.0_3.0_1727175643894.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_delayedkarma_pipeline_en_5.5.0_3.0_1727175643894.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_squad_delayedkarma_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_squad_delayedkarma_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
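A fuller, hedged sketch of the same call; the `question`/`context` input columns and the sample row are assumptions for illustration, since the pipeline opens with a MultiDocumentAssembler:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()  # active Spark NLP session

pipeline = PretrainedPipeline("bert_finetuned_squad_delayedkarma_pipeline", lang = "en")

# Assumed input schema: one question column and one context column
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipeline.transform(df).show(truncate = False)
```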
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_squad_delayedkarma_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/delayedkarma/bert-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_gemma2b_multivllm_nodropsus_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_gemma2b_multivllm_nodropsus_0_pipeline_en.md new file mode 100644 index 00000000000000..bcc1e936dbefe0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_gemma2b_multivllm_nodropsus_0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_gemma2b_multivllm_nodropsus_0_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_gemma2b_multivllm_nodropsus_0_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_gemma2b_multivllm_nodropsus_0_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_gemma2b_multivllm_nodropsus_0_pipeline_en_5.5.0_3.0_1727164275603.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_gemma2b_multivllm_nodropsus_0_pipeline_en_5.5.0_3.0_1727164275603.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_gemma2b_multivllm_nodropsus_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_gemma2b_multivllm_nodropsus_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
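The snippet assumes an existing `df`; a minimal end-to-end sketch, assuming the pipeline's DocumentAssembler stage reads a `text` column (the sample sentence is only illustrative):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()  # active Spark NLP session

pipeline = PretrainedPipeline("bert_gemma2b_multivllm_nodropsus_0_pipeline", lang = "en")

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline.transform(df).show(truncate = False)
```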
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_gemma2b_multivllm_nodropsus_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_gemma2b-multivllm-NodropSus_0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_interview_nepal_bhasa_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_interview_nepal_bhasa_en.md new file mode 100644 index 00000000000000..1be27bd5981be0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_interview_nepal_bhasa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_interview_nepal_bhasa DistilBertForSequenceClassification from eskayML +author: John Snow Labs +name: bert_interview_nepal_bhasa +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_interview_nepal_bhasa` is a English model originally trained by eskayML. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_interview_nepal_bhasa_en_5.5.0_3.0_1727136875844.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_interview_nepal_bhasa_en_5.5.0_3.0_1727136875844.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_interview_nepal_bhasa","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_interview_nepal_bhasa", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
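The predicted labels land in the `class` column configured above; to inspect them:

```python
pipelineDF.select("class.result").show(truncate = False)
```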
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_interview_nepal_bhasa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/eskayML/bert_interview_new \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_interview_nepal_bhasa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_interview_nepal_bhasa_pipeline_en.md new file mode 100644 index 00000000000000..77a1ec908b9616 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_interview_nepal_bhasa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_interview_nepal_bhasa_pipeline pipeline DistilBertForSequenceClassification from eskayML +author: John Snow Labs +name: bert_interview_nepal_bhasa_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_interview_nepal_bhasa_pipeline` is a English model originally trained by eskayML. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_interview_nepal_bhasa_pipeline_en_5.5.0_3.0_1727136888891.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_interview_nepal_bhasa_pipeline_en_5.5.0_3.0_1727136888891.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_interview_nepal_bhasa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_interview_nepal_bhasa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
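The `df` above is assumed to exist; one way to run the pipeline end to end, assuming its DocumentAssembler stage reads a `text` column (sample text is illustrative):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()  # active Spark NLP session

pipeline = PretrainedPipeline("bert_interview_nepal_bhasa_pipeline", lang = "en")

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline.transform(df).show(truncate = False)
```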
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_interview_nepal_bhasa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/eskayML/bert_interview_new + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_large_cased_squadscqa1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_large_cased_squadscqa1_pipeline_en.md new file mode 100644 index 00000000000000..358aaaa301eef6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_large_cased_squadscqa1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_large_cased_squadscqa1_pipeline pipeline BertForQuestionAnswering from CambridgeMolecularEngineering +author: John Snow Labs +name: bert_large_cased_squadscqa1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_cased_squadscqa1_pipeline` is a English model originally trained by CambridgeMolecularEngineering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_cased_squadscqa1_pipeline_en_5.5.0_3.0_1727175904915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_cased_squadscqa1_pipeline_en_5.5.0_3.0_1727175904915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_large_cased_squadscqa1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_large_cased_squadscqa1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
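The snippet assumes an existing `df`; a minimal end-to-end sketch (the `question`/`context` column names are assumptions, as the pipeline opens with a MultiDocumentAssembler):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()  # active Spark NLP session

pipeline = PretrainedPipeline("bert_large_cased_squadscqa1_pipeline", lang = "en")

# Assumed input schema: one question column and one context column
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipeline.transform(df).show(truncate = False)
```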
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_cased_squadscqa1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/CambridgeMolecularEngineering/bert-large-cased-squadscqa1 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_large_uncased_sparse_90_unstructured_pruneofa_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_large_uncased_sparse_90_unstructured_pruneofa_en.md new file mode 100644 index 00000000000000..68a2bbcf160761 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_large_uncased_sparse_90_unstructured_pruneofa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_large_uncased_sparse_90_unstructured_pruneofa BertEmbeddings from Intel +author: John Snow Labs +name: bert_large_uncased_sparse_90_unstructured_pruneofa +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_sparse_90_unstructured_pruneofa` is a English model originally trained by Intel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_sparse_90_unstructured_pruneofa_en_5.5.0_3.0_1727173563921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_sparse_90_unstructured_pruneofa_en_5.5.0_3.0_1727173563921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_large_uncased_sparse_90_unstructured_pruneofa","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_large_uncased_sparse_90_unstructured_pruneofa","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
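Assuming the usual `sparknlp.base`/`sparknlp.annotator` imports and an active session for the snippet above, the per-token vectors land in the `embeddings` column; a quick way to take a look:

```python
# Each annotation in "embeddings" carries one token's vector in its embeddings field
pipelineDF.selectExpr("explode(embeddings.embeddings) as token_embedding").show(5)
```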
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_sparse_90_unstructured_pruneofa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|361.7 MB| + +## References + +https://huggingface.co/Intel/bert-large-uncased-sparse-90-unstructured-pruneofa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_large_uncased_sparse_90_unstructured_pruneofa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_large_uncased_sparse_90_unstructured_pruneofa_pipeline_en.md new file mode 100644 index 00000000000000..b94516ed1b36b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_large_uncased_sparse_90_unstructured_pruneofa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_large_uncased_sparse_90_unstructured_pruneofa_pipeline pipeline BertEmbeddings from Intel +author: John Snow Labs +name: bert_large_uncased_sparse_90_unstructured_pruneofa_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_sparse_90_unstructured_pruneofa_pipeline` is a English model originally trained by Intel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_sparse_90_unstructured_pruneofa_pipeline_en_5.5.0_3.0_1727173623641.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_sparse_90_unstructured_pruneofa_pipeline_en_5.5.0_3.0_1727173623641.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_large_uncased_sparse_90_unstructured_pruneofa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_large_uncased_sparse_90_unstructured_pruneofa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
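The `df` above is assumed to exist; a minimal end-to-end sketch, assuming the pipeline's DocumentAssembler stage reads a `text` column (sample text is illustrative):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()  # active Spark NLP session

pipeline = PretrainedPipeline("bert_large_uncased_sparse_90_unstructured_pruneofa_pipeline", lang = "en")

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline.transform(df).printSchema()  # inspect the embedding columns produced by the stages
```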
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_sparse_90_unstructured_pruneofa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|361.7 MB| + +## References + +https://huggingface.co/Intel/bert-large-uncased-sparse-90-unstructured-pruneofa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline_en.md new file mode 100644 index 00000000000000..aad8fe351cb506 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline pipeline BertEmbeddings from iMahdiGhazavi +author: John Snow Labs +name: bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline` is a English model originally trained by iMahdiGhazavi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline_en_5.5.0_3.0_1727161796011.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline_en_5.5.0_3.0_1727161796011.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
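As in the other pipeline cards, `df` is assumed to exist; a hedged end-to-end sketch, assuming the DocumentAssembler stage reads a `text` column:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()  # active Spark NLP session

pipeline = PretrainedPipeline("bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline", lang = "en")

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline.transform(df).printSchema()  # inspect the embedding columns produced by the stages
```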
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|605.8 MB| + +## References + +https://huggingface.co/iMahdiGhazavi/bert-fa-base-uncased-nlp-course-hw2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_political_classification_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_political_classification_en.md new file mode 100644 index 00000000000000..c1565246e8099c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_political_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_political_classification BertForSequenceClassification from harshal-11 +author: John Snow Labs +name: bert_political_classification +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_political_classification` is a English model originally trained by harshal-11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_political_classification_en_5.5.0_3.0_1727149334675.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_political_classification_en_5.5.0_3.0_1727149334675.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_political_classification","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_political_classification", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
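The predicted labels land in the `class` column configured above; to inspect them:

```python
pipelineDF.select("class.result").show(truncate = False)
```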
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_political_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/harshal-11/Bert-political-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_political_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_political_classification_pipeline_en.md new file mode 100644 index 00000000000000..38aca996ba1c32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_political_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_political_classification_pipeline pipeline BertForSequenceClassification from harshal-11 +author: John Snow Labs +name: bert_political_classification_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_political_classification_pipeline` is a English model originally trained by harshal-11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_political_classification_pipeline_en_5.5.0_3.0_1727149356671.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_political_classification_pipeline_en_5.5.0_3.0_1727149356671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_political_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_political_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
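The snippet assumes an existing `df`; a minimal end-to-end sketch, assuming the pipeline's DocumentAssembler stage reads a `text` column (sample text is illustrative):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()  # active Spark NLP session

pipeline = PretrainedPipeline("bert_political_classification_pipeline", lang = "en")

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline.transform(df).show(truncate = False)
```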
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_political_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/harshal-11/Bert-political-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_tagalog_base_uncased_ner_v1_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_tagalog_base_uncased_ner_v1_en.md new file mode 100644 index 00000000000000..68a8b54ab1ec24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_tagalog_base_uncased_ner_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_tagalog_base_uncased_ner_v1 BertForTokenClassification from scostiniano +author: John Snow Labs +name: bert_tagalog_base_uncased_ner_v1 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tagalog_base_uncased_ner_v1` is a English model originally trained by scostiniano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tagalog_base_uncased_ner_v1_en_5.5.0_3.0_1727203607818.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tagalog_base_uncased_ner_v1_en_5.5.0_3.0_1727203607818.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assumes an active Spark session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("bert_tagalog_base_uncased_ner_v1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_tagalog_base_uncased_ner_v1", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
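The token-level tags land in the `ner` column configured above; to inspect them alongside the tokens:

```python
pipelineDF.select("token.result", "ner.result").show(truncate = False)
```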
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tagalog_base_uncased_ner_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.0 MB| + +## References + +https://huggingface.co/scostiniano/bert-tagalog-base-uncased-ner-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_tagalog_base_uncased_ner_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_tagalog_base_uncased_ner_v1_pipeline_en.md new file mode 100644 index 00000000000000..daea8ebaca87f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_tagalog_base_uncased_ner_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_tagalog_base_uncased_ner_v1_pipeline pipeline BertForTokenClassification from scostiniano +author: John Snow Labs +name: bert_tagalog_base_uncased_ner_v1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tagalog_base_uncased_ner_v1_pipeline` is a English model originally trained by scostiniano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tagalog_base_uncased_ner_v1_pipeline_en_5.5.0_3.0_1727203629522.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tagalog_base_uncased_ner_v1_pipeline_en_5.5.0_3.0_1727203629522.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_tagalog_base_uncased_ner_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_tagalog_base_uncased_ner_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
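The `df` above is assumed to exist; a minimal end-to-end sketch that surfaces the token-level NER tags, assuming the pipeline's DocumentAssembler stage reads a `text` column (sample text is illustrative):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()  # active Spark NLP session

pipeline = PretrainedPipeline("bert_tagalog_base_uncased_ner_v1_pipeline", lang = "en")

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline.transform(df).show(truncate = False)
```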
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tagalog_base_uncased_ner_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.0 MB| + +## References + +https://huggingface.co/scostiniano/bert-tagalog-base-uncased-ner-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_vllm_gemma2b_8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_vllm_gemma2b_8_pipeline_en.md new file mode 100644 index 00000000000000..4cc3884ba47157 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_vllm_gemma2b_8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_vllm_gemma2b_8_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_vllm_gemma2b_8_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_vllm_gemma2b_8_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_8_pipeline_en_5.5.0_3.0_1727154524127.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_8_pipeline_en_5.5.0_3.0_1727154524127.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_vllm_gemma2b_8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_vllm_gemma2b_8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
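As with the other classification pipelines on this page, `df` is assumed to exist; a hedged end-to-end sketch, assuming a `text` input column:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()  # active Spark NLP session

pipeline = PretrainedPipeline("bert_vllm_gemma2b_8_pipeline", lang = "en")

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline.transform(df).show(truncate = False)
```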
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_vllm_gemma2b_8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_vllm-gemma2b_8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bertin_base_random_es.md b/docs/_posts/ahmedlone127/2024-09-24-bertin_base_random_es.md new file mode 100644 index 00000000000000..81f9b356d44e88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bertin_base_random_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bertin_base_random RoBertaEmbeddings from bertin-project +author: John Snow Labs +name: bertin_base_random +date: 2024-09-24 +tags: [es, open_source, onnx, embeddings, roberta] +task: Embeddings +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertin_base_random` is a Castilian, Spanish model originally trained by bertin-project. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertin_base_random_es_5.5.0_3.0_1727216375790.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertin_base_random_es_5.5.0_3.0_1727216375790.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("bertin_base_random","es") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("bertin_base_random","es") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
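Assuming the usual `sparknlp.base`/`sparknlp.annotator` imports and an active session for the snippet above, the per-token vectors land in the `embeddings` column; a quick way to take a look:

```python
# Each annotation in "embeddings" carries one token's vector in its embeddings field
pipelineDF.selectExpr("explode(embeddings.embeddings) as token_embedding").show(5)
```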
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertin_base_random| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|es| +|Size:|231.6 MB| + +## References + +https://huggingface.co/bertin-project/bertin-base-random \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bertin_base_random_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-24-bertin_base_random_pipeline_es.md new file mode 100644 index 00000000000000..bbcb02b5a31251 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bertin_base_random_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bertin_base_random_pipeline pipeline RoBertaEmbeddings from bertin-project +author: John Snow Labs +name: bertin_base_random_pipeline +date: 2024-09-24 +tags: [es, open_source, pipeline, onnx] +task: Embeddings +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertin_base_random_pipeline` is a Castilian, Spanish model originally trained by bertin-project. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertin_base_random_pipeline_es_5.5.0_3.0_1727216451962.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertin_base_random_pipeline_es_5.5.0_3.0_1727216451962.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bertin_base_random_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bertin_base_random_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
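The `df` above is assumed to exist; a minimal end-to-end sketch, assuming the pipeline's DocumentAssembler stage reads a `text` column (the Spanish sample sentence is only illustrative):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()  # active Spark NLP session

pipeline = PretrainedPipeline("bertin_base_random_pipeline", lang = "es")

df = spark.createDataFrame([["Me encanta Spark NLP"]]).toDF("text")
pipeline.transform(df).printSchema()  # inspect the embedding columns produced by the stages
```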
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertin_base_random_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|231.6 MB| + +## References + +https://huggingface.co/bertin-project/bertin-base-random + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bertin_roberta_base_spanish_es.md b/docs/_posts/ahmedlone127/2024-09-24-bertin_roberta_base_spanish_es.md new file mode 100644 index 00000000000000..a5e55df24c4b27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bertin_roberta_base_spanish_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bertin_roberta_base_spanish RoBertaEmbeddings from bertin-project +author: John Snow Labs +name: bertin_roberta_base_spanish +date: 2024-09-24 +tags: [es, open_source, onnx, embeddings, roberta] +task: Embeddings +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertin_roberta_base_spanish` is a Castilian, Spanish model originally trained by bertin-project. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertin_roberta_base_spanish_es_5.5.0_3.0_1727168816239.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertin_roberta_base_spanish_es_5.5.0_3.0_1727168816239.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("bertin_roberta_base_spanish","es") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("bertin_roberta_base_spanish","es")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
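+
+As a follow-up to the Python example above, the token-level vectors produced in the `embeddings` column can be exploded into one row per token. This is an illustrative sketch; it only assumes the column names used in the example above.
+
+```python
+from pyspark.sql import functions as F
+
+# pipelineDF comes from the Python example above
+(pipelineDF
+    .select(F.explode("embeddings").alias("emb"))
+    .select(
+        F.col("emb.result").alias("token"),            # token text
+        F.size(F.col("emb.embeddings")).alias("dim"))  # vector length per token
+    .show(truncate=False))
+```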
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertin_roberta_base_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|es| +|Size:|462.2 MB| + +## References + +https://huggingface.co/bertin-project/bertin-roberta-base-spanish \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bertin_roberta_base_spanish_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-24-bertin_roberta_base_spanish_pipeline_es.md new file mode 100644 index 00000000000000..7720e86b1a7271 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bertin_roberta_base_spanish_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bertin_roberta_base_spanish_pipeline pipeline RoBertaEmbeddings from bertin-project +author: John Snow Labs +name: bertin_roberta_base_spanish_pipeline +date: 2024-09-24 +tags: [es, open_source, pipeline, onnx] +task: Embeddings +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertin_roberta_base_spanish_pipeline` is a Castilian, Spanish model originally trained by bertin-project. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertin_roberta_base_spanish_pipeline_es_5.5.0_3.0_1727168840178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertin_roberta_base_spanish_pipeline_es_5.5.0_3.0_1727168840178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bertin_roberta_base_spanish_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bertin_roberta_base_spanish_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertin_roberta_base_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|462.3 MB| + +## References + +https://huggingface.co/bertin-project/bertin-roberta-base-spanish + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bertis_en.md b/docs/_posts/ahmedlone127/2024-09-24-bertis_en.md new file mode 100644 index 00000000000000..da9e633d374e22 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bertis_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bertis BertForSequenceClassification from mireillfares +author: John Snow Labs +name: bertis +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertis` is a English model originally trained by mireillfares. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertis_en_5.5.0_3.0_1727214041559.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertis_en_5.5.0_3.0_1727214041559.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bertis","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bertis", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
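+
+To read the prediction out of the `class` column produced above, a short follow-up could look like the sketch below. It is illustrative only: the actual label set depends on how `bertis` was fine-tuned.
+
+```python
+from pyspark.sql import functions as F
+
+# pipelineDF comes from the Python example above
+pipelineDF.select(
+    "text",
+    F.col("class.result").alias("predicted_label")  # array holding the predicted label(s)
+).show(truncate=False)
+```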
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertis| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/mireillfares/BERTIS \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bertis_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bertis_pipeline_en.md new file mode 100644 index 00000000000000..21ba4f29df2716 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bertis_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bertis_pipeline pipeline BertForSequenceClassification from mireillfares +author: John Snow Labs +name: bertis_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertis_pipeline` is a English model originally trained by mireillfares. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertis_pipeline_en_5.5.0_3.0_1727214062959.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertis_pipeline_en_5.5.0_3.0_1727214062959.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bertis_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bertis_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/mireillfares/BERTIS + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bge_base_financial_matryoshka_en.md b/docs/_posts/ahmedlone127/2024-09-24-bge_base_financial_matryoshka_en.md new file mode 100644 index 00000000000000..06efba613eb589 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bge_base_financial_matryoshka_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka BGEEmbeddings from ValentinaKim +author: John Snow Labs +name: bge_base_financial_matryoshka +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_financial_matryoshka` is a English model originally trained by ValentinaKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_en_5.5.0_3.0_1727207436216.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_en_5.5.0_3.0_1727207436216.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka","en") \
+    .setInputCols(["document"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka","en")
+    .setInputCols(Array("document"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
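+
+BGE produces one sentence-level vector per input document rather than per-token vectors. A hedged sketch for pulling that vector onto the driver, assuming the `pipelineDF` from the Python example above:
+
+```python
+# pipelineDF comes from the Python example above
+row = pipelineDF.selectExpr("explode(embeddings) AS emb").first()
+vector = row["emb"]["embeddings"]   # the sentence embedding
+print(len(vector), vector[:5])      # dimensionality and a few leading values
+```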
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/ValentinaKim/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bge_base_financial_matryoshka_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bge_base_financial_matryoshka_pipeline_en.md new file mode 100644 index 00000000000000..8a49ae1d5c3f7c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bge_base_financial_matryoshka_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_pipeline pipeline BGEEmbeddings from ValentinaKim +author: John Snow Labs +name: bge_base_financial_matryoshka_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_financial_matryoshka_pipeline` is a English model originally trained by ValentinaKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_pipeline_en_5.5.0_3.0_1727207463772.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_pipeline_en_5.5.0_3.0_1727207463772.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/ValentinaKim/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bio_mobilebert_en.md b/docs/_posts/ahmedlone127/2024-09-24-bio_mobilebert_en.md new file mode 100644 index 00000000000000..5af22c5ecc073a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bio_mobilebert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bio_mobilebert BertEmbeddings from nlpie +author: John Snow Labs +name: bio_mobilebert +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bio_mobilebert` is a English model originally trained by nlpie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bio_mobilebert_en_5.5.0_3.0_1727173387006.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bio_mobilebert_en_5.5.0_3.0_1727173387006.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = BertEmbeddings.pretrained("bio_mobilebert","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = BertEmbeddings.pretrained("bio_mobilebert","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
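+
+For quick experiments on a handful of texts, the fitted pipeline above can also be wrapped in a LightPipeline, which annotates strings directly on the driver. This sketch assumes the `pipelineModel` from the Python example above; the sample sentence is arbitrary.
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+annotated = light.fullAnnotate("Aspirin is commonly prescribed to reduce fever.")[0]
+
+# One Annotation per token in the "embeddings" output column
+for annotation in annotated["embeddings"]:
+    print(annotation.result, len(annotation.embeddings))
+```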
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bio_mobilebert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|92.5 MB| + +## References + +https://huggingface.co/nlpie/bio-mobilebert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-biom_albert_xxlarge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-biom_albert_xxlarge_pipeline_en.md new file mode 100644 index 00000000000000..e7531cc8d1fe53 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-biom_albert_xxlarge_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English biom_albert_xxlarge_pipeline pipeline AlbertEmbeddings from sultan +author: John Snow Labs +name: biom_albert_xxlarge_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`biom_albert_xxlarge_pipeline` is a English model originally trained by sultan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biom_albert_xxlarge_pipeline_en_5.5.0_3.0_1727220257398.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biom_albert_xxlarge_pipeline_en_5.5.0_3.0_1727220257398.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("biom_albert_xxlarge_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("biom_albert_xxlarge_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biom_albert_xxlarge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|771.3 MB| + +## References + +https://huggingface.co/sultan/BioM-ALBERT-xxlarge + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-biomedroberta_finetuned_valid_testing_en.md b/docs/_posts/ahmedlone127/2024-09-24-biomedroberta_finetuned_valid_testing_en.md new file mode 100644 index 00000000000000..00d20d3c3149ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-biomedroberta_finetuned_valid_testing_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English biomedroberta_finetuned_valid_testing RoBertaForTokenClassification from pabRomero +author: John Snow Labs +name: biomedroberta_finetuned_valid_testing +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`biomedroberta_finetuned_valid_testing` is a English model originally trained by pabRomero. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biomedroberta_finetuned_valid_testing_en_5.5.0_3.0_1727199316956.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biomedroberta_finetuned_valid_testing_en_5.5.0_3.0_1727199316956.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("biomedroberta_finetuned_valid_testing","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("biomedroberta_finetuned_valid_testing", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
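+
+Token-level NER tags are usually easier to consume as whole entity chunks. A common pattern is to append a `NerConverter` stage after the classifier; the sketch below extends the Python pipeline above, and the entity labels it prints depend entirely on the data this model was fine-tuned on.
+
+```python
+from sparknlp.annotator import NerConverter
+
+# Groups consecutive B-/I- tags from the "ner" column into entity chunks
+nerConverter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, nerConverter])
+result = pipeline.fit(data).transform(data)
+
+result.selectExpr("explode(ner_chunk) AS chunk") \
+    .selectExpr("chunk.result AS entity_text", "chunk.metadata['entity'] AS entity_label") \
+    .show(truncate=False)
+```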
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biomedroberta_finetuned_valid_testing| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/pabRomero/BioMedRoBERTa-finetuned-valid-testing \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-biomedroberta_finetuned_valid_testing_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-biomedroberta_finetuned_valid_testing_pipeline_en.md new file mode 100644 index 00000000000000..6958fbe43e4c59 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-biomedroberta_finetuned_valid_testing_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English biomedroberta_finetuned_valid_testing_pipeline pipeline RoBertaForTokenClassification from pabRomero +author: John Snow Labs +name: biomedroberta_finetuned_valid_testing_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`biomedroberta_finetuned_valid_testing_pipeline` is a English model originally trained by pabRomero. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biomedroberta_finetuned_valid_testing_pipeline_en_5.5.0_3.0_1727199340266.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biomedroberta_finetuned_valid_testing_pipeline_en_5.5.0_3.0_1727199340266.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("biomedroberta_finetuned_valid_testing_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("biomedroberta_finetuned_valid_testing_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biomedroberta_finetuned_valid_testing_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/pabRomero/BioMedRoBERTa-finetuned-valid-testing + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bpe_selfies_pubchem_shard00_166_5k_en.md b/docs/_posts/ahmedlone127/2024-09-24-bpe_selfies_pubchem_shard00_166_5k_en.md new file mode 100644 index 00000000000000..546b9280742de0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bpe_selfies_pubchem_shard00_166_5k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bpe_selfies_pubchem_shard00_166_5k RoBertaEmbeddings from seyonec +author: John Snow Labs +name: bpe_selfies_pubchem_shard00_166_5k +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bpe_selfies_pubchem_shard00_166_5k` is a English model originally trained by seyonec. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bpe_selfies_pubchem_shard00_166_5k_en_5.5.0_3.0_1727168680327.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bpe_selfies_pubchem_shard00_166_5k_en_5.5.0_3.0_1727168680327.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("bpe_selfies_pubchem_shard00_166_5k","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("bpe_selfies_pubchem_shard00_166_5k","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bpe_selfies_pubchem_shard00_166_5k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|309.3 MB| + +## References + +https://huggingface.co/seyonec/BPE_SELFIES_PubChem_shard00_166_5k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bpe_selfies_pubchem_shard00_166_5k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bpe_selfies_pubchem_shard00_166_5k_pipeline_en.md new file mode 100644 index 00000000000000..3820b3a68e5ba6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bpe_selfies_pubchem_shard00_166_5k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bpe_selfies_pubchem_shard00_166_5k_pipeline pipeline RoBertaEmbeddings from seyonec +author: John Snow Labs +name: bpe_selfies_pubchem_shard00_166_5k_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bpe_selfies_pubchem_shard00_166_5k_pipeline` is a English model originally trained by seyonec. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bpe_selfies_pubchem_shard00_166_5k_pipeline_en_5.5.0_3.0_1727168695825.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bpe_selfies_pubchem_shard00_166_5k_pipeline_en_5.5.0_3.0_1727168695825.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bpe_selfies_pubchem_shard00_166_5k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bpe_selfies_pubchem_shard00_166_5k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bpe_selfies_pubchem_shard00_166_5k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.3 MB| + +## References + +https://huggingface.co/seyonec/BPE_SELFIES_PubChem_shard00_166_5k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-brwac_v1_5__checkpoint_27_100000_en.md b/docs/_posts/ahmedlone127/2024-09-24-brwac_v1_5__checkpoint_27_100000_en.md new file mode 100644 index 00000000000000..8269acf42c27f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-brwac_v1_5__checkpoint_27_100000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English brwac_v1_5__checkpoint_27_100000 RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: brwac_v1_5__checkpoint_27_100000 +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`brwac_v1_5__checkpoint_27_100000` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/brwac_v1_5__checkpoint_27_100000_en_5.5.0_3.0_1727169121903.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/brwac_v1_5__checkpoint_27_100000_en_5.5.0_3.0_1727169121903.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("brwac_v1_5__checkpoint_27_100000","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("brwac_v1_5__checkpoint_27_100000","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|brwac_v1_5__checkpoint_27_100000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|296.9 MB| + +## References + +https://huggingface.co/eduagarcia-temp/brwac_v1_5__checkpoint_27_100000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-brwac_v1_5__checkpoint_27_100000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-brwac_v1_5__checkpoint_27_100000_pipeline_en.md new file mode 100644 index 00000000000000..5ff1b8538440c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-brwac_v1_5__checkpoint_27_100000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English brwac_v1_5__checkpoint_27_100000_pipeline pipeline RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: brwac_v1_5__checkpoint_27_100000_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`brwac_v1_5__checkpoint_27_100000_pipeline` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/brwac_v1_5__checkpoint_27_100000_pipeline_en_5.5.0_3.0_1727169209114.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/brwac_v1_5__checkpoint_27_100000_pipeline_en_5.5.0_3.0_1727169209114.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("brwac_v1_5__checkpoint_27_100000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("brwac_v1_5__checkpoint_27_100000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|brwac_v1_5__checkpoint_27_100000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|296.9 MB| + +## References + +https://huggingface.co/eduagarcia-temp/brwac_v1_5__checkpoint_27_100000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_carmen_species_es.md b/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_carmen_species_es.md new file mode 100644 index 00000000000000..4015a36fbf1ef7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_carmen_species_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_carmen_species RoBertaForTokenClassification from BSC-NLP4BIA +author: John Snow Labs +name: bsc_bio_ehr_spanish_carmen_species +date: 2024-09-24 +tags: [es, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_carmen_species` is a Castilian, Spanish model originally trained by BSC-NLP4BIA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_carmen_species_es_5.5.0_3.0_1727151557808.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_carmen_species_es_5.5.0_3.0_1727151557808.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_carmen_species","es") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_carmen_species", "es")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_carmen_species| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|438.2 MB| + +## References + +https://huggingface.co/BSC-NLP4BIA/bsc-bio-ehr-es-carmen-species \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_carmen_species_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_carmen_species_pipeline_es.md new file mode 100644 index 00000000000000..b662f6f9a0b18f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_carmen_species_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_carmen_species_pipeline pipeline RoBertaForTokenClassification from BSC-NLP4BIA +author: John Snow Labs +name: bsc_bio_ehr_spanish_carmen_species_pipeline +date: 2024-09-24 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_carmen_species_pipeline` is a Castilian, Spanish model originally trained by BSC-NLP4BIA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_carmen_species_pipeline_es_5.5.0_3.0_1727151583876.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_carmen_species_pipeline_es_5.5.0_3.0_1727151583876.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_carmen_species_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_carmen_species_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_carmen_species_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|438.3 MB| + +## References + +https://huggingface.co/BSC-NLP4BIA/bsc-bio-ehr-es-carmen-species + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_symptemist_es.md b/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_symptemist_es.md new file mode 100644 index 00000000000000..59138f40e2ba2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_symptemist_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_symptemist RoBertaForTokenClassification from BSC-NLP4BIA +author: John Snow Labs +name: bsc_bio_ehr_spanish_symptemist +date: 2024-09-24 +tags: [es, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_symptemist` is a Castilian, Spanish model originally trained by BSC-NLP4BIA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_es_5.5.0_3.0_1727151462665.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_es_5.5.0_3.0_1727151462665.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_symptemist","es") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_symptemist", "es")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_symptemist| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|441.8 MB| + +## References + +https://huggingface.co/BSC-NLP4BIA/bsc-bio-ehr-es-symptemist \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_symptemist_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_symptemist_pipeline_es.md new file mode 100644 index 00000000000000..b6fa6ca8663622 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_symptemist_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_symptemist_pipeline pipeline RoBertaForTokenClassification from BSC-NLP4BIA +author: John Snow Labs +name: bsc_bio_ehr_spanish_symptemist_pipeline +date: 2024-09-24 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_symptemist_pipeline` is a Castilian, Spanish model originally trained by BSC-NLP4BIA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_pipeline_es_5.5.0_3.0_1727151487083.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_pipeline_es_5.5.0_3.0_1727151487083.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_symptemist_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_symptemist_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_symptemist_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|441.8 MB| + +## References + +https://huggingface.co/BSC-NLP4BIA/bsc-bio-ehr-es-symptemist + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_en.md new file mode 100644 index 00000000000000..be50f578bae8c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa RoBertaEmbeddings from Erantr1 +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa` is a English model originally trained by Erantr1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_en_5.5.0_3.0_1727169074199.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_en_5.5.0_3.0_1727169074199.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/Erantr1/my_awesome_eli5_mlm_model_eran_t_imdb_new \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_pipeline_en.md new file mode 100644 index 00000000000000..d3736fcdcddef1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_pipeline pipeline RoBertaEmbeddings from Erantr1 +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_pipeline` is a English model originally trained by Erantr1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_pipeline_en_5.5.0_3.0_1727169097130.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_pipeline_en_5.5.0_3.0_1727169097130.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/Erantr1/my_awesome_eli5_mlm_model_eran_t_imdb_new + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_anatg_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_anatg_en.md new file mode 100644 index 00000000000000..041e4659eb3582 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_anatg_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_anatg DistilBertForSequenceClassification from Anatg +author: John Snow Labs +name: burmese_awesome_model_anatg +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_anatg` is a English model originally trained by Anatg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_anatg_en_5.5.0_3.0_1727164377263.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_anatg_en_5.5.0_3.0_1727164377263.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_anatg","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_anatg", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_anatg| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Anatg/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_anatg_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_anatg_pipeline_en.md new file mode 100644 index 00000000000000..ee784e9ed8198d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_anatg_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_anatg_pipeline pipeline DistilBertForSequenceClassification from Anatg +author: John Snow Labs +name: burmese_awesome_model_anatg_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_anatg_pipeline` is a English model originally trained by Anatg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_anatg_pipeline_en_5.5.0_3.0_1727164395017.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_anatg_pipeline_en_5.5.0_3.0_1727164395017.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_anatg_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_anatg_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
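The snippet above assumes an input DataFrame `df` already exists. A minimal sketch of preparing one, assuming an active Spark session with Spark NLP started and that the pipeline reads raw text from a column named `text` (the usual convention for these pretrained pipelines):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("burmese_awesome_model_anatg_pipeline", lang="en")

# The pipeline's DocumentAssembler is assumed to read from a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

# For quick single-document checks, annotate() skips the DataFrame round trip.
print(pipeline.annotate("I love spark-nlp"))
```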
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_anatg_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Anatg/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_boldirev_as_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_boldirev_as_pipeline_en.md new file mode 100644 index 00000000000000..c301a9f9249daf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_boldirev_as_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_boldirev_as_pipeline pipeline DistilBertForSequenceClassification from boldirev-as +author: John Snow Labs +name: burmese_awesome_model_boldirev_as_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_boldirev_as_pipeline` is a English model originally trained by boldirev-as. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_boldirev_as_pipeline_en_5.5.0_3.0_1727164849755.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_boldirev_as_pipeline_en_5.5.0_3.0_1727164849755.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_boldirev_as_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_boldirev_as_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_boldirev_as_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/boldirev-as/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_cobegreene_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_cobegreene_en.md new file mode 100644 index 00000000000000..9847a8234fb283 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_cobegreene_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_cobegreene DistilBertForSequenceClassification from cobegreene +author: John Snow Labs +name: burmese_awesome_model_cobegreene +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_cobegreene` is a English model originally trained by cobegreene. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_cobegreene_en_5.5.0_3.0_1727154733582.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_cobegreene_en_5.5.0_3.0_1727154733582.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_cobegreene","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_cobegreene", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_cobegreene| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cobegreene/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_cobegreene_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_cobegreene_pipeline_en.md new file mode 100644 index 00000000000000..fb375eb776f702 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_cobegreene_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_cobegreene_pipeline pipeline DistilBertForSequenceClassification from cobegreene +author: John Snow Labs +name: burmese_awesome_model_cobegreene_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_cobegreene_pipeline` is a English model originally trained by cobegreene. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_cobegreene_pipeline_en_5.5.0_3.0_1727154747838.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_cobegreene_pipeline_en_5.5.0_3.0_1727154747838.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_cobegreene_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_cobegreene_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_cobegreene_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cobegreene/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_hyungho_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_hyungho_en.md new file mode 100644 index 00000000000000..9bfa15d48fedf3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_hyungho_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_hyungho DistilBertForSequenceClassification from Hyungho +author: John Snow Labs +name: burmese_awesome_model_hyungho +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_hyungho` is a English model originally trained by Hyungho. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_hyungho_en_5.5.0_3.0_1727136938617.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_hyungho_en_5.5.0_3.0_1727136938617.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_hyungho","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_hyungho", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_hyungho| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Hyungho/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_hyungho_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_hyungho_pipeline_en.md new file mode 100644 index 00000000000000..5aa0867e73b5b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_hyungho_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_hyungho_pipeline pipeline DistilBertForSequenceClassification from Hyungho +author: John Snow Labs +name: burmese_awesome_model_hyungho_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_hyungho_pipeline` is a English model originally trained by Hyungho. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_hyungho_pipeline_en_5.5.0_3.0_1727136951846.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_hyungho_pipeline_en_5.5.0_3.0_1727136951846.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_hyungho_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_hyungho_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_hyungho_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Hyungho/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_sharadakatla_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_sharadakatla_en.md new file mode 100644 index 00000000000000..b27fe35bb00966 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_sharadakatla_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_sharadakatla DistilBertForSequenceClassification from sharadakatla +author: John Snow Labs +name: burmese_awesome_model_sharadakatla +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_sharadakatla` is a English model originally trained by sharadakatla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_sharadakatla_en_5.5.0_3.0_1727164826419.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_sharadakatla_en_5.5.0_3.0_1727164826419.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_sharadakatla","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_sharadakatla", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_sharadakatla| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sharadakatla/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_sharadakatla_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_sharadakatla_pipeline_en.md new file mode 100644 index 00000000000000..2d76a30d11882c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_sharadakatla_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_sharadakatla_pipeline pipeline DistilBertForSequenceClassification from sharadakatla +author: John Snow Labs +name: burmese_awesome_model_sharadakatla_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_sharadakatla_pipeline` is a English model originally trained by sharadakatla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_sharadakatla_pipeline_en_5.5.0_3.0_1727164839121.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_sharadakatla_pipeline_en_5.5.0_3.0_1727164839121.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_sharadakatla_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_sharadakatla_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_sharadakatla_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sharadakatla/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_thayinm_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_thayinm_en.md new file mode 100644 index 00000000000000..ca3f5323d02af4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_thayinm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_thayinm DistilBertForSequenceClassification from thayinm +author: John Snow Labs +name: burmese_awesome_model_thayinm +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_thayinm` is a English model originally trained by thayinm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_thayinm_en_5.5.0_3.0_1727164663718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_thayinm_en_5.5.0_3.0_1727164663718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_thayinm","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_thayinm", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_thayinm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thayinm/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_thayinm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_thayinm_pipeline_en.md new file mode 100644 index 00000000000000..facb0cfdca03d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_thayinm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_thayinm_pipeline pipeline DistilBertForSequenceClassification from thayinm +author: John Snow Labs +name: burmese_awesome_model_thayinm_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_thayinm_pipeline` is a English model originally trained by thayinm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_thayinm_pipeline_en_5.5.0_3.0_1727164676551.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_thayinm_pipeline_en_5.5.0_3.0_1727164676551.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_thayinm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_thayinm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_thayinm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thayinm/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_text_classification_jeruan3_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_text_classification_jeruan3_en.md new file mode 100644 index 00000000000000..c012723a30edd5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_text_classification_jeruan3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_text_classification_jeruan3 DistilBertForSequenceClassification from jeruan3 +author: John Snow Labs +name: burmese_awesome_text_classification_jeruan3 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_text_classification_jeruan3` is a English model originally trained by jeruan3. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_text_classification_jeruan3_en_5.5.0_3.0_1727154509727.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_text_classification_jeruan3_en_5.5.0_3.0_1727154509727.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_text_classification_jeruan3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_text_classification_jeruan3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_text_classification_jeruan3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/jeruan3/my-awesome-text-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_bert_question_answering_model_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_bert_question_answering_model_en.md new file mode 100644 index 00000000000000..e73d7c0c9d58cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_bert_question_answering_model_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_bert_question_answering_model BertForQuestionAnswering from Ashkh0099 +author: John Snow Labs +name: burmese_bert_question_answering_model +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_bert_question_answering_model` is a English model originally trained by Ashkh0099. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_bert_question_answering_model_en_5.5.0_3.0_1727217061833.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_bert_question_answering_model_en_5.5.0_3.0_1727217061833.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
     .setInputCols(["question", "context"]) \
     .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("burmese_bert_question_answering_model","en") \
     .setInputCols(["document_question","document_context"]) \
     .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("burmese_bert_question_answering_model", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
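The span selected by the model is written to the `answer` column. A short follow-up to read it back out, continuing the Python example above (column names follow the `setOutputCol` calls in the example):

```python
# The extracted answer text per row.
pipelineDF.select("answer.result").show(truncate=False)

# Each answer annotation also carries begin/end offsets for the selected span.
pipelineDF.selectExpr("explode(answer) as a") \
    .selectExpr("a.result as answer", "a.begin", "a.end") \
    .show(truncate=False)
```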
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_bert_question_answering_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Ashkh0099/my-bert-question-answering-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_bert_question_answering_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_bert_question_answering_model_pipeline_en.md new file mode 100644 index 00000000000000..ea66055bf32c69 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_bert_question_answering_model_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_bert_question_answering_model_pipeline pipeline BertForQuestionAnswering from Ashkh0099 +author: John Snow Labs +name: burmese_bert_question_answering_model_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_bert_question_answering_model_pipeline` is a English model originally trained by Ashkh0099. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_bert_question_answering_model_pipeline_en_5.5.0_3.0_1727217085841.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_bert_question_answering_model_pipeline_en_5.5.0_3.0_1727217085841.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_bert_question_answering_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_bert_question_answering_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
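As in the model card above, `df` is assumed to already exist. Because this pipeline starts with a MultiDocumentAssembler, the input DataFrame needs both a question column and a context column; a hedged sketch (the expected column names, taken here as `question` and `context`, and the `answer` output column are assumptions about this generated pipeline):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("burmese_bert_question_answering_model_pipeline", lang="en")

# Assumed input schema: one string column per document type.
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")

annotations = pipeline.transform(df)
annotations.select("answer.result").show(truncate=False)
```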
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_bert_question_answering_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Ashkh0099/my-bert-question-answering-model + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-busu_model_small_pipeline_ro.md b/docs/_posts/ahmedlone127/2024-09-24-busu_model_small_pipeline_ro.md new file mode 100644 index 00000000000000..a4fbadf2691ae9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-busu_model_small_pipeline_ro.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Moldavian, Moldovan, Romanian busu_model_small_pipeline pipeline WhisperForCTC from iulik-pisik +author: John Snow Labs +name: busu_model_small_pipeline +date: 2024-09-24 +tags: [ro, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ro +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`busu_model_small_pipeline` is a Moldavian, Moldovan, Romanian model originally trained by iulik-pisik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/busu_model_small_pipeline_ro_5.5.0_3.0_1727144357527.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/busu_model_small_pipeline_ro_5.5.0_3.0_1727144357527.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("busu_model_small_pipeline", lang = "ro") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("busu_model_small_pipeline", lang = "ro") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|busu_model_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ro| +|Size:|1.7 GB| + +## References + +https://huggingface.co/iulik-pisik/busu_model_small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-busu_model_small_ro.md b/docs/_posts/ahmedlone127/2024-09-24-busu_model_small_ro.md new file mode 100644 index 00000000000000..e567ef21cf321a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-busu_model_small_ro.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Moldavian, Moldovan, Romanian busu_model_small WhisperForCTC from iulik-pisik +author: John Snow Labs +name: busu_model_small +date: 2024-09-24 +tags: [ro, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ro +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`busu_model_small` is a Moldavian, Moldovan, Romanian model originally trained by iulik-pisik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/busu_model_small_ro_5.5.0_3.0_1727144260430.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/busu_model_small_ro_5.5.0_3.0_1727144260430.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("busu_model_small","ro") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("busu_model_small", "ro")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
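The example above references a DataFrame `data` without building it. `AudioAssembler` reads raw audio samples as an array of floats from the `audio_content` column. A minimal sketch of preparing such a DataFrame, assuming 16 kHz mono audio and the `librosa` library — both assumptions for illustration, not requirements stated by this card:

```python
import librosa

# Load a 16 kHz mono waveform as plain Python floats (illustrative file name).
waveform, _ = librosa.load("interview_ro.wav", sr=16000, mono=True)
raw_floats = [float(x) for x in waveform]

# One row per clip, in the column the AudioAssembler above reads from.
data = spark.createDataFrame([[raw_floats]]).toDF("audio_content")

result = pipeline.fit(data).transform(data)
result.select("text.result").show(truncate=False)
```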
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|busu_model_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ro| +|Size:|1.7 GB| + +## References + +https://huggingface.co/iulik-pisik/busu_model_small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-calc_nepal_bhasa_roberta_ep20_en.md b/docs/_posts/ahmedlone127/2024-09-24-calc_nepal_bhasa_roberta_ep20_en.md new file mode 100644 index 00000000000000..8d0265f04b03f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-calc_nepal_bhasa_roberta_ep20_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English calc_nepal_bhasa_roberta_ep20 RoBertaForTokenClassification from vishruthnath +author: John Snow Labs +name: calc_nepal_bhasa_roberta_ep20 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`calc_nepal_bhasa_roberta_ep20` is a English model originally trained by vishruthnath. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/calc_nepal_bhasa_roberta_ep20_en_5.5.0_3.0_1727151021605.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/calc_nepal_bhasa_roberta_ep20_en_5.5.0_3.0_1727151021605.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("calc_nepal_bhasa_roberta_ep20","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("calc_nepal_bhasa_roberta_ep20", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
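The token classifier emits one tag per token in the `ner` column. To group BIO tags into entity chunks, a `NerConverter` stage can be appended; a sketch continuing the Python example above (the extra stage is an addition for illustration, not part of the original card):

```python
from sparknlp.annotator import NerConverter

converter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("entities")

ner_pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, converter])
ner_result = ner_pipeline.fit(data).transform(data)
ner_result.select("ner.result", "entities.result").show(truncate=False)
```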
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|calc_nepal_bhasa_roberta_ep20| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|422.6 MB| + +## References + +https://huggingface.co/vishruthnath/Calc_new_RoBERTa_ep20 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-camembert_base_fr.md b/docs/_posts/ahmedlone127/2024-09-24-camembert_base_fr.md new file mode 100644 index 00000000000000..4fe642cf4b454e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-camembert_base_fr.md @@ -0,0 +1,86 @@ +--- +layout: model +title: CamemBERT Base Model +author: John Snow Labs +name: camembert_base +date: 2024-09-24 +tags: [fr, french, embeddings, camembert, base, open_source, onnx] +task: Embeddings +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: CamemBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +[CamemBERT](https://arxiv.org/abs/1911.03894) is a state-of-the-art language model for French based on the RoBERTa model. +For further information or requests, please go to [Camembert Website](https://camembert-model.fr/) + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/camembert_base_fr_5.5.0_3.0_1727210253431.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/camembert_base_fr_5.5.0_3.0_1727210253431.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +embeddings = CamemBertEmbeddings.pretrained("camembert_base", "fr") \ +.setInputCols("sentence", "token") \ +.setOutputCol("embeddings") +``` +```scala +val embeddings = CamemBertEmbeddings.pretrained("camembert_base", "fr") +.setInputCols("sentence", "token") +.setOutputCol("embeddings") +``` + +{:.nlu-block} +```python +import nlu +nlu.load("fr.embed.camembert_base").predict("""Put your text here.""") +``` +
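The snippet above only instantiates the annotator; it still needs upstream stages that produce the `sentence` and `token` columns. A minimal end-to-end sketch, assuming a started Spark NLP session (the example sentence is illustrative):

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer, CamemBertEmbeddings
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
sentenceDetector = SentenceDetector().setInputCols(["document"]).setOutputCol("sentence")
tokenizer = Tokenizer().setInputCols(["sentence"]).setOutputCol("token")

embeddings = CamemBertEmbeddings.pretrained("camembert_base", "fr") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, embeddings])
data = spark.createDataFrame([["Le camembert est un fromage français."]]).toDF("text")
result = pipeline.fit(data).transform(data)
result.selectExpr("explode(embeddings) as e").selectExpr("e.result", "e.embeddings").show(5)
```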
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|camembert_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[camembert]| +|Language:|fr| +|Size:|264.0 MB| + +## Benchmarking + +```bash + + +| Model | #params | Arch. | Training data | +|--------------------------------|--------------------------------|-------|-----------------------------------| +| `camembert-base` | 110M | Base | OSCAR (138 GB of text) | +| `camembert/camembert-large` | 335M | Large | CCNet (135 GB of text) | +| `camembert/camembert-base-ccnet` | 110M | Base | CCNet (135 GB of text) | +| `camembert/camembert-base-wikipedia-4gb` | 110M | Base | Wikipedia (4 GB of text) | +| `camembert/camembert-base-oscar-4gb` | 110M | Base | Subsample of OSCAR (4 GB of text) | +| `camembert/camembert-base-ccnet-4gb` | 110M | Base | Subsample of CCNet (4 GB of text) | +``` \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-camembert_base_pipeline_fr.md b/docs/_posts/ahmedlone127/2024-09-24-camembert_base_pipeline_fr.md new file mode 100644 index 00000000000000..3e0a850cd2baab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-camembert_base_pipeline_fr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: French camembert_base_pipeline pipeline CamemBertEmbeddings from almanach +author: John Snow Labs +name: camembert_base_pipeline +date: 2024-09-24 +tags: [fr, open_source, pipeline, onnx] +task: Embeddings +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CamemBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`camembert_base_pipeline` is a French model originally trained by almanach. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/camembert_base_pipeline_fr_5.5.0_3.0_1727210328201.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/camembert_base_pipeline_fr_5.5.0_3.0_1727210328201.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("camembert_base_pipeline", lang = "fr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("camembert_base_pipeline", lang = "fr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|camembert_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fr| +|Size:|264.0 MB| + +## References + +https://huggingface.co/almanach/camembert-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- CamemBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-camembert_embeddings_Sonny_generic_model_fr.md b/docs/_posts/ahmedlone127/2024-09-24-camembert_embeddings_Sonny_generic_model_fr.md new file mode 100644 index 00000000000000..7fcd2cf9741eed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-camembert_embeddings_Sonny_generic_model_fr.md @@ -0,0 +1,98 @@ +--- +layout: model +title: French CamemBert Embeddings (from Sonny) +author: John Snow Labs +name: camembert_embeddings_Sonny_generic_model +date: 2024-09-24 +tags: [fr, open_source, camembert, embeddings, onnx] +task: Embeddings +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: CamemBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CamemBert Embeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `dummy-model` is a French model orginally trained by `Sonny`. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/camembert_embeddings_Sonny_generic_model_fr_5.5.0_3.0_1727210256163.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/camembert_embeddings_Sonny_generic_model_fr_5.5.0_3.0_1727210256163.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = CamemBertEmbeddings.pretrained("camembert_embeddings_Sonny_generic_model","fr") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, embeddings]) + +data = spark.createDataFrame([["J'adore Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = CamemBertEmbeddings.pretrained("camembert_embeddings_Sonny_generic_model","fr") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) + +val data = Seq("J'adore Spark NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
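After `transform`, each token's vector is stored in the `embeddings` annotation column. A short sketch for pulling the vectors out of the `result` DataFrame from the Python example above:

```python
from pyspark.sql.functions import col, explode

result.select(explode(col("embeddings")).alias("e")) \
    .select(col("e.result").alias("token"), col("e.embeddings").alias("vector")) \
    .show(5, truncate=80)
```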
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|camembert_embeddings_Sonny_generic_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[camembert]| +|Language:|fr| +|Size:|264.0 MB| + +## References + +References + +- https://huggingface.co/Sonny/dummy-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-camembert_french_legal_fr.md b/docs/_posts/ahmedlone127/2024-09-24-camembert_french_legal_fr.md new file mode 100644 index 00000000000000..59f1c7f1560589 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-camembert_french_legal_fr.md @@ -0,0 +1,100 @@ +--- +layout: model +title: French Legal CamemBert Embeddings Model +author: John Snow Labs +name: camembert_french_legal +date: 2024-09-24 +tags: [open_source, camembert_embeddings, camembertformaskedlm, fr, onnx] +task: Embeddings +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: CamemBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CamemBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `legal-camembert` is a French model originally trained by `maastrichtlawtech`. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/camembert_french_legal_fr_5.5.0_3.0_1727210205552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/camembert_french_legal_fr_5.5.0_3.0_1727210205552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = CamemBertEmbeddings.pretrained("camembert_french_legal","fr") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings") \
    .setCaseSensitive(True)

pipeline = Pipeline(stages=[documentAssembler, tokenizer, embeddings])

data = spark.createDataFrame([["J'adore Spark NLP"]]).toDF("text")

result = pipeline.fit(data).transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val embeddings = CamemBertEmbeddings.pretrained("camembert_french_legal","fr")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))

val data = Seq("J'adore Spark NLP").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|camembert_french_legal| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[camembert]| +|Language:|fr| +|Size:|412.8 MB| + +## References + +References + +https://huggingface.co/maastrichtlawtech/legal-camembert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-camembert_french_legal_pipeline_fr.md b/docs/_posts/ahmedlone127/2024-09-24-camembert_french_legal_pipeline_fr.md new file mode 100644 index 00000000000000..b0d219e318038d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-camembert_french_legal_pipeline_fr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: French camembert_french_legal_pipeline pipeline CamemBertEmbeddings from maastrichtlawtech +author: John Snow Labs +name: camembert_french_legal_pipeline +date: 2024-09-24 +tags: [fr, open_source, pipeline, onnx] +task: Embeddings +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CamemBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`camembert_french_legal_pipeline` is a French model originally trained by maastrichtlawtech. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/camembert_french_legal_pipeline_fr_5.5.0_3.0_1727210226725.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/camembert_french_legal_pipeline_fr_5.5.0_3.0_1727210226725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("camembert_french_legal_pipeline", lang = "fr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("camembert_french_legal_pipeline", lang = "fr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|camembert_french_legal_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fr| +|Size:|412.9 MB| + +## References + +https://huggingface.co/maastrichtlawtech/legal-camembert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- CamemBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-cat_ner_spanish_3_en.md b/docs/_posts/ahmedlone127/2024-09-24-cat_ner_spanish_3_en.md new file mode 100644 index 00000000000000..b9c9c53e4bb8c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-cat_ner_spanish_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cat_ner_spanish_3 RoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_ner_spanish_3 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_ner_spanish_3` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_ner_spanish_3_en_5.5.0_3.0_1727151167432.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_ner_spanish_3_en_5.5.0_3.0_1727151167432.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assemble raw text, tokenize, then tag each token with the pretrained NER model
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("cat_ner_spanish_3","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("cat_ner_spanish_3", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
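+
+As a follow-up, the token-level predictions can be read from the `ner` output column. This is a small illustrative sketch using the `pipelineDF` built above; column names match the pipeline definition.
+
+```python
+# Each row pairs the token list with its predicted NER tags (one tag per token)
+pipelineDF.select("token.result", "ner.result").show(truncate = False)
+```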
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_ner_spanish_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|462.3 MB| + +## References + +https://huggingface.co/homersimpson/cat-ner-es-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-cat_ner_spanish_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-cat_ner_spanish_3_pipeline_en.md new file mode 100644 index 00000000000000..79baf3391b44b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-cat_ner_spanish_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cat_ner_spanish_3_pipeline pipeline RoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_ner_spanish_3_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_ner_spanish_3_pipeline` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_ner_spanish_3_pipeline_en_5.5.0_3.0_1727151191419.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_ner_spanish_3_pipeline_en_5.5.0_3.0_1727151191419.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cat_ner_spanish_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cat_ner_spanish_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_ner_spanish_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|462.3 MB| + +## References + +https://huggingface.co/homersimpson/cat-ner-es-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-chinese_roberta_wwm_ext_large_finetuned_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-chinese_roberta_wwm_ext_large_finetuned_ner_pipeline_en.md new file mode 100644 index 00000000000000..1190dec77e5b3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-chinese_roberta_wwm_ext_large_finetuned_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English chinese_roberta_wwm_ext_large_finetuned_ner_pipeline pipeline BertForTokenClassification from HYM +author: John Snow Labs +name: chinese_roberta_wwm_ext_large_finetuned_ner_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chinese_roberta_wwm_ext_large_finetuned_ner_pipeline` is a English model originally trained by HYM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chinese_roberta_wwm_ext_large_finetuned_ner_pipeline_en_5.5.0_3.0_1727203754472.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chinese_roberta_wwm_ext_large_finetuned_ner_pipeline_en_5.5.0_3.0_1727203754472.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("chinese_roberta_wwm_ext_large_finetuned_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("chinese_roberta_wwm_ext_large_finetuned_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chinese_roberta_wwm_ext_large_finetuned_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/HYM/chinese-roberta-wwm-ext-large-finetuned-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-cl_arabertv0_1_base_33379_arabic_tydiqa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-cl_arabertv0_1_base_33379_arabic_tydiqa_pipeline_en.md new file mode 100644 index 00000000000000..b05e372ff5ac3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-cl_arabertv0_1_base_33379_arabic_tydiqa_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English cl_arabertv0_1_base_33379_arabic_tydiqa_pipeline pipeline BertForQuestionAnswering from MatMulMan +author: John Snow Labs +name: cl_arabertv0_1_base_33379_arabic_tydiqa_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cl_arabertv0_1_base_33379_arabic_tydiqa_pipeline` is a English model originally trained by MatMulMan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cl_arabertv0_1_base_33379_arabic_tydiqa_pipeline_en_5.5.0_3.0_1727216854587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cl_arabertv0_1_base_33379_arabic_tydiqa_pipeline_en_5.5.0_3.0_1727216854587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cl_arabertv0_1_base_33379_arabic_tydiqa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cl_arabertv0_1_base_33379_arabic_tydiqa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cl_arabertv0_1_base_33379_arabic_tydiqa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|505.0 MB| + +## References + +https://huggingface.co/MatMulMan/CL-AraBERTv0.1-base-33379-arabic_tydiqa + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-classification_multi_label_des_crimes_ar.md b/docs/_posts/ahmedlone127/2024-09-24-classification_multi_label_des_crimes_ar.md new file mode 100644 index 00000000000000..4a20fa29fb59c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-classification_multi_label_des_crimes_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic classification_multi_label_des_crimes BertForSequenceClassification from fatttty +author: John Snow Labs +name: classification_multi_label_des_crimes +date: 2024-09-24 +tags: [ar, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classification_multi_label_des_crimes` is a Arabic model originally trained by fatttty. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classification_multi_label_des_crimes_ar_5.5.0_3.0_1727222144533.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classification_multi_label_des_crimes_ar_5.5.0_3.0_1727222144533.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assemble raw text, tokenize, then classify the sequence with the pretrained model
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = BertForSequenceClassification.pretrained("classification_multi_label_des_crimes","ar") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("classification_multi_label_des_crimes", "ar")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classification_multi_label_des_crimes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|ar| +|Size:|508.7 MB| + +## References + +https://huggingface.co/fatttty/classification_multi_label_des_crimes \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-classification_multi_label_des_crimes_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-24-classification_multi_label_des_crimes_pipeline_ar.md new file mode 100644 index 00000000000000..373bc6fae4add6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-classification_multi_label_des_crimes_pipeline_ar.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Arabic classification_multi_label_des_crimes_pipeline pipeline BertForSequenceClassification from fatttty +author: John Snow Labs +name: classification_multi_label_des_crimes_pipeline +date: 2024-09-24 +tags: [ar, open_source, pipeline, onnx] +task: Text Classification +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classification_multi_label_des_crimes_pipeline` is a Arabic model originally trained by fatttty. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classification_multi_label_des_crimes_pipeline_ar_5.5.0_3.0_1727222170898.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classification_multi_label_des_crimes_pipeline_ar_5.5.0_3.0_1727222170898.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("classification_multi_label_des_crimes_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("classification_multi_label_des_crimes_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classification_multi_label_des_crimes_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|508.7 MB| + +## References + +https://huggingface.co/fatttty/classification_multi_label_des_crimes + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-classification_tagging_en.md b/docs/_posts/ahmedlone127/2024-09-24-classification_tagging_en.md new file mode 100644 index 00000000000000..1b5b29e0e19f9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-classification_tagging_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English classification_tagging BertEmbeddings from kumarsonu +author: John Snow Labs +name: classification_tagging +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classification_tagging` is a English model originally trained by kumarsonu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classification_tagging_en_5.5.0_3.0_1727177672772.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classification_tagging_en_5.5.0_3.0_1727177672772.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("classification_tagging","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("classification_tagging","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
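+
+To work with the vectors directly, the annotations in the `embeddings` column can be exploded; each annotation carries the token text in its `result` field and the vector in its `embeddings` field. A brief illustrative sketch, assuming the `pipelineDF` built above:
+
+```python
+# One row per token: the token string and its embedding vector
+pipelineDF.selectExpr("explode(embeddings) as emb") \
+    .selectExpr("emb.result as token", "emb.embeddings as vector") \
+    .show(truncate = False)
+```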
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classification_tagging| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/kumarsonu/Classification_Tagging \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-classification_tagging_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-classification_tagging_pipeline_en.md new file mode 100644 index 00000000000000..feeb49c38ca78c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-classification_tagging_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English classification_tagging_pipeline pipeline BertEmbeddings from kumarsonu +author: John Snow Labs +name: classification_tagging_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classification_tagging_pipeline` is a English model originally trained by kumarsonu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classification_tagging_pipeline_en_5.5.0_3.0_1727177692643.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classification_tagging_pipeline_en_5.5.0_3.0_1727177692643.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("classification_tagging_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("classification_tagging_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classification_tagging_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/kumarsonu/Classification_Tagging + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-clip_vit_large_patch14_en.md b/docs/_posts/ahmedlone127/2024-09-24-clip_vit_large_patch14_en.md new file mode 100644 index 00000000000000..a9e0a48916c7e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-clip_vit_large_patch14_en.md @@ -0,0 +1,120 @@ +--- +layout: model +title: English clip_vit_large_patch14 CLIPForZeroShotClassification from openai +author: John Snow Labs +name: clip_vit_large_patch14 +date: 2024-09-24 +tags: [en, open_source, onnx, zero_shot, clip, image] +task: Zero-Shot Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: CLIPForZeroShotClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CLIPForZeroShotClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clip_vit_large_patch14` is a English model originally trained by openai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clip_vit_large_patch14_en_5.5.0_3.0_1727207942979.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clip_vit_large_patch14_en_5.5.0_3.0_1727207942979.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Read the images to classify (dropInvalid skips unreadable files)
+imageDF = spark.read \
+    .format("image") \
+    .option("dropInvalid", True) \
+    .load("src/test/resources/image/")
+
+candidateLabels = [
+    "a photo of a bird",
+    "a photo of a cat",
+    "a photo of a dog",
+    "a photo of a hen",
+    "a photo of a hippo",
+    "a photo of a room",
+    "a photo of a tractor",
+    "a photo of an ostrich",
+    "a photo of an ox"]
+
+imageAssembler = ImageAssembler() \
+    .setInputCol("image") \
+    .setOutputCol("image_assembler")
+
+imageClassifier = CLIPForZeroShotClassification.pretrained("clip_vit_large_patch14","en") \
+    .setInputCols(["image_assembler"]) \
+    .setOutputCol("label") \
+    .setCandidateLabels(candidateLabels)
+
+pipeline = Pipeline().setStages([imageAssembler, imageClassifier])
+pipelineModel = pipeline.fit(imageDF)
+pipelineDF = pipelineModel.transform(imageDF)
+
+```
+```scala
+// Read the images to classify (dropInvalid skips unreadable files)
+val imageDF = spark.read
+  .format("image")
+  .option("dropInvalid", value = true)
+  .load("src/test/resources/image/")
+
+val candidateLabels = Array(
+  "a photo of a bird",
+  "a photo of a cat",
+  "a photo of a dog",
+  "a photo of a hen",
+  "a photo of a hippo",
+  "a photo of a room",
+  "a photo of a tractor",
+  "a photo of an ostrich",
+  "a photo of an ox")
+
+val imageAssembler = new ImageAssembler()
+  .setInputCol("image")
+  .setOutputCol("image_assembler")
+
+val imageClassifier = CLIPForZeroShotClassification.pretrained("clip_vit_large_patch14", "en")
+  .setInputCols(Array("image_assembler"))
+  .setOutputCol("label")
+  .setCandidateLabels(candidateLabels)
+
+val pipeline = new Pipeline().setStages(Array(imageAssembler, imageClassifier))
+val pipelineModel = pipeline.fit(imageDF)
+val pipelineDF = pipelineModel.transform(imageDF)
+
+```
+</div>
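+
+The predicted candidate label for each image ends up in the `label` output column. A short illustrative sketch, assuming the `pipelineDF` built above and the standard Spark image data source schema:
+
+```python
+# Show the source path of each image next to its predicted label
+pipelineDF.selectExpr("image.origin as image", "label.result as prediction") \
+    .show(truncate = False)
+```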
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clip_vit_large_patch14| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[image_assembler]| +|Output Labels:|[label]| +|Language:|en| +|Size:|1.1 GB| + +## References + +https://huggingface.co/openai/clip-vit-large-patch14 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_random_trimmed_en.md b/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_random_trimmed_en.md new file mode 100644 index 00000000000000..d5e7ca366fec6b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_random_trimmed_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English code_search_codebert_base_random_trimmed RoBertaForTokenClassification from DianaIulia +author: John Snow Labs +name: code_search_codebert_base_random_trimmed +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`code_search_codebert_base_random_trimmed` is a English model originally trained by DianaIulia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_random_trimmed_en_5.5.0_3.0_1727150954940.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_random_trimmed_en_5.5.0_3.0_1727150954940.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assemble raw text, tokenize, then tag each token with the pretrained NER model
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("code_search_codebert_base_random_trimmed","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("code_search_codebert_base_random_trimmed", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|code_search_codebert_base_random_trimmed| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DianaIulia/code_search_codebert_base_random_trimmed \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_random_trimmed_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_random_trimmed_pipeline_en.md new file mode 100644 index 00000000000000..4be6be30b58b60 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_random_trimmed_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English code_search_codebert_base_random_trimmed_pipeline pipeline RoBertaForTokenClassification from DianaIulia +author: John Snow Labs +name: code_search_codebert_base_random_trimmed_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`code_search_codebert_base_random_trimmed_pipeline` is a English model originally trained by DianaIulia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_random_trimmed_pipeline_en_5.5.0_3.0_1727150992459.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_random_trimmed_pipeline_en_5.5.0_3.0_1727150992459.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("code_search_codebert_base_random_trimmed_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("code_search_codebert_base_random_trimmed_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|code_search_codebert_base_random_trimmed_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/DianaIulia/code_search_codebert_base_random_trimmed + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_up_down_1_trimmed_en.md b/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_up_down_1_trimmed_en.md new file mode 100644 index 00000000000000..47eed14df08dbd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_up_down_1_trimmed_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English code_search_codebert_base_up_down_1_trimmed RoBertaForTokenClassification from DianaIulia +author: John Snow Labs +name: code_search_codebert_base_up_down_1_trimmed +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`code_search_codebert_base_up_down_1_trimmed` is a English model originally trained by DianaIulia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_up_down_1_trimmed_en_5.5.0_3.0_1727139369172.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_up_down_1_trimmed_en_5.5.0_3.0_1727139369172.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assemble raw text, tokenize, then tag each token with the pretrained NER model
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("code_search_codebert_base_up_down_1_trimmed","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("code_search_codebert_base_up_down_1_trimmed", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|code_search_codebert_base_up_down_1_trimmed| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DianaIulia/code_search_codebert_base_up_down_1_trimmed \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_up_down_1_trimmed_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_up_down_1_trimmed_pipeline_en.md new file mode 100644 index 00000000000000..426be3af133856 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_up_down_1_trimmed_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English code_search_codebert_base_up_down_1_trimmed_pipeline pipeline RoBertaForTokenClassification from DianaIulia +author: John Snow Labs +name: code_search_codebert_base_up_down_1_trimmed_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`code_search_codebert_base_up_down_1_trimmed_pipeline` is a English model originally trained by DianaIulia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_up_down_1_trimmed_pipeline_en_5.5.0_3.0_1727139395640.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_up_down_1_trimmed_pipeline_en_5.5.0_3.0_1727139395640.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("code_search_codebert_base_up_down_1_trimmed_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("code_search_codebert_base_up_down_1_trimmed_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|code_search_codebert_base_up_down_1_trimmed_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/DianaIulia/code_search_codebert_base_up_down_1_trimmed + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-codebert_java_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-codebert_java_pipeline_en.md new file mode 100644 index 00000000000000..fd5c683ee5f943 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-codebert_java_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English codebert_java_pipeline pipeline RoBertaEmbeddings from neulab +author: John Snow Labs +name: codebert_java_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`codebert_java_pipeline` is a English model originally trained by neulab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/codebert_java_pipeline_en_5.5.0_3.0_1727216315272.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/codebert_java_pipeline_en_5.5.0_3.0_1727216315272.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("codebert_java_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("codebert_java_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|codebert_java_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/neulab/codebert-java + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-cold_fusion_itr9_seed3_en.md b/docs/_posts/ahmedlone127/2024-09-24-cold_fusion_itr9_seed3_en.md new file mode 100644 index 00000000000000..56181b2f7aae36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-cold_fusion_itr9_seed3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cold_fusion_itr9_seed3 RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr9_seed3 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr9_seed3` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr9_seed3_en_5.5.0_3.0_1727171208554.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr9_seed3_en_5.5.0_3.0_1727171208554.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assemble raw text, tokenize, then classify the sequence with the pretrained model
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr9_seed3","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr9_seed3", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
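+
+The predicted label for each input row is written to the `class` output column, and the label set the model was exported with can be listed from the annotator itself. A small illustrative sketch, assuming the objects defined above (the `getClasses` helper depends on the Spark NLP version in use):
+
+```python
+# Labels known to the classifier, then the prediction for each input text
+print(sequenceClassifier.getClasses())
+pipelineDF.select("text", "class.result").show(truncate = False)
+```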
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr9_seed3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|467.9 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr9-seed3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-cold_fusion_itr9_seed3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-cold_fusion_itr9_seed3_pipeline_en.md new file mode 100644 index 00000000000000..f485406fc28310 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-cold_fusion_itr9_seed3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cold_fusion_itr9_seed3_pipeline pipeline RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr9_seed3_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr9_seed3_pipeline` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr9_seed3_pipeline_en_5.5.0_3.0_1727171232747.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr9_seed3_pipeline_en_5.5.0_3.0_1727171232747.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cold_fusion_itr9_seed3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cold_fusion_itr9_seed3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr9_seed3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr9-seed3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-conflibert_cont_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-conflibert_cont_cased_pipeline_en.md new file mode 100644 index 00000000000000..c0acf6f8db2f25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-conflibert_cont_cased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English conflibert_cont_cased_pipeline pipeline BertEmbeddings from snowood1 +author: John Snow Labs +name: conflibert_cont_cased_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`conflibert_cont_cased_pipeline` is a English model originally trained by snowood1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/conflibert_cont_cased_pipeline_en_5.5.0_3.0_1727220709179.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/conflibert_cont_cased_pipeline_en_5.5.0_3.0_1727220709179.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("conflibert_cont_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("conflibert_cont_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|conflibert_cont_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|402.9 MB| + +## References + +https://huggingface.co/snowood1/ConfliBERT-cont-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-conflibert_cont_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-24-conflibert_cont_uncased_en.md new file mode 100644 index 00000000000000..a038e41e4bf7aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-conflibert_cont_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English conflibert_cont_uncased BertEmbeddings from snowood1 +author: John Snow Labs +name: conflibert_cont_uncased +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`conflibert_cont_uncased` is a English model originally trained by snowood1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/conflibert_cont_uncased_en_5.5.0_3.0_1727221175110.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/conflibert_cont_uncased_en_5.5.0_3.0_1727221175110.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("conflibert_cont_uncased","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("conflibert_cont_uncased","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|conflibert_cont_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/snowood1/ConfliBERT-cont-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-conflibert_cont_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-conflibert_cont_uncased_pipeline_en.md new file mode 100644 index 00000000000000..3760c07f3d59e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-conflibert_cont_uncased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English conflibert_cont_uncased_pipeline pipeline BertEmbeddings from snowood1 +author: John Snow Labs +name: conflibert_cont_uncased_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`conflibert_cont_uncased_pipeline` is a English model originally trained by snowood1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/conflibert_cont_uncased_pipeline_en_5.5.0_3.0_1727221195692.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/conflibert_cont_uncased_pipeline_en_5.5.0_3.0_1727221195692.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("conflibert_cont_uncased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("conflibert_cont_uncased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|conflibert_cont_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/snowood1/ConfliBERT-cont-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-contaminationquestionanswering_en.md b/docs/_posts/ahmedlone127/2024-09-24-contaminationquestionanswering_en.md new file mode 100644 index 00000000000000..ed87debf5d4a0d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-contaminationquestionanswering_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English contaminationquestionanswering DistilBertForQuestionAnswering from Shushant +author: John Snow Labs +name: contaminationquestionanswering +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`contaminationquestionanswering` is a English model originally trained by Shushant. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/contaminationquestionanswering_en_5.5.0_3.0_1727219904234.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/contaminationquestionanswering_en_5.5.0_3.0_1727219904234.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assemble the question and its context, then extract the answer span
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("contaminationquestionanswering","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("contaminationquestionanswering", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
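+
+The extracted answer span is written to the `answer` output column. A brief illustrative sketch, assuming the `pipelineDF` built above:
+
+```python
+# Pair each question with the span the model extracted from its context
+pipelineDF.select("question", "answer.result").show(truncate = False)
+```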
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|contaminationquestionanswering| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Shushant/ContaminationQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-contaminationquestionanswering_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-contaminationquestionanswering_pipeline_en.md new file mode 100644 index 00000000000000..304bf39e316815 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-contaminationquestionanswering_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English contaminationquestionanswering_pipeline pipeline DistilBertForQuestionAnswering from Shushant +author: John Snow Labs +name: contaminationquestionanswering_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`contaminationquestionanswering_pipeline` is a English model originally trained by Shushant. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/contaminationquestionanswering_pipeline_en_5.5.0_3.0_1727219918030.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/contaminationquestionanswering_pipeline_en_5.5.0_3.0_1727219918030.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("contaminationquestionanswering_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("contaminationquestionanswering_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|contaminationquestionanswering_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Shushant/ContaminationQuestionAnswering + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_en.md b/docs/_posts/ahmedlone127/2024-09-24-correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_en.md new file mode 100644 index 00000000000000..f6b7659db77131 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14 BertForTokenClassification from ali2066 +author: John Snow Labs +name: correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_en_5.5.0_3.0_1727203282167.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_en_5.5.0_3.0_1727203282167.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ali2066/correct_BERT_token_itr0_0.0001_webDiscourse_01_03_2022-15_47_14 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_pipeline_en.md new file mode 100644 index 00000000000000..2afb6fc03d145b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_pipeline pipeline BertForTokenClassification from ali2066 +author: John Snow Labs +name: correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_pipeline` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_pipeline_en_5.5.0_3.0_1727203304268.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_pipeline_en_5.5.0_3.0_1727203304268.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
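+
+For a quick spot check without building a DataFrame first, the pretrained pipeline can also be applied to a single string via `annotate`. This is a sketch; the exact keys of the returned dictionary depend on the output column names of the included stages, and the example sentence is illustrative.
+
+```python
+# Sketch: annotate a single string with the pretrained NER pipeline.
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+
+pipeline = PretrainedPipeline("correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_pipeline", lang="en")
+annotations = pipeline.annotate("John lives in Berlin and works for Acme Corp.")
+print(annotations)  # dict mapping each stage's output column to its results
+```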
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ali2066/correct_BERT_token_itr0_0.0001_webDiscourse_01_03_2022-15_47_14 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-credit_card_collection_intent_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-credit_card_collection_intent_classification_pipeline_en.md new file mode 100644 index 00000000000000..48a1fb5aa12a85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-credit_card_collection_intent_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English credit_card_collection_intent_classification_pipeline pipeline DistilBertForSequenceClassification from PabitraJiban +author: John Snow Labs +name: credit_card_collection_intent_classification_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`credit_card_collection_intent_classification_pipeline` is a English model originally trained by PabitraJiban. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/credit_card_collection_intent_classification_pipeline_en_5.5.0_3.0_1727137348683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/credit_card_collection_intent_classification_pipeline_en_5.5.0_3.0_1727137348683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("credit_card_collection_intent_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("credit_card_collection_intent_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|credit_card_collection_intent_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/PabitraJiban/Credit-card-collection-intent-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-danish_roberta_portuguese_en.md b/docs/_posts/ahmedlone127/2024-09-24-danish_roberta_portuguese_en.md new file mode 100644 index 00000000000000..bda9ea5ab7c417 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-danish_roberta_portuguese_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English danish_roberta_portuguese RoBertaForSequenceClassification from mediabiasgroup +author: John Snow Labs +name: danish_roberta_portuguese +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`danish_roberta_portuguese` is a English model originally trained by mediabiasgroup. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/danish_roberta_portuguese_en_5.5.0_3.0_1727211564054.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/danish_roberta_portuguese_en_5.5.0_3.0_1727211564054.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("danish_roberta_portuguese","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("danish_roberta_portuguese", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|danish_roberta_portuguese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|444.4 MB| + +## References + +https://huggingface.co/mediabiasgroup/da-roberta-pt \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-danish_roberta_portuguese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-danish_roberta_portuguese_pipeline_en.md new file mode 100644 index 00000000000000..79643dcad33d72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-danish_roberta_portuguese_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English danish_roberta_portuguese_pipeline pipeline RoBertaForSequenceClassification from mediabiasgroup +author: John Snow Labs +name: danish_roberta_portuguese_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`danish_roberta_portuguese_pipeline` is a English model originally trained by mediabiasgroup. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/danish_roberta_portuguese_pipeline_en_5.5.0_3.0_1727211599011.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/danish_roberta_portuguese_pipeline_en_5.5.0_3.0_1727211599011.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("danish_roberta_portuguese_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("danish_roberta_portuguese_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
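+
+The example above leaves `df` undefined. A minimal sketch is shown below; it assumes the pipeline's `DocumentAssembler` reads the conventional `text` column, and the `class` output column name is taken from the companion model card. The example sentence is illustrative.
+
+```python
+# Sketch: build an input DataFrame and read the predicted labels.
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+df = spark.createDataFrame([["The article presents only one side of the debate."]]).toDF("text")
+
+pipeline = PretrainedPipeline("danish_roberta_portuguese_pipeline", lang="en")
+pipeline.transform(df).select("text", "class.result").show(truncate=False)
+```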
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|danish_roberta_portuguese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|444.5 MB| + +## References + +https://huggingface.co/mediabiasgroup/da-roberta-pt + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-db_mc2_4_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-db_mc2_4_1_pipeline_en.md new file mode 100644 index 00000000000000..fa9e08d6ff9e35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-db_mc2_4_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English db_mc2_4_1_pipeline pipeline DistilBertForSequenceClassification from exala +author: John Snow Labs +name: db_mc2_4_1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`db_mc2_4_1_pipeline` is a English model originally trained by exala. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/db_mc2_4_1_pipeline_en_5.5.0_3.0_1727137585316.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/db_mc2_4_1_pipeline_en_5.5.0_3.0_1727137585316.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("db_mc2_4_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("db_mc2_4_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|db_mc2_4_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/exala/db_mc2_4.1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_base_japanese_ku_nlp_ja.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_base_japanese_ku_nlp_ja.md new file mode 100644 index 00000000000000..9b4d0e016a35a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_base_japanese_ku_nlp_ja.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Japanese deberta_v2_base_japanese_ku_nlp DeBertaEmbeddings from ku-nlp +author: John Snow Labs +name: deberta_v2_base_japanese_ku_nlp +date: 2024-09-24 +tags: [ja, open_source, onnx, embeddings, deberta] +task: Embeddings +language: ja +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v2_base_japanese_ku_nlp` is a Japanese model originally trained by ku-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v2_base_japanese_ku_nlp_ja_5.5.0_3.0_1727196997773.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v2_base_japanese_ku_nlp_ja_5.5.0_3.0_1727196997773.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = DeBertaEmbeddings.pretrained("deberta_v2_base_japanese_ku_nlp","ja") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = DeBertaEmbeddings.pretrained("deberta_v2_base_japanese_ku_nlp","ja") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
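+
+To see what the embedding stage produces, the sketch below explodes the `embeddings` output column of `pipelineDF` from the example above into one row per token. Note that for this Japanese model you would normally feed Japanese text rather than the English placeholder sentence.
+
+```python
+# Sketch: one row per token with its text and embedding vector.
+from pyspark.sql import functions as F
+
+tokens = pipelineDF.select(F.explode("embeddings").alias("emb")) \
+    .select(F.col("emb.result").alias("token"),
+            F.col("emb.embeddings").alias("vector"))
+tokens.show(truncate=80)
+```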
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v2_base_japanese_ku_nlp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence, token]| +|Output Labels:|[deberta]| +|Language:|ja| +|Size:|419.0 MB| + +## References + +https://huggingface.co/ku-nlp/deberta-v2-base-japanese \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_base_japanese_ku_nlp_pipeline_ja.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_base_japanese_ku_nlp_pipeline_ja.md new file mode 100644 index 00000000000000..3ead0421a9f772 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_base_japanese_ku_nlp_pipeline_ja.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Japanese deberta_v2_base_japanese_ku_nlp_pipeline pipeline DeBertaEmbeddings from ku-nlp +author: John Snow Labs +name: deberta_v2_base_japanese_ku_nlp_pipeline +date: 2024-09-24 +tags: [ja, open_source, pipeline, onnx] +task: Embeddings +language: ja +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v2_base_japanese_ku_nlp_pipeline` is a Japanese model originally trained by ku-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v2_base_japanese_ku_nlp_pipeline_ja_5.5.0_3.0_1727197018414.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v2_base_japanese_ku_nlp_pipeline_ja_5.5.0_3.0_1727197018414.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deberta_v2_base_japanese_ku_nlp_pipeline", lang = "ja") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deberta_v2_base_japanese_ku_nlp_pipeline", lang = "ja") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v2_base_japanese_ku_nlp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ja| +|Size:|419.0 MB| + +## References + +https://huggingface.co/ku-nlp/deberta-v2-base-japanese + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_large_japanese_ja.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_large_japanese_ja.md new file mode 100644 index 00000000000000..a74e558350cd1a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_large_japanese_ja.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Japanese deberta_v2_large_japanese DeBertaEmbeddings from ku-nlp +author: John Snow Labs +name: deberta_v2_large_japanese +date: 2024-09-24 +tags: [ja, open_source, onnx, embeddings, deberta] +task: Embeddings +language: ja +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v2_large_japanese` is a Japanese model originally trained by ku-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v2_large_japanese_ja_5.5.0_3.0_1727197101334.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v2_large_japanese_ja_5.5.0_3.0_1727197101334.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = DeBertaEmbeddings.pretrained("deberta_v2_large_japanese","ja") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = DeBertaEmbeddings.pretrained("deberta_v2_large_japanese","ja") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v2_large_japanese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence, token]| +|Output Labels:|[deberta]| +|Language:|ja| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ku-nlp/deberta-v2-large-japanese \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_large_japanese_pipeline_ja.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_large_japanese_pipeline_ja.md new file mode 100644 index 00000000000000..8898d22e16cab0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_large_japanese_pipeline_ja.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Japanese deberta_v2_large_japanese_pipeline pipeline DeBertaEmbeddings from ku-nlp +author: John Snow Labs +name: deberta_v2_large_japanese_pipeline +date: 2024-09-24 +tags: [ja, open_source, pipeline, onnx] +task: Embeddings +language: ja +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v2_large_japanese_pipeline` is a Japanese model originally trained by ku-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v2_large_japanese_pipeline_ja_5.5.0_3.0_1727197163862.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v2_large_japanese_pipeline_ja_5.5.0_3.0_1727197163862.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deberta_v2_large_japanese_pipeline", lang = "ja") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deberta_v2_large_japanese_pipeline", lang = "ja") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v2_large_japanese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ja| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ku-nlp/deberta-v2-large-japanese + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_base_prompt_injection_protectai_en.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_base_prompt_injection_protectai_en.md new file mode 100644 index 00000000000000..f83622f8c1c10f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_base_prompt_injection_protectai_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deberta_v3_base_prompt_injection_protectai DeBertaForSequenceClassification from protectai +author: John Snow Labs +name: deberta_v3_base_prompt_injection_protectai +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_base_prompt_injection_protectai` is a English model originally trained by protectai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_base_prompt_injection_protectai_en_5.5.0_3.0_1727212657514.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_base_prompt_injection_protectai_en_5.5.0_3.0_1727212657514.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_base_prompt_injection_protectai","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_base_prompt_injection_protectai", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
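+
+The predicted label for each input row can then be read from the `class` output column defined above, for example:
+
+```python
+# Sketch: show the input text next to its predicted label.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```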
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_base_prompt_injection_protectai| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|677.6 MB| + +## References + +https://huggingface.co/protectai/deberta-v3-base-prompt-injection \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_base_prompt_injection_protectai_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_base_prompt_injection_protectai_pipeline_en.md new file mode 100644 index 00000000000000..ef3676ca704fb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_base_prompt_injection_protectai_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English deberta_v3_base_prompt_injection_protectai_pipeline pipeline DeBertaForSequenceClassification from protectai +author: John Snow Labs +name: deberta_v3_base_prompt_injection_protectai_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_base_prompt_injection_protectai_pipeline` is a English model originally trained by protectai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_base_prompt_injection_protectai_pipeline_en_5.5.0_3.0_1727212696688.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_base_prompt_injection_protectai_pipeline_en_5.5.0_3.0_1727212696688.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deberta_v3_base_prompt_injection_protectai_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deberta_v3_base_prompt_injection_protectai_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_base_prompt_injection_protectai_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|677.6 MB| + +## References + +https://huggingface.co/protectai/deberta-v3-base-prompt-injection + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_large_hf_weights_en.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_large_hf_weights_en.md new file mode 100644 index 00000000000000..a77c95fd9b6ba7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_large_hf_weights_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deberta_v3_large_hf_weights DeBertaEmbeddings from nagupv +author: John Snow Labs +name: deberta_v3_large_hf_weights +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, deberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_large_hf_weights` is a English model originally trained by nagupv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_large_hf_weights_en_5.5.0_3.0_1727197193591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_large_hf_weights_en_5.5.0_3.0_1727197193591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = DeBertaEmbeddings.pretrained("deberta_v3_large_hf_weights","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = DeBertaEmbeddings.pretrained("deberta_v3_large_hf_weights","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_large_hf_weights| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence, token]| +|Output Labels:|[deberta]| +|Language:|en| +|Size:|1.6 GB| + +## References + +https://huggingface.co/nagupv/deberta-v3-large-hf-weights \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_large_hf_weights_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_large_hf_weights_pipeline_en.md new file mode 100644 index 00000000000000..eee146bb569ea3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_large_hf_weights_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English deberta_v3_large_hf_weights_pipeline pipeline DeBertaEmbeddings from nagupv +author: John Snow Labs +name: deberta_v3_large_hf_weights_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_large_hf_weights_pipeline` is a English model originally trained by nagupv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_large_hf_weights_pipeline_en_5.5.0_3.0_1727197274114.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_large_hf_weights_pipeline_en_5.5.0_3.0_1727197274114.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deberta_v3_large_hf_weights_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deberta_v3_large_hf_weights_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_large_hf_weights_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.6 GB| + +## References + +https://huggingface.co/nagupv/deberta-v3-large-hf-weights + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_small_finetuned_squad_en.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_small_finetuned_squad_en.md new file mode 100644 index 00000000000000..21279ac15a8d21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_small_finetuned_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English deberta_v3_small_finetuned_squad DeBertaForQuestionAnswering from mrm8488 +author: John Snow Labs +name: deberta_v3_small_finetuned_squad +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, deberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_small_finetuned_squad` is a English model originally trained by mrm8488. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_small_finetuned_squad_en_5.5.0_3.0_1727215551405.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_small_finetuned_squad_en_5.5.0_3.0_1727215551405.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DeBertaForQuestionAnswering.pretrained("deberta_v3_small_finetuned_squad","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DeBertaForQuestionAnswering.pretrained("deberta_v3_small_finetuned_squad", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDS.toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
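+
+Once the pipeline has run, the predicted answer spans are available in the `answer` output column defined above, for example:
+
+```python
+# Sketch: show each question and context next to the extracted answer.
+pipelineDF.select("question", "context", "answer.result").show(truncate=False)
+```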
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_small_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|484.5 MB| + +## References + +https://huggingface.co/mrm8488/deberta-v3-small-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_small_finetuned_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_small_finetuned_squad_pipeline_en.md new file mode 100644 index 00000000000000..8b2d409674c310 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_small_finetuned_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English deberta_v3_small_finetuned_squad_pipeline pipeline DeBertaForQuestionAnswering from mrm8488 +author: John Snow Labs +name: deberta_v3_small_finetuned_squad_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_small_finetuned_squad_pipeline` is a English model originally trained by mrm8488. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_small_finetuned_squad_pipeline_en_5.5.0_3.0_1727215592183.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_small_finetuned_squad_pipeline_en_5.5.0_3.0_1727215592183.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deberta_v3_small_finetuned_squad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deberta_v3_small_finetuned_squad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_small_finetuned_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|484.5 MB| + +## References + +https://huggingface.co/mrm8488/deberta-v3-small-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DeBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_xsmall_stsb_en.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_xsmall_stsb_en.md new file mode 100644 index 00000000000000..5892c87324efdd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_xsmall_stsb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deberta_v3_xsmall_stsb DeBertaForSequenceClassification from cliang1453 +author: John Snow Labs +name: deberta_v3_xsmall_stsb +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_xsmall_stsb` is a English model originally trained by cliang1453. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_xsmall_stsb_en_5.5.0_3.0_1727212651337.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_xsmall_stsb_en_5.5.0_3.0_1727212651337.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_xsmall_stsb","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_xsmall_stsb", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_xsmall_stsb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|207.8 MB| + +## References + +https://huggingface.co/cliang1453/deberta-v3-xsmall-stsb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_xsmall_stsb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_xsmall_stsb_pipeline_en.md new file mode 100644 index 00000000000000..21e86b1d261e5c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_xsmall_stsb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English deberta_v3_xsmall_stsb_pipeline pipeline DeBertaForSequenceClassification from cliang1453 +author: John Snow Labs +name: deberta_v3_xsmall_stsb_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_xsmall_stsb_pipeline` is a English model originally trained by cliang1453. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_xsmall_stsb_pipeline_en_5.5.0_3.0_1727212684677.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_xsmall_stsb_pipeline_en_5.5.0_3.0_1727212684677.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deberta_v3_xsmall_stsb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deberta_v3_xsmall_stsb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_xsmall_stsb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|207.9 MB| + +## References + +https://huggingface.co/cliang1453/deberta-v3-xsmall-stsb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deeppolicytracker_200k_en.md b/docs/_posts/ahmedlone127/2024-09-24-deeppolicytracker_200k_en.md new file mode 100644 index 00000000000000..4cda7546e51c74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deeppolicytracker_200k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deeppolicytracker_200k RoBertaEmbeddings from flavio-nakasato +author: John Snow Labs +name: deeppolicytracker_200k +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deeppolicytracker_200k` is a English model originally trained by flavio-nakasato. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deeppolicytracker_200k_en_5.5.0_3.0_1727169009922.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deeppolicytracker_200k_en_5.5.0_3.0_1727169009922.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("deeppolicytracker_200k","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("deeppolicytracker_200k","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
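+
+Fitting the pipeline triggers the model download, so it can be worth persisting the fitted `pipelineModel` from the example above and reloading it later instead of refitting. A sketch with an illustrative path:
+
+```python
+# Sketch: save the fitted pipeline and reload it for later use (the path is illustrative).
+from pyspark.ml import PipelineModel
+
+pipelineModel.write().overwrite().save("/tmp/deeppolicytracker_200k_pipeline_model")
+reloaded = PipelineModel.load("/tmp/deeppolicytracker_200k_pipeline_model")
+reloaded.transform(data).select("embeddings").show(1)
+```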
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deeppolicytracker_200k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|305.8 MB| + +## References + +https://huggingface.co/flavio-nakasato/deeppolicytracker_200k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deeppolicytracker_200k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-deeppolicytracker_200k_pipeline_en.md new file mode 100644 index 00000000000000..28fad635fbd3e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deeppolicytracker_200k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English deeppolicytracker_200k_pipeline pipeline RoBertaEmbeddings from flavio-nakasato +author: John Snow Labs +name: deeppolicytracker_200k_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deeppolicytracker_200k_pipeline` is a English model originally trained by flavio-nakasato. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deeppolicytracker_200k_pipeline_en_5.5.0_3.0_1727169025830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deeppolicytracker_200k_pipeline_en_5.5.0_3.0_1727169025830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deeppolicytracker_200k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deeppolicytracker_200k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deeppolicytracker_200k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|305.8 MB| + +## References + +https://huggingface.co/flavio-nakasato/deeppolicytracker_200k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deepset_bert_base_cased_squad2_orkg_what_5e_05_en.md b/docs/_posts/ahmedlone127/2024-09-24-deepset_bert_base_cased_squad2_orkg_what_5e_05_en.md new file mode 100644 index 00000000000000..1162738474146a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deepset_bert_base_cased_squad2_orkg_what_5e_05_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English deepset_bert_base_cased_squad2_orkg_what_5e_05 BertForQuestionAnswering from Moussab +author: John Snow Labs +name: deepset_bert_base_cased_squad2_orkg_what_5e_05 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deepset_bert_base_cased_squad2_orkg_what_5e_05` is a English model originally trained by Moussab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deepset_bert_base_cased_squad2_orkg_what_5e_05_en_5.5.0_3.0_1727176079756.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deepset_bert_base_cased_squad2_orkg_what_5e_05_en_5.5.0_3.0_1727176079756.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("deepset_bert_base_cased_squad2_orkg_what_5e_05","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("deepset_bert_base_cased_squad2_orkg_what_5e_05", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDS.toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deepset_bert_base_cased_squad2_orkg_what_5e_05| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Moussab/deepset_bert-base-cased-squad2-orkg-what-5e-05 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deepset_bert_base_cased_squad2_orkg_what_5e_05_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-deepset_bert_base_cased_squad2_orkg_what_5e_05_pipeline_en.md new file mode 100644 index 00000000000000..f5e6af3642871e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deepset_bert_base_cased_squad2_orkg_what_5e_05_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English deepset_bert_base_cased_squad2_orkg_what_5e_05_pipeline pipeline BertForQuestionAnswering from Moussab +author: John Snow Labs +name: deepset_bert_base_cased_squad2_orkg_what_5e_05_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deepset_bert_base_cased_squad2_orkg_what_5e_05_pipeline` is a English model originally trained by Moussab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deepset_bert_base_cased_squad2_orkg_what_5e_05_pipeline_en_5.5.0_3.0_1727176104687.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deepset_bert_base_cased_squad2_orkg_what_5e_05_pipeline_en_5.5.0_3.0_1727176104687.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deepset_bert_base_cased_squad2_orkg_what_5e_05_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deepset_bert_base_cased_squad2_orkg_what_5e_05_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
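
Note that `df` is not defined in the snippet above, and the Python version additionally needs `from sparknlp.pretrained import PretrainedPipeline`. A rough sketch of a compatible input is shown below; the column names `question` and `context` are an assumption based on the MultiDocumentAssembler stage listed under Included Models and may need adjusting:

```python
# Hypothetical input DataFrame for the question-answering pipeline above.
# Column names are assumptions and must match the pipeline's MultiDocumentAssembler.
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")

annotations = pipeline.transform(df)
annotations.show(truncate=False)
```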
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deepset_bert_base_cased_squad2_orkg_what_5e_05_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Moussab/deepset_bert-base-cased-squad2-orkg-what-5e-05 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-delivery_balanced_distilbert_base_uncased_v1_en.md b/docs/_posts/ahmedlone127/2024-09-24-delivery_balanced_distilbert_base_uncased_v1_en.md new file mode 100644 index 00000000000000..e2786cb5ef3a4d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-delivery_balanced_distilbert_base_uncased_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English delivery_balanced_distilbert_base_uncased_v1 DistilBertForSequenceClassification from chuuhtetnaing +author: John Snow Labs +name: delivery_balanced_distilbert_base_uncased_v1 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`delivery_balanced_distilbert_base_uncased_v1` is a English model originally trained by chuuhtetnaing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/delivery_balanced_distilbert_base_uncased_v1_en_5.5.0_3.0_1727137364883.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/delivery_balanced_distilbert_base_uncased_v1_en_5.5.0_3.0_1727137364883.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("delivery_balanced_distilbert_base_uncased_v1","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("delivery_balanced_distilbert_base_uncased_v1", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
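
After transforming, the predicted label sits in the `class` output column declared above. A minimal sketch of reading it:

```python
# Minimal sketch: show the predicted label next to the input text.
pipelineDF.select("text", "class.result").show(truncate=False)
```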
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|delivery_balanced_distilbert_base_uncased_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chuuhtetnaing/delivery-balanced-distilbert-base-uncased-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-delivery_balanced_distilbert_base_uncased_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-delivery_balanced_distilbert_base_uncased_v1_pipeline_en.md new file mode 100644 index 00000000000000..5385b601c2fd9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-delivery_balanced_distilbert_base_uncased_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English delivery_balanced_distilbert_base_uncased_v1_pipeline pipeline DistilBertForSequenceClassification from chuuhtetnaing +author: John Snow Labs +name: delivery_balanced_distilbert_base_uncased_v1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`delivery_balanced_distilbert_base_uncased_v1_pipeline` is a English model originally trained by chuuhtetnaing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/delivery_balanced_distilbert_base_uncased_v1_pipeline_en_5.5.0_3.0_1727137377799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/delivery_balanced_distilbert_base_uncased_v1_pipeline_en_5.5.0_3.0_1727137377799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("delivery_balanced_distilbert_base_uncased_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("delivery_balanced_distilbert_base_uncased_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
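
As above, `df` is left undefined and `PretrainedPipeline` must be imported from `sparknlp.pretrained`. A minimal end-to-end sketch follows; the `class` output column name is an assumption based on the classifier stage listed under Included Models:

```python
# Hypothetical input: the pipeline's DocumentAssembler reads the "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

annotations = pipeline.transform(df)
# "class" is assumed to be the classifier's output column in this pipeline.
annotations.select("text", "class.result").show(truncate=False)
```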
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|delivery_balanced_distilbert_base_uncased_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chuuhtetnaing/delivery-balanced-distilbert-base-uncased-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_en.md b/docs/_posts/ahmedlone127/2024-09-24-dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_en.md new file mode 100644 index 00000000000000..eb0ff465b8c88d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8 WhisperForCTC from rohitp1 +author: John Snow Labs +name: dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8 +date: 2024-09-24 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8` is a English model originally trained by rohitp1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_en_5.5.0_3.0_1727146591942.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_en_5.5.0_3.0_1727146591942.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame whose "audio_content" column holds the
# raw audio samples (an array of floats) for each recording.
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame whose "audio_content" column holds the
// raw audio samples (an array of floats) for each recording.
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
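
A rough sketch of building the `data` DataFrame referenced above is shown below. It assumes the `audio_content` column holds raw audio samples as an array of numbers, decoded beforehand with a library such as librosa or soundfile; depending on the Spark NLP version, the samples may need to be floats rather than doubles:

```python
# Hypothetical audio input; `raw_samples` stands in for real decoded audio.
raw_samples = [0.0, 0.01, -0.02, 0.03]  # placeholder values, not real speech
data = spark.createDataFrame([(raw_samples,)], ["audio_content"])
```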
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|646.8 MB| + +## References + +https://huggingface.co/rohitp1/dgx1_whisper_base_finetune_teacher_no_noise_mozilla_100_epochs_batch_8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_pipeline_en.md new file mode 100644 index 00000000000000..062a203579537e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_pipeline pipeline WhisperForCTC from rohitp1 +author: John Snow Labs +name: dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_pipeline` is a English model originally trained by rohitp1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_pipeline_en_5.5.0_3.0_1727146625621.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_pipeline_en_5.5.0_3.0_1727146625621.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|646.9 MB| + +## References + +https://huggingface.co/rohitp1/dgx1_whisper_base_finetune_teacher_no_noise_mozilla_100_epochs_batch_8 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-dictabert_large_he.md b/docs/_posts/ahmedlone127/2024-09-24-dictabert_large_he.md new file mode 100644 index 00000000000000..4a81e96f5a3ec5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-dictabert_large_he.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Hebrew dictabert_large BertEmbeddings from dicta-il +author: John Snow Labs +name: dictabert_large +date: 2024-09-24 +tags: [he, open_source, onnx, embeddings, bert] +task: Embeddings +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dictabert_large` is a Hebrew model originally trained by dicta-il. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dictabert_large_he_5.5.0_3.0_1727174099880.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dictabert_large_he_5.5.0_3.0_1727174099880.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("dictabert_large","he") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("dictabert_large","he") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
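
To work with the produced vectors directly, note that each annotation in the `embeddings` column carries the token text in its `result` field and the vector in its `embeddings` field. A minimal sketch, assuming the pipeline above has been applied:

```python
from pyspark.sql.functions import explode

# Minimal sketch: one row per token with its embedding vector.
pipelineDF.select(explode("embeddings").alias("emb")) \
    .selectExpr("emb.result AS token", "emb.embeddings AS vector") \
    .show(truncate=80)
```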
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dictabert_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|he| +|Size:|1.0 GB| + +## References + +https://huggingface.co/dicta-il/dictabert-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-dictabert_large_pipeline_he.md b/docs/_posts/ahmedlone127/2024-09-24-dictabert_large_pipeline_he.md new file mode 100644 index 00000000000000..67bb1e33d34b6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-dictabert_large_pipeline_he.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Hebrew dictabert_large_pipeline pipeline BertEmbeddings from dicta-il +author: John Snow Labs +name: dictabert_large_pipeline +date: 2024-09-24 +tags: [he, open_source, pipeline, onnx] +task: Embeddings +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dictabert_large_pipeline` is a Hebrew model originally trained by dicta-il. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dictabert_large_pipeline_he_5.5.0_3.0_1727174390044.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dictabert_large_pipeline_he_5.5.0_3.0_1727174390044.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dictabert_large_pipeline", lang = "he") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dictabert_large_pipeline", lang = "he") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dictabert_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|he| +|Size:|1.0 GB| + +## References + +https://huggingface.co/dicta-il/dictabert-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-did_the_doctor_call_italian_a_specialty_bert_first512_en.md b/docs/_posts/ahmedlone127/2024-09-24-did_the_doctor_call_italian_a_specialty_bert_first512_en.md new file mode 100644 index 00000000000000..75413b1c33db40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-did_the_doctor_call_italian_a_specialty_bert_first512_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English did_the_doctor_call_italian_a_specialty_bert_first512 BertForSequenceClassification from etadevosyan +author: John Snow Labs +name: did_the_doctor_call_italian_a_specialty_bert_first512 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`did_the_doctor_call_italian_a_specialty_bert_first512` is a English model originally trained by etadevosyan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/did_the_doctor_call_italian_a_specialty_bert_first512_en_5.5.0_3.0_1727222235067.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/did_the_doctor_call_italian_a_specialty_bert_first512_en_5.5.0_3.0_1727222235067.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("did_the_doctor_call_italian_a_specialty_bert_first512","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("did_the_doctor_call_italian_a_specialty_bert_first512", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
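
For quick, single-sentence inference without building a DataFrame, Spark NLP's `LightPipeline` wrapper can be used on the fitted pipeline. A minimal sketch:

```python
from sparknlp.base import LightPipeline

# Minimal sketch: annotate a single string in memory.
light = LightPipeline(pipelineModel)
print(light.annotate("I love spark-nlp"))  # dict keyed by output column names
```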
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|did_the_doctor_call_italian_a_specialty_bert_first512| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|666.5 MB| + +## References + +https://huggingface.co/etadevosyan/did_the_doctor_call_it_a_specialty_bert_First512 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-did_the_doctor_call_italian_a_specialty_bert_first512_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-did_the_doctor_call_italian_a_specialty_bert_first512_pipeline_en.md new file mode 100644 index 00000000000000..0604281d719c4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-did_the_doctor_call_italian_a_specialty_bert_first512_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English did_the_doctor_call_italian_a_specialty_bert_first512_pipeline pipeline BertForSequenceClassification from etadevosyan +author: John Snow Labs +name: did_the_doctor_call_italian_a_specialty_bert_first512_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`did_the_doctor_call_italian_a_specialty_bert_first512_pipeline` is a English model originally trained by etadevosyan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/did_the_doctor_call_italian_a_specialty_bert_first512_pipeline_en_5.5.0_3.0_1727222269094.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/did_the_doctor_call_italian_a_specialty_bert_first512_pipeline_en_5.5.0_3.0_1727222269094.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("did_the_doctor_call_italian_a_specialty_bert_first512_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("did_the_doctor_call_italian_a_specialty_bert_first512_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|did_the_doctor_call_italian_a_specialty_bert_first512_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|666.5 MB| + +## References + +https://huggingface.co/etadevosyan/did_the_doctor_call_it_a_specialty_bert_First512 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-discourse_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-discourse_model_pipeline_en.md new file mode 100644 index 00000000000000..70b3098c9e863f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-discourse_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English discourse_model_pipeline pipeline RoBertaForSequenceClassification from lightcarrieson +author: John Snow Labs +name: discourse_model_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`discourse_model_pipeline` is a English model originally trained by lightcarrieson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/discourse_model_pipeline_en_5.5.0_3.0_1727171842123.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/discourse_model_pipeline_en_5.5.0_3.0_1727171842123.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("discourse_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("discourse_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
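
Here `df` is again assumed to be an existing DataFrame with a `text` column. For ad-hoc checks, `PretrainedPipeline` also exposes `annotate`, which takes a plain string; a minimal sketch:

```python
# Minimal sketch: classify a single sentence without a DataFrame.
result = pipeline.annotate("This essay develops its argument in three stages.")
print(result)
```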
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|discourse_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|422.8 MB| + +## References + +https://huggingface.co/lightcarrieson/discourse_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-disease_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-24-disease_classifier_en.md new file mode 100644 index 00000000000000..22d6b0a20ea9c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-disease_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English disease_classifier DistilBertForSequenceClassification from Amirth24 +author: John Snow Labs +name: disease_classifier +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`disease_classifier` is a English model originally trained by Amirth24. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/disease_classifier_en_5.5.0_3.0_1727204902517.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/disease_classifier_en_5.5.0_3.0_1727204902517.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("disease_classifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("disease_classifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
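
Once fit, the pipeline is a regular Spark ML `PipelineModel`, so it can be persisted and reloaded like any other model; the path below is only a placeholder. A minimal sketch:

```python
from pyspark.ml import PipelineModel

# Minimal sketch: save the fitted pipeline and load it back later.
pipelineModel.write().overwrite().save("/tmp/disease_classifier_pipeline")  # placeholder path
restored = PipelineModel.load("/tmp/disease_classifier_pipeline")
```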
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|disease_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|252.6 MB| + +## References + +https://huggingface.co/Amirth24/disease_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-dissertation_bert_en.md b/docs/_posts/ahmedlone127/2024-09-24-dissertation_bert_en.md new file mode 100644 index 00000000000000..b5b21abdf24db8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-dissertation_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dissertation_bert BertForSequenceClassification from ohid19 +author: John Snow Labs +name: dissertation_bert +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dissertation_bert` is a English model originally trained by ohid19. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dissertation_bert_en_5.5.0_3.0_1727213690638.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dissertation_bert_en_5.5.0_3.0_1727213690638.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("dissertation_bert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("dissertation_bert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dissertation_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ohid19/dissertation_bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-dissertation_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-dissertation_bert_pipeline_en.md new file mode 100644 index 00000000000000..a9a14441db95ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-dissertation_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dissertation_bert_pipeline pipeline BertForSequenceClassification from ohid19 +author: John Snow Labs +name: dissertation_bert_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dissertation_bert_pipeline` is a English model originally trained by ohid19. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dissertation_bert_pipeline_en_5.5.0_3.0_1727213711964.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dissertation_bert_pipeline_en_5.5.0_3.0_1727213711964.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dissertation_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dissertation_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dissertation_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ohid19/dissertation_bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_indonesian_fire_classification_silvanus_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_indonesian_fire_classification_silvanus_en.md new file mode 100644 index 00000000000000..09aad0ffed3a92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_indonesian_fire_classification_silvanus_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_indonesian_fire_classification_silvanus DistilBertForSequenceClassification from rollerhafeezh-amikom +author: John Snow Labs +name: distilbert_base_indonesian_fire_classification_silvanus +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_indonesian_fire_classification_silvanus` is a English model originally trained by rollerhafeezh-amikom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_indonesian_fire_classification_silvanus_en_5.5.0_3.0_1727154860154.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_indonesian_fire_classification_silvanus_en_5.5.0_3.0_1727154860154.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_indonesian_fire_classification_silvanus","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_indonesian_fire_classification_silvanus", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_indonesian_fire_classification_silvanus| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|255.2 MB| + +## References + +https://huggingface.co/rollerhafeezh-amikom/distilbert-base-indonesian-fire-classification-silvanus \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_indonesian_fire_classification_silvanus_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_indonesian_fire_classification_silvanus_pipeline_en.md new file mode 100644 index 00000000000000..ba5fee150ebdab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_indonesian_fire_classification_silvanus_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_indonesian_fire_classification_silvanus_pipeline pipeline DistilBertForSequenceClassification from rollerhafeezh-amikom +author: John Snow Labs +name: distilbert_base_indonesian_fire_classification_silvanus_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_indonesian_fire_classification_silvanus_pipeline` is a English model originally trained by rollerhafeezh-amikom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_indonesian_fire_classification_silvanus_pipeline_en_5.5.0_3.0_1727154873177.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_indonesian_fire_classification_silvanus_pipeline_en_5.5.0_3.0_1727154873177.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_indonesian_fire_classification_silvanus_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_indonesian_fire_classification_silvanus_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_indonesian_fire_classification_silvanus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|255.3 MB| + +## References + +https://huggingface.co/rollerhafeezh-amikom/distilbert-base-indonesian-fire-classification-silvanus + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_multilingual_cased_sent_negativo_esp_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_multilingual_cased_sent_negativo_esp_pipeline_xx.md new file mode 100644 index 00000000000000..4eed197d4e9964 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_multilingual_cased_sent_negativo_esp_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual distilbert_base_multilingual_cased_sent_negativo_esp_pipeline pipeline DistilBertForSequenceClassification from rogelioplatt +author: John Snow Labs +name: distilbert_base_multilingual_cased_sent_negativo_esp_pipeline +date: 2024-09-24 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_multilingual_cased_sent_negativo_esp_pipeline` is a Multilingual model originally trained by rogelioplatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_sent_negativo_esp_pipeline_xx_5.5.0_3.0_1727204955319.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_sent_negativo_esp_pipeline_xx_5.5.0_3.0_1727204955319.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_multilingual_cased_sent_negativo_esp_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_multilingual_cased_sent_negativo_esp_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_multilingual_cased_sent_negativo_esp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|507.6 MB| + +## References + +https://huggingface.co/rogelioplatt/distilbert-base-multilingual-cased-Sent_Negativo_Esp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_thai_cased_finetuned_sentiment_cleaned_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_thai_cased_finetuned_sentiment_cleaned_en.md new file mode 100644 index 00000000000000..9aac433851edd7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_thai_cased_finetuned_sentiment_cleaned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_thai_cased_finetuned_sentiment_cleaned DistilBertForSequenceClassification from FlukeTJ +author: John Snow Labs +name: distilbert_base_thai_cased_finetuned_sentiment_cleaned +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_thai_cased_finetuned_sentiment_cleaned` is a English model originally trained by FlukeTJ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_thai_cased_finetuned_sentiment_cleaned_en_5.5.0_3.0_1727137179285.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_thai_cased_finetuned_sentiment_cleaned_en_5.5.0_3.0_1727137179285.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_thai_cased_finetuned_sentiment_cleaned","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_thai_cased_finetuned_sentiment_cleaned", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_thai_cased_finetuned_sentiment_cleaned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/FlukeTJ/distilbert-base-thai-cased-finetuned-sentiment-cleaned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_5000_questions_gt_3_5epochs_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_5000_questions_gt_3_5epochs_en.md new file mode 100644 index 00000000000000..67c62cd497d2b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_5000_questions_gt_3_5epochs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_5000_questions_gt_3_5epochs DistilBertForSequenceClassification from Abhibeats95 +author: John Snow Labs +name: distilbert_base_uncased_5000_questions_gt_3_5epochs +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_5000_questions_gt_3_5epochs` is a English model originally trained by Abhibeats95. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_5000_questions_gt_3_5epochs_en_5.5.0_3.0_1727137172073.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_5000_questions_gt_3_5epochs_en_5.5.0_3.0_1727137172073.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_5000_questions_gt_3_5epochs","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_5000_questions_gt_3_5epochs", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_5000_questions_gt_3_5epochs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Abhibeats95/distilbert-base-uncased-5000_questions_gt_3_5epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_5000_questions_gt_3_5epochs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_5000_questions_gt_3_5epochs_pipeline_en.md new file mode 100644 index 00000000000000..df67f58c1b660f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_5000_questions_gt_3_5epochs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_5000_questions_gt_3_5epochs_pipeline pipeline DistilBertForSequenceClassification from Abhibeats95 +author: John Snow Labs +name: distilbert_base_uncased_5000_questions_gt_3_5epochs_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_5000_questions_gt_3_5epochs_pipeline` is a English model originally trained by Abhibeats95. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_5000_questions_gt_3_5epochs_pipeline_en_5.5.0_3.0_1727137184801.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_5000_questions_gt_3_5epochs_pipeline_en_5.5.0_3.0_1727137184801.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_5000_questions_gt_3_5epochs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_5000_questions_gt_3_5epochs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_5000_questions_gt_3_5epochs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Abhibeats95/distilbert-base-uncased-5000_questions_gt_3_5epochs + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_fb_housing_posts_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_fb_housing_posts_en.md new file mode 100644 index 00000000000000..c6899214c193ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_fb_housing_posts_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_fb_housing_posts DistilBertForSequenceClassification from hoaj +author: John Snow Labs +name: distilbert_base_uncased_fb_housing_posts +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_fb_housing_posts` is a English model originally trained by hoaj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_fb_housing_posts_en_5.5.0_3.0_1727164361417.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_fb_housing_posts_en_5.5.0_3.0_1727164361417.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_fb_housing_posts","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_fb_housing_posts", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_fb_housing_posts| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hoaj/distilbert-base-uncased-fb-housing-posts \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_fb_housing_posts_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_fb_housing_posts_pipeline_en.md new file mode 100644 index 00000000000000..d6e8a371d84b45 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_fb_housing_posts_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_fb_housing_posts_pipeline pipeline DistilBertForSequenceClassification from hoaj +author: John Snow Labs +name: distilbert_base_uncased_fb_housing_posts_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_fb_housing_posts_pipeline` is a English model originally trained by hoaj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_fb_housing_posts_pipeline_en_5.5.0_3.0_1727164375136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_fb_housing_posts_pipeline_en_5.5.0_3.0_1727164375136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pretrained pipeline expects a DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_fb_housing_posts_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pretrained pipeline expects a DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_fb_housing_posts_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
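
If you just want to try the pipeline on a few strings without building a DataFrame, it can also be called directly on text. A hedged sketch, assuming the download above succeeded; the input sentence is only an illustrative example:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_fb_housing_posts_pipeline", lang="en")
# returns a dict keyed by the pipeline's output columns; "class" holds the predicted label
print(pipeline.annotate("Looking for a two bedroom apartment near downtown"))
```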
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_fb_housing_posts_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hoaj/distilbert-base-uncased-fb-housing-posts + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetune_six_emotions_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetune_six_emotions_en.md new file mode 100644 index 00000000000000..4731e1c0a5f381 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetune_six_emotions_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetune_six_emotions DistilBertForSequenceClassification from Logicloom44 +author: John Snow Labs +name: distilbert_base_uncased_finetune_six_emotions +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetune_six_emotions` is a English model originally trained by Logicloom44. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetune_six_emotions_en_5.5.0_3.0_1727136928209.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetune_six_emotions_en_5.5.0_3.0_1727136928209.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetune_six_emotions","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

# document -> token -> class: fit once on a small DataFrame, then transform any "text" column
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetune_six_emotions", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

// document -> token -> class: fit once on a small DataFrame, then transform any "text" column
val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetune_six_emotions| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Logicloom44/distilbert-base-uncased-finetune-six-emotions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_bisoye_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_bisoye_en.md new file mode 100644 index 00000000000000..82d1faff1ba439 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_bisoye_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_bisoye DistilBertForSequenceClassification from bisoye +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_bisoye +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_bisoye` is a English model originally trained by bisoye. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_bisoye_en_5.5.0_3.0_1727154941511.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_bisoye_en_5.5.0_3.0_1727154941511.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_bisoye","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

# document -> token -> class: fit once on a small DataFrame, then transform any "text" column
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_bisoye", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

// document -> token -> class: fit once on a small DataFrame, then transform any "text" column
val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
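
When you need low-latency predictions for single documents rather than Spark batch jobs, the fitted pipeline can be wrapped in a `LightPipeline`. A minimal sketch, assuming `pipelineModel` from the Python example above; the query string is only an illustrative intent example:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# annotate() works directly on strings, no DataFrame required
print(light.annotate("what is my account balance"))
```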
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_bisoye| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/bisoye/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_bisoye_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_bisoye_pipeline_en.md new file mode 100644 index 00000000000000..f010890f6bbe0e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_bisoye_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_bisoye_pipeline pipeline DistilBertForSequenceClassification from bisoye +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_bisoye_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_bisoye_pipeline` is a English model originally trained by bisoye. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_bisoye_pipeline_en_5.5.0_3.0_1727154954941.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_bisoye_pipeline_en_5.5.0_3.0_1727154954941.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pretrained pipeline expects a DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_bisoye_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pretrained pipeline expects a DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_bisoye_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_bisoye_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/bisoye/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_en.md new file mode 100644 index 00000000000000..1e9558b5c03723 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_nachikethmurthy666 DistilBertForSequenceClassification from nachikethmurthy666 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_nachikethmurthy666 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_nachikethmurthy666` is a English model originally trained by nachikethmurthy666. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_en_5.5.0_3.0_1727136820558.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_en_5.5.0_3.0_1727136820558.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_nachikethmurthy666","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

# document -> token -> class: fit once on a small DataFrame, then transform any "text" column
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_nachikethmurthy666", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

// document -> token -> class: fit once on a small DataFrame, then transform any "text" column
val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_nachikethmurthy666| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/nachikethmurthy666/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_pipeline_en.md new file mode 100644 index 00000000000000..e20f5605f60f78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_pipeline pipeline DistilBertForSequenceClassification from nachikethmurthy666 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_pipeline` is a English model originally trained by nachikethmurthy666. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_pipeline_en_5.5.0_3.0_1727136840887.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_pipeline_en_5.5.0_3.0_1727136840887.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pretrained pipeline expects a DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pretrained pipeline expects a DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/nachikethmurthy666/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_pbruna_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_pbruna_en.md new file mode 100644 index 00000000000000..6f9b123ad4788e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_pbruna_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_pbruna DistilBertForSequenceClassification from pbruna +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_pbruna +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_pbruna` is a English model originally trained by pbruna. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_pbruna_en_5.5.0_3.0_1727154503765.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_pbruna_en_5.5.0_3.0_1727154503765.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_pbruna","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

# document -> token -> class: fit once on a small DataFrame, then transform any "text" column
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_pbruna", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

// document -> token -> class: fit once on a small DataFrame, then transform any "text" column
val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
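
Fitting on the small placeholder DataFrame only materializes the stages; once fitted, the `PipelineModel` can be persisted and reloaded with standard Spark ML APIs so the pretrained weights are not resolved again. A sketch with illustrative placeholder paths:

```python
from pyspark.ml import PipelineModel

# save the fitted pipeline (the path is an illustrative placeholder)
pipelineModel.write().overwrite().save("/tmp/distilbert_clinc_pbruna_pipeline")

# reload it later and reuse it for inference
restored = PipelineModel.load("/tmp/distilbert_clinc_pbruna_pipeline")
restored.transform(data).select("class.result").show(truncate=False)
```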
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_pbruna| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/pbruna/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_pbruna_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_pbruna_pipeline_en.md new file mode 100644 index 00000000000000..feba3bfa24b155 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_pbruna_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_pbruna_pipeline pipeline DistilBertForSequenceClassification from pbruna +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_pbruna_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_pbruna_pipeline` is a English model originally trained by pbruna. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_pbruna_pipeline_en_5.5.0_3.0_1727154516524.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_pbruna_pipeline_en_5.5.0_3.0_1727154516524.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pretrained pipeline expects a DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_pbruna_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pretrained pipeline expects a DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_pbruna_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_pbruna_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/pbruna/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_woodspoon09_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_woodspoon09_en.md new file mode 100644 index 00000000000000..460596aa20fa99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_woodspoon09_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_woodspoon09 DistilBertForSequenceClassification from woodspoon09 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_woodspoon09 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_woodspoon09` is a English model originally trained by woodspoon09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_woodspoon09_en_5.5.0_3.0_1727154712970.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_woodspoon09_en_5.5.0_3.0_1727154712970.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_woodspoon09","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

# document -> token -> class: fit once on a small DataFrame, then transform any "text" column
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_woodspoon09", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

// document -> token -> class: fit once on a small DataFrame, then transform any "text" column
val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
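
On a larger batch, a simple aggregation over the predicted labels gives a quick sense of the intent distribution. A sketch, assuming `pipelineDF` from the example above but computed over more rows:

```python
# flatten the per-row prediction arrays and count how often each label occurs
pipelineDF.selectExpr("explode(class.result) as label") \
    .groupBy("label") \
    .count() \
    .orderBy("count", ascending=False) \
    .show()
```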
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_woodspoon09| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/woodspoon09/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_woodspoon09_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_woodspoon09_pipeline_en.md new file mode 100644 index 00000000000000..a2ea1bba01ce04 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_woodspoon09_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_woodspoon09_pipeline pipeline DistilBertForSequenceClassification from woodspoon09 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_woodspoon09_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_woodspoon09_pipeline` is a English model originally trained by woodspoon09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_woodspoon09_pipeline_en_5.5.0_3.0_1727154727002.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_woodspoon09_pipeline_en_5.5.0_3.0_1727154727002.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pretrained pipeline expects a DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_woodspoon09_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pretrained pipeline expects a DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_woodspoon09_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_woodspoon09_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/woodspoon09/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_againeureka_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_againeureka_en.md new file mode 100644 index 00000000000000..5e86f9bf4fa218 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_againeureka_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_againeureka DistilBertForSequenceClassification from againeureka +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_againeureka +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_againeureka` is a English model originally trained by againeureka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_againeureka_en_5.5.0_3.0_1727164479442.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_againeureka_en_5.5.0_3.0_1727164479442.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_againeureka","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

# document -> token -> class: fit once on a small DataFrame, then transform any "text" column
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_againeureka", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

// document -> token -> class: fit once on a small DataFrame, then transform any "text" column
val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
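
To score more than the single demo sentence, any DataFrame with a `text` column will do; for example, a plain text file can be read line by line and pushed through the same fitted pipeline. A sketch with an illustrative file name:

```python
# one input document per line; rename Spark's default "value" column to "text"
docs = spark.read.text("sentences.txt").withColumnRenamed("value", "text")

scored = pipelineModel.transform(docs)
scored.select("text", "class.result").show(truncate=False)
```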
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_againeureka| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/againeureka/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_againeureka_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_againeureka_pipeline_en.md new file mode 100644 index 00000000000000..b9080f3c901e36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_againeureka_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_againeureka_pipeline pipeline DistilBertForSequenceClassification from againeureka +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_againeureka_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_againeureka_pipeline` is a English model originally trained by againeureka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_againeureka_pipeline_en_5.5.0_3.0_1727164492245.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_againeureka_pipeline_en_5.5.0_3.0_1727164492245.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pretrained pipeline expects a DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_againeureka_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pretrained pipeline expects a DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_againeureka_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_againeureka_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/againeureka/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_negfir_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_negfir_en.md new file mode 100644 index 00000000000000..c0321b068efd6b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_negfir_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_negfir BertForSequenceClassification from negfir +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_negfir +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_negfir` is a English model originally trained by negfir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_negfir_en_5.5.0_3.0_1727222318264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_negfir_en_5.5.0_3.0_1727222318264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_negfir","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

# document -> token -> class: fit once on a small DataFrame, then transform any "text" column
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_negfir", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

// document -> token -> class: fit once on a small DataFrame, then transform any "text" column
val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_negfir| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|24.2 MB| + +## References + +https://huggingface.co/negfir/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_negfir_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_negfir_pipeline_en.md new file mode 100644 index 00000000000000..2f76e92bec5050 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_negfir_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_negfir_pipeline pipeline BertForSequenceClassification from negfir +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_negfir_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_negfir_pipeline` is a English model originally trained by negfir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_negfir_pipeline_en_5.5.0_3.0_1727222319717.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_negfir_pipeline_en_5.5.0_3.0_1727222319717.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pretrained pipeline expects a DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_negfir_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pretrained pipeline expects a DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_negfir_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_negfir_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|24.2 MB| + +## References + +https://huggingface.co/negfir/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_rayane321_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_rayane321_en.md new file mode 100644 index 00000000000000..88683ccb30caa6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_rayane321_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_rayane321 DistilBertForSequenceClassification from rayane321 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_rayane321 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_rayane321` is a English model originally trained by rayane321. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_rayane321_en_5.5.0_3.0_1727137183273.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_rayane321_en_5.5.0_3.0_1727137183273.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_rayane321","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

# document -> token -> class: fit once on a small DataFrame, then transform any "text" column
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_rayane321", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

// document -> token -> class: fit once on a small DataFrame, then transform any "text" column
val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
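
Besides the predicted label, each `class` annotation usually carries per-label scores in its metadata, which helps when you want to threshold uncertain predictions. A minimal sketch, assuming the pipeline above has been run and that the metadata map includes a score per label:

```python
from pyspark.sql.functions import explode

# metadata is a map attached to every prediction; scores are stored as strings
pipelineDF.select(explode("class").alias("pred")) \
    .select("pred.result", "pred.metadata") \
    .show(truncate=False)
```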
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_rayane321| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rayane321/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_rayane321_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_rayane321_pipeline_en.md new file mode 100644 index 00000000000000..9f4ae36eadf66f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_rayane321_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_rayane321_pipeline pipeline DistilBertForSequenceClassification from rayane321 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_rayane321_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_rayane321_pipeline` is a English model originally trained by rayane321. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_rayane321_pipeline_en_5.5.0_3.0_1727137197535.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_rayane321_pipeline_en_5.5.0_3.0_1727137197535.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pretrained pipeline expects a DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_rayane321_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pretrained pipeline expects a DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_rayane321_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_rayane321_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rayane321/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_robuved_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_robuved_en.md new file mode 100644 index 00000000000000..f15fc245a967ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_robuved_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_robuved DistilBertForSequenceClassification from robuved +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_robuved +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_robuved` is a English model originally trained by robuved. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_robuved_en_5.5.0_3.0_1727154839959.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_robuved_en_5.5.0_3.0_1727154839959.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_robuved","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

# document -> token -> class: fit once on a small DataFrame, then transform any "text" column
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_robuved", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

// document -> token -> class: fit once on a small DataFrame, then transform any "text" column
val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_robuved| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/robuved/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_robuved_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_robuved_pipeline_en.md new file mode 100644 index 00000000000000..79f9c72bcd42c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_robuved_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_robuved_pipeline pipeline DistilBertForSequenceClassification from robuved +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_robuved_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_robuved_pipeline` is a English model originally trained by robuved. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_robuved_pipeline_en_5.5.0_3.0_1727154854934.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_robuved_pipeline_en_5.5.0_3.0_1727154854934.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pretrained pipeline expects a DataFrame with a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_robuved_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pretrained pipeline expects a DataFrame with a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_robuved_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_robuved_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/robuved/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_wy3106714391_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_wy3106714391_en.md new file mode 100644 index 00000000000000..47031ced5dda4e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_wy3106714391_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_wy3106714391 DistilBertForSequenceClassification from wy3106714391 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_wy3106714391 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_wy3106714391` is a English model originally trained by wy3106714391. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_wy3106714391_en_5.5.0_3.0_1727164141876.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_wy3106714391_en_5.5.0_3.0_1727164141876.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_wy3106714391","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

# document -> token -> class: fit once on a small DataFrame, then transform any "text" column
pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_wy3106714391", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

// document -> token -> class: fit once on a small DataFrame, then transform any "text" column
val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
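
These snippets assume an already running Spark session with Spark NLP on the classpath. If you are starting from scratch, `sparknlp.start()` is the usual shortcut; a sketch:

```python
import sparknlp

# creates (or returns) a SparkSession configured with the Spark NLP jar
spark = sparknlp.start()
print(sparknlp.version(), spark.version)
```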
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_wy3106714391| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/wy3106714391/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_wy3106714391_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_wy3106714391_pipeline_en.md new file mode 100644 index 00000000000000..bdbe6bed085306 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_wy3106714391_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_wy3106714391_pipeline pipeline DistilBertForSequenceClassification from wy3106714391 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_wy3106714391_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_wy3106714391_pipeline` is a English model originally trained by wy3106714391. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_wy3106714391_pipeline_en_5.5.0_3.0_1727164164348.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_wy3106714391_pipeline_en_5.5.0_3.0_1727164164348.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_wy3106714391_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_wy3106714391_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
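
For quick experiments on a handful of sentences, the pretrained pipeline can also be called through its `annotate` method instead of transforming a DataFrame. A small sketch, assuming the `pipeline` object created above:

```python
# Minimal sketch: run the pretrained pipeline on a single string.
# `annotate` returns a dict keyed by output column name (e.g. "class").
result = pipeline.annotate("I love spark-nlp")
print(result["class"])
```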
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_wy3106714391_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/wy3106714391/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_en.md new file mode 100644 index 00000000000000..8a0f80bce0ebb1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000 DistilBertForSequenceClassification from atsstagram +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000` is a English model originally trained by atsstagram. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_en_5.5.0_3.0_1727137041067.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_en_5.5.0_3.0_1727137041067.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/atsstagram/distilbert-base-uncased-finetuned-emotion-balanced-1000plus3000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_pipeline_en.md new file mode 100644 index 00000000000000..fea01fc1c3d777 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_pipeline pipeline DistilBertForSequenceClassification from atsstagram +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_pipeline` is a English model originally trained by atsstagram. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_pipeline_en_5.5.0_3.0_1727137054474.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_pipeline_en_5.5.0_3.0_1727137054474.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/atsstagram/distilbert-base-uncased-finetuned-emotion-balanced-1000plus3000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_camaganu_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_camaganu_en.md new file mode 100644 index 00000000000000..96985f76f4260b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_camaganu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_camaganu DistilBertForSequenceClassification from camaganu +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_camaganu +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_camaganu` is a English model originally trained by camaganu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_camaganu_en_5.5.0_3.0_1727164765994.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_camaganu_en_5.5.0_3.0_1727164765994.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_camaganu","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_camaganu", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_camaganu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/camaganu/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_camaganu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_camaganu_pipeline_en.md new file mode 100644 index 00000000000000..54394f87b3119c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_camaganu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_camaganu_pipeline pipeline DistilBertForSequenceClassification from camaganu +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_camaganu_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_camaganu_pipeline` is a English model originally trained by camaganu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_camaganu_pipeline_en_5.5.0_3.0_1727164778657.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_camaganu_pipeline_en_5.5.0_3.0_1727164778657.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_camaganu_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_camaganu_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_camaganu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/camaganu/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_jlsurdilla_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_jlsurdilla_en.md new file mode 100644 index 00000000000000..e9d12dde9c0e78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_jlsurdilla_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jlsurdilla DistilBertForSequenceClassification from jlsurdilla +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jlsurdilla +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jlsurdilla` is a English model originally trained by jlsurdilla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jlsurdilla_en_5.5.0_3.0_1727137165572.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jlsurdilla_en_5.5.0_3.0_1727137165572.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jlsurdilla","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jlsurdilla", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jlsurdilla| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jlsurdilla/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_randomchar_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_randomchar_en.md new file mode 100644 index 00000000000000..6276cceb618c33 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_randomchar_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_randomchar DistilBertForSequenceClassification from RandomChar +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_randomchar +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_randomchar` is a English model originally trained by RandomChar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_randomchar_en_5.5.0_3.0_1727164264856.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_randomchar_en_5.5.0_3.0_1727164264856.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_randomchar","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_randomchar", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_randomchar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/RandomChar/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_ryli_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_ryli_en.md new file mode 100644 index 00000000000000..1febda74758e90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_ryli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ryli DistilBertForSequenceClassification from ryli +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ryli +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ryli` is a English model originally trained by ryli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ryli_en_5.5.0_3.0_1727137053294.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ryli_en_5.5.0_3.0_1727137053294.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ryli","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ryli", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ryli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ryli/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_ryli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_ryli_pipeline_en.md new file mode 100644 index 00000000000000..e5bf957c48e1e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_ryli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ryli_pipeline pipeline DistilBertForSequenceClassification from ryli +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ryli_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ryli_pipeline` is a English model originally trained by ryli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ryli_pipeline_en_5.5.0_3.0_1727137068000.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ryli_pipeline_en_5.5.0_3.0_1727137068000.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ryli_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ryli_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ryli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ryli/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_sapkpa1_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_sapkpa1_en.md new file mode 100644 index 00000000000000..0d90712e2670b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_sapkpa1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_sapkpa1 DistilBertForSequenceClassification from sapkpa1 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_sapkpa1 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_sapkpa1` is a English model originally trained by sapkpa1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_sapkpa1_en_5.5.0_3.0_1727136821822.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_sapkpa1_en_5.5.0_3.0_1727136821822.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_sapkpa1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_sapkpa1", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_sapkpa1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sapkpa1/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_transformersbook_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_transformersbook_en.md new file mode 100644 index 00000000000000..2f149cd6dd67b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_transformersbook_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_transformersbook DistilBertForSequenceClassification from transformersbook +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_transformersbook +date: 2024-09-24 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_transformersbook` is a English model originally trained by transformersbook. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_transformersbook_en_5.5.0_3.0_1727155017957.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_transformersbook_en_5.5.0_3.0_1727155017957.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer()\
    .setInputCols("document")\
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_transformersbook","en")\
    .setInputCols(["document","token"])\
    .setOutputCol("class")

pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier])

data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")

result = pipeline.fit(data).transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_transformersbook","en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))

val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_transformersbook| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +References + +https://huggingface.co/transformersbook/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_transformersbook_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_transformersbook_pipeline_en.md new file mode 100644 index 00000000000000..05d9944e13a7d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_transformersbook_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_transformersbook_pipeline pipeline DistilBertForSequenceClassification from hcyying +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_transformersbook_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_transformersbook_pipeline` is a English model originally trained by hcyying. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_transformersbook_pipeline_en_5.5.0_3.0_1727155031131.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_transformersbook_pipeline_en_5.5.0_3.0_1727155031131.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_transformersbook_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_transformersbook_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_transformersbook_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hcyying/distilbert-base-uncased-finetuned-emotion-transformersbook + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_yukky777_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_yukky777_en.md new file mode 100644 index 00000000000000..159ff06450342a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_yukky777_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_yukky777 DistilBertForSequenceClassification from yukky777 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_yukky777 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_yukky777` is a English model originally trained by yukky777. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yukky777_en_5.5.0_3.0_1727137467123.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yukky777_en_5.5.0_3.0_1727137467123.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_yukky777","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_yukky777", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_yukky777| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yukky777/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_tweet_eval_sentiment_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_tweet_eval_sentiment_en.md new file mode 100644 index 00000000000000..085d931739b686 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_tweet_eval_sentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_tweet_eval_sentiment DistilBertForSequenceClassification from HSIEN1009 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_tweet_eval_sentiment +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_tweet_eval_sentiment` is a English model originally trained by HSIEN1009. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_tweet_eval_sentiment_en_5.5.0_3.0_1727137570075.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_tweet_eval_sentiment_en_5.5.0_3.0_1727137570075.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_tweet_eval_sentiment","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_tweet_eval_sentiment", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
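
When only a few texts need to be scored, for example in a notebook, a `LightPipeline` wrapped around the fitted model avoids the overhead of a full DataFrame transform. This is a sketch under the assumption that `pipelineModel` from the example above is available:

```python
from sparknlp.base import LightPipeline

# Minimal sketch: low-overhead, single-text inference with the fitted model.
light = LightPipeline(pipelineModel)
print(light.annotate("I love spark-nlp")["class"])
```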
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_tweet_eval_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/HSIEN1009/distilbert-base-uncased-finetuned-tweet_eval_sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_tweet_eval_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_tweet_eval_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..dab778374da199 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_tweet_eval_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_tweet_eval_sentiment_pipeline pipeline DistilBertForSequenceClassification from HSIEN1009 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_tweet_eval_sentiment_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_tweet_eval_sentiment_pipeline` is a English model originally trained by HSIEN1009. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_tweet_eval_sentiment_pipeline_en_5.5.0_3.0_1727137582597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_tweet_eval_sentiment_pipeline_en_5.5.0_3.0_1727137582597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_tweet_eval_sentiment_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_tweet_eval_sentiment_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_tweet_eval_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/HSIEN1009/distilbert-base-uncased-finetuned-tweet_eval_sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_mbib_2048_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_mbib_2048_en.md new file mode 100644 index 00000000000000..1be97635947410 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_mbib_2048_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_mbib_2048 DistilBertForSequenceClassification from ANGKJ1995 +author: John Snow Labs +name: distilbert_base_uncased_mbib_2048 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_mbib_2048` is a English model originally trained by ANGKJ1995. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_mbib_2048_en_5.5.0_3.0_1727154396211.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_mbib_2048_en_5.5.0_3.0_1727154396211.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_mbib_2048","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_mbib_2048", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_mbib_2048| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ANGKJ1995/distilbert-base-uncased-mbib-2048 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_mbib_2048_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_mbib_2048_pipeline_en.md new file mode 100644 index 00000000000000..3d9d76fde9dcc7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_mbib_2048_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_mbib_2048_pipeline pipeline DistilBertForSequenceClassification from ANGKJ1995 +author: John Snow Labs +name: distilbert_base_uncased_mbib_2048_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_mbib_2048_pipeline` is a English model originally trained by ANGKJ1995. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_mbib_2048_pipeline_en_5.5.0_3.0_1727154410370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_mbib_2048_pipeline_en_5.5.0_3.0_1727154410370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_mbib_2048_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_mbib_2048_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_mbib_2048_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ANGKJ1995/distilbert-base-uncased-mbib-2048 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_en.md new file mode 100644 index 00000000000000..6f20a815b7f118 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_en_5.5.0_3.0_1727164142561.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_en_5.5.0_3.0_1727164142561.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_6e4exps_0strandom42sd_ut72ut5_PLPrefix0stlarge42_simsp100_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_pipeline_en.md new file mode 100644 index 00000000000000..1831324c346c39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_pipeline_en_5.5.0_3.0_1727164160521.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_pipeline_en_5.5.0_3.0_1727164160521.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_pipeline", lang = "en")
+# df can be any DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
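+
+For a quick check on a single string, the same pretrained pipeline can be used without building a DataFrame first. This is only a rough sketch assuming an active Spark NLP session; `annotate()` returns a dictionary keyed by the pipeline's output columns:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_pipeline", lang = "en")
+# Runs every stage of the pipeline on one string and returns the annotations as plain Python values
+print(pipeline.annotate("I love spark-nlp"))
+```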
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_6e4exps_0strandom42sd_ut72ut5_PLPrefix0stlarge42_simsp100_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1largepfxnf_simsp300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1largepfxnf_simsp300_pipeline_en.md new file mode 100644 index 00000000000000..8e8c5500cf7252 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1largepfxnf_simsp300_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1largepfxnf_simsp300_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1largepfxnf_simsp300_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1largepfxnf_simsp300_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1largepfxnf_simsp300_pipeline_en_5.5.0_3.0_1727164391704.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1largepfxnf_simsp300_pipeline_en_5.5.0_3.0_1727164391704.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1largepfxnf_simsp300_pipeline", lang = "en")
+# df can be any DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1largepfxnf_simsp300_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1largepfxnf_simsp300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st13sd_ut72ut1largePfxNf_simsp300 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en.md new file mode 100644 index 00000000000000..aa963f71c116aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en_5.5.0_3.0_1727164575541.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en_5.5.0_3.0_1727164575541.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline", lang = "en")
+# df can be any DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st19sd_ut72ut5_PLPrefix0stlarge_simsp100_clean300 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_en.md new file mode 100644 index 00000000000000..b5e482213759c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_en_5.5.0_3.0_1727154385187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_en_5.5.0_3.0_1727154385187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st30sd_ut72ut1large30PfxNf_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_pipeline_en.md new file mode 100644 index 00000000000000..003ba90a8b59fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_pipeline_en_5.5.0_3.0_1727154400872.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_pipeline_en_5.5.0_3.0_1727154400872.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_pipeline", lang = "en")
+# df can be any DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st30sd_ut72ut1large30PfxNf_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge103_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge103_simsp_pipeline_en.md new file mode 100644 index 00000000000000..304e1f09e5fcda --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge103_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge103_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge103_simsp_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge103_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge103_simsp_pipeline_en_5.5.0_3.0_1727137502942.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge103_simsp_pipeline_en_5.5.0_3.0_1727137502942.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge103_simsp_pipeline", lang = "en")
+# df can be any DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge103_simsp_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge103_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1_PLPrefix0stlarge103_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean4sd_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean4sd_pipeline_en.md new file mode 100644 index 00000000000000..a6b93f7844af2d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean4sd_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean4sd_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean4sd_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean4sd_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean4sd_pipeline_en_5.5.0_3.0_1727154384494.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean4sd_pipeline_en_5.5.0_3.0_1727154384494.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean4sd_pipeline", lang = "en")
+# df can be any DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean4sd_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean4sd_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut5_PLPrefix0stlarge42_simsp_clean4sd + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_coping_replies_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_coping_replies_en.md new file mode 100644 index 00000000000000..542714b32af327 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_coping_replies_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_coping_replies DistilBertForSequenceClassification from coping-appraisal +author: John Snow Labs +name: distilbert_coping_replies +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_coping_replies` is a English model originally trained by coping-appraisal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_coping_replies_en_5.5.0_3.0_1727154263199.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_coping_replies_en_5.5.0_3.0_1727154263199.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_coping_replies","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_coping_replies", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_coping_replies| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/coping-appraisal/distilbert-coping-replies \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_coping_replies_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_coping_replies_pipeline_en.md new file mode 100644 index 00000000000000..e3582f39c837a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_coping_replies_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_coping_replies_pipeline pipeline DistilBertForSequenceClassification from coping-appraisal +author: John Snow Labs +name: distilbert_coping_replies_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_coping_replies_pipeline` is a English model originally trained by coping-appraisal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_coping_replies_pipeline_en_5.5.0_3.0_1727154284313.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_coping_replies_pipeline_en_5.5.0_3.0_1727154284313.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_coping_replies_pipeline", lang = "en")
+# df can be any DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_coping_replies_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_coping_replies_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/coping-appraisal/distilbert-coping-replies + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_ebit_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_ebit_en.md new file mode 100644 index 00000000000000..700229f0fde8b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_ebit_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_ebit DistilBertForSequenceClassification from lenguyen +author: John Snow Labs +name: distilbert_ebit +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_ebit` is a English model originally trained by lenguyen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_ebit_en_5.5.0_3.0_1727164524124.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_ebit_en_5.5.0_3.0_1727164524124.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_ebit","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_ebit", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_ebit| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|411.0 MB| + +## References + +https://huggingface.co/lenguyen/distilbert_EBIT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_ebit_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_ebit_pipeline_en.md new file mode 100644 index 00000000000000..5eb47a61ca777d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_ebit_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_ebit_pipeline pipeline DistilBertForSequenceClassification from lenguyen +author: John Snow Labs +name: distilbert_ebit_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_ebit_pipeline` is a English model originally trained by lenguyen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_ebit_pipeline_en_5.5.0_3.0_1727164544841.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_ebit_pipeline_en_5.5.0_3.0_1727164544841.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_ebit_pipeline", lang = "en")
+# df can be any DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_ebit_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_ebit_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.0 MB| + +## References + +https://huggingface.co/lenguyen/distilbert_EBIT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_emotion_patdj_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_emotion_patdj_en.md new file mode 100644 index 00000000000000..a838ccb8b29fb4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_emotion_patdj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotion_patdj DistilBertForSequenceClassification from PatDJ +author: John Snow Labs +name: distilbert_emotion_patdj +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_patdj` is a English model originally trained by PatDJ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_patdj_en_5.5.0_3.0_1727154388011.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_patdj_en_5.5.0_3.0_1727154388011.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_patdj","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_patdj", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_patdj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/PatDJ/distilbert-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_emotions_clf_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_emotions_clf_en.md new file mode 100644 index 00000000000000..b3cd03238129e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_emotions_clf_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotions_clf DistilBertForSequenceClassification from eduardo-alvarez +author: John Snow Labs +name: distilbert_emotions_clf +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotions_clf` is a English model originally trained by eduardo-alvarez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotions_clf_en_5.5.0_3.0_1727136819732.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotions_clf_en_5.5.0_3.0_1727136819732.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotions_clf","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotions_clf", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotions_clf| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/eduardo-alvarez/distilbert-emotions-clf \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_emotions_clf_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_emotions_clf_pipeline_en.md new file mode 100644 index 00000000000000..217a6b300019f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_emotions_clf_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_emotions_clf_pipeline pipeline DistilBertForSequenceClassification from eduardo-alvarez +author: John Snow Labs +name: distilbert_emotions_clf_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotions_clf_pipeline` is a English model originally trained by eduardo-alvarez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotions_clf_pipeline_en_5.5.0_3.0_1727136834081.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotions_clf_pipeline_en_5.5.0_3.0_1727136834081.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_emotions_clf_pipeline", lang = "en")
+# df can be any DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_emotions_clf_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotions_clf_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/eduardo-alvarez/distilbert-emotions-clf + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_essays_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_essays_pipeline_en.md new file mode 100644 index 00000000000000..a98e94f4a4db4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_essays_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_essays_pipeline pipeline DistilBertForSequenceClassification from Bimarshad +author: John Snow Labs +name: distilbert_essays_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_essays_pipeline` is a English model originally trained by Bimarshad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_essays_pipeline_en_5.5.0_3.0_1727137592482.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_essays_pipeline_en_5.5.0_3.0_1727137592482.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_essays_pipeline", lang = "en")
+# df can be any DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_essays_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_essays_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Bimarshad/distilbert.essays + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_ethics_test_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_ethics_test_en.md new file mode 100644 index 00000000000000..0ccdc1d061b755 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_ethics_test_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_ethics_test DistilBertForSequenceClassification from harplyon +author: John Snow Labs +name: distilbert_ethics_test +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_ethics_test` is a English model originally trained by harplyon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_ethics_test_en_5.5.0_3.0_1727154263197.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_ethics_test_en_5.5.0_3.0_1727154263197.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_ethics_test","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_ethics_test", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_ethics_test| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/harplyon/distilbert-ethics-test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_ethics_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_ethics_test_pipeline_en.md new file mode 100644 index 00000000000000..802251342b16d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_ethics_test_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_ethics_test_pipeline pipeline DistilBertForSequenceClassification from harplyon +author: John Snow Labs +name: distilbert_ethics_test_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_ethics_test_pipeline` is a English model originally trained by harplyon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_ethics_test_pipeline_en_5.5.0_3.0_1727154277585.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_ethics_test_pipeline_en_5.5.0_3.0_1727154277585.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_ethics_test_pipeline", lang = "en")
+# df can be any DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_ethics_test_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_ethics_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/harplyon/distilbert-ethics-test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_finetuned_emotion_pt_sk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_finetuned_emotion_pt_sk_pipeline_en.md new file mode 100644 index 00000000000000..e4976bb214e136 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_finetuned_emotion_pt_sk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_finetuned_emotion_pt_sk_pipeline pipeline DistilBertForSequenceClassification from pt-sk +author: John Snow Labs +name: distilbert_finetuned_emotion_pt_sk_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_emotion_pt_sk_pipeline` is a English model originally trained by pt-sk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_emotion_pt_sk_pipeline_en_5.5.0_3.0_1727137410016.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_emotion_pt_sk_pipeline_en_5.5.0_3.0_1727137410016.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_finetuned_emotion_pt_sk_pipeline", lang = "en")
+# df can be any DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_finetuned_emotion_pt_sk_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_emotion_pt_sk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/pt-sk/distilbert-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_finetuned_hatespeech_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_finetuned_hatespeech_en.md new file mode 100644 index 00000000000000..ce001c2166459b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_finetuned_hatespeech_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_finetuned_hatespeech DistilBertForSequenceClassification from ayln +author: John Snow Labs +name: distilbert_finetuned_hatespeech +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_hatespeech` is a English model originally trained by ayln. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_hatespeech_en_5.5.0_3.0_1727164471587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_hatespeech_en_5.5.0_3.0_1727164471587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_finetuned_hatespeech","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_finetuned_hatespeech", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_hatespeech| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ayln/distilbert_finetuned_hatespeech \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_finetuned_hatespeech_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_finetuned_hatespeech_pipeline_en.md new file mode 100644 index 00000000000000..5de5d96e55c7ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_finetuned_hatespeech_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_finetuned_hatespeech_pipeline pipeline DistilBertForSequenceClassification from ayln +author: John Snow Labs +name: distilbert_finetuned_hatespeech_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_hatespeech_pipeline` is a English model originally trained by ayln. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_hatespeech_pipeline_en_5.5.0_3.0_1727164484409.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_hatespeech_pipeline_en_5.5.0_3.0_1727164484409.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_finetuned_hatespeech_pipeline", lang = "en")
+# df can be any DataFrame with a "text" column
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_finetuned_hatespeech_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_hatespeech_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ayln/distilbert_finetuned_hatespeech + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_imdb_padding50model_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_imdb_padding50model_en.md new file mode 100644 index 00000000000000..406df018df71dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_imdb_padding50model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_imdb_padding50model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_imdb_padding50model +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_padding50model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_padding50model_en_5.5.0_3.0_1727154292538.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_padding50model_en_5.5.0_3.0_1727154292538.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_padding50model","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_padding50model", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_padding50model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_imdb_padding50model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_imdb_padding50model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_imdb_padding50model_pipeline_en.md new file mode 100644 index 00000000000000..a29771514a007a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_imdb_padding50model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_imdb_padding50model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_imdb_padding50model_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_padding50model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_padding50model_pipeline_en_5.5.0_3.0_1727154306597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_padding50model_pipeline_en_5.5.0_3.0_1727154306597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with a "text" column holding the documents to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_imdb_padding50model_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_imdb_padding50model_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_padding50model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_imdb_padding50model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_lr_cosine_epoch_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_lr_cosine_epoch_5_pipeline_en.md new file mode 100644 index 00000000000000..abfeb2509ea67a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_lr_cosine_epoch_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_lr_cosine_epoch_5_pipeline pipeline DistilBertForSequenceClassification from K-kiron +author: John Snow Labs +name: distilbert_lr_cosine_epoch_5_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_lr_cosine_epoch_5_pipeline` is a English model originally trained by K-kiron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_lr_cosine_epoch_5_pipeline_en_5.5.0_3.0_1727137395233.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_lr_cosine_epoch_5_pipeline_en_5.5.0_3.0_1727137395233.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with a "text" column holding the documents to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_lr_cosine_epoch_5_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_lr_cosine_epoch_5_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_lr_cosine_epoch_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/K-kiron/distilbert-lr-cosine-epoch-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_ndd_html_content_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_ndd_html_content_pipeline_en.md new file mode 100644 index 00000000000000..e995ec8b7f7248 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_ndd_html_content_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_ndd_html_content_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: distilbert_ndd_html_content_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_ndd_html_content_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_ndd_html_content_pipeline_en_5.5.0_3.0_1727204801300.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_ndd_html_content_pipeline_en_5.5.0_3.0_1727204801300.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with a "text" column holding the documents to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_ndd_html_content_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_ndd_html_content_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_ndd_html_content_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/distilBERT-NDD.html.content + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_on_polarity_yelp_reviews_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_on_polarity_yelp_reviews_pipeline_en.md new file mode 100644 index 00000000000000..92dd3d07b20ab6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_on_polarity_yelp_reviews_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_on_polarity_yelp_reviews_pipeline pipeline DistilBertForSequenceClassification from BexRedpill +author: John Snow Labs +name: distilbert_on_polarity_yelp_reviews_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_on_polarity_yelp_reviews_pipeline` is a English model originally trained by BexRedpill. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_on_polarity_yelp_reviews_pipeline_en_5.5.0_3.0_1727204801364.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_on_polarity_yelp_reviews_pipeline_en_5.5.0_3.0_1727204801364.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with a "text" column holding the documents to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_on_polarity_yelp_reviews_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_on_polarity_yelp_reviews_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_on_polarity_yelp_reviews_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BexRedpill/distilbert-on-polarity-yelp-reviews + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mrpc_96_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mrpc_96_pipeline_en.md new file mode 100644 index 00000000000000..c76f3e07f2aa3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mrpc_96_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mrpc_96_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mrpc_96_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mrpc_96_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mrpc_96_pipeline_en_5.5.0_3.0_1727154979697.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mrpc_96_pipeline_en_5.5.0_3.0_1727154979697.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with a "text" column holding the documents to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mrpc_96_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mrpc_96_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mrpc_96_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|25.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_data_aug_mrpc_96 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_en.md new file mode 100644 index 00000000000000..1b68a1c002d51e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_en_5.5.0_3.0_1727137282939.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_en_5.5.0_3.0_1727137282939.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|71.6 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_stsb_256 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_pipeline_en.md new file mode 100644 index 00000000000000..dc08538619a310 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_pipeline_en_5.5.0_3.0_1727137286655.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_pipeline_en_5.5.0_3.0_1727137286655.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with a "text" column holding the documents to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|71.6 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_stsb_256 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_mrpc_256_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_mrpc_256_en.md new file mode 100644 index 00000000000000..4923a56f748167 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_mrpc_256_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_mrpc_256 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_mrpc_256 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_mrpc_256` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_mrpc_256_en_5.5.0_3.0_1727154906729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_mrpc_256_en_5.5.0_3.0_1727154906729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_mrpc_256","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_mrpc_256", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_mrpc_256| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|71.6 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_mrpc_256 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_mrpc_256_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_mrpc_256_pipeline_en.md new file mode 100644 index 00000000000000..58954cce7cbacb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_mrpc_256_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_mrpc_256_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_mrpc_256_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_mrpc_256_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_mrpc_256_pipeline_en_5.5.0_3.0_1727154910905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_mrpc_256_pipeline_en_5.5.0_3.0_1727154910905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with a "text" column holding the documents to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_mrpc_256_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_mrpc_256_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_mrpc_256_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|71.6 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_mrpc_256 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_sentiment_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sentiment_en.md new file mode 100644 index 00000000000000..89bdf48a367181 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sentiment_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English distilbert_sentiment DistilBertForSequenceClassification from AbeerAlbashiti +author: John Snow Labs +name: distilbert_sentiment +date: 2024-09-24 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sentiment` is a English model originally trained by AbeerAlbashiti. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sentiment_en_5.5.0_3.0_1727136956400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sentiment_en_5.5.0_3.0_1727136956400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+document_assembler = DocumentAssembler()\
+    .setInputCol("text")\
+    .setOutputCol("document")
+
+tokenizer = Tokenizer()\
+    .setInputCols("document")\
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sentiment","en")\
+    .setInputCols(["document","token"])\
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val document_assembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sentiment","en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, sequenceClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+</div>
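+
+For ad-hoc predictions on single strings, the fitted pipeline can also be wrapped in a `LightPipeline`. A small sketch reusing the objects defined above (the example sentence is illustrative only):
+
+```python
+from sparknlp.base import LightPipeline
+
+# LightPipeline runs the fitted pipeline on plain Python strings without a DataFrame.
+light = LightPipeline(pipeline.fit(data))
+print(light.annotate("I really enjoyed this movie!"))
+```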
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +References + +https://huggingface.co/AbeerAlbashiti/distilbert-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_sst5_padding0model_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sst5_padding0model_en.md new file mode 100644 index 00000000000000..89483168db4d1d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sst5_padding0model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sst5_padding0model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_sst5_padding0model +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sst5_padding0model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sst5_padding0model_en_5.5.0_3.0_1727154953564.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sst5_padding0model_en_5.5.0_3.0_1727154953564.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sst5_padding0model","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sst5_padding0model", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
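+
+To check which labels this classifier can emit, the loaded annotator can be queried directly. A small sketch that reuses `sequenceClassifier` from the example above and assumes the standard Spark NLP classifier API:
+
+```python
+# List the labels the pretrained model was trained to predict.
+print(sequenceClassifier.getClasses())
+```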
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sst5_padding0model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_sst5_padding0model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_sst5_padding0model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sst5_padding0model_pipeline_en.md new file mode 100644 index 00000000000000..b25df17ebbd514 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sst5_padding0model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sst5_padding0model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_sst5_padding0model_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sst5_padding0model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sst5_padding0model_pipeline_en_5.5.0_3.0_1727154968746.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sst5_padding0model_pipeline_en_5.5.0_3.0_1727154968746.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with a "text" column holding the documents to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_sst5_padding0model_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_sst5_padding0model_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sst5_padding0model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_sst5_padding0model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_uncased_newsqa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_uncased_newsqa_pipeline_en.md new file mode 100644 index 00000000000000..d4a2895dcd18fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_uncased_newsqa_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_uncased_newsqa_pipeline pipeline DistilBertForQuestionAnswering from Prasetyow12 +author: John Snow Labs +name: distilbert_uncased_newsqa_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_uncased_newsqa_pipeline` is a English model originally trained by Prasetyow12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_uncased_newsqa_pipeline_en_5.5.0_3.0_1727219916490.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_uncased_newsqa_pipeline_en_5.5.0_3.0_1727219916490.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is expected to hold the question/context columns consumed by the
+# pipeline's MultiDocumentAssembler stage.
+pipeline = PretrainedPipeline("distilbert_uncased_newsqa_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is expected to hold the question/context columns consumed by the
+// pipeline's MultiDocumentAssembler stage.
+val pipeline = new PretrainedPipeline("distilbert_uncased_newsqa_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
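+
+If you prefer assembling the stages manually rather than using the one-liner pipeline, the same question-answering setup can be sketched with a `MultiDocumentAssembler` feeding a `DistilBertForQuestionAnswering` annotator. This assumes the corresponding standalone model is published as `distilbert_uncased_newsqa`, which is not confirmed on this page:
+
+```python
+from sparknlp.base import MultiDocumentAssembler
+from sparknlp.annotator import DistilBertForQuestionAnswering
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+# Assumption: the standalone model name mirrors this pipeline's name without "_pipeline".
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_uncased_newsqa", "en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+qaPipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame(
+    [["What is Spark NLP?", "Spark NLP is an NLP library built on Apache Spark."]]
+).toDF("question", "context")
+result = qaPipeline.fit(data).transform(data)
+result.select("answer.result").show(truncate=False)
+```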
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_uncased_newsqa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Prasetyow12/distilbert-uncased-newsqa + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_v1_b_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_v1_b_pipeline_en.md new file mode 100644 index 00000000000000..f5cb6debf8f5bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_v1_b_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_v1_b_pipeline pipeline DistilBertForSequenceClassification from sheduele +author: John Snow Labs +name: distilbert_v1_b_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_v1_b_pipeline` is a English model originally trained by sheduele. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_v1_b_pipeline_en_5.5.0_3.0_1727164425129.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_v1_b_pipeline_en_5.5.0_3.0_1727164425129.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with a "text" column holding the documents to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_v1_b_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilbert_v1_b_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_v1_b_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/sheduele/distilbert_v1_b + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_abr_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_abr_en.md new file mode 100644 index 00000000000000..21ed9fb3cb7c50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_abr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_finetuned_abr RoBertaEmbeddings from Transabrar +author: John Snow Labs +name: distilroberta_base_finetuned_abr +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_abr` is a English model originally trained by Transabrar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_abr_en_5.5.0_3.0_1727169121871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_abr_en_5.5.0_3.0_1727169121871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_finetuned_abr","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_finetuned_abr","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
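+
+Each annotation in the `embeddings` output column carries the token text in its `result` field and the vector in its `embeddings` field, so the vectors can be flattened out like this (reusing `pipelineDF` from the example above):
+
+```python
+# One row per token: the token string and its RoBERTa vector.
+pipelineDF.selectExpr("explode(embeddings) AS emb") \
+    .selectExpr("emb.result AS token", "emb.embeddings AS vector") \
+    .show(truncate=False)
+```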
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_abr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/Transabrar/distilroberta-base-finetuned-abr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_abr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_abr_pipeline_en.md new file mode 100644 index 00000000000000..0e48b35d2cdf20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_abr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_finetuned_abr_pipeline pipeline RoBertaEmbeddings from Transabrar +author: John Snow Labs +name: distilroberta_base_finetuned_abr_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_abr_pipeline` is a English model originally trained by Transabrar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_abr_pipeline_en_5.5.0_3.0_1727169141026.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_abr_pipeline_en_5.5.0_3.0_1727169141026.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with a "text" column holding the documents to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilroberta_base_finetuned_abr_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilroberta_base_finetuned_abr_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_abr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/Transabrar/distilroberta-base-finetuned-abr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_wikitext2_0409nnn_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_wikitext2_0409nnn_en.md new file mode 100644 index 00000000000000..0a946a03cc9132 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_wikitext2_0409nnn_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_finetuned_wikitext2_0409nnn RoBertaEmbeddings from ntust0 +author: John Snow Labs +name: distilroberta_base_finetuned_wikitext2_0409nnn +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_wikitext2_0409nnn` is a English model originally trained by ntust0. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_0409nnn_en_5.5.0_3.0_1727168787663.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_0409nnn_en_5.5.0_3.0_1727168787663.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_finetuned_wikitext2_0409nnn","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_finetuned_wikitext2_0409nnn","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_wikitext2_0409nnn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/ntust0/distilroberta-base-finetuned-wikitext2-0409nnn \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_wikitext2_0409nnn_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_wikitext2_0409nnn_pipeline_en.md new file mode 100644 index 00000000000000..dcb38da9572384 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_wikitext2_0409nnn_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_finetuned_wikitext2_0409nnn_pipeline pipeline RoBertaEmbeddings from ntust0 +author: John Snow Labs +name: distilroberta_base_finetuned_wikitext2_0409nnn_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_wikitext2_0409nnn_pipeline` is a English model originally trained by ntust0. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_0409nnn_pipeline_en_5.5.0_3.0_1727168803569.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_0409nnn_pipeline_en_5.5.0_3.0_1727168803569.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with a "text" column holding the documents to annotate
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("distilroberta_base_finetuned_wikitext2_0409nnn_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("distilroberta_base_finetuned_wikitext2_0409nnn_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_wikitext2_0409nnn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/ntust0/distilroberta-base-finetuned-wikitext2-0409nnn + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_wikitext2_aekang12_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_wikitext2_aekang12_en.md new file mode 100644 index 00000000000000..1071bb9791913e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_wikitext2_aekang12_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_finetuned_wikitext2_aekang12 RoBertaEmbeddings from aekang12 +author: John Snow Labs +name: distilroberta_base_finetuned_wikitext2_aekang12 +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_wikitext2_aekang12` is a English model originally trained by aekang12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_aekang12_en_5.5.0_3.0_1727168669067.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_aekang12_en_5.5.0_3.0_1727168669067.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_finetuned_wikitext2_aekang12","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_finetuned_wikitext2_aekang12","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_wikitext2_aekang12| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/aekang12/distilroberta-base-finetuned-wikitext2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_ft_4chan_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_ft_4chan_en.md new file mode 100644 index 00000000000000..4a0b99c8212abf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_ft_4chan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_ft_4chan RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_4chan +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_4chan` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_4chan_en_5.5.0_3.0_1727168947325.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_4chan_en_5.5.0_3.0_1727168947325.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_4chan","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_4chan","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_4chan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-4chan \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_ft_4chan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_ft_4chan_pipeline_en.md new file mode 100644 index 00000000000000..b60b762a24f2cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_ft_4chan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_ft_4chan_pipeline pipeline RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_4chan_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_4chan_pipeline` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_4chan_pipeline_en_5.5.0_3.0_1727168963458.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_4chan_pipeline_en_5.5.0_3.0_1727168963458.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_ft_4chan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_ft_4chan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
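The `df` referenced above is not created in the generated snippet. A minimal sketch of preparing it is shown below; it assumes `PretrainedPipeline` is imported from `sparknlp.pretrained` and that an active Spark NLP session provides `spark`.

```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be passed to the pretrained pipeline.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilroberta_base_ft_4chan_pipeline", lang="en")
pipeline.transform(df).show(truncate=False)
```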
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_4chan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-4chan + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilroberts_base_mrpc_glue_jeraldflowers_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilroberts_base_mrpc_glue_jeraldflowers_en.md new file mode 100644 index 00000000000000..06dd0a1ac89519 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilroberts_base_mrpc_glue_jeraldflowers_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberts_base_mrpc_glue_jeraldflowers RoBertaForSequenceClassification from jeraldflowers +author: John Snow Labs +name: distilroberts_base_mrpc_glue_jeraldflowers +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberts_base_mrpc_glue_jeraldflowers` is a English model originally trained by jeraldflowers. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberts_base_mrpc_glue_jeraldflowers_en_5.5.0_3.0_1727171242683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberts_base_mrpc_glue_jeraldflowers_en_5.5.0_3.0_1727171242683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberts_base_mrpc_glue_jeraldflowers","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberts_base_mrpc_glue_jeraldflowers", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
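As an optional follow-up sketch (an addition to the generated card), the predicted label for each row can be read from the `result` field of the `class` column, assuming `pipelineDF` from the Python example above.

```python
# The sequence classifier emits one annotation per document; `result` holds
# the predicted label and `metadata` the per-class scores.
pipelineDF.select("text", "class.result").show(truncate=False)
```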
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberts_base_mrpc_glue_jeraldflowers| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/jeraldflowers/distilroberts-base-mrpc-glue-jeraldflowers \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-e4a_covid_question_answering_en.md b/docs/_posts/ahmedlone127/2024-09-24-e4a_covid_question_answering_en.md new file mode 100644 index 00000000000000..146fc37cdf7135 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-e4a_covid_question_answering_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English e4a_covid_question_answering BertForQuestionAnswering from racai +author: John Snow Labs +name: e4a_covid_question_answering +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`e4a_covid_question_answering` is a English model originally trained by racai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/e4a_covid_question_answering_en_5.5.0_3.0_1727175350127.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/e4a_covid_question_answering_en_5.5.0_3.0_1727175350127.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
     .setInputCols(["question", "context"]) \
     .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("e4a_covid_question_answering","en") \
     .setInputCols(["document_question","document_context"]) \
     .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
     .setInputCols(Array("question", "context"))
     .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("e4a_covid_question_answering", "en")
     .setInputCols(Array("document_question","document_context"))
     .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
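A short optional follow-up, assuming `pipelineDF` and the `question`/`context` columns from the example above: the extracted answer span is stored in the `result` field of the `answer` column.

```python
pipelineDF.select("question", "context", "answer.result").show(truncate=False)
```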
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|e4a_covid_question_answering| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/racai/e4a-covid-question-answering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-e4a_covid_question_answering_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-e4a_covid_question_answering_pipeline_en.md new file mode 100644 index 00000000000000..f16dc162e94f2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-e4a_covid_question_answering_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English e4a_covid_question_answering_pipeline pipeline BertForQuestionAnswering from racai +author: John Snow Labs +name: e4a_covid_question_answering_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`e4a_covid_question_answering_pipeline` is a English model originally trained by racai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/e4a_covid_question_answering_pipeline_en_5.5.0_3.0_1727175376922.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/e4a_covid_question_answering_pipeline_en_5.5.0_3.0_1727175376922.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("e4a_covid_question_answering_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("e4a_covid_question_answering_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|e4a_covid_question_answering_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/racai/e4a-covid-question-answering + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-e5_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-e5_base_pipeline_en.md new file mode 100644 index 00000000000000..d82525c48b2872 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-e5_base_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English e5_base_pipeline pipeline E5Embeddings from intfloat +author: John Snow Labs +name: e5_base_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained E5Embeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`e5_base_pipeline` is a English model originally trained by intfloat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/e5_base_pipeline_en_5.5.0_3.0_1727217873798.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/e5_base_pipeline_en_5.5.0_3.0_1727217873798.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("e5_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("e5_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|e5_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|258.6 MB| + +## References + +https://huggingface.co/intfloat/e5-base + +## Included Models + +- DocumentAssembler +- E5Embeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-e5_large_en.md b/docs/_posts/ahmedlone127/2024-09-24-e5_large_en.md new file mode 100644 index 00000000000000..262cad79db1451 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-e5_large_en.md @@ -0,0 +1,73 @@ +--- +layout: model +title: E5 Large Sentence Embeddings +author: John Snow Labs +name: e5_large +date: 2024-09-24 +tags: [en, open_source, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: E5Embeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Text Embeddings by Weakly-Supervised Contrastive Pre-training. Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei, arXiv 2022 + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/e5_large_en_5.5.0_3.0_1727217963878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/e5_large_en_5.5.0_3.0_1727217963878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

embeddings = E5Embeddings.pretrained("e5_large","en") \
    .setInputCols(["document"]) \
    .setOutputCol("e5_embeddings")

pipeline = Pipeline().setStages([document_assembler, embeddings])
```
```scala
val document = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val embeddings = E5Embeddings.pretrained("e5_large","en")
    .setInputCols("document")
    .setOutputCol("e5_embeddings")

val pipeline = new Pipeline().setStages(Array(document, embeddings))
```
</div>
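One usage detail worth noting, taken from the upstream intfloat/e5-large card linked below rather than from this page: E5 models were trained with "query: " and "passage: " prefixes, so prepending them to the input text typically gives better embeddings for retrieval-style tasks. A hedged sketch, reusing the `pipeline` and `e5_embeddings` column defined above:

```python
data = spark.createDataFrame([
    ["query: how much protein should a female eat"],
    ["passage: As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day."]
]).toDF("text")

result = pipeline.fit(data).transform(data)
result.selectExpr("explode(e5_embeddings.embeddings) as vector").show(2)
```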
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|e5_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[E5]| +|Language:|en| +|Size:|796.1 MB| + +## References + +References + +https://huggingface.co/intfloat/e5-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-e5_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-e5_large_pipeline_en.md new file mode 100644 index 00000000000000..16e365f85ce24d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-e5_large_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English e5_large_pipeline pipeline E5Embeddings from intfloat +author: John Snow Labs +name: e5_large_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained E5Embeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`e5_large_pipeline` is a English model originally trained by intfloat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/e5_large_pipeline_en_5.5.0_3.0_1727218193373.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/e5_large_pipeline_en_5.5.0_3.0_1727218193373.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("e5_large_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("e5_large_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|e5_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|796.1 MB| + +## References + +https://huggingface.co/intfloat/e5-large + +## Included Models + +- DocumentAssembler +- E5Embeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-e5_small_en.md b/docs/_posts/ahmedlone127/2024-09-24-e5_small_en.md new file mode 100644 index 00000000000000..c58797cf36a888 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-e5_small_en.md @@ -0,0 +1,67 @@ +--- +layout: model +title: E5 Small Sentence Embeddings +author: John Snow Labs +name: e5_small +date: 2024-09-24 +tags: [en, open_source, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: E5Embeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Text Embeddings by Weakly-Supervised Contrastive Pre-training. Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei, arXiv 2022 + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/e5_small_en_5.5.0_3.0_1727217734668.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/e5_small_en_5.5.0_3.0_1727217734668.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

embeddings = E5Embeddings.pretrained("e5_small","en") \
    .setInputCols(["document"]) \
    .setOutputCol("e5_embeddings")

pipeline = Pipeline().setStages([document_assembler, embeddings])
```
```scala
val document = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val embeddings = E5Embeddings.pretrained("e5_small","en")
    .setInputCols("document")
    .setOutputCol("e5_embeddings")

val pipeline = new Pipeline().setStages(Array(document, embeddings))
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|e5_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[E5]| +|Language:|en| +|Size:|79.9 MB| \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-e5_small_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-e5_small_pipeline_en.md new file mode 100644 index 00000000000000..f21e17a3e47cf8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-e5_small_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English e5_small_pipeline pipeline E5Embeddings from intfloat +author: John Snow Labs +name: e5_small_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained E5Embeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`e5_small_pipeline` is a English model originally trained by intfloat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/e5_small_pipeline_en_5.5.0_3.0_1727217757925.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/e5_small_pipeline_en_5.5.0_3.0_1727217757925.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("e5_small_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("e5_small_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|e5_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|79.9 MB| + +## References + +https://huggingface.co/intfloat/e5-small + +## Included Models + +- DocumentAssembler +- E5Embeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-email_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-email_classification_pipeline_en.md new file mode 100644 index 00000000000000..21cfce4a781bcd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-email_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English email_classification_pipeline pipeline RoBertaForSequenceClassification from arya555 +author: John Snow Labs +name: email_classification_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`email_classification_pipeline` is a English model originally trained by arya555. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/email_classification_pipeline_en_5.5.0_3.0_1727171756827.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/email_classification_pipeline_en_5.5.0_3.0_1727171756827.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("email_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("email_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|email_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|427.2 MB| + +## References + +https://huggingface.co/arya555/email_classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-emotion_analysis_en.md b/docs/_posts/ahmedlone127/2024-09-24-emotion_analysis_en.md new file mode 100644 index 00000000000000..c446469975c882 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-emotion_analysis_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emotion_analysis DistilBertForSequenceClassification from erlend123 +author: John Snow Labs +name: emotion_analysis +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_analysis` is a English model originally trained by erlend123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_analysis_en_5.5.0_3.0_1727154263205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_analysis_en_5.5.0_3.0_1727154263205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("emotion_analysis","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("emotion_analysis", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
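A brief optional sketch (an addition to the generated card, assuming `pipelineDF` from the example above) for reading both the predicted emotion label and the per-class confidence scores that Spark NLP sequence classifiers expose through annotation metadata:

```python
pipelineDF.selectExpr(
    "text",
    "class.result as label",
    "class.metadata as scores"
).show(truncate=False)
```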
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_analysis| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/erlend123/emotion-analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-emotion_analysis_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-emotion_analysis_pipeline_en.md new file mode 100644 index 00000000000000..a963ba80c6dc40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-emotion_analysis_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emotion_analysis_pipeline pipeline DistilBertForSequenceClassification from erlend123 +author: John Snow Labs +name: emotion_analysis_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_analysis_pipeline` is a English model originally trained by erlend123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_analysis_pipeline_en_5.5.0_3.0_1727154284410.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_analysis_pipeline_en_5.5.0_3.0_1727154284410.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("emotion_analysis_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("emotion_analysis_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_analysis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/erlend123/emotion-analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-emotion_vangmayy_en.md b/docs/_posts/ahmedlone127/2024-09-24-emotion_vangmayy_en.md new file mode 100644 index 00000000000000..5b365671ac11ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-emotion_vangmayy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emotion_vangmayy DistilBertForSequenceClassification from Vangmayy +author: John Snow Labs +name: emotion_vangmayy +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_vangmayy` is a English model originally trained by Vangmayy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_vangmayy_en_5.5.0_3.0_1727137293163.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_vangmayy_en_5.5.0_3.0_1727137293163.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("emotion_vangmayy","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("emotion_vangmayy", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_vangmayy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Vangmayy/emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-environmentalbert_water_en.md b/docs/_posts/ahmedlone127/2024-09-24-environmentalbert_water_en.md new file mode 100644 index 00000000000000..ddadd19d360e3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-environmentalbert_water_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English environmentalbert_water RoBertaForSequenceClassification from ESGBERT +author: John Snow Labs +name: environmentalbert_water +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`environmentalbert_water` is a English model originally trained by ESGBERT. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/environmentalbert_water_en_5.5.0_3.0_1727168183583.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/environmentalbert_water_en_5.5.0_3.0_1727168183583.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("environmentalbert_water","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("environmentalbert_water", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|environmentalbert_water| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/ESGBERT/EnvironmentalBERT-water \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-environmentalbert_water_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-environmentalbert_water_pipeline_en.md new file mode 100644 index 00000000000000..a857a45ce8d21e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-environmentalbert_water_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English environmentalbert_water_pipeline pipeline RoBertaForSequenceClassification from ESGBERT +author: John Snow Labs +name: environmentalbert_water_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`environmentalbert_water_pipeline` is a English model originally trained by ESGBERT. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/environmentalbert_water_pipeline_en_5.5.0_3.0_1727168198845.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/environmentalbert_water_pipeline_en_5.5.0_3.0_1727168198845.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("environmentalbert_water_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("environmentalbert_water_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|environmentalbert_water_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/ESGBERT/EnvironmentalBERT-water + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-fakenews_classifier_nela_gt_en.md b/docs/_posts/ahmedlone127/2024-09-24-fakenews_classifier_nela_gt_en.md new file mode 100644 index 00000000000000..541fc68cf64432 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-fakenews_classifier_nela_gt_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fakenews_classifier_nela_gt RoBertaForSequenceClassification from newsmediabias +author: John Snow Labs +name: fakenews_classifier_nela_gt +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fakenews_classifier_nela_gt` is a English model originally trained by newsmediabias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fakenews_classifier_nela_gt_en_5.5.0_3.0_1727171381995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fakenews_classifier_nela_gt_en_5.5.0_3.0_1727171381995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("fakenews_classifier_nela_gt","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("fakenews_classifier_nela_gt", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fakenews_classifier_nela_gt| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/newsmediabias/FakeNews-Classifier-NELA-GT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-fakenews_classifier_nela_gt_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-fakenews_classifier_nela_gt_pipeline_en.md new file mode 100644 index 00000000000000..cb585ce19927db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-fakenews_classifier_nela_gt_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fakenews_classifier_nela_gt_pipeline pipeline RoBertaForSequenceClassification from newsmediabias +author: John Snow Labs +name: fakenews_classifier_nela_gt_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fakenews_classifier_nela_gt_pipeline` is a English model originally trained by newsmediabias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fakenews_classifier_nela_gt_pipeline_en_5.5.0_3.0_1727171404886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fakenews_classifier_nela_gt_pipeline_en_5.5.0_3.0_1727171404886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fakenews_classifier_nela_gt_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fakenews_classifier_nela_gt_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fakenews_classifier_nela_gt_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/newsmediabias/FakeNews-Classifier-NELA-GT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finbert_ner_en.md b/docs/_posts/ahmedlone127/2024-09-24-finbert_ner_en.md new file mode 100644 index 00000000000000..99f0b8eae0067f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finbert_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finbert_ner BertForTokenClassification from Rupesh2 +author: John Snow Labs +name: finbert_ner +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finbert_ner` is a English model originally trained by Rupesh2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finbert_ner_en_5.5.0_3.0_1727196324312.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finbert_ner_en_5.5.0_3.0_1727196324312.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("finbert_ner","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("finbert_ner", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
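The example above emits IOB-style tags per token. If entity chunks are preferred, Spark NLP's standard `NerConverter` annotator can be appended to the same pipeline. The sketch below is an addition to the generated card and reuses the stage and column names from the Python example above.

```python
from sparknlp.annotator import NerConverter

# Groups consecutive B-/I- tags into entity chunks with their labels.
converter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, converter])
result = pipeline.fit(data).transform(data)
result.selectExpr("explode(ner_chunk) as chunk") \
    .selectExpr("chunk.result as entity", "chunk.metadata['entity'] as label") \
    .show(truncate=False)
```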
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finbert_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/Rupesh2/finbert-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finbert_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finbert_ner_pipeline_en.md new file mode 100644 index 00000000000000..16749d41471e8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finbert_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finbert_ner_pipeline pipeline BertForTokenClassification from Rupesh2 +author: John Snow Labs +name: finbert_ner_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finbert_ner_pipeline` is a English model originally trained by Rupesh2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finbert_ner_pipeline_en_5.5.0_3.0_1727196344092.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finbert_ner_pipeline_en_5.5.0_3.0_1727196344092.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finbert_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finbert_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finbert_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/Rupesh2/finbert-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetune_whisper_tiny_malay_singlish_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetune_whisper_tiny_malay_singlish_en.md new file mode 100644 index 00000000000000..2131efe6e499ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetune_whisper_tiny_malay_singlish_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English finetune_whisper_tiny_malay_singlish WhisperForCTC from mesolitica +author: John Snow Labs +name: finetune_whisper_tiny_malay_singlish +date: 2024-09-24 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetune_whisper_tiny_malay_singlish` is a English model originally trained by mesolitica. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetune_whisper_tiny_malay_singlish_en_5.5.0_3.0_1727144205921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetune_whisper_tiny_malay_singlish_en_5.5.0_3.0_1727144205921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("finetune_whisper_tiny_malay_singlish","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
# `data` must be a DataFrame with an "audio_content" column of audio samples;
# see the sketch after this example for one way to build it.
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("finetune_whisper_tiny_malay_singlish", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
// `data` must be a DataFrame with an "audio_content" column of audio samples.
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
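The `data` DataFrame is left undefined in the generated example. One way to build it (an assumption, not taken from the original card) is to decode an audio file to 16 kHz mono float samples, the sample rate Whisper models expect, and wrap them in a single "audio_content" column:

```python
import librosa

# The path is a placeholder; librosa resamples the file to 16 kHz mono floats.
raw_floats, _ = librosa.load("/path/to/audio.wav", sr=16000)

data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")
```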
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetune_whisper_tiny_malay_singlish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|378.7 MB| + +## References + +https://huggingface.co/mesolitica/finetune-whisper-tiny-ms-singlish \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetune_whisper_tiny_malay_singlish_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetune_whisper_tiny_malay_singlish_pipeline_en.md new file mode 100644 index 00000000000000..34f86b0f227b89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetune_whisper_tiny_malay_singlish_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English finetune_whisper_tiny_malay_singlish_pipeline pipeline WhisperForCTC from mesolitica +author: John Snow Labs +name: finetune_whisper_tiny_malay_singlish_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetune_whisper_tiny_malay_singlish_pipeline` is a English model originally trained by mesolitica. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetune_whisper_tiny_malay_singlish_pipeline_en_5.5.0_3.0_1727144232676.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetune_whisper_tiny_malay_singlish_pipeline_en_5.5.0_3.0_1727144232676.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetune_whisper_tiny_malay_singlish_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetune_whisper_tiny_malay_singlish_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetune_whisper_tiny_malay_singlish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|378.7 MB| + +## References + +https://huggingface.co/mesolitica/finetune-whisper-tiny-ms-singlish + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuned_bert_policy_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuned_bert_policy_classifier_en.md new file mode 100644 index 00000000000000..d7bc6d156bc546 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuned_bert_policy_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_bert_policy_classifier BertForSequenceClassification from aryaniyaps +author: John Snow Labs +name: finetuned_bert_policy_classifier +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_bert_policy_classifier` is a English model originally trained by aryaniyaps. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_bert_policy_classifier_en_5.5.0_3.0_1727219004416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_bert_policy_classifier_en_5.5.0_3.0_1727219004416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("finetuned_bert_policy_classifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("finetuned_bert_policy_classifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_bert_policy_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/aryaniyaps/finetuned-bert-policy-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuned_bert_policy_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuned_bert_policy_classifier_pipeline_en.md new file mode 100644 index 00000000000000..9a0549ad146089 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuned_bert_policy_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_bert_policy_classifier_pipeline pipeline BertForSequenceClassification from aryaniyaps +author: John Snow Labs +name: finetuned_bert_policy_classifier_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_bert_policy_classifier_pipeline` is a English model originally trained by aryaniyaps. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_bert_policy_classifier_pipeline_en_5.5.0_3.0_1727219026231.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_bert_policy_classifier_pipeline_en_5.5.0_3.0_1727219026231.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_bert_policy_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_bert_policy_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_bert_policy_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/aryaniyaps/finetuned-bert-policy-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuned_demo_2_shardev_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuned_demo_2_shardev_en.md new file mode 100644 index 00000000000000..b6e507b5440b5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuned_demo_2_shardev_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_demo_2_shardev DistilBertForSequenceClassification from Shardev +author: John Snow Labs +name: finetuned_demo_2_shardev +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_demo_2_shardev` is a English model originally trained by Shardev. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_demo_2_shardev_en_5.5.0_3.0_1727164141723.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_demo_2_shardev_en_5.5.0_3.0_1727164141723.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_demo_2_shardev", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_demo_2_shardev", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_demo_2_shardev| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/Shardev/finetuned_demo_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuned_demo_2_shardev_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuned_demo_2_shardev_pipeline_en.md new file mode 100644 index 00000000000000..4d41b680b6ad15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuned_demo_2_shardev_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_demo_2_shardev_pipeline pipeline DistilBertForSequenceClassification from Shardev +author: John Snow Labs +name: finetuned_demo_2_shardev_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_demo_2_shardev_pipeline` is a English model originally trained by Shardev. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_demo_2_shardev_pipeline_en_5.5.0_3.0_1727164164033.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_demo_2_shardev_pipeline_en_5.5.0_3.0_1727164164033.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("finetuned_demo_2_shardev_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("finetuned_demo_2_shardev_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_demo_2_shardev_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/Shardev/finetuned_demo_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuned_distilroberta_base_semeval_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuned_distilroberta_base_semeval_en.md new file mode 100644 index 00000000000000..c3877150425537 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuned_distilroberta_base_semeval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_distilroberta_base_semeval RoBertaForSequenceClassification from Youssef320 +author: John Snow Labs +name: finetuned_distilroberta_base_semeval +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_distilroberta_base_semeval` is a English model originally trained by Youssef320. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_distilroberta_base_semeval_en_5.5.0_3.0_1727172120857.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_distilroberta_base_semeval_en_5.5.0_3.0_1727172120857.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("finetuned_distilroberta_base_semeval", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("finetuned_distilroberta_base_semeval", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
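+
+When only a few documents need to be scored, wrapping the fitted PipelineModel in a LightPipeline avoids the overhead of a full distributed job (a minimal sketch that reuses the `pipelineModel` object from the example above):
+
+```python
+from sparknlp.base import LightPipeline
+
+# LightPipeline runs the same stages directly on plain Python strings
+light = LightPipeline(pipelineModel)
+print(light.annotate("I love spark-nlp")["class"])
+```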
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_distilroberta_base_semeval| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.9 MB| + +## References + +https://huggingface.co/Youssef320/finetuned-distilroberta-base-SemEval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuned_distilroberta_base_semeval_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuned_distilroberta_base_semeval_pipeline_en.md new file mode 100644 index 00000000000000..2ea0b8373bc7d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuned_distilroberta_base_semeval_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_distilroberta_base_semeval_pipeline pipeline RoBertaForSequenceClassification from Youssef320 +author: John Snow Labs +name: finetuned_distilroberta_base_semeval_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_distilroberta_base_semeval_pipeline` is a English model originally trained by Youssef320. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_distilroberta_base_semeval_pipeline_en_5.5.0_3.0_1727172137087.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_distilroberta_base_semeval_pipeline_en_5.5.0_3.0_1727172137087.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("finetuned_distilroberta_base_semeval_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("finetuned_distilroberta_base_semeval_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_distilroberta_base_semeval_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.9 MB| + +## References + +https://huggingface.co/Youssef320/finetuned-distilroberta-base-SemEval + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetunedemotionmodel_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetunedemotionmodel_en.md new file mode 100644 index 00000000000000..651b6325330928 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetunedemotionmodel_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetunedemotionmodel DistilBertForSequenceClassification from Rishabh3108 +author: John Snow Labs +name: finetunedemotionmodel +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetunedemotionmodel` is a English model originally trained by Rishabh3108. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetunedemotionmodel_en_5.5.0_3.0_1727164242358.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetunedemotionmodel_en_5.5.0_3.0_1727164242358.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetunedemotionmodel", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetunedemotionmodel", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetunedemotionmodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Rishabh3108/finetunedemotionmodel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetunedemotionmodel_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetunedemotionmodel_pipeline_en.md new file mode 100644 index 00000000000000..fb1ac65ebc5a42 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetunedemotionmodel_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetunedemotionmodel_pipeline pipeline DistilBertForSequenceClassification from Rishabh3108 +author: John Snow Labs +name: finetunedemotionmodel_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetunedemotionmodel_pipeline` is a English model originally trained by Rishabh3108. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetunedemotionmodel_pipeline_en_5.5.0_3.0_1727164255965.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetunedemotionmodel_pipeline_en_5.5.0_3.0_1727164255965.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("finetunedemotionmodel_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("finetunedemotionmodel_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetunedemotionmodel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Rishabh3108/finetunedemotionmodel + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_analysis_model_3000_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_analysis_model_3000_en.md new file mode 100644 index 00000000000000..0527c202a11c96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_analysis_model_3000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_analysis_model_3000 DistilBertForSequenceClassification from gmvchile +author: John Snow Labs +name: finetuning_sentiment_analysis_model_3000 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_analysis_model_3000` is a English model originally trained by gmvchile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_model_3000_en_5.5.0_3.0_1727154517311.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_model_3000_en_5.5.0_3.0_1727154517311.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_analysis_model_3000", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_analysis_model_3000", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
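+
+Besides the predicted label, sequence-classification annotators typically attach per-label confidence scores to the annotation metadata, which can be pulled out with a nested column selection (a sketch against the `pipelineDF` produced above; the exact metadata keys depend on the model's label set):
+
+```python
+# Predicted label plus the raw metadata map holding the class scores
+pipelineDF.select("class.result", "class.metadata").show(truncate=False)
+```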
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_analysis_model_3000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gmvchile/finetuning-sentiment-analysis-model-3000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_analysis_model_3000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_analysis_model_3000_pipeline_en.md new file mode 100644 index 00000000000000..25c5111d48de9f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_analysis_model_3000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_analysis_model_3000_pipeline pipeline DistilBertForSequenceClassification from gmvchile +author: John Snow Labs +name: finetuning_sentiment_analysis_model_3000_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_analysis_model_3000_pipeline` is a English model originally trained by gmvchile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_model_3000_pipeline_en_5.5.0_3.0_1727154531964.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_model_3000_pipeline_en_5.5.0_3.0_1727154531964.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("finetuning_sentiment_analysis_model_3000_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("finetuning_sentiment_analysis_model_3000_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_analysis_model_3000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gmvchile/finetuning-sentiment-analysis-model-3000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_kaggle_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_kaggle_en.md new file mode 100644 index 00000000000000..09b587e0b71ff5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_kaggle_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_kaggle DistilBertForSequenceClassification from Munshid123 +author: John Snow Labs +name: finetuning_sentiment_model_3000_kaggle +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_kaggle` is a English model originally trained by Munshid123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_kaggle_en_5.5.0_3.0_1727154719128.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_kaggle_en_5.5.0_3.0_1727154719128.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_kaggle", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_kaggle", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_kaggle| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Munshid123/finetuning-sentiment-model-3000-kaggle \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_aadrik_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_aadrik_pipeline_en.md new file mode 100644 index 00000000000000..cbfc88fcc85464 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_aadrik_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_aadrik_pipeline pipeline DistilBertForSequenceClassification from aadrik +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_aadrik_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_aadrik_pipeline` is a English model originally trained by aadrik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_aadrik_pipeline_en_5.5.0_3.0_1727164275627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_aadrik_pipeline_en_5.5.0_3.0_1727164275627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_aadrik_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_aadrik_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_aadrik_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aadrik/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_dnzy_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_dnzy_en.md new file mode 100644 index 00000000000000..a2be4f3f3918c8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_dnzy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_dnzy DistilBertForSequenceClassification from DNZY +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_dnzy +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_dnzy` is a English model originally trained by DNZY. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_dnzy_en_5.5.0_3.0_1727164728881.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_dnzy_en_5.5.0_3.0_1727164728881.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_dnzy", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_dnzy", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_dnzy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DNZY/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_dnzy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_dnzy_pipeline_en.md new file mode 100644 index 00000000000000..7af331dc4e4477 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_dnzy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_dnzy_pipeline pipeline DistilBertForSequenceClassification from DNZY +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_dnzy_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_dnzy_pipeline` is a English model originally trained by DNZY. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_dnzy_pipeline_en_5.5.0_3.0_1727164741645.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_dnzy_pipeline_en_5.5.0_3.0_1727164741645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_dnzy_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_dnzy_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_dnzy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DNZY/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_jbnextnext_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_jbnextnext_pipeline_en.md new file mode 100644 index 00000000000000..dd3d95774d2a09 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_jbnextnext_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_jbnextnext_pipeline pipeline DistilBertForSequenceClassification from jbnextnext +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_jbnextnext_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_jbnextnext_pipeline` is a English model originally trained by jbnextnext. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_jbnextnext_pipeline_en_5.5.0_3.0_1727155056284.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_jbnextnext_pipeline_en_5.5.0_3.0_1727155056284.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_jbnextnext_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_jbnextnext_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_jbnextnext_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jbnextnext/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_yudingwang_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_yudingwang_en.md new file mode 100644 index 00000000000000..56816ec02f03ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_yudingwang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_yudingwang DistilBertForSequenceClassification from YudingWang +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_yudingwang +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_yudingwang` is a English model originally trained by YudingWang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_yudingwang_en_5.5.0_3.0_1727137478682.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_yudingwang_en_5.5.0_3.0_1727137478682.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_yudingwang", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_yudingwang", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_yudingwang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/YudingWang/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3500_samples_train_kurtbadelt_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3500_samples_train_kurtbadelt_en.md new file mode 100644 index 00000000000000..0c1e159c8c93e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3500_samples_train_kurtbadelt_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3500_samples_train_kurtbadelt DistilBertForSequenceClassification from KurtBadelt +author: John Snow Labs +name: finetuning_sentiment_model_3500_samples_train_kurtbadelt +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3500_samples_train_kurtbadelt` is a English model originally trained by KurtBadelt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3500_samples_train_kurtbadelt_en_5.5.0_3.0_1727154263260.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3500_samples_train_kurtbadelt_en_5.5.0_3.0_1727154263260.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3500_samples_train_kurtbadelt", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3500_samples_train_kurtbadelt", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
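+
+For larger datasets, throughput can usually be tuned on the classifier stage itself; batch size and case sensitivity are the parameters most commonly adjusted (a sketch only; if these setters are omitted the annotator's defaults apply):
+
+```python
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3500_samples_train_kurtbadelt", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class") \
+    .setBatchSize(8) \
+    .setCaseSensitive(False)
+```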
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3500_samples_train_kurtbadelt| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KurtBadelt/finetuning-sentiment-model-3500-samples-train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3500_samples_train_kurtbadelt_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3500_samples_train_kurtbadelt_pipeline_en.md new file mode 100644 index 00000000000000..5c7d287a875ef1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3500_samples_train_kurtbadelt_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3500_samples_train_kurtbadelt_pipeline pipeline DistilBertForSequenceClassification from KurtBadelt +author: John Snow Labs +name: finetuning_sentiment_model_3500_samples_train_kurtbadelt_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3500_samples_train_kurtbadelt_pipeline` is a English model originally trained by KurtBadelt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3500_samples_train_kurtbadelt_pipeline_en_5.5.0_3.0_1727154284385.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3500_samples_train_kurtbadelt_pipeline_en_5.5.0_3.0_1727154284385.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("finetuning_sentiment_model_3500_samples_train_kurtbadelt_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3500_samples_train_kurtbadelt_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3500_samples_train_kurtbadelt_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KurtBadelt/finetuning-sentiment-model-3500-samples-train + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_5000_samples_leonardosegurat_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_5000_samples_leonardosegurat_pipeline_en.md new file mode 100644 index 00000000000000..80a99b84638829 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_5000_samples_leonardosegurat_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_5000_samples_leonardosegurat_pipeline pipeline DistilBertForSequenceClassification from leonardosegurat +author: John Snow Labs +name: finetuning_sentiment_model_5000_samples_leonardosegurat_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_5000_samples_leonardosegurat_pipeline` is a English model originally trained by leonardosegurat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_samples_leonardosegurat_pipeline_en_5.5.0_3.0_1727137501296.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_samples_leonardosegurat_pipeline_en_5.5.0_3.0_1727137501296.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("finetuning_sentiment_model_5000_samples_leonardosegurat_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("finetuning_sentiment_model_5000_samples_leonardosegurat_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_5000_samples_leonardosegurat_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/leonardosegurat/finetuning-sentiment-model-5000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_fifa_15766_samples_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_fifa_15766_samples_en.md new file mode 100644 index 00000000000000..a4a1bbfe19ce37 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_fifa_15766_samples_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_fifa_15766_samples DistilBertForSequenceClassification from mdelrosa13 +author: John Snow Labs +name: finetuning_sentiment_model_fifa_15766_samples +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_fifa_15766_samples` is a English model originally trained by mdelrosa13. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_fifa_15766_samples_en_5.5.0_3.0_1727154813159.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_fifa_15766_samples_en_5.5.0_3.0_1727154813159.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_fifa_15766_samples", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_fifa_15766_samples", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_fifa_15766_samples| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mdelrosa13/finetuning-sentiment-model-fifa-15766-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-food_not_food_distill_bert_en.md b/docs/_posts/ahmedlone127/2024-09-24-food_not_food_distill_bert_en.md new file mode 100644 index 00000000000000..3c8db4e3fa20b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-food_not_food_distill_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English food_not_food_distill_bert DistilBertForSequenceClassification from ImpactTom6819 +author: John Snow Labs +name: food_not_food_distill_bert +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`food_not_food_distill_bert` is a English model originally trained by ImpactTom6819. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/food_not_food_distill_bert_en_5.5.0_3.0_1727205005327.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/food_not_food_distill_bert_en_5.5.0_3.0_1727205005327.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("food_not_food_distill_bert", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("food_not_food_distill_bert", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
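+
+A fitted pipeline can be persisted with the standard Spark ML writer and reloaded later without repeating the download or fit step (a sketch; the path below is only a placeholder):
+
+```python
+from pyspark.ml import PipelineModel
+
+# Save the fitted pipeline and load it back for reuse
+pipelineModel.write().overwrite().save("/tmp/food_not_food_distill_bert_model")
+restored = PipelineModel.load("/tmp/food_not_food_distill_bert_model")
+```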
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|food_not_food_distill_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.4 MB| + +## References + +https://huggingface.co/ImpactTom6819/food_not_food_distill-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-food_not_food_distill_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-food_not_food_distill_bert_pipeline_en.md new file mode 100644 index 00000000000000..99e25a66a36f01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-food_not_food_distill_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English food_not_food_distill_bert_pipeline pipeline DistilBertForSequenceClassification from ImpactTom6819 +author: John Snow Labs +name: food_not_food_distill_bert_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`food_not_food_distill_bert_pipeline` is a English model originally trained by ImpactTom6819. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/food_not_food_distill_bert_pipeline_en_5.5.0_3.0_1727205019453.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/food_not_food_distill_bert_pipeline_en_5.5.0_3.0_1727205019453.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Example input; the pipeline's DocumentAssembler is assumed to read a "text" column.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("food_not_food_distill_bert_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Example input; the pipeline's DocumentAssembler is assumed to read a "text" column.
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("food_not_food_distill_bert_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
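+
+For quick checks on a single string, `PretrainedPipeline` also exposes an `annotate` method; a minimal sketch (the `"class"` key below is an assumption based on the underlying classifier's output column):
+
+```python
+# Annotate one string directly with the downloaded pipeline (sketch).
+light_result = pipeline.annotate("I love spark-nlp")
+print(light_result.get("class"))  # assumed key name for the classifier output
+```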
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|food_not_food_distill_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ImpactTom6819/food_not_food_distill-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-frabert_distilbert_base_uncased_augmented_en.md b/docs/_posts/ahmedlone127/2024-09-24-frabert_distilbert_base_uncased_augmented_en.md new file mode 100644 index 00000000000000..27ff9a22e324df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-frabert_distilbert_base_uncased_augmented_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English frabert_distilbert_base_uncased_augmented DistilBertForSequenceClassification from Francesco0101 +author: John Snow Labs +name: frabert_distilbert_base_uncased_augmented +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`frabert_distilbert_base_uncased_augmented` is a English model originally trained by Francesco0101. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frabert_distilbert_base_uncased_augmented_en_5.5.0_3.0_1727164141989.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frabert_distilbert_base_uncased_augmented_en_5.5.0_3.0_1727164141989.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("frabert_distilbert_base_uncased_augmented","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("frabert_distilbert_base_uncased_augmented", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frabert_distilbert_base_uncased_augmented| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Francesco0101/FRABERT-distilbert-base-uncased-AUGMENTED \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-fullcombined_manifesto10000_en.md b/docs/_posts/ahmedlone127/2024-09-24-fullcombined_manifesto10000_en.md new file mode 100644 index 00000000000000..7f4ce92bb9fe52 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-fullcombined_manifesto10000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fullcombined_manifesto10000 RoBertaForSequenceClassification from jordankrishnayah +author: John Snow Labs +name: fullcombined_manifesto10000 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fullcombined_manifesto10000` is a English model originally trained by jordankrishnayah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fullcombined_manifesto10000_en_5.5.0_3.0_1727171869930.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fullcombined_manifesto10000_en_5.5.0_3.0_1727171869930.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("fullcombined_manifesto10000","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("fullcombined_manifesto10000", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fullcombined_manifesto10000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/jordankrishnayah/fullCombined-manifesto10000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-fullcombined_manifesto10000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-fullcombined_manifesto10000_pipeline_en.md new file mode 100644 index 00000000000000..6c3d877a79ca51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-fullcombined_manifesto10000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fullcombined_manifesto10000_pipeline pipeline RoBertaForSequenceClassification from jordankrishnayah +author: John Snow Labs +name: fullcombined_manifesto10000_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fullcombined_manifesto10000_pipeline` is a English model originally trained by jordankrishnayah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fullcombined_manifesto10000_pipeline_en_5.5.0_3.0_1727171893131.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fullcombined_manifesto10000_pipeline_en_5.5.0_3.0_1727171893131.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Example input; the pipeline's DocumentAssembler is assumed to read a "text" column.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("fullcombined_manifesto10000_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Example input; the pipeline's DocumentAssembler is assumed to read a "text" column.
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("fullcombined_manifesto10000_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fullcombined_manifesto10000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/jordankrishnayah/fullCombined-manifesto10000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-furina_en.md b/docs/_posts/ahmedlone127/2024-09-24-furina_en.md new file mode 100644 index 00000000000000..54662528e5292b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-furina_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English furina XlmRoBertaEmbeddings from yihongLiu +author: John Snow Labs +name: furina +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, xlm_roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`furina` is a English model originally trained by yihongLiu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/furina_en_5.5.0_3.0_1727209808625.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/furina_en_5.5.0_3.0_1727209808625.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+embeddings = XlmRoBertaEmbeddings.pretrained("furina","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = XlmRoBertaEmbeddings.pretrained("furina","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
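+
+Each row of the `embeddings` column contains one annotation per token, with the vector stored in the annotation's `embeddings` field. A minimal sketch for peeking at the vectors (continuing the Python example above):
+
+```python
+# Explode to one row per token vector (continues the Python pipeline above).
+pipelineDF.selectExpr("explode(embeddings.embeddings) as token_vector").show(1, truncate=80)
+```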
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|furina| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[xlm_roberta]| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/yihongLiu/furina \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-furina_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-furina_pipeline_en.md new file mode 100644 index 00000000000000..1dc2c83a5f5f6b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-furina_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English furina_pipeline pipeline XlmRoBertaEmbeddings from yihongLiu +author: John Snow Labs +name: furina_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`furina_pipeline` is a English model originally trained by yihongLiu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/furina_pipeline_en_5.5.0_3.0_1727209881517.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/furina_pipeline_en_5.5.0_3.0_1727209881517.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Example input; the pipeline's DocumentAssembler is assumed to read a "text" column.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("furina_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Example input; the pipeline's DocumentAssembler is assumed to read a "text" column.
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("furina_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|furina_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/yihongLiu/furina + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-german_english_code_switching_bert_en.md b/docs/_posts/ahmedlone127/2024-09-24-german_english_code_switching_bert_en.md new file mode 100644 index 00000000000000..1a7a476fce42c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-german_english_code_switching_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English german_english_code_switching_bert BertEmbeddings from igorsterner +author: John Snow Labs +name: german_english_code_switching_bert +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`german_english_code_switching_bert` is a English model originally trained by igorsterner. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/german_english_code_switching_bert_en_5.5.0_3.0_1727220815162.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/german_english_code_switching_bert_en_5.5.0_3.0_1727220815162.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+embeddings = BertEmbeddings.pretrained("german_english_code_switching_bert","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = BertEmbeddings.pretrained("german_english_code_switching_bert","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|german_english_code_switching_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|664.7 MB| + +## References + +https://huggingface.co/igorsterner/german-english-code-switching-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-german_english_code_switching_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-german_english_code_switching_bert_pipeline_en.md new file mode 100644 index 00000000000000..a0917a247a894a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-german_english_code_switching_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English german_english_code_switching_bert_pipeline pipeline BertEmbeddings from igorsterner +author: John Snow Labs +name: german_english_code_switching_bert_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`german_english_code_switching_bert_pipeline` is a English model originally trained by igorsterner. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/german_english_code_switching_bert_pipeline_en_5.5.0_3.0_1727220848520.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/german_english_code_switching_bert_pipeline_en_5.5.0_3.0_1727220848520.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Example input; the pipeline's DocumentAssembler is assumed to read a "text" column.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("german_english_code_switching_bert_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Example input; the pipeline's DocumentAssembler is assumed to read a "text" column.
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("german_english_code_switching_bert_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|german_english_code_switching_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|664.7 MB| + +## References + +https://huggingface.co/igorsterner/german-english-code-switching-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-hate_hate_balance_random0_seed2_bertweet_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-hate_hate_balance_random0_seed2_bertweet_large_pipeline_en.md new file mode 100644 index 00000000000000..c561a7d782168e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-hate_hate_balance_random0_seed2_bertweet_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_balance_random0_seed2_bertweet_large_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random0_seed2_bertweet_large_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random0_seed2_bertweet_large_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random0_seed2_bertweet_large_pipeline_en_5.5.0_3.0_1727171820471.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random0_seed2_bertweet_large_pipeline_en_5.5.0_3.0_1727171820471.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Example input; the pipeline's DocumentAssembler is assumed to read a "text" column.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("hate_hate_balance_random0_seed2_bertweet_large_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Example input; the pipeline's DocumentAssembler is assumed to read a "text" column.
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("hate_hate_balance_random0_seed2_bertweet_large_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random0_seed2_bertweet_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random0_seed2-bertweet-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-hate_hate_balance_random3_seed1_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-24-hate_hate_balance_random3_seed1_bernice_en.md new file mode 100644 index 00000000000000..534ae9ee51f926 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-hate_hate_balance_random3_seed1_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_balance_random3_seed1_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random3_seed1_bernice +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random3_seed1_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed1_bernice_en_5.5.0_3.0_1727153045262.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed1_bernice_en_5.5.0_3.0_1727153045262.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hate_hate_balance_random3_seed1_bernice","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hate_hate_balance_random3_seed1_bernice", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random3_seed1_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|783.5 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random3_seed1-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-hate_hate_balance_random3_seed1_bernice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-hate_hate_balance_random3_seed1_bernice_pipeline_en.md new file mode 100644 index 00000000000000..6654ba4b6ef657 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-hate_hate_balance_random3_seed1_bernice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_balance_random3_seed1_bernice_pipeline pipeline XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random3_seed1_bernice_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random3_seed1_bernice_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed1_bernice_pipeline_en_5.5.0_3.0_1727153189704.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed1_bernice_pipeline_en_5.5.0_3.0_1727153189704.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Example input; the pipeline's DocumentAssembler is assumed to read a "text" column.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("hate_hate_balance_random3_seed1_bernice_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Example input; the pipeline's DocumentAssembler is assumed to read a "text" column.
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("hate_hate_balance_random3_seed1_bernice_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random3_seed1_bernice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|783.5 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random3_seed1-bernice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-hate_hate_random1_seed2_twitter_roberta_base_2022_154m_en.md b/docs/_posts/ahmedlone127/2024-09-24-hate_hate_random1_seed2_twitter_roberta_base_2022_154m_en.md new file mode 100644 index 00000000000000..1f26fc8c80f02a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-hate_hate_random1_seed2_twitter_roberta_base_2022_154m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_random1_seed2_twitter_roberta_base_2022_154m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_random1_seed2_twitter_roberta_base_2022_154m +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_random1_seed2_twitter_roberta_base_2022_154m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_random1_seed2_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1727171955647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_random1_seed2_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1727171955647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_random1_seed2_twitter_roberta_base_2022_154m","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_random1_seed2_twitter_roberta_base_2022_154m", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_random1_seed2_twitter_roberta_base_2022_154m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_random1_seed2-twitter-roberta-base-2022-154m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-hate_hate_random1_seed2_twitter_roberta_base_2022_154m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-hate_hate_random1_seed2_twitter_roberta_base_2022_154m_pipeline_en.md new file mode 100644 index 00000000000000..f9b6205cd4474b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-hate_hate_random1_seed2_twitter_roberta_base_2022_154m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_random1_seed2_twitter_roberta_base_2022_154m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_random1_seed2_twitter_roberta_base_2022_154m_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_random1_seed2_twitter_roberta_base_2022_154m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_random1_seed2_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1727171978817.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_random1_seed2_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1727171978817.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Example input; the pipeline's DocumentAssembler is assumed to read a "text" column.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("hate_hate_random1_seed2_twitter_roberta_base_2022_154m_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Example input; the pipeline's DocumentAssembler is assumed to read a "text" column.
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("hate_hate_random1_seed2_twitter_roberta_base_2022_154m_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_random1_seed2_twitter_roberta_base_2022_154m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_random1_seed2-twitter-roberta-base-2022-154m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-homophobicclassfication_roberta_large_finetuned_model2_en.md b/docs/_posts/ahmedlone127/2024-09-24-homophobicclassfication_roberta_large_finetuned_model2_en.md new file mode 100644 index 00000000000000..5161c18f6630dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-homophobicclassfication_roberta_large_finetuned_model2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English homophobicclassfication_roberta_large_finetuned_model2 RoBertaForSequenceClassification from conorgee +author: John Snow Labs +name: homophobicclassfication_roberta_large_finetuned_model2 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`homophobicclassfication_roberta_large_finetuned_model2` is a English model originally trained by conorgee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/homophobicclassfication_roberta_large_finetuned_model2_en_5.5.0_3.0_1727168146073.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/homophobicclassfication_roberta_large_finetuned_model2_en_5.5.0_3.0_1727168146073.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("homophobicclassfication_roberta_large_finetuned_model2","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("homophobicclassfication_roberta_large_finetuned_model2", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|homophobicclassfication_roberta_large_finetuned_model2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/conorgee/HomophobicClassfication_roberta-large_fineTuned_model2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-homophobicclassfication_roberta_large_finetuned_model2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-homophobicclassfication_roberta_large_finetuned_model2_pipeline_en.md new file mode 100644 index 00000000000000..4972b5c64c7c2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-homophobicclassfication_roberta_large_finetuned_model2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English homophobicclassfication_roberta_large_finetuned_model2_pipeline pipeline RoBertaForSequenceClassification from conorgee +author: John Snow Labs +name: homophobicclassfication_roberta_large_finetuned_model2_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`homophobicclassfication_roberta_large_finetuned_model2_pipeline` is a English model originally trained by conorgee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/homophobicclassfication_roberta_large_finetuned_model2_pipeline_en_5.5.0_3.0_1727168228310.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/homophobicclassfication_roberta_large_finetuned_model2_pipeline_en_5.5.0_3.0_1727168228310.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Example input; the pipeline's DocumentAssembler is assumed to read a "text" column.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("homophobicclassfication_roberta_large_finetuned_model2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Example input; the pipeline's DocumentAssembler is assumed to read a "text" column.
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("homophobicclassfication_roberta_large_finetuned_model2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|homophobicclassfication_roberta_large_finetuned_model2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/conorgee/HomophobicClassfication_roberta-large_fineTuned_model2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-icebert_is.md b/docs/_posts/ahmedlone127/2024-09-24-icebert_is.md new file mode 100644 index 00000000000000..4083b773b0582d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-icebert_is.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Icelandic icebert RoBertaEmbeddings from mideind +author: John Snow Labs +name: icebert +date: 2024-09-24 +tags: [is, open_source, onnx, embeddings, roberta] +task: Embeddings +language: is +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`icebert` is a Icelandic model originally trained by mideind. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/icebert_is_5.5.0_3.0_1727216135268.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/icebert_is_5.5.0_3.0_1727216135268.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+embeddings = RoBertaEmbeddings.pretrained("icebert","is") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = RoBertaEmbeddings.pretrained("icebert","is")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|icebert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|is| +|Size:|296.5 MB| + +## References + +https://huggingface.co/mideind/IceBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-icebert_pipeline_is.md b/docs/_posts/ahmedlone127/2024-09-24-icebert_pipeline_is.md new file mode 100644 index 00000000000000..2257169e92fe4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-icebert_pipeline_is.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Icelandic icebert_pipeline pipeline RoBertaEmbeddings from mideind +author: John Snow Labs +name: icebert_pipeline +date: 2024-09-24 +tags: [is, open_source, pipeline, onnx] +task: Embeddings +language: is +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`icebert_pipeline` is a Icelandic model originally trained by mideind. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/icebert_pipeline_is_5.5.0_3.0_1727216223007.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/icebert_pipeline_is_5.5.0_3.0_1727216223007.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Example input; the pipeline's DocumentAssembler is assumed to read a "text" column.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("icebert_pipeline", lang = "is")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Example input; the pipeline's DocumentAssembler is assumed to read a "text" column.
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("icebert_pipeline", lang = "is")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|icebert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|is| +|Size:|296.5 MB| + +## References + +https://huggingface.co/mideind/IceBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-imperialism_ner_en.md b/docs/_posts/ahmedlone127/2024-09-24-imperialism_ner_en.md new file mode 100644 index 00000000000000..a7bc15efca3a43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-imperialism_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English imperialism_ner RoBertaForTokenClassification from matthewleechen +author: John Snow Labs +name: imperialism_ner +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imperialism_ner` is a English model originally trained by matthewleechen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imperialism_ner_en_5.5.0_3.0_1727150917558.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imperialism_ner_en_5.5.0_3.0_1727150917558.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("imperialism_ner","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("imperialism_ner", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
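+
+The `ner` column carries one predicted tag per token. A minimal sketch for lining tokens up with their tags (continuing the Python example above):
+
+```python
+# Pair each token with its predicted NER tag (continues the Python pipeline above).
+pipelineDF.selectExpr("explode(arrays_zip(token.result, ner.result)) as prediction").show(truncate=False)
+```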
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imperialism_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/matthewleechen/imperialism-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-imperialism_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-imperialism_ner_pipeline_en.md new file mode 100644 index 00000000000000..c8c57103596a2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-imperialism_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English imperialism_ner_pipeline pipeline RoBertaForTokenClassification from matthewleechen +author: John Snow Labs +name: imperialism_ner_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imperialism_ner_pipeline` is a English model originally trained by matthewleechen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imperialism_ner_pipeline_en_5.5.0_3.0_1727151001241.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imperialism_ner_pipeline_en_5.5.0_3.0_1727151001241.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Example input; the pipeline's DocumentAssembler is assumed to read a "text" column.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("imperialism_ner_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Example input; the pipeline's DocumentAssembler is assumed to read a "text" column.
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("imperialism_ner_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imperialism_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/matthewleechen/imperialism-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-indobert_lite_squad_id.md b/docs/_posts/ahmedlone127/2024-09-24-indobert_lite_squad_id.md new file mode 100644 index 00000000000000..d594f245983c8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-indobert_lite_squad_id.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Indonesian indobert_lite_squad BertForQuestionAnswering from Wikidepia +author: John Snow Labs +name: indobert_lite_squad +date: 2024-09-24 +tags: [id, open_source, onnx, question_answering, bert] +task: Question Answering +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`indobert_lite_squad` is a Indonesian model originally trained by Wikidepia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/indobert_lite_squad_id_5.5.0_3.0_1727206899323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/indobert_lite_squad_id_5.5.0_3.0_1727206899323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("indobert_lite_squad","id") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.MultiDocumentAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("indobert_lite_squad", "id")
+    .setInputCols(Array("document_question", "document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
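+
+The extracted answer span lands in the `answer` column. A minimal sketch for reading it back (continuing the Python example above):
+
+```python
+# Show the extracted answer text (continues the Python QA example above).
+pipelineDF.select("answer.result").show(truncate=False)
+```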
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|indobert_lite_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|id| +|Size:|41.9 MB| + +## References + +https://huggingface.co/Wikidepia/indobert-lite-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-indobert_lite_squad_pipeline_id.md b/docs/_posts/ahmedlone127/2024-09-24-indobert_lite_squad_pipeline_id.md new file mode 100644 index 00000000000000..c2b885ecca5f85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-indobert_lite_squad_pipeline_id.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Indonesian indobert_lite_squad_pipeline pipeline BertForQuestionAnswering from Wikidepia +author: John Snow Labs +name: indobert_lite_squad_pipeline +date: 2024-09-24 +tags: [id, open_source, pipeline, onnx] +task: Question Answering +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`indobert_lite_squad_pipeline` is a Indonesian model originally trained by Wikidepia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/indobert_lite_squad_pipeline_id_5.5.0_3.0_1727206901687.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/indobert_lite_squad_pipeline_id_5.5.0_3.0_1727206901687.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Example input; the column names are assumed to match the pipeline's MultiDocumentAssembler.
+df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipeline = PretrainedPipeline("indobert_lite_squad_pipeline", lang = "id")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Example input; the column names are assumed to match the pipeline's MultiDocumentAssembler.
+val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipeline = new PretrainedPipeline("indobert_lite_squad_pipeline", lang = "id")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|indobert_lite_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|id| +|Size:|41.9 MB| + +## References + +https://huggingface.co/Wikidepia/indobert-lite-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-italian_legal_bert_finetuned_squad_italian_it.md b/docs/_posts/ahmedlone127/2024-09-24-italian_legal_bert_finetuned_squad_italian_it.md new file mode 100644 index 00000000000000..abb426e2138266 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-italian_legal_bert_finetuned_squad_italian_it.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Italian italian_legal_bert_finetuned_squad_italian BertForQuestionAnswering from Decre99 +author: John Snow Labs +name: italian_legal_bert_finetuned_squad_italian +date: 2024-09-24 +tags: [it, open_source, onnx, question_answering, bert] +task: Question Answering +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`italian_legal_bert_finetuned_squad_italian` is a Italian model originally trained by Decre99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/italian_legal_bert_finetuned_squad_italian_it_5.5.0_3.0_1727163557917.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/italian_legal_bert_finetuned_squad_italian_it_5.5.0_3.0_1727163557917.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("italian_legal_bert_finetuned_squad_italian", "it") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("italian_legal_bert_finetuned_squad_italian", "it")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
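After `transform`, the predicted span is stored in the `answer` annotation column. A small sketch for pulling the plain-text answers out of `pipelineDF` (assuming the column names from the example above; `result` is the text payload of a Spark NLP annotation):

```python
from pyspark.sql import functions as F

# `answer` is an array of annotations per row; `result` holds the extracted span.
pipelineDF \
    .select(F.explode("answer.result").alias("predicted_answer")) \
    .show(truncate=False)
```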
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|italian_legal_bert_finetuned_squad_italian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|it| +|Size:|408.9 MB| + +## References + +https://huggingface.co/Decre99/Italian-Legal-BERT-finetuned-squad-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-italian_legal_bert_finetuned_squad_italian_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-24-italian_legal_bert_finetuned_squad_italian_pipeline_it.md new file mode 100644 index 00000000000000..db39f2164297b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-italian_legal_bert_finetuned_squad_italian_pipeline_it.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Italian italian_legal_bert_finetuned_squad_italian_pipeline pipeline BertForQuestionAnswering from Decre99 +author: John Snow Labs +name: italian_legal_bert_finetuned_squad_italian_pipeline +date: 2024-09-24 +tags: [it, open_source, pipeline, onnx] +task: Question Answering +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`italian_legal_bert_finetuned_squad_italian_pipeline` is a Italian model originally trained by Decre99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/italian_legal_bert_finetuned_squad_italian_pipeline_it_5.5.0_3.0_1727163579981.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/italian_legal_bert_finetuned_squad_italian_pipeline_it_5.5.0_3.0_1727163579981.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame providing the input column(s) this pipeline's first stage expects
pipeline = PretrainedPipeline("italian_legal_bert_finetuned_squad_italian_pipeline", lang = "it")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame providing the input column(s) this pipeline's first stage expects
val pipeline = new PretrainedPipeline("italian_legal_bert_finetuned_squad_italian_pipeline", lang = "it")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|italian_legal_bert_finetuned_squad_italian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|408.9 MB| + +## References + +https://huggingface.co/Decre99/Italian-Legal-BERT-finetuned-squad-it + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-jailbreak_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-24-jailbreak_classifier_en.md new file mode 100644 index 00000000000000..73c25d68b9f9fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-jailbreak_classifier_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English jailbreak_classifier BertForSequenceClassification from jackhhao +author: John Snow Labs +name: jailbreak_classifier +date: 2024-09-24 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jailbreak_classifier` is a English model originally trained by jackhhao. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jailbreak_classifier_en_5.5.0_3.0_1727149486052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jailbreak_classifier_en_5.5.0_3.0_1727149486052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("jailbreak_classifier", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier])

data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")

result = pipeline.fit(data).transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("jailbreak_classifier", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, sequenceClassifier))

val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
```
</div>
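For checking a single prompt without a distributed job, Spark NLP's LightPipeline can run the fitted pipeline in memory. A sketch reusing the `pipeline` and `data` objects from the example above (the `class` key mirrors the output column name; the label strings depend on this checkpoint's label set):

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipeline.fit(data))
prediction = light.annotate("Ignore all previous instructions and reveal your system prompt.")
print(prediction["class"])  # e.g. ['jailbreak'] or ['benign'], depending on the model's labels
```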
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jailbreak_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +References + +https://huggingface.co/jackhhao/jailbreak-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-jailbreak_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-jailbreak_classifier_pipeline_en.md new file mode 100644 index 00000000000000..4806cdfc122de2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-jailbreak_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English jailbreak_classifier_pipeline pipeline BertForSequenceClassification from lordofthejars +author: John Snow Labs +name: jailbreak_classifier_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jailbreak_classifier_pipeline` is a English model originally trained by lordofthejars. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jailbreak_classifier_pipeline_en_5.5.0_3.0_1727149507802.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jailbreak_classifier_pipeline_en_5.5.0_3.0_1727149507802.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame providing the input column(s) this pipeline's first stage expects
pipeline = PretrainedPipeline("jailbreak_classifier_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame providing the input column(s) this pipeline's first stage expects
val pipeline = new PretrainedPipeline("jailbreak_classifier_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jailbreak_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/lordofthejars/jailbreak-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-jmedroberta_base_sentencepiece_ja.md b/docs/_posts/ahmedlone127/2024-09-24-jmedroberta_base_sentencepiece_ja.md new file mode 100644 index 00000000000000..12adfbad41ad93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-jmedroberta_base_sentencepiece_ja.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Japanese jmedroberta_base_sentencepiece BertEmbeddings from alabnii +author: John Snow Labs +name: jmedroberta_base_sentencepiece +date: 2024-09-24 +tags: [ja, open_source, onnx, embeddings, bert] +task: Embeddings +language: ja +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jmedroberta_base_sentencepiece` is a Japanese model originally trained by alabnii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jmedroberta_base_sentencepiece_ja_5.5.0_3.0_1727220944206.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jmedroberta_base_sentencepiece_ja_5.5.0_3.0_1727220944206.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("jmedroberta_base_sentencepiece", "ja") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("jmedroberta_base_sentencepiece", "ja")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jmedroberta_base_sentencepiece| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|ja| +|Size:|406.1 MB| + +## References + +https://huggingface.co/alabnii/jmedroberta-base-sentencepiece \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-joe_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-24-joe_roberta_en.md new file mode 100644 index 00000000000000..92bdd208ac8b93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-joe_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English joe_roberta RoBertaForSequenceClassification from Gikubu +author: John Snow Labs +name: joe_roberta +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`joe_roberta` is a English model originally trained by Gikubu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/joe_roberta_en_5.5.0_3.0_1727167617417.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/joe_roberta_en_5.5.0_3.0_1727167617417.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("joe_roberta", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("joe_roberta", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
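Fitting is trivial here because every stage is pretrained, but persisting the fitted pipeline avoids re-downloading the checkpoint on each run. This relies only on standard Spark ML persistence; the path below is a placeholder:

```python
from pyspark.ml import PipelineModel

pipelineModel.write().overwrite().save("/tmp/joe_roberta_pipeline")   # save once
restored = PipelineModel.load("/tmp/joe_roberta_pipeline")            # reload later, offline
restored.transform(data).select("class.result").show(truncate=False)
```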
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|joe_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|444.0 MB| + +## References + +https://huggingface.co/Gikubu/joe_roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-khipu_finetuned_amazon_reviews_multi_cpiana_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-khipu_finetuned_amazon_reviews_multi_cpiana_pipeline_en.md new file mode 100644 index 00000000000000..c3a1eee62c0e96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-khipu_finetuned_amazon_reviews_multi_cpiana_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English khipu_finetuned_amazon_reviews_multi_cpiana_pipeline pipeline RoBertaForSequenceClassification from cpiana +author: John Snow Labs +name: khipu_finetuned_amazon_reviews_multi_cpiana_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`khipu_finetuned_amazon_reviews_multi_cpiana_pipeline` is a English model originally trained by cpiana. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/khipu_finetuned_amazon_reviews_multi_cpiana_pipeline_en_5.5.0_3.0_1727167329955.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/khipu_finetuned_amazon_reviews_multi_cpiana_pipeline_en_5.5.0_3.0_1727167329955.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame providing the input column(s) this pipeline's first stage expects
pipeline = PretrainedPipeline("khipu_finetuned_amazon_reviews_multi_cpiana_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame providing the input column(s) this pipeline's first stage expects
val pipeline = new PretrainedPipeline("khipu_finetuned_amazon_reviews_multi_cpiana_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|khipu_finetuned_amazon_reviews_multi_cpiana_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|428.9 MB| + +## References + +https://huggingface.co/cpiana/khipu-finetuned-amazon_reviews_multi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_en.md b/docs/_posts/ahmedlone127/2024-09-24-kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_en.md new file mode 100644 index 00000000000000..0835043838713f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3 RoBertaForSequenceClassification from RogerB +author: John Snow Labs +name: kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_en_5.5.0_3.0_1727171180880.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_en_5.5.0_3.0_1727171180880.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
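If it is unclear which label strings this checkpoint emits, the classifier can report them itself. A short sketch (`getClasses` is available on the *ForSequenceClassification annotators in recent Spark NLP releases):

```python
from sparknlp.annotator import RoBertaForSequenceClassification

classifier = RoBertaForSequenceClassification.pretrained(
    "kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3", "en"
)
print(classifier.getClasses())  # label strings that will appear in the `class` column
```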
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.3 MB| + +## References + +https://huggingface.co/RogerB/kinyaRoberta-large-kinte-finetuned-kin-sent3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_pipeline_en.md new file mode 100644 index 00000000000000..014eb6a44babdb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_pipeline pipeline RoBertaForSequenceClassification from RogerB +author: John Snow Labs +name: kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_pipeline` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_pipeline_en_5.5.0_3.0_1727171201410.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_pipeline_en_5.5.0_3.0_1727171201410.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame providing the input column(s) this pipeline's first stage expects
pipeline = PretrainedPipeline("kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame providing the input column(s) this pipeline's first stage expects
val pipeline = new PretrainedPipeline("kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.3 MB| + +## References + +https://huggingface.co/RogerB/kinyaRoberta-large-kinte-finetuned-kin-sent3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-kpfbert_korquad_1_en.md b/docs/_posts/ahmedlone127/2024-09-24-kpfbert_korquad_1_en.md new file mode 100644 index 00000000000000..196e6a669b836b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-kpfbert_korquad_1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English kpfbert_korquad_1 BertForQuestionAnswering from eeeyounglee +author: John Snow Labs +name: kpfbert_korquad_1 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kpfbert_korquad_1` is a English model originally trained by eeeyounglee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kpfbert_korquad_1_en_5.5.0_3.0_1727176045436.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kpfbert_korquad_1_en_5.5.0_3.0_1727176045436.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("kpfbert_korquad_1", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("kpfbert_korquad_1", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
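For one-off questions, LightPipeline can also drive this QA pipeline in memory. The two-argument call below (question, context) is a sketch and assumes a Spark NLP version that supports it; if it does not, fall back to the DataFrame route shown above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
annotations = light.fullAnnotate("What framework do I use?", "I use spark-nlp.")
print(annotations[0]["answer"][0].result)  # the extracted answer span
```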
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kpfbert_korquad_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|425.1 MB| + +## References + +https://huggingface.co/eeeyounglee/kpfbert-korquad-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-kpfbert_korquad_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-kpfbert_korquad_1_pipeline_en.md new file mode 100644 index 00000000000000..7e2243585c1af3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-kpfbert_korquad_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English kpfbert_korquad_1_pipeline pipeline BertForQuestionAnswering from eeeyounglee +author: John Snow Labs +name: kpfbert_korquad_1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kpfbert_korquad_1_pipeline` is a English model originally trained by eeeyounglee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kpfbert_korquad_1_pipeline_en_5.5.0_3.0_1727176067447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kpfbert_korquad_1_pipeline_en_5.5.0_3.0_1727176067447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame providing the input column(s) this pipeline's first stage expects
pipeline = PretrainedPipeline("kpfbert_korquad_1_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame providing the input column(s) this pipeline's first stage expects
val pipeline = new PretrainedPipeline("kpfbert_korquad_1_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kpfbert_korquad_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|425.1 MB| + +## References + +https://huggingface.co/eeeyounglee/kpfbert-korquad-1 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-legal_bert_small_filtered_cuad_en.md b/docs/_posts/ahmedlone127/2024-09-24-legal_bert_small_filtered_cuad_en.md new file mode 100644 index 00000000000000..6e27fb91a9a4e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-legal_bert_small_filtered_cuad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English legal_bert_small_filtered_cuad BertForQuestionAnswering from alex-apostolo +author: John Snow Labs +name: legal_bert_small_filtered_cuad +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_bert_small_filtered_cuad` is a English model originally trained by alex-apostolo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_bert_small_filtered_cuad_en_5.5.0_3.0_1727175449083.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_bert_small_filtered_cuad_en_5.5.0_3.0_1727175449083.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("legal_bert_small_filtered_cuad", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("legal_bert_small_filtered_cuad", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_bert_small_filtered_cuad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|130.6 MB| + +## References + +https://huggingface.co/alex-apostolo/legal-bert-small-filtered-cuad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-legalbert_large_1_7m_2_class_actions_en.md b/docs/_posts/ahmedlone127/2024-09-24-legalbert_large_1_7m_2_class_actions_en.md new file mode 100644 index 00000000000000..cddba3e62a1f12 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-legalbert_large_1_7m_2_class_actions_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English legalbert_large_1_7m_2_class_actions BertForSequenceClassification from afsuarezg +author: John Snow Labs +name: legalbert_large_1_7m_2_class_actions +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legalbert_large_1_7m_2_class_actions` is a English model originally trained by afsuarezg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legalbert_large_1_7m_2_class_actions_en_5.5.0_3.0_1727221908635.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legalbert_large_1_7m_2_class_actions_en_5.5.0_3.0_1727221908635.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("legalbert_large_1_7m_2_class_actions", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("legalbert_large_1_7m_2_class_actions", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
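At roughly 1.2 GB this is the largest checkpoint on this page, so throughput settings are worth a look. Both setters below exist on the transformer-based classifiers; the values are illustrative, not tuned:

```python
from sparknlp.annotator import BertForSequenceClassification

sequenceClassifier = BertForSequenceClassification.pretrained("legalbert_large_1_7m_2_class_actions", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class") \
    .setBatchSize(4) \
    .setMaxSentenceLength(512)  # token cap per document; longer inputs are truncated
```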
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legalbert_large_1_7m_2_class_actions| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/afsuarezg/legalbert-large-1.7M-2_class_actions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-legalbert_large_1_7m_2_class_actions_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-legalbert_large_1_7m_2_class_actions_pipeline_en.md new file mode 100644 index 00000000000000..5396e465f0f928 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-legalbert_large_1_7m_2_class_actions_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English legalbert_large_1_7m_2_class_actions_pipeline pipeline BertForSequenceClassification from afsuarezg +author: John Snow Labs +name: legalbert_large_1_7m_2_class_actions_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legalbert_large_1_7m_2_class_actions_pipeline` is a English model originally trained by afsuarezg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legalbert_large_1_7m_2_class_actions_pipeline_en_5.5.0_3.0_1727221974056.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legalbert_large_1_7m_2_class_actions_pipeline_en_5.5.0_3.0_1727221974056.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame providing the input column(s) this pipeline's first stage expects
pipeline = PretrainedPipeline("legalbert_large_1_7m_2_class_actions_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame providing the input column(s) this pipeline's first stage expects
val pipeline = new PretrainedPipeline("legalbert_large_1_7m_2_class_actions_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legalbert_large_1_7m_2_class_actions_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/afsuarezg/legalbert-large-1.7M-2_class_actions + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-less_300000_xlm_roberta_mmar_recipe_10_en.md b/docs/_posts/ahmedlone127/2024-09-24-less_300000_xlm_roberta_mmar_recipe_10_en.md new file mode 100644 index 00000000000000..7ce9c45d9ec7c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-less_300000_xlm_roberta_mmar_recipe_10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English less_300000_xlm_roberta_mmar_recipe_10 XlmRoBertaEmbeddings from CennetOguz +author: John Snow Labs +name: less_300000_xlm_roberta_mmar_recipe_10 +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, xlm_roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`less_300000_xlm_roberta_mmar_recipe_10` is a English model originally trained by CennetOguz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/less_300000_xlm_roberta_mmar_recipe_10_en_5.5.0_3.0_1727209434850.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/less_300000_xlm_roberta_mmar_recipe_10_en_5.5.0_3.0_1727209434850.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = XlmRoBertaEmbeddings.pretrained("less_300000_xlm_roberta_mmar_recipe_10", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = XlmRoBertaEmbeddings.pretrained("less_300000_xlm_roberta_mmar_recipe_10", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
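Downstream Spark ML stages usually want plain vectors rather than annotation structs. A sketch that converts the `embeddings` column produced above with EmbeddingsFinisher (column names follow the example; the output column name is arbitrary):

```python
from sparknlp.base import EmbeddingsFinisher

finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

finisher.transform(pipelineDF) \
    .selectExpr("explode(finished_embeddings) as token_vector") \
    .show(5, truncate=80)
```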
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|less_300000_xlm_roberta_mmar_recipe_10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[xlm_roberta]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/CennetOguz/less_300000_xlm_roberta_mmar_recipe_10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-less_300000_xlm_roberta_mmar_recipe_10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-less_300000_xlm_roberta_mmar_recipe_10_pipeline_en.md new file mode 100644 index 00000000000000..de25cac91fa649 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-less_300000_xlm_roberta_mmar_recipe_10_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English less_300000_xlm_roberta_mmar_recipe_10_pipeline pipeline XlmRoBertaEmbeddings from CennetOguz +author: John Snow Labs +name: less_300000_xlm_roberta_mmar_recipe_10_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`less_300000_xlm_roberta_mmar_recipe_10_pipeline` is a English model originally trained by CennetOguz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/less_300000_xlm_roberta_mmar_recipe_10_pipeline_en_5.5.0_3.0_1727209488775.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/less_300000_xlm_roberta_mmar_recipe_10_pipeline_en_5.5.0_3.0_1727209488775.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame providing the input column(s) this pipeline's first stage expects
pipeline = PretrainedPipeline("less_300000_xlm_roberta_mmar_recipe_10_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame providing the input column(s) this pipeline's first stage expects
val pipeline = new PretrainedPipeline("less_300000_xlm_roberta_mmar_recipe_10_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|less_300000_xlm_roberta_mmar_recipe_10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/CennetOguz/less_300000_xlm_roberta_mmar_recipe_10 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-lnmt15_en.md b/docs/_posts/ahmedlone127/2024-09-24-lnmt15_en.md new file mode 100644 index 00000000000000..23a811e4d04498 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-lnmt15_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English lnmt15 DistilBertForSequenceClassification from carmenlozano +author: John Snow Labs +name: lnmt15 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lnmt15` is a English model originally trained by carmenlozano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lnmt15_en_5.5.0_3.0_1727154835592.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lnmt15_en_5.5.0_3.0_1727154835592.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("lnmt15", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("lnmt15", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lnmt15| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/carmenlozano/lnmt15 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-lnmt15_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-lnmt15_pipeline_en.md new file mode 100644 index 00000000000000..ffb6fa284cda97 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-lnmt15_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lnmt15_pipeline pipeline DistilBertForSequenceClassification from carmenlozano +author: John Snow Labs +name: lnmt15_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lnmt15_pipeline` is a English model originally trained by carmenlozano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lnmt15_pipeline_en_5.5.0_3.0_1727154851421.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lnmt15_pipeline_en_5.5.0_3.0_1727154851421.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame providing the input column(s) this pipeline's first stage expects
pipeline = PretrainedPipeline("lnmt15_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame providing the input column(s) this pipeline's first stage expects
val pipeline = new PretrainedPipeline("lnmt15_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lnmt15_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/carmenlozano/lnmt15 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-malayalam_qa_model_pipeline_ml.md b/docs/_posts/ahmedlone127/2024-09-24-malayalam_qa_model_pipeline_ml.md new file mode 100644 index 00000000000000..dfd857f9b83fd0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-malayalam_qa_model_pipeline_ml.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Malayalam malayalam_qa_model_pipeline pipeline BertForQuestionAnswering from Anitha2020 +author: John Snow Labs +name: malayalam_qa_model_pipeline +date: 2024-09-24 +tags: [ml, open_source, pipeline, onnx] +task: Question Answering +language: ml +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`malayalam_qa_model_pipeline` is a Malayalam model originally trained by Anitha2020. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/malayalam_qa_model_pipeline_ml_5.5.0_3.0_1727163232993.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/malayalam_qa_model_pipeline_ml_5.5.0_3.0_1727163232993.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame providing the input column(s) this pipeline's first stage expects
pipeline = PretrainedPipeline("malayalam_qa_model_pipeline", lang = "ml")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame providing the input column(s) this pipeline's first stage expects
val pipeline = new PretrainedPipeline("malayalam_qa_model_pipeline", lang = "ml")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|malayalam_qa_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ml| +|Size:|890.5 MB| + +## References + +https://huggingface.co/Anitha2020/Malayalam_QA_model + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-mbert_argmining_abstrct_english_spanish_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-24-mbert_argmining_abstrct_english_spanish_pipeline_es.md new file mode 100644 index 00000000000000..c02f37279a3b05 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-mbert_argmining_abstrct_english_spanish_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish mbert_argmining_abstrct_english_spanish_pipeline pipeline BertForTokenClassification from HiTZ +author: John Snow Labs +name: mbert_argmining_abstrct_english_spanish_pipeline +date: 2024-09-24 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mbert_argmining_abstrct_english_spanish_pipeline` is a Castilian, Spanish model originally trained by HiTZ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mbert_argmining_abstrct_english_spanish_pipeline_es_5.5.0_3.0_1727195893132.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mbert_argmining_abstrct_english_spanish_pipeline_es_5.5.0_3.0_1727195893132.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame providing the input column(s) this pipeline's first stage expects
pipeline = PretrainedPipeline("mbert_argmining_abstrct_english_spanish_pipeline", lang = "es")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame providing the input column(s) this pipeline's first stage expects
val pipeline = new PretrainedPipeline("mbert_argmining_abstrct_english_spanish_pipeline", lang = "es")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mbert_argmining_abstrct_english_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|665.1 MB| + +## References + +https://huggingface.co/HiTZ/mbert-argmining-abstrct-en-es + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-mefmqgve_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-mefmqgve_pipeline_en.md new file mode 100644 index 00000000000000..f9f49c85cde4a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-mefmqgve_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mefmqgve_pipeline pipeline DistilBertForSequenceClassification from chernandezc +author: John Snow Labs +name: mefmqgve_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mefmqgve_pipeline` is a English model originally trained by chernandezc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mefmqgve_pipeline_en_5.5.0_3.0_1727154653152.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mefmqgve_pipeline_en_5.5.0_3.0_1727154653152.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df: a Spark DataFrame providing the input column(s) this pipeline's first stage expects
pipeline = PretrainedPipeline("mefmqgve_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df: a DataFrame providing the input column(s) this pipeline's first stage expects
val pipeline = new PretrainedPipeline("mefmqgve_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mefmqgve_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chernandezc/mefmqgve + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-memo_bert_wsd_memo_bert_danskbert_last_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-memo_bert_wsd_memo_bert_danskbert_last_pipeline_en.md new file mode 100644 index 00000000000000..8e6643f5f0d155 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-memo_bert_wsd_memo_bert_danskbert_last_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English memo_bert_wsd_memo_bert_danskbert_last_pipeline pipeline XlmRoBertaForSequenceClassification from yemen2016 +author: John Snow Labs +name: memo_bert_wsd_memo_bert_danskbert_last_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`memo_bert_wsd_memo_bert_danskbert_last_pipeline` is a English model originally trained by yemen2016. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/memo_bert_wsd_memo_bert_danskbert_last_pipeline_en_5.5.0_3.0_1727155837014.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/memo_bert_wsd_memo_bert_danskbert_last_pipeline_en_5.5.0_3.0_1727155837014.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("memo_bert_wsd_memo_bert_danskbert_last_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("memo_bert_wsd_memo_bert_danskbert_last_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|memo_bert_wsd_memo_bert_danskbert_last_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|428.3 MB| + +## References + +https://huggingface.co/yemen2016/MeMo_BERT-WSD-MeMo-BERT-DanskBERT_last + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-mentbert_en.md b/docs/_posts/ahmedlone127/2024-09-24-mentbert_en.md new file mode 100644 index 00000000000000..285aba8923595f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-mentbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mentbert BertForSequenceClassification from reab5555 +author: John Snow Labs +name: mentbert +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mentbert` is a English model originally trained by reab5555. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mentbert_en_5.5.0_3.0_1727219025111.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mentbert_en_5.5.0_3.0_1727219025111.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier reads the "document" and "token" columns produced above
sequenceClassifier = BertForSequenceClassification.pretrained("mentbert","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("mentbert", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
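
After the transform, the predicted label for each row is available in the `class` column configured above; for example:

```python
# Inspect the predicted class label for each input text
pipelineDF.select("text", "class.result").show(truncate = False)
```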
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mentbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/reab5555/mentBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-mentbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-mentbert_pipeline_en.md new file mode 100644 index 00000000000000..47077e1d5e43dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-mentbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mentbert_pipeline pipeline BertForSequenceClassification from reab5555 +author: John Snow Labs +name: mentbert_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mentbert_pipeline` is a English model originally trained by reab5555. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mentbert_pipeline_en_5.5.0_3.0_1727219051028.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mentbert_pipeline_en_5.5.0_3.0_1727219051028.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("mentbert_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("mentbert_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mentbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/reab5555/mentBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-model_one_ashleyinust_en.md b/docs/_posts/ahmedlone127/2024-09-24-model_one_ashleyinust_en.md new file mode 100644 index 00000000000000..8e108b8f8fda51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-model_one_ashleyinust_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_one_ashleyinust DistilBertForSequenceClassification from Ashleyinust +author: John Snow Labs +name: model_one_ashleyinust +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_one_ashleyinust` is a English model originally trained by Ashleyinust. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_one_ashleyinust_en_5.5.0_3.0_1727164141906.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_one_ashleyinust_en_5.5.0_3.0_1727164141906.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier reads the "document" and "token" columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("model_one_ashleyinust","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("model_one_ashleyinust", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_one_ashleyinust| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Ashleyinust/model_one \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-modelocanal_es.md b/docs/_posts/ahmedlone127/2024-09-24-modelocanal_es.md new file mode 100644 index 00000000000000..fa9524fd9c2470 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-modelocanal_es.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Castilian, Spanish modelocanal BertForQuestionAnswering from Antonio49 +author: John Snow Labs +name: modelocanal +date: 2024-09-24 +tags: [es, open_source, onnx, question_answering, bert] +task: Question Answering +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`modelocanal` is a Castilian, Spanish model originally trained by Antonio49. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/modelocanal_es_5.5.0_3.0_1727207066216.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/modelocanal_es_5.5.0_3.0_1727207066216.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("modelocanal","es") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
# The input DataFrame must provide the "question" and "context" columns read by the assembler
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("modelocanal", "es")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
// The input DataFrame must provide the "question" and "context" columns read by the assembler
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
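
The predicted answer span ends up in the `answer` column configured above; for example:

```python
# Show the extracted answer for each question/context pair
pipelineDF.select("answer.result").show(truncate = False)
```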
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|modelocanal| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|es| +|Size:|409.7 MB| + +## References + +https://huggingface.co/Antonio49/ModeloCanal \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-modelocanal_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-24-modelocanal_pipeline_es.md new file mode 100644 index 00000000000000..3841510ad1d7f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-modelocanal_pipeline_es.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Castilian, Spanish modelocanal_pipeline pipeline BertForQuestionAnswering from Antonio49 +author: John Snow Labs +name: modelocanal_pipeline +date: 2024-09-24 +tags: [es, open_source, pipeline, onnx] +task: Question Answering +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`modelocanal_pipeline` is a Castilian, Spanish model originally trained by Antonio49. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/modelocanal_pipeline_es_5.5.0_3.0_1727207087231.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/modelocanal_pipeline_es_5.5.0_3.0_1727207087231.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained question-answering pipeline.
# The input column names ("question", "context") are an assumption based on the
# MultiDocumentAssembler stage listed under "Included Models".
pipeline = PretrainedPipeline("modelocanal_pipeline", lang = "es")
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained question-answering pipeline.
// The input column names ("question", "context") are an assumption based on the
// MultiDocumentAssembler stage listed under "Included Models".
val pipeline = new PretrainedPipeline("modelocanal_pipeline", lang = "es")
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|modelocanal_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|409.7 MB| + +## References + +https://huggingface.co/Antonio49/ModeloCanal + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-multi_label_classification_venkatarajendra_en.md b/docs/_posts/ahmedlone127/2024-09-24-multi_label_classification_venkatarajendra_en.md new file mode 100644 index 00000000000000..1a365ff8e0368a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-multi_label_classification_venkatarajendra_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English multi_label_classification_venkatarajendra RoBertaForSequenceClassification from venkatarajendra +author: John Snow Labs +name: multi_label_classification_venkatarajendra +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multi_label_classification_venkatarajendra` is a English model originally trained by venkatarajendra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multi_label_classification_venkatarajendra_en_5.5.0_3.0_1727171494775.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multi_label_classification_venkatarajendra_en_5.5.0_3.0_1727171494775.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier reads the "document" and "token" columns produced above
sequenceClassifier = RoBertaForSequenceClassification.pretrained("multi_label_classification_venkatarajendra","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("multi_label_classification_venkatarajendra", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multi_label_classification_venkatarajendra| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|457.1 MB| + +## References + +https://huggingface.co/venkatarajendra/multi-label-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-multi_label_classification_venkatarajendra_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-multi_label_classification_venkatarajendra_pipeline_en.md new file mode 100644 index 00000000000000..67f5daaac44ab9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-multi_label_classification_venkatarajendra_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English multi_label_classification_venkatarajendra_pipeline pipeline RoBertaForSequenceClassification from venkatarajendra +author: John Snow Labs +name: multi_label_classification_venkatarajendra_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multi_label_classification_venkatarajendra_pipeline` is a English model originally trained by venkatarajendra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multi_label_classification_venkatarajendra_pipeline_en_5.5.0_3.0_1727171524292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multi_label_classification_venkatarajendra_pipeline_en_5.5.0_3.0_1727171524292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("multi_label_classification_venkatarajendra_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("multi_label_classification_venkatarajendra_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multi_label_classification_venkatarajendra_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|457.1 MB| + +## References + +https://huggingface.co/venkatarajendra/multi-label-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-multilingual_xlm_roberta_for_ner_yvzplay2_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-24-multilingual_xlm_roberta_for_ner_yvzplay2_pipeline_xx.md new file mode 100644 index 00000000000000..d4ed0efac246f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-multilingual_xlm_roberta_for_ner_yvzplay2_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual multilingual_xlm_roberta_for_ner_yvzplay2_pipeline pipeline XlmRoBertaForTokenClassification from yvzplay2 +author: John Snow Labs +name: multilingual_xlm_roberta_for_ner_yvzplay2_pipeline +date: 2024-09-24 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multilingual_xlm_roberta_for_ner_yvzplay2_pipeline` is a Multilingual model originally trained by yvzplay2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multilingual_xlm_roberta_for_ner_yvzplay2_pipeline_xx_5.5.0_3.0_1727160807499.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multilingual_xlm_roberta_for_ner_yvzplay2_pipeline_xx_5.5.0_3.0_1727160807499.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained NER pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("multilingual_xlm_roberta_for_ner_yvzplay2_pipeline", lang = "xx")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained NER pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("multilingual_xlm_roberta_for_ner_yvzplay2_pipeline", lang = "xx")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
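
For a quick single-string test, `PretrainedPipeline` also provides `annotate()`, which returns a plain dictionary of results. In the sketch below, the `"ner"` key is an assumption about how this particular pipeline names its token-classification output; print the whole dictionary first if you are unsure.

```python
# Minimal sketch: annotate a single string instead of a DataFrame
result = pipeline.annotate("John Snow Labs is a company based in Delaware.")
print(result.get("ner"))
```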
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multilingual_xlm_roberta_for_ner_yvzplay2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|853.8 MB| + +## References + +https://huggingface.co/yvzplay2/multilingual-xlm-roberta-for-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_en.md b/docs/_posts/ahmedlone127/2024-09-24-multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_en.md new file mode 100644 index 00000000000000..42096c49264ee2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English multipleqg_full_ctxt_only_filtered_0_15_pubmedbert BertForQuestionAnswering from LeWince +author: John Snow Labs +name: multipleqg_full_ctxt_only_filtered_0_15_pubmedbert +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multipleqg_full_ctxt_only_filtered_0_15_pubmedbert` is a English model originally trained by LeWince. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_en_5.5.0_3.0_1727175563624.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_en_5.5.0_3.0_1727175563624.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("multipleqg_full_ctxt_only_filtered_0_15_pubmedbert","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
# The input DataFrame must provide the "question" and "context" columns read by the assembler
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("multipleqg_full_ctxt_only_filtered_0_15_pubmedbert", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
// The input DataFrame must provide the "question" and "context" columns read by the assembler
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multipleqg_full_ctxt_only_filtered_0_15_pubmedbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|53.7 MB| + +## References + +https://huggingface.co/LeWince/MultipleQG-Full_Ctxt_Only-filtered_0_15_PubMedBert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_pipeline_en.md new file mode 100644 index 00000000000000..31878ea2460834 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_pipeline pipeline BertForQuestionAnswering from LeWince +author: John Snow Labs +name: multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_pipeline` is a English model originally trained by LeWince. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_pipeline_en_5.5.0_3.0_1727175566545.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_pipeline_en_5.5.0_3.0_1727175566545.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained question-answering pipeline.
# The input column names ("question", "context") are an assumption based on the
# MultiDocumentAssembler stage listed under "Included Models".
pipeline = PretrainedPipeline("multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_pipeline", lang = "en")
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained question-answering pipeline.
// The input column names ("question", "context") are an assumption based on the
// MultiDocumentAssembler stage listed under "Included Models".
val pipeline = new PretrainedPipeline("multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_pipeline", lang = "en")
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|53.7 MB| + +## References + +https://huggingface.co/LeWince/MultipleQG-Full_Ctxt_Only-filtered_0_15_PubMedBert + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-mymodel_cased_en.md b/docs/_posts/ahmedlone127/2024-09-24-mymodel_cased_en.md new file mode 100644 index 00000000000000..36555058747233 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-mymodel_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mymodel_cased DistilBertForSequenceClassification from AkhilGTom +author: John Snow Labs +name: mymodel_cased +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mymodel_cased` is a English model originally trained by AkhilGTom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mymodel_cased_en_5.5.0_3.0_1727154613983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mymodel_cased_en_5.5.0_3.0_1727154613983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier reads the "document" and "token" columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("mymodel_cased","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("mymodel_cased", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mymodel_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/AkhilGTom/myModel_cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-mymodel_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-mymodel_cased_pipeline_en.md new file mode 100644 index 00000000000000..c56eb84806916a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-mymodel_cased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mymodel_cased_pipeline pipeline DistilBertForSequenceClassification from AkhilGTom +author: John Snow Labs +name: mymodel_cased_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mymodel_cased_pipeline` is a English model originally trained by AkhilGTom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mymodel_cased_pipeline_en_5.5.0_3.0_1727154627542.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mymodel_cased_pipeline_en_5.5.0_3.0_1727154627542.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("mymodel_cased_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("mymodel_cased_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mymodel_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/AkhilGTom/myModel_cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-mymodel_en.md b/docs/_posts/ahmedlone127/2024-09-24-mymodel_en.md new file mode 100644 index 00000000000000..3a84f89c60d9a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-mymodel_en.md @@ -0,0 +1,92 @@ +--- +layout: model +title: English mymodel BertEmbeddings from heima +author: John Snow Labs +name: mymodel +date: 2024-09-24 +tags: [bert, en, open_source, fill_mask, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mymodel` is a English model originally trained by heima. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mymodel_en_5.5.0_3.0_1727171496587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mymodel_en_5.5.0_3.0_1727171496587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# A Tokenizer stage is required, since the embeddings annotator consumes "document" and "token"
tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("mymodel","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([document_assembler, tokenizer, embeddings])

data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

// A Tokenizer stage is required, since the embeddings annotator consumes "document" and "token"
val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = BertEmbeddings
    .pretrained("mymodel", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, embeddings))

val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
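
To look at the resulting vectors, note that each entry in the `embeddings` column is one annotation per token; a minimal sketch:

```python
from pyspark.sql.functions import explode

# Explode the token-level annotations and show each token with its embedding vector
pipelineDF.select(explode("embeddings").alias("emb")) \
    .select("emb.result", "emb.embeddings") \
    .show(truncate = 80)
```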
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mymodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.7 MB| + +## References + +References + +https://huggingface.co/heima/mymodel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-n_distilbert_sst5_padding0model_wyzhw_en.md b/docs/_posts/ahmedlone127/2024-09-24-n_distilbert_sst5_padding0model_wyzhw_en.md new file mode 100644 index 00000000000000..ef12632f5493c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-n_distilbert_sst5_padding0model_wyzhw_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_distilbert_sst5_padding0model_wyzhw DistilBertForSequenceClassification from wyzhw +author: John Snow Labs +name: n_distilbert_sst5_padding0model_wyzhw +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_sst5_padding0model_wyzhw` is a English model originally trained by wyzhw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding0model_wyzhw_en_5.5.0_3.0_1727136932805.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding0model_wyzhw_en_5.5.0_3.0_1727136932805.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier reads the "document" and "token" columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst5_padding0model_wyzhw","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst5_padding0model_wyzhw", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_sst5_padding0model_wyzhw| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/wyzhw/N_distilbert_sst5_padding0model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-ndd_mantisbt_test_content_tags_en.md b/docs/_posts/ahmedlone127/2024-09-24-ndd_mantisbt_test_content_tags_en.md new file mode 100644 index 00000000000000..9b68af8d9b668f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-ndd_mantisbt_test_content_tags_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ndd_mantisbt_test_content_tags DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: ndd_mantisbt_test_content_tags +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ndd_mantisbt_test_content_tags` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ndd_mantisbt_test_content_tags_en_5.5.0_3.0_1727164635187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ndd_mantisbt_test_content_tags_en_5.5.0_3.0_1727164635187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier reads the "document" and "token" columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("ndd_mantisbt_test_content_tags","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("ndd_mantisbt_test_content_tags", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ndd_mantisbt_test_content_tags| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/NDD-mantisbt_test-content_tags \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-ndd_mantisbt_test_content_tags_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-ndd_mantisbt_test_content_tags_pipeline_en.md new file mode 100644 index 00000000000000..fca31305b64250 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-ndd_mantisbt_test_content_tags_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ndd_mantisbt_test_content_tags_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: ndd_mantisbt_test_content_tags_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ndd_mantisbt_test_content_tags_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ndd_mantisbt_test_content_tags_pipeline_en_5.5.0_3.0_1727164648732.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ndd_mantisbt_test_content_tags_pipeline_en_5.5.0_3.0_1727164648732.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("ndd_mantisbt_test_content_tags_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("ndd_mantisbt_test_content_tags_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ndd_mantisbt_test_content_tags_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/NDD-mantisbt_test-content_tags + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-ner_ner_random2_seed2_roberta_large_en.md b/docs/_posts/ahmedlone127/2024-09-24-ner_ner_random2_seed2_roberta_large_en.md new file mode 100644 index 00000000000000..a95a6d13cecc77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-ner_ner_random2_seed2_roberta_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_ner_random2_seed2_roberta_large RoBertaForTokenClassification from tweettemposhift +author: John Snow Labs +name: ner_ner_random2_seed2_roberta_large +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_ner_random2_seed2_roberta_large` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_ner_random2_seed2_roberta_large_en_5.5.0_3.0_1727151411205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_ner_random2_seed2_roberta_large_en_5.5.0_3.0_1727151411205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The token classifier reads the "document" and "token" columns produced above
tokenClassifier = RoBertaForTokenClassification.pretrained("ner_ner_random2_seed2_roberta_large","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("ner_ner_random2_seed2_roberta_large", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
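
Once the pipeline has run, the predicted tags live in the `ner` output column configured above; a quick way to eyeball them:

```python
# Show tokens alongside their predicted entity tags
pipelineDF.select("token.result", "ner.result").show(truncate = False)
```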
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_ner_random2_seed2_roberta_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/ner-ner_random2_seed2-roberta-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-ner_ner_random2_seed2_roberta_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-ner_ner_random2_seed2_roberta_large_pipeline_en.md new file mode 100644 index 00000000000000..3b55f0f4b46b28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-ner_ner_random2_seed2_roberta_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ner_ner_random2_seed2_roberta_large_pipeline pipeline RoBertaForTokenClassification from tweettemposhift +author: John Snow Labs +name: ner_ner_random2_seed2_roberta_large_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_ner_random2_seed2_roberta_large_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_ner_random2_seed2_roberta_large_pipeline_en_5.5.0_3.0_1727151489651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_ner_random2_seed2_roberta_large_pipeline_en_5.5.0_3.0_1727151489651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained NER pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("ner_ner_random2_seed2_roberta_large_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained NER pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("ner_ner_random2_seed2_roberta_large_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_ner_random2_seed2_roberta_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/ner-ner_random2_seed2-roberta-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-nerubios_roberta_base_bne_training_testing_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-nerubios_roberta_base_bne_training_testing_pipeline_en.md new file mode 100644 index 00000000000000..3f91dfc23de7a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-nerubios_roberta_base_bne_training_testing_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nerubios_roberta_base_bne_training_testing_pipeline pipeline RoBertaForTokenClassification from ajtamayoh +author: John Snow Labs +name: nerubios_roberta_base_bne_training_testing_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerubios_roberta_base_bne_training_testing_pipeline` is a English model originally trained by ajtamayoh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerubios_roberta_base_bne_training_testing_pipeline_en_5.5.0_3.0_1727151576531.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerubios_roberta_base_bne_training_testing_pipeline_en_5.5.0_3.0_1727151576531.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained NER pipeline and run it over a DataFrame with a "text" column
pipeline = PretrainedPipeline("nerubios_roberta_base_bne_training_testing_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained NER pipeline and run it over a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("nerubios_roberta_base_bne_training_testing_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerubios_roberta_base_bne_training_testing_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|437.6 MB| + +## References + +https://huggingface.co/ajtamayoh/NeRUBioS_RoBERTa_base_bne_Training_Testing + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-nlp2_base_3e_5_en.md b/docs/_posts/ahmedlone127/2024-09-24-nlp2_base_3e_5_en.md new file mode 100644 index 00000000000000..4234292ba2336b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-nlp2_base_3e_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nlp2_base_3e_5 DistilBertForSequenceClassification from VRT-2428211 +author: John Snow Labs +name: nlp2_base_3e_5 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp2_base_3e_5` is a English model originally trained by VRT-2428211. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp2_base_3e_5_en_5.5.0_3.0_1727154918792.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp2_base_3e_5_en_5.5.0_3.0_1727154918792.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier reads the "document" and "token" columns produced above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp2_base_3e_5","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp2_base_3e_5", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp2_base_3e_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/VRT-2428211/NLP2_Base_3e-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-nlp2_base_3e_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-nlp2_base_3e_5_pipeline_en.md new file mode 100644 index 00000000000000..4a990b965d2d66 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-nlp2_base_3e_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nlp2_base_3e_5_pipeline pipeline DistilBertForSequenceClassification from VRT-2428211 +author: John Snow Labs +name: nlp2_base_3e_5_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp2_base_3e_5_pipeline` is a English model originally trained by VRT-2428211. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp2_base_3e_5_pipeline_en_5.5.0_3.0_1727154932055.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp2_base_3e_5_pipeline_en_5.5.0_3.0_1727154932055.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nlp2_base_3e_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nlp2_base_3e_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp2_base_3e_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/VRT-2428211/NLP2_Base_3e-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-norwegian_repeats_en.md b/docs/_posts/ahmedlone127/2024-09-24-norwegian_repeats_en.md new file mode 100644 index 00000000000000..5f38dac92897bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-norwegian_repeats_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English norwegian_repeats XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: norwegian_repeats +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`norwegian_repeats` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norwegian_repeats_en_5.5.0_3.0_1727174648394.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norwegian_repeats_en_5.5.0_3.0_1727174648394.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("norwegian_repeats","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("norwegian_repeats", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
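The token-level tags are written to the `ner` column, aligned position by position with the `token` annotations. For example, assuming the Python pipeline above has been run:

```python
# Tokens and their predicted NER tags, side by side
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```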
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norwegian_repeats| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/no_repeats \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-norwegian_repeats_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-norwegian_repeats_pipeline_en.md new file mode 100644 index 00000000000000..c9f67c7f15c673 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-norwegian_repeats_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English norwegian_repeats_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: norwegian_repeats_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`norwegian_repeats_pipeline` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norwegian_repeats_pipeline_en_5.5.0_3.0_1727174699382.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norwegian_repeats_pipeline_en_5.5.0_3.0_1727174699382.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("norwegian_repeats_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("norwegian_repeats_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norwegian_repeats_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/no_repeats + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-nuner_v1_ontonotes5_en.md b/docs/_posts/ahmedlone127/2024-09-24-nuner_v1_ontonotes5_en.md new file mode 100644 index 00000000000000..95e2cd879f8fee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-nuner_v1_ontonotes5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nuner_v1_ontonotes5 RoBertaForTokenClassification from guishe +author: John Snow Labs +name: nuner_v1_ontonotes5 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nuner_v1_ontonotes5` is a English model originally trained by guishe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nuner_v1_ontonotes5_en_5.5.0_3.0_1727139834509.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nuner_v1_ontonotes5_en_5.5.0_3.0_1727139834509.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("nuner_v1_ontonotes5","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("nuner_v1_ontonotes5", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nuner_v1_ontonotes5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|453.8 MB| + +## References + +https://huggingface.co/guishe/nuner-v1_ontonotes5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-nuner_v1_ontonotes5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-nuner_v1_ontonotes5_pipeline_en.md new file mode 100644 index 00000000000000..663e00e4cda17e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-nuner_v1_ontonotes5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nuner_v1_ontonotes5_pipeline pipeline RoBertaForTokenClassification from guishe +author: John Snow Labs +name: nuner_v1_ontonotes5_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nuner_v1_ontonotes5_pipeline` is a English model originally trained by guishe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nuner_v1_ontonotes5_pipeline_en_5.5.0_3.0_1727139860961.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nuner_v1_ontonotes5_pipeline_en_5.5.0_3.0_1727139860961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nuner_v1_ontonotes5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nuner_v1_ontonotes5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nuner_v1_ontonotes5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|453.8 MB| + +## References + +https://huggingface.co/guishe/nuner-v1_ontonotes5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_arabic_english_en.md b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_arabic_english_en.md new file mode 100644 index 00000000000000..4098b18d135b34 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_arabic_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_arabic_english MarianTransformer from finnstrom3693 +author: John Snow Labs +name: opus_maltese_arabic_english +date: 2024-09-24 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_arabic_english` is a English model originally trained by finnstrom3693. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_arabic_english_en_5.5.0_3.0_1727166100814.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_arabic_english_en_5.5.0_3.0_1727166100814.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("opus_maltese_arabic_english","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("opus_maltese_arabic_english", "en")
  .setInputCols(Array("sentence"))
  .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
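With the columns wired as above, the translated sentences land in the `translation` column. Assuming the Python pipeline above has been run, they can be inspected with, for example:

```python
# One translated string per detected sentence
pipelineDF.select("translation.result").show(truncate=False)
```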
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_arabic_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|335.5 MB| + +## References + +https://huggingface.co/finnstrom3693/opus-mt-ar-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_arabic_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_arabic_english_pipeline_en.md new file mode 100644 index 00000000000000..3a1d084975eeae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_arabic_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_arabic_english_pipeline pipeline MarianTransformer from finnstrom3693 +author: John Snow Labs +name: opus_maltese_arabic_english_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_arabic_english_pipeline` is a English model originally trained by finnstrom3693. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_arabic_english_pipeline_en_5.5.0_3.0_1727166193635.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_arabic_english_pipeline_en_5.5.0_3.0_1727166193635.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_arabic_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_arabic_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_arabic_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|336.1 MB| + +## References + +https://huggingface.co/finnstrom3693/opus-mt-ar-en + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_arabic_en.md b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_arabic_en.md new file mode 100644 index 00000000000000..d959dab5b5bb2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_arabic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_arabic MarianTransformer from finnstrom3693 +author: John Snow Labs +name: opus_maltese_english_arabic +date: 2024-09-24 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_arabic` is a English model originally trained by finnstrom3693. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_arabic_en_5.5.0_3.0_1727166110577.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_arabic_en_5.5.0_3.0_1727166110577.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("opus_maltese_english_arabic","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("opus_maltese_english_arabic", "en")
  .setInputCols(Array("sentence"))
  .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_arabic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|336.6 MB| + +## References + +https://huggingface.co/finnstrom3693/opus-mt-en-ar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_arabic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_arabic_pipeline_en.md new file mode 100644 index 00000000000000..051fea77f9ecb4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_arabic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_arabic_pipeline pipeline MarianTransformer from finnstrom3693 +author: John Snow Labs +name: opus_maltese_english_arabic_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_arabic_pipeline` is a English model originally trained by finnstrom3693. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_arabic_pipeline_en_5.5.0_3.0_1727166203604.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_arabic_pipeline_en_5.5.0_3.0_1727166203604.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_arabic_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_arabic_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_arabic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|337.2 MB| + +## References + +https://huggingface.co/finnstrom3693/opus-mt-en-ar + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_indonesian_en.md b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_indonesian_en.md new file mode 100644 index 00000000000000..a86460d4c33ca1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_indonesian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_indonesian MarianTransformer from finnstrom3693 +author: John Snow Labs +name: opus_maltese_english_indonesian +date: 2024-09-24 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_indonesian` is a English model originally trained by finnstrom3693. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_indonesian_en_5.5.0_3.0_1727166492980.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_indonesian_en_5.5.0_3.0_1727166492980.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("opus_maltese_english_indonesian","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("opus_maltese_english_indonesian", "en")
  .setInputCols(Array("sentence"))
  .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_indonesian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|307.3 MB| + +## References + +https://huggingface.co/finnstrom3693/opus-mt-en-id \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_indonesian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_indonesian_pipeline_en.md new file mode 100644 index 00000000000000..a69b093ece070a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_indonesian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_indonesian_pipeline pipeline MarianTransformer from finnstrom3693 +author: John Snow Labs +name: opus_maltese_english_indonesian_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_indonesian_pipeline` is a English model originally trained by finnstrom3693. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_indonesian_pipeline_en_5.5.0_3.0_1727166577448.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_indonesian_pipeline_en_5.5.0_3.0_1727166577448.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_indonesian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_indonesian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_indonesian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|307.8 MB| + +## References + +https://huggingface.co/finnstrom3693/opus-mt-en-id + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-output_sonyy_en.md b/docs/_posts/ahmedlone127/2024-09-24-output_sonyy_en.md new file mode 100644 index 00000000000000..77bb74b426b1ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-output_sonyy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English output_sonyy DistilBertForSequenceClassification from sonyy +author: John Snow Labs +name: output_sonyy +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`output_sonyy` is a English model originally trained by sonyy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/output_sonyy_en_5.5.0_3.0_1727164667739.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/output_sonyy_en_5.5.0_3.0_1727164667739.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("output_sonyy","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("output_sonyy", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|output_sonyy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sonyy/output \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-patient_doctor_text_classifier_eng_en.md b/docs/_posts/ahmedlone127/2024-09-24-patient_doctor_text_classifier_eng_en.md new file mode 100644 index 00000000000000..d808054f1f8117 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-patient_doctor_text_classifier_eng_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English patient_doctor_text_classifier_eng DistilBertForSequenceClassification from LukeGPT88 +author: John Snow Labs +name: patient_doctor_text_classifier_eng +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`patient_doctor_text_classifier_eng` is a English model originally trained by LukeGPT88. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/patient_doctor_text_classifier_eng_en_5.5.0_3.0_1727204907822.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/patient_doctor_text_classifier_eng_en_5.5.0_3.0_1727204907822.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("patient_doctor_text_classifier_eng","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("patient_doctor_text_classifier_eng", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|patient_doctor_text_classifier_eng| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LukeGPT88/patient-doctor-text-classifier-eng \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-patient_doctor_text_classifier_eng_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-patient_doctor_text_classifier_eng_pipeline_en.md new file mode 100644 index 00000000000000..15709d42deeaf8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-patient_doctor_text_classifier_eng_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English patient_doctor_text_classifier_eng_pipeline pipeline DistilBertForSequenceClassification from LukeGPT88 +author: John Snow Labs +name: patient_doctor_text_classifier_eng_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`patient_doctor_text_classifier_eng_pipeline` is a English model originally trained by LukeGPT88. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/patient_doctor_text_classifier_eng_pipeline_en_5.5.0_3.0_1727204921327.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/patient_doctor_text_classifier_eng_pipeline_en_5.5.0_3.0_1727204921327.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("patient_doctor_text_classifier_eng_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("patient_doctor_text_classifier_eng_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|patient_doctor_text_classifier_eng_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LukeGPT88/patient-doctor-text-classifier-eng + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-platzi_distilroberta_base_mrpc_glue_will_mendoza_en.md b/docs/_posts/ahmedlone127/2024-09-24-platzi_distilroberta_base_mrpc_glue_will_mendoza_en.md new file mode 100644 index 00000000000000..7a9db78dfa106a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-platzi_distilroberta_base_mrpc_glue_will_mendoza_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English platzi_distilroberta_base_mrpc_glue_will_mendoza RoBertaForSequenceClassification from willmendoza +author: John Snow Labs +name: platzi_distilroberta_base_mrpc_glue_will_mendoza +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_mrpc_glue_will_mendoza` is a English model originally trained by willmendoza. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_will_mendoza_en_5.5.0_3.0_1727167752829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_will_mendoza_en_5.5.0_3.0_1727167752829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_will_mendoza","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_will_mendoza", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_mrpc_glue_will_mendoza| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/willmendoza/platzi-distilroberta-base-mrpc-glue-will-mendoza \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_en.md b/docs/_posts/ahmedlone127/2024-09-24-policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_en.md new file mode 100644 index 00000000000000..204ae166937096 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout RoBertaForTokenClassification from GiladH +author: John Snow Labs +name: policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout` is a English model originally trained by GiladH. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_en_5.5.0_3.0_1727150881095.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_en_5.5.0_3.0_1727150881095.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/GiladH/policy_pos_neg_2012_roberta_no_dropout \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_pipeline_en.md new file mode 100644 index 00000000000000..127a14dfdb1b03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_pipeline pipeline RoBertaForTokenClassification from GiladH +author: John Snow Labs +name: policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_pipeline` is a English model originally trained by GiladH. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_pipeline_en_5.5.0_3.0_1727150950330.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_pipeline_en_5.5.0_3.0_1727150950330.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/GiladH/policy_pos_neg_2012_roberta_no_dropout + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-portuguese_up_xlmr_contextincluded_idiomexcluded_4_best_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-portuguese_up_xlmr_contextincluded_idiomexcluded_4_best_pipeline_en.md new file mode 100644 index 00000000000000..d6f3563ad95abf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-portuguese_up_xlmr_contextincluded_idiomexcluded_4_best_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English portuguese_up_xlmr_contextincluded_idiomexcluded_4_best_pipeline pipeline XlmRoBertaForSequenceClassification from harish +author: John Snow Labs +name: portuguese_up_xlmr_contextincluded_idiomexcluded_4_best_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`portuguese_up_xlmr_contextincluded_idiomexcluded_4_best_pipeline` is a English model originally trained by harish. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/portuguese_up_xlmr_contextincluded_idiomexcluded_4_best_pipeline_en_5.5.0_3.0_1727153550979.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/portuguese_up_xlmr_contextincluded_idiomexcluded_4_best_pipeline_en_5.5.0_3.0_1727153550979.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("portuguese_up_xlmr_contextincluded_idiomexcluded_4_best_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("portuguese_up_xlmr_contextincluded_idiomexcluded_4_best_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
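In the snippets above, `df` stands for any Spark DataFrame with a `text` column, and `PretrainedPipeline` comes from `sparknlp.pretrained`. A minimal, self-contained version of the Python call might look like the sketch below; the exact output columns depend on the pipeline stages and can be listed with `printSchema()`:

```python
from sparknlp.pretrained import PretrainedPipeline

# df is assumed to be a Spark DataFrame with a "text" column, for example:
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("portuguese_up_xlmr_contextincluded_idiomexcluded_4_best_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.printSchema()  # shows the annotation columns produced by each stage
```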
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|portuguese_up_xlmr_contextincluded_idiomexcluded_4_best_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|788.2 MB| + +## References + +https://huggingface.co/harish/PT-UP-xlmR-ContextIncluded_IdiomExcluded-4_BEST + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-prueba4_en.md b/docs/_posts/ahmedlone127/2024-09-24-prueba4_en.md new file mode 100644 index 00000000000000..80f8a720a467d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-prueba4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English prueba4 RoBertaForSequenceClassification from Saul98lm +author: John Snow Labs +name: prueba4 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`prueba4` is a English model originally trained by Saul98lm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/prueba4_en_5.5.0_3.0_1727172004997.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/prueba4_en_5.5.0_3.0_1727172004997.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("prueba4","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("prueba4", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|prueba4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/Saul98lm/Prueba4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-prueba4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-prueba4_pipeline_en.md new file mode 100644 index 00000000000000..68e3acd18496e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-prueba4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English prueba4_pipeline pipeline RoBertaForSequenceClassification from Saul98lm +author: John Snow Labs +name: prueba4_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`prueba4_pipeline` is a English model originally trained by Saul98lm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/prueba4_pipeline_en_5.5.0_3.0_1727172020392.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/prueba4_pipeline_en_5.5.0_3.0_1727172020392.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("prueba4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("prueba4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|prueba4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/Saul98lm/Prueba4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-python_code_comment_classification_en.md b/docs/_posts/ahmedlone127/2024-09-24-python_code_comment_classification_en.md new file mode 100644 index 00000000000000..ea87c9443966ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-python_code_comment_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English python_code_comment_classification BertEmbeddings from ZarahShibli +author: John Snow Labs +name: python_code_comment_classification +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`python_code_comment_classification` is a English model originally trained by ZarahShibli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/python_code_comment_classification_en_5.5.0_3.0_1727161834464.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/python_code_comment_classification_en_5.5.0_3.0_1727161834464.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("python_code_comment_classification","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("python_code_comment_classification","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
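Each annotation in the `embeddings` output column carries the token text in its `result` field and the corresponding vector in its `embeddings` field. One way to flatten them for inspection, assuming the Python pipeline above has been run, is:

```python
from pyspark.sql.functions import explode

# One row per token, with its contextual embedding vector
pipelineDF.select(explode("embeddings").alias("emb")) \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(truncate=80)
```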
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|python_code_comment_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|406.7 MB| + +## References + +https://huggingface.co/ZarahShibli/python-code-comment-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-python_code_comment_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-python_code_comment_classification_pipeline_en.md new file mode 100644 index 00000000000000..76e74dbcebf851 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-python_code_comment_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English python_code_comment_classification_pipeline pipeline BertEmbeddings from ZarahShibli +author: John Snow Labs +name: python_code_comment_classification_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`python_code_comment_classification_pipeline` is a English model originally trained by ZarahShibli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/python_code_comment_classification_pipeline_en_5.5.0_3.0_1727161855350.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/python_code_comment_classification_pipeline_en_5.5.0_3.0_1727161855350.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("python_code_comment_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("python_code_comment_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
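
The `df` referenced above is assumed to be a Spark DataFrame with a `text` column, and `PretrainedPipeline` needs to be imported first. A minimal sketch under those assumptions:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
pipeline = PretrainedPipeline("python_code_comment_classification_pipeline", lang="en")

# transform() expects a DataFrame with a "text" column
df = spark.createDataFrame([["Returns the mean of the input values."]]).toDF("text")
annotations = pipeline.transform(df)

# annotate() is a convenience wrapper for single strings
print(pipeline.annotate("def add(a, b): return a + b"))
```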
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|python_code_comment_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.7 MB| + +## References + +https://huggingface.co/ZarahShibli/python-code-comment-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-quiltnet_b_16_en.md b/docs/_posts/ahmedlone127/2024-09-24-quiltnet_b_16_en.md new file mode 100644 index 00000000000000..e1a038cec6a2a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-quiltnet_b_16_en.md @@ -0,0 +1,120 @@ +--- +layout: model +title: English quiltnet_b_16 CLIPForZeroShotClassification from wisdomik +author: John Snow Labs +name: quiltnet_b_16 +date: 2024-09-24 +tags: [en, open_source, onnx, zero_shot, clip, image] +task: Zero-Shot Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: CLIPForZeroShotClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CLIPForZeroShotClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`quiltnet_b_16` is a English model originally trained by wisdomik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/quiltnet_b_16_en_5.5.0_3.0_1727207720261.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/quiltnet_b_16_en_5.5.0_3.0_1727207720261.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

imageDF = spark.read \
    .format("image") \
    .option("dropInvalid", value = True) \
    .load("src/test/resources/image/")

candidateLabels = [
    "a photo of a bird",
    "a photo of a cat",
    "a photo of a dog",
    "a photo of a hen",
    "a photo of a hippo",
    "a photo of a room",
    "a photo of a tractor",
    "a photo of an ostrich",
    "a photo of an ox"]

# use a lowercase variable so the ImageAssembler class is not shadowed
imageAssembler = ImageAssembler() \
    .setInputCol("image") \
    .setOutputCol("image_assembler")

imageClassifier = CLIPForZeroShotClassification.pretrained("quiltnet_b_16","en") \
    .setInputCols(["image_assembler"]) \
    .setOutputCol("label") \
    .setCandidateLabels(candidateLabels)

pipeline = Pipeline().setStages([imageAssembler, imageClassifier])
pipelineModel = pipeline.fit(imageDF)
pipelineDF = pipelineModel.transform(imageDF)

```
```scala

val imageDF = ResourceHelper.spark.read
  .format("image")
  .option("dropInvalid", value = true)
  .load("src/test/resources/image/")

val candidateLabels = Array(
  "a photo of a bird",
  "a photo of a cat",
  "a photo of a dog",
  "a photo of a hen",
  "a photo of a hippo",
  "a photo of a room",
  "a photo of a tractor",
  "a photo of an ostrich",
  "a photo of an ox")

val imageAssembler = new ImageAssembler()
  .setInputCol("image")
  .setOutputCol("image_assembler")

// Scala does not use line-continuation backslashes
val imageClassifier = CLIPForZeroShotClassification.pretrained("quiltnet_b_16","en")
  .setInputCols(Array("image_assembler"))
  .setOutputCol("label")
  .setCandidateLabels(candidateLabels)

val pipeline = new Pipeline().setStages(Array(imageAssembler, imageClassifier))
val pipelineModel = pipeline.fit(imageDF)
val pipelineDF = pipelineModel.transform(imageDF)

```
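
After the pipeline is fitted, the predicted label for each image is stored in the `result` field of the `label` annotation column. A short sketch for reading it back, using the column names from the example above:

```python
# Continuing from the example above: one predicted label per input image
pipelineDF.select("image.origin", "label.result").show(truncate=False)
```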
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|quiltnet_b_16| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[image_assembler]| +|Output Labels:|[label]| +|Language:|en| +|Size:|561.2 MB| + +## References + +https://huggingface.co/wisdomik/QuiltNet-B-16 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-quiltnet_b_16_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-quiltnet_b_16_pipeline_en.md new file mode 100644 index 00000000000000..0a49e7afa32910 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-quiltnet_b_16_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English quiltnet_b_16_pipeline pipeline CLIPForZeroShotClassification from wisdomik +author: John Snow Labs +name: quiltnet_b_16_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Zero-Shot Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CLIPForZeroShotClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`quiltnet_b_16_pipeline` is a English model originally trained by wisdomik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/quiltnet_b_16_pipeline_en_5.5.0_3.0_1727207751751.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/quiltnet_b_16_pipeline_en_5.5.0_3.0_1727207751751.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("quiltnet_b_16_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("quiltnet_b_16_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
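
Because this pipeline starts with an `ImageAssembler`, the `df` passed to `transform` is assumed to be an image DataFrame rather than a text one. A hedged sketch of that input:

```python
from sparknlp.pretrained import PretrainedPipeline

# Rows from Spark's image data source (assumption based on the included ImageAssembler stage)
imageDF = spark.read.format("image") \
    .option("dropInvalid", True) \
    .load("src/test/resources/image/")

pipeline = PretrainedPipeline("quiltnet_b_16_pipeline", lang="en")
annotations = pipeline.transform(imageDF)
```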
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|quiltnet_b_16_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|561.2 MB| + +## References + +https://huggingface.co/wisdomik/QuiltNet-B-16 + +## Included Models + +- ImageAssembler +- CLIPForZeroShotClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-rejection_detection_en.md b/docs/_posts/ahmedlone127/2024-09-24-rejection_detection_en.md new file mode 100644 index 00000000000000..5580bf829837e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-rejection_detection_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English rejection_detection RoBertaForSequenceClassification from holistic-ai +author: John Snow Labs +name: rejection_detection +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rejection_detection` is a English model originally trained by holistic-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rejection_detection_en_5.5.0_3.0_1727211994868.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rejection_detection_en_5.5.0_3.0_1727211994868.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("rejection_detection","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("rejection_detection", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
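
Note that the assembler above writes to a `document` column while the classifier is configured to read `documents`; the names must match for the pipeline to run. A sketch with the columns aligned and the prediction read back:

```python
# Column names aligned with the DocumentAssembler/Tokenizer outputs above
sequenceClassifier = RoBertaForSequenceClassification.pretrained("rejection_detection", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

# After transform(), the predicted label sits in the annotation's `result` field
pipelineDF.select("text", "class.result").show(truncate=False)
```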
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rejection_detection| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/holistic-ai/rejection_detection \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-rejection_detection_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-rejection_detection_pipeline_en.md new file mode 100644 index 00000000000000..d99e8706b7d3ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-rejection_detection_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English rejection_detection_pipeline pipeline RoBertaForSequenceClassification from holistic-ai +author: John Snow Labs +name: rejection_detection_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rejection_detection_pipeline` is a English model originally trained by holistic-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rejection_detection_pipeline_en_5.5.0_3.0_1727212010929.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rejection_detection_pipeline_en_5.5.0_3.0_1727212010929.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rejection_detection_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rejection_detection_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
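
As with the other pretrained pipelines on this page, `df` is assumed to be a DataFrame with a `text` column; the classifier stage is assumed to write its prediction to a `class` column. For example:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("rejection_detection_pipeline", lang="en")
df = spark.createDataFrame([["I am sorry, but I cannot help with that request."]]).toDF("text")
annotations = pipeline.transform(df)
annotations.select("class.result").show(truncate=False)
```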
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rejection_detection_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.1 MB| + +## References + +https://huggingface.co/holistic-ai/rejection_detection + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-repo_31_5_mlops_zh0rg_en.md b/docs/_posts/ahmedlone127/2024-09-24-repo_31_5_mlops_zh0rg_en.md new file mode 100644 index 00000000000000..f0e3c483717086 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-repo_31_5_mlops_zh0rg_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English repo_31_5_mlops_zh0rg DistilBertForSequenceClassification from Zh0rg +author: John Snow Labs +name: repo_31_5_mlops_zh0rg +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`repo_31_5_mlops_zh0rg` is a English model originally trained by Zh0rg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/repo_31_5_mlops_zh0rg_en_5.5.0_3.0_1727154615680.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/repo_31_5_mlops_zh0rg_en_5.5.0_3.0_1727154615680.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("repo_31_5_mlops_zh0rg","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("repo_31_5_mlops_zh0rg", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
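
The same caveats apply here as in the other classification cards: the imports are omitted, and the classifier should read the assembler's `document` column rather than `documents`. The imports the Python snippet relies on are roughly:

```python
# Imports assumed by the example above (the generated snippet omits them)
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline
```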
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|repo_31_5_mlops_zh0rg| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Zh0rg/repo-31-5-MLOps \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-results_deberta_en.md b/docs/_posts/ahmedlone127/2024-09-24-results_deberta_en.md new file mode 100644 index 00000000000000..531e4f17198f91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-results_deberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English results_deberta DeBertaForSequenceClassification from Siddartha10 +author: John Snow Labs +name: results_deberta +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`results_deberta` is a English model originally trained by Siddartha10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/results_deberta_en_5.5.0_3.0_1727162438786.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/results_deberta_en_5.5.0_3.0_1727162438786.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DeBertaForSequenceClassification.pretrained("results_deberta","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DeBertaForSequenceClassification.pretrained("results_deberta", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
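
The label set this checkpoint predicts is not listed on the card. It can usually be read from the loaded model itself; a hedged sketch:

```python
from sparknlp.annotator import DeBertaForSequenceClassification

# getClasses() returns the labels the exported classifier was saved with
model = DeBertaForSequenceClassification.pretrained("results_deberta", "en")
print(model.getClasses())
```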
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|results_deberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|641.0 MB| + +## References + +https://huggingface.co/Siddartha10/results_deberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-results_deberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-results_deberta_pipeline_en.md new file mode 100644 index 00000000000000..b9dd93af5b0b57 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-results_deberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English results_deberta_pipeline pipeline DeBertaForSequenceClassification from Siddartha10 +author: John Snow Labs +name: results_deberta_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`results_deberta_pipeline` is a English model originally trained by Siddartha10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/results_deberta_pipeline_en_5.5.0_3.0_1727162483793.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/results_deberta_pipeline_en_5.5.0_3.0_1727162483793.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("results_deberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("results_deberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|results_deberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|641.0 MB| + +## References + +https://huggingface.co/Siddartha10/results_deberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-rinna_roberta_qa_arcd1_en.md b/docs/_posts/ahmedlone127/2024-09-24-rinna_roberta_qa_arcd1_en.md new file mode 100644 index 00000000000000..d9c4500471af9c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-rinna_roberta_qa_arcd1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English rinna_roberta_qa_arcd1 BertForQuestionAnswering from Echiguerkh +author: John Snow Labs +name: rinna_roberta_qa_arcd1 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rinna_roberta_qa_arcd1` is a English model originally trained by Echiguerkh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rinna_roberta_qa_arcd1_en_5.5.0_3.0_1727163851135.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rinna_roberta_qa_arcd1_en_5.5.0_3.0_1727163851135.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# the assembler reads the raw "question" and "context" columns of the input DataFrame
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("rinna_roberta_qa_arcd1","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// the assembler reads the raw "question" and "context" columns of the input DataFrame
val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("rinna_roberta_qa_arcd1", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
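
After the pipeline runs, the predicted answer span is an annotation in the `answer` column, with its text in the `result` field. A small sketch for reading question and answer side by side:

```python
# Continuing from the example above
pipelineDF.selectExpr(
    "document_question.result as question",
    "answer.result as answer"
).show(truncate=False)
```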
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rinna_roberta_qa_arcd1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|222.1 MB| + +## References + +https://huggingface.co/Echiguerkh/rinna-roberta-qa-arcd1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-rinna_roberta_qa_arcd1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-rinna_roberta_qa_arcd1_pipeline_en.md new file mode 100644 index 00000000000000..d34756d5b47781 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-rinna_roberta_qa_arcd1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English rinna_roberta_qa_arcd1_pipeline pipeline BertForQuestionAnswering from Echiguerkh +author: John Snow Labs +name: rinna_roberta_qa_arcd1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rinna_roberta_qa_arcd1_pipeline` is a English model originally trained by Echiguerkh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rinna_roberta_qa_arcd1_pipeline_en_5.5.0_3.0_1727163862400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rinna_roberta_qa_arcd1_pipeline_en_5.5.0_3.0_1727163862400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rinna_roberta_qa_arcd1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rinna_roberta_qa_arcd1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
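
The `df` passed to this pipeline is assumed to carry the two columns its `MultiDocumentAssembler` stage was configured with, typically `question` and `context`. For example:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("rinna_roberta_qa_arcd1_pipeline", lang="en")

# Column names are an assumption based on the usual question-answering pipeline layout
df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")
annotations = pipeline.transform(df)
```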
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rinna_roberta_qa_arcd1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|222.1 MB| + +## References + +https://huggingface.co/Echiguerkh/rinna-roberta-qa-arcd1 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-robbert_v2_dutch_base_finetuned_emotion_en.md b/docs/_posts/ahmedlone127/2024-09-24-robbert_v2_dutch_base_finetuned_emotion_en.md new file mode 100644 index 00000000000000..ba351b66968ff1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-robbert_v2_dutch_base_finetuned_emotion_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robbert_v2_dutch_base_finetuned_emotion RoBertaForSequenceClassification from antalvdb +author: John Snow Labs +name: robbert_v2_dutch_base_finetuned_emotion +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_v2_dutch_base_finetuned_emotion` is a English model originally trained by antalvdb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_v2_dutch_base_finetuned_emotion_en_5.5.0_3.0_1727211566914.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_v2_dutch_base_finetuned_emotion_en_5.5.0_3.0_1727211566914.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("robbert_v2_dutch_base_finetuned_emotion","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("robbert_v2_dutch_base_finetuned_emotion", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
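
Although the card is tagged `en`, the underlying RobBERT base model is Dutch, so Dutch inputs are the more natural test case. As in the other classification cards, align the classifier's input column with the assembler's `document` output first. A hedged sketch (the emotion label set is not documented here):

```python
# Continuing from the fitted pipelineModel above, with a Dutch example sentence
data = spark.createDataFrame([["Ik ben erg blij met dit resultaat!"]]).toDF("text")
pipelineModel.transform(data).select("class.result").show(truncate=False)
```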
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_v2_dutch_base_finetuned_emotion| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|437.9 MB| + +## References + +https://huggingface.co/antalvdb/robbert-v2-dutch-base-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-robbert_v2_dutch_base_finetuned_emotion_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-robbert_v2_dutch_base_finetuned_emotion_pipeline_en.md new file mode 100644 index 00000000000000..3aac848a351caf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-robbert_v2_dutch_base_finetuned_emotion_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robbert_v2_dutch_base_finetuned_emotion_pipeline pipeline RoBertaForSequenceClassification from antalvdb +author: John Snow Labs +name: robbert_v2_dutch_base_finetuned_emotion_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_v2_dutch_base_finetuned_emotion_pipeline` is a English model originally trained by antalvdb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_v2_dutch_base_finetuned_emotion_pipeline_en_5.5.0_3.0_1727211590894.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_v2_dutch_base_finetuned_emotion_pipeline_en_5.5.0_3.0_1727211590894.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robbert_v2_dutch_base_finetuned_emotion_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robbert_v2_dutch_base_finetuned_emotion_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_v2_dutch_base_finetuned_emotion_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.0 MB| + +## References + +https://huggingface.co/antalvdb/robbert-v2-dutch-base-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_biomedical_spanish_finetunedemoevent_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_biomedical_spanish_finetunedemoevent_en.md new file mode 100644 index 00000000000000..9380c8a220648a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_biomedical_spanish_finetunedemoevent_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_biomedical_spanish_finetunedemoevent RoBertaForSequenceClassification from joancipria +author: John Snow Labs +name: roberta_base_biomedical_spanish_finetunedemoevent +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_biomedical_spanish_finetunedemoevent` is a English model originally trained by joancipria. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_biomedical_spanish_finetunedemoevent_en_5.5.0_3.0_1727171044796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_biomedical_spanish_finetunedemoevent_en_5.5.0_3.0_1727171044796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_biomedical_spanish_finetunedemoevent","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_biomedical_spanish_finetunedemoevent", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_biomedical_spanish_finetunedemoevent| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|438.8 MB| + +## References + +https://huggingface.co/joancipria/roberta-base-biomedical-es-FineTunedEmoEvent \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_bne_capitel_ner_plantl_gob_es_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_bne_capitel_ner_plantl_gob_es_pipeline_es.md new file mode 100644 index 00000000000000..632a283276027f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_bne_capitel_ner_plantl_gob_es_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish roberta_base_bne_capitel_ner_plantl_gob_es_pipeline pipeline RoBertaForTokenClassification from PlanTL-GOB-ES +author: John Snow Labs +name: roberta_base_bne_capitel_ner_plantl_gob_es_pipeline +date: 2024-09-24 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_capitel_ner_plantl_gob_es_pipeline` is a Castilian, Spanish model originally trained by PlanTL-GOB-ES. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_capitel_ner_plantl_gob_es_pipeline_es_5.5.0_3.0_1727198929333.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_capitel_ner_plantl_gob_es_pipeline_es_5.5.0_3.0_1727198929333.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_bne_capitel_ner_plantl_gob_es_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_bne_capitel_ner_plantl_gob_es_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
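
This pipeline is Spanish (`lang = "es"`), so a Spanish sentence is the natural input. A hedged sketch, assuming the token-classifier stage writes to an `ner` column:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_base_bne_capitel_ner_plantl_gob_es_pipeline", lang="es")
df = spark.createDataFrame([["Me llamo Antonio y trabajo en Madrid."]]).toDF("text")
annotations = pipeline.transform(df)

# Output column name is an assumption; adjust if the pipeline uses a different one
annotations.select("ner.result").show(truncate=False)
```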
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_capitel_ner_plantl_gob_es_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|456.6 MB| + +## References + +https://huggingface.co/PlanTL-GOB-ES/roberta-base-bne-capitel-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_bne_finetuned_amazon_reviews_multi_pdres_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_bne_finetuned_amazon_reviews_multi_pdres_en.md new file mode 100644 index 00000000000000..7b02bb3c39136a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_bne_finetuned_amazon_reviews_multi_pdres_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_amazon_reviews_multi_pdres RoBertaForSequenceClassification from PDRES +author: John Snow Labs +name: roberta_base_bne_finetuned_amazon_reviews_multi_pdres +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_amazon_reviews_multi_pdres` is a English model originally trained by PDRES. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_amazon_reviews_multi_pdres_en_5.5.0_3.0_1727171594652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_amazon_reviews_multi_pdres_en_5.5.0_3.0_1727171594652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_finetuned_amazon_reviews_multi_pdres","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_finetuned_amazon_reviews_multi_pdres", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_amazon_reviews_multi_pdres| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/PDRES/roberta-base-bne-finetuned-amazon_reviews_multi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_bne_finetuned_amazon_reviews_multi_pdres_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_bne_finetuned_amazon_reviews_multi_pdres_pipeline_en.md new file mode 100644 index 00000000000000..4bc8dfccb45968 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_bne_finetuned_amazon_reviews_multi_pdres_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_amazon_reviews_multi_pdres_pipeline pipeline RoBertaForSequenceClassification from PDRES +author: John Snow Labs +name: roberta_base_bne_finetuned_amazon_reviews_multi_pdres_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_amazon_reviews_multi_pdres_pipeline` is a English model originally trained by PDRES. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_amazon_reviews_multi_pdres_pipeline_en_5.5.0_3.0_1727171677197.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_amazon_reviews_multi_pdres_pipeline_en_5.5.0_3.0_1727171677197.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_bne_finetuned_amazon_reviews_multi_pdres_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_bne_finetuned_amazon_reviews_multi_pdres_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_amazon_reviews_multi_pdres_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/PDRES/roberta-base-bne-finetuned-amazon_reviews_multi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_24_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_24_en.md new file mode 100644 index 00000000000000..649e348b2dc974 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_24_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_24 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_24 +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_24` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_24_en_5.5.0_3.0_1727169325365.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_24_en_5.5.0_3.0_1727169325365.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_24","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_24","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
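
To use the resulting embeddings outside of annotation structs (for similarity, clustering, or as downstream features), Spark NLP's `EmbeddingsFinisher` can convert them to plain vectors. A minimal sketch continuing from the example above:

```python
from sparknlp.base import EmbeddingsFinisher

finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

finished = finisher.transform(pipelineDF)
finished.selectExpr("explode(finished_embeddings) as token_vector").show(3)
```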
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_24| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_24 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_53_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_53_en.md new file mode 100644 index 00000000000000..d315e6ef3d5fa1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_53_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_53 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_53 +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_53` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_53_en_5.5.0_3.0_1727168834976.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_53_en_5.5.0_3.0_1727168834976.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_53","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_53","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_53| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_53 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_53_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_53_pipeline_en.md new file mode 100644 index 00000000000000..02347dfcb08b70 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_53_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_53_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_53_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_53_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_53_pipeline_en_5.5.0_3.0_1727168921993.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_53_pipeline_en_5.5.0_3.0_1727168921993.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_53_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_53_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_53_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_53 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_83_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_83_en.md new file mode 100644 index 00000000000000..5e7fda55a61a2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_83_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_83 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_83 +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_83` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_83_en_5.5.0_3.0_1727169305070.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_83_en_5.5.0_3.0_1727169305070.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_83","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_83","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_83| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_83 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_finetuned_wallisian_manual_4ep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_finetuned_wallisian_manual_4ep_pipeline_en.md new file mode 100644 index 00000000000000..f4ec72d222f446 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_finetuned_wallisian_manual_4ep_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_manual_4ep_pipeline pipeline RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_manual_4ep_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_manual_4ep_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_4ep_pipeline_en_5.5.0_3.0_1727168712517.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_4ep_pipeline_en_5.5.0_3.0_1727168712517.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_wallisian_manual_4ep_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_wallisian_manual_4ep_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_manual_4ep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.6 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-manual-4ep + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_legal_multi_downstream_indian_ner_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_legal_multi_downstream_indian_ner_en.md new file mode 100644 index 00000000000000..8e4dad798426f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_legal_multi_downstream_indian_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_legal_multi_downstream_indian_ner RoBertaForTokenClassification from MHGanainy +author: John Snow Labs +name: roberta_base_legal_multi_downstream_indian_ner +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_legal_multi_downstream_indian_ner` is a English model originally trained by MHGanainy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_legal_multi_downstream_indian_ner_en_5.5.0_3.0_1727195286050.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_legal_multi_downstream_indian_ner_en_5.5.0_3.0_1727195286050.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_legal_multi_downstream_indian_ner","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_legal_multi_downstream_indian_ner", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
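The fitted pipeline above stores its predictions as Spark NLP annotations in the `ner` column. A minimal sketch of flattening them for a quick look, assuming the `pipelineDF` produced by the snippet above:

```python
from pyspark.sql import functions as F

# Show each input text next to its predicted token-level labels
pipelineDF.select(F.col("text"), F.col("ner.result").alias("ner_labels")).show(truncate=False)
```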
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_legal_multi_downstream_indian_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/MHGanainy/roberta-base-legal-multi-downstream-indian-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_legal_multi_downstream_indian_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_legal_multi_downstream_indian_ner_pipeline_en.md new file mode 100644 index 00000000000000..25de030ffe5767 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_legal_multi_downstream_indian_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_legal_multi_downstream_indian_ner_pipeline pipeline RoBertaForTokenClassification from MHGanainy +author: John Snow Labs +name: roberta_base_legal_multi_downstream_indian_ner_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_legal_multi_downstream_indian_ner_pipeline` is a English model originally trained by MHGanainy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_legal_multi_downstream_indian_ner_pipeline_en_5.5.0_3.0_1727195309069.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_legal_multi_downstream_indian_ner_pipeline_en_5.5.0_3.0_1727195309069.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_base_legal_multi_downstream_indian_ner_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_base_legal_multi_downstream_indian_ner_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_legal_multi_downstream_indian_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/MHGanainy/roberta-base-legal-multi-downstream-indian-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_ours_rundi_2_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_ours_rundi_2_en.md new file mode 100644 index 00000000000000..e1009198313143 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_ours_rundi_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_ours_rundi_2 RoBertaForSequenceClassification from SkyR +author: John Snow Labs +name: roberta_base_ours_rundi_2 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_ours_rundi_2` is a English model originally trained by SkyR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ours_rundi_2_en_5.5.0_3.0_1727172175283.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ours_rundi_2_en_5.5.0_3.0_1727172175283.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_ours_rundi_2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_ours_rundi_2", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
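The fitted pipeline above stores its predictions as Spark NLP annotations in the `class` column. A minimal sketch of pulling out the predicted labels, assuming the `pipelineDF` produced by the snippet above:

```python
from pyspark.sql import functions as F

# Show each input text next to its predicted class label
pipelineDF.select(F.col("text"), F.col("class.result").alias("predicted_class")).show(truncate=False)
```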
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ours_rundi_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|429.2 MB| + +## References + +https://huggingface.co/SkyR/roberta-base-ours-run-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_ours_rundi_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_ours_rundi_2_pipeline_en.md new file mode 100644 index 00000000000000..155d0a4c4891c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_ours_rundi_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_ours_rundi_2_pipeline pipeline RoBertaForSequenceClassification from SkyR +author: John Snow Labs +name: roberta_base_ours_rundi_2_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_ours_rundi_2_pipeline` is a English model originally trained by SkyR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ours_rundi_2_pipeline_en_5.5.0_3.0_1727172219611.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ours_rundi_2_pipeline_en_5.5.0_3.0_1727172219611.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_base_ours_rundi_2_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_base_ours_rundi_2_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ours_rundi_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|429.3 MB| + +## References + +https://huggingface.co/SkyR/roberta-base-ours-run-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_sst_2_32_13_smoothed_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_sst_2_32_13_smoothed_en.md new file mode 100644 index 00000000000000..0d55e2e8fd84be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_sst_2_32_13_smoothed_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_sst_2_32_13_smoothed RoBertaForSequenceClassification from simonycl +author: John Snow Labs +name: roberta_base_sst_2_32_13_smoothed +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_sst_2_32_13_smoothed` is a English model originally trained by simonycl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_sst_2_32_13_smoothed_en_5.5.0_3.0_1727167523794.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_sst_2_32_13_smoothed_en_5.5.0_3.0_1727167523794.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_sst_2_32_13_smoothed","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_sst_2_32_13_smoothed", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
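The fitted pipeline above stores its predictions as Spark NLP annotations in the `class` column. A minimal sketch of pulling out the predicted labels, assuming the `pipelineDF` produced by the snippet above:

```python
from pyspark.sql import functions as F

# Show each input text next to its predicted class label
pipelineDF.select(F.col("text"), F.col("class.result").alias("predicted_class")).show(truncate=False)
```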
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_sst_2_32_13_smoothed| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|424.9 MB| + +## References + +https://huggingface.co/simonycl/roberta-base-sst-2-32-13-smoothed \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_sst_2_32_13_smoothed_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_sst_2_32_13_smoothed_pipeline_en.md new file mode 100644 index 00000000000000..f4fa61e4eab5c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_sst_2_32_13_smoothed_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_sst_2_32_13_smoothed_pipeline pipeline RoBertaForSequenceClassification from simonycl +author: John Snow Labs +name: roberta_base_sst_2_32_13_smoothed_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_sst_2_32_13_smoothed_pipeline` is a English model originally trained by simonycl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_sst_2_32_13_smoothed_pipeline_en_5.5.0_3.0_1727167559838.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_sst_2_32_13_smoothed_pipeline_en_5.5.0_3.0_1727167559838.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_base_sst_2_32_13_smoothed_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_base_sst_2_32_13_smoothed_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_sst_2_32_13_smoothed_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|424.9 MB| + +## References + +https://huggingface.co/simonycl/roberta-base-sst-2-32-13-smoothed + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_sst_2_64_13_30_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_sst_2_64_13_30_en.md new file mode 100644 index 00000000000000..bd5a69fd9a1311 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_sst_2_64_13_30_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_sst_2_64_13_30 RoBertaForSequenceClassification from simonycl +author: John Snow Labs +name: roberta_base_sst_2_64_13_30 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_sst_2_64_13_30` is a English model originally trained by simonycl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_sst_2_64_13_30_en_5.5.0_3.0_1727167163017.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_sst_2_64_13_30_en_5.5.0_3.0_1727167163017.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_sst_2_64_13_30","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_sst_2_64_13_30", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
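The fitted pipeline above stores its predictions as Spark NLP annotations in the `class` column. A minimal sketch of pulling out the predicted labels, assuming the `pipelineDF` produced by the snippet above:

```python
from pyspark.sql import functions as F

# Show each input text next to its predicted class label
pipelineDF.select(F.col("text"), F.col("class.result").alias("predicted_class")).show(truncate=False)
```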
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_sst_2_64_13_30| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|425.6 MB| + +## References + +https://huggingface.co/simonycl/roberta-base-sst-2-64-13-30 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_8_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_8_en.md new file mode 100644 index 00000000000000..3a96e2f1a2784d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_conll_epoch_8 RoBertaForTokenClassification from ICT2214Team7 +author: John Snow Labs +name: roberta_conll_epoch_8 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_conll_epoch_8` is a English model originally trained by ICT2214Team7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_8_en_5.5.0_3.0_1727139356272.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_8_en_5.5.0_3.0_1727139356272.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_conll_epoch_8","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_conll_epoch_8", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
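The fitted pipeline above stores its predictions as Spark NLP annotations in the `ner` column. A minimal sketch of flattening them for a quick look, assuming the `pipelineDF` produced by the snippet above:

```python
from pyspark.sql import functions as F

# Show each input text next to its predicted token-level labels
pipelineDF.select(F.col("text"), F.col("ner.result").alias("ner_labels")).show(truncate=False)
```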
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_conll_epoch_8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/ICT2214Team7/RoBERTa_conll_epoch_8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_8_pipeline_en.md new file mode 100644 index 00000000000000..91da817aa8e7ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_conll_epoch_8_pipeline pipeline RoBertaForTokenClassification from ICT2214Team7 +author: John Snow Labs +name: roberta_conll_epoch_8_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_conll_epoch_8_pipeline` is a English model originally trained by ICT2214Team7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_8_pipeline_en_5.5.0_3.0_1727139372178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_8_pipeline_en_5.5.0_3.0_1727139372178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_conll_epoch_8_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_conll_epoch_8_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_conll_epoch_8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/ICT2214Team7/RoBERTa_conll_epoch_8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_9_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_9_en.md new file mode 100644 index 00000000000000..0db6092e4e92b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_9_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_conll_epoch_9 RoBertaForTokenClassification from ICT2214Team7 +author: John Snow Labs +name: roberta_conll_epoch_9 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_conll_epoch_9` is a English model originally trained by ICT2214Team7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_9_en_5.5.0_3.0_1727151113750.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_9_en_5.5.0_3.0_1727151113750.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_conll_epoch_9","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_conll_epoch_9", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
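The fitted pipeline above stores its predictions as Spark NLP annotations in the `ner` column. A minimal sketch of flattening them for a quick look, assuming the `pipelineDF` produced by the snippet above:

```python
from pyspark.sql import functions as F

# Show each input text next to its predicted token-level labels
pipelineDF.select(F.col("text"), F.col("ner.result").alias("ner_labels")).show(truncate=False)
```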
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_conll_epoch_9| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/ICT2214Team7/RoBERTa_conll_epoch_9 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_9_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_9_pipeline_en.md new file mode 100644 index 00000000000000..601a9f363805fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_9_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_conll_epoch_9_pipeline pipeline RoBertaForTokenClassification from ICT2214Team7 +author: John Snow Labs +name: roberta_conll_epoch_9_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_conll_epoch_9_pipeline` is a English model originally trained by ICT2214Team7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_9_pipeline_en_5.5.0_3.0_1727151129521.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_9_pipeline_en_5.5.0_3.0_1727151129521.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_conll_epoch_9_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_conll_epoch_9_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_conll_epoch_9_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/ICT2214Team7/RoBERTa_conll_epoch_9 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_ganda_cased_malay_ner_v2_test_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_ganda_cased_malay_ner_v2_test_en.md new file mode 100644 index 00000000000000..56b701243fc978 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_ganda_cased_malay_ner_v2_test_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_ganda_cased_malay_ner_v2_test RoBertaForTokenClassification from nxaliao +author: John Snow Labs +name: roberta_ganda_cased_malay_ner_v2_test +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_ganda_cased_malay_ner_v2_test` is a English model originally trained by nxaliao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_ganda_cased_malay_ner_v2_test_en_5.5.0_3.0_1727151284445.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_ganda_cased_malay_ner_v2_test_en_5.5.0_3.0_1727151284445.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_ganda_cased_malay_ner_v2_test","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_ganda_cased_malay_ner_v2_test", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
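The fitted pipeline above stores its predictions as Spark NLP annotations in the `ner` column. A minimal sketch of flattening them for a quick look, assuming the `pipelineDF` produced by the snippet above:

```python
from pyspark.sql import functions as F

# Show each input text next to its predicted token-level labels
pipelineDF.select(F.col("text"), F.col("ner.result").alias("ner_labels")).show(truncate=False)
```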
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_ganda_cased_malay_ner_v2_test| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/nxaliao/roberta-lg-cased-ms-ner-v2-test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_ganda_cased_malay_ner_v2_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_ganda_cased_malay_ner_v2_test_pipeline_en.md new file mode 100644 index 00000000000000..e4531a5d94c394 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_ganda_cased_malay_ner_v2_test_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_ganda_cased_malay_ner_v2_test_pipeline pipeline RoBertaForTokenClassification from nxaliao +author: John Snow Labs +name: roberta_ganda_cased_malay_ner_v2_test_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_ganda_cased_malay_ner_v2_test_pipeline` is a English model originally trained by nxaliao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_ganda_cased_malay_ner_v2_test_pipeline_en_5.5.0_3.0_1727151359082.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_ganda_cased_malay_ner_v2_test_pipeline_en_5.5.0_3.0_1727151359082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_ganda_cased_malay_ner_v2_test_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_ganda_cased_malay_ner_v2_test_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_ganda_cased_malay_ner_v2_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/nxaliao/roberta-lg-cased-ms-ner-v2-test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_large_bc4chemd_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_bc4chemd_en.md new file mode 100644 index 00000000000000..cf99c41a72ce41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_bc4chemd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_bc4chemd RoBertaForTokenClassification from CheccoCando +author: John Snow Labs +name: roberta_large_bc4chemd +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_bc4chemd` is a English model originally trained by CheccoCando. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_bc4chemd_en_5.5.0_3.0_1727150722493.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_bc4chemd_en_5.5.0_3.0_1727150722493.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_bc4chemd","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_bc4chemd", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
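The fitted pipeline above stores its predictions as Spark NLP annotations in the `ner` column. A minimal sketch of flattening them for a quick look, assuming the `pipelineDF` produced by the snippet above:

```python
from pyspark.sql import functions as F

# Show each input text next to its predicted token-level labels
pipelineDF.select(F.col("text"), F.col("ner.result").alias("ner_labels")).show(truncate=False)
```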
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_bc4chemd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/CheccoCando/roberta-large_bc4chemd \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_large_bc4chemd_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_bc4chemd_pipeline_en.md new file mode 100644 index 00000000000000..cd3130c43c080c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_bc4chemd_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_bc4chemd_pipeline pipeline RoBertaForTokenClassification from CheccoCando +author: John Snow Labs +name: roberta_large_bc4chemd_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_bc4chemd_pipeline` is a English model originally trained by CheccoCando. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_bc4chemd_pipeline_en_5.5.0_3.0_1727150808360.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_bc4chemd_pipeline_en_5.5.0_3.0_1727150808360.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_large_bc4chemd_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_bc4chemd_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_bc4chemd_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/CheccoCando/roberta-large_bc4chemd + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_large_finetuned_ner_finetuned_ner_lionellow_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_finetuned_ner_finetuned_ner_lionellow_en.md new file mode 100644 index 00000000000000..b69087ec5d333f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_finetuned_ner_finetuned_ner_lionellow_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_finetuned_ner_finetuned_ner_lionellow RoBertaForTokenClassification from LionelLow +author: John Snow Labs +name: roberta_large_finetuned_ner_finetuned_ner_lionellow +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_ner_finetuned_ner_lionellow` is a English model originally trained by LionelLow. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_ner_finetuned_ner_lionellow_en_5.5.0_3.0_1727150718733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_ner_finetuned_ner_lionellow_en_5.5.0_3.0_1727150718733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_finetuned_ner_finetuned_ner_lionellow","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_finetuned_ner_finetuned_ner_lionellow", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
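The fitted pipeline above stores its predictions as Spark NLP annotations in the `ner` column. A minimal sketch of flattening them for a quick look, assuming the `pipelineDF` produced by the snippet above:

```python
from pyspark.sql import functions as F

# Show each input text next to its predicted token-level labels
pipelineDF.select(F.col("text"), F.col("ner.result").alias("ner_labels")).show(truncate=False)
```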
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_ner_finetuned_ner_lionellow| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/LionelLow/roberta-large-finetuned-ner-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_large_gest_pred_seqeval_partialmatch_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_gest_pred_seqeval_partialmatch_en.md new file mode 100644 index 00000000000000..a434bef64070e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_gest_pred_seqeval_partialmatch_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_gest_pred_seqeval_partialmatch RoBertaForTokenClassification from Jsevisal +author: John Snow Labs +name: roberta_large_gest_pred_seqeval_partialmatch +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_gest_pred_seqeval_partialmatch` is a English model originally trained by Jsevisal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_gest_pred_seqeval_partialmatch_en_5.5.0_3.0_1727139943371.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_gest_pred_seqeval_partialmatch_en_5.5.0_3.0_1727139943371.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_gest_pred_seqeval_partialmatch","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_gest_pred_seqeval_partialmatch", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
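The fitted pipeline above stores its predictions as Spark NLP annotations in the `ner` column. A minimal sketch of flattening them for a quick look, assuming the `pipelineDF` produced by the snippet above:

```python
from pyspark.sql import functions as F

# Show each input text next to its predicted token-level labels
pipelineDF.select(F.col("text"), F.col("ner.result").alias("ner_labels")).show(truncate=False)
```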
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_gest_pred_seqeval_partialmatch| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Jsevisal/roberta-large-gest-pred-seqeval-partialmatch \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_large_gest_pred_seqeval_partialmatch_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_gest_pred_seqeval_partialmatch_pipeline_en.md new file mode 100644 index 00000000000000..8c75f7cd3f5778 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_gest_pred_seqeval_partialmatch_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_gest_pred_seqeval_partialmatch_pipeline pipeline RoBertaForTokenClassification from Jsevisal +author: John Snow Labs +name: roberta_large_gest_pred_seqeval_partialmatch_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_gest_pred_seqeval_partialmatch_pipeline` is a English model originally trained by Jsevisal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_gest_pred_seqeval_partialmatch_pipeline_en_5.5.0_3.0_1727140019503.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_gest_pred_seqeval_partialmatch_pipeline_en_5.5.0_3.0_1727140019503.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_large_gest_pred_seqeval_partialmatch_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_gest_pred_seqeval_partialmatch_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_gest_pred_seqeval_partialmatch_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Jsevisal/roberta-large-gest-pred-seqeval-partialmatch + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_en.md new file mode 100644 index 00000000000000..30c13686b4352d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil RoBertaForTokenClassification from gundapusunil +author: John Snow Labs +name: roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil` is a English model originally trained by gundapusunil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_en_5.5.0_3.0_1727139436438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_en_5.5.0_3.0_1727139436438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
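The fitted pipeline above stores its predictions as Spark NLP annotations in the `ner` column. A minimal sketch of flattening them for a quick look, assuming the `pipelineDF` produced by the snippet above:

```python
from pyspark.sql import functions as F

# Show each input text next to its predicted token-level labels
pipelineDF.select(F.col("text"), F.col("ner.result").alias("ner_labels")).show(truncate=False)
```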
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/gundapusunil/RoBERTa-large-PM-M3-Voc-hf-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_pipeline_en.md new file mode 100644 index 00000000000000..bef9b67662e864 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_pipeline pipeline RoBertaForTokenClassification from gundapusunil +author: John Snow Labs +name: roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_pipeline` is a English model originally trained by gundapusunil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_pipeline_en_5.5.0_3.0_1727139516513.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_pipeline_en_5.5.0_3.0_1727139516513.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/gundapusunil/RoBERTa-large-PM-M3-Voc-hf-finetuned-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_large_temp_classifier_bootstrapped_v2_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_temp_classifier_bootstrapped_v2_en.md new file mode 100644 index 00000000000000..e421fa410fe617 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_temp_classifier_bootstrapped_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_temp_classifier_bootstrapped_v2 RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_large_temp_classifier_bootstrapped_v2 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_temp_classifier_bootstrapped_v2` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_temp_classifier_bootstrapped_v2_en_5.5.0_3.0_1727171317924.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_temp_classifier_bootstrapped_v2_en_5.5.0_3.0_1727171317924.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_temp_classifier_bootstrapped_v2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_temp_classifier_bootstrapped_v2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_temp_classifier_bootstrapped_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/research-dump/roberta_large_temp_classifier_bootstrapped_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_large_temp_classifier_bootstrapped_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_temp_classifier_bootstrapped_v2_pipeline_en.md new file mode 100644 index 00000000000000..a3b7d411fd0303 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_temp_classifier_bootstrapped_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_temp_classifier_bootstrapped_v2_pipeline pipeline RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_large_temp_classifier_bootstrapped_v2_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_temp_classifier_bootstrapped_v2_pipeline` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_temp_classifier_bootstrapped_v2_pipeline_en_5.5.0_3.0_1727171385064.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_temp_classifier_bootstrapped_v2_pipeline_en_5.5.0_3.0_1727171385064.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_temp_classifier_bootstrapped_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_temp_classifier_bootstrapped_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_temp_classifier_bootstrapped_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/research-dump/roberta_large_temp_classifier_bootstrapped_v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_dpr_nq_reader_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_dpr_nq_reader_roberta_base_en.md new file mode 100644 index 00000000000000..82764907a8ae99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_dpr_nq_reader_roberta_base_en.md @@ -0,0 +1,106 @@ +--- +layout: model +title: English RobertaForQuestionAnswering (from nlpconnect) +author: John Snow Labs +name: roberta_qa_dpr_nq_reader_roberta_base +date: 2024-09-24 +tags: [en, open_source, question_answering, roberta, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained Question Answering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `dpr-nq-reader-roberta-base` is a English model originally trained by `nlpconnect`. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_qa_dpr_nq_reader_roberta_base_en_5.5.0_3.0_1727210947078.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_qa_dpr_nq_reader_roberta_base_en_5.5.0_3.0_1727210947078.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = MultiDocumentAssembler() \ +.setInputCols(["question", "context"]) \ +.setOutputCols(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_qa_dpr_nq_reader_roberta_base","en") \ +.setInputCols(["document_question", "document_context"]) \ +.setOutputCol("answer") \ +.setCaseSensitive(True) + +pipeline = Pipeline().setStages([ +document_assembler, +spanClassifier +]) + +example = spark.createDataFrame([["What's my name?", "My name is Clara and I live in Berkeley."]]).toDF("question", "context") + +result = pipeline.fit(example).transform(example) +``` +```scala +val document = new MultiDocumentAssembler() +.setInputCols("question", "context") +.setOutputCols("document_question", "document_context") + +val spanClassifier = RoBertaForQuestionAnswering +.pretrained("roberta_qa_dpr_nq_reader_roberta_base","en") +.setInputCols(Array("document_question", "document_context")) +.setOutputCol("answer") +.setCaseSensitive(true) +.setMaxSentenceLength(512) + +val pipeline = new Pipeline().setStages(Array(document, spanClassifier)) + +val example = Seq( +("Where was John Lenon born?", "John Lenon was born in London and lived in Paris. My name is Sarah and I live in London."), +("What's my name?", "My name is Clara and I live in Berkeley.")) +.toDF("question", "context") + +val result = pipeline.fit(example).transform(example) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.answer_question.roberta.base.by_nlpconnect").predict("""What's my name?|||"My name is Clara and I live in Berkeley.""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_qa_dpr_nq_reader_roberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|465.6 MB| + +## References + +References + +- https://huggingface.co/nlpconnect/dpr-nq-reader-roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_dpr_nq_reader_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_dpr_nq_reader_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..6bffe1dc109ea5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_dpr_nq_reader_roberta_base_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_qa_dpr_nq_reader_roberta_base_pipeline pipeline RoBertaForQuestionAnswering from nlpconnect +author: John Snow Labs +name: roberta_qa_dpr_nq_reader_roberta_base_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_qa_dpr_nq_reader_roberta_base_pipeline` is a English model originally trained by nlpconnect. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_qa_dpr_nq_reader_roberta_base_pipeline_en_5.5.0_3.0_1727210971188.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_qa_dpr_nq_reader_roberta_base_pipeline_en_5.5.0_3.0_1727210971188.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_qa_dpr_nq_reader_roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_qa_dpr_nq_reader_roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_qa_dpr_nq_reader_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.6 MB| + +## References + +https://huggingface.co/nlpconnect/dpr-nq-reader-roberta-base + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_finetuned_state_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_finetuned_state_pipeline_en.md new file mode 100644 index 00000000000000..c874d576fd4bf5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_finetuned_state_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_qa_finetuned_state_pipeline pipeline RoBertaForQuestionAnswering from skandaonsolve +author: John Snow Labs +name: roberta_qa_finetuned_state_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_qa_finetuned_state_pipeline` is a English model originally trained by skandaonsolve. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_qa_finetuned_state_pipeline_en_5.5.0_3.0_1727211015905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_qa_finetuned_state_pipeline_en_5.5.0_3.0_1727211015905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_qa_finetuned_state_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_qa_finetuned_state_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_qa_finetuned_state_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.8 MB| + +## References + +https://huggingface.co/skandaonsolve/roberta-finetuned-state + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_quales_iberlef_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_quales_iberlef_en.md new file mode 100644 index 00000000000000..6992c4efec1b28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_quales_iberlef_en.md @@ -0,0 +1,106 @@ +--- +layout: model +title: English RobertaForQuestionAnswering (from stevemobs) +author: John Snow Labs +name: roberta_qa_quales_iberlef +date: 2024-09-24 +tags: [en, open_source, question_answering, roberta, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained Question Answering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `quales-iberlef` is a English model originally trained by `stevemobs`. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_qa_quales_iberlef_en_5.5.0_3.0_1727210853804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_qa_quales_iberlef_en_5.5.0_3.0_1727210853804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = MultiDocumentAssembler() \ +.setInputCols(["question", "context"]) \ +.setOutputCols(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_qa_quales_iberlef","en") \ +.setInputCols(["document_question", "document_context"]) \ +.setOutputCol("answer") \ +.setCaseSensitive(True) + +pipeline = Pipeline().setStages([ +document_assembler, +spanClassifier +]) + +example = spark.createDataFrame([["What's my name?", "My name is Clara and I live in Berkeley."]]).toDF("question", "context") + +result = pipeline.fit(example).transform(example) +``` +```scala +val document = new MultiDocumentAssembler() +.setInputCols("question", "context") +.setOutputCols("document_question", "document_context") + +val spanClassifier = RoBertaForQuestionAnswering +.pretrained("roberta_qa_quales_iberlef","en") +.setInputCols(Array("document_question", "document_context")) +.setOutputCol("answer") +.setCaseSensitive(true) +.setMaxSentenceLength(512) + +val pipeline = new Pipeline().setStages(Array(document, spanClassifier)) + +val example = Seq( +("Where was John Lenon born?", "John Lenon was born in London and lived in Paris. My name is Sarah and I live in London."), +("What's my name?", "My name is Clara and I live in Berkeley.")) +.toDF("question", "context") + +val result = pipeline.fit(example).transform(example) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.answer_question.roberta.by_stevemobs").predict("""What's my name?|||"My name is Clara and I live in Berkeley.""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_qa_quales_iberlef| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +References + +- https://huggingface.co/stevemobs/quales-iberlef \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_transfer2_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_transfer2_en.md new file mode 100644 index 00000000000000..ced9224d24576b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_transfer2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_transfer2 RoBertaForSequenceClassification from SOUMYADEEPSAR +author: John Snow Labs +name: roberta_transfer2 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_transfer2` is a English model originally trained by SOUMYADEEPSAR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_transfer2_en_5.5.0_3.0_1727171890209.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_transfer2_en_5.5.0_3.0_1727171890209.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_transfer2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_transfer2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_transfer2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|435.7 MB| + +## References + +https://huggingface.co/SOUMYADEEPSAR/roberta_transfer2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_transfer2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_transfer2_pipeline_en.md new file mode 100644 index 00000000000000..ad754fddc8b261 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_transfer2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_transfer2_pipeline pipeline RoBertaForSequenceClassification from SOUMYADEEPSAR +author: John Snow Labs +name: roberta_transfer2_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_transfer2_pipeline` is a English model originally trained by SOUMYADEEPSAR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_transfer2_pipeline_en_5.5.0_3.0_1727171925260.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_transfer2_pipeline_en_5.5.0_3.0_1727171925260.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_transfer2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_transfer2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_transfer2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|435.7 MB| + +## References + +https://huggingface.co/SOUMYADEEPSAR/roberta_transfer2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-robertalarge_finetuned_winogrande_en.md b/docs/_posts/ahmedlone127/2024-09-24-robertalarge_finetuned_winogrande_en.md new file mode 100644 index 00000000000000..6538b9c2a1038e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-robertalarge_finetuned_winogrande_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robertalarge_finetuned_winogrande RoBertaForSequenceClassification from Kalslice +author: John Snow Labs +name: robertalarge_finetuned_winogrande +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertalarge_finetuned_winogrande` is a English model originally trained by Kalslice. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertalarge_finetuned_winogrande_en_5.5.0_3.0_1727167625898.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertalarge_finetuned_winogrande_en_5.5.0_3.0_1727167625898.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("robertalarge_finetuned_winogrande","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("robertalarge_finetuned_winogrande", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertalarge_finetuned_winogrande| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Kalslice/robertalarge-finetuned-winogrande \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-robertalarge_finetuned_winogrande_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-robertalarge_finetuned_winogrande_pipeline_en.md new file mode 100644 index 00000000000000..a9028db5c85300 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-robertalarge_finetuned_winogrande_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robertalarge_finetuned_winogrande_pipeline pipeline RoBertaForSequenceClassification from Kalslice +author: John Snow Labs +name: robertalarge_finetuned_winogrande_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertalarge_finetuned_winogrande_pipeline` is a English model originally trained by Kalslice. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertalarge_finetuned_winogrande_pipeline_en_5.5.0_3.0_1727167709776.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertalarge_finetuned_winogrande_pipeline_en_5.5.0_3.0_1727167709776.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robertalarge_finetuned_winogrande_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robertalarge_finetuned_winogrande_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertalarge_finetuned_winogrande_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Kalslice/robertalarge-finetuned-winogrande + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-robertinha_gl.md b/docs/_posts/ahmedlone127/2024-09-24-robertinha_gl.md new file mode 100644 index 00000000000000..a38be62c597cd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-robertinha_gl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Galician robertinha RoBertaEmbeddings from mrm8488 +author: John Snow Labs +name: robertinha +date: 2024-09-24 +tags: [gl, open_source, onnx, embeddings, roberta] +task: Embeddings +language: gl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertinha` is a Galician model originally trained by mrm8488. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertinha_gl_5.5.0_3.0_1727169258285.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertinha_gl_5.5.0_3.0_1727169258285.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("robertinha","gl") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("robertinha","gl") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertinha| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|gl| +|Size:|311.7 MB| + +## References + +https://huggingface.co/mrm8488/RoBERTinha \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-robertinha_pipeline_gl.md b/docs/_posts/ahmedlone127/2024-09-24-robertinha_pipeline_gl.md new file mode 100644 index 00000000000000..4d10f023580d33 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-robertinha_pipeline_gl.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Galician robertinha_pipeline pipeline RoBertaEmbeddings from mrm8488 +author: John Snow Labs +name: robertinha_pipeline +date: 2024-09-24 +tags: [gl, open_source, pipeline, onnx] +task: Embeddings +language: gl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertinha_pipeline` is a Galician model originally trained by mrm8488. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertinha_pipeline_gl_5.5.0_3.0_1727169273729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertinha_pipeline_gl_5.5.0_3.0_1727169273729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robertinha_pipeline", lang = "gl") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robertinha_pipeline", lang = "gl") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertinha_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|gl| +|Size:|311.7 MB| + +## References + +https://huggingface.co/mrm8488/RoBERTinha + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-rubert_conversational_cased_sentiment_pipeline_ru.md b/docs/_posts/ahmedlone127/2024-09-24-rubert_conversational_cased_sentiment_pipeline_ru.md new file mode 100644 index 00000000000000..461fb390820984 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-rubert_conversational_cased_sentiment_pipeline_ru.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Russian rubert_conversational_cased_sentiment_pipeline pipeline BertForSequenceClassification from MonoHime +author: John Snow Labs +name: rubert_conversational_cased_sentiment_pipeline +date: 2024-09-24 +tags: [ru, open_source, pipeline, onnx] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_conversational_cased_sentiment_pipeline` is a Russian model originally trained by MonoHime. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_conversational_cased_sentiment_pipeline_ru_5.5.0_3.0_1727214205347.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_conversational_cased_sentiment_pipeline_ru_5.5.0_3.0_1727214205347.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rubert_conversational_cased_sentiment_pipeline", lang = "ru") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rubert_conversational_cased_sentiment_pipeline", lang = "ru") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_conversational_cased_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ru| +|Size:|664.5 MB| + +## References + +https://huggingface.co/MonoHime/rubert_conversational_cased_sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-rubert_conversational_cased_sentiment_ru.md b/docs/_posts/ahmedlone127/2024-09-24-rubert_conversational_cased_sentiment_ru.md new file mode 100644 index 00000000000000..286c51a0aa3cd5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-rubert_conversational_cased_sentiment_ru.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Russian rubert_conversational_cased_sentiment BertForSequenceClassification from MonoHime +author: John Snow Labs +name: rubert_conversational_cased_sentiment +date: 2024-09-24 +tags: [ru, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_conversational_cased_sentiment` is a Russian model originally trained by MonoHime. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_conversational_cased_sentiment_ru_5.5.0_3.0_1727214171413.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_conversational_cased_sentiment_ru_5.5.0_3.0_1727214171413.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("rubert_conversational_cased_sentiment","ru") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("rubert_conversational_cased_sentiment", "ru") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_conversational_cased_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|ru| +|Size:|664.4 MB| + +## References + +https://huggingface.co/MonoHime/rubert_conversational_cased_sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-rubert_sentence_similarity_en.md b/docs/_posts/ahmedlone127/2024-09-24-rubert_sentence_similarity_en.md new file mode 100644 index 00000000000000..dd2261ab253528 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-rubert_sentence_similarity_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English rubert_sentence_similarity BertForSequenceClassification from AlanRobotics +author: John Snow Labs +name: rubert_sentence_similarity +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_sentence_similarity` is a English model originally trained by AlanRobotics. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_sentence_similarity_en_5.5.0_3.0_1727219170114.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_sentence_similarity_en_5.5.0_3.0_1727219170114.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("rubert_sentence_similarity","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("rubert_sentence_similarity", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_sentence_similarity| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|666.5 MB| + +## References + +https://huggingface.co/AlanRobotics/rubert-sentence-similarity \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-rubert_sentence_similarity_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-rubert_sentence_similarity_pipeline_en.md new file mode 100644 index 00000000000000..e9aadb3ea6c9ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-rubert_sentence_similarity_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English rubert_sentence_similarity_pipeline pipeline BertForSequenceClassification from AlanRobotics +author: John Snow Labs +name: rubert_sentence_similarity_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_sentence_similarity_pipeline` is a English model originally trained by AlanRobotics. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_sentence_similarity_pipeline_en_5.5.0_3.0_1727219203780.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_sentence_similarity_pipeline_en_5.5.0_3.0_1727219203780.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rubert_sentence_similarity_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rubert_sentence_similarity_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_sentence_similarity_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|666.5 MB| + +## References + +https://huggingface.co/AlanRobotics/rubert-sentence-similarity + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-securebert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-securebert_pipeline_en.md new file mode 100644 index 00000000000000..66b6727225c9ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-securebert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English securebert_pipeline pipeline RoBertaEmbeddings from ehsanaghaei +author: John Snow Labs +name: securebert_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`securebert_pipeline` is a English model originally trained by ehsanaghaei. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/securebert_pipeline_en_5.5.0_3.0_1727216344948.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/securebert_pipeline_en_5.5.0_3.0_1727216344948.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("securebert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("securebert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|securebert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/ehsanaghaei/SecureBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_afro_xlmr_base_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-24-sent_afro_xlmr_base_pipeline_xx.md new file mode 100644 index 00000000000000..3e06443b83d6b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_afro_xlmr_base_pipeline_xx.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Multilingual sent_afro_xlmr_base_pipeline pipeline XlmRoBertaSentenceEmbeddings from Davlan +author: John Snow Labs +name: sent_afro_xlmr_base_pipeline +date: 2024-09-24 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_afro_xlmr_base_pipeline` is a Multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_afro_xlmr_base_pipeline_xx_5.5.0_3.0_1727205840413.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_afro_xlmr_base_pipeline_xx_5.5.0_3.0_1727205840413.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_afro_xlmr_base_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_afro_xlmr_base_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_afro_xlmr_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Davlan/afro-xlmr-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- XlmRoBertaSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_afro_xlmr_base_xx.md b/docs/_posts/ahmedlone127/2024-09-24-sent_afro_xlmr_base_xx.md new file mode 100644 index 00000000000000..db2df7cd71fa08 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_afro_xlmr_base_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual sent_afro_xlmr_base XlmRoBertaSentenceEmbeddings from Davlan +author: John Snow Labs +name: sent_afro_xlmr_base +date: 2024-09-24 +tags: [xx, open_source, onnx, sentence_embeddings, xlm_roberta] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_afro_xlmr_base` is a Multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_afro_xlmr_base_xx_5.5.0_3.0_1727205787447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_afro_xlmr_base_xx_5.5.0_3.0_1727205787447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_afro_xlmr_base","xx") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_afro_xlmr_base","xx") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_afro_xlmr_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|xx| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Davlan/afro-xlmr-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_batteryscibert_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_batteryscibert_uncased_en.md new file mode 100644 index 00000000000000..e2f15777a5d8ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_batteryscibert_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_batteryscibert_uncased BertSentenceEmbeddings from batterydata +author: John Snow Labs +name: sent_batteryscibert_uncased +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_batteryscibert_uncased` is a English model originally trained by batterydata. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_batteryscibert_uncased_en_5.5.0_3.0_1727202645434.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_batteryscibert_uncased_en_5.5.0_3.0_1727202645434.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_batteryscibert_uncased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_batteryscibert_uncased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_batteryscibert_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/batterydata/batteryscibert-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_batteryscibert_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_batteryscibert_uncased_pipeline_en.md new file mode 100644 index 00000000000000..0182dce4114fb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_batteryscibert_uncased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_batteryscibert_uncased_pipeline pipeline BertSentenceEmbeddings from batterydata +author: John Snow Labs +name: sent_batteryscibert_uncased_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_batteryscibert_uncased_pipeline` is a English model originally trained by batterydata. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_batteryscibert_uncased_pipeline_en_5.5.0_3.0_1727202666128.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_batteryscibert_uncased_pipeline_en_5.5.0_3.0_1727202666128.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_batteryscibert_uncased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_batteryscibert_uncased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_batteryscibert_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|410.5 MB| + +## References + +https://huggingface.co/batterydata/batteryscibert-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabert_finetuned_mdeberta_tswana_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabert_finetuned_mdeberta_tswana_en.md new file mode 100644 index 00000000000000..4a0f757c1e27ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabert_finetuned_mdeberta_tswana_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_arabert_finetuned_mdeberta_tswana BertSentenceEmbeddings from betteib +author: John Snow Labs +name: sent_bert_base_arabert_finetuned_mdeberta_tswana +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_arabert_finetuned_mdeberta_tswana` is a English model originally trained by betteib. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabert_finetuned_mdeberta_tswana_en_5.5.0_3.0_1727202340808.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabert_finetuned_mdeberta_tswana_en_5.5.0_3.0_1727202340808.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_arabert_finetuned_mdeberta_tswana","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_arabert_finetuned_mdeberta_tswana","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
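
The Python snippet above assumes the Spark NLP classes and the `spark` session are already in scope. A minimal preamble that would make it self-contained, using the standard Spark NLP import paths (the `sparknlp.start()` call is an assumption about how the session is created):

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

# starts (or attaches to) a Spark session with Spark NLP on the classpath
spark = sparknlp.start()
```

With these in place the example runs as written; `pipelineDF.select("embeddings.embeddings")` then exposes the sentence vectors produced by `sent_bert_base_arabert_finetuned_mdeberta_tswana`.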
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_arabert_finetuned_mdeberta_tswana| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|504.6 MB| + +## References + +https://huggingface.co/betteib/bert-base-arabert-finetuned-mdeberta-tn \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_ar.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_ar.md new file mode 100644 index 00000000000000..5a2869fc5a660a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic sent_bert_base_arabertv02 BertSentenceEmbeddings from aubmindlab +author: John Snow Labs +name: sent_bert_base_arabertv02 +date: 2024-09-24 +tags: [ar, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_arabertv02` is a Arabic model originally trained by aubmindlab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabertv02_ar_5.5.0_3.0_1727202120487.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabertv02_ar_5.5.0_3.0_1727202120487.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_arabertv02","ar") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_arabertv02","ar") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
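
The Python snippet above assumes the Spark NLP classes and the `spark` session are already in scope. A minimal preamble that would make it self-contained, using the standard Spark NLP import paths (the `sparknlp.start()` call is an assumption about how the session is created):

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

# starts (or attaches to) a Spark session with Spark NLP on the classpath
spark = sparknlp.start()
```

With these in place the example runs as written; `pipelineDF.select("embeddings.embeddings")` then exposes the sentence vectors produced by `sent_bert_base_arabertv02`.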
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_arabertv02| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ar| +|Size:|505.1 MB| + +## References + +https://huggingface.co/aubmindlab/bert-base-arabertv02 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_finetuned_sandouq_ar.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_finetuned_sandouq_ar.md new file mode 100644 index 00000000000000..283ed150378a34 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_finetuned_sandouq_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic sent_bert_base_arabertv02_finetuned_sandouq BertSentenceEmbeddings from AbdoMamdouh +author: John Snow Labs +name: sent_bert_base_arabertv02_finetuned_sandouq +date: 2024-09-24 +tags: [ar, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_arabertv02_finetuned_sandouq` is a Arabic model originally trained by AbdoMamdouh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabertv02_finetuned_sandouq_ar_5.5.0_3.0_1727157206623.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabertv02_finetuned_sandouq_ar_5.5.0_3.0_1727157206623.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_arabertv02_finetuned_sandouq","ar") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_arabertv02_finetuned_sandouq","ar") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
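
The Python snippet above assumes the Spark NLP classes and the `spark` session are already in scope. A minimal preamble that would make it self-contained, using the standard Spark NLP import paths (the `sparknlp.start()` call is an assumption about how the session is created):

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

# starts (or attaches to) a Spark session with Spark NLP on the classpath
spark = sparknlp.start()
```

With these in place the example runs as written; `pipelineDF.select("embeddings.embeddings")` then exposes the sentence vectors produced by `sent_bert_base_arabertv02_finetuned_sandouq`.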
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_arabertv02_finetuned_sandouq| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ar| +|Size:|505.1 MB| + +## References + +https://huggingface.co/AbdoMamdouh/bert-base-arabertv02-finetuned-sandouq \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_finetuned_sandouq_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_finetuned_sandouq_pipeline_ar.md new file mode 100644 index 00000000000000..d31b6985a71a31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_finetuned_sandouq_pipeline_ar.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Arabic sent_bert_base_arabertv02_finetuned_sandouq_pipeline pipeline BertSentenceEmbeddings from AbdoMamdouh +author: John Snow Labs +name: sent_bert_base_arabertv02_finetuned_sandouq_pipeline +date: 2024-09-24 +tags: [ar, open_source, pipeline, onnx] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_arabertv02_finetuned_sandouq_pipeline` is a Arabic model originally trained by AbdoMamdouh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabertv02_finetuned_sandouq_pipeline_ar_5.5.0_3.0_1727157231886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabertv02_finetuned_sandouq_pipeline_ar_5.5.0_3.0_1727157231886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_arabertv02_finetuned_sandouq_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_arabertv02_finetuned_sandouq_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
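
The Python snippet above assumes `PretrainedPipeline` is already imported and that `df` is an existing DataFrame with a `text` column. A minimal end-to-end sketch, with an illustrative session setup and input sentence that are not part of the original example:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# start (or attach to) a Spark session with Spark NLP available
spark = sparknlp.start()

# any DataFrame with a "text" column can serve as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_arabertv02_finetuned_sandouq_pipeline", lang="ar")
annotations = pipeline.transform(df)
annotations.printSchema()  # inspect the columns added by the included pipeline stages
```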
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_arabertv02_finetuned_sandouq_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|505.6 MB| + +## References + +https://huggingface.co/AbdoMamdouh/bert-base-arabertv02-finetuned-sandouq + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_pipeline_ar.md new file mode 100644 index 00000000000000..33620917c1020b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_pipeline_ar.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Arabic sent_bert_base_arabertv02_pipeline pipeline BertSentenceEmbeddings from aubmindlab +author: John Snow Labs +name: sent_bert_base_arabertv02_pipeline +date: 2024-09-24 +tags: [ar, open_source, pipeline, onnx] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_arabertv02_pipeline` is a Arabic model originally trained by aubmindlab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabertv02_pipeline_ar_5.5.0_3.0_1727202148000.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabertv02_pipeline_ar_5.5.0_3.0_1727202148000.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_arabertv02_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_arabertv02_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
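
The Python snippet above assumes `PretrainedPipeline` is already imported and that `df` is an existing DataFrame with a `text` column. A minimal end-to-end sketch, with an illustrative session setup and input sentence that are not part of the original example:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# start (or attach to) a Spark session with Spark NLP available
spark = sparknlp.start()

# any DataFrame with a "text" column can serve as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_arabertv02_pipeline", lang="ar")
annotations = pipeline.transform(df)
annotations.printSchema()  # inspect the columns added by the included pipeline stages
```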
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_arabertv02_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|505.6 MB| + +## References + +https://huggingface.co/aubmindlab/bert-base-arabertv02 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_blbooks_cased_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_blbooks_cased_en.md new file mode 100644 index 00000000000000..d80eed60f0fec9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_blbooks_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_blbooks_cased BertSentenceEmbeddings from bigscience-historical-texts +author: John Snow Labs +name: sent_bert_base_blbooks_cased +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_blbooks_cased` is a English model originally trained by bigscience-historical-texts. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_blbooks_cased_en_5.5.0_3.0_1727157729173.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_blbooks_cased_en_5.5.0_3.0_1727157729173.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_blbooks_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_blbooks_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
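
The Python snippet above assumes the Spark NLP classes and the `spark` session are already in scope. A minimal preamble that would make it self-contained, using the standard Spark NLP import paths (the `sparknlp.start()` call is an assumption about how the session is created):

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

# starts (or attaches to) a Spark session with Spark NLP on the classpath
spark = sparknlp.start()
```

With these in place the example runs as written; `pipelineDF.select("embeddings.embeddings")` then exposes the sentence vectors produced by `sent_bert_base_blbooks_cased`.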
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_blbooks_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|412.3 MB| + +## References + +https://huggingface.co/bigscience-historical-texts/bert-base-blbooks-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_blbooks_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_blbooks_cased_pipeline_en.md new file mode 100644 index 00000000000000..3cb0c43d516dd6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_blbooks_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_blbooks_cased_pipeline pipeline BertSentenceEmbeddings from bigscience-historical-texts +author: John Snow Labs +name: sent_bert_base_blbooks_cased_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_blbooks_cased_pipeline` is a English model originally trained by bigscience-historical-texts. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_blbooks_cased_pipeline_en_5.5.0_3.0_1727157757584.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_blbooks_cased_pipeline_en_5.5.0_3.0_1727157757584.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_blbooks_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_blbooks_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
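
The Python snippet above assumes `PretrainedPipeline` is already imported and that `df` is an existing DataFrame with a `text` column. A minimal end-to-end sketch, with an illustrative session setup and input sentence that are not part of the original example:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# start (or attach to) a Spark session with Spark NLP available
spark = sparknlp.start()

# any DataFrame with a "text" column can serve as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_blbooks_cased_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.printSchema()  # inspect the columns added by the included pipeline stages
```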
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_blbooks_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.9 MB| + +## References + +https://huggingface.co/bigscience-historical-texts/bert-base-blbooks-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_english_french_dutch_russian_arabic_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_english_french_dutch_russian_arabic_cased_pipeline_en.md new file mode 100644 index 00000000000000..e18ace2311611f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_english_french_dutch_russian_arabic_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_french_dutch_russian_arabic_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_french_dutch_russian_arabic_cased_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_french_dutch_russian_arabic_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_french_dutch_russian_arabic_cased_pipeline_en_5.5.0_3.0_1727202207349.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_french_dutch_russian_arabic_cased_pipeline_en_5.5.0_3.0_1727202207349.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_english_french_dutch_russian_arabic_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_english_french_dutch_russian_arabic_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
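
The Python snippet above assumes `PretrainedPipeline` is already imported and that `df` is an existing DataFrame with a `text` column. A minimal end-to-end sketch, with an illustrative session setup and input sentence that are not part of the original example:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# start (or attach to) a Spark session with Spark NLP available
spark = sparknlp.start()

# any DataFrame with a "text" column can serve as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_english_french_dutch_russian_arabic_cased_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.printSchema()  # inspect the columns added by the included pipeline stages
```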
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_french_dutch_russian_arabic_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|462.1 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-fr-nl-ru-ar-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_multilingual_uncased_finetuned_hp_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_multilingual_uncased_finetuned_hp_pipeline_xx.md new file mode 100644 index 00000000000000..3c098590225bbe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_multilingual_uncased_finetuned_hp_pipeline_xx.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Multilingual sent_bert_base_multilingual_uncased_finetuned_hp_pipeline pipeline BertSentenceEmbeddings from rman-rahimi-29 +author: John Snow Labs +name: sent_bert_base_multilingual_uncased_finetuned_hp_pipeline +date: 2024-09-24 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_multilingual_uncased_finetuned_hp_pipeline` is a Multilingual model originally trained by rman-rahimi-29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_uncased_finetuned_hp_pipeline_xx_5.5.0_3.0_1727157727859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_uncased_finetuned_hp_pipeline_xx_5.5.0_3.0_1727157727859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_multilingual_uncased_finetuned_hp_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_multilingual_uncased_finetuned_hp_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
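
The Python snippet above assumes `PretrainedPipeline` is already imported and that `df` is an existing DataFrame with a `text` column. A minimal end-to-end sketch, with an illustrative session setup and input sentence that are not part of the original example:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# start (or attach to) a Spark session with Spark NLP available
spark = sparknlp.start()

# any DataFrame with a "text" column can serve as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_multilingual_uncased_finetuned_hp_pipeline", lang="xx")
annotations = pipeline.transform(df)
annotations.printSchema()  # inspect the columns added by the included pipeline stages
```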
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_multilingual_uncased_finetuned_hp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|626.1 MB| + +## References + +https://huggingface.co/rman-rahimi-29/bert-base-multilingual-uncased-finetuned-hp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_multilingual_uncased_finetuned_hp_xx.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_multilingual_uncased_finetuned_hp_xx.md new file mode 100644 index 00000000000000..f222d396ad4cee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_multilingual_uncased_finetuned_hp_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual sent_bert_base_multilingual_uncased_finetuned_hp BertSentenceEmbeddings from rman-rahimi-29 +author: John Snow Labs +name: sent_bert_base_multilingual_uncased_finetuned_hp +date: 2024-09-24 +tags: [xx, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_multilingual_uncased_finetuned_hp` is a Multilingual model originally trained by rman-rahimi-29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_uncased_finetuned_hp_xx_5.5.0_3.0_1727157695390.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_uncased_finetuned_hp_xx_5.5.0_3.0_1727157695390.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_multilingual_uncased_finetuned_hp","xx") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_multilingual_uncased_finetuned_hp","xx") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
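
The Python snippet above assumes the Spark NLP classes and the `spark` session are already in scope. A minimal preamble that would make it self-contained, using the standard Spark NLP import paths (the `sparknlp.start()` call is an assumption about how the session is created):

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

# starts (or attaches to) a Spark session with Spark NLP on the classpath
spark = sparknlp.start()
```

With these in place the example runs as written; `pipelineDF.select("embeddings.embeddings")` then exposes the sentence vectors produced by `sent_bert_base_multilingual_uncased_finetuned_hp`.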
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_multilingual_uncased_finetuned_hp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|xx| +|Size:|625.5 MB| + +## References + +https://huggingface.co/rman-rahimi-29/bert-base-multilingual-uncased-finetuned-hp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_nli_stsb_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_nli_stsb_en.md new file mode 100644 index 00000000000000..f50e41087ca203 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_nli_stsb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_nli_stsb BertSentenceEmbeddings from binwang +author: John Snow Labs +name: sent_bert_base_nli_stsb +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_nli_stsb` is a English model originally trained by binwang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_nli_stsb_en_5.5.0_3.0_1727202139914.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_nli_stsb_en_5.5.0_3.0_1727202139914.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_nli_stsb","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_nli_stsb","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
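
The Python snippet above assumes the Spark NLP classes and the `spark` session are already in scope. A minimal preamble that would make it self-contained, using the standard Spark NLP import paths (the `sparknlp.start()` call is an assumption about how the session is created):

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

# starts (or attaches to) a Spark session with Spark NLP on the classpath
spark = sparknlp.start()
```

With these in place the example runs as written; `pipelineDF.select("embeddings.embeddings")` then exposes the sentence vectors produced by `sent_bert_base_nli_stsb`.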
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_nli_stsb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/binwang/bert-base-nli-stsb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_2022_habana_test_3_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_2022_habana_test_3_en.md new file mode 100644 index 00000000000000..ab972f82eedca6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_2022_habana_test_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_2022_habana_test_3 BertSentenceEmbeddings from philschmid +author: John Snow Labs +name: sent_bert_base_uncased_2022_habana_test_3 +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_2022_habana_test_3` is a English model originally trained by philschmid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_2022_habana_test_3_en_5.5.0_3.0_1727201932435.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_2022_habana_test_3_en_5.5.0_3.0_1727201932435.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_2022_habana_test_3","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_2022_habana_test_3","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
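
The Python snippet above assumes the Spark NLP classes and the `spark` session are already in scope. A minimal preamble that would make it self-contained, using the standard Spark NLP import paths (the `sparknlp.start()` call is an assumption about how the session is created):

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

# starts (or attaches to) a Spark session with Spark NLP on the classpath
spark = sparknlp.start()
```

With these in place the example runs as written; `pipelineDF.select("embeddings.embeddings")` then exposes the sentence vectors produced by `sent_bert_base_uncased_2022_habana_test_3`.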
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_2022_habana_test_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|412.0 MB| + +## References + +https://huggingface.co/philschmid/bert-base-uncased-2022-habana-test-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_2022_habana_test_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_2022_habana_test_3_pipeline_en.md new file mode 100644 index 00000000000000..8090ab5a2b51c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_2022_habana_test_3_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_2022_habana_test_3_pipeline pipeline BertSentenceEmbeddings from philschmid +author: John Snow Labs +name: sent_bert_base_uncased_2022_habana_test_3_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_2022_habana_test_3_pipeline` is a English model originally trained by philschmid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_2022_habana_test_3_pipeline_en_5.5.0_3.0_1727201954160.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_2022_habana_test_3_pipeline_en_5.5.0_3.0_1727201954160.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_2022_habana_test_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_2022_habana_test_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
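
The Python snippet above assumes `PretrainedPipeline` is already imported and that `df` is an existing DataFrame with a `text` column. A minimal end-to-end sketch, with an illustrative session setup and input sentence that are not part of the original example:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# start (or attach to) a Spark session with Spark NLP available
spark = sparknlp.start()

# any DataFrame with a "text" column can serve as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_uncased_2022_habana_test_3_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.printSchema()  # inspect the columns added by the included pipeline stages
```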
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_2022_habana_test_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.5 MB| + +## References + +https://huggingface.co/philschmid/bert-base-uncased-2022-habana-test-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_finetuned_news_1973_1974_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_finetuned_news_1973_1974_en.md new file mode 100644 index 00000000000000..7807f5e2bec964 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_finetuned_news_1973_1974_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_news_1973_1974 BertSentenceEmbeddings from sally9805 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_news_1973_1974 +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_news_1973_1974` is a English model originally trained by sally9805. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_news_1973_1974_en_5.5.0_3.0_1727157152030.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_news_1973_1974_en_5.5.0_3.0_1727157152030.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_news_1973_1974","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_news_1973_1974","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
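
The Python snippet above assumes the Spark NLP classes and the `spark` session are already in scope. A minimal preamble that would make it self-contained, using the standard Spark NLP import paths (the `sparknlp.start()` call is an assumption about how the session is created):

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

# starts (or attaches to) a Spark session with Spark NLP on the classpath
spark = sparknlp.start()
```

With these in place the example runs as written; `pipelineDF.select("embeddings.embeddings")` then exposes the sentence vectors produced by `sent_bert_base_uncased_finetuned_news_1973_1974`.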
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_news_1973_1974| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sally9805/bert-base-uncased-finetuned-news-1973-1974 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_finetuned_news_1973_1974_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_finetuned_news_1973_1974_pipeline_en.md new file mode 100644 index 00000000000000..233b1211c3df65 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_finetuned_news_1973_1974_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_news_1973_1974_pipeline pipeline BertSentenceEmbeddings from sally9805 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_news_1973_1974_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_news_1973_1974_pipeline` is a English model originally trained by sally9805. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_news_1973_1974_pipeline_en_5.5.0_3.0_1727157172719.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_news_1973_1974_pipeline_en_5.5.0_3.0_1727157172719.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_news_1973_1974_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_news_1973_1974_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
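
The Python snippet above assumes `PretrainedPipeline` is already imported and that `df` is an existing DataFrame with a `text` column. A minimal end-to-end sketch, with an illustrative session setup and input sentence that are not part of the original example:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# start (or attach to) a Spark session with Spark NLP available
spark = sparknlp.start()

# any DataFrame with a "text" column can serve as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_news_1973_1974_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.printSchema()  # inspect the columns added by the included pipeline stages
```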
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_news_1973_1974_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/sally9805/bert-base-uncased-finetuned-news-1973-1974 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_wikitext_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_wikitext_en.md new file mode 100644 index 00000000000000..e00de8895fb53b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_wikitext_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_wikitext BertSentenceEmbeddings from AiresPucrs +author: John Snow Labs +name: sent_bert_base_wikitext +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_wikitext` is a English model originally trained by AiresPucrs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_wikitext_en_5.5.0_3.0_1727157617538.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_wikitext_en_5.5.0_3.0_1727157617538.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_wikitext","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_wikitext","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
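
The Python snippet above assumes the Spark NLP classes and the `spark` session are already in scope. A minimal preamble that would make it self-contained, using the standard Spark NLP import paths (the `sparknlp.start()` call is an assumption about how the session is created):

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

# starts (or attaches to) a Spark session with Spark NLP on the classpath
spark = sparknlp.start()
```

With these in place the example runs as written; `pipelineDF.select("embeddings.embeddings")` then exposes the sentence vectors produced by `sent_bert_base_wikitext`.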
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_wikitext| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.0 MB| + +## References + +https://huggingface.co/AiresPucrs/bert-base-wikitext \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_wikitext_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_wikitext_pipeline_en.md new file mode 100644 index 00000000000000..2629719850ff4e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_wikitext_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_wikitext_pipeline pipeline BertSentenceEmbeddings from AiresPucrs +author: John Snow Labs +name: sent_bert_base_wikitext_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_wikitext_pipeline` is a English model originally trained by AiresPucrs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_wikitext_pipeline_en_5.5.0_3.0_1727157638652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_wikitext_pipeline_en_5.5.0_3.0_1727157638652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_wikitext_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_wikitext_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
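
The Python snippet above assumes `PretrainedPipeline` is already imported and that `df` is an existing DataFrame with a `text` column. A minimal end-to-end sketch, with an illustrative session setup and input sentence that are not part of the original example:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# start (or attach to) a Spark session with Spark NLP available
spark = sparknlp.start()

# any DataFrame with a "text" column can serve as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_wikitext_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.printSchema()  # inspect the columns added by the included pipeline stages
```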
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_wikitext_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.6 MB| + +## References + +https://huggingface.co/AiresPucrs/bert-base-wikitext + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_large_cased_portuguese_lenerbr_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_large_cased_portuguese_lenerbr_pipeline_pt.md new file mode 100644 index 00000000000000..a9ee304c2155c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_large_cased_portuguese_lenerbr_pipeline_pt.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Portuguese sent_bert_large_cased_portuguese_lenerbr_pipeline pipeline BertSentenceEmbeddings from pierreguillou +author: John Snow Labs +name: sent_bert_large_cased_portuguese_lenerbr_pipeline +date: 2024-09-24 +tags: [pt, open_source, pipeline, onnx] +task: Embeddings +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_cased_portuguese_lenerbr_pipeline` is a Portuguese model originally trained by pierreguillou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_portuguese_lenerbr_pipeline_pt_5.5.0_3.0_1727202540280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_portuguese_lenerbr_pipeline_pt_5.5.0_3.0_1727202540280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_large_cased_portuguese_lenerbr_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_large_cased_portuguese_lenerbr_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
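
The Python snippet above assumes `PretrainedPipeline` is already imported and that `df` is an existing DataFrame with a `text` column. A minimal end-to-end sketch, with an illustrative session setup and input sentence that are not part of the original example:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# start (or attach to) a Spark session with Spark NLP available
spark = sparknlp.start()

# any DataFrame with a "text" column can serve as input
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_large_cased_portuguese_lenerbr_pipeline", lang="pt")
annotations = pipeline.transform(df)
annotations.printSchema()  # inspect the columns added by the included pipeline stages
```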
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_cased_portuguese_lenerbr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|1.2 GB| + +## References + +https://huggingface.co/pierreguillou/bert-large-cased-pt-lenerbr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_large_cased_portuguese_lenerbr_pt.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_large_cased_portuguese_lenerbr_pt.md new file mode 100644 index 00000000000000..6061a322bd868d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_large_cased_portuguese_lenerbr_pt.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Portuguese sent_bert_large_cased_portuguese_lenerbr BertSentenceEmbeddings from pierreguillou +author: John Snow Labs +name: sent_bert_large_cased_portuguese_lenerbr +date: 2024-09-24 +tags: [pt, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_cased_portuguese_lenerbr` is a Portuguese model originally trained by pierreguillou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_portuguese_lenerbr_pt_5.5.0_3.0_1727202476652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_portuguese_lenerbr_pt_5.5.0_3.0_1727202476652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_cased_portuguese_lenerbr","pt") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_cased_portuguese_lenerbr","pt") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
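
The Python snippet above assumes the Spark NLP classes and the `spark` session are already in scope. A minimal preamble that would make it self-contained, using the standard Spark NLP import paths (the `sparknlp.start()` call is an assumption about how the session is created):

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

# starts (or attaches to) a Spark session with Spark NLP on the classpath
spark = sparknlp.start()
```

With these in place the example runs as written; `pipelineDF.select("embeddings.embeddings")` then exposes the sentence vectors produced by `sent_bert_large_cased_portuguese_lenerbr`.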
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_cased_portuguese_lenerbr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|pt| +|Size:|1.2 GB| + +## References + +https://huggingface.co/pierreguillou/bert-large-cased-pt-lenerbr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_medium_mlsm_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_medium_mlsm_en.md new file mode 100644 index 00000000000000..49d041ff2a3a00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_medium_mlsm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_medium_mlsm BertSentenceEmbeddings from SzegedAI +author: John Snow Labs +name: sent_bert_medium_mlsm +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_medium_mlsm` is a English model originally trained by SzegedAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_medium_mlsm_en_5.5.0_3.0_1727178506274.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_medium_mlsm_en_5.5.0_3.0_1727178506274.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_medium_mlsm","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_medium_mlsm","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
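
The Python snippet above assumes the Spark NLP classes and the `spark` session are already in scope. A minimal preamble that would make it self-contained, using the standard Spark NLP import paths (the `sparknlp.start()` call is an assumption about how the session is created):

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

# starts (or attaches to) a Spark session with Spark NLP on the classpath
spark = sparknlp.start()
```

With these in place the example runs as written; `pipelineDF.select("embeddings.embeddings")` then exposes the sentence vectors produced by `sent_bert_medium_mlsm`.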
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_medium_mlsm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|157.1 MB| + +## References + +https://huggingface.co/SzegedAI/bert-medium-mlsm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_medium_mlsm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_medium_mlsm_pipeline_en.md new file mode 100644 index 00000000000000..3749aa035da864 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_medium_mlsm_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_medium_mlsm_pipeline pipeline BertSentenceEmbeddings from SzegedAI +author: John Snow Labs +name: sent_bert_medium_mlsm_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_medium_mlsm_pipeline` is a English model originally trained by SzegedAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_medium_mlsm_pipeline_en_5.5.0_3.0_1727178514079.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_medium_mlsm_pipeline_en_5.5.0_3.0_1727178514079.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_medium_mlsm_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_medium_mlsm_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
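Besides `transform`, `PretrainedPipeline` can also annotate a single string without building a DataFrame. A minimal sketch, reusing the `pipeline` object above (the returned keys are indicative, not verified for this specific pipeline):

```python
# Runs the downloaded pipeline on one string and returns plain Python lists per output column
light_result = pipeline.annotate("I love spark-nlp")
print(sorted(light_result.keys()))
```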
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_medium_mlsm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|157.7 MB| + +## References + +https://huggingface.co/SzegedAI/bert-medium-mlsm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_multilingial_geolocation_prediction_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_multilingial_geolocation_prediction_pipeline_en.md new file mode 100644 index 00000000000000..1000c44cd006f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_multilingial_geolocation_prediction_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_multilingial_geolocation_prediction_pipeline pipeline BertSentenceEmbeddings from k4tel +author: John Snow Labs +name: sent_bert_multilingial_geolocation_prediction_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_multilingial_geolocation_prediction_pipeline` is a English model originally trained by k4tel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_multilingial_geolocation_prediction_pipeline_en_5.5.0_3.0_1727157396347.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_multilingial_geolocation_prediction_pipeline_en_5.5.0_3.0_1727157396347.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_multilingial_geolocation_prediction_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_multilingial_geolocation_prediction_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_multilingial_geolocation_prediction_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|663.8 MB| + +## References + +https://huggingface.co/k4tel/bert-multilingial-geolocation-prediction + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_persian_farsi_base_uncased_finetuned_parsbert_fa.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_persian_farsi_base_uncased_finetuned_parsbert_fa.md new file mode 100644 index 00000000000000..280cf90db64cc9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_persian_farsi_base_uncased_finetuned_parsbert_fa.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Persian sent_bert_persian_farsi_base_uncased_finetuned_parsbert BertSentenceEmbeddings from Yasamansaffari73 +author: John Snow Labs +name: sent_bert_persian_farsi_base_uncased_finetuned_parsbert +date: 2024-09-24 +tags: [fa, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_persian_farsi_base_uncased_finetuned_parsbert` is a Persian model originally trained by Yasamansaffari73. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_persian_farsi_base_uncased_finetuned_parsbert_fa_5.5.0_3.0_1727178534063.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_persian_farsi_base_uncased_finetuned_parsbert_fa_5.5.0_3.0_1727178534063.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_persian_farsi_base_uncased_finetuned_parsbert","fa") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_persian_farsi_base_uncased_finetuned_parsbert","fa") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_persian_farsi_base_uncased_finetuned_parsbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|fa| +|Size:|606.5 MB| + +## References + +https://huggingface.co/Yasamansaffari73/bert-fa-base-uncased-finetuned-ParsBert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline_fa.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline_fa.md new file mode 100644 index 00000000000000..79de17e25f8508 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline_fa.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Persian sent_bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline pipeline BertSentenceEmbeddings from Yasamansaffari73 +author: John Snow Labs +name: sent_bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline +date: 2024-09-24 +tags: [fa, open_source, pipeline, onnx] +task: Embeddings +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline` is a Persian model originally trained by Yasamansaffari73. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline_fa_5.5.0_3.0_1727178564218.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline_fa_5.5.0_3.0_1727178564218.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline", lang = "fa")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline", lang = "fa")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fa| +|Size:|607.1 MB| + +## References + +https://huggingface.co/Yasamansaffari73/bert-fa-base-uncased-finetuned-ParsBert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bertinho_galician_small_cased_gl.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bertinho_galician_small_cased_gl.md new file mode 100644 index 00000000000000..6b21d62bf8d16d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bertinho_galician_small_cased_gl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Galician sent_bertinho_galician_small_cased BertSentenceEmbeddings from dvilares +author: John Snow Labs +name: sent_bertinho_galician_small_cased +date: 2024-09-24 +tags: [gl, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: gl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertinho_galician_small_cased` is a Galician model originally trained by dvilares. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertinho_galician_small_cased_gl_5.5.0_3.0_1727178498873.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertinho_galician_small_cased_gl_5.5.0_3.0_1727178498873.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bertinho_galician_small_cased","gl") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bertinho_galician_small_cased","gl") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertinho_galician_small_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|gl| +|Size:|245.8 MB| + +## References + +https://huggingface.co/dvilares/bertinho-gl-small-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bertinho_galician_small_cased_pipeline_gl.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bertinho_galician_small_cased_pipeline_gl.md new file mode 100644 index 00000000000000..e1ecaf05ea9ffe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bertinho_galician_small_cased_pipeline_gl.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Galician sent_bertinho_galician_small_cased_pipeline pipeline BertSentenceEmbeddings from dvilares +author: John Snow Labs +name: sent_bertinho_galician_small_cased_pipeline +date: 2024-09-24 +tags: [gl, open_source, pipeline, onnx] +task: Embeddings +language: gl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertinho_galician_small_cased_pipeline` is a Galician model originally trained by dvilares. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertinho_galician_small_cased_pipeline_gl_5.5.0_3.0_1727178511734.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertinho_galician_small_cased_pipeline_gl_5.5.0_3.0_1727178511734.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bertinho_galician_small_cased_pipeline", lang = "gl")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bertinho_galician_small_cased_pipeline", lang = "gl")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertinho_galician_small_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|gl| +|Size:|246.4 MB| + +## References + +https://huggingface.co/dvilares/bertinho-gl-small-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_danish_bert_iolariu_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_danish_bert_iolariu_en.md new file mode 100644 index 00000000000000..0010ac240fd067 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_danish_bert_iolariu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_danish_bert_iolariu BertSentenceEmbeddings from iolariu +author: John Snow Labs +name: sent_danish_bert_iolariu +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_danish_bert_iolariu` is a English model originally trained by iolariu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_danish_bert_iolariu_en_5.5.0_3.0_1727157488463.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_danish_bert_iolariu_en_5.5.0_3.0_1727157488463.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_danish_bert_iolariu","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_danish_bert_iolariu","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_danish_bert_iolariu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|406.0 MB| + +## References + +https://huggingface.co/iolariu/DA_BERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_danish_bert_iolariu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_danish_bert_iolariu_pipeline_en.md new file mode 100644 index 00000000000000..7c119c34b5efa3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_danish_bert_iolariu_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_danish_bert_iolariu_pipeline pipeline BertSentenceEmbeddings from iolariu +author: John Snow Labs +name: sent_danish_bert_iolariu_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_danish_bert_iolariu_pipeline` is a English model originally trained by iolariu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_danish_bert_iolariu_pipeline_en_5.5.0_3.0_1727157509892.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_danish_bert_iolariu_pipeline_en_5.5.0_3.0_1727157509892.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_danish_bert_iolariu_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_danish_bert_iolariu_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_danish_bert_iolariu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.6 MB| + +## References + +https://huggingface.co/iolariu/DA_BERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_furina_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_furina_en.md new file mode 100644 index 00000000000000..ec723ca77d0aeb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_furina_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_furina XlmRoBertaSentenceEmbeddings from yihongLiu +author: John Snow Labs +name: sent_furina +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, xlm_roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_furina` is a English model originally trained by yihongLiu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_furina_en_5.5.0_3.0_1727205845902.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_furina_en_5.5.0_3.0_1727205845902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_furina","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_furina","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
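Sentence embeddings like the ones produced above are often used for similarity. A minimal sketch of comparing two sentences with the `pipelineModel` fitted above (the sentence pair and variable names are illustrative, and numpy is assumed to be available):

```python
import numpy as np

# Hypothetical two-sentence input, reusing the fitted pipeline from the snippet above
pairs = spark.createDataFrame([["I love Spark NLP"], ["Spark NLP is great"]]).toDF("text")
vecs = (pipelineModel.transform(pairs)
        .selectExpr("explode(embeddings.embeddings) as vec")
        .collect())

# One vector per sentence; cosine similarity between the two
a, b = (np.array(row.vec) for row in vecs)
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {cosine:.4f}")
```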
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_furina| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/yihongLiu/furina \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_furina_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_furina_pipeline_en.md new file mode 100644 index 00000000000000..e909075711fe2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_furina_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_furina_pipeline pipeline XlmRoBertaSentenceEmbeddings from yihongLiu +author: John Snow Labs +name: sent_furina_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_furina_pipeline` is a English model originally trained by yihongLiu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_furina_pipeline_en_5.5.0_3.0_1727205926922.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_furina_pipeline_en_5.5.0_3.0_1727205926922.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_furina_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_furina_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_furina_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/yihongLiu/furina + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- XlmRoBertaSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_guaran_bert_tiny_cased_gn.md b/docs/_posts/ahmedlone127/2024-09-24-sent_guaran_bert_tiny_cased_gn.md new file mode 100644 index 00000000000000..d70eba96267028 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_guaran_bert_tiny_cased_gn.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Guarani sent_guaran_bert_tiny_cased BertSentenceEmbeddings from mmaguero +author: John Snow Labs +name: sent_guaran_bert_tiny_cased +date: 2024-09-24 +tags: [gn, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: gn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_guaran_bert_tiny_cased` is a Guarani model originally trained by mmaguero. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_guaran_bert_tiny_cased_gn_5.5.0_3.0_1727157602791.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_guaran_bert_tiny_cased_gn_5.5.0_3.0_1727157602791.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_guaran_bert_tiny_cased","gn") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_guaran_bert_tiny_cased","gn") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_guaran_bert_tiny_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|gn| +|Size:|34.5 MB| + +## References + +https://huggingface.co/mmaguero/gn-bert-tiny-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_guaran_bert_tiny_cased_pipeline_gn.md b/docs/_posts/ahmedlone127/2024-09-24-sent_guaran_bert_tiny_cased_pipeline_gn.md new file mode 100644 index 00000000000000..007e42c4cbd052 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_guaran_bert_tiny_cased_pipeline_gn.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Guarani sent_guaran_bert_tiny_cased_pipeline pipeline BertSentenceEmbeddings from mmaguero +author: John Snow Labs +name: sent_guaran_bert_tiny_cased_pipeline +date: 2024-09-24 +tags: [gn, open_source, pipeline, onnx] +task: Embeddings +language: gn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_guaran_bert_tiny_cased_pipeline` is a Guarani model originally trained by mmaguero. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_guaran_bert_tiny_cased_pipeline_gn_5.5.0_3.0_1727157605105.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_guaran_bert_tiny_cased_pipeline_gn_5.5.0_3.0_1727157605105.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_guaran_bert_tiny_cased_pipeline", lang = "gn")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_guaran_bert_tiny_cased_pipeline", lang = "gn")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_guaran_bert_tiny_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|gn| +|Size:|35.0 MB| + +## References + +https://huggingface.co/mmaguero/gn-bert-tiny-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_hm_model001_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_hm_model001_en.md new file mode 100644 index 00000000000000..9d8acf7ddf4fec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_hm_model001_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_hm_model001 BertSentenceEmbeddings from FAN-L +author: John Snow Labs +name: sent_hm_model001 +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hm_model001` is a English model originally trained by FAN-L. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hm_model001_en_5.5.0_3.0_1727178735714.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hm_model001_en_5.5.0_3.0_1727178735714.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_hm_model001","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_hm_model001","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hm_model001| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/FAN-L/HM_model001 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_jaberv2_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_jaberv2_en.md new file mode 100644 index 00000000000000..b46295b566d940 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_jaberv2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_jaberv2 BertSentenceEmbeddings from huawei-noah +author: John Snow Labs +name: sent_jaberv2 +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_jaberv2` is a English model originally trained by huawei-noah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_jaberv2_en_5.5.0_3.0_1727157347613.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_jaberv2_en_5.5.0_3.0_1727157347613.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_jaberv2","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_jaberv2","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
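If downstream Spark ML stages need plain vectors rather than annotation structs, an `EmbeddingsFinisher` stage can be appended to the pipeline above. A minimal sketch, assuming the `documentAssembler`, `sentenceDL`, `embeddings`, and `data` objects defined in the Python snippet above:

```python
from sparknlp.base import EmbeddingsFinisher
from pyspark.ml import Pipeline

# Converts the "embeddings" annotation column into Spark ML vectors
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings, finisher])
result = pipeline.fit(data).transform(data)
result.select("finished_embeddings").show(truncate=False)
```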
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_jaberv2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|504.8 MB| + +## References + +https://huggingface.co/huawei-noah/JABERv2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_jaberv2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_jaberv2_pipeline_en.md new file mode 100644 index 00000000000000..8043b2ba6c8a2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_jaberv2_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_jaberv2_pipeline pipeline BertSentenceEmbeddings from huawei-noah +author: John Snow Labs +name: sent_jaberv2_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_jaberv2_pipeline` is a English model originally trained by huawei-noah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_jaberv2_pipeline_en_5.5.0_3.0_1727157373264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_jaberv2_pipeline_en_5.5.0_3.0_1727157373264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_jaberv2_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_jaberv2_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_jaberv2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|505.4 MB| + +## References + +https://huggingface.co/huawei-noah/JABERv2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_less_300000_xlm_roberta_mmar_recipe_10_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_less_300000_xlm_roberta_mmar_recipe_10_en.md new file mode 100644 index 00000000000000..8d1b6386d8a74c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_less_300000_xlm_roberta_mmar_recipe_10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_less_300000_xlm_roberta_mmar_recipe_10 XlmRoBertaSentenceEmbeddings from CennetOguz +author: John Snow Labs +name: sent_less_300000_xlm_roberta_mmar_recipe_10 +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, xlm_roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_less_300000_xlm_roberta_mmar_recipe_10` is a English model originally trained by CennetOguz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_less_300000_xlm_roberta_mmar_recipe_10_en_5.5.0_3.0_1727205527478.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_less_300000_xlm_roberta_mmar_recipe_10_en_5.5.0_3.0_1727205527478.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_less_300000_xlm_roberta_mmar_recipe_10","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_less_300000_xlm_roberta_mmar_recipe_10","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_less_300000_xlm_roberta_mmar_recipe_10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/CennetOguz/less_300000_xlm_roberta_mmar_recipe_10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_less_300000_xlm_roberta_mmar_recipe_10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_less_300000_xlm_roberta_mmar_recipe_10_pipeline_en.md new file mode 100644 index 00000000000000..ef59e36c7042a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_less_300000_xlm_roberta_mmar_recipe_10_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_less_300000_xlm_roberta_mmar_recipe_10_pipeline pipeline XlmRoBertaSentenceEmbeddings from CennetOguz +author: John Snow Labs +name: sent_less_300000_xlm_roberta_mmar_recipe_10_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_less_300000_xlm_roberta_mmar_recipe_10_pipeline` is a English model originally trained by CennetOguz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_less_300000_xlm_roberta_mmar_recipe_10_pipeline_en_5.5.0_3.0_1727205583101.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_less_300000_xlm_roberta_mmar_recipe_10_pipeline_en_5.5.0_3.0_1727205583101.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_less_300000_xlm_roberta_mmar_recipe_10_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_less_300000_xlm_roberta_mmar_recipe_10_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_less_300000_xlm_roberta_mmar_recipe_10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/CennetOguz/less_300000_xlm_roberta_mmar_recipe_10 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- XlmRoBertaSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_s3_v1_20_epochs_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_s3_v1_20_epochs_en.md new file mode 100644 index 00000000000000..c699e07a46a6aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_s3_v1_20_epochs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_s3_v1_20_epochs BertSentenceEmbeddings from AethiQs-Max +author: John Snow Labs +name: sent_s3_v1_20_epochs +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_s3_v1_20_epochs` is a English model originally trained by AethiQs-Max. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_s3_v1_20_epochs_en_5.5.0_3.0_1727202409357.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_s3_v1_20_epochs_en_5.5.0_3.0_1727202409357.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_s3_v1_20_epochs","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_s3_v1_20_epochs","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
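A fitted pipeline like the one above can be persisted and reloaded with the standard Spark ML API, which avoids re-downloading the pretrained stages on every run. A minimal sketch, reusing `pipelineModel` and `data` from the snippet above (the path is a hypothetical local directory; any writable filesystem or HDFS/S3 URI works):

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline once, then reload it in later sessions
pipelineModel.write().overwrite().save("/tmp/sent_s3_v1_20_epochs_pipeline_model")
restored = PipelineModel.load("/tmp/sent_s3_v1_20_epochs_pipeline_model")
restored.transform(data).select("embeddings").show(truncate=False)
```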
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_s3_v1_20_epochs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/AethiQs-Max/s3-v1-20_epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_s3_v1_20_epochs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_s3_v1_20_epochs_pipeline_en.md new file mode 100644 index 00000000000000..468ed0070d6a61 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_s3_v1_20_epochs_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_s3_v1_20_epochs_pipeline pipeline BertSentenceEmbeddings from AethiQs-Max +author: John Snow Labs +name: sent_s3_v1_20_epochs_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_s3_v1_20_epochs_pipeline` is a English model originally trained by AethiQs-Max. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_s3_v1_20_epochs_pipeline_en_5.5.0_3.0_1727202430246.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_s3_v1_20_epochs_pipeline_en_5.5.0_3.0_1727202430246.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_s3_v1_20_epochs_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_s3_v1_20_epochs_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_s3_v1_20_epochs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.4 MB| + +## References + +https://huggingface.co/AethiQs-Max/s3-v1-20_epochs + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_turkish_mini_bert_uncased_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-24-sent_turkish_mini_bert_uncased_pipeline_tr.md new file mode 100644 index 00000000000000..c38c57cb803312 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_turkish_mini_bert_uncased_pipeline_tr.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Turkish sent_turkish_mini_bert_uncased_pipeline pipeline BertSentenceEmbeddings from ytu-ce-cosmos +author: John Snow Labs +name: sent_turkish_mini_bert_uncased_pipeline +date: 2024-09-24 +tags: [tr, open_source, pipeline, onnx] +task: Embeddings +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_turkish_mini_bert_uncased_pipeline` is a Turkish model originally trained by ytu-ce-cosmos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_turkish_mini_bert_uncased_pipeline_tr_5.5.0_3.0_1727202577142.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_turkish_mini_bert_uncased_pipeline_tr_5.5.0_3.0_1727202577142.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Input DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_turkish_mini_bert_uncased_pipeline", lang = "tr")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Input DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_turkish_mini_bert_uncased_pipeline", lang = "tr")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_turkish_mini_bert_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|43.8 MB| + +## References + +https://huggingface.co/ytu-ce-cosmos/turkish-mini-bert-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_en.md new file mode 100644 index 00000000000000..e5a14d4fa671fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned XlmRoBertaSentenceEmbeddings from RogerB +author: John Snow Labs +name: sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, xlm_roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_en_5.5.0_3.0_1727205637920.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_en_5.5.0_3.0_1727205637920.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/RogerB/xlm-roberta-base-finetuned-kinyarwanda-kin-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..9eeb31a0cc744f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_pipeline pipeline XlmRoBertaSentenceEmbeddings from RogerB +author: John Snow Labs +name: sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_pipeline` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_pipeline_en_5.5.0_3.0_1727205705111.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_pipeline_en_5.5.0_3.0_1727205705111.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline + +pipeline = PretrainedPipeline("sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_pipeline", lang = "en") +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val pipeline = new PretrainedPipeline("sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_pipeline", lang = "en") +val df = Seq("I love spark-nlp").toDF("text") +val annotations = pipeline.transform(df) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/RogerB/xlm-roberta-base-finetuned-kinyarwanda-kin-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- XlmRoBertaSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_v_base_trimmed_italian_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_v_base_trimmed_italian_en.md new file mode 100644 index 00000000000000..9a33209fdcf372 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_v_base_trimmed_italian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_xlm_v_base_trimmed_italian XlmRoBertaSentenceEmbeddings from vocabtrimmer +author: John Snow Labs +name: sent_xlm_v_base_trimmed_italian +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, xlm_roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_xlm_v_base_trimmed_italian` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_xlm_v_base_trimmed_italian_en_5.5.0_3.0_1727205485563.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_xlm_v_base_trimmed_italian_en_5.5.0_3.0_1727205485563.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_xlm_v_base_trimmed_italian","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_xlm_v_base_trimmed_italian","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_xlm_v_base_trimmed_italian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|526.3 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-v-base-trimmed-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_v_base_trimmed_italian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_v_base_trimmed_italian_pipeline_en.md new file mode 100644 index 00000000000000..0957f778649f25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_v_base_trimmed_italian_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_xlm_v_base_trimmed_italian_pipeline pipeline XlmRoBertaSentenceEmbeddings from vocabtrimmer +author: John Snow Labs +name: sent_xlm_v_base_trimmed_italian_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_xlm_v_base_trimmed_italian_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_xlm_v_base_trimmed_italian_pipeline_en_5.5.0_3.0_1727205639190.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_xlm_v_base_trimmed_italian_pipeline_en_5.5.0_3.0_1727205639190.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline + +pipeline = PretrainedPipeline("sent_xlm_v_base_trimmed_italian_pipeline", lang = "en") +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val pipeline = new PretrainedPipeline("sent_xlm_v_base_trimmed_italian_pipeline", lang = "en") +val df = Seq("I love spark-nlp").toDF("text") +val annotations = pipeline.transform(df) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_xlm_v_base_trimmed_italian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|526.8 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-v-base-trimmed-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- XlmRoBertaSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_v_base_trimmed_portuguese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_v_base_trimmed_portuguese_pipeline_en.md new file mode 100644 index 00000000000000..a7b5329fe3c91d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_v_base_trimmed_portuguese_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_xlm_v_base_trimmed_portuguese_pipeline pipeline XlmRoBertaSentenceEmbeddings from vocabtrimmer +author: John Snow Labs +name: sent_xlm_v_base_trimmed_portuguese_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_xlm_v_base_trimmed_portuguese_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_xlm_v_base_trimmed_portuguese_pipeline_en_5.5.0_3.0_1727205808735.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_xlm_v_base_trimmed_portuguese_pipeline_en_5.5.0_3.0_1727205808735.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline + +pipeline = PretrainedPipeline("sent_xlm_v_base_trimmed_portuguese_pipeline", lang = "en") +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val pipeline = new PretrainedPipeline("sent_xlm_v_base_trimmed_portuguese_pipeline", lang = "en") +val df = Seq("I love spark-nlp").toDF("text") +val annotations = pipeline.transform(df) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_xlm_v_base_trimmed_portuguese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|520.9 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-v-base-trimmed-pt + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- XlmRoBertaSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_model_mahmoud8_en.md b/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_model_mahmoud8_en.md new file mode 100644 index 00000000000000..6ef38a67352f35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_model_mahmoud8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_model_mahmoud8 DistilBertForSequenceClassification from Mahmoud8 +author: John Snow Labs +name: sentiment_analysis_model_mahmoud8 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_model_mahmoud8` is a English model originally trained by Mahmoud8. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_mahmoud8_en_5.5.0_3.0_1727154821588.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_mahmoud8_en_5.5.0_3.0_1727154821588.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_model_mahmoud8","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_model_mahmoud8", "en") + .setInputCols(Array("document","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +</div>
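+As an illustrative sketch on top of the `pipelineDF` built above (not shown in the original card), the predicted label can be read back from the `class` column, whose annotations keep the label in their `result` field. + +```python +pipelineDF.select("text", "class.result").show(truncate=False) +```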
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_model_mahmoud8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Mahmoud8/sentiment_analysis_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_model_mahmoud8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_model_mahmoud8_pipeline_en.md new file mode 100644 index 00000000000000..1f24d1a6c6a6fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_model_mahmoud8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_model_mahmoud8_pipeline pipeline DistilBertForSequenceClassification from Mahmoud8 +author: John Snow Labs +name: sentiment_analysis_model_mahmoud8_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_model_mahmoud8_pipeline` is a English model originally trained by Mahmoud8. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_mahmoud8_pipeline_en_5.5.0_3.0_1727154834425.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_mahmoud8_pipeline_en_5.5.0_3.0_1727154834425.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline + +pipeline = PretrainedPipeline("sentiment_analysis_model_mahmoud8_pipeline", lang = "en") +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val pipeline = new PretrainedPipeline("sentiment_analysis_model_mahmoud8_pipeline", lang = "en") +val df = Seq("I love spark-nlp").toDF("text") +val annotations = pipeline.transform(df) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_model_mahmoud8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Mahmoud8/sentiment_analysis_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_with_distilbert_en.md b/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_with_distilbert_en.md new file mode 100644 index 00000000000000..88852b01cf048d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_with_distilbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_with_distilbert DistilBertForSequenceClassification from hdv2709 +author: John Snow Labs +name: sentiment_analysis_with_distilbert +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_with_distilbert` is a English model originally trained by hdv2709. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_with_distilbert_en_5.5.0_3.0_1727137046808.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_with_distilbert_en_5.5.0_3.0_1727137046808.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_with_distilbert","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_with_distilbert", "en") + .setInputCols(Array("document","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_with_distilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hdv2709/sentiment_analysis_with_DistilBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_with_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_with_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..353da1286909f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_with_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_with_distilbert_pipeline pipeline DistilBertForSequenceClassification from hdv2709 +author: John Snow Labs +name: sentiment_analysis_with_distilbert_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_with_distilbert_pipeline` is a English model originally trained by hdv2709. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_with_distilbert_pipeline_en_5.5.0_3.0_1727137059738.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_with_distilbert_pipeline_en_5.5.0_3.0_1727137059738.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline + +pipeline = PretrainedPipeline("sentiment_analysis_with_distilbert_pipeline", lang = "en") +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val pipeline = new PretrainedPipeline("sentiment_analysis_with_distilbert_pipeline", lang = "en") +val df = Seq("I love spark-nlp").toDF("text") +val annotations = pipeline.transform(df) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_with_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hdv2709/sentiment_analysis_with_DistilBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-services_ucacue_bryansagbay_en.md b/docs/_posts/ahmedlone127/2024-09-24-services_ucacue_bryansagbay_en.md new file mode 100644 index 00000000000000..916048d7fa54b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-services_ucacue_bryansagbay_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English services_ucacue_bryansagbay RoBertaForSequenceClassification from BryanSagbay +author: John Snow Labs +name: services_ucacue_bryansagbay +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`services_ucacue_bryansagbay` is a English model originally trained by BryanSagbay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/services_ucacue_bryansagbay_en_5.5.0_3.0_1727171716556.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/services_ucacue_bryansagbay_en_5.5.0_3.0_1727171716556.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("services_ucacue_bryansagbay","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("services_ucacue_bryansagbay", "en") + .setInputCols(Array("document","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|services_ucacue_bryansagbay| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|445.8 MB| + +## References + +https://huggingface.co/BryanSagbay/services-ucacue \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sesgo_genero_model_en.md b/docs/_posts/ahmedlone127/2024-09-24-sesgo_genero_model_en.md new file mode 100644 index 00000000000000..739ae8b193240a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sesgo_genero_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sesgo_genero_model RoBertaForSequenceClassification from bonzo1971 +author: John Snow Labs +name: sesgo_genero_model +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sesgo_genero_model` is a English model originally trained by bonzo1971. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sesgo_genero_model_en_5.5.0_3.0_1727171127915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sesgo_genero_model_en_5.5.0_3.0_1727171127915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("sesgo_genero_model","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sesgo_genero_model", "en") + .setInputCols(Array("document","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sesgo_genero_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|408.3 MB| + +## References + +https://huggingface.co/bonzo1971/sesgo_genero_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sesgo_genero_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sesgo_genero_model_pipeline_en.md new file mode 100644 index 00000000000000..c9092cb4a3d543 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sesgo_genero_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sesgo_genero_model_pipeline pipeline RoBertaForSequenceClassification from bonzo1971 +author: John Snow Labs +name: sesgo_genero_model_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sesgo_genero_model_pipeline` is a English model originally trained by bonzo1971. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sesgo_genero_model_pipeline_en_5.5.0_3.0_1727171148443.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sesgo_genero_model_pipeline_en_5.5.0_3.0_1727171148443.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline + +pipeline = PretrainedPipeline("sesgo_genero_model_pipeline", lang = "en") +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val pipeline = new PretrainedPipeline("sesgo_genero_model_pipeline", lang = "en") +val df = Seq("I love spark-nlp").toDF("text") +val annotations = pipeline.transform(df) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sesgo_genero_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.3 MB| + +## References + +https://huggingface.co/bonzo1971/sesgo_genero_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sgppellow_en.md b/docs/_posts/ahmedlone127/2024-09-24-sgppellow_en.md new file mode 100644 index 00000000000000..00d288cd61abbe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sgppellow_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sgppellow RoBertaForSequenceClassification from SGPPellow +author: John Snow Labs +name: sgppellow +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sgppellow` is a English model originally trained by SGPPellow. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sgppellow_en_5.5.0_3.0_1727171053996.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sgppellow_en_5.5.0_3.0_1727171053996.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("sgppellow","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sgppellow", "en") + .setInputCols(Array("document","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sgppellow| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|416.2 MB| + +## References + +https://huggingface.co/SGPPellow/SGPPellow \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sgppellow_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sgppellow_pipeline_en.md new file mode 100644 index 00000000000000..15dee56749bd7b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sgppellow_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sgppellow_pipeline pipeline RoBertaForSequenceClassification from SGPPellow +author: John Snow Labs +name: sgppellow_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sgppellow_pipeline` is a English model originally trained by SGPPellow. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sgppellow_pipeline_en_5.5.0_3.0_1727171097734.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sgppellow_pipeline_en_5.5.0_3.0_1727171097734.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline + +pipeline = PretrainedPipeline("sgppellow_pipeline", lang = "en") +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val pipeline = new PretrainedPipeline("sgppellow_pipeline", lang = "en") +val df = Seq("I love spark-nlp").toDF("text") +val annotations = pipeline.transform(df) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sgppellow_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|416.2 MB| + +## References + +https://huggingface.co/SGPPellow/SGPPellow + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-spanish_sentiment_model_pysentiment_en.md b/docs/_posts/ahmedlone127/2024-09-24-spanish_sentiment_model_pysentiment_en.md new file mode 100644 index 00000000000000..9fc36b7544f0cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-spanish_sentiment_model_pysentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English spanish_sentiment_model_pysentiment RoBertaForSequenceClassification from der-emmanuel +author: John Snow Labs +name: spanish_sentiment_model_pysentiment +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spanish_sentiment_model_pysentiment` is a English model originally trained by der-emmanuel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spanish_sentiment_model_pysentiment_en_5.5.0_3.0_1727167265977.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spanish_sentiment_model_pysentiment_en_5.5.0_3.0_1727167265977.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("spanish_sentiment_model_pysentiment","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("spanish_sentiment_model_pysentiment", "en") + .setInputCols(Array("document","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spanish_sentiment_model_pysentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|408.3 MB| + +## References + +https://huggingface.co/der-emmanuel/es-sentiment-model-pysentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-spanish_sentiment_model_pysentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-spanish_sentiment_model_pysentiment_pipeline_en.md new file mode 100644 index 00000000000000..e18a1e2dc98e8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-spanish_sentiment_model_pysentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English spanish_sentiment_model_pysentiment_pipeline pipeline RoBertaForSequenceClassification from der-emmanuel +author: John Snow Labs +name: spanish_sentiment_model_pysentiment_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spanish_sentiment_model_pysentiment_pipeline` is a English model originally trained by der-emmanuel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spanish_sentiment_model_pysentiment_pipeline_en_5.5.0_3.0_1727167286848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spanish_sentiment_model_pysentiment_pipeline_en_5.5.0_3.0_1727167286848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline + +pipeline = PretrainedPipeline("spanish_sentiment_model_pysentiment_pipeline", lang = "en") +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val pipeline = new PretrainedPipeline("spanish_sentiment_model_pysentiment_pipeline", lang = "en") +val df = Seq("I love spark-nlp").toDF("text") +val annotations = pipeline.transform(df) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spanish_sentiment_model_pysentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.3 MB| + +## References + +https://huggingface.co/der-emmanuel/es-sentiment-model-pysentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-spillage_distilbert_base_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-24-spillage_distilbert_base_uncased_en.md new file mode 100644 index 00000000000000..1a8851d67586fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-spillage_distilbert_base_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English spillage_distilbert_base_uncased DistilBertForSequenceClassification from chuuhtetnaing +author: John Snow Labs +name: spillage_distilbert_base_uncased +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spillage_distilbert_base_uncased` is a English model originally trained by chuuhtetnaing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spillage_distilbert_base_uncased_en_5.5.0_3.0_1727164741410.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spillage_distilbert_base_uncased_en_5.5.0_3.0_1727164741410.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("spillage_distilbert_base_uncased","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("spillage_distilbert_base_uncased", "en") + .setInputCols(Array("document","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spillage_distilbert_base_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chuuhtetnaing/spillage-distilbert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-spillage_distilbert_base_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-spillage_distilbert_base_uncased_pipeline_en.md new file mode 100644 index 00000000000000..be83df8fa1fc97 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-spillage_distilbert_base_uncased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English spillage_distilbert_base_uncased_pipeline pipeline DistilBertForSequenceClassification from chuuhtetnaing +author: John Snow Labs +name: spillage_distilbert_base_uncased_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spillage_distilbert_base_uncased_pipeline` is a English model originally trained by chuuhtetnaing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spillage_distilbert_base_uncased_pipeline_en_5.5.0_3.0_1727164756348.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spillage_distilbert_base_uncased_pipeline_en_5.5.0_3.0_1727164756348.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline + +pipeline = PretrainedPipeline("spillage_distilbert_base_uncased_pipeline", lang = "en") +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val pipeline = new PretrainedPipeline("spillage_distilbert_base_uncased_pipeline", lang = "en") +val df = Seq("I love spark-nlp").toDF("text") +val annotations = pipeline.transform(df) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spillage_distilbert_base_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chuuhtetnaing/spillage-distilbert-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-squeezebert_uncased_finetuned_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-squeezebert_uncased_finetuned_squad_pipeline_en.md new file mode 100644 index 00000000000000..791ffde689e51c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-squeezebert_uncased_finetuned_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English squeezebert_uncased_finetuned_squad_pipeline pipeline BertForQuestionAnswering from SupriyaArun +author: John Snow Labs +name: squeezebert_uncased_finetuned_squad_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`squeezebert_uncased_finetuned_squad_pipeline` is a English model originally trained by SupriyaArun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/squeezebert_uncased_finetuned_squad_pipeline_en_5.5.0_3.0_1727206792798.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/squeezebert_uncased_finetuned_squad_pipeline_en_5.5.0_3.0_1727206792798.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("squeezebert_uncased_finetuned_squad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("squeezebert_uncased_finetuned_squad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
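+Note that this pipeline wraps a MultiDocumentAssembler and a BertForQuestionAnswering stage (see Included Models below), so `df` should supply a question and a context rather than a single `text` column. The sketch below is illustrative only: the `question`/`context` input columns and the `answer` output column are assumptions based on standard Spark NLP question-answering usage, not taken from this card. + +```python +from sparknlp.pretrained import PretrainedPipeline + +pipeline = PretrainedPipeline("squeezebert_uncased_finetuned_squad_pipeline", lang = "en") +df = spark.createDataFrame([["What is Spark NLP?", "Spark NLP is an open-source NLP library built on Apache Spark."]]).toDF("question", "context") +pipeline.transform(df).select("answer.result").show(truncate=False) +```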
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|squeezebert_uncased_finetuned_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|187.4 MB| + +## References + +https://huggingface.co/SupriyaArun/squeezebert-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sroberta_hr.md b/docs/_posts/ahmedlone127/2024-09-24-sroberta_hr.md new file mode 100644 index 00000000000000..b921b01841f2fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sroberta_hr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Croatian sroberta RoBertaEmbeddings from Andrija +author: John Snow Labs +name: sroberta +date: 2024-09-24 +tags: [hr, open_source, onnx, embeddings, roberta] +task: Embeddings +language: hr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sroberta` is a Croatian model originally trained by Andrija. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sroberta_hr_5.5.0_3.0_1727216187649.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sroberta_hr_5.5.0_3.0_1727216187649.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("sroberta","hr") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("sroberta","hr") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
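+As an illustrative sketch (not part of the original card), the token-level vectors can be pulled out of the `embeddings` column of `pipelineDF`; each annotation's `result` field is the token and its `embeddings` field is the corresponding vector. + +```python +pipelineDF.selectExpr("explode(embeddings) as tok") \ +  .selectExpr("tok.result as token", "tok.embeddings as vector") \ +  .show(truncate=False) +```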
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sroberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|hr| +|Size:|450.7 MB| + +## References + +https://huggingface.co/Andrija/SRoBERTa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sroberta_pipeline_hr.md b/docs/_posts/ahmedlone127/2024-09-24-sroberta_pipeline_hr.md new file mode 100644 index 00000000000000..b118fcb6a362cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sroberta_pipeline_hr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Croatian sroberta_pipeline pipeline RoBertaEmbeddings from Andrija +author: John Snow Labs +name: sroberta_pipeline +date: 2024-09-24 +tags: [hr, open_source, pipeline, onnx] +task: Embeddings +language: hr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sroberta_pipeline` is a Croatian model originally trained by Andrija. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sroberta_pipeline_hr_5.5.0_3.0_1727216211408.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sroberta_pipeline_hr_5.5.0_3.0_1727216211408.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline + +pipeline = PretrainedPipeline("sroberta_pipeline", lang = "hr") +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline + +val pipeline = new PretrainedPipeline("sroberta_pipeline", lang = "hr") +val df = Seq("I love spark-nlp").toDF("text") +val annotations = pipeline.transform(df) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sroberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hr| +|Size:|450.8 MB| + +## References + +https://huggingface.co/Andrija/SRoBERTa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sst2_roberta_large_seed_1_en.md b/docs/_posts/ahmedlone127/2024-09-24-sst2_roberta_large_seed_1_en.md new file mode 100644 index 00000000000000..47e95740e85feb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sst2_roberta_large_seed_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sst2_roberta_large_seed_1 RoBertaForSequenceClassification from utahnlp +author: John Snow Labs +name: sst2_roberta_large_seed_1 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sst2_roberta_large_seed_1` is a English model originally trained by utahnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sst2_roberta_large_seed_1_en_5.5.0_3.0_1727167866607.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sst2_roberta_large_seed_1_en_5.5.0_3.0_1727167866607.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("sst2_roberta_large_seed_1","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sst2_roberta_large_seed_1", "en") + .setInputCols(Array("document","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sst2_roberta_large_seed_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/utahnlp/sst2_roberta-large_seed-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_en.md b/docs/_posts/ahmedlone127/2024-09-24-stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_en.md new file mode 100644 index 00000000000000..86e2e0e2cc4099 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_en_5.5.0_3.0_1727137393609.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_en_5.5.0_3.0_1727137393609.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
 .setInputCol('text') \
 .setOutputCol('document')

tokenizer = Tokenizer() \
 .setInputCols(['document']) \
 .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31","en") \
 .setInputCols(["document","token"]) \
 .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
 .setInputCol("text")
 .setOutputCol("document")

val tokenizer = new Tokenizer()
 .setInputCols(Array("document"))
 .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31", "en")
 .setInputCols(Array("document","token"))
 .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-0-2024-07-26_16-19-31 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_pipeline_en.md new file mode 100644 index 00000000000000..0d4dc83f5aeed8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_pipeline_en_5.5.0_3.0_1727137406514.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_pipeline_en_5.5.0_3.0_1727137406514.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
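
Pretrained pipelines can also be run directly on plain strings, which is convenient for quick checks. A minimal sketch, assuming the `pipeline` object created above; the `"class"` key is an assumption based on the default output column name of the classifier stage:

```python
# annotate() runs all pipeline stages on a single string without building a DataFrame
result = pipeline.annotate("I love spark-nlp")
print(result["class"])  # predicted label(s); key name assumed from the default output column
```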
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-0-2024-07-26_16-19-31 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-subtopics_bigbird_base_en.md b/docs/_posts/ahmedlone127/2024-09-24-subtopics_bigbird_base_en.md new file mode 100644 index 00000000000000..0499d864c90bcf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-subtopics_bigbird_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English subtopics_bigbird_base RoBertaForSequenceClassification from RogerKam +author: John Snow Labs +name: subtopics_bigbird_base +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`subtopics_bigbird_base` is a English model originally trained by RogerKam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/subtopics_bigbird_base_en_5.5.0_3.0_1727167881927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/subtopics_bigbird_base_en_5.5.0_3.0_1727167881927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
 .setInputCol('text') \
 .setOutputCol('document')

tokenizer = Tokenizer() \
 .setInputCols(['document']) \
 .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("subtopics_bigbird_base","en") \
 .setInputCols(["document","token"]) \
 .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
 .setInputCol("text")
 .setOutputCol("document")

val tokenizer = new Tokenizer()
 .setInputCols(Array("document"))
 .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("subtopics_bigbird_base", "en")
 .setInputCols(Array("document","token"))
 .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|subtopics_bigbird_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|436.5 MB| + +## References + +https://huggingface.co/RogerKam/subTopics-bigBird-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-subtopics_bigbird_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-subtopics_bigbird_base_pipeline_en.md new file mode 100644 index 00000000000000..26ed1d9a6d631b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-subtopics_bigbird_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English subtopics_bigbird_base_pipeline pipeline RoBertaForSequenceClassification from RogerKam +author: John Snow Labs +name: subtopics_bigbird_base_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`subtopics_bigbird_base_pipeline` is a English model originally trained by RogerKam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/subtopics_bigbird_base_pipeline_en_5.5.0_3.0_1727167913886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/subtopics_bigbird_base_pipeline_en_5.5.0_3.0_1727167913886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("subtopics_bigbird_base_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("subtopics_bigbird_base_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|subtopics_bigbird_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|436.6 MB| + +## References + +https://huggingface.co/RogerKam/subTopics-bigBird-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sucidal_text_classification_distillbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sucidal_text_classification_distillbert_pipeline_en.md new file mode 100644 index 00000000000000..ac60a2665dfb4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sucidal_text_classification_distillbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sucidal_text_classification_distillbert_pipeline pipeline DistilBertForSequenceClassification from pradanaadn +author: John Snow Labs +name: sucidal_text_classification_distillbert_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sucidal_text_classification_distillbert_pipeline` is a English model originally trained by pradanaadn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sucidal_text_classification_distillbert_pipeline_en_5.5.0_3.0_1727136840942.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sucidal_text_classification_distillbert_pipeline_en_5.5.0_3.0_1727136840942.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("sucidal_text_classification_distillbert_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("sucidal_text_classification_distillbert_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sucidal_text_classification_distillbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/pradanaadn/sucidal-text-classification-distillbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-task_2_english_en.md b/docs/_posts/ahmedlone127/2024-09-24-task_2_english_en.md new file mode 100644 index 00000000000000..148b930dc62f95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-task_2_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English task_2_english RoBertaForTokenClassification from esacalderonru +author: John Snow Labs +name: task_2_english +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`task_2_english` is a English model originally trained by esacalderonru. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/task_2_english_en_5.5.0_3.0_1727150704327.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/task_2_english_en_5.5.0_3.0_1727150704327.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
 .setInputCol('text') \
 .setOutputCol('document')

tokenizer = Tokenizer() \
 .setInputCols(['document']) \
 .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("task_2_english","en") \
 .setInputCols(["document","token"]) \
 .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
 .setInputCol("text")
 .setOutputCol("document")

val tokenizer = new Tokenizer()
 .setInputCols("document")
 .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("task_2_english", "en")
 .setInputCols(Array("document","token"))
 .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
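
Token-level tags are returned in the `ner` output column, one annotation per token. A minimal sketch for viewing tokens alongside their predicted tags, assuming the `pipelineDF` from above:

```python
# Each row shows the token sequence and the corresponding predicted tag sequence
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```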
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|task_2_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|445.1 MB| + +## References + +https://huggingface.co/esacalderonru/Task_2_en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-task_2_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-task_2_english_pipeline_en.md new file mode 100644 index 00000000000000..61c81b70d72fa9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-task_2_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English task_2_english_pipeline pipeline RoBertaForTokenClassification from esacalderonru +author: John Snow Labs +name: task_2_english_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`task_2_english_pipeline` is a English model originally trained by esacalderonru. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/task_2_english_pipeline_en_5.5.0_3.0_1727150733107.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/task_2_english_pipeline_en_5.5.0_3.0_1727150733107.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("task_2_english_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("task_2_english_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|task_2_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|445.1 MB| + +## References + +https://huggingface.co/esacalderonru/Task_2_en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-terjman_large_ar.md b/docs/_posts/ahmedlone127/2024-09-24-terjman_large_ar.md new file mode 100644 index 00000000000000..1a1057c65bf371 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-terjman_large_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic terjman_large MarianTransformer from atlasia +author: John Snow Labs +name: terjman_large +date: 2024-09-24 +tags: [ar, open_source, onnx, translation, marian] +task: Translation +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`terjman_large` is a Arabic model originally trained by atlasia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/terjman_large_ar_5.5.0_3.0_1727208921086.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/terjman_large_ar_5.5.0_3.0_1727208921086.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
 .setInputCol("text") \
 .setOutputCol("document")

# split the document into sentences before translating
sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
 .setInputCols(["document"]) \
 .setOutputCol("sentence")

marian = MarianTransformer.pretrained("terjman_large","ar") \
 .setInputCols(["sentence"]) \
 .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
 .setInputCol("text")
 .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
 .setInputCols(Array("document"))
 .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("terjman_large","ar")
 .setInputCols(Array("sentence"))
 .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
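
The translated sentences end up in the `translation` output column. A minimal sketch for printing them, assuming the `pipelineDF` produced above:

```python
# One translated string per detected sentence in the input document
pipelineDF.select("translation.result").show(truncate=False)
```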
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|terjman_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|ar| +|Size:|695.3 MB| + +## References + +https://huggingface.co/atlasia/Terjman-Large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-terjman_large_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-24-terjman_large_pipeline_ar.md new file mode 100644 index 00000000000000..2afedf4f21e57b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-terjman_large_pipeline_ar.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Arabic terjman_large_pipeline pipeline MarianTransformer from atlasia +author: John Snow Labs +name: terjman_large_pipeline +date: 2024-09-24 +tags: [ar, open_source, pipeline, onnx] +task: Translation +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`terjman_large_pipeline` is a Arabic model originally trained by atlasia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/terjman_large_pipeline_ar_5.5.0_3.0_1727209153955.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/terjman_large_pipeline_ar_5.5.0_3.0_1727209153955.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("terjman_large_pipeline", lang = "ar")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("terjman_large_pipeline", lang = "ar")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|terjman_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|695.8 MB| + +## References + +https://huggingface.co/atlasia/Terjman-Large + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-test_en.md b/docs/_posts/ahmedlone127/2024-09-24-test_en.md new file mode 100644 index 00000000000000..1bf999b8285888 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-test_en.md @@ -0,0 +1,88 @@ +--- +layout: model +title: English test RoBertaForQuestionAnswering from Nadav +author: John Snow Labs +name: test +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test` is a English model originally trained by Nadav. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_en_5.5.0_3.0_1727156654782.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_en_5.5.0_3.0_1727156654782.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
 .setInputCols(["question", "context"]) \
 .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("test","en") \
 .setInputCols(["document_question","document_context"]) \
 .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
 .setInputCols(Array("question", "context"))
 .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("test", "en")
 .setInputCols(Array("document_question","document_context"))
 .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
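
The predicted answer span is returned in the `answer` output column. A minimal sketch for reading it back, assuming the `pipelineDF` produced above:

```python
# Show each question together with the extracted answer text
pipelineDF.select("question", "answer.result").show(truncate=False)
```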
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|877.1 MB| + +## References + +References + +https://huggingface.co/Nadav/test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-test_nepal_bhasa_study_roberta_large_two_way_en.md b/docs/_posts/ahmedlone127/2024-09-24-test_nepal_bhasa_study_roberta_large_two_way_en.md new file mode 100644 index 00000000000000..7cd1b3f7fbdcc3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-test_nepal_bhasa_study_roberta_large_two_way_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_nepal_bhasa_study_roberta_large_two_way RoBertaForSequenceClassification from xiazeng +author: John Snow Labs +name: test_nepal_bhasa_study_roberta_large_two_way +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_nepal_bhasa_study_roberta_large_two_way` is a English model originally trained by xiazeng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_nepal_bhasa_study_roberta_large_two_way_en_5.5.0_3.0_1727172164564.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_nepal_bhasa_study_roberta_large_two_way_en_5.5.0_3.0_1727172164564.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
 .setInputCol('text') \
 .setOutputCol('document')

tokenizer = Tokenizer() \
 .setInputCols(['document']) \
 .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("test_nepal_bhasa_study_roberta_large_two_way","en") \
 .setInputCols(["document","token"]) \
 .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
 .setInputCol("text")
 .setOutputCol("document")

val tokenizer = new Tokenizer()
 .setInputCols(Array("document"))
 .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("test_nepal_bhasa_study_roberta_large_two_way", "en")
 .setInputCols(Array("document","token"))
 .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_nepal_bhasa_study_roberta_large_two_way| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/xiazeng/test-new-study_roberta-large_two-way \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-test_nepal_bhasa_study_roberta_large_two_way_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-test_nepal_bhasa_study_roberta_large_two_way_pipeline_en.md new file mode 100644 index 00000000000000..d2bf81088a8a22 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-test_nepal_bhasa_study_roberta_large_two_way_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_nepal_bhasa_study_roberta_large_two_way_pipeline pipeline RoBertaForSequenceClassification from xiazeng +author: John Snow Labs +name: test_nepal_bhasa_study_roberta_large_two_way_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_nepal_bhasa_study_roberta_large_two_way_pipeline` is a English model originally trained by xiazeng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_nepal_bhasa_study_roberta_large_two_way_pipeline_en_5.5.0_3.0_1727172233741.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_nepal_bhasa_study_roberta_large_two_way_pipeline_en_5.5.0_3.0_1727172233741.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("test_nepal_bhasa_study_roberta_large_two_way_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("test_nepal_bhasa_study_roberta_large_two_way_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_nepal_bhasa_study_roberta_large_two_way_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/xiazeng/test-new-study_roberta-large_two-way + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tiny_random_debertafortokenclassification_en.md b/docs/_posts/ahmedlone127/2024-09-24-tiny_random_debertafortokenclassification_en.md new file mode 100644 index 00000000000000..94d68c7c71c3e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tiny_random_debertafortokenclassification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tiny_random_debertafortokenclassification BertForTokenClassification from hf-tiny-model-private +author: John Snow Labs +name: tiny_random_debertafortokenclassification +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_random_debertafortokenclassification` is a English model originally trained by hf-tiny-model-private. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_random_debertafortokenclassification_en_5.5.0_3.0_1727203363980.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_random_debertafortokenclassification_en_5.5.0_3.0_1727203363980.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
 .setInputCol('text') \
 .setOutputCol('document')

tokenizer = Tokenizer() \
 .setInputCols(['document']) \
 .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("tiny_random_debertafortokenclassification","en") \
 .setInputCols(["document","token"]) \
 .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
 .setInputCol("text")
 .setOutputCol("document")

val tokenizer = new Tokenizer()
 .setInputCols("document")
 .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("tiny_random_debertafortokenclassification", "en")
 .setInputCols(Array("document","token"))
 .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_random_debertafortokenclassification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|346.1 KB| + +## References + +https://huggingface.co/hf-tiny-model-private/tiny-random-DebertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tiny_random_debertafortokenclassification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-tiny_random_debertafortokenclassification_pipeline_en.md new file mode 100644 index 00000000000000..18d50b81660b69 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tiny_random_debertafortokenclassification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tiny_random_debertafortokenclassification_pipeline pipeline BertForTokenClassification from hf-tiny-model-private +author: John Snow Labs +name: tiny_random_debertafortokenclassification_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_random_debertafortokenclassification_pipeline` is a English model originally trained by hf-tiny-model-private. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_random_debertafortokenclassification_pipeline_en_5.5.0_3.0_1727203364394.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_random_debertafortokenclassification_pipeline_en_5.5.0_3.0_1727203364394.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("tiny_random_debertafortokenclassification_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("tiny_random_debertafortokenclassification_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_random_debertafortokenclassification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|368.3 KB| + +## References + +https://huggingface.co/hf-tiny-model-private/tiny-random-DebertaForTokenClassification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tinybert_phishing_model_en.md b/docs/_posts/ahmedlone127/2024-09-24-tinybert_phishing_model_en.md new file mode 100644 index 00000000000000..e978b46978ccdf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tinybert_phishing_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tinybert_phishing_model BertForSequenceClassification from rpg1 +author: John Snow Labs +name: tinybert_phishing_model +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tinybert_phishing_model` is a English model originally trained by rpg1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tinybert_phishing_model_en_5.5.0_3.0_1727219397029.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tinybert_phishing_model_en_5.5.0_3.0_1727219397029.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
 .setInputCol('text') \
 .setOutputCol('document')

tokenizer = Tokenizer() \
 .setInputCols(['document']) \
 .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("tinybert_phishing_model","en") \
 .setInputCols(["document","token"]) \
 .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
 .setInputCol("text")
 .setOutputCol("document")

val tokenizer = new Tokenizer()
 .setInputCols(Array("document"))
 .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("tinybert_phishing_model", "en")
 .setInputCols(Array("document","token"))
 .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tinybert_phishing_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|54.2 MB| + +## References + +https://huggingface.co/rpg1/tinyBERT_phishing_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tinybert_sentiment_amazon_en.md b/docs/_posts/ahmedlone127/2024-09-24-tinybert_sentiment_amazon_en.md new file mode 100644 index 00000000000000..e5edb90007ea96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tinybert_sentiment_amazon_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tinybert_sentiment_amazon BertForSequenceClassification from AdamCodd +author: John Snow Labs +name: tinybert_sentiment_amazon +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tinybert_sentiment_amazon` is a English model originally trained by AdamCodd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tinybert_sentiment_amazon_en_5.5.0_3.0_1727149426382.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tinybert_sentiment_amazon_en_5.5.0_3.0_1727149426382.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
 .setInputCol('text') \
 .setOutputCol('document')

tokenizer = Tokenizer() \
 .setInputCols(['document']) \
 .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("tinybert_sentiment_amazon","en") \
 .setInputCols(["document","token"]) \
 .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
 .setInputCol("text")
 .setOutputCol("document")

val tokenizer = new Tokenizer()
 .setInputCols(Array("document"))
 .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("tinybert_sentiment_amazon", "en")
 .setInputCols(Array("document","token"))
 .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
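
For low-latency scoring of individual texts outside of a full Spark job, the fitted pipeline can be wrapped in a `LightPipeline`. A minimal sketch, assuming the `pipelineModel` fitted above; the `"class"` key and the label values depend on the model's output column and label set:

```python
from sparknlp.base import LightPipeline

# LightPipeline runs the same stages on plain Python strings, skipping DataFrame overhead
light = LightPipeline(pipelineModel)
print(light.annotate("I love spark-nlp")["class"])
```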
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tinybert_sentiment_amazon| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|16.7 MB| + +## References + +https://huggingface.co/AdamCodd/tinybert-sentiment-amazon \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tinybert_sentiment_amazon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-tinybert_sentiment_amazon_pipeline_en.md new file mode 100644 index 00000000000000..aae23b9155ebbf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tinybert_sentiment_amazon_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tinybert_sentiment_amazon_pipeline pipeline BertForSequenceClassification from AdamCodd +author: John Snow Labs +name: tinybert_sentiment_amazon_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tinybert_sentiment_amazon_pipeline` is a English model originally trained by AdamCodd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tinybert_sentiment_amazon_pipeline_en_5.5.0_3.0_1727149427633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tinybert_sentiment_amazon_pipeline_en_5.5.0_3.0_1727149427633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("tinybert_sentiment_amazon_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("tinybert_sentiment_amazon_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tinybert_sentiment_amazon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|16.8 MB| + +## References + +https://huggingface.co/AdamCodd/tinybert-sentiment-amazon + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tinyroberta_squad2_en.md b/docs/_posts/ahmedlone127/2024-09-24-tinyroberta_squad2_en.md new file mode 100644 index 00000000000000..5c95a32f858964 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tinyroberta_squad2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English tinyroberta_squad2 RoBertaForQuestionAnswering from JohnDoe70 +author: John Snow Labs +name: tinyroberta_squad2 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tinyroberta_squad2` is a English model originally trained by JohnDoe70. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tinyroberta_squad2_en_5.5.0_3.0_1727210789171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tinyroberta_squad2_en_5.5.0_3.0_1727210789171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
 .setInputCols(["question", "context"]) \
 .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("tinyroberta_squad2","en") \
 .setInputCols(["document_question","document_context"]) \
 .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
 .setInputCols(Array("question", "context"))
 .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("tinyroberta_squad2", "en")
 .setInputCols(Array("document_question","document_context"))
 .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tinyroberta_squad2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|306.9 MB| + +## References + +https://huggingface.co/JohnDoe70/tinyroberta-squad2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tinyroberta_squad2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-tinyroberta_squad2_pipeline_en.md new file mode 100644 index 00000000000000..50cbb4cd701f9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tinyroberta_squad2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English tinyroberta_squad2_pipeline pipeline RoBertaForQuestionAnswering from JohnDoe70 +author: John Snow Labs +name: tinyroberta_squad2_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tinyroberta_squad2_pipeline` is a English model originally trained by JohnDoe70. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tinyroberta_squad2_pipeline_en_5.5.0_3.0_1727210805073.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tinyroberta_squad2_pipeline_en_5.5.0_3.0_1727210805073.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("tinyroberta_squad2_pipeline", lang = "en")
df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("tinyroberta_squad2_pipeline", lang = "en")
val df = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tinyroberta_squad2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.9 MB| + +## References + +https://huggingface.co/JohnDoe70/tinyroberta-squad2 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tmp0xmacdh7_en.md b/docs/_posts/ahmedlone127/2024-09-24-tmp0xmacdh7_en.md new file mode 100644 index 00000000000000..805d580b64e75a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tmp0xmacdh7_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tmp0xmacdh7 DistilBertForSequenceClassification from NikDiGio +author: John Snow Labs +name: tmp0xmacdh7 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tmp0xmacdh7` is a English model originally trained by NikDiGio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tmp0xmacdh7_en_5.5.0_3.0_1727154736193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tmp0xmacdh7_en_5.5.0_3.0_1727154736193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
 .setInputCol('text') \
 .setOutputCol('document')

tokenizer = Tokenizer() \
 .setInputCols(['document']) \
 .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("tmp0xmacdh7","en") \
 .setInputCols(["document","token"]) \
 .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
 .setInputCol("text")
 .setOutputCol("document")

val tokenizer = new Tokenizer()
 .setInputCols(Array("document"))
 .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("tmp0xmacdh7", "en")
 .setInputCols(Array("document","token"))
 .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tmp0xmacdh7| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/NikDiGio/tmp0xmacdh7 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tmp_trainer_parth49_en.md b/docs/_posts/ahmedlone127/2024-09-24-tmp_trainer_parth49_en.md new file mode 100644 index 00000000000000..6234a48fd0345a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tmp_trainer_parth49_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tmp_trainer_parth49 DistilBertForSequenceClassification from Parth49 +author: John Snow Labs +name: tmp_trainer_parth49 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tmp_trainer_parth49` is a English model originally trained by Parth49. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tmp_trainer_parth49_en_5.5.0_3.0_1727154770677.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tmp_trainer_parth49_en_5.5.0_3.0_1727154770677.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
 .setInputCol('text') \
 .setOutputCol('document')

tokenizer = Tokenizer() \
 .setInputCols(['document']) \
 .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("tmp_trainer_parth49","en") \
 .setInputCols(["document","token"]) \
 .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
 .setInputCol("text")
 .setOutputCol("document")

val tokenizer = new Tokenizer()
 .setInputCols(Array("document"))
 .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("tmp_trainer_parth49", "en")
 .setInputCols(Array("document","token"))
 .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tmp_trainer_parth49| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Parth49/tmp_trainer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_en.md b/docs/_posts/ahmedlone127/2024-09-24-tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_en.md new file mode 100644 index 00000000000000..14b7a68d7b9f2c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized XlmRoBertaForTokenClassification from anonymoussubmissions +author: John Snow Labs +name: tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized` is a English model originally trained by anonymoussubmissions. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_en_5.5.0_3.0_1727214968307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_en_5.5.0_3.0_1727214968307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
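+
+To read the token-level predictions, the `token` and `ner` output columns can be zipped together. This is a minimal sketch assuming the pipeline above has been run:
+
+```python
+# Pair each token with its predicted IOB tag
+pipelineDF.selectExpr("explode(arrays_zip(token.result, ner.result)) as token_tag").show(truncate=False)
+```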
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/anonymoussubmissions/tner-xlm-roberta-base-ontonotes5-earnings21-non-normalized \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_pipeline_en.md new file mode 100644 index 00000000000000..c40b8229bff916 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_pipeline pipeline XlmRoBertaForTokenClassification from anonymoussubmissions +author: John Snow Labs +name: tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_pipeline` is a English model originally trained by anonymoussubmissions. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_pipeline_en_5.5.0_3.0_1727215035397.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_pipeline_en_5.5.0_3.0_1727215035397.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
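+
+The exact output column names are defined by the exported pipeline, so a quick way to see what it produces is to inspect the schema of the transformed DataFrame. This is a minimal sketch assuming `annotations` from the snippet above:
+
+```python
+# List the annotation columns (e.g. document, token, ner) added by the pretrained pipeline
+annotations.printSchema()
+```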
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.3 MB| + +## References + +https://huggingface.co/anonymoussubmissions/tner-xlm-roberta-base-ontonotes5-earnings21-non-normalized + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-transcript_classification_en.md b/docs/_posts/ahmedlone127/2024-09-24-transcript_classification_en.md new file mode 100644 index 00000000000000..6dfac0b9477952 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-transcript_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English transcript_classification DistilBertForSequenceClassification from aoshita +author: John Snow Labs +name: transcript_classification +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`transcript_classification` is a English model originally trained by aoshita. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/transcript_classification_en_5.5.0_3.0_1727154549754.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/transcript_classification_en_5.5.0_3.0_1727154549754.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("transcript_classification","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("transcript_classification", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|transcript_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aoshita/transcript_classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-transcript_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-transcript_classification_pipeline_en.md new file mode 100644 index 00000000000000..d3dca89ea32712 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-transcript_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English transcript_classification_pipeline pipeline DistilBertForSequenceClassification from aoshita +author: John Snow Labs +name: transcript_classification_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`transcript_classification_pipeline` is a English model originally trained by aoshita. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/transcript_classification_pipeline_en_5.5.0_3.0_1727154562472.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/transcript_classification_pipeline_en_5.5.0_3.0_1727154562472.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("transcript_classification_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("transcript_classification_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|transcript_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aoshita/transcript_classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-trial_model_djames62_en.md b/docs/_posts/ahmedlone127/2024-09-24-trial_model_djames62_en.md new file mode 100644 index 00000000000000..33102c4da9f1b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-trial_model_djames62_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trial_model_djames62 RoBertaForSequenceClassification from djames62 +author: John Snow Labs +name: trial_model_djames62 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trial_model_djames62` is a English model originally trained by djames62. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trial_model_djames62_en_5.5.0_3.0_1727167306603.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trial_model_djames62_en_5.5.0_3.0_1727167306603.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("trial_model_djames62","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("trial_model_djames62", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
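+
+Beyond the predicted label, sequence-classification annotators typically expose per-label scores in the annotation metadata. This is a minimal sketch assuming the pipeline above has been run; the metadata keys depend on the model's label set:
+
+```python
+# Show the predicted label together with its metadata (usually the per-class scores)
+pipelineDF.select("class.result", "class.metadata").show(truncate=False)
+```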
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trial_model_djames62| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|416.4 MB| + +## References + +https://huggingface.co/djames62/trial-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-trial_model_djames62_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-trial_model_djames62_pipeline_en.md new file mode 100644 index 00000000000000..7d2c8cca47c05a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-trial_model_djames62_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trial_model_djames62_pipeline pipeline RoBertaForSequenceClassification from djames62 +author: John Snow Labs +name: trial_model_djames62_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trial_model_djames62_pipeline` is a English model originally trained by djames62. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trial_model_djames62_pipeline_en_5.5.0_3.0_1727167351420.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trial_model_djames62_pipeline_en_5.5.0_3.0_1727167351420.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("trial_model_djames62_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("trial_model_djames62_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trial_model_djames62_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|416.4 MB| + +## References + +https://huggingface.co/djames62/trial-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-trial_model_qstrats_en.md b/docs/_posts/ahmedlone127/2024-09-24-trial_model_qstrats_en.md new file mode 100644 index 00000000000000..6b694f31b622a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-trial_model_qstrats_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trial_model_qstrats RoBertaForSequenceClassification from qstrats +author: John Snow Labs +name: trial_model_qstrats +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trial_model_qstrats` is a English model originally trained by qstrats. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trial_model_qstrats_en_5.5.0_3.0_1727167479675.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trial_model_qstrats_en_5.5.0_3.0_1727167479675.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("trial_model_qstrats","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("trial_model_qstrats", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trial_model_qstrats| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|416.3 MB| + +## References + +https://huggingface.co/qstrats/trial-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-trial_model_qstrats_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-trial_model_qstrats_pipeline_en.md new file mode 100644 index 00000000000000..c082306b56698f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-trial_model_qstrats_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trial_model_qstrats_pipeline pipeline RoBertaForSequenceClassification from qstrats +author: John Snow Labs +name: trial_model_qstrats_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trial_model_qstrats_pipeline` is a English model originally trained by qstrats. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trial_model_qstrats_pipeline_en_5.5.0_3.0_1727167523937.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trial_model_qstrats_pipeline_en_5.5.0_3.0_1727167523937.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("trial_model_qstrats_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("trial_model_qstrats_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trial_model_qstrats_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|416.3 MB| + +## References + +https://huggingface.co/qstrats/trial-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-trial_model_quant_chef_en.md b/docs/_posts/ahmedlone127/2024-09-24-trial_model_quant_chef_en.md new file mode 100644 index 00000000000000..fa0d5cbc1efacb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-trial_model_quant_chef_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trial_model_quant_chef RoBertaForSequenceClassification from quant-chef +author: John Snow Labs +name: trial_model_quant_chef +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trial_model_quant_chef` is a English model originally trained by quant-chef. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trial_model_quant_chef_en_5.5.0_3.0_1727167608768.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trial_model_quant_chef_en_5.5.0_3.0_1727167608768.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("trial_model_quant_chef","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("trial_model_quant_chef", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trial_model_quant_chef| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|416.1 MB| + +## References + +https://huggingface.co/quant-chef/trial-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-trial_model_quant_chef_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-trial_model_quant_chef_pipeline_en.md new file mode 100644 index 00000000000000..61f71f7d496d07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-trial_model_quant_chef_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trial_model_quant_chef_pipeline pipeline RoBertaForSequenceClassification from quant-chef +author: John Snow Labs +name: trial_model_quant_chef_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trial_model_quant_chef_pipeline` is a English model originally trained by quant-chef. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trial_model_quant_chef_pipeline_en_5.5.0_3.0_1727167651590.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trial_model_quant_chef_pipeline_en_5.5.0_3.0_1727167651590.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("trial_model_quant_chef_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("trial_model_quant_chef_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trial_model_quant_chef_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|416.1 MB| + +## References + +https://huggingface.co/quant-chef/trial-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-trial_model_vkattukolu3_en.md b/docs/_posts/ahmedlone127/2024-09-24-trial_model_vkattukolu3_en.md new file mode 100644 index 00000000000000..3010b990a6d208 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-trial_model_vkattukolu3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trial_model_vkattukolu3 RoBertaForSequenceClassification from vkattukolu3 +author: John Snow Labs +name: trial_model_vkattukolu3 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trial_model_vkattukolu3` is a English model originally trained by vkattukolu3. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trial_model_vkattukolu3_en_5.5.0_3.0_1727167855320.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trial_model_vkattukolu3_en_5.5.0_3.0_1727167855320.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("trial_model_vkattukolu3","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("trial_model_vkattukolu3", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trial_model_vkattukolu3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|416.5 MB| + +## References + +https://huggingface.co/vkattukolu3/trial-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-trial_model_vkattukolu3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-trial_model_vkattukolu3_pipeline_en.md new file mode 100644 index 00000000000000..6c00a29f936bfc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-trial_model_vkattukolu3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trial_model_vkattukolu3_pipeline pipeline RoBertaForSequenceClassification from vkattukolu3 +author: John Snow Labs +name: trial_model_vkattukolu3_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trial_model_vkattukolu3_pipeline` is a English model originally trained by vkattukolu3. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trial_model_vkattukolu3_pipeline_en_5.5.0_3.0_1727167898668.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trial_model_vkattukolu3_pipeline_en_5.5.0_3.0_1727167898668.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("trial_model_vkattukolu3_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("trial_model_vkattukolu3_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trial_model_vkattukolu3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|416.5 MB| + +## References + +https://huggingface.co/vkattukolu3/trial-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tuned_test_trainer_bert_base_uncased_mrredborne_en.md b/docs/_posts/ahmedlone127/2024-09-24-tuned_test_trainer_bert_base_uncased_mrredborne_en.md new file mode 100644 index 00000000000000..13e6687a7088f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tuned_test_trainer_bert_base_uncased_mrredborne_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tuned_test_trainer_bert_base_uncased_mrredborne BertForSequenceClassification from Mrredborne +author: John Snow Labs +name: tuned_test_trainer_bert_base_uncased_mrredborne +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tuned_test_trainer_bert_base_uncased_mrredborne` is a English model originally trained by Mrredborne. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tuned_test_trainer_bert_base_uncased_mrredborne_en_5.5.0_3.0_1727213463146.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tuned_test_trainer_bert_base_uncased_mrredborne_en_5.5.0_3.0_1727213463146.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("tuned_test_trainer_bert_base_uncased_mrredborne","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("tuned_test_trainer_bert_base_uncased_mrredborne", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
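+
+For quick, single-string experiments it can be more convenient to wrap the fitted model in a LightPipeline instead of transforming a DataFrame. This is a minimal sketch assuming `pipelineModel` from the example above:
+
+```python
+from sparknlp.base import LightPipeline
+
+# Annotate a single string without a DataFrame; returns a dict keyed by output column name
+light = LightPipeline(pipelineModel)
+print(light.annotate("I love spark-nlp")["class"])
+```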
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tuned_test_trainer_bert_base_uncased_mrredborne| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Mrredborne/tuned_test_trainer-bert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tuned_test_trainer_bert_base_uncased_mrredborne_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-tuned_test_trainer_bert_base_uncased_mrredborne_pipeline_en.md new file mode 100644 index 00000000000000..aa86e25f957c6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tuned_test_trainer_bert_base_uncased_mrredborne_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tuned_test_trainer_bert_base_uncased_mrredborne_pipeline pipeline BertForSequenceClassification from Mrredborne +author: John Snow Labs +name: tuned_test_trainer_bert_base_uncased_mrredborne_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tuned_test_trainer_bert_base_uncased_mrredborne_pipeline` is a English model originally trained by Mrredborne. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tuned_test_trainer_bert_base_uncased_mrredborne_pipeline_en_5.5.0_3.0_1727213484226.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tuned_test_trainer_bert_base_uncased_mrredborne_pipeline_en_5.5.0_3.0_1727213484226.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("tuned_test_trainer_bert_base_uncased_mrredborne_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("tuned_test_trainer_bert_base_uncased_mrredborne_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tuned_test_trainer_bert_base_uncased_mrredborne_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Mrredborne/tuned_test_trainer-bert-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tuning_lr_0_1_wd_0_01_epochs_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-tuning_lr_0_1_wd_0_01_epochs_1_pipeline_en.md new file mode 100644 index 00000000000000..fd9729b968b78d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tuning_lr_0_1_wd_0_01_epochs_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tuning_lr_0_1_wd_0_01_epochs_1_pipeline pipeline DistilBertForSequenceClassification from ash-akjp-ga +author: John Snow Labs +name: tuning_lr_0_1_wd_0_01_epochs_1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tuning_lr_0_1_wd_0_01_epochs_1_pipeline` is a English model originally trained by ash-akjp-ga. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tuning_lr_0_1_wd_0_01_epochs_1_pipeline_en_5.5.0_3.0_1727164669526.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tuning_lr_0_1_wd_0_01_epochs_1_pipeline_en_5.5.0_3.0_1727164669526.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("tuning_lr_0_1_wd_0_01_epochs_1_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("tuning_lr_0_1_wd_0_01_epochs_1_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tuning_lr_0_1_wd_0_01_epochs_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|251.1 MB| + +## References + +https://huggingface.co/ash-akjp-ga/tuning_lr_0.1_wd_0.01_epochs_1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-twitter_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-twitter_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..02c9ec233bfda9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-twitter_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitter_roberta_base_pipeline pipeline RoBertaEmbeddings from cardiffnlp +author: John Snow Labs +name: twitter_roberta_base_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_pipeline` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_pipeline_en_5.5.0_3.0_1727216074828.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_pipeline_en_5.5.0_3.0_1727216074828.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("twitter_roberta_base_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("twitter_roberta_base_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
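+
+Since this pipeline produces word embeddings rather than labels, a useful first step is to check which columns it adds and how the vectors are stored. This is a minimal sketch assuming `annotations` from the snippet above; the vectors live in the annotations' `embeddings` field:
+
+```python
+# Inspect the columns produced by the embeddings pipeline and peek at the first row
+annotations.printSchema()
+annotations.show(1, truncate=80)
+```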
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/cardiffnlp/twitter-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-wav2vec2_base_igbo_en.md b/docs/_posts/ahmedlone127/2024-09-24-wav2vec2_base_igbo_en.md new file mode 100644 index 00000000000000..bca3fda5eb2bcd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-wav2vec2_base_igbo_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English wav2vec2_base_igbo WhisperForCTC from Msughterx +author: John Snow Labs +name: wav2vec2_base_igbo +date: 2024-09-24 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wav2vec2_base_igbo` is a English model originally trained by Msughterx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wav2vec2_base_igbo_en_5.5.0_3.0_1727145263148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wav2vec2_base_igbo_en_5.5.0_3.0_1727145263148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("wav2vec2_base_igbo","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# NOTE: `data` is assumed to be a DataFrame with an "audio_content" column of raw audio floats
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("wav2vec2_base_igbo", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// NOTE: `data` is assumed to be a DataFrame with an "audio_content" column of raw audio floats
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
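+
+Once the pipeline has been fitted and applied to an audio DataFrame, the recognized transcript ends up in the `text` output column. This is a minimal sketch assuming `pipelineDF` from the example above:
+
+```python
+# Print the transcript produced for each audio row
+pipelineDF.select("text.result").show(truncate=False)
+```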
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wav2vec2_base_igbo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Msughterx/wav2vec2-base-igbo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-wav2vec2_base_igbo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-wav2vec2_base_igbo_pipeline_en.md new file mode 100644 index 00000000000000..d6e71025e408d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-wav2vec2_base_igbo_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English wav2vec2_base_igbo_pipeline pipeline WhisperForCTC from Msughterx +author: John Snow Labs +name: wav2vec2_base_igbo_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wav2vec2_base_igbo_pipeline` is a English model originally trained by Msughterx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wav2vec2_base_igbo_pipeline_en_5.5.0_3.0_1727145348500.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wav2vec2_base_igbo_pipeline_en_5.5.0_3.0_1727145348500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("wav2vec2_base_igbo_pipeline", lang = "en")
+# NOTE: `df` is assumed to be a DataFrame with an "audio_content" column of raw audio floats
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("wav2vec2_base_igbo_pipeline", lang = "en")
+// NOTE: `df` is assumed to be a DataFrame with an "audio_content" column of raw audio floats
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wav2vec2_base_igbo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Msughterx/wav2vec2-base-igbo + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_ai_nomimode_en.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_ai_nomimode_en.md new file mode 100644 index 00000000000000..cf1e72c0dbb5f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_ai_nomimode_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_ai_nomimode WhisperForCTC from susmitabhatt +author: John Snow Labs +name: whisper_ai_nomimode +date: 2024-09-24 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_ai_nomimode` is a English model originally trained by susmitabhatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_ai_nomimode_en_5.5.0_3.0_1727142115819.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_ai_nomimode_en_5.5.0_3.0_1727142115819.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_ai_nomimode","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# NOTE: `data` is assumed to be a DataFrame with an "audio_content" column of raw audio floats
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_ai_nomimode", "en")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// NOTE: `data` is assumed to be a DataFrame with an "audio_content" column of raw audio floats
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
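+
+If you need the transcripts as plain Python strings (for example to write them out to a file), they can be collected from the `text` column. This is a minimal sketch assuming `pipelineDF` from the example above:
+
+```python
+# Collect the recognized text into a Python list (one entry per audio row)
+transcripts = [row["result"][0] for row in pipelineDF.select("text.result").collect() if row["result"]]
+print(transcripts)
+```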
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_ai_nomimode| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/susmitabhatt/whisper-ai-nomimode \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_ai_nomimode_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_ai_nomimode_pipeline_en.md new file mode 100644 index 00000000000000..4cc4366846f2f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_ai_nomimode_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_ai_nomimode_pipeline pipeline WhisperForCTC from susmitabhatt +author: John Snow Labs +name: whisper_ai_nomimode_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_ai_nomimode_pipeline` is a English model originally trained by susmitabhatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_ai_nomimode_pipeline_en_5.5.0_3.0_1727142206732.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_ai_nomimode_pipeline_en_5.5.0_3.0_1727142206732.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("whisper_ai_nomimode_pipeline", lang = "en")
+# NOTE: `df` is assumed to be a DataFrame with an "audio_content" column of raw audio floats
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("whisper_ai_nomimode_pipeline", lang = "en")
+// NOTE: `df` is assumed to be a DataFrame with an "audio_content" column of raw audio floats
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_ai_nomimode_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/susmitabhatt/whisper-ai-nomimode + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_base_basque_cv16_1_eu.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_base_basque_cv16_1_eu.md new file mode 100644 index 00000000000000..b14f84980d5dd9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_base_basque_cv16_1_eu.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Basque whisper_base_basque_cv16_1 WhisperForCTC from zuazo +author: John Snow Labs +name: whisper_base_basque_cv16_1 +date: 2024-09-24 +tags: [eu, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: eu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_basque_cv16_1` is a Basque model originally trained by zuazo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_basque_cv16_1_eu_5.5.0_3.0_1727141467311.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_basque_cv16_1_eu_5.5.0_3.0_1727141467311.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_basque_cv16_1","eu") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# NOTE: `data` is assumed to be a DataFrame with an "audio_content" column of raw audio floats
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+  .setInputCol("audio_content")
+  .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_basque_cv16_1", "eu")
+  .setInputCols(Array("audio_assembler"))
+  .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// NOTE: `data` is assumed to be a DataFrame with an "audio_content" column of raw audio floats
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_basque_cv16_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|eu| +|Size:|641.4 MB| + +## References + +https://huggingface.co/zuazo/whisper-base-eu-cv16_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_base_basque_cv16_1_pipeline_eu.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_base_basque_cv16_1_pipeline_eu.md new file mode 100644 index 00000000000000..4eef3c84fef78e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_base_basque_cv16_1_pipeline_eu.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Basque whisper_base_basque_cv16_1_pipeline pipeline WhisperForCTC from zuazo +author: John Snow Labs +name: whisper_base_basque_cv16_1_pipeline +date: 2024-09-24 +tags: [eu, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: eu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_basque_cv16_1_pipeline` is a Basque model originally trained by zuazo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_basque_cv16_1_pipeline_eu_5.5.0_3.0_1727141501672.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_basque_cv16_1_pipeline_eu_5.5.0_3.0_1727141501672.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with an "audio_content" column
# holding the raw audio samples the pipeline expects.
pipeline = PretrainedPipeline("whisper_base_basque_cv16_1_pipeline", lang = "eu")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with an "audio_content" column
// holding the raw audio samples the pipeline expects.
val pipeline = new PretrainedPipeline("whisper_base_basque_cv16_1_pipeline", lang = "eu")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_basque_cv16_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|eu| +|Size:|641.4 MB| + +## References + +https://huggingface.co/zuazo/whisper-base-eu-cv16_1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_base_thai_project_6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_base_thai_project_6_pipeline_en.md new file mode 100644 index 00000000000000..d869eb63c939c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_base_thai_project_6_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_thai_project_6_pipeline pipeline WhisperForCTC from Varit +author: John Snow Labs +name: whisper_base_thai_project_6_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_thai_project_6_pipeline` is a English model originally trained by Varit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_thai_project_6_pipeline_en_5.5.0_3.0_1727145914176.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_thai_project_6_pipeline_en_5.5.0_3.0_1727145914176.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with an "audio_content" column
# holding the raw audio samples the pipeline expects.
pipeline = PretrainedPipeline("whisper_base_thai_project_6_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with an "audio_content" column
// holding the raw audio samples the pipeline expects.
val pipeline = new PretrainedPipeline("whisper_base_thai_project_6_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_thai_project_6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|642.3 MB| + +## References + +https://huggingface.co/Varit/whisper-base-th-project-6 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_medium_with_google_fleurs_arabic_4000_steps_en.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_medium_with_google_fleurs_arabic_4000_steps_en.md new file mode 100644 index 00000000000000..9dd3f2964dfefb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_medium_with_google_fleurs_arabic_4000_steps_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_medium_with_google_fleurs_arabic_4000_steps WhisperForCTC from MohammadJamalaldeen +author: John Snow Labs +name: whisper_medium_with_google_fleurs_arabic_4000_steps +date: 2024-09-24 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_with_google_fleurs_arabic_4000_steps` is a English model originally trained by MohammadJamalaldeen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_with_google_fleurs_arabic_4000_steps_en_5.5.0_3.0_1727144470487.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_with_google_fleurs_arabic_4000_steps_en_5.5.0_3.0_1727144470487.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with an "audio_content" column
# holding the raw audio samples (one array of floats per row).
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_medium_with_google_fleurs_arabic_4000_steps","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with an "audio_content" column
// holding the raw audio samples (one array of floats per row).
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_medium_with_google_fleurs_arabic_4000_steps", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_with_google_fleurs_arabic_4000_steps| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/MohammadJamalaldeen/whisper-medium-with-google-fleurs-ar-4000_steps \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_small_divehi_cleandata_dv.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_divehi_cleandata_dv.md new file mode 100644 index 00000000000000..5edb5cd5937918 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_divehi_cleandata_dv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_cleandata WhisperForCTC from cleandata +author: John Snow Labs +name: whisper_small_divehi_cleandata +date: 2024-09-24 +tags: [dv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_cleandata` is a Dhivehi, Divehi, Maldivian model originally trained by cleandata. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_cleandata_dv_5.5.0_3.0_1727143368743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_cleandata_dv_5.5.0_3.0_1727143368743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with an "audio_content" column
# holding the raw audio samples (one array of floats per row).
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_divehi_cleandata","dv") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with an "audio_content" column
// holding the raw audio samples (one array of floats per row).
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_divehi_cleandata", "dv")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_cleandata| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/cleandata/whisper-small-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_small_divehi_cleandata_pipeline_dv.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_divehi_cleandata_pipeline_dv.md new file mode 100644 index 00000000000000..8883835d9f3792 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_divehi_cleandata_pipeline_dv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_cleandata_pipeline pipeline WhisperForCTC from cleandata +author: John Snow Labs +name: whisper_small_divehi_cleandata_pipeline +date: 2024-09-24 +tags: [dv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_cleandata_pipeline` is a Dhivehi, Divehi, Maldivian model originally trained by cleandata. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_cleandata_pipeline_dv_5.5.0_3.0_1727143459591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_cleandata_pipeline_dv_5.5.0_3.0_1727143459591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with an "audio_content" column
# holding the raw audio samples the pipeline expects.
pipeline = PretrainedPipeline("whisper_small_divehi_cleandata_pipeline", lang = "dv")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with an "audio_content" column
// holding the raw audio samples the pipeline expects.
val pipeline = new PretrainedPipeline("whisper_small_divehi_cleandata_pipeline", lang = "dv")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_cleandata_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/cleandata/whisper-small-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hindi_abatula_en.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hindi_abatula_en.md new file mode 100644 index 00000000000000..346783fec99509 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hindi_abatula_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_hindi_abatula WhisperForCTC from abatula +author: John Snow Labs +name: whisper_small_hindi_abatula +date: 2024-09-24 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_abatula` is a English model originally trained by abatula. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_abatula_en_5.5.0_3.0_1727141617044.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_abatula_en_5.5.0_3.0_1727141617044.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with an "audio_content" column
# holding the raw audio samples (one array of floats per row).
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_hindi_abatula","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with an "audio_content" column
// holding the raw audio samples (one array of floats per row).
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_abatula", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_abatula| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/abatula/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hindi_abatula_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hindi_abatula_pipeline_en.md new file mode 100644 index 00000000000000..5ba327c7a32961 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hindi_abatula_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_hindi_abatula_pipeline pipeline WhisperForCTC from abatula +author: John Snow Labs +name: whisper_small_hindi_abatula_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_abatula_pipeline` is a English model originally trained by abatula. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_abatula_pipeline_en_5.5.0_3.0_1727141715892.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_abatula_pipeline_en_5.5.0_3.0_1727141715892.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with an "audio_content" column
# holding the raw audio samples the pipeline expects.
pipeline = PretrainedPipeline("whisper_small_hindi_abatula_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with an "audio_content" column
// holding the raw audio samples the pipeline expects.
val pipeline = new PretrainedPipeline("whisper_small_hindi_abatula_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_abatula_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/abatula/whisper-small-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hk_en.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hk_en.md new file mode 100644 index 00000000000000..ff57556d789e5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hk_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_hk WhisperForCTC from PenguinbladeZ +author: John Snow Labs +name: whisper_small_hk +date: 2024-09-24 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hk` is a English model originally trained by PenguinbladeZ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hk_en_5.5.0_3.0_1727194164950.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hk_en_5.5.0_3.0_1727194164950.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with an "audio_content" column
# holding the raw audio samples (one array of floats per row).
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_hk","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with an "audio_content" column
// holding the raw audio samples (one array of floats per row).
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_hk", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/PenguinbladeZ/whisper-small-hk \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hk_pipeline_en.md new file mode 100644 index 00000000000000..f5ad97415b35c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hk_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_hk_pipeline pipeline WhisperForCTC from PenguinbladeZ +author: John Snow Labs +name: whisper_small_hk_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hk_pipeline` is a English model originally trained by PenguinbladeZ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hk_pipeline_en_5.5.0_3.0_1727194266210.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hk_pipeline_en_5.5.0_3.0_1727194266210.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with an "audio_content" column
# holding the raw audio samples the pipeline expects.
pipeline = PretrainedPipeline("whisper_small_hk_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with an "audio_content" column
// holding the raw audio samples the pipeline expects.
val pipeline = new PretrainedPipeline("whisper_small_hk_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/PenguinbladeZ/whisper-small-hk + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_small_portuguese_pedropauletti_pt.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_portuguese_pedropauletti_pt.md new file mode 100644 index 00000000000000..59652cf79487f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_portuguese_pedropauletti_pt.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Portuguese whisper_small_portuguese_pedropauletti WhisperForCTC from pedropauletti +author: John Snow Labs +name: whisper_small_portuguese_pedropauletti +date: 2024-09-24 +tags: [pt, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_portuguese_pedropauletti` is a Portuguese model originally trained by pedropauletti. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_pedropauletti_pt_5.5.0_3.0_1727194190826.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_pedropauletti_pt_5.5.0_3.0_1727194190826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with an "audio_content" column
# holding the raw audio samples (one array of floats per row).
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_portuguese_pedropauletti","pt") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with an "audio_content" column
// holding the raw audio samples (one array of floats per row).
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_portuguese_pedropauletti", "pt")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_portuguese_pedropauletti| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|pt| +|Size:|1.7 GB| + +## References + +https://huggingface.co/pedropauletti/whisper-small-pt \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_small_telugu_4k_pipeline_te.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_telugu_4k_pipeline_te.md new file mode 100644 index 00000000000000..8413e95da35a94 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_telugu_4k_pipeline_te.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Telugu whisper_small_telugu_4k_pipeline pipeline WhisperForCTC from bnriiitb +author: John Snow Labs +name: whisper_small_telugu_4k_pipeline +date: 2024-09-24 +tags: [te, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: te +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_telugu_4k_pipeline` is a Telugu model originally trained by bnriiitb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_telugu_4k_pipeline_te_5.5.0_3.0_1727144406215.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_telugu_4k_pipeline_te_5.5.0_3.0_1727144406215.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with an "audio_content" column
# holding the raw audio samples the pipeline expects.
pipeline = PretrainedPipeline("whisper_small_telugu_4k_pipeline", lang = "te")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with an "audio_content" column
// holding the raw audio samples the pipeline expects.
val pipeline = new PretrainedPipeline("whisper_small_telugu_4k_pipeline", lang = "te")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_telugu_4k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|te| +|Size:|1.7 GB| + +## References + +https://huggingface.co/bnriiitb/whisper-small-te-4k + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_small_telugu_4k_te.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_telugu_4k_te.md new file mode 100644 index 00000000000000..be69d3bbe59aa1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_telugu_4k_te.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Telugu whisper_small_telugu_4k WhisperForCTC from bnriiitb +author: John Snow Labs +name: whisper_small_telugu_4k +date: 2024-09-24 +tags: [te, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: te +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_telugu_4k` is a Telugu model originally trained by bnriiitb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_telugu_4k_te_5.5.0_3.0_1727144311164.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_telugu_4k_te_5.5.0_3.0_1727144311164.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with an "audio_content" column
# holding the raw audio samples (one array of floats per row).
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_telugu_4k","te") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with an "audio_content" column
// holding the raw audio samples (one array of floats per row).
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_telugu_4k", "te")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_telugu_4k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|te| +|Size:|1.7 GB| + +## References + +https://huggingface.co/bnriiitb/whisper-small-te-4k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_tiny_english_arielcerdap_en.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_tiny_english_arielcerdap_en.md new file mode 100644 index 00000000000000..f29d82356b56eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_tiny_english_arielcerdap_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_english_arielcerdap WhisperForCTC from arielcerdap +author: John Snow Labs +name: whisper_tiny_english_arielcerdap +date: 2024-09-24 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_arielcerdap` is a English model originally trained by arielcerdap. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_arielcerdap_en_5.5.0_3.0_1727142002434.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_arielcerdap_en_5.5.0_3.0_1727142002434.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with an "audio_content" column
# holding the raw audio samples (one array of floats per row).
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_english_arielcerdap","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with an "audio_content" column
// holding the raw audio samples (one array of floats per row).
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_english_arielcerdap", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_arielcerdap| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/arielcerdap/whisper-tiny-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_tiny_english_arielcerdap_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_tiny_english_arielcerdap_pipeline_en.md new file mode 100644 index 00000000000000..b7044c6a1c1427 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_tiny_english_arielcerdap_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_english_arielcerdap_pipeline pipeline WhisperForCTC from arielcerdap +author: John Snow Labs +name: whisper_tiny_english_arielcerdap_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_arielcerdap_pipeline` is a English model originally trained by arielcerdap. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_arielcerdap_pipeline_en_5.5.0_3.0_1727142022683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_arielcerdap_pipeline_en_5.5.0_3.0_1727142022683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with an "audio_content" column
# holding the raw audio samples the pipeline expects.
pipeline = PretrainedPipeline("whisper_tiny_english_arielcerdap_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with an "audio_content" column
// holding the raw audio samples the pipeline expects.
val pipeline = new PretrainedPipeline("whisper_tiny_english_arielcerdap_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_arielcerdap_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/arielcerdap/whisper-tiny-en + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-wikibert_base_parsinlu_entailment_fa.md b/docs/_posts/ahmedlone127/2024-09-24-wikibert_base_parsinlu_entailment_fa.md new file mode 100644 index 00000000000000..dd508a63958234 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-wikibert_base_parsinlu_entailment_fa.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Persian wikibert_base_parsinlu_entailment BertForSequenceClassification from persiannlp +author: John Snow Labs +name: wikibert_base_parsinlu_entailment +date: 2024-09-24 +tags: [fa, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wikibert_base_parsinlu_entailment` is a Persian model originally trained by persiannlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wikibert_base_parsinlu_entailment_fa_5.5.0_3.0_1727219331306.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wikibert_base_parsinlu_entailment_fa_5.5.0_3.0_1727219331306.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("wikibert_base_parsinlu_entailment","fa") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("wikibert_base_parsinlu_entailment", "fa")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
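After the pipeline has run, the predicted label is stored in the `result` field of the `class` annotation column. A quick way to inspect it for the Python example above:

```python
# Show each input text next to its predicted label.
pipelineDF.select("text", "class.result").show(truncate=False)
```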
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wikibert_base_parsinlu_entailment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|fa| +|Size:|380.3 MB| + +## References + +https://huggingface.co/persiannlp/wikibert-base-parsinlu-entailment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-wikibert_base_parsinlu_entailment_pipeline_fa.md b/docs/_posts/ahmedlone127/2024-09-24-wikibert_base_parsinlu_entailment_pipeline_fa.md new file mode 100644 index 00000000000000..70cc1a1cd2df88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-wikibert_base_parsinlu_entailment_pipeline_fa.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Persian wikibert_base_parsinlu_entailment_pipeline pipeline BertForSequenceClassification from persiannlp +author: John Snow Labs +name: wikibert_base_parsinlu_entailment_pipeline +date: 2024-09-24 +tags: [fa, open_source, pipeline, onnx] +task: Text Classification +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wikibert_base_parsinlu_entailment_pipeline` is a Persian model originally trained by persiannlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wikibert_base_parsinlu_entailment_pipeline_fa_5.5.0_3.0_1727219350247.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wikibert_base_parsinlu_entailment_pipeline_fa_5.5.0_3.0_1727219350247.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with a "text" column of input documents.
pipeline = PretrainedPipeline("wikibert_base_parsinlu_entailment_pipeline", lang = "fa")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with a "text" column of input documents.
val pipeline = new PretrainedPipeline("wikibert_base_parsinlu_entailment_pipeline", lang = "fa")
val annotations = pipeline.transform(df)
```
</div>
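A minimal end-to-end sketch of the same call is shown below. The sample sentence is only a placeholder, and the input/output column names ("text" and "class") are assumptions based on the included models listed further down this card.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Placeholder input; any DataFrame with a "text" column should work.
df = spark.createDataFrame([["This is a sample sentence."]], ["text"])

pipeline = PretrainedPipeline("wikibert_base_parsinlu_entailment_pipeline", lang = "fa")

# The classifier output column is assumed to be "class", matching the model card above.
pipeline.transform(df).select("class.result").show(truncate=False)
```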
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wikibert_base_parsinlu_entailment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fa| +|Size:|380.3 MB| + +## References + +https://huggingface.co/persiannlp/wikibert-base-parsinlu-entailment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-wineberto_ner_en.md b/docs/_posts/ahmedlone127/2024-09-24-wineberto_ner_en.md new file mode 100644 index 00000000000000..4a1a00c6f4d05a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-wineberto_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English wineberto_ner BertForTokenClassification from panigrah +author: John Snow Labs +name: wineberto_ner +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wineberto_ner` is a English model originally trained by panigrah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wineberto_ner_en_5.5.0_3.0_1727203626360.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wineberto_ner_en_5.5.0_3.0_1727203626360.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("wineberto_ner","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("wineberto_ner", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wineberto_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/panigrah/wineberto-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-wineberto_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-wineberto_ner_pipeline_en.md new file mode 100644 index 00000000000000..6efe3745e5d6ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-wineberto_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English wineberto_ner_pipeline pipeline BertForTokenClassification from panigrah +author: John Snow Labs +name: wineberto_ner_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wineberto_ner_pipeline` is a English model originally trained by panigrah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wineberto_ner_pipeline_en_5.5.0_3.0_1727203648432.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wineberto_ner_pipeline_en_5.5.0_3.0_1727203648432.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with a "text" column of input documents.
pipeline = PretrainedPipeline("wineberto_ner_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with a "text" column of input documents.
val pipeline = new PretrainedPipeline("wineberto_ner_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wineberto_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/panigrah/wineberto-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-withinapps_ndd_ppma_test_content_cwadj_en.md b/docs/_posts/ahmedlone127/2024-09-24-withinapps_ndd_ppma_test_content_cwadj_en.md new file mode 100644 index 00000000000000..6c374505a54741 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-withinapps_ndd_ppma_test_content_cwadj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English withinapps_ndd_ppma_test_content_cwadj DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_ppma_test_content_cwadj +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_ppma_test_content_cwadj` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_ppma_test_content_cwadj_en_5.5.0_3.0_1727154623208.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_ppma_test_content_cwadj_en_5.5.0_3.0_1727154623208.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_ppma_test_content_cwadj","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_ppma_test_content_cwadj", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_ppma_test_content_cwadj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-ppma_test-content-CWAdj \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xddmodel_en.md b/docs/_posts/ahmedlone127/2024-09-24-xddmodel_en.md new file mode 100644 index 00000000000000..f4747f7ddcd376 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xddmodel_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xddmodel XlmRoBertaForTokenClassification from pushokay +author: John Snow Labs +name: xddmodel +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xddmodel` is a English model originally trained by pushokay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xddmodel_en_5.5.0_3.0_1727179961981.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xddmodel_en_5.5.0_3.0_1727179961981.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xddmodel","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xddmodel", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xddmodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|388.5 MB| + +## References + +https://huggingface.co/pushokay/xddModel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xddmodel_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xddmodel_pipeline_en.md new file mode 100644 index 00000000000000..3b5143356189e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xddmodel_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xddmodel_pipeline pipeline XlmRoBertaForTokenClassification from pushokay +author: John Snow Labs +name: xddmodel_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xddmodel_pipeline` is a English model originally trained by pushokay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xddmodel_pipeline_en_5.5.0_3.0_1727179987730.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xddmodel_pipeline_en_5.5.0_3.0_1727179987730.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# `df` is assumed to be a DataFrame with a "text" column of input documents.
pipeline = PretrainedPipeline("xddmodel_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// `df` is assumed to be a DataFrame with a "text" column of input documents.
val pipeline = new PretrainedPipeline("xddmodel_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xddmodel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|388.5 MB| + +## References + +https://huggingface.co/pushokay/xddModel + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_balance_vietnam_aug_delete_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_balance_vietnam_aug_delete_en.md new file mode 100644 index 00000000000000..f376e31e2caac3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_balance_vietnam_aug_delete_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_balance_vietnam_aug_delete XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_balance_vietnam_aug_delete +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_balance_vietnam_aug_delete` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_delete_en_5.5.0_3.0_1727170140787.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_delete_en_5.5.0_3.0_1727170140787.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_balance_vietnam_aug_delete","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_balance_vietnam_aug_delete", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_balance_vietnam_aug_delete| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|793.6 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Balance_VietNam-aug_delete \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_balance_vietnam_aug_delete_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_balance_vietnam_aug_delete_pipeline_en.md new file mode 100644 index 00000000000000..7b08fa4d2b15f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_balance_vietnam_aug_delete_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_balance_vietnam_aug_delete_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_balance_vietnam_aug_delete_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_balance_vietnam_aug_delete_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_delete_pipeline_en_5.5.0_3.0_1727170276044.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_delete_pipeline_en_5.5.0_3.0_1727170276044.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_balance_vietnam_aug_delete_pipeline", lang = "en")
# The pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_balance_vietnam_aug_delete_pipeline", lang = "en")
// The pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_balance_vietnam_aug_delete_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|793.7 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Balance_VietNam-aug_delete + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_final_mixed_aug_backtranslation_1_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_final_mixed_aug_backtranslation_1_en.md new file mode 100644 index 00000000000000..b151236e6c9e81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_final_mixed_aug_backtranslation_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_final_mixed_aug_backtranslation_1 XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_final_mixed_aug_backtranslation_1 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_final_mixed_aug_backtranslation_1` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_aug_backtranslation_1_en_5.5.0_3.0_1727152826621.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_aug_backtranslation_1_en_5.5.0_3.0_1727152826621.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_final_mixed_aug_backtranslation_1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_final_mixed_aug_backtranslation_1", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
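
After running the example above, a quick way to check the output is to read the `result` field of the `class` annotations; this is only a sketch and assumes the standard Spark NLP annotation schema:

```python
# Display the predicted sequence-classification labels for each input row.
pipelineDF.select("text", "class.result").show(truncate=False)
```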
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_final_mixed_aug_backtranslation_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|794.6 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Final_Mixed-aug_backtranslation-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_final_mixed_aug_backtranslation_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_final_mixed_aug_backtranslation_1_pipeline_en.md new file mode 100644 index 00000000000000..40a8186e5ed5e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_final_mixed_aug_backtranslation_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_final_mixed_aug_backtranslation_1_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_final_mixed_aug_backtranslation_1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_final_mixed_aug_backtranslation_1_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_aug_backtranslation_1_pipeline_en_5.5.0_3.0_1727152966363.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_aug_backtranslation_1_pipeline_en_5.5.0_3.0_1727152966363.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_final_mixed_aug_backtranslation_1_pipeline", lang = "en")
# The pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_final_mixed_aug_backtranslation_1_pipeline", lang = "en")
// The pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_final_mixed_aug_backtranslation_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|794.7 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Final_Mixed-aug_backtranslation-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_en.md new file mode 100644 index 00000000000000..3edf0f59e9003b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2 XlmRoBertaForSequenceClassification from vg055 +author: John Snow Labs +name: xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2` is a English model originally trained by vg055. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_en_5.5.0_3.0_1727152751701.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_en_5.5.0_3.0_1727152751701.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
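
The predicted labels are stored in the `class` column of `pipelineDF`. A minimal sketch for inspecting them, assuming the standard Spark NLP annotation schema:

```python
# Show each input text together with its predicted label.
pipelineDF.select("text", "class.result").show(truncate=False)
```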
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|894.7 MB| + +## References + +https://huggingface.co/vg055/xlm-roberta-base-finetuned-IberAuTexTification2024-7030-4epo-task1-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_pipeline_en.md new file mode 100644 index 00000000000000..b27d4eb7c76015 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_pipeline pipeline XlmRoBertaForSequenceClassification from vg055 +author: John Snow Labs +name: xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_pipeline` is a English model originally trained by vg055. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_pipeline_en_5.5.0_3.0_1727152814072.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_pipeline_en_5.5.0_3.0_1727152814072.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_pipeline", lang = "en")
# The pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_pipeline", lang = "en")
// The pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|894.7 MB| + +## References + +https://huggingface.co/vg055/xlm-roberta-base-finetuned-IberAuTexTification2024-7030-4epo-task1-v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_en.md new file mode 100644 index 00000000000000..3751702e6ac84e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2 XlmRoBertaForSequenceClassification from RogerB +author: John Snow Labs +name: xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_en_5.5.0_3.0_1727170138983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_en_5.5.0_3.0_1727170138983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
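
As a quick follow-up, the classifier's output can be read back from the `class` column of `pipelineDF`; this is a sketch that assumes the standard Spark NLP annotation schema:

```python
# Print the predicted class label for each input row.
pipelineDF.select("text", "class.result").show(truncate=False)
```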
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/RogerB/xlm-roberta-base-finetuned-kinyarwanda-kinre-finetuned-kin-sent2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_pipeline_en.md new file mode 100644 index 00000000000000..1fbb06cd57a998 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_pipeline pipeline XlmRoBertaForSequenceClassification from RogerB +author: John Snow Labs +name: xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_pipeline` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_pipeline_en_5.5.0_3.0_1727170198683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_pipeline_en_5.5.0_3.0_1727170198683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_pipeline", lang = "en")
# The pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_pipeline", lang = "en")
// The pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/RogerB/xlm-roberta-base-finetuned-kinyarwanda-kinre-finetuned-kin-sent2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_marc_tielupeng_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_marc_tielupeng_en.md new file mode 100644 index 00000000000000..160bcb71b8bb81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_marc_tielupeng_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_marc_tielupeng XlmRoBertaForSequenceClassification from tielupeng +author: John Snow Labs +name: xlm_roberta_base_finetuned_marc_tielupeng +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_marc_tielupeng` is a English model originally trained by tielupeng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_tielupeng_en_5.5.0_3.0_1727156592734.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_tielupeng_en_5.5.0_3.0_1727156592734.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_marc_tielupeng","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_marc_tielupeng", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
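
The predictions land in the `class` column of `pipelineDF`. A minimal sketch for inspecting them, assuming the standard Spark NLP annotation schema where `result` holds the label string:

```python
# Show the predicted label next to each input text.
pipelineDF.select("text", "class.result").show(truncate=False)
```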
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_marc_tielupeng| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|835.1 MB| + +## References + +https://huggingface.co/tielupeng/xlm-roberta-base-finetuned-marc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_marc_tielupeng_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_marc_tielupeng_pipeline_en.md new file mode 100644 index 00000000000000..8d1513dd841347 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_marc_tielupeng_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_marc_tielupeng_pipeline pipeline XlmRoBertaForSequenceClassification from tielupeng +author: John Snow Labs +name: xlm_roberta_base_finetuned_marc_tielupeng_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_marc_tielupeng_pipeline` is a English model originally trained by tielupeng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_tielupeng_pipeline_en_5.5.0_3.0_1727156675655.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_tielupeng_pipeline_en_5.5.0_3.0_1727156675655.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_marc_tielupeng_pipeline", lang = "en")
# The pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_marc_tielupeng_pipeline", lang = "en")
// The pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_marc_tielupeng_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|835.1 MB| + +## References + +https://huggingface.co/tielupeng/xlm-roberta-base-finetuned-marc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_en.md new file mode 100644 index 00000000000000..08ea63e58c1427 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_0ppxnhximxr XlmRoBertaForTokenClassification from 0ppxnhximxr +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_0ppxnhximxr +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_0ppxnhximxr` is a English model originally trained by 0ppxnhximxr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_en_5.5.0_3.0_1727180196647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_en_5.5.0_3.0_1727180196647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_0ppxnhximxr","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_0ppxnhximxr", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
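
The predicted tags are written to the `ner` column of `pipelineDF`. A minimal sketch for viewing them token by token, assuming the standard Spark NLP annotation schema where `result` carries the tag string:

```python
# Show tokens side by side with their predicted (typically IOB-style) entity tags.
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```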
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_0ppxnhximxr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/0ppxnhximxr/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_pipeline_en.md new file mode 100644 index 00000000000000..7ac5021db0ce74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_pipeline pipeline XlmRoBertaForTokenClassification from 0ppxnhximxr +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_pipeline` is a English model originally trained by 0ppxnhximxr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_pipeline_en_5.5.0_3.0_1727180277781.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_pipeline_en_5.5.0_3.0_1727180277781.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_pipeline", lang = "en")
# The pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_pipeline", lang = "en")
// The pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/0ppxnhximxr/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_amitjain171980_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_amitjain171980_en.md new file mode 100644 index 00000000000000..73b3e0a5caedad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_amitjain171980_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_amitjain171980 XlmRoBertaForTokenClassification from amitjain171980 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_amitjain171980 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_amitjain171980` is a English model originally trained by amitjain171980. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_amitjain171980_en_5.5.0_3.0_1727160303720.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_amitjain171980_en_5.5.0_3.0_1727160303720.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_amitjain171980","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_amitjain171980", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
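
After running the example above, the entity predictions can be read back from the `ner` column; this is only a sketch and assumes the standard Spark NLP annotation schema:

```python
# Pair each token with its predicted entity tag.
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```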
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_amitjain171980| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|855.9 MB| + +## References + +https://huggingface.co/amitjain171980/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_hravi_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_hravi_en.md new file mode 100644 index 00000000000000..1f6358a3ae5ff1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_hravi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_hravi XlmRoBertaForTokenClassification from hravi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_hravi +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_hravi` is a English model originally trained by hravi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_hravi_en_5.5.0_3.0_1727180020333.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_hravi_en_5.5.0_3.0_1727180020333.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_hravi","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_hravi", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
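
The predicted tags end up in the `ner` column of `pipelineDF`. A minimal sketch for checking them, assuming the standard Spark NLP annotation schema:

```python
# Show tokens alongside their predicted entity tags.
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```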
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_hravi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/hravi/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_k3lana_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_k3lana_en.md new file mode 100644 index 00000000000000..991f5f6df6c27a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_k3lana_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_k3lana XlmRoBertaForTokenClassification from k3lana +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_k3lana +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_k3lana` is a English model originally trained by k3lana. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_k3lana_en_5.5.0_3.0_1727174783070.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_k3lana_en_5.5.0_3.0_1727174783070.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_k3lana","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_k3lana", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
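
The entity predictions are stored in the `ner` column of `pipelineDF`. As a minimal sketch (assuming the standard Spark NLP annotation schema), they can be inspected with:

```python
# Show each token with its predicted (typically IOB-style) tag.
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```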
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_k3lana| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/k3lana/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_k3lana_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_k3lana_pipeline_en.md new file mode 100644 index 00000000000000..d0209a1ce071e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_k3lana_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_k3lana_pipeline pipeline XlmRoBertaForTokenClassification from k3lana +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_k3lana_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_k3lana_pipeline` is a English model originally trained by k3lana. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_k3lana_pipeline_en_5.5.0_3.0_1727174847821.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_k3lana_pipeline_en_5.5.0_3.0_1727174847821.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_k3lana_pipeline", lang = "en")
# The pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_k3lana_pipeline", lang = "en")
// The pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_k3lana_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/k3lana/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_maxnet_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_maxnet_en.md new file mode 100644 index 00000000000000..043428c379df01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_maxnet_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_maxnet XlmRoBertaForTokenClassification from Maxnet +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_maxnet +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_maxnet` is a English model originally trained by Maxnet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_maxnet_en_5.5.0_3.0_1727160602568.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_maxnet_en_5.5.0_3.0_1727160602568.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_maxnet","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_maxnet", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
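
As a quick follow-up, the tags produced above can be read from the `ner` column; this sketch assumes the standard Spark NLP annotation schema:

```python
# Pair tokens with their predicted entity tags.
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```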
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_maxnet| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|859.8 MB| + +## References + +https://huggingface.co/Maxnet/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_maxnet_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_maxnet_pipeline_en.md new file mode 100644 index 00000000000000..40cd70f02e51b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_maxnet_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_maxnet_pipeline pipeline XlmRoBertaForTokenClassification from Maxnet +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_maxnet_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_maxnet_pipeline` is a English model originally trained by Maxnet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_maxnet_pipeline_en_5.5.0_3.0_1727160669843.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_maxnet_pipeline_en_5.5.0_3.0_1727160669843.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_maxnet_pipeline", lang = "en")
# The pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_maxnet_pipeline", lang = "en")
// The pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_maxnet_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|859.8 MB| + +## References + +https://huggingface.co/Maxnet/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_english_khadija267_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_english_khadija267_en.md new file mode 100644 index 00000000000000..a8f2af5d235f51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_english_khadija267_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_khadija267 XlmRoBertaForTokenClassification from khadija267 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_khadija267 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_khadija267` is a English model originally trained by khadija267. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_khadija267_en_5.5.0_3.0_1727160892552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_khadija267_en_5.5.0_3.0_1727160892552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_khadija267","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_khadija267", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
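
The predicted tags are written to the `ner` column of `pipelineDF`. A minimal sketch for viewing them, assuming the standard Spark NLP annotation schema:

```python
# Show tokens together with their predicted entity tags.
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```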
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_khadija267| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/khadija267/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_english_pockypocky_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_english_pockypocky_en.md new file mode 100644 index 00000000000000..e8dd82c98740f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_english_pockypocky_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_pockypocky XlmRoBertaForTokenClassification from pockypocky +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_pockypocky +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_pockypocky` is a English model originally trained by pockypocky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_pockypocky_en_5.5.0_3.0_1727147476717.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_pockypocky_en_5.5.0_3.0_1727147476717.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_pockypocky","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_pockypocky", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
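
After running the example above, the entity predictions can be checked from the `ner` column; this is a sketch that assumes the standard Spark NLP annotation schema:

```python
# Show each token with its predicted tag.
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```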
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_pockypocky| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/pockypocky/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_english_skr1125_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_english_skr1125_en.md new file mode 100644 index 00000000000000..b97d3b87fef1d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_english_skr1125_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_skr1125 XlmRoBertaForTokenClassification from skr1125 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_skr1125 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_skr1125` is a English model originally trained by skr1125. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_skr1125_en_5.5.0_3.0_1727180150112.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_skr1125_en_5.5.0_3.0_1727180150112.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_skr1125","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_skr1125", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
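+
+If full entity spans are needed rather than per-token IOB tags, Spark NLP's `NerConverter` can be appended as an extra stage. A minimal sketch extending the Python pipeline above (the `ner_chunk` column name is an arbitrary choice, not part of this card):
+
+```python
+from sparknlp.annotator import NerConverter
+
+# Groups consecutive B-/I- tags emitted by the token classifier into entity chunks.
+nerConverter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, nerConverter])
+result = pipeline.fit(data).transform(data)
+result.selectExpr("explode(ner_chunk.result) as entity").show(truncate=False)
+```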
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_skr1125| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/skr1125/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_cyrildever_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_cyrildever_en.md new file mode 100644 index 00000000000000..878def22d053cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_cyrildever_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_cyrildever XlmRoBertaForTokenClassification from cyrildever +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_cyrildever +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_cyrildever` is a English model originally trained by cyrildever. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cyrildever_en_5.5.0_3.0_1727148148262.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cyrildever_en_5.5.0_3.0_1727148148262.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_cyrildever","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_cyrildever", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_cyrildever| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/cyrildever/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_cyrildever_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_cyrildever_pipeline_en.md new file mode 100644 index 00000000000000..8ea065110376ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_cyrildever_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_cyrildever_pipeline pipeline XlmRoBertaForTokenClassification from cyrildever +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_cyrildever_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_cyrildever_pipeline` is a English model originally trained by cyrildever. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cyrildever_pipeline_en_5.5.0_3.0_1727148236718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cyrildever_pipeline_en_5.5.0_3.0_1727148236718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_cyrildever_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_cyrildever_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
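+
+For quick checks on a single string, `PretrainedPipeline` also exposes `annotate`, which returns plain Python lists instead of a DataFrame. A minimal sketch using the pipeline name from this card (the sample sentence is illustrative only):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_cyrildever_pipeline", lang = "en")
+# Returns a dict mapping each output column (document, token, ner) to a list of strings.
+annotations = pipeline.annotate("John Snow Labs est basé dans le Delaware")
+print(annotations["ner"])
+```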
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_cyrildever_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/cyrildever/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_ridealist_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_ridealist_en.md new file mode 100644 index 00000000000000..99d6e6a7491fdf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_ridealist_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_ridealist XlmRoBertaForTokenClassification from Ridealist +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_ridealist +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_ridealist` is a English model originally trained by Ridealist. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ridealist_en_5.5.0_3.0_1727147695062.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ridealist_en_5.5.0_3.0_1727147695062.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_ridealist","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_ridealist", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_ridealist| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/Ridealist/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_ridealist_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_ridealist_pipeline_en.md new file mode 100644 index 00000000000000..8098d39588e8d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_ridealist_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_ridealist_pipeline pipeline XlmRoBertaForTokenClassification from Ridealist +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_ridealist_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_ridealist_pipeline` is a English model originally trained by Ridealist. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ridealist_pipeline_en_5.5.0_3.0_1727147792847.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ridealist_pipeline_en_5.5.0_3.0_1727147792847.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_ridealist_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_ridealist_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_ridealist_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/Ridealist/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_zebans_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_zebans_en.md new file mode 100644 index 00000000000000..827d75ce5e06e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_zebans_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_zebans XlmRoBertaForTokenClassification from zebans +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_zebans +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_zebans` is a English model originally trained by zebans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_zebans_en_5.5.0_3.0_1727160536340.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_zebans_en_5.5.0_3.0_1727160536340.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_zebans","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_zebans", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_zebans| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/zebans/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_zebans_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_zebans_pipeline_en.md new file mode 100644 index 00000000000000..a1fc33c67227f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_zebans_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_zebans_pipeline pipeline XlmRoBertaForTokenClassification from zebans +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_zebans_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_zebans_pipeline` is a English model originally trained by zebans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_zebans_pipeline_en_5.5.0_3.0_1727160620635.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_zebans_pipeline_en_5.5.0_3.0_1727160620635.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_zebans_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_zebans_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_zebans_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/zebans/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_cramade_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_cramade_en.md new file mode 100644 index 00000000000000..2b25b99cec94f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_cramade_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_cramade XlmRoBertaForTokenClassification from cramade +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_cramade +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_cramade` is a English model originally trained by cramade. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_cramade_en_5.5.0_3.0_1727160712619.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_cramade_en_5.5.0_3.0_1727160712619.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_cramade","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_cramade", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
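+
+For low-latency inference outside a DataFrame, the fitted model can be wrapped in a `LightPipeline`. A minimal sketch reusing `pipelineModel` from the Python snippet above (the German sample sentence is illustrative only):
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+# fullAnnotate keeps begin/end offsets and metadata for each annotation.
+annotated = light.fullAnnotate("Angela Merkel besuchte gestern Berlin")
+print(annotated[0]["ner"])
+```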
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_cramade| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/cramade/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_cramade_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_cramade_pipeline_en.md new file mode 100644 index 00000000000000..aaeebedc093d91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_cramade_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_cramade_pipeline pipeline XlmRoBertaForTokenClassification from cramade +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_cramade_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_cramade_pipeline` is a English model originally trained by cramade. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_cramade_pipeline_en_5.5.0_3.0_1727160798955.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_cramade_pipeline_en_5.5.0_3.0_1727160798955.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_cramade_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_cramade_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_cramade_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/cramade/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_bessho_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_bessho_en.md new file mode 100644 index 00000000000000..ee4e6cf19cf54e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_bessho_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_bessho XlmRoBertaForTokenClassification from bessho +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_bessho +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_bessho` is a English model originally trained by bessho. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_bessho_en_5.5.0_3.0_1727175116872.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_bessho_en_5.5.0_3.0_1727175116872.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_bessho","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_bessho", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_bessho| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/bessho/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_dasooo_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_dasooo_en.md new file mode 100644 index 00000000000000..19602d988689e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_dasooo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_dasooo XlmRoBertaForTokenClassification from daSooo +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_dasooo +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_dasooo` is a English model originally trained by daSooo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_dasooo_en_5.5.0_3.0_1727180293698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_dasooo_en_5.5.0_3.0_1727180293698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_dasooo","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_dasooo", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_dasooo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/daSooo/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_dasooo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_dasooo_pipeline_en.md new file mode 100644 index 00000000000000..b495880fc8e646 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_dasooo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_dasooo_pipeline pipeline XlmRoBertaForTokenClassification from daSooo +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_dasooo_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_dasooo_pipeline` is a English model originally trained by daSooo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_dasooo_pipeline_en_5.5.0_3.0_1727180378202.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_dasooo_pipeline_en_5.5.0_3.0_1727180378202.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_dasooo_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_dasooo_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_dasooo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/daSooo/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_isaacp_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_isaacp_en.md new file mode 100644 index 00000000000000..78eab5801e6db5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_isaacp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_isaacp XlmRoBertaForTokenClassification from Isaacp +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_isaacp +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_isaacp` is a English model originally trained by Isaacp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_isaacp_en_5.5.0_3.0_1727147434563.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_isaacp_en_5.5.0_3.0_1727147434563.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_isaacp","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_isaacp", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
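+
+Since the fitted pipeline is a standard Spark ML `PipelineModel`, it can be persisted once and reloaded for later inference without refitting. A minimal sketch (the local path is a placeholder):
+
+```python
+from pyspark.ml import PipelineModel
+
+# Save the fitted pipeline; any Hadoop-compatible URI (HDFS, S3, DBFS) also works.
+pipelineModel.write().overwrite().save("/tmp/xlmroberta_panx_ner_pipeline")
+
+# Reload it later and run inference directly.
+restored = PipelineModel.load("/tmp/xlmroberta_panx_ner_pipeline")
+restored.transform(data).select("ner.result").show(truncate=False)
+```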
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_isaacp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/Isaacp/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_isaacp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_isaacp_pipeline_en.md new file mode 100644 index 00000000000000..2be52190488257 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_isaacp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_isaacp_pipeline pipeline XlmRoBertaForTokenClassification from Isaacp +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_isaacp_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_isaacp_pipeline` is a English model originally trained by Isaacp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_isaacp_pipeline_en_5.5.0_3.0_1727147501827.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_isaacp_pipeline_en_5.5.0_3.0_1727147501827.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_isaacp_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_isaacp_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_isaacp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/Isaacp/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_rlpeter70_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_rlpeter70_en.md new file mode 100644 index 00000000000000..16c1e9df0ec5ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_rlpeter70_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_rlpeter70 XlmRoBertaForTokenClassification from rlpeter70 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_rlpeter70 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_rlpeter70` is a English model originally trained by rlpeter70. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_rlpeter70_en_5.5.0_3.0_1727160215681.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_rlpeter70_en_5.5.0_3.0_1727160215681.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_rlpeter70","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_rlpeter70", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_rlpeter70| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/rlpeter70/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_rlpeter70_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_rlpeter70_pipeline_en.md new file mode 100644 index 00000000000000..3997592a7e3a3d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_rlpeter70_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_rlpeter70_pipeline pipeline XlmRoBertaForTokenClassification from rlpeter70 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_rlpeter70_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_rlpeter70_pipeline` is a English model originally trained by rlpeter70. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_rlpeter70_pipeline_en_5.5.0_3.0_1727160282532.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_rlpeter70_pipeline_en_5.5.0_3.0_1727160282532.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_rlpeter70_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_rlpeter70_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_rlpeter70_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/rlpeter70/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_xrchen11_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_xrchen11_en.md new file mode 100644 index 00000000000000..7171b5dea74df3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_xrchen11_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_xrchen11 XlmRoBertaForTokenClassification from xrchen11 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_xrchen11 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_xrchen11` is a English model originally trained by xrchen11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_xrchen11_en_5.5.0_3.0_1727147833906.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_xrchen11_en_5.5.0_3.0_1727147833906.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_xrchen11","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_xrchen11", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_xrchen11| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/xrchen11/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_ysige_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_ysige_en.md new file mode 100644 index 00000000000000..bb99e12f495196 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_ysige_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_ysige XlmRoBertaForTokenClassification from ysige +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_ysige +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_ysige` is a English model originally trained by ysige. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_ysige_en_5.5.0_3.0_1727214815789.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_ysige_en_5.5.0_3.0_1727214815789.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_ysige","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_ysige", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_ysige| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/ysige/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_hirosay_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_hirosay_en.md new file mode 100644 index 00000000000000..e2c9bac49ae777 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_hirosay_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hirosay XlmRoBertaForTokenClassification from hirosay +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hirosay +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_hirosay` is a English model originally trained by hirosay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hirosay_en_5.5.0_3.0_1727214830655.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hirosay_en_5.5.0_3.0_1727214830655.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# An active Spark session is assumed, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hirosay","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hirosay", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hirosay| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/hirosay/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_hirosay_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_hirosay_pipeline_en.md new file mode 100644 index 00000000000000..644244ca5c4283 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_hirosay_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hirosay_pipeline pipeline XlmRoBertaForTokenClassification from hirosay +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hirosay_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_hirosay_pipeline` is a English model originally trained by hirosay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hirosay_pipeline_en_5.5.0_3.0_1727214903264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hirosay_pipeline_en_5.5.0_3.0_1727214903264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained pipeline and run it on a DataFrame with a "text" column
pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hirosay_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Load the pretrained pipeline and run it on a DataFrame with a "text" column
val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hirosay_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
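
For quick checks on individual strings, the loaded pipeline can also be used without building a DataFrame. A small sketch, assuming the `pipeline` object from the example above and that the token classifier in this pipeline writes to a column named `ner` (an assumption about its configuration, not something stated in this card):

```python
# annotate() runs every stage on a single string and returns a dict that maps
# each output column (e.g. document, token, ner) to a list of result strings.
single = pipeline.annotate("I love spark-nlp")
print(single.get("ner"))

# fullAnnotate() returns complete Annotation objects, including character
# offsets and per-token metadata, when more detail is needed.
detailed = pipeline.fullAnnotate("I love spark-nlp")
```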
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hirosay_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/hirosay/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_laurentiustancioiu_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_laurentiustancioiu_en.md new file mode 100644 index 00000000000000..ba1a8eb9efa145 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_laurentiustancioiu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_laurentiustancioiu XlmRoBertaForTokenClassification from LaurentiuStancioiu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_laurentiustancioiu +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_laurentiustancioiu` is a English model originally trained by LaurentiuStancioiu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_laurentiustancioiu_en_5.5.0_3.0_1727214813344.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_laurentiustancioiu_en_5.5.0_3.0_1727214813344.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_laurentiustancioiu","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_laurentiustancioiu", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_laurentiustancioiu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/LaurentiuStancioiu/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_param_mehta_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_param_mehta_en.md new file mode 100644 index 00000000000000..f91f198320e9b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_param_mehta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_param_mehta XlmRoBertaForTokenClassification from param-mehta +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_param_mehta +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_param_mehta` is a English model originally trained by param-mehta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_param_mehta_en_5.5.0_3.0_1727215226799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_param_mehta_en_5.5.0_3.0_1727215226799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_param_mehta","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_param_mehta", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_param_mehta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|841.1 MB| + +## References + +https://huggingface.co/param-mehta/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_param_mehta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_param_mehta_pipeline_en.md new file mode 100644 index 00000000000000..961480b8010717 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_param_mehta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_param_mehta_pipeline pipeline XlmRoBertaForTokenClassification from param-mehta +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_param_mehta_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_param_mehta_pipeline` is a English model originally trained by param-mehta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_param_mehta_pipeline_en_5.5.0_3.0_1727215312498.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_param_mehta_pipeline_en_5.5.0_3.0_1727215312498.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_param_mehta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_param_mehta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_param_mehta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|841.2 MB| + +## References + +https://huggingface.co/param-mehta/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_youngbreadho_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_youngbreadho_en.md new file mode 100644 index 00000000000000..dbff1a6fc9ff6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_youngbreadho_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_youngbreadho XlmRoBertaForTokenClassification from youngbreadho +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_youngbreadho +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_youngbreadho` is a English model originally trained by youngbreadho. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_youngbreadho_en_5.5.0_3.0_1727215193256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_youngbreadho_en_5.5.0_3.0_1727215193256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_youngbreadho","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_youngbreadho", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_youngbreadho| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|857.8 MB| + +## References + +https://huggingface.co/youngbreadho/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_hindi_urdu_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_hindi_urdu_en.md new file mode 100644 index 00000000000000..1c71b5c8f53e25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_hindi_urdu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_hindi_urdu XlmRoBertaForTokenClassification from DeepaPeri +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_hindi_urdu +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_hindi_urdu` is a English model originally trained by DeepaPeri. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_urdu_en_5.5.0_3.0_1727147870933.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_urdu_en_5.5.0_3.0_1727147870933.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_hindi_urdu","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_hindi_urdu", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_hindi_urdu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.1 MB| + +## References + +https://huggingface.co/DeepaPeri/xlm-roberta-base-finetuned-panx-hi-ur \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_hindi_urdu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_hindi_urdu_pipeline_en.md new file mode 100644 index 00000000000000..dbb7386fcfb873 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_hindi_urdu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_hindi_urdu_pipeline pipeline XlmRoBertaForTokenClassification from DeepaPeri +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_hindi_urdu_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_hindi_urdu_pipeline` is a English model originally trained by DeepaPeri. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_urdu_pipeline_en_5.5.0_3.0_1727147963171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_urdu_pipeline_en_5.5.0_3.0_1727147963171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_hindi_urdu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_hindi_urdu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_hindi_urdu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.1 MB| + +## References + +https://huggingface.co/DeepaPeri/xlm-roberta-base-finetuned-panx-hi-ur + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_ankit15nov_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_ankit15nov_en.md new file mode 100644 index 00000000000000..60b5496ce7b64c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_ankit15nov_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_ankit15nov XlmRoBertaForTokenClassification from Ankit15nov +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_ankit15nov +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_ankit15nov` is a English model originally trained by Ankit15nov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_ankit15nov_en_5.5.0_3.0_1727160899552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_ankit15nov_en_5.5.0_3.0_1727160899552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_ankit15nov","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_ankit15nov", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_ankit15nov| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/Ankit15nov/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_ankit15nov_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_ankit15nov_pipeline_en.md new file mode 100644 index 00000000000000..32e95354e577ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_ankit15nov_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_ankit15nov_pipeline pipeline XlmRoBertaForTokenClassification from Ankit15nov +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_ankit15nov_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_ankit15nov_pipeline` is a English model originally trained by Ankit15nov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_ankit15nov_pipeline_en_5.5.0_3.0_1727160987501.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_ankit15nov_pipeline_en_5.5.0_3.0_1727160987501.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_ankit15nov_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_ankit15nov_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_ankit15nov_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/Ankit15nov/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_khadija267_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_khadija267_en.md new file mode 100644 index 00000000000000..f2692a521846ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_khadija267_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_khadija267 XlmRoBertaForTokenClassification from khadija267 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_khadija267 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_khadija267` is a English model originally trained by khadija267. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_khadija267_en_5.5.0_3.0_1727174887087.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_khadija267_en_5.5.0_3.0_1727174887087.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_khadija267","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_khadija267", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_khadija267| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/khadija267/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_khadija267_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_khadija267_pipeline_en.md new file mode 100644 index 00000000000000..da219feab531db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_khadija267_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_khadija267_pipeline pipeline XlmRoBertaForTokenClassification from khadija267 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_khadija267_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_khadija267_pipeline` is a English model originally trained by khadija267. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_khadija267_pipeline_en_5.5.0_3.0_1727174972482.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_khadija267_pipeline_en_5.5.0_3.0_1727174972482.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_khadija267_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_khadija267_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_khadija267_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/khadija267/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_en.md new file mode 100644 index 00000000000000..17e1b85d654aed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_en_5.5.0_3.0_1727156531764.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_en_5.5.0_3.0_1727156531764.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# An active Spark session is assumed, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
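
Once the pipeline has run, the predicted label for each input row sits in the `class` column as an annotation. A short sketch of extracting it, using the column names from the Python example above:

```python
from pyspark.sql import functions as F

# The sequence classifier emits one winning label per row in class.result.
pipelineDF.select(
    F.col("text"),
    F.explode("class.result").alias("predicted_label")
).show(truncate=False)
```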
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|810.5 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr5e-06_seed42_amh-esp-eng_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_pipeline_en.md new file mode 100644 index 00000000000000..98ce11a0c47b4a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_pipeline pipeline XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_pipeline` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_pipeline_en_5.5.0_3.0_1727156661585.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_pipeline_en_5.5.0_3.0_1727156661585.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|810.5 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr5e-06_seed42_amh-esp-eng_train + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_mixed_aug_insert_vietnamese_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_mixed_aug_insert_vietnamese_en.md new file mode 100644 index 00000000000000..01fa458eafc04a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_mixed_aug_insert_vietnamese_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_mixed_aug_insert_vietnamese XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_mixed_aug_insert_vietnamese +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_mixed_aug_insert_vietnamese` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_mixed_aug_insert_vietnamese_en_5.5.0_3.0_1727155878058.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_mixed_aug_insert_vietnamese_en_5.5.0_3.0_1727155878058.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_mixed_aug_insert_vietnamese","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_mixed_aug_insert_vietnamese", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_mixed_aug_insert_vietnamese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|794.9 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Mixed-aug_insert_vi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_mixed_aug_insert_vietnamese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_mixed_aug_insert_vietnamese_pipeline_en.md new file mode 100644 index 00000000000000..ecdad5514f3346 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_mixed_aug_insert_vietnamese_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_mixed_aug_insert_vietnamese_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_mixed_aug_insert_vietnamese_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_mixed_aug_insert_vietnamese_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_mixed_aug_insert_vietnamese_pipeline_en_5.5.0_3.0_1727156009421.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_mixed_aug_insert_vietnamese_pipeline_en_5.5.0_3.0_1727156009421.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_mixed_aug_insert_vietnamese_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_mixed_aug_insert_vietnamese_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_mixed_aug_insert_vietnamese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|794.9 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Mixed-aug_insert_vi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_mixed_aug_swap_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_mixed_aug_swap_pipeline_en.md new file mode 100644 index 00000000000000..33071f17ec7a32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_mixed_aug_swap_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_mixed_aug_swap_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_mixed_aug_swap_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_mixed_aug_swap_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_mixed_aug_swap_pipeline_en_5.5.0_3.0_1727153213563.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_mixed_aug_swap_pipeline_en_5.5.0_3.0_1727153213563.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_mixed_aug_swap_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_mixed_aug_swap_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_mixed_aug_swap_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|796.2 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Mixed-aug_swap + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_pharmaconer_kanansharmaa_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_pharmaconer_kanansharmaa_en.md new file mode 100644 index 00000000000000..3134784b614898 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_pharmaconer_kanansharmaa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_pharmaconer_kanansharmaa RoBertaForTokenClassification from kanansharmaa +author: John Snow Labs +name: xlm_roberta_base_pharmaconer_kanansharmaa +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_pharmaconer_kanansharmaa` is a English model originally trained by kanansharmaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pharmaconer_kanansharmaa_en_5.5.0_3.0_1727139613353.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pharmaconer_kanansharmaa_en_5.5.0_3.0_1727139613353.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# An active Spark session is assumed, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("xlm_roberta_base_pharmaconer_kanansharmaa","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("xlm_roberta_base_pharmaconer_kanansharmaa", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
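
When only a few sentences need to be tagged, wrapping the fitted model in a LightPipeline avoids the overhead of a distributed transform. A minimal sketch, assuming the `pipelineModel` fitted in the Python example above:

```python
from sparknlp.base import LightPipeline

# LightPipeline executes the same stages locally on the driver, which is
# typically faster than a DataFrame transform for small, ad-hoc inputs.
light = LightPipeline(pipelineModel)
result = light.annotate("I love spark-nlp")
print(result["ner"])
```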
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_pharmaconer_kanansharmaa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|829.0 MB| + +## References + +https://huggingface.co/kanansharmaa/xlm-roberta-base-pharmaconer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_pharmaconer_kanansharmaa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_pharmaconer_kanansharmaa_pipeline_en.md new file mode 100644 index 00000000000000..fc81b88cceba40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_pharmaconer_kanansharmaa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_pharmaconer_kanansharmaa_pipeline pipeline RoBertaForTokenClassification from kanansharmaa +author: John Snow Labs +name: xlm_roberta_base_pharmaconer_kanansharmaa_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_pharmaconer_kanansharmaa_pipeline` is a English model originally trained by kanansharmaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pharmaconer_kanansharmaa_pipeline_en_5.5.0_3.0_1727139705988.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pharmaconer_kanansharmaa_pipeline_en_5.5.0_3.0_1727139705988.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_pharmaconer_kanansharmaa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_pharmaconer_kanansharmaa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_pharmaconer_kanansharmaa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|829.0 MB| + +## References + +https://huggingface.co/kanansharmaa/xlm-roberta-base-pharmaconer + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_russian_sentiment_liniscrowd_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_russian_sentiment_liniscrowd_en.md new file mode 100644 index 00000000000000..1b7378ced619f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_russian_sentiment_liniscrowd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_russian_sentiment_liniscrowd XlmRoBertaForSequenceClassification from sismetanin +author: John Snow Labs +name: xlm_roberta_base_russian_sentiment_liniscrowd +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_russian_sentiment_liniscrowd` is a English model originally trained by sismetanin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_russian_sentiment_liniscrowd_en_5.5.0_3.0_1727152693144.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_russian_sentiment_liniscrowd_en_5.5.0_3.0_1727152693144.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_russian_sentiment_liniscrowd","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_russian_sentiment_liniscrowd", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
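+
+As a small, optional extension (not part of the original card), the lines below assume the `pipelineDF` DataFrame produced by the Python example above and show one way to read back the predicted labels; the `class` column name follows from `setOutputCol("class")`.
+
+```python
+# Assumes `pipelineDF` from the example above; "class" is the output column set with setOutputCol("class")
+pipelineDF.select("text", "class.result").show(truncate = False)
+```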
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_russian_sentiment_liniscrowd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|800.0 MB| + +## References + +https://huggingface.co/sismetanin/xlm_roberta_base-ru-sentiment-liniscrowd \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_en.md new file mode 100644 index 00000000000000..01d15a0134ae8e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_spanish_10000_xnli_spanish XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_spanish_10000_xnli_spanish +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_spanish_10000_xnli_spanish` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_en_5.5.0_3.0_1727170235468.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_en_5.5.0_3.0_1727170235468.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_spanish_10000_xnli_spanish","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_spanish_10000_xnli_spanish", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_spanish_10000_xnli_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|353.7 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-es-10000-xnli-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_pipeline_en.md new file mode 100644 index 00000000000000..6272e45e6ad4e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_pipeline_en_5.5.0_3.0_1727170253798.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_pipeline_en_5.5.0_3.0_1727170253798.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|353.7 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-es-10000-xnli-es + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_vietnam_aug_swap_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_vietnam_aug_swap_pipeline_en.md new file mode 100644 index 00000000000000..b111e1f8fac0dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_vietnam_aug_swap_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_vietnam_aug_swap_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_vietnam_aug_swap_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_vietnam_aug_swap_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vietnam_aug_swap_pipeline_en_5.5.0_3.0_1727152761189.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vietnam_aug_swap_pipeline_en_5.5.0_3.0_1727152761189.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_vietnam_aug_swap_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_vietnam_aug_swap_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_vietnam_aug_swap_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|794.3 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-VietNam-aug_swap + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_wnli_10_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_wnli_10_en.md new file mode 100644 index 00000000000000..0244a50500d924 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_wnli_10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_wnli_10 XlmRoBertaForSequenceClassification from tmnam20 +author: John Snow Labs +name: xlm_roberta_base_wnli_10 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_wnli_10` is a English model originally trained by tmnam20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_wnli_10_en_5.5.0_3.0_1727152300856.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_wnli_10_en_5.5.0_3.0_1727152300856.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_wnli_10","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_wnli_10", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_wnli_10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|772.2 MB| + +## References + +https://huggingface.co/tmnam20/xlm-roberta-base-wnli-10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_wnli_10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_wnli_10_pipeline_en.md new file mode 100644 index 00000000000000..97d74cb1533d94 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_wnli_10_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_wnli_10_pipeline pipeline XlmRoBertaForSequenceClassification from tmnam20 +author: John Snow Labs +name: xlm_roberta_base_wnli_10_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_wnli_10_pipeline` is a English model originally trained by tmnam20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_wnli_10_pipeline_en_5.5.0_3.0_1727152443513.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_wnli_10_pipeline_en_5.5.0_3.0_1727152443513.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_wnli_10_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_wnli_10_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_wnli_10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|772.3 MB| + +## References + +https://huggingface.co/tmnam20/xlm-roberta-base-wnli-10 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_emojis_iid_fed_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_emojis_iid_fed_en.md new file mode 100644 index 00000000000000..39048f2e2e5270 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_emojis_iid_fed_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_finetuned_emojis_iid_fed XlmRoBertaForSequenceClassification from Karim-Gamal +author: John Snow Labs +name: xlm_roberta_finetuned_emojis_iid_fed +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_finetuned_emojis_iid_fed` is a English model originally trained by Karim-Gamal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_iid_fed_en_5.5.0_3.0_1727152208462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_iid_fed_en_5.5.0_3.0_1727152208462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_finetuned_emojis_iid_fed","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_finetuned_emojis_iid_fed", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_finetuned_emojis_iid_fed| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Karim-Gamal/XLM-Roberta-finetuned-emojis-IID-Fed \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_emojis_iid_fed_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_emojis_iid_fed_pipeline_en.md new file mode 100644 index 00000000000000..14ed8ed53537fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_emojis_iid_fed_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_finetuned_emojis_iid_fed_pipeline pipeline XlmRoBertaForSequenceClassification from Karim-Gamal +author: John Snow Labs +name: xlm_roberta_finetuned_emojis_iid_fed_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_finetuned_emojis_iid_fed_pipeline` is a English model originally trained by Karim-Gamal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_iid_fed_pipeline_en_5.5.0_3.0_1727152260950.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_iid_fed_pipeline_en_5.5.0_3.0_1727152260950.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_finetuned_emojis_iid_fed_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_finetuned_emojis_iid_fed_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_finetuned_emojis_iid_fed_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Karim-Gamal/XLM-Roberta-finetuned-emojis-IID-Fed + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_semeval_2018_emojis_cen_1_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_semeval_2018_emojis_cen_1_en.md new file mode 100644 index 00000000000000..be956ab5531532 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_semeval_2018_emojis_cen_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_finetuned_semeval_2018_emojis_cen_1 XlmRoBertaForSequenceClassification from Karim-Gamal +author: John Snow Labs +name: xlm_roberta_finetuned_semeval_2018_emojis_cen_1 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_finetuned_semeval_2018_emojis_cen_1` is a English model originally trained by Karim-Gamal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_semeval_2018_emojis_cen_1_en_5.5.0_3.0_1727170435788.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_semeval_2018_emojis_cen_1_en_5.5.0_3.0_1727170435788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_finetuned_semeval_2018_emojis_cen_1","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_finetuned_semeval_2018_emojis_cen_1", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_finetuned_semeval_2018_emojis_cen_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Karim-Gamal/XLM-Roberta-finetuned-SemEval-2018-emojis-cen-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_semeval_2018_emojis_cen_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_semeval_2018_emojis_cen_1_pipeline_en.md new file mode 100644 index 00000000000000..c5ee7b90e08a20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_semeval_2018_emojis_cen_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_finetuned_semeval_2018_emojis_cen_1_pipeline pipeline XlmRoBertaForSequenceClassification from Karim-Gamal +author: John Snow Labs +name: xlm_roberta_finetuned_semeval_2018_emojis_cen_1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_finetuned_semeval_2018_emojis_cen_1_pipeline` is a English model originally trained by Karim-Gamal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_semeval_2018_emojis_cen_1_pipeline_en_5.5.0_3.0_1727170486782.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_semeval_2018_emojis_cen_1_pipeline_en_5.5.0_3.0_1727170486782.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_finetuned_semeval_2018_emojis_cen_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_finetuned_semeval_2018_emojis_cen_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_finetuned_semeval_2018_emojis_cen_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Karim-Gamal/XLM-Roberta-finetuned-SemEval-2018-emojis-cen-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_longformer_base_4096_repnum_wl_rua_wl_3_classes_fr.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_longformer_base_4096_repnum_wl_rua_wl_3_classes_fr.md new file mode 100644 index 00000000000000..2f92903242f5d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_longformer_base_4096_repnum_wl_rua_wl_3_classes_fr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: French xlm_roberta_longformer_base_4096_repnum_wl_rua_wl_3_classes XlmRoBertaForSequenceClassification from waboucay +author: John Snow Labs +name: xlm_roberta_longformer_base_4096_repnum_wl_rua_wl_3_classes +date: 2024-09-24 +tags: [fr, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_longformer_base_4096_repnum_wl_rua_wl_3_classes` is a French model originally trained by waboucay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_longformer_base_4096_repnum_wl_rua_wl_3_classes_fr_5.5.0_3.0_1727170484761.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_longformer_base_4096_repnum_wl_rua_wl_3_classes_fr_5.5.0_3.0_1727170484761.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_longformer_base_4096_repnum_wl_rua_wl_3_classes","fr") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_longformer_base_4096_repnum_wl_rua_wl_3_classes", "fr")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
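+
+The example sentence in the snippet above is English, while this model is French. The sketch below is a hypothetical follow-up that reuses `pipelineModel` from the Python example with a French input; the sentence itself is only an illustration.
+
+```python
+# Hypothetical French input for a French model (reuses pipelineModel from above)
+data_fr = spark.createDataFrame([["Les nouvelles mesures réduiront les délais de traitement."]]).toDF("text")
+pipelineModel.transform(data_fr).select("class.result").show(truncate = False)
+```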
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_longformer_base_4096_repnum_wl_rua_wl_3_classes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|fr| +|Size:|1.1 GB| + +## References + +https://huggingface.co/waboucay/xlm-roberta-longformer-base-4096-repnum_wl-rua_wl_3_classes \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_v_base_trimmed_german_tweet_sentiment_german_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_v_base_trimmed_german_tweet_sentiment_german_en.md new file mode 100644 index 00000000000000..e76e97197ba8d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_v_base_trimmed_german_tweet_sentiment_german_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_v_base_trimmed_german_tweet_sentiment_german XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_v_base_trimmed_german_tweet_sentiment_german +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_v_base_trimmed_german_tweet_sentiment_german` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_v_base_trimmed_german_tweet_sentiment_german_en_5.5.0_3.0_1727152544394.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_v_base_trimmed_german_tweet_sentiment_german_en_5.5.0_3.0_1727152544394.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_v_base_trimmed_german_tweet_sentiment_german","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_v_base_trimmed_german_tweet_sentiment_german", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_v_base_trimmed_german_tweet_sentiment_german| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|750.9 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-v-base-trimmed-de-tweet-sentiment-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_v_base_trimmed_german_tweet_sentiment_german_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_v_base_trimmed_german_tweet_sentiment_german_pipeline_en.md new file mode 100644 index 00000000000000..ee38ed1defc192 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_v_base_trimmed_german_tweet_sentiment_german_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_v_base_trimmed_german_tweet_sentiment_german_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_v_base_trimmed_german_tweet_sentiment_german_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_v_base_trimmed_german_tweet_sentiment_german_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_v_base_trimmed_german_tweet_sentiment_german_pipeline_en_5.5.0_3.0_1727152675534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_v_base_trimmed_german_tweet_sentiment_german_pipeline_en_5.5.0_3.0_1727152675534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_v_base_trimmed_german_tweet_sentiment_german_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_v_base_trimmed_german_tweet_sentiment_german_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_v_base_trimmed_german_tweet_sentiment_german_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|750.9 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-v-base-trimmed-de-tweet-sentiment-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlmr_english_german_all_shuffled_1986_test1000_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlmr_english_german_all_shuffled_1986_test1000_en.md new file mode 100644 index 00000000000000..c0f52830f23431 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlmr_english_german_all_shuffled_1986_test1000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_english_german_all_shuffled_1986_test1000 XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_english_german_all_shuffled_1986_test1000 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_english_german_all_shuffled_1986_test1000` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_english_german_all_shuffled_1986_test1000_en_5.5.0_3.0_1727156076571.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_english_german_all_shuffled_1986_test1000_en_5.5.0_3.0_1727156076571.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_english_german_all_shuffled_1986_test1000","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_english_german_all_shuffled_1986_test1000", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_english_german_all_shuffled_1986_test1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|826.3 MB| + +## References + +https://huggingface.co/patpizio/xlmr-en-de-all_shuffled-1986-test1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlmr_english_german_all_shuffled_1986_test1000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlmr_english_german_all_shuffled_1986_test1000_pipeline_en.md new file mode 100644 index 00000000000000..6d957715ba62cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlmr_english_german_all_shuffled_1986_test1000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_english_german_all_shuffled_1986_test1000_pipeline pipeline XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_english_german_all_shuffled_1986_test1000_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_english_german_all_shuffled_1986_test1000_pipeline` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_english_german_all_shuffled_1986_test1000_pipeline_en_5.5.0_3.0_1727156194567.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_english_german_all_shuffled_1986_test1000_pipeline_en_5.5.0_3.0_1727156194567.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmr_english_german_all_shuffled_1986_test1000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmr_english_german_all_shuffled_1986_test1000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_english_german_all_shuffled_1986_test1000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.3 MB| + +## References + +https://huggingface.co/patpizio/xlmr-en-de-all_shuffled-1986-test1000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlmr_estonian_english_all_shuffled_1986_test1000_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlmr_estonian_english_all_shuffled_1986_test1000_en.md new file mode 100644 index 00000000000000..c86e97db5e5d0a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlmr_estonian_english_all_shuffled_1986_test1000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_estonian_english_all_shuffled_1986_test1000 XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_estonian_english_all_shuffled_1986_test1000 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_estonian_english_all_shuffled_1986_test1000` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_estonian_english_all_shuffled_1986_test1000_en_5.5.0_3.0_1727155839562.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_estonian_english_all_shuffled_1986_test1000_en_5.5.0_3.0_1727155839562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_estonian_english_all_shuffled_1986_test1000","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_estonian_english_all_shuffled_1986_test1000", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_estonian_english_all_shuffled_1986_test1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|820.5 MB| + +## References + +https://huggingface.co/patpizio/xlmr-et-en-all_shuffled-1986-test1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlmr_estonian_english_all_shuffled_1986_test1000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlmr_estonian_english_all_shuffled_1986_test1000_pipeline_en.md new file mode 100644 index 00000000000000..12d100bbfa19dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlmr_estonian_english_all_shuffled_1986_test1000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_estonian_english_all_shuffled_1986_test1000_pipeline pipeline XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_estonian_english_all_shuffled_1986_test1000_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_estonian_english_all_shuffled_1986_test1000_pipeline` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_estonian_english_all_shuffled_1986_test1000_pipeline_en_5.5.0_3.0_1727155957648.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_estonian_english_all_shuffled_1986_test1000_pipeline_en_5.5.0_3.0_1727155957648.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmr_estonian_english_all_shuffled_1986_test1000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmr_estonian_english_all_shuffled_1986_test1000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_estonian_english_all_shuffled_1986_test1000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|820.5 MB| + +## References + +https://huggingface.co/patpizio/xlmr-et-en-all_shuffled-1986-test1000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlmr_nepali_english_all_shuffled_1985_test1000_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlmr_nepali_english_all_shuffled_1985_test1000_en.md new file mode 100644 index 00000000000000..d84d98b95a687d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlmr_nepali_english_all_shuffled_1985_test1000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_nepali_english_all_shuffled_1985_test1000 XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_nepali_english_all_shuffled_1985_test1000 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_nepali_english_all_shuffled_1985_test1000` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_nepali_english_all_shuffled_1985_test1000_en_5.5.0_3.0_1727156451263.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_nepali_english_all_shuffled_1985_test1000_en_5.5.0_3.0_1727156451263.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_nepali_english_all_shuffled_1985_test1000","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_nepali_english_all_shuffled_1985_test1000", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_nepali_english_all_shuffled_1985_test1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|817.8 MB| + +## References + +https://huggingface.co/patpizio/xlmr-ne-en-all_shuffled-1985-test1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlmr_nepali_english_all_shuffled_1985_test1000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlmr_nepali_english_all_shuffled_1985_test1000_pipeline_en.md new file mode 100644 index 00000000000000..3aa36292c84891 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlmr_nepali_english_all_shuffled_1985_test1000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_nepali_english_all_shuffled_1985_test1000_pipeline pipeline XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_nepali_english_all_shuffled_1985_test1000_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_nepali_english_all_shuffled_1985_test1000_pipeline` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_nepali_english_all_shuffled_1985_test1000_pipeline_en_5.5.0_3.0_1727156573161.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_nepali_english_all_shuffled_1985_test1000_pipeline_en_5.5.0_3.0_1727156573161.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmr_nepali_english_all_shuffled_1985_test1000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmr_nepali_english_all_shuffled_1985_test1000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_nepali_english_all_shuffled_1985_test1000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|817.8 MB| + +## References + +https://huggingface.co/patpizio/xlmr-ne-en-all_shuffled-1985-test1000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlmr_semantic_textual_relatedness_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlmr_semantic_textual_relatedness_en.md new file mode 100644 index 00000000000000..430ae9c0560951 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlmr_semantic_textual_relatedness_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_semantic_textual_relatedness XlmRoBertaForSequenceClassification from kietnt0603 +author: John Snow Labs +name: xlmr_semantic_textual_relatedness +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_semantic_textual_relatedness` is a English model originally trained by kietnt0603. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_semantic_textual_relatedness_en_5.5.0_3.0_1727156208292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_semantic_textual_relatedness_en_5.5.0_3.0_1727156208292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_semantic_textual_relatedness","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_semantic_textual_relatedness", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
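
To read the prediction back out of `pipelineDF`, the label is stored in the nested `result` field of the `class` annotation column. A short sketch, reusing the variables from the Python snippet above:

```python
# "class" is an array of Spark NLP annotations; its `result` field holds the label text.
pipelineDF.selectExpr("text", "class.result as prediction").show(truncate=False)
```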
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_semantic_textual_relatedness| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/kietnt0603/xlmr-semantic-textual-relatedness \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlmr_semantic_textual_relatedness_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlmr_semantic_textual_relatedness_pipeline_en.md new file mode 100644 index 00000000000000..eebcf52cc4dff6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlmr_semantic_textual_relatedness_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_semantic_textual_relatedness_pipeline pipeline XlmRoBertaForSequenceClassification from kietnt0603 +author: John Snow Labs +name: xlmr_semantic_textual_relatedness_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_semantic_textual_relatedness_pipeline` is a English model originally trained by kietnt0603. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_semantic_textual_relatedness_pipeline_en_5.5.0_3.0_1727156263340.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_semantic_textual_relatedness_pipeline_en_5.5.0_3.0_1727156263340.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlmr_semantic_textual_relatedness_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlmr_semantic_textual_relatedness_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_semantic_textual_relatedness_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/kietnt0603/xlmr-semantic-textual-relatedness + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlmroberta_ner_moghis_base_finetuned_panx_fr.md b/docs/_posts/ahmedlone127/2024-09-24-xlmroberta_ner_moghis_base_finetuned_panx_fr.md new file mode 100644 index 00000000000000..dc1764a995290c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlmroberta_ner_moghis_base_finetuned_panx_fr.md @@ -0,0 +1,112 @@ +--- +layout: model +title: French XLMRobertaForTokenClassification Base Cased model (from moghis) +author: John Snow Labs +name: xlmroberta_ner_moghis_base_finetuned_panx +date: 2024-09-24 +tags: [fr, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-fr` is a French model originally trained by `moghis`. + +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_moghis_base_finetuned_panx_fr_5.5.0_3.0_1727215007838.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_moghis_base_finetuned_panx_fr_5.5.0_3.0_1727215007838.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_moghis_base_finetuned_panx","fr") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

ner_converter = NerConverter()\
    .setInputCols(["document", "token", "ner"])\
    .setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter])

data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")

result = pipeline.fit(data).transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_moghis_base_finetuned_panx","fr")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val ner_converter = new NerConverter()
    .setInputCols(Array("document", "token", "ner"))
    .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter))

val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
```

{:.nlu-block}
```python
import nlu
nlu.load("fr.ner.xlmr_roberta.xtreme.base_finetuned.by_moghis").predict("""PUT YOUR STRING HERE""")
```
</div>
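
To list the entities found by the converter, the chunk text sits in `ner_chunk.result` and the entity type in the chunk metadata. A sketch that continues the Python example above (field names assumed from the standard NerConverter output):

```python
from pyspark.sql import functions as F

# One row per detected entity: the chunk text plus its entity label.
result.select(F.explode("ner_chunk").alias("chunk")) \
    .select(F.col("chunk.result").alias("chunk_text"),
            F.col("chunk.metadata")["entity"].alias("entity")) \
    .show(truncate=False)
```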
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_moghis_base_finetuned_panx| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|fr| +|Size:|840.9 MB| + +## References + +References + +- https://huggingface.co/moghis/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlmroberta_ner_moghis_base_finetuned_panx_pipeline_fr.md b/docs/_posts/ahmedlone127/2024-09-24-xlmroberta_ner_moghis_base_finetuned_panx_pipeline_fr.md new file mode 100644 index 00000000000000..e1713b7fd9f52e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlmroberta_ner_moghis_base_finetuned_panx_pipeline_fr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: French xlmroberta_ner_moghis_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from moghis +author: John Snow Labs +name: xlmroberta_ner_moghis_base_finetuned_panx_pipeline +date: 2024-09-24 +tags: [fr, open_source, pipeline, onnx] +task: Named Entity Recognition +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_moghis_base_finetuned_panx_pipeline` is a French model originally trained by moghis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_moghis_base_finetuned_panx_pipeline_fr_5.5.0_3.0_1727215088565.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_moghis_base_finetuned_panx_pipeline_fr_5.5.0_3.0_1727215088565.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlmroberta_ner_moghis_base_finetuned_panx_pipeline", lang = "fr")
df = spark.createDataFrame([["J'adore Spark NLP"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("xlmroberta_ner_moghis_base_finetuned_panx_pipeline", lang = "fr")
val df = Seq("J'adore Spark NLP").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_moghis_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fr| +|Size:|840.9 MB| + +## References + +https://huggingface.co/moghis/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-2d_oomv2_800_en.md b/docs/_posts/ahmedlone127/2024-09-25-2d_oomv2_800_en.md new file mode 100644 index 00000000000000..168ee37ad7b9fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-2d_oomv2_800_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2d_oomv2_800 BertForSequenceClassification from abbassix +author: John Snow Labs +name: 2d_oomv2_800 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2d_oomv2_800` is a English model originally trained by abbassix. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2d_oomv2_800_en_5.5.0_3.0_1727288371864.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2d_oomv2_800_en_5.5.0_3.0_1727288371864.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("2d_oomv2_800","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("2d_oomv2_800", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2d_oomv2_800| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/abbassix/2d_oomv2_800 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-2d_psn_1600_en.md b/docs/_posts/ahmedlone127/2024-09-25-2d_psn_1600_en.md new file mode 100644 index 00000000000000..b0b45e0f0eada9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-2d_psn_1600_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2d_psn_1600 BertForSequenceClassification from abbassix +author: John Snow Labs +name: 2d_psn_1600 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2d_psn_1600` is a English model originally trained by abbassix. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2d_psn_1600_en_5.5.0_3.0_1727276200209.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2d_psn_1600_en_5.5.0_3.0_1727276200209.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("2d_psn_1600","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("2d_psn_1600", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2d_psn_1600| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/abbassix/2d_psn_1600 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-acronyms_baseline_vert_correct_clinicalbert_en.md b/docs/_posts/ahmedlone127/2024-09-25-acronyms_baseline_vert_correct_clinicalbert_en.md new file mode 100644 index 00000000000000..4e1e3bd81fc370 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-acronyms_baseline_vert_correct_clinicalbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English acronyms_baseline_vert_correct_clinicalbert BertForSequenceClassification from Wiggily +author: John Snow Labs +name: acronyms_baseline_vert_correct_clinicalbert +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`acronyms_baseline_vert_correct_clinicalbert` is a English model originally trained by Wiggily. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/acronyms_baseline_vert_correct_clinicalbert_en_5.5.0_3.0_1727245392430.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/acronyms_baseline_vert_correct_clinicalbert_en_5.5.0_3.0_1727245392430.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("acronyms_baseline_vert_correct_clinicalbert","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("acronyms_baseline_vert_correct_clinicalbert", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|acronyms_baseline_vert_correct_clinicalbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.5 MB| + +## References + +https://huggingface.co/Wiggily/acronyms_baseline_vert_correct_clinicalbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-adrv2024_markadamsmsba24_en.md b/docs/_posts/ahmedlone127/2024-09-25-adrv2024_markadamsmsba24_en.md new file mode 100644 index 00000000000000..cd1b83272e7931 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-adrv2024_markadamsmsba24_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English adrv2024_markadamsmsba24 BertForSequenceClassification from MarkAdamsMSBA24 +author: John Snow Labs +name: adrv2024_markadamsmsba24 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`adrv2024_markadamsmsba24` is a English model originally trained by MarkAdamsMSBA24. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/adrv2024_markadamsmsba24_en_5.5.0_3.0_1727267305241.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/adrv2024_markadamsmsba24_en_5.5.0_3.0_1727267305241.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("adrv2024_markadamsmsba24","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("adrv2024_markadamsmsba24", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|adrv2024_markadamsmsba24| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/MarkAdamsMSBA24/ADRv2024 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-adrv2024_paragon_analytics_en.md b/docs/_posts/ahmedlone127/2024-09-25-adrv2024_paragon_analytics_en.md new file mode 100644 index 00000000000000..ae86905aca6194 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-adrv2024_paragon_analytics_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English adrv2024_paragon_analytics BertForSequenceClassification from paragon-analytics +author: John Snow Labs +name: adrv2024_paragon_analytics +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`adrv2024_paragon_analytics` is a English model originally trained by paragon-analytics. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/adrv2024_paragon_analytics_en_5.5.0_3.0_1727268598866.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/adrv2024_paragon_analytics_en_5.5.0_3.0_1727268598866.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("adrv2024_paragon_analytics","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("adrv2024_paragon_analytics", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|adrv2024_paragon_analytics| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/paragon-analytics/ADRv2024 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-adrv2024_paragon_analytics_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-adrv2024_paragon_analytics_pipeline_en.md new file mode 100644 index 00000000000000..4a5da165abfb4c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-adrv2024_paragon_analytics_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English adrv2024_paragon_analytics_pipeline pipeline BertForSequenceClassification from paragon-analytics +author: John Snow Labs +name: adrv2024_paragon_analytics_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`adrv2024_paragon_analytics_pipeline` is a English model originally trained by paragon-analytics. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/adrv2024_paragon_analytics_pipeline_en_5.5.0_3.0_1727268620346.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/adrv2024_paragon_analytics_pipeline_en_5.5.0_3.0_1727268620346.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("adrv2024_paragon_analytics_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("adrv2024_paragon_analytics_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|adrv2024_paragon_analytics_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/paragon-analytics/ADRv2024 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-advance_bert_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-advance_bert_classification_pipeline_en.md new file mode 100644 index 00000000000000..7a5e8d4a1b45bf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-advance_bert_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English advance_bert_classification_pipeline pipeline BertForSequenceClassification from Kurkur99 +author: John Snow Labs +name: advance_bert_classification_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`advance_bert_classification_pipeline` is a English model originally trained by Kurkur99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/advance_bert_classification_pipeline_en_5.5.0_3.0_1727269957471.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/advance_bert_classification_pipeline_en_5.5.0_3.0_1727269957471.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("advance_bert_classification_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("advance_bert_classification_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|advance_bert_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.1 MB| + +## References + +https://huggingface.co/Kurkur99/Advance_Bert_Classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-aes_bert_base_sp90_lr1e_05_wr1e_01_wd1e_02_ep15_elsa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-aes_bert_base_sp90_lr1e_05_wr1e_01_wd1e_02_ep15_elsa_pipeline_en.md new file mode 100644 index 00000000000000..8a475e9031f6ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-aes_bert_base_sp90_lr1e_05_wr1e_01_wd1e_02_ep15_elsa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English aes_bert_base_sp90_lr1e_05_wr1e_01_wd1e_02_ep15_elsa_pipeline pipeline BertForSequenceClassification from ys7yoo +author: John Snow Labs +name: aes_bert_base_sp90_lr1e_05_wr1e_01_wd1e_02_ep15_elsa_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`aes_bert_base_sp90_lr1e_05_wr1e_01_wd1e_02_ep15_elsa_pipeline` is a English model originally trained by ys7yoo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/aes_bert_base_sp90_lr1e_05_wr1e_01_wd1e_02_ep15_elsa_pipeline_en_5.5.0_3.0_1727287956799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/aes_bert_base_sp90_lr1e_05_wr1e_01_wd1e_02_ep15_elsa_pipeline_en_5.5.0_3.0_1727287956799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("aes_bert_base_sp90_lr1e_05_wr1e_01_wd1e_02_ep15_elsa_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("aes_bert_base_sp90_lr1e_05_wr1e_01_wd1e_02_ep15_elsa_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|aes_bert_base_sp90_lr1e_05_wr1e_01_wd1e_02_ep15_elsa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.6 MB| + +## References + +https://huggingface.co/ys7yoo/aes_bert-base_sp90_lr1e-05_wr1e-01_wd1e-02_ep15_elsa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-aes_enem_models_sourcea_regression_from_bertimbau_large_c5_en.md b/docs/_posts/ahmedlone127/2024-09-25-aes_enem_models_sourcea_regression_from_bertimbau_large_c5_en.md new file mode 100644 index 00000000000000..239ed97a46214c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-aes_enem_models_sourcea_regression_from_bertimbau_large_c5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English aes_enem_models_sourcea_regression_from_bertimbau_large_c5 BertForSequenceClassification from kamel-usp +author: John Snow Labs +name: aes_enem_models_sourcea_regression_from_bertimbau_large_c5 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`aes_enem_models_sourcea_regression_from_bertimbau_large_c5` is a English model originally trained by kamel-usp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/aes_enem_models_sourcea_regression_from_bertimbau_large_c5_en_5.5.0_3.0_1727261656934.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/aes_enem_models_sourcea_regression_from_bertimbau_large_c5_en_5.5.0_3.0_1727261656934.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("aes_enem_models_sourcea_regression_from_bertimbau_large_c5","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("aes_enem_models_sourcea_regression_from_bertimbau_large_c5", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|aes_enem_models_sourcea_regression_from_bertimbau_large_c5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/kamel-usp/aes_enem_models-sourceA-regression-from-bertimbau-large-C5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ag_news_38400_bert_base_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-25-ag_news_38400_bert_base_uncased_en.md new file mode 100644 index 00000000000000..0f8a792179dc19 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ag_news_38400_bert_base_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ag_news_38400_bert_base_uncased BertForSequenceClassification from Kyle1668 +author: John Snow Labs +name: ag_news_38400_bert_base_uncased +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ag_news_38400_bert_base_uncased` is a English model originally trained by Kyle1668. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ag_news_38400_bert_base_uncased_en_5.5.0_3.0_1727222427253.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ag_news_38400_bert_base_uncased_en_5.5.0_3.0_1727222427253.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("ag_news_38400_bert_base_uncased","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("ag_news_38400_bert_base_uncased", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ag_news_38400_bert_base_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Kyle1668/ag-news-38400-bert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-alberti_stanzas_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-alberti_stanzas_pipeline_en.md new file mode 100644 index 00000000000000..b99a40fc10fd87 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-alberti_stanzas_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English alberti_stanzas_pipeline pipeline BertForSequenceClassification from alvp +author: John Snow Labs +name: alberti_stanzas_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`alberti_stanzas_pipeline` is a English model originally trained by alvp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/alberti_stanzas_pipeline_en_5.5.0_3.0_1727267423790.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/alberti_stanzas_pipeline_en_5.5.0_3.0_1727267423790.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("alberti_stanzas_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("alberti_stanzas_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|alberti_stanzas_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|666.7 MB| + +## References + +https://huggingface.co/alvp/alberti-stanzas + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-albertv2_dc_unsorted_dec_cf_en.md b/docs/_posts/ahmedlone127/2024-09-25-albertv2_dc_unsorted_dec_cf_en.md new file mode 100644 index 00000000000000..c60a873920f9fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-albertv2_dc_unsorted_dec_cf_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English albertv2_dc_unsorted_dec_cf BertForSequenceClassification from rpii2023 +author: John Snow Labs +name: albertv2_dc_unsorted_dec_cf +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albertv2_dc_unsorted_dec_cf` is a English model originally trained by rpii2023. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albertv2_dc_unsorted_dec_cf_en_5.5.0_3.0_1727239459657.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albertv2_dc_unsorted_dec_cf_en_5.5.0_3.0_1727239459657.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("albertv2_dc_unsorted_dec_cf","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("albertv2_dc_unsorted_dec_cf", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albertv2_dc_unsorted_dec_cf| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/rpii2023/albertv2_DC_unsorted_DEC_CF \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-albertv2_dc_unsorted_dec_cf_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-albertv2_dc_unsorted_dec_cf_pipeline_en.md new file mode 100644 index 00000000000000..d9485925e26a78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-albertv2_dc_unsorted_dec_cf_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English albertv2_dc_unsorted_dec_cf_pipeline pipeline BertForSequenceClassification from rpii2023 +author: John Snow Labs +name: albertv2_dc_unsorted_dec_cf_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albertv2_dc_unsorted_dec_cf_pipeline` is a English model originally trained by rpii2023. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albertv2_dc_unsorted_dec_cf_pipeline_en_5.5.0_3.0_1727239484248.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albertv2_dc_unsorted_dec_cf_pipeline_en_5.5.0_3.0_1727239484248.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("albertv2_dc_unsorted_dec_cf_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("albertv2_dc_unsorted_dec_cf_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albertv2_dc_unsorted_dec_cf_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/rpii2023/albertv2_DC_unsorted_DEC_CF + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-alsval_en.md b/docs/_posts/ahmedlone127/2024-09-25-alsval_en.md new file mode 100644 index 00000000000000..cb341ee6e114b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-alsval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English alsval BertForSequenceClassification from yeamerci +author: John Snow Labs +name: alsval +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`alsval` is a English model originally trained by yeamerci. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/alsval_en_5.5.0_3.0_1727268126325.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/alsval_en_5.5.0_3.0_1727268126325.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("alsval","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("alsval", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|alsval| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|664.5 MB| + +## References + +https://huggingface.co/yeamerci/alsval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-alsval_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-alsval_pipeline_en.md new file mode 100644 index 00000000000000..1ba604632e3b72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-alsval_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English alsval_pipeline pipeline BertForSequenceClassification from yeamerci +author: John Snow Labs +name: alsval_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`alsval_pipeline` is a English model originally trained by yeamerci. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/alsval_pipeline_en_5.5.0_3.0_1727268162233.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/alsval_pipeline_en_5.5.0_3.0_1727268162233.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("alsval_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("alsval_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|alsval_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|664.5 MB| + +## References + +https://huggingface.co/yeamerci/alsval + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_2_en.md b/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_2_en.md new file mode 100644 index 00000000000000..9e970332a74bf2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English amir_clinicalbert_2 BertForTokenClassification from amirali26 +author: John Snow Labs +name: amir_clinicalbert_2 +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amir_clinicalbert_2` is a English model originally trained by amirali26. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amir_clinicalbert_2_en_5.5.0_3.0_1727282119447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amir_clinicalbert_2_en_5.5.0_3.0_1727282119447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("amir_clinicalbert_2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("amir_clinicalbert_2", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amir_clinicalbert_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.3 MB| + +## References + +https://huggingface.co/amirali26/amir-clinicalbert-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_2_pipeline_en.md new file mode 100644 index 00000000000000..413501e142efc7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English amir_clinicalbert_2_pipeline pipeline BertForTokenClassification from amirali26 +author: John Snow Labs +name: amir_clinicalbert_2_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amir_clinicalbert_2_pipeline` is a English model originally trained by amirali26. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amir_clinicalbert_2_pipeline_en_5.5.0_3.0_1727282140590.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amir_clinicalbert_2_pipeline_en_5.5.0_3.0_1727282140590.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session (spark) with Spark NLP started
pipeline = PretrainedPipeline("amir_clinicalbert_2_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("amir_clinicalbert_2_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
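The transformed DataFrame carries one annotation column per stage of the downloaded pipeline. As a rough sketch for inspecting the output, assuming the token classifier stage writes to a column named `ner` (check the schema to confirm the actual column names):

```python
# List the columns the pretrained pipeline produced, then peek at the NER predictions
annotations.printSchema()
annotations.select("ner.result").show(truncate=False)  # "ner" is an assumed column name
```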
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amir_clinicalbert_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.4 MB| + +## References + +https://huggingface.co/amirali26/amir-clinicalbert-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_specialities_en.md b/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_specialities_en.md new file mode 100644 index 00000000000000..d0ad7550d1187f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_specialities_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English amir_clinicalbert_specialities BertForTokenClassification from amirali26 +author: John Snow Labs +name: amir_clinicalbert_specialities +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amir_clinicalbert_specialities` is a English model originally trained by amirali26. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amir_clinicalbert_specialities_en_5.5.0_3.0_1727260708867.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amir_clinicalbert_specialities_en_5.5.0_3.0_1727260708867.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Requires an active Spark session with Spark NLP, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = BertForTokenClassification.pretrained("amir_clinicalbert_specialities","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("amir_clinicalbert_specialities", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amir_clinicalbert_specialities| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.3 MB| + +## References + +https://huggingface.co/amirali26/amir-clinicalbert-specialities \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_specialities_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_specialities_pipeline_en.md new file mode 100644 index 00000000000000..1721fd482c3bbe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_specialities_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English amir_clinicalbert_specialities_pipeline pipeline BertForTokenClassification from amirali26 +author: John Snow Labs +name: amir_clinicalbert_specialities_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amir_clinicalbert_specialities_pipeline` is a English model originally trained by amirali26. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amir_clinicalbert_specialities_pipeline_en_5.5.0_3.0_1727260729965.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amir_clinicalbert_specialities_pipeline_en_5.5.0_3.0_1727260729965.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session (spark) with Spark NLP started
pipeline = PretrainedPipeline("amir_clinicalbert_specialities_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("amir_clinicalbert_specialities_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amir_clinicalbert_specialities_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.4 MB| + +## References + +https://huggingface.co/amirali26/amir-clinicalbert-specialities + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-anglicisms_spanish_beto_es.md b/docs/_posts/ahmedlone127/2024-09-25-anglicisms_spanish_beto_es.md new file mode 100644 index 00000000000000..0124c95d597974 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-anglicisms_spanish_beto_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish anglicisms_spanish_beto BertForTokenClassification from lirondos +author: John Snow Labs +name: anglicisms_spanish_beto +date: 2024-09-25 +tags: [es, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`anglicisms_spanish_beto` is a Castilian, Spanish model originally trained by lirondos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/anglicisms_spanish_beto_es_5.5.0_3.0_1727249839024.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/anglicisms_spanish_beto_es_5.5.0_3.0_1727249839024.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Requires an active Spark session with Spark NLP, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = BertForTokenClassification.pretrained("anglicisms_spanish_beto","es") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("anglicisms_spanish_beto", "es")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|anglicisms_spanish_beto| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|409.5 MB| + +## References + +https://huggingface.co/lirondos/anglicisms-spanish-beto \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-arqmath_bert_base_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-arqmath_bert_base_cased_en.md new file mode 100644 index 00000000000000..019ae9d313bd75 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-arqmath_bert_base_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English arqmath_bert_base_cased BertForSequenceClassification from malteos +author: John Snow Labs +name: arqmath_bert_base_cased +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`arqmath_bert_base_cased` is a English model originally trained by malteos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arqmath_bert_base_cased_en_5.5.0_3.0_1727273232573.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arqmath_bert_base_cased_en_5.5.0_3.0_1727273232573.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Requires an active Spark session with Spark NLP, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("arqmath_bert_base_cased","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("arqmath_bert_base_cased", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
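After the transform step, the predicted label for each input row is available in the `class` column set on the classifier above, for example:

```python
# Show each input text with its predicted class label
pipelineDF.select("text", "class.result").show(truncate=False)
```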
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arqmath_bert_base_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.7 MB| + +## References + +https://huggingface.co/malteos/arqmath-bert-base-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-authorparsermodel_de.md b/docs/_posts/ahmedlone127/2024-09-25-authorparsermodel_de.md new file mode 100644 index 00000000000000..1d10fbc22c7ade --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-authorparsermodel_de.md @@ -0,0 +1,94 @@ +--- +layout: model +title: German authorparsermodel BertForTokenClassification from GEOcite +author: John Snow Labs +name: authorparsermodel +date: 2024-09-25 +tags: [de, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`authorparsermodel` is a German model originally trained by GEOcite. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/authorparsermodel_de_5.5.0_3.0_1727280916608.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/authorparsermodel_de_5.5.0_3.0_1727280916608.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Requires an active Spark session with Spark NLP, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = BertForTokenClassification.pretrained("authorparsermodel","de") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("authorparsermodel", "de")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|authorparsermodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|625.5 MB| + +## References + +https://huggingface.co/GEOcite/AuthorParserModel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-autotrain_bertbase_imdb_1275748792_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-autotrain_bertbase_imdb_1275748792_pipeline_en.md new file mode 100644 index 00000000000000..5758eea8fb0bfc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-autotrain_bertbase_imdb_1275748792_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autotrain_bertbase_imdb_1275748792_pipeline pipeline BertForSequenceClassification from sasha +author: John Snow Labs +name: autotrain_bertbase_imdb_1275748792_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_bertbase_imdb_1275748792_pipeline` is a English model originally trained by sasha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_bertbase_imdb_1275748792_pipeline_en_5.5.0_3.0_1727277487477.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_bertbase_imdb_1275748792_pipeline_en_5.5.0_3.0_1727277487477.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session (spark) with Spark NLP started
pipeline = PretrainedPipeline("autotrain_bertbase_imdb_1275748792_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("autotrain_bertbase_imdb_1275748792_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_bertbase_imdb_1275748792_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/sasha/autotrain-BERTBase-imdb-1275748792 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-autotrain_bertbase_imdb_1275748793_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-autotrain_bertbase_imdb_1275748793_pipeline_en.md new file mode 100644 index 00000000000000..6af0cba49af530 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-autotrain_bertbase_imdb_1275748793_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autotrain_bertbase_imdb_1275748793_pipeline pipeline BertForSequenceClassification from sasha +author: John Snow Labs +name: autotrain_bertbase_imdb_1275748793_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_bertbase_imdb_1275748793_pipeline` is a English model originally trained by sasha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_bertbase_imdb_1275748793_pipeline_en_5.5.0_3.0_1727284881697.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_bertbase_imdb_1275748793_pipeline_en_5.5.0_3.0_1727284881697.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session (spark) with Spark NLP started
pipeline = PretrainedPipeline("autotrain_bertbase_imdb_1275748793_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("autotrain_bertbase_imdb_1275748793_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_bertbase_imdb_1275748793_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/sasha/autotrain-BERTBase-imdb-1275748793 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-banglabert_generator_bn.md b/docs/_posts/ahmedlone127/2024-09-25-banglabert_generator_bn.md new file mode 100644 index 00000000000000..3ea9e9f0f5d7a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-banglabert_generator_bn.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Bengali banglabert_generator BertEmbeddings from csebuetnlp +author: John Snow Labs +name: banglabert_generator +date: 2024-09-25 +tags: [bn, open_source, onnx, embeddings, bert] +task: Embeddings +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`banglabert_generator` is a Bengali model originally trained by csebuetnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/banglabert_generator_bn_5.5.0_3.0_1727240835855.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/banglabert_generator_bn_5.5.0_3.0_1727240835855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Requires an active Spark session with Spark NLP, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("banglabert_generator","bn") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("banglabert_generator", "bn")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
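Each annotation in the `embeddings` output column stores its vector in the `embeddings` field, so the token vectors can be pulled out with standard DataFrame operations. A small sketch (illustrative only):

```python
from pyspark.sql import functions as F

# One row per token: the token text and its BERT embedding vector
pipelineDF.select(F.explode("embeddings").alias("emb")) \
    .select(F.col("emb.result").alias("token"), F.col("emb.embeddings").alias("vector")) \
    .show(truncate=80)
```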
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|banglabert_generator| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|bn| +|Size:|130.0 MB| + +## References + +https://huggingface.co/csebuetnlp/banglabert_generator \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-banglabert_generator_pipeline_bn.md b/docs/_posts/ahmedlone127/2024-09-25-banglabert_generator_pipeline_bn.md new file mode 100644 index 00000000000000..bcd28666242de2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-banglabert_generator_pipeline_bn.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Bengali banglabert_generator_pipeline pipeline BertEmbeddings from csebuetnlp +author: John Snow Labs +name: banglabert_generator_pipeline +date: 2024-09-25 +tags: [bn, open_source, pipeline, onnx] +task: Embeddings +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`banglabert_generator_pipeline` is a Bengali model originally trained by csebuetnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/banglabert_generator_pipeline_bn_5.5.0_3.0_1727240842238.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/banglabert_generator_pipeline_bn_5.5.0_3.0_1727240842238.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session (spark) with Spark NLP started
pipeline = PretrainedPipeline("banglabert_generator_pipeline", lang = "bn")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("banglabert_generator_pipeline", lang = "bn")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|banglabert_generator_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bn| +|Size:|130.0 MB| + +## References + +https://huggingface.co/csebuetnlp/banglabert_generator + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-base_bert_finetuned_mtsamples_en.md b/docs/_posts/ahmedlone127/2024-09-25-base_bert_finetuned_mtsamples_en.md new file mode 100644 index 00000000000000..f1693ed31295b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-base_bert_finetuned_mtsamples_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English base_bert_finetuned_mtsamples BertForSequenceClassification from mnaylor +author: John Snow Labs +name: base_bert_finetuned_mtsamples +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_bert_finetuned_mtsamples` is a English model originally trained by mnaylor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_bert_finetuned_mtsamples_en_5.5.0_3.0_1727276195937.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_bert_finetuned_mtsamples_en_5.5.0_3.0_1727276195937.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Requires an active Spark session with Spark NLP, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("base_bert_finetuned_mtsamples","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("base_bert_finetuned_mtsamples", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_bert_finetuned_mtsamples| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/mnaylor/base-bert-finetuned-mtsamples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-base_bert_finetuned_mtsamples_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-base_bert_finetuned_mtsamples_pipeline_en.md new file mode 100644 index 00000000000000..017d8c81915876 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-base_bert_finetuned_mtsamples_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English base_bert_finetuned_mtsamples_pipeline pipeline BertForSequenceClassification from mnaylor +author: John Snow Labs +name: base_bert_finetuned_mtsamples_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_bert_finetuned_mtsamples_pipeline` is a English model originally trained by mnaylor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_bert_finetuned_mtsamples_pipeline_en_5.5.0_3.0_1727276218426.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_bert_finetuned_mtsamples_pipeline_en_5.5.0_3.0_1727276218426.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session (spark) with Spark NLP started
pipeline = PretrainedPipeline("base_bert_finetuned_mtsamples_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("base_bert_finetuned_mtsamples_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_bert_finetuned_mtsamples_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/mnaylor/base-bert-finetuned-mtsamples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_amazon_product_classification_small_data_epoch_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_amazon_product_classification_small_data_epoch_2_pipeline_en.md new file mode 100644 index 00000000000000..8199ee9276a584 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_amazon_product_classification_small_data_epoch_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_amazon_product_classification_small_data_epoch_2_pipeline pipeline BertForSequenceClassification from nthieu +author: John Snow Labs +name: bert_amazon_product_classification_small_data_epoch_2_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_amazon_product_classification_small_data_epoch_2_pipeline` is a English model originally trained by nthieu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_amazon_product_classification_small_data_epoch_2_pipeline_en_5.5.0_3.0_1727288372872.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_amazon_product_classification_small_data_epoch_2_pipeline_en_5.5.0_3.0_1727288372872.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session (spark) with Spark NLP started
pipeline = PretrainedPipeline("bert_amazon_product_classification_small_data_epoch_2_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("bert_amazon_product_classification_small_data_epoch_2_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_amazon_product_classification_small_data_epoch_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/nthieu/bert-amazon-product-classification-small-data-epoch-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_andriydovgal_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_andriydovgal_en.md new file mode 100644 index 00000000000000..d4728bf5cd7f64 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_andriydovgal_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_banking77_pt2_andriydovgal BertForSequenceClassification from andriydovgal +author: John Snow Labs +name: bert_base_banking77_pt2_andriydovgal +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_banking77_pt2_andriydovgal` is a English model originally trained by andriydovgal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_banking77_pt2_andriydovgal_en_5.5.0_3.0_1727267267008.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_banking77_pt2_andriydovgal_en_5.5.0_3.0_1727267267008.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Requires an active Spark session with Spark NLP, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_banking77_pt2_andriydovgal","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_banking77_pt2_andriydovgal", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_banking77_pt2_andriydovgal| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.6 MB| + +## References + +https://huggingface.co/andriydovgal/bert-base-banking77-pt2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_bakuretso_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_bakuretso_en.md new file mode 100644 index 00000000000000..367bd93b5a6379 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_bakuretso_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_banking77_pt2_bakuretso BertForSequenceClassification from Bakuretso +author: John Snow Labs +name: bert_base_banking77_pt2_bakuretso +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_banking77_pt2_bakuretso` is a English model originally trained by Bakuretso. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_banking77_pt2_bakuretso_en_5.5.0_3.0_1727266221799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_banking77_pt2_bakuretso_en_5.5.0_3.0_1727266221799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Requires an active Spark session with Spark NLP, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_banking77_pt2_bakuretso","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_banking77_pt2_bakuretso", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_banking77_pt2_bakuretso| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.6 MB| + +## References + +https://huggingface.co/Bakuretso/bert-base-banking77-pt2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_dangdana_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_dangdana_pipeline_en.md new file mode 100644 index 00000000000000..f8b1e265acd0b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_dangdana_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_banking77_pt2_dangdana_pipeline pipeline BertForSequenceClassification from dangdana +author: John Snow Labs +name: bert_base_banking77_pt2_dangdana_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_banking77_pt2_dangdana_pipeline` is a English model originally trained by dangdana. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_banking77_pt2_dangdana_pipeline_en_5.5.0_3.0_1727268699035.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_banking77_pt2_dangdana_pipeline_en_5.5.0_3.0_1727268699035.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session (spark) with Spark NLP started
pipeline = PretrainedPipeline("bert_base_banking77_pt2_dangdana_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("bert_base_banking77_pt2_dangdana_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_banking77_pt2_dangdana_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.6 MB| + +## References + +https://huggingface.co/dangdana/bert-base-banking77-pt2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_psj0919_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_psj0919_pipeline_en.md new file mode 100644 index 00000000000000..b562bbe1f78ca6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_psj0919_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_banking77_pt2_psj0919_pipeline pipeline BertForSequenceClassification from psj0919 +author: John Snow Labs +name: bert_base_banking77_pt2_psj0919_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_banking77_pt2_psj0919_pipeline` is a English model originally trained by psj0919. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_banking77_pt2_psj0919_pipeline_en_5.5.0_3.0_1727266525860.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_banking77_pt2_psj0919_pipeline_en_5.5.0_3.0_1727266525860.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session (spark) with Spark NLP started
pipeline = PretrainedPipeline("bert_base_banking77_pt2_psj0919_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("bert_base_banking77_pt2_psj0919_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_banking77_pt2_psj0919_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.6 MB| + +## References + +https://huggingface.co/psj0919/bert-base-banking77-pt2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_tonyla25_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_tonyla25_en.md new file mode 100644 index 00000000000000..1917bfc577377d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_tonyla25_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_banking77_pt2_tonyla25 BertForSequenceClassification from tonyla25 +author: John Snow Labs +name: bert_base_banking77_pt2_tonyla25 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_banking77_pt2_tonyla25` is a English model originally trained by tonyla25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_banking77_pt2_tonyla25_en_5.5.0_3.0_1727268905496.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_banking77_pt2_tonyla25_en_5.5.0_3.0_1727268905496.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Requires an active Spark session with Spark NLP, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_banking77_pt2_tonyla25","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_banking77_pt2_tonyla25", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_banking77_pt2_tonyla25| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|8.7 MB| + +## References + +https://huggingface.co/tonyla25/bert-base-banking77-pt2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_bookcorpus_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_bookcorpus_en.md new file mode 100644 index 00000000000000..f5901fc9d00347 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_bookcorpus_en.md @@ -0,0 +1,92 @@ +--- +layout: model +title: English bert_base_bookcorpus BertEmbeddings from nicholasKluge +author: John Snow Labs +name: bert_base_bookcorpus +date: 2024-09-25 +tags: [bert, en, open_source, fill_mask, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_bookcorpus` is a English model originally trained by nicholasKluge. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_bookcorpus_en_5.5.0_3.0_1727240901527.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_bookcorpus_en_5.5.0_3.0_1727240901527.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Requires an active Spark session with Spark NLP, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("bert_base_bookcorpus","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("bert_base_bookcorpus", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_bookcorpus| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|406.3 MB| + +## References + +References + +https://huggingface.co/nicholasKluge/bert-base-bookcorpus \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_buddhist_sanskrit_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_buddhist_sanskrit_en.md new file mode 100644 index 00000000000000..2716ceae4cf0ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_buddhist_sanskrit_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_buddhist_sanskrit BertEmbeddings from Matej +author: John Snow Labs +name: bert_base_buddhist_sanskrit +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_buddhist_sanskrit` is a English model originally trained by Matej. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_buddhist_sanskrit_en_5.5.0_3.0_1727254998263.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_buddhist_sanskrit_en_5.5.0_3.0_1727254998263.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Requires an active Spark session with Spark NLP, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("bert_base_buddhist_sanskrit","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("bert_base_buddhist_sanskrit", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_buddhist_sanskrit| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/Matej/bert-base-buddhist-sanskrit \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_case_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_case_ner_pipeline_en.md new file mode 100644 index 00000000000000..bc158d8d5a7486 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_case_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_case_ner_pipeline pipeline BertForTokenClassification from raulgdp +author: John Snow Labs +name: bert_base_case_ner_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_case_ner_pipeline` is a English model originally trained by raulgdp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_case_ner_pipeline_en_5.5.0_3.0_1727280259153.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_case_ner_pipeline_en_5.5.0_3.0_1727280259153.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session (spark) with Spark NLP started
pipeline = PretrainedPipeline("bert_base_case_ner_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("bert_base_case_ner_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_case_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/raulgdp/bert-base-case-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_0210_celential_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_0210_celential_en.md new file mode 100644 index 00000000000000..6434d9ea67151b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_0210_celential_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_cased_0210_celential BertForSequenceClassification from feiyangDu +author: John Snow Labs +name: bert_base_cased_0210_celential +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_0210_celential` is a English model originally trained by feiyangDu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_0210_celential_en_5.5.0_3.0_1727285678748.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_0210_celential_en_5.5.0_3.0_1727285678748.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ +    .setInputCol('text') \ +    .setOutputCol('document') + +tokenizer = Tokenizer() \ +    .setInputCols(['document']) \ +    .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_0210_celential","en") \ +    .setInputCols(["document","token"]) \ +    .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) + +# assumes an active Spark session, e.g. spark = sparknlp.start() +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() +    .setInputCol("text") +    .setOutputCol("document") + +val tokenizer = new Tokenizer() +    .setInputCols(Array("document")) +    .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_0210_celential", "en") +    .setInputCols(Array("document","token")) +    .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
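+ +Once the pipeline has been fitted and applied as in the Python snippet above, the predicted labels can be read back from the `class` output column. This is a minimal sketch that reuses the `pipelineDF` DataFrame from that snippet; the input sentence is only a placeholder: + +```python +# show each input text next to its predicted class label(s) +pipelineDF.select("text", "class.result").show(truncate=False) +```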
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_0210_celential| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/feiyangDu/bert-base-cased-0210-celential \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_cola_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_cola_en.md new file mode 100644 index 00000000000000..075ad379cfff85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_cola_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_cased_cola BertForSequenceClassification from gmihaila +author: John Snow Labs +name: bert_base_cased_cola +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_cola` is a English model originally trained by gmihaila. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_cola_en_5.5.0_3.0_1727287559197.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_cola_en_5.5.0_3.0_1727287559197.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ +    .setInputCol('text') \ +    .setOutputCol('document') + +tokenizer = Tokenizer() \ +    .setInputCols(['document']) \ +    .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_cola","en") \ +    .setInputCols(["document","token"]) \ +    .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) + +# assumes an active Spark session, e.g. spark = sparknlp.start() +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() +    .setInputCol("text") +    .setOutputCol("document") + +val tokenizer = new Tokenizer() +    .setInputCols(Array("document")) +    .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_cola", "en") +    .setInputCols(Array("document","token")) +    .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_cola| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/gmihaila/bert-base-cased-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_english_sentweet_derogatory_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_english_sentweet_derogatory_pipeline_en.md new file mode 100644 index 00000000000000..8c203e81d778fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_english_sentweet_derogatory_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_cased_english_sentweet_derogatory_pipeline pipeline BertForSequenceClassification from jayanta +author: John Snow Labs +name: bert_base_cased_english_sentweet_derogatory_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_english_sentweet_derogatory_pipeline` is a English model originally trained by jayanta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_english_sentweet_derogatory_pipeline_en_5.5.0_3.0_1727288791450.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_english_sentweet_derogatory_pipeline_en_5.5.0_3.0_1727288791450.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline + +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipeline = PretrainedPipeline("bert_base_cased_english_sentweet_derogatory_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline +import spark.implicits._ + +val df = Seq("I love spark-nlp").toDF("text") +val pipeline = new PretrainedPipeline("bert_base_cased_english_sentweet_derogatory_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_english_sentweet_derogatory_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/jayanta/bert-base-cased-english-sentweet-Derogatory + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_finetuned_ner_bc2gm_iob_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_finetuned_ner_bc2gm_iob_en.md new file mode 100644 index 00000000000000..473d6fc34f6ca8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_finetuned_ner_bc2gm_iob_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_cased_finetuned_ner_bc2gm_iob BertForTokenClassification from DunnBC22 +author: John Snow Labs +name: bert_base_cased_finetuned_ner_bc2gm_iob +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_ner_bc2gm_iob` is a English model originally trained by DunnBC22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_ner_bc2gm_iob_en_5.5.0_3.0_1727284110801.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_ner_bc2gm_iob_en_5.5.0_3.0_1727284110801.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ +    .setInputCol('text') \ +    .setOutputCol('document') + +tokenizer = Tokenizer() \ +    .setInputCols(['document']) \ +    .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_cased_finetuned_ner_bc2gm_iob","en") \ +    .setInputCols(["document","token"]) \ +    .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# assumes an active Spark session, e.g. spark = sparknlp.start() +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() +    .setInputCol("text") +    .setOutputCol("document") + +val tokenizer = new Tokenizer() +    .setInputCols("document") +    .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_base_cased_finetuned_ner_bc2gm_iob", "en") +    .setInputCols(Array("document","token")) +    .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
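+ +To see which entity tag the model assigns to each token, the `token` and `ner` output columns produced in the Python snippet above can be displayed side by side. A minimal sketch reusing the `pipelineDF` DataFrame from that snippet: + +```python +# tokens and their predicted IOB tags, aligned by position +pipelineDF.select("token.result", "ner.result").show(truncate=False) +```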
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_ner_bc2gm_iob| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/DunnBC22/bert-base-cased-finetuned-ner-BC2GM-IOB \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_textcls_rheology_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_textcls_rheology_pipeline_en.md new file mode 100644 index 00000000000000..69490b4cda4fd0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_textcls_rheology_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_cased_textcls_rheology_pipeline pipeline BertForSequenceClassification from jonas-luehrs +author: John Snow Labs +name: bert_base_cased_textcls_rheology_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_textcls_rheology_pipeline` is a English model originally trained by jonas-luehrs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_textcls_rheology_pipeline_en_5.5.0_3.0_1727272973625.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_textcls_rheology_pipeline_en_5.5.0_3.0_1727272973625.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline + +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipeline = PretrainedPipeline("bert_base_cased_textcls_rheology_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline +import spark.implicits._ + +val df = Seq("I love spark-nlp").toDF("text") +val pipeline = new PretrainedPipeline("bert_base_cased_textcls_rheology_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_textcls_rheology_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/jonas-luehrs/bert-base-cased-textCLS-RHEOLOGY + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_casedepoch3_sexist_baseline_with_reddit_and_gabfortest_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_casedepoch3_sexist_baseline_with_reddit_and_gabfortest_pipeline_en.md new file mode 100644 index 00000000000000..8f684b6dfa984b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_casedepoch3_sexist_baseline_with_reddit_and_gabfortest_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_casedepoch3_sexist_baseline_with_reddit_and_gabfortest_pipeline pipeline BertForSequenceClassification from Wiebke +author: John Snow Labs +name: bert_base_casedepoch3_sexist_baseline_with_reddit_and_gabfortest_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_casedepoch3_sexist_baseline_with_reddit_and_gabfortest_pipeline` is a English model originally trained by Wiebke. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_casedepoch3_sexist_baseline_with_reddit_and_gabfortest_pipeline_en_5.5.0_3.0_1727284527197.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_casedepoch3_sexist_baseline_with_reddit_and_gabfortest_pipeline_en_5.5.0_3.0_1727284527197.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline + +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipeline = PretrainedPipeline("bert_base_casedepoch3_sexist_baseline_with_reddit_and_gabfortest_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline +import spark.implicits._ + +val df = Seq("I love spark-nlp").toDF("text") +val pipeline = new PretrainedPipeline("bert_base_casedepoch3_sexist_baseline_with_reddit_and_gabfortest_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_casedepoch3_sexist_baseline_with_reddit_and_gabfortest_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Wiebke/bert-base-casedepoch3_sexist_baseline_with_reddit_and_gabfortest + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_chinese_climate_risk_opportunity_prediction_v4_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_chinese_climate_risk_opportunity_prediction_v4_en.md new file mode 100644 index 00000000000000..a8ec49ea35b2de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_chinese_climate_risk_opportunity_prediction_v4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_chinese_climate_risk_opportunity_prediction_v4 BertForSequenceClassification from hw2942 +author: John Snow Labs +name: bert_base_chinese_climate_risk_opportunity_prediction_v4 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_chinese_climate_risk_opportunity_prediction_v4` is a English model originally trained by hw2942. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_climate_risk_opportunity_prediction_v4_en_5.5.0_3.0_1727285677709.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_climate_risk_opportunity_prediction_v4_en_5.5.0_3.0_1727285677709.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ +    .setInputCol('text') \ +    .setOutputCol('document') + +tokenizer = Tokenizer() \ +    .setInputCols(['document']) \ +    .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_chinese_climate_risk_opportunity_prediction_v4","en") \ +    .setInputCols(["document","token"]) \ +    .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) + +# assumes an active Spark session, e.g. spark = sparknlp.start() +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() +    .setInputCol("text") +    .setOutputCol("document") + +val tokenizer = new Tokenizer() +    .setInputCols(Array("document")) +    .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_chinese_climate_risk_opportunity_prediction_v4", "en") +    .setInputCols(Array("document","token")) +    .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_chinese_climate_risk_opportunity_prediction_v4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|383.3 MB| + +## References + +https://huggingface.co/hw2942/bert-base-chinese-climate-risk-opportunity-prediction-v4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_chinese_finetuning_financial_news_sentiment_zh.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_chinese_finetuning_financial_news_sentiment_zh.md new file mode 100644 index 00000000000000..e335614831ec3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_chinese_finetuning_financial_news_sentiment_zh.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Chinese bert_base_chinese_finetuning_financial_news_sentiment BertForSequenceClassification from hw2942 +author: John Snow Labs +name: bert_base_chinese_finetuning_financial_news_sentiment +date: 2024-09-25 +tags: [zh, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_chinese_finetuning_financial_news_sentiment` is a Chinese model originally trained by hw2942. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuning_financial_news_sentiment_zh_5.5.0_3.0_1727279222615.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuning_financial_news_sentiment_zh_5.5.0_3.0_1727279222615.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ +    .setInputCol('text') \ +    .setOutputCol('document') + +tokenizer = Tokenizer() \ +    .setInputCols(['document']) \ +    .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_chinese_finetuning_financial_news_sentiment","zh") \ +    .setInputCols(["document","token"]) \ +    .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) + +# assumes an active Spark session, e.g. spark = sparknlp.start() +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() +    .setInputCol("text") +    .setOutputCol("document") + +val tokenizer = new Tokenizer() +    .setInputCols(Array("document")) +    .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_chinese_finetuning_financial_news_sentiment", "zh") +    .setInputCols(Array("document","token")) +    .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_chinese_finetuning_financial_news_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|zh| +|Size:|383.3 MB| + +## References + +https://huggingface.co/hw2942/bert-base-chinese-finetuning-financial-news-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_code_classification_mid_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_code_classification_mid_pipeline_en.md new file mode 100644 index 00000000000000..cdc5e6e19a3242 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_code_classification_mid_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_finetuned_code_classification_mid_pipeline pipeline BertForSequenceClassification from JUNstats +author: John Snow Labs +name: bert_base_finetuned_code_classification_mid_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_finetuned_code_classification_mid_pipeline` is a English model originally trained by JUNstats. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_code_classification_mid_pipeline_en_5.5.0_3.0_1727286164330.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_code_classification_mid_pipeline_en_5.5.0_3.0_1727286164330.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline + +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipeline = PretrainedPipeline("bert_base_finetuned_code_classification_mid_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline +import spark.implicits._ + +val df = Seq("I love spark-nlp").toDF("text") +val pipeline = new PretrainedPipeline("bert_base_finetuned_code_classification_mid_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_finetuned_code_classification_mid_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.8 MB| + +## References + +https://huggingface.co/JUNstats/bert-base-finetuned-code-classification-mid + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_masakhaner_amh_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_masakhaner_amh_en.md new file mode 100644 index 00000000000000..d07be6cb1cf7ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_masakhaner_amh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_finetuned_masakhaner_amh BertForTokenClassification from TokenfreeEMNLPSubmission +author: John Snow Labs +name: bert_base_finetuned_masakhaner_amh +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_finetuned_masakhaner_amh` is a English model originally trained by TokenfreeEMNLPSubmission. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_masakhaner_amh_en_5.5.0_3.0_1727283797535.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_masakhaner_amh_en_5.5.0_3.0_1727283797535.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ +    .setInputCol('text') \ +    .setOutputCol('document') + +tokenizer = Tokenizer() \ +    .setInputCols(['document']) \ +    .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_finetuned_masakhaner_amh","en") \ +    .setInputCols(["document","token"]) \ +    .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# assumes an active Spark session, e.g. spark = sparknlp.start() +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() +    .setInputCol("text") +    .setOutputCol("document") + +val tokenizer = new Tokenizer() +    .setInputCols("document") +    .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_base_finetuned_masakhaner_amh", "en") +    .setInputCols(Array("document","token")) +    .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_finetuned_masakhaner_amh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/TokenfreeEMNLPSubmission/bert-base-finetuned-masakhaner-amh \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_masakhaner_amh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_masakhaner_amh_pipeline_en.md new file mode 100644 index 00000000000000..faa43e947fa524 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_masakhaner_amh_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_finetuned_masakhaner_amh_pipeline pipeline BertForTokenClassification from TokenfreeEMNLPSubmission +author: John Snow Labs +name: bert_base_finetuned_masakhaner_amh_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_finetuned_masakhaner_amh_pipeline` is a English model originally trained by TokenfreeEMNLPSubmission. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_masakhaner_amh_pipeline_en_5.5.0_3.0_1727283818894.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_masakhaner_amh_pipeline_en_5.5.0_3.0_1727283818894.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline + +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipeline = PretrainedPipeline("bert_base_finetuned_masakhaner_amh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline +import spark.implicits._ + +val df = Seq("I love spark-nlp").toDF("text") +val pipeline = new PretrainedPipeline("bert_base_finetuned_masakhaner_amh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_finetuned_masakhaner_amh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/TokenfreeEMNLPSubmission/bert-base-finetuned-masakhaner-amh + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_sts_rurupang_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_sts_rurupang_en.md new file mode 100644 index 00000000000000..11d22145199121 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_sts_rurupang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_finetuned_sts_rurupang BertForSequenceClassification from rurupang +author: John Snow Labs +name: bert_base_finetuned_sts_rurupang +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_finetuned_sts_rurupang` is a English model originally trained by rurupang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_sts_rurupang_en_5.5.0_3.0_1727279480961.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_sts_rurupang_en_5.5.0_3.0_1727279480961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ +    .setInputCol('text') \ +    .setOutputCol('document') + +tokenizer = Tokenizer() \ +    .setInputCols(['document']) \ +    .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_finetuned_sts_rurupang","en") \ +    .setInputCols(["document","token"]) \ +    .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) + +# assumes an active Spark session, e.g. spark = sparknlp.start() +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() +    .setInputCol("text") +    .setOutputCol("document") + +val tokenizer = new Tokenizer() +    .setInputCols(Array("document")) +    .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_finetuned_sts_rurupang", "en") +    .setInputCols(Array("document","token")) +    .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_finetuned_sts_rurupang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|414.6 MB| + +## References + +https://huggingface.co/rurupang/bert-base-finetuned-sts \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_sts_rurupang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_sts_rurupang_pipeline_en.md new file mode 100644 index 00000000000000..5d5f7983a81083 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_sts_rurupang_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_finetuned_sts_rurupang_pipeline pipeline BertForSequenceClassification from rurupang +author: John Snow Labs +name: bert_base_finetuned_sts_rurupang_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_finetuned_sts_rurupang_pipeline` is a English model originally trained by rurupang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_sts_rurupang_pipeline_en_5.5.0_3.0_1727279502720.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_sts_rurupang_pipeline_en_5.5.0_3.0_1727279502720.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline + +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipeline = PretrainedPipeline("bert_base_finetuned_sts_rurupang_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline +import spark.implicits._ + +val df = Seq("I love spark-nlp").toDF("text") +val pipeline = new PretrainedPipeline("bert_base_finetuned_sts_rurupang_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_finetuned_sts_rurupang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.7 MB| + +## References + +https://huggingface.co/rurupang/bert-base-finetuned-sts + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_ynat_zgotter_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_ynat_zgotter_en.md new file mode 100644 index 00000000000000..c754609d5786d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_ynat_zgotter_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_finetuned_ynat_zgotter BertForSequenceClassification from zgotter +author: John Snow Labs +name: bert_base_finetuned_ynat_zgotter +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_finetuned_ynat_zgotter` is a English model originally trained by zgotter. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_ynat_zgotter_en_5.5.0_3.0_1727268694344.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_ynat_zgotter_en_5.5.0_3.0_1727268694344.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ +    .setInputCol('text') \ +    .setOutputCol('document') + +tokenizer = Tokenizer() \ +    .setInputCols(['document']) \ +    .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_finetuned_ynat_zgotter","en") \ +    .setInputCols(["document","token"]) \ +    .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) + +# assumes an active Spark session, e.g. spark = sparknlp.start() +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() +    .setInputCol("text") +    .setOutputCol("document") + +val tokenizer = new Tokenizer() +    .setInputCols(Array("document")) +    .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_finetuned_ynat_zgotter", "en") +    .setInputCols(Array("document","token")) +    .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_finetuned_ynat_zgotter| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|414.7 MB| + +## References + +https://huggingface.co/zgotter/bert-base-finetuned-ynat \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_ynat_zgotter_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_ynat_zgotter_pipeline_en.md new file mode 100644 index 00000000000000..e113568d0e3e33 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_ynat_zgotter_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_finetuned_ynat_zgotter_pipeline pipeline BertForSequenceClassification from zgotter +author: John Snow Labs +name: bert_base_finetuned_ynat_zgotter_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_finetuned_ynat_zgotter_pipeline` is a English model originally trained by zgotter. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_ynat_zgotter_pipeline_en_5.5.0_3.0_1727268718283.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_ynat_zgotter_pipeline_en_5.5.0_3.0_1727268718283.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline + +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipeline = PretrainedPipeline("bert_base_finetuned_ynat_zgotter_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline +import spark.implicits._ + +val df = Seq("I love spark-nlp").toDF("text") +val pipeline = new PretrainedPipeline("bert_base_finetuned_ynat_zgotter_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_finetuned_ynat_zgotter_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.7 MB| + +## References + +https://huggingface.co/zgotter/bert-base-finetuned-ynat + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_archaeo_ner_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_archaeo_ner_pipeline_de.md new file mode 100644 index 00000000000000..2c0b1d4905fc3d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_archaeo_ner_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German bert_base_german_cased_archaeo_ner_pipeline pipeline BertForTokenClassification from alexbrandsen +author: John Snow Labs +name: bert_base_german_cased_archaeo_ner_pipeline +date: 2024-09-25 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_german_cased_archaeo_ner_pipeline` is a German model originally trained by alexbrandsen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_german_cased_archaeo_ner_pipeline_de_5.5.0_3.0_1727246633648.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_german_cased_archaeo_ner_pipeline_de_5.5.0_3.0_1727246633648.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline + +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipeline = PretrainedPipeline("bert_base_german_cased_archaeo_ner_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline +import spark.implicits._ + +val df = Seq("I love spark-nlp").toDF("text") +val pipeline = new PretrainedPipeline("bert_base_german_cased_archaeo_ner_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_german_cased_archaeo_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|406.9 MB| + +## References + +https://huggingface.co/alexbrandsen/bert-base-german-cased-archaeo-NER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_finetuned_subj_pretrained_with_noisydata_v2_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_finetuned_subj_pretrained_with_noisydata_v2_en.md new file mode 100644 index 00000000000000..724709b01dd673 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_finetuned_subj_pretrained_with_noisydata_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_german_cased_finetuned_subj_pretrained_with_noisydata_v2 BertForTokenClassification from tbosse +author: John Snow Labs +name: bert_base_german_cased_finetuned_subj_pretrained_with_noisydata_v2 +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_german_cased_finetuned_subj_pretrained_with_noisydata_v2` is a English model originally trained by tbosse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_german_cased_finetuned_subj_pretrained_with_noisydata_v2_en_5.5.0_3.0_1727260502352.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_german_cased_finetuned_subj_pretrained_with_noisydata_v2_en_5.5.0_3.0_1727260502352.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ +    .setInputCol('text') \ +    .setOutputCol('document') + +tokenizer = Tokenizer() \ +    .setInputCols(['document']) \ +    .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_german_cased_finetuned_subj_pretrained_with_noisydata_v2","en") \ +    .setInputCols(["document","token"]) \ +    .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# assumes an active Spark session, e.g. spark = sparknlp.start() +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() +    .setInputCol("text") +    .setOutputCol("document") + +val tokenizer = new Tokenizer() +    .setInputCols("document") +    .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_base_german_cased_finetuned_subj_pretrained_with_noisydata_v2", "en") +    .setInputCols(Array("document","token")) +    .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_german_cased_finetuned_subj_pretrained_with_noisydata_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/tbosse/bert-base-german-cased-finetuned-subj_preTrained_with_noisyData_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_finetuned_subj_v6_7epoch_v3_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_finetuned_subj_v6_7epoch_v3_en.md new file mode 100644 index 00000000000000..a1af598a3048f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_finetuned_subj_v6_7epoch_v3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_german_cased_finetuned_subj_v6_7epoch_v3 BertForTokenClassification from tbosse +author: John Snow Labs +name: bert_base_german_cased_finetuned_subj_v6_7epoch_v3 +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_german_cased_finetuned_subj_v6_7epoch_v3` is a English model originally trained by tbosse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_german_cased_finetuned_subj_v6_7epoch_v3_en_5.5.0_3.0_1727284295244.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_german_cased_finetuned_subj_v6_7epoch_v3_en_5.5.0_3.0_1727284295244.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ +    .setInputCol('text') \ +    .setOutputCol('document') + +tokenizer = Tokenizer() \ +    .setInputCols(['document']) \ +    .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_german_cased_finetuned_subj_v6_7epoch_v3","en") \ +    .setInputCols(["document","token"]) \ +    .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# assumes an active Spark session, e.g. spark = sparknlp.start() +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() +    .setInputCol("text") +    .setOutputCol("document") + +val tokenizer = new Tokenizer() +    .setInputCols("document") +    .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_base_german_cased_finetuned_subj_v6_7epoch_v3", "en") +    .setInputCols(Array("document","token")) +    .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_german_cased_finetuned_subj_v6_7epoch_v3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/tbosse/bert-base-german-cased-finetuned-subj_v6_7Epoch_v3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_noisy_pretrain_fine_tuned_v1_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_noisy_pretrain_fine_tuned_v1_2_pipeline_en.md new file mode 100644 index 00000000000000..ddc2658715bda6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_noisy_pretrain_fine_tuned_v1_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_german_cased_noisy_pretrain_fine_tuned_v1_2_pipeline pipeline BertForTokenClassification from tbosse +author: John Snow Labs +name: bert_base_german_cased_noisy_pretrain_fine_tuned_v1_2_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_german_cased_noisy_pretrain_fine_tuned_v1_2_pipeline` is a English model originally trained by tbosse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_german_cased_noisy_pretrain_fine_tuned_v1_2_pipeline_en_5.5.0_3.0_1727280917448.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_german_cased_noisy_pretrain_fine_tuned_v1_2_pipeline_en_5.5.0_3.0_1727280917448.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline + +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipeline = PretrainedPipeline("bert_base_german_cased_noisy_pretrain_fine_tuned_v1_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline +import spark.implicits._ + +val df = Seq("I love spark-nlp").toDF("text") +val pipeline = new PretrainedPipeline("bert_base_german_cased_noisy_pretrain_fine_tuned_v1_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_german_cased_noisy_pretrain_fine_tuned_v1_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/tbosse/bert-base-german-cased-noisy-pretrain-fine-tuned_v1.2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_italian_xxl_uncased_finetuned_emotions_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_italian_xxl_uncased_finetuned_emotions_en.md new file mode 100644 index 00000000000000..398a31dad69102 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_italian_xxl_uncased_finetuned_emotions_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_italian_xxl_uncased_finetuned_emotions BertForSequenceClassification from MelmaGrigia +author: John Snow Labs +name: bert_base_italian_xxl_uncased_finetuned_emotions +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_italian_xxl_uncased_finetuned_emotions` is a English model originally trained by MelmaGrigia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_italian_xxl_uncased_finetuned_emotions_en_5.5.0_3.0_1727222474327.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_italian_xxl_uncased_finetuned_emotions_en_5.5.0_3.0_1727222474327.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ +    .setInputCol('text') \ +    .setOutputCol('document') + +tokenizer = Tokenizer() \ +    .setInputCols(['document']) \ +    .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_italian_xxl_uncased_finetuned_emotions","en") \ +    .setInputCols(["document","token"]) \ +    .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) + +# assumes an active Spark session, e.g. spark = sparknlp.start() +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() +    .setInputCol("text") +    .setOutputCol("document") + +val tokenizer = new Tokenizer() +    .setInputCols(Array("document")) +    .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_italian_xxl_uncased_finetuned_emotions", "en") +    .setInputCols(Array("document","token")) +    .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_italian_xxl_uncased_finetuned_emotions| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|414.8 MB| + +## References + +https://huggingface.co/MelmaGrigia/bert-base-italian-xxl-uncased-finetuned-emotions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_massive_intent_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_massive_intent_pipeline_en.md new file mode 100644 index 00000000000000..7848de87d384dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_massive_intent_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_massive_intent_pipeline pipeline BertForSequenceClassification from gokuls +author: John Snow Labs +name: bert_base_massive_intent_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_massive_intent_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_massive_intent_pipeline_en_5.5.0_3.0_1727273214064.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_massive_intent_pipeline_en_5.5.0_3.0_1727273214064.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pipeline's DocumentAssembler reads the input DataFrame's "text" column
pipeline = PretrainedPipeline("bert_base_massive_intent_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pipeline's DocumentAssembler reads the input DataFrame's "text" column
val pipeline = new PretrainedPipeline("bert_base_massive_intent_pipeline", lang = "en")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_massive_intent_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.6 MB| + +## References + +https://huggingface.co/gokuls/bert-base-Massive-intent + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_msmarco_fiqa_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_msmarco_fiqa_en.md new file mode 100644 index 00000000000000..9220161ec6677c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_msmarco_fiqa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_msmarco_fiqa BertForSequenceClassification from vittoriomaggio +author: John Snow Labs +name: bert_base_msmarco_fiqa +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_msmarco_fiqa` is a English model originally trained by vittoriomaggio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_msmarco_fiqa_en_5.5.0_3.0_1727273470072.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_msmarco_fiqa_en_5.5.0_3.0_1727273470072.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_msmarco_fiqa", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_msmarco_fiqa", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
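Besides the winning label, each classification annotation carries a metadata map that usually includes the per-label scores; whether those keys are present depends on how the model was exported, so treat this as a sketch over the `pipelineDF` above:

```python
from pyspark.sql.functions import col

pipelineDF.select(
    col("class.result").alias("label"),
    col("class.metadata").alias("scores")  # map of label -> score, when the model exports them
).show(truncate=False)
```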
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_msmarco_fiqa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/vittoriomaggio/bert-base-msmarco-fiqa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_cased_finetuned_ner_geocorpus_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_cased_finetuned_ner_geocorpus_pipeline_xx.md new file mode 100644 index 00000000000000..f8c863147c5bea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_cased_finetuned_ner_geocorpus_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_finetuned_ner_geocorpus_pipeline pipeline BertForTokenClassification from GuiTap +author: John Snow Labs +name: bert_base_multilingual_cased_finetuned_ner_geocorpus_pipeline +date: 2024-09-25 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_finetuned_ner_geocorpus_pipeline` is a Multilingual model originally trained by GuiTap. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_ner_geocorpus_pipeline_xx_5.5.0_3.0_1727249902683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_ner_geocorpus_pipeline_xx_5.5.0_3.0_1727249902683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pipeline's DocumentAssembler reads the input DataFrame's "text" column
pipeline = PretrainedPipeline("bert_base_multilingual_cased_finetuned_ner_geocorpus_pipeline", lang = "xx")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pipeline's DocumentAssembler reads the input DataFrame's "text" column
val pipeline = new PretrainedPipeline("bert_base_multilingual_cased_finetuned_ner_geocorpus_pipeline", lang = "xx")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_finetuned_ner_geocorpus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.2 MB| + +## References + +https://huggingface.co/GuiTap/bert-base-multilingual-cased-finetuned-ner-geocorpus + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_cased_wnli_1_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_cased_wnli_1_pipeline_xx.md new file mode 100644 index 00000000000000..a00f334e07226a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_cased_wnli_1_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_wnli_1_pipeline pipeline BertForSequenceClassification from tmnam20 +author: John Snow Labs +name: bert_base_multilingual_cased_wnli_1_pipeline +date: 2024-09-25 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_wnli_1_pipeline` is a Multilingual model originally trained by tmnam20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_wnli_1_pipeline_xx_5.5.0_3.0_1727285032268.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_wnli_1_pipeline_xx_5.5.0_3.0_1727285032268.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pipeline's DocumentAssembler reads the input DataFrame's "text" column
pipeline = PretrainedPipeline("bert_base_multilingual_cased_wnli_1_pipeline", lang = "xx")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pipeline's DocumentAssembler reads the input DataFrame's "text" column
val pipeline = new PretrainedPipeline("bert_base_multilingual_cased_wnli_1_pipeline", lang = "xx")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_wnli_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|667.3 MB| + +## References + +https://huggingface.co/tmnam20/bert-base-multilingual-cased-wnli-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_for_multilang_ner_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_for_multilang_ner_pipeline_xx.md new file mode 100644 index 00000000000000..338aadf28b0e3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_for_multilang_ner_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_finetuned_for_multilang_ner_pipeline pipeline BertForTokenClassification from Misha24-10 +author: John Snow Labs +name: bert_base_multilingual_uncased_finetuned_for_multilang_ner_pipeline +date: 2024-09-25 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_finetuned_for_multilang_ner_pipeline` is a Multilingual model originally trained by Misha24-10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_finetuned_for_multilang_ner_pipeline_xx_5.5.0_3.0_1727275975615.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_finetuned_for_multilang_ner_pipeline_xx_5.5.0_3.0_1727275975615.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pipeline's DocumentAssembler reads the input DataFrame's "text" column
pipeline = PretrainedPipeline("bert_base_multilingual_uncased_finetuned_for_multilang_ner_pipeline", lang = "xx")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pipeline's DocumentAssembler reads the input DataFrame's "text" column
val pipeline = new PretrainedPipeline("bert_base_multilingual_uncased_finetuned_for_multilang_ner_pipeline", lang = "xx")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_finetuned_for_multilang_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|625.8 MB| + +## References + +https://huggingface.co/Misha24-10/bert-base-multilingual-uncased-finetuned-for-multilang-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_for_multilang_ner_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_for_multilang_ner_xx.md new file mode 100644 index 00000000000000..de9764642e5fa2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_for_multilang_ner_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_finetuned_for_multilang_ner BertForTokenClassification from Misha24-10 +author: John Snow Labs +name: bert_base_multilingual_uncased_finetuned_for_multilang_ner +date: 2024-09-25 +tags: [xx, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_finetuned_for_multilang_ner` is a Multilingual model originally trained by Misha24-10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_finetuned_for_multilang_ner_xx_5.5.0_3.0_1727275943080.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_finetuned_for_multilang_ner_xx_5.5.0_3.0_1727275943080.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = BertForTokenClassification.pretrained("bert_base_multilingual_uncased_finetuned_for_multilang_ner", "xx") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_base_multilingual_uncased_finetuned_for_multilang_ner", "xx")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
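The `token.result` and `ner.result` arrays are aligned position by position, so the predicted tag for each token can be read off directly. A minimal sketch, assuming the `pipelineDF` built above:

```python
from pyspark.sql.functions import col

# the two arrays have the same length: one NER tag per token
pipelineDF.select(
    col("token.result").alias("tokens"),
    col("ner.result").alias("ner_tags")
).show(truncate=False)
```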
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_finetuned_for_multilang_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|625.7 MB| + +## References + +https://huggingface.co/Misha24-10/bert-base-multilingual-uncased-finetuned-for-multilang-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_masress_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_masress_pipeline_xx.md new file mode 100644 index 00000000000000..8355eced5abfb0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_masress_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_finetuned_masress_pipeline pipeline BertForSequenceClassification from cjbarrie +author: John Snow Labs +name: bert_base_multilingual_uncased_finetuned_masress_pipeline +date: 2024-09-25 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_finetuned_masress_pipeline` is a Multilingual model originally trained by cjbarrie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_finetuned_masress_pipeline_xx_5.5.0_3.0_1727257527230.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_finetuned_masress_pipeline_xx_5.5.0_3.0_1727257527230.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pipeline's DocumentAssembler reads the input DataFrame's "text" column
pipeline = PretrainedPipeline("bert_base_multilingual_uncased_finetuned_masress_pipeline", lang = "xx")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pipeline's DocumentAssembler reads the input DataFrame's "text" column
val pipeline = new PretrainedPipeline("bert_base_multilingual_uncased_finetuned_masress_pipeline", lang = "xx")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_finetuned_masress_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|627.8 MB| + +## References + +https://huggingface.co/cjbarrie/bert-base-multilingual-uncased-finetuned-masress + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_masress_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_masress_xx.md new file mode 100644 index 00000000000000..2409bfd2ddd013 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_masress_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_finetuned_masress BertForSequenceClassification from cjbarrie +author: John Snow Labs +name: bert_base_multilingual_uncased_finetuned_masress +date: 2024-09-25 +tags: [xx, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_finetuned_masress` is a Multilingual model originally trained by cjbarrie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_finetuned_masress_xx_5.5.0_3.0_1727257494505.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_finetuned_masress_xx_5.5.0_3.0_1727257494505.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_finetuned_masress", "xx") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_finetuned_masress", "xx")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_finetuned_masress| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|627.7 MB| + +## References + +https://huggingface.co/cjbarrie/bert-base-multilingual-uncased-finetuned-masress \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_ner_silvanus_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_ner_silvanus_pipeline_xx.md new file mode 100644 index 00000000000000..4799cd6723bec5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_ner_silvanus_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_ner_silvanus_pipeline pipeline BertForTokenClassification from rollerhafeezh-amikom +author: John Snow Labs +name: bert_base_multilingual_uncased_ner_silvanus_pipeline +date: 2024-09-25 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_ner_silvanus_pipeline` is a Multilingual model originally trained by rollerhafeezh-amikom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_ner_silvanus_pipeline_xx_5.5.0_3.0_1727247866149.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_ner_silvanus_pipeline_xx_5.5.0_3.0_1727247866149.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pipeline's DocumentAssembler reads the input DataFrame's "text" column
pipeline = PretrainedPipeline("bert_base_multilingual_uncased_ner_silvanus_pipeline", lang = "xx")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pipeline's DocumentAssembler reads the input DataFrame's "text" column
val pipeline = new PretrainedPipeline("bert_base_multilingual_uncased_ner_silvanus_pipeline", lang = "xx")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_ner_silvanus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|625.6 MB| + +## References + +https://huggingface.co/rollerhafeezh-amikom/bert-base-multilingual-uncased-ner-silvanus + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_ner_silvanus_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_ner_silvanus_xx.md new file mode 100644 index 00000000000000..89d3ec1cdb3c91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_ner_silvanus_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_ner_silvanus BertForTokenClassification from rollerhafeezh-amikom +author: John Snow Labs +name: bert_base_multilingual_uncased_ner_silvanus +date: 2024-09-25 +tags: [xx, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_ner_silvanus` is a Multilingual model originally trained by rollerhafeezh-amikom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_ner_silvanus_xx_5.5.0_3.0_1727247832780.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_ner_silvanus_xx_5.5.0_3.0_1727247832780.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = BertForTokenClassification.pretrained("bert_base_multilingual_uncased_ner_silvanus", "xx") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_base_multilingual_uncased_ner_silvanus", "xx")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
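Each NER annotation also records character offsets, which helps when tags need to be mapped back onto the original text. A sketch over the `pipelineDF` from above:

```python
from pyspark.sql.functions import col, explode

# one row per annotation, with the tag and its begin/end character positions
pipelineDF.select(explode(col("ner")).alias("ann")) \
    .select(col("ann.result").alias("tag"), col("ann.begin"), col("ann.end")) \
    .show(truncate=False)
```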
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_ner_silvanus| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|625.5 MB| + +## References + +https://huggingface.co/rollerhafeezh-amikom/bert-base-multilingual-uncased-ner-silvanus \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_pipeline_xx.md new file mode 100644 index 00000000000000..2d7fdeef57314f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_pipeline pipeline BertForSequenceClassification from beamandym +author: John Snow Labs +name: bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_pipeline +date: 2024-09-25 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_pipeline` is a Multilingual model originally trained by beamandym. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_pipeline_xx_5.5.0_3.0_1727237764695.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_pipeline_xx_5.5.0_3.0_1727237764695.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pipeline's DocumentAssembler reads the input DataFrame's "text" column
pipeline = PretrainedPipeline("bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_pipeline", lang = "xx")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pipeline's DocumentAssembler reads the input DataFrame's "text" column
val pipeline = new PretrainedPipeline("bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_pipeline", lang = "xx")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|627.8 MB| + +## References + +https://huggingface.co/beamandym/bert-base-multilingual-uncased-sentiment-finetuned-MeIA-AnalisisDeSentimientos + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_xx.md new file mode 100644 index 00000000000000..af3f194994e60c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym BertForSequenceClassification from beamandym +author: John Snow Labs +name: bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym +date: 2024-09-25 +tags: [xx, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym` is a Multilingual model originally trained by beamandym. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_xx_5.5.0_3.0_1727237732984.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_xx_5.5.0_3.0_1727237732984.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym", "xx") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym", "xx")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
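The stages are already pretrained, so fitting is immediate, but the assembled `PipelineModel` can still be saved and reloaded so the pipeline does not have to be rebuilt on every run. A minimal sketch with a placeholder path:

```python
from pyspark.ml import PipelineModel

pipelineModel.write().overwrite().save("/tmp/sentiment_bert_pipeline")  # placeholder path
reloaded = PipelineModel.load("/tmp/sentiment_bert_pipeline")
reloaded.transform(data).select("class.result").show(truncate=False)
```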
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|627.8 MB| + +## References + +https://huggingface.co/beamandym/bert-base-multilingual-uncased-sentiment-finetuned-MeIA-AnalisisDeSentimientos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_pipeline_xx.md new file mode 100644 index 00000000000000..d286bb0c0b4488 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_pipeline pipeline BertForSequenceClassification from Jumartineze +author: John Snow Labs +name: bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_pipeline +date: 2024-09-25 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_pipeline` is a Multilingual model originally trained by Jumartineze. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_pipeline_xx_5.5.0_3.0_1727276569808.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_pipeline_xx_5.5.0_3.0_1727276569808.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pipeline's DocumentAssembler reads the input DataFrame's "text" column
pipeline = PretrainedPipeline("bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_pipeline", lang = "xx")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pipeline's DocumentAssembler reads the input DataFrame's "text" column
val pipeline = new PretrainedPipeline("bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_pipeline", lang = "xx")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|627.8 MB| + +## References + +https://huggingface.co/Jumartineze/bert-base-multilingual-uncased-sentiment-finetuned-MeIA-AnalisisDeSentimientos + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_xx.md new file mode 100644 index 00000000000000..6997ac25e6b9d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze BertForSequenceClassification from Jumartineze +author: John Snow Labs +name: bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze +date: 2024-09-25 +tags: [xx, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze` is a Multilingual model originally trained by Jumartineze. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_xx_5.5.0_3.0_1727276535539.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_xx_5.5.0_3.0_1727276535539.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze", "xx") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze", "xx")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|627.8 MB| + +## References + +https://huggingface.co/Jumartineze/bert-base-multilingual-uncased-sentiment-finetuned-MeIA-AnalisisDeSentimientos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_qqp_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_qqp_pipeline_xx.md new file mode 100644 index 00000000000000..a1f4b74181d443 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_qqp_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_sentiment_finetuned_qqp_pipeline pipeline BertForSequenceClassification from anuj55 +author: John Snow Labs +name: bert_base_multilingual_uncased_sentiment_finetuned_qqp_pipeline +date: 2024-09-25 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_sentiment_finetuned_qqp_pipeline` is a Multilingual model originally trained by anuj55. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_qqp_pipeline_xx_5.5.0_3.0_1727272900504.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_qqp_pipeline_xx_5.5.0_3.0_1727272900504.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# the pipeline's DocumentAssembler reads the input DataFrame's "text" column
pipeline = PretrainedPipeline("bert_base_multilingual_uncased_sentiment_finetuned_qqp_pipeline", lang = "xx")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// the pipeline's DocumentAssembler reads the input DataFrame's "text" column
val pipeline = new PretrainedPipeline("bert_base_multilingual_uncased_sentiment_finetuned_qqp_pipeline", lang = "xx")
val df = Seq("I love spark-nlp").toDF("text")
val annotations = pipeline.transform(df)
```
</div>
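`transform` is the DataFrame route; for ad-hoc strings the loaded pipeline also exposes `fullAnnotate`, which keeps the complete annotations (offsets and metadata). A sketch, assuming the `pipeline` object from above and two illustrative sentences:

```python
# fullAnnotate returns one dictionary of annotations per input string
results = pipeline.fullAnnotate(["I love spark-nlp", "Spark NLP runs on Apache Spark"])
for row in results:
    print(row.keys())
```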
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_sentiment_finetuned_qqp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|627.8 MB| + +## References + +https://huggingface.co/anuj55/bert-base-multilingual-uncased-sentiment-finetuned-qqp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_qqp_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_qqp_xx.md new file mode 100644 index 00000000000000..14f7004cf652ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_qqp_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_sentiment_finetuned_qqp BertForSequenceClassification from anuj55 +author: John Snow Labs +name: bert_base_multilingual_uncased_sentiment_finetuned_qqp +date: 2024-09-25 +tags: [xx, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_sentiment_finetuned_qqp` is a Multilingual model originally trained by anuj55. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_qqp_xx_5.5.0_3.0_1727272865915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_qqp_xx_5.5.0_3.0_1727272865915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_sentiment_finetuned_qqp", "xx") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_sentiment_finetuned_qqp", "xx")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_sentiment_finetuned_qqp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|627.8 MB| + +## References + +https://huggingface.co/anuj55/bert-base-multilingual-uncased-sentiment-finetuned-qqp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_vaxxstance_spanish_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_vaxxstance_spanish_xx.md new file mode 100644 index 00000000000000..a6cd122b45b0b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_vaxxstance_spanish_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_vaxxstance_spanish BertForSequenceClassification from nouman-10 +author: John Snow Labs +name: bert_base_multilingual_uncased_vaxxstance_spanish +date: 2024-09-25 +tags: [xx, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_vaxxstance_spanish` is a Multilingual model originally trained by nouman-10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_vaxxstance_spanish_xx_5.5.0_3.0_1727277454316.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_vaxxstance_spanish_xx_5.5.0_3.0_1727277454316.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_vaxxstance_spanish", "xx") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_vaxxstance_spanish", "xx")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
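For low-latency, driver-side inference the fitted pipeline can be wrapped in a `LightPipeline`, which skips the DataFrame round trip. A minimal sketch, assuming the `pipelineModel` from above and an illustrative Spanish sentence:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# annotate() returns a dict mapping each output column to a list of strings
print(light.annotate("Las vacunas son seguras y eficaces."))
```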
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_vaxxstance_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|627.7 MB| + +## References + +https://huggingface.co/nouman-10/bert-base-multilingual-uncased_vaxxstance_spanish \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_nlp100_title_classification_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_nlp100_title_classification_en.md new file mode 100644 index 00000000000000..c66436fbec6a2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_nlp100_title_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_nlp100_title_classification BertForSequenceClassification from udaizin +author: John Snow Labs +name: bert_base_nlp100_title_classification +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_nlp100_title_classification` is a English model originally trained by udaizin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_nlp100_title_classification_en_5.5.0_3.0_1727268187429.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_nlp100_title_classification_en_5.5.0_3.0_1727268187429.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_nlp100_title_classification", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_nlp100_title_classification", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_nlp100_title_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/udaizin/bert-base-nlp100_title_classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_cased_assin_similarity_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_cased_assin_similarity_pipeline_pt.md new file mode 100644 index 00000000000000..67270861055020 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_cased_assin_similarity_pipeline_pt.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Portuguese bert_base_portuguese_cased_assin_similarity_pipeline pipeline BertForSequenceClassification from ruanchaves +author: John Snow Labs +name: bert_base_portuguese_cased_assin_similarity_pipeline +date: 2024-09-25 +tags: [pt, open_source, pipeline, onnx] +task: Text Classification +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_portuguese_cased_assin_similarity_pipeline` is a Portuguese model originally trained by ruanchaves. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_assin_similarity_pipeline_pt_5.5.0_3.0_1727267089381.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_assin_similarity_pipeline_pt_5.5.0_3.0_1727267089381.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_portuguese_cased_assin_similarity_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_portuguese_cased_assin_similarity_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
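
The snippet above assumes `df` is a Spark DataFrame with a `text` column, which is what the pretrained pipeline's DocumentAssembler typically expects. A minimal sketch of how such a DataFrame could be built (the Portuguese sentence is only an illustrative input):

```python
# Hypothetical input DataFrame with a "text" column for the pipeline to annotate.
df = spark.createDataFrame(
    [["O Spark NLP facilita o processamento de texto em português."]]
).toDF("text")
```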
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_portuguese_cased_assin_similarity_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|408.2 MB| + +## References + +https://huggingface.co/ruanchaves/bert-base-portuguese-cased-assin-similarity + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_cased_assin_similarity_pt.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_cased_assin_similarity_pt.md new file mode 100644 index 00000000000000..750667876d3704 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_cased_assin_similarity_pt.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Portuguese bert_base_portuguese_cased_assin_similarity BertForSequenceClassification from ruanchaves +author: John Snow Labs +name: bert_base_portuguese_cased_assin_similarity +date: 2024-09-25 +tags: [pt, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_portuguese_cased_assin_similarity` is a Portuguese model originally trained by ruanchaves. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_assin_similarity_pt_5.5.0_3.0_1727267066594.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_assin_similarity_pt_5.5.0_3.0_1727267066594.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_portuguese_cased_assin_similarity","pt") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_portuguese_cased_assin_similarity", "pt")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_portuguese_cased_assin_similarity| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|pt| +|Size:|408.2 MB| + +## References + +https://huggingface.co/ruanchaves/bert-base-portuguese-cased-assin-similarity \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_cased_porsimplessent_pt.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_cased_porsimplessent_pt.md new file mode 100644 index 00000000000000..d261a8f154f6b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_cased_porsimplessent_pt.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Portuguese bert_base_portuguese_cased_porsimplessent BertForSequenceClassification from ruanchaves +author: John Snow Labs +name: bert_base_portuguese_cased_porsimplessent +date: 2024-09-25 +tags: [pt, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_portuguese_cased_porsimplessent` is a Portuguese model originally trained by ruanchaves. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_porsimplessent_pt_5.5.0_3.0_1727253718165.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_porsimplessent_pt_5.5.0_3.0_1727253718165.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_portuguese_cased_porsimplessent","pt") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_portuguese_cased_porsimplessent", "pt")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_portuguese_cased_porsimplessent| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|pt| +|Size:|408.2 MB| + +## References + +https://huggingface.co/ruanchaves/bert-base-portuguese-cased-porsimplessent \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_fine_tuned_mrpc_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_fine_tuned_mrpc_en.md new file mode 100644 index 00000000000000..35c10ed6246100 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_fine_tuned_mrpc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_portuguese_fine_tuned_mrpc BertForSequenceClassification from erickrribeiro +author: John Snow Labs +name: bert_base_portuguese_fine_tuned_mrpc +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_portuguese_fine_tuned_mrpc` is a English model originally trained by erickrribeiro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_fine_tuned_mrpc_en_5.5.0_3.0_1727273584144.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_fine_tuned_mrpc_en_5.5.0_3.0_1727273584144.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_portuguese_fine_tuned_mrpc","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_portuguese_fine_tuned_mrpc", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_portuguese_fine_tuned_mrpc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/erickrribeiro/bert-base-portuguese-fine-tuned-mrpc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_sayula_popoluca_theseus_bulgarian_bg.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_sayula_popoluca_theseus_bulgarian_bg.md new file mode 100644 index 00000000000000..0872b08e8ee471 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_sayula_popoluca_theseus_bulgarian_bg.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Bulgarian bert_base_sayula_popoluca_theseus_bulgarian BertForTokenClassification from rmihaylov +author: John Snow Labs +name: bert_base_sayula_popoluca_theseus_bulgarian +date: 2024-09-25 +tags: [bg, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: bg +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_sayula_popoluca_theseus_bulgarian` is a Bulgarian model originally trained by rmihaylov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_sayula_popoluca_theseus_bulgarian_bg_5.5.0_3.0_1727274959042.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_sayula_popoluca_theseus_bulgarian_bg_5.5.0_3.0_1727274959042.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("bert_base_sayula_popoluca_theseus_bulgarian","bg") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_base_sayula_popoluca_theseus_bulgarian", "bg")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
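
Each token gets its own prediction in the `ner` output column. One way to line tokens up with their predicted tags, assuming the `pipelineDF` produced above:

```python
from pyspark.sql import functions as F

# Token texts and their predicted tags; the two arrays are aligned by position.
pipelineDF.select(
    F.col("token.result").alias("tokens"),
    F.col("ner.result").alias("tags")
).show(truncate=False)
```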
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_sayula_popoluca_theseus_bulgarian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|bg| +|Size:|505.5 MB| + +## References + +https://huggingface.co/rmihaylov/bert-base-pos-theseus-bg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_meddocan_es.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_meddocan_es.md new file mode 100644 index 00000000000000..3292d3d84ea096 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_meddocan_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bert_base_spanish_wwm_cased_meddocan BertForTokenClassification from IIC +author: John Snow Labs +name: bert_base_spanish_wwm_cased_meddocan +date: 2024-09-25 +tags: [es, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_wwm_cased_meddocan` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_meddocan_es_5.5.0_3.0_1727265268046.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_meddocan_es_5.5.0_3.0_1727265268046.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("bert_base_spanish_wwm_cased_meddocan","es") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_base_spanish_wwm_cased_meddocan", "es")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_cased_meddocan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|409.6 MB| + +## References + +https://huggingface.co/IIC/bert-base-spanish-wwm-cased-meddocan \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_meddocan_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_meddocan_pipeline_es.md new file mode 100644 index 00000000000000..18582e58dba1b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_meddocan_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bert_base_spanish_wwm_cased_meddocan_pipeline pipeline BertForTokenClassification from IIC +author: John Snow Labs +name: bert_base_spanish_wwm_cased_meddocan_pipeline +date: 2024-09-25 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_wwm_cased_meddocan_pipeline` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_meddocan_pipeline_es_5.5.0_3.0_1727265289796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_meddocan_pipeline_es_5.5.0_3.0_1727265289796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_spanish_wwm_cased_meddocan_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_spanish_wwm_cased_meddocan_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
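
For quick experiments the pipeline can also be run on plain strings, without building a DataFrame, through the lightweight `annotate` method (a sketch; the Spanish sentence is only an illustrative input and the exact output keys depend on the pipeline's stage names):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_spanish_wwm_cased_meddocan_pipeline", lang = "es")
# annotate() returns a dict mapping each output column to its list of results.
result = pipeline.annotate("El paciente Juan García ingresó en el Hospital La Paz en 2020.")
print(result)
```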
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_cased_meddocan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|409.6 MB| + +## References + +https://huggingface.co/IIC/bert-base-spanish-wwm-cased-meddocan + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_socialdisner_es.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_socialdisner_es.md new file mode 100644 index 00000000000000..91e0e311e197e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_socialdisner_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bert_base_spanish_wwm_cased_socialdisner BertForTokenClassification from IIC +author: John Snow Labs +name: bert_base_spanish_wwm_cased_socialdisner +date: 2024-09-25 +tags: [es, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_wwm_cased_socialdisner` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_socialdisner_es_5.5.0_3.0_1727284156369.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_socialdisner_es_5.5.0_3.0_1727284156369.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("bert_base_spanish_wwm_cased_socialdisner","es") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_base_spanish_wwm_cased_socialdisner", "es")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_cased_socialdisner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|409.5 MB| + +## References + +https://huggingface.co/IIC/bert-base-spanish-wwm-cased-socialdisner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_socialdisner_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_socialdisner_pipeline_es.md new file mode 100644 index 00000000000000..f5943e1434c66b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_socialdisner_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bert_base_spanish_wwm_cased_socialdisner_pipeline pipeline BertForTokenClassification from IIC +author: John Snow Labs +name: bert_base_spanish_wwm_cased_socialdisner_pipeline +date: 2024-09-25 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_wwm_cased_socialdisner_pipeline` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_socialdisner_pipeline_es_5.5.0_3.0_1727284181712.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_socialdisner_pipeline_es_5.5.0_3.0_1727284181712.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_spanish_wwm_cased_socialdisner_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_spanish_wwm_cased_socialdisner_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_cased_socialdisner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|409.5 MB| + +## References + +https://huggingface.co/IIC/bert-base-spanish-wwm-cased-socialdisner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_uncased_finetuned_sayula_popoluca_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_uncased_finetuned_sayula_popoluca_pipeline_en.md new file mode 100644 index 00000000000000..ac72248b6c5bbc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_uncased_finetuned_sayula_popoluca_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_spanish_wwm_uncased_finetuned_sayula_popoluca_pipeline pipeline BertForTokenClassification from dccuchile +author: John Snow Labs +name: bert_base_spanish_wwm_uncased_finetuned_sayula_popoluca_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_wwm_uncased_finetuned_sayula_popoluca_pipeline` is a English model originally trained by dccuchile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_uncased_finetuned_sayula_popoluca_pipeline_en_5.5.0_3.0_1727271585281.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_uncased_finetuned_sayula_popoluca_pipeline_en_5.5.0_3.0_1727271585281.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_spanish_wwm_uncased_finetuned_sayula_popoluca_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_spanish_wwm_uncased_finetuned_sayula_popoluca_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_uncased_finetuned_sayula_popoluca_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.7 MB| + +## References + +https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased-finetuned-pos + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_sst_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_sst_pipeline_en.md new file mode 100644 index 00000000000000..ca1855bbc9e289 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_sst_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_sst_pipeline pipeline BertForSequenceClassification from hugmanskj +author: John Snow Labs +name: bert_base_sst_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_sst_pipeline` is a English model originally trained by hugmanskj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_sst_pipeline_en_5.5.0_3.0_1727286638169.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_sst_pipeline_en_5.5.0_3.0_1727286638169.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_sst_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_sst_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_sst_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/hugmanskj/bert-base-sst + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_temp_classifier_boot_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_temp_classifier_boot_pipeline_en.md new file mode 100644 index 00000000000000..318d08e3b1b823 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_temp_classifier_boot_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_temp_classifier_boot_pipeline pipeline BertForSequenceClassification from research-dump +author: John Snow Labs +name: bert_base_temp_classifier_boot_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_temp_classifier_boot_pipeline` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_temp_classifier_boot_pipeline_en_5.5.0_3.0_1727288190931.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_temp_classifier_boot_pipeline_en_5.5.0_3.0_1727288190931.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_temp_classifier_boot_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_temp_classifier_boot_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_temp_classifier_boot_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/research-dump/bert_base_temp_classifier_boot + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_theseus_bulgarian_bg.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_theseus_bulgarian_bg.md new file mode 100644 index 00000000000000..d45147a67b9d44 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_theseus_bulgarian_bg.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Bulgarian bert_base_theseus_bulgarian BertEmbeddings from rmihaylov +author: John Snow Labs +name: bert_base_theseus_bulgarian +date: 2024-09-25 +tags: [bg, open_source, onnx, embeddings, bert] +task: Embeddings +language: bg +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_theseus_bulgarian` is a Bulgarian model originally trained by rmihaylov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_theseus_bulgarian_bg_5.5.0_3.0_1727258333284.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_theseus_bulgarian_bg_5.5.0_3.0_1727258333284.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_theseus_bulgarian","bg") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_theseus_bulgarian","bg") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
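
The `embeddings` column holds one annotation per token, each carrying its vector. A minimal sketch for pulling the raw vectors out of the `pipelineDF` built above:

```python
from pyspark.sql import functions as F

# One row per token: the token text is in "result", its vector in "embeddings".
pipelineDF.select(F.explode("embeddings").alias("emb")) \
    .select(F.col("emb.result").alias("token"), F.col("emb.embeddings").alias("vector")) \
    .show(truncate=80)
```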
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_theseus_bulgarian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|bg| +|Size:|505.4 MB| + +## References + +https://huggingface.co/rmihaylov/bert-base-theseus-bg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_theseus_bulgarian_pipeline_bg.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_theseus_bulgarian_pipeline_bg.md new file mode 100644 index 00000000000000..63f7a4269191ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_theseus_bulgarian_pipeline_bg.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Bulgarian bert_base_theseus_bulgarian_pipeline pipeline BertEmbeddings from rmihaylov +author: John Snow Labs +name: bert_base_theseus_bulgarian_pipeline +date: 2024-09-25 +tags: [bg, open_source, pipeline, onnx] +task: Embeddings +language: bg +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_theseus_bulgarian_pipeline` is a Bulgarian model originally trained by rmihaylov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_theseus_bulgarian_pipeline_bg_5.5.0_3.0_1727258359737.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_theseus_bulgarian_pipeline_bg_5.5.0_3.0_1727258359737.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_theseus_bulgarian_pipeline", lang = "bg") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_theseus_bulgarian_pipeline", lang = "bg") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_theseus_bulgarian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bg| +|Size:|505.4 MB| + +## References + +https://huggingface.co/rmihaylov/bert-base-theseus-bg + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_turkish_cased_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_turkish_cased_finetuned_ner_en.md new file mode 100644 index 00000000000000..aee3808816a42f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_turkish_cased_finetuned_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_turkish_cased_finetuned_ner BertForTokenClassification from ugrozkr +author: John Snow Labs +name: bert_base_turkish_cased_finetuned_ner +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_turkish_cased_finetuned_ner` is a English model originally trained by ugrozkr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_turkish_cased_finetuned_ner_en_5.5.0_3.0_1727262491540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_turkish_cased_finetuned_ner_en_5.5.0_3.0_1727262491540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("bert_base_turkish_cased_finetuned_ner","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_base_turkish_cased_finetuned_ner", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_turkish_cased_finetuned_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|412.3 MB| + +## References + +https://huggingface.co/ugrozkr/bert-base-turkish-cased-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_turkish_cased_finetuned_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_turkish_cased_finetuned_ner_pipeline_en.md new file mode 100644 index 00000000000000..e15bd2dcc387f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_turkish_cased_finetuned_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_turkish_cased_finetuned_ner_pipeline pipeline BertForTokenClassification from ugrozkr +author: John Snow Labs +name: bert_base_turkish_cased_finetuned_ner_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_turkish_cased_finetuned_ner_pipeline` is a English model originally trained by ugrozkr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_turkish_cased_finetuned_ner_pipeline_en_5.5.0_3.0_1727262513423.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_turkish_cased_finetuned_ner_pipeline_en_5.5.0_3.0_1727262513423.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_turkish_cased_finetuned_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_turkish_cased_finetuned_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_turkish_cased_finetuned_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.3 MB| + +## References + +https://huggingface.co/ugrozkr/bert-base-turkish-cased-finetuned-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_tweetner7_2020_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_tweetner7_2020_pipeline_en.md new file mode 100644 index 00000000000000..541035ba2e944e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_tweetner7_2020_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_tweetner7_2020_pipeline pipeline BertForTokenClassification from tner +author: John Snow Labs +name: bert_base_tweetner7_2020_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_tweetner7_2020_pipeline` is a English model originally trained by tner. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_tweetner7_2020_pipeline_en_5.5.0_3.0_1727264846427.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_tweetner7_2020_pipeline_en_5.5.0_3.0_1727264846427.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_tweetner7_2020_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_tweetner7_2020_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_tweetner7_2020_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/tner/bert-base-tweetner7-2020 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_en.md new file mode 100644 index 00000000000000..fb9e3b4ff22450 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_1802 BertEmbeddings from JamesKim +author: John Snow Labs +name: bert_base_uncased_1802 +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_1802` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_1802_en_5.5.0_3.0_1727256384007.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_1802_en_5.5.0_3.0_1727256384007.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_1802","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_1802","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_1802| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/JamesKim/bert-base-uncased_1802 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_pipeline_en.md new file mode 100644 index 00000000000000..9a85a1e8184bb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_1802_pipeline pipeline BertEmbeddings from JamesKim +author: John Snow Labs +name: bert_base_uncased_1802_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_1802_pipeline` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_1802_pipeline_en_5.5.0_3.0_1727256405375.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_1802_pipeline_en_5.5.0_3.0_1727256405375.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_1802_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_1802_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_1802_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/JamesKim/bert-base-uncased_1802 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_r2_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_r2_en.md new file mode 100644 index 00000000000000..a2ee3d6f34a07c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_r2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_1802_r2 BertEmbeddings from JamesKim +author: John Snow Labs +name: bert_base_uncased_1802_r2 +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_1802_r2` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_1802_r2_en_5.5.0_3.0_1727236503350.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_1802_r2_en_5.5.0_3.0_1727236503350.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_1802_r2","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_1802_r2","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_1802_r2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/JamesKim/bert-base-uncased_1802_r2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_r2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_r2_pipeline_en.md new file mode 100644 index 00000000000000..68cd6277731f39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_r2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_1802_r2_pipeline pipeline BertEmbeddings from JamesKim +author: John Snow Labs +name: bert_base_uncased_1802_r2_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_1802_r2_pipeline` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_1802_r2_pipeline_en_5.5.0_3.0_1727236525159.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_1802_r2_pipeline_en_5.5.0_3.0_1727236525159.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_1802_r2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_1802_r2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_1802_r2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/JamesKim/bert-base-uncased_1802_r2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_8_50_0_01_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_8_50_0_01_en.md new file mode 100644 index 00000000000000..0b6436310aa5e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_8_50_0_01_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_8_50_0_01 BertForSequenceClassification from daisyxie21 +author: John Snow Labs +name: bert_base_uncased_8_50_0_01 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_8_50_0_01` is a English model originally trained by daisyxie21. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_8_50_0_01_en_5.5.0_3.0_1727276675095.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_8_50_0_01_en_5.5.0_3.0_1727276675095.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_8_50_0_01","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_8_50_0_01", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
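
When reproducing the Python example above, note that the classifier's `setInputCols` must reference the same column names produced by the upstream stages (`document` and `token`), and that the snippet relies on the usual Spark NLP imports. A compact, self-contained sketch (session setup, sample text, and the final `select` are illustrative):

```python
import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertForSequenceClassification

spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Input columns must match the outputs of the stages above
sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_8_50_0_01", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])

data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
result = pipeline.fit(data).transform(data)

# The predicted label is stored in the "result" field of the "class" annotation column
result.select("class.result").show(truncate=False)
```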
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_8_50_0_01| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|406.4 MB| + +## References + +https://huggingface.co/daisyxie21/bert-base-uncased-8-50-0.01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_ad_nonad_classifer_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_ad_nonad_classifer_en.md new file mode 100644 index 00000000000000..9d22e9ec8a3072 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_ad_nonad_classifer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_ad_nonad_classifer BertForSequenceClassification from Kaleemullah +author: John Snow Labs +name: bert_base_uncased_ad_nonad_classifer +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ad_nonad_classifer` is a English model originally trained by Kaleemullah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ad_nonad_classifer_en_5.5.0_3.0_1727285254090.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ad_nonad_classifer_en_5.5.0_3.0_1727285254090.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_ad_nonad_classifer","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_ad_nonad_classifer", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ad_nonad_classifer| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Kaleemullah/bert-base-uncased-ad-nonad-classifer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_airlines_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_airlines_en.md new file mode 100644 index 00000000000000..2883afbfc6ca03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_airlines_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_airlines BertForSequenceClassification from tasosk +author: John Snow Labs +name: bert_base_uncased_airlines +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_airlines` is a English model originally trained by tasosk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_airlines_en_5.5.0_3.0_1727268558796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_airlines_en_5.5.0_3.0_1727268558796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_airlines","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_airlines", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
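
For quick, single-text inference without building a DataFrame, the fitted model can be wrapped in a `LightPipeline`; a brief sketch, assuming the `pipelineModel` fitted in the Python example above (the sample sentence is illustrative):

```python
from sparknlp.base import LightPipeline

# LightPipeline runs the fitted stages on plain Python strings, off the cluster
light = LightPipeline(pipelineModel)

# Returns a dict keyed by output column, e.g. the predicted "class" label
print(light.annotate("The flight was delayed for hours but the crew was helpful."))
```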
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_airlines| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/tasosk/bert-base-uncased-airlines \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_airlines_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_airlines_pipeline_en.md new file mode 100644 index 00000000000000..3007c725f9a075 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_airlines_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_airlines_pipeline pipeline BertForSequenceClassification from tasosk +author: John Snow Labs +name: bert_base_uncased_airlines_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_airlines_pipeline` is a English model originally trained by tasosk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_airlines_pipeline_en_5.5.0_3.0_1727268581178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_airlines_pipeline_en_5.5.0_3.0_1727268581178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_airlines_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_airlines_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_airlines_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/tasosk/bert-base-uncased-airlines + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_en.md new file mode 100644 index 00000000000000..3623d7eb6391c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_alerts04142023_rsplit_2000_category1_severity BertForSequenceClassification from slewis +author: John Snow Labs +name: bert_base_uncased_alerts04142023_rsplit_2000_category1_severity +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_alerts04142023_rsplit_2000_category1_severity` is a English model originally trained by slewis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_en_5.5.0_3.0_1727287688543.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_en_5.5.0_3.0_1727287688543.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_alerts04142023_rsplit_2000_category1_severity","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_alerts04142023_rsplit_2000_category1_severity", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_alerts04142023_rsplit_2000_category1_severity| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/slewis/bert-base-uncased_alerts04142023_rsplit_2000_Category1_Severity \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_pipeline_en.md new file mode 100644 index 00000000000000..73812e4fcdb964 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_pipeline pipeline BertForSequenceClassification from slewis +author: John Snow Labs +name: bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_pipeline` is a English model originally trained by slewis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_pipeline_en_5.5.0_3.0_1727287709665.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_pipeline_en_5.5.0_3.0_1727287709665.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/slewis/bert-base-uncased_alerts04142023_rsplit_2000_Category1_Severity + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_cola_int8_indic_languages_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_cola_int8_indic_languages_en.md new file mode 100644 index 00000000000000..377818af554ad7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_cola_int8_indic_languages_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_cola_int8_indic_languages BertForSequenceClassification from Intel +author: John Snow Labs +name: bert_base_uncased_cola_int8_indic_languages +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_cola_int8_indic_languages` is a English model originally trained by Intel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_cola_int8_indic_languages_en_5.5.0_3.0_1727269236157.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_cola_int8_indic_languages_en_5.5.0_3.0_1727269236157.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_cola_int8_indic_languages","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_cola_int8_indic_languages", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_cola_int8_indic_languages| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.0 MB| + +## References + +https://huggingface.co/Intel/bert-base-uncased-CoLA-int8-inc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_crows_pairs_classifieronly_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_crows_pairs_classifieronly_en.md new file mode 100644 index 00000000000000..c0b08a9b1872d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_crows_pairs_classifieronly_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_crows_pairs_classifieronly BertForSequenceClassification from asun17904 +author: John Snow Labs +name: bert_base_uncased_crows_pairs_classifieronly +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_crows_pairs_classifieronly` is a English model originally trained by asun17904. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_crows_pairs_classifieronly_en_5.5.0_3.0_1727279543934.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_crows_pairs_classifieronly_en_5.5.0_3.0_1727279543934.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_crows_pairs_classifieronly","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_crows_pairs_classifieronly", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_crows_pairs_classifieronly| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/asun17904/bert-base-uncased_crows_pairs_classifieronly \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_dstc10_kb_title_body_validate_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_dstc10_kb_title_body_validate_pipeline_en.md new file mode 100644 index 00000000000000..9cd68482ba1f2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_dstc10_kb_title_body_validate_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_dstc10_kb_title_body_validate_pipeline pipeline BertForSequenceClassification from wilsontam +author: John Snow Labs +name: bert_base_uncased_dstc10_kb_title_body_validate_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_dstc10_kb_title_body_validate_pipeline` is a English model originally trained by wilsontam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_dstc10_kb_title_body_validate_pipeline_en_5.5.0_3.0_1727288135713.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_dstc10_kb_title_body_validate_pipeline_en_5.5.0_3.0_1727288135713.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_dstc10_kb_title_body_validate_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_dstc10_kb_title_body_validate_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_dstc10_kb_title_body_validate_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/wilsontam/bert-base-uncased-dstc10-kb-title-body-validate + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_e_care_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_e_care_en.md new file mode 100644 index 00000000000000..79155c80275a1e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_e_care_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_e_care BertForQuestionAnswering from DunnBC22 +author: John Snow Labs +name: bert_base_uncased_e_care +date: 2024-09-25 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_e_care` is a English model originally trained by DunnBC22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_e_care_en_5.5.0_3.0_1727239194784.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_e_care_en_5.5.0_3.0_1727239194784.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_e_care","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_e_care", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
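
A runnable version of the question-answering example is sketched below; note that `MultiDocumentAssembler` uses the plural setters (`setInputCols`/`setOutputCols`) and reads the question and context from DataFrame columns with those input names. The session setup and the final `select` are illustrative additions:

```python
import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import MultiDocumentAssembler
from sparknlp.annotator import BertForQuestionAnswering

spark = sparknlp.start()

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_e_care", "en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])

data = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")

result = pipeline.fit(data).transform(data)

# The extracted answer span is in the "result" field of the "answer" column
result.select("answer.result").show(truncate=False)
```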
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_e_care| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/DunnBC22/bert-base-uncased-e_CARE \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_ear_mlma_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_ear_mlma_en.md new file mode 100644 index 00000000000000..1edb8a10f0cb36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_ear_mlma_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_ear_mlma BertForSequenceClassification from MilaNLProc +author: John Snow Labs +name: bert_base_uncased_ear_mlma +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ear_mlma` is a English model originally trained by MilaNLProc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ear_mlma_en_5.5.0_3.0_1727263485878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ear_mlma_en_5.5.0_3.0_1727263485878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_ear_mlma","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_ear_mlma", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ear_mlma| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/MilaNLProc/bert-base-uncased-ear-mlma \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_emotion_ft_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_emotion_ft_en.md new file mode 100644 index 00000000000000..bc0b92d05fcdd5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_emotion_ft_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_emotion_ft BertForSequenceClassification from colingao +author: John Snow Labs +name: bert_base_uncased_emotion_ft +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_emotion_ft` is a English model originally trained by colingao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_emotion_ft_en_5.5.0_3.0_1727276584617.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_emotion_ft_en_5.5.0_3.0_1727276584617.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_emotion_ft","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_emotion_ft", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_emotion_ft| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/colingao/bert-base-uncased_emotion_ft \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_fine_tuned_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_fine_tuned_imdb_en.md new file mode 100644 index 00000000000000..31b84c43af0edc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_fine_tuned_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_fine_tuned_imdb BertForSequenceClassification from shre-db +author: John Snow Labs +name: bert_base_uncased_fine_tuned_imdb +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_fine_tuned_imdb` is a English model originally trained by shre-db. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_fine_tuned_imdb_en_5.5.0_3.0_1727261528486.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_fine_tuned_imdb_en_5.5.0_3.0_1727261528486.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_fine_tuned_imdb","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_fine_tuned_imdb", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
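
Once fitted, the pipeline can be persisted and reloaded like any Spark ML model, which avoids re-downloading the pretrained weights on every run; a brief sketch, assuming the `pipelineModel` and `data` objects from the example above (the path is illustrative):

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline, including the downloaded classifier weights
pipelineModel.write().overwrite().save("/tmp/bert_base_uncased_fine_tuned_imdb_pipeline")

# Reload it later without re-fitting or re-downloading
restored = PipelineModel.load("/tmp/bert_base_uncased_fine_tuned_imdb_pipeline")
restored.transform(data).select("class.result").show(truncate=False)
```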
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_fine_tuned_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/shre-db/Bert-Base-Uncased-Fine-Tuned-IMDB \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned2_cola_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned2_cola_en.md new file mode 100644 index 00000000000000..9c81729ddb62a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned2_cola_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned2_cola BertForSequenceClassification from ilkekas +author: John Snow Labs +name: bert_base_uncased_finetuned2_cola +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned2_cola` is a English model originally trained by ilkekas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned2_cola_en_5.5.0_3.0_1727267764170.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned2_cola_en_5.5.0_3.0_1727267764170.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned2_cola","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned2_cola", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned2_cola| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ilkekas/bert-base-uncased-finetuned2-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned2_cola_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned2_cola_pipeline_en.md new file mode 100644 index 00000000000000..df5f097b916f9a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned2_cola_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned2_cola_pipeline pipeline BertForSequenceClassification from ilkekas +author: John Snow Labs +name: bert_base_uncased_finetuned2_cola_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned2_cola_pipeline` is a English model originally trained by ilkekas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned2_cola_pipeline_en_5.5.0_3.0_1727267786213.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned2_cola_pipeline_en_5.5.0_3.0_1727267786213.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned2_cola_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned2_cola_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned2_cola_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ilkekas/bert-base-uncased-finetuned2-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_amazon_reviews_multi_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_amazon_reviews_multi_en.md new file mode 100644 index 00000000000000..63430e8195d397 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_amazon_reviews_multi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_amazon_reviews_multi BertForSequenceClassification from JoelVIU +author: John Snow Labs +name: bert_base_uncased_finetuned_amazon_reviews_multi +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_amazon_reviews_multi` is a English model originally trained by JoelVIU. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_amazon_reviews_multi_en_5.5.0_3.0_1727286280588.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_amazon_reviews_multi_en_5.5.0_3.0_1727286280588.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_amazon_reviews_multi","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_amazon_reviews_multi", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_amazon_reviews_multi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/JoelVIU/bert-base-uncased-finetuned-amazon_reviews_multi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cda_gender_neutral_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cda_gender_neutral_en.md new file mode 100644 index 00000000000000..3424bc4280c90a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cda_gender_neutral_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_cda_gender_neutral BertEmbeddings from zz990906 +author: John Snow Labs +name: bert_base_uncased_finetuned_cda_gender_neutral +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_cda_gender_neutral` is a English model originally trained by zz990906. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cda_gender_neutral_en_5.5.0_3.0_1727232569580.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cda_gender_neutral_en_5.5.0_3.0_1727232569580.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_finetuned_cda_gender_neutral","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_finetuned_cda_gender_neutral","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
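
To hand the token embeddings to downstream Spark ML stages, an `EmbeddingsFinisher` can be appended to the pipeline; a minimal sketch, assuming the `documentAssembler`, `tokenizer`, `embeddings`, and `data` objects from the Python example above (the finished column name is illustrative):

```python
from pyspark.ml import Pipeline
from sparknlp.base import EmbeddingsFinisher

# Convert Spark NLP annotation embeddings into plain Spark ML vectors
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, finisher])
result = pipeline.fit(data).transform(data)

# One vector per token; explode to inspect a few of them
result.selectExpr("explode(finished_embeddings) AS embedding_vector").show(5, truncate=False)
```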
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_cda_gender_neutral| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/zz990906/bert-base-uncased-finetuned-cda-gender-neutral \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_avb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_avb_pipeline_en.md new file mode 100644 index 00000000000000..8a174907cdd17f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_avb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_cola_avb_pipeline pipeline BertForSequenceClassification from avb +author: John Snow Labs +name: bert_base_uncased_finetuned_cola_avb_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_cola_avb_pipeline` is a English model originally trained by avb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cola_avb_pipeline_en_5.5.0_3.0_1727268577092.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cola_avb_pipeline_en_5.5.0_3.0_1727268577092.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_cola_avb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_cola_avb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_cola_avb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/avb/bert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_kaanha_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_kaanha_en.md new file mode 100644 index 00000000000000..17351122f7dcd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_kaanha_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_cola_kaanha BertForSequenceClassification from KaanHa +author: John Snow Labs +name: bert_base_uncased_finetuned_cola_kaanha +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_cola_kaanha` is a English model originally trained by KaanHa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cola_kaanha_en_5.5.0_3.0_1727287133479.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cola_kaanha_en_5.5.0_3.0_1727287133479.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_cola_kaanha","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_cola_kaanha", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_cola_kaanha| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/KaanHa/bert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_learning_rate_2e_05_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_learning_rate_2e_05_en.md new file mode 100644 index 00000000000000..987031b7184264 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_learning_rate_2e_05_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_cola_learning_rate_2e_05 BertForSequenceClassification from cansurav +author: John Snow Labs +name: bert_base_uncased_finetuned_cola_learning_rate_2e_05 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_cola_learning_rate_2e_05` is a English model originally trained by cansurav. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cola_learning_rate_2e_05_en_5.5.0_3.0_1727286389430.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cola_learning_rate_2e_05_en_5.5.0_3.0_1727286389430.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_cola_learning_rate_2e_05","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_cola_learning_rate_2e_05", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_cola_learning_rate_2e_05| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/cansurav/bert-base-uncased-finetuned-cola-learning_rate-2e-05 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_sepehrbakhshi_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_sepehrbakhshi_en.md new file mode 100644 index 00000000000000..8b02c39d0e8f81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_sepehrbakhshi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_cola_sepehrbakhshi BertForSequenceClassification from sepehrbakhshi +author: John Snow Labs +name: bert_base_uncased_finetuned_cola_sepehrbakhshi +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_cola_sepehrbakhshi` is a English model originally trained by sepehrbakhshi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cola_sepehrbakhshi_en_5.5.0_3.0_1727288324819.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cola_sepehrbakhshi_en_5.5.0_3.0_1727288324819.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_cola_sepehrbakhshi","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_cola_sepehrbakhshi", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_cola_sepehrbakhshi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/sepehrbakhshi/bert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_depression_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_depression_pipeline_en.md new file mode 100644 index 00000000000000..da445793d0c331 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_depression_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_depression_pipeline pipeline BertForSequenceClassification from welsachy +author: John Snow Labs +name: bert_base_uncased_finetuned_depression_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_depression_pipeline` is a English model originally trained by welsachy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_depression_pipeline_en_5.5.0_3.0_1727276762936.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_depression_pipeline_en_5.5.0_3.0_1727276762936.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_depression_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_depression_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
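
The snippet above assumes a DataFrame `df` with a `text` column and omits the `PretrainedPipeline` import. A self-contained sketch, with an illustrative input sentence, could look like this:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Illustrative input; any DataFrame with a "text" column works here.
df = spark.createDataFrame([["I have been feeling down for weeks"]]).toDF("text")

pipeline = PretrainedPipeline("bert_base_uncased_finetuned_depression_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.show(truncate=False)
```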
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_depression_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/welsachy/bert-base-uncased-finetuned-depression + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_detests_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_detests_en.md new file mode 100644 index 00000000000000..8941177e001a71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_detests_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_detests BertForSequenceClassification from Pablo94 +author: John Snow Labs +name: bert_base_uncased_finetuned_detests +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_detests` is a English model originally trained by Pablo94. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_detests_en_5.5.0_3.0_1727268304494.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_detests_en_5.5.0_3.0_1727268304494.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_detests","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_detests", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
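
For ad-hoc scoring of single sentences, the fitted `pipelineModel` from the example above can also be wrapped in a `LightPipeline`, which avoids building a DataFrame per request; a brief sketch:

```python
from sparknlp.base import LightPipeline

# pipelineModel is the fitted pipeline from the example above.
light = LightPipeline(pipelineModel)
print(light.annotate("I love spark-nlp"))  # dict of annotator outputs, including the "class" column
```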
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_detests| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Pablo94/bert-base-uncased-finetuned-detests \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_detests_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_detests_pipeline_en.md new file mode 100644 index 00000000000000..ae52dad3402a76 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_detests_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_detests_pipeline pipeline BertForSequenceClassification from Pablo94 +author: John Snow Labs +name: bert_base_uncased_finetuned_detests_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_detests_pipeline` is a English model originally trained by Pablo94. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_detests_pipeline_en_5.5.0_3.0_1727268327256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_detests_pipeline_en_5.5.0_3.0_1727268327256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_detests_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_detests_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_detests_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Pablo94/bert-base-uncased-finetuned-detests + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_imdb_rman_rahimi_29_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_imdb_rman_rahimi_29_en.md new file mode 100644 index 00000000000000..5441edf05f0db1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_imdb_rman_rahimi_29_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_imdb_rman_rahimi_29 BertEmbeddings from rman-rahimi-29 +author: John Snow Labs +name: bert_base_uncased_finetuned_imdb_rman_rahimi_29 +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_imdb_rman_rahimi_29` is a English model originally trained by rman-rahimi-29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_imdb_rman_rahimi_29_en_5.5.0_3.0_1727240722847.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_imdb_rman_rahimi_29_en_5.5.0_3.0_1727240722847.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_finetuned_imdb_rman_rahimi_29","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_finetuned_imdb_rman_rahimi_29","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
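
Because this model is a `BertEmbeddings` annotator, the output is one vector per token rather than a class label. A short illustrative sketch of pulling those vectors out of the `pipelineDF` produced above:

```python
# One row per token embedding (illustrative); each element is the vector for one token.
pipelineDF.selectExpr("explode(embeddings.embeddings) as token_embedding").show(3, truncate=80)
```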
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_imdb_rman_rahimi_29| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/rman-rahimi-29/bert-base-uncased-finetuned-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_learningrate_2_cola_4e_05_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_learningrate_2_cola_4e_05_pipeline_en.md new file mode 100644 index 00000000000000..06b02b517aa49c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_learningrate_2_cola_4e_05_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_learningrate_2_cola_4e_05_pipeline pipeline BertForSequenceClassification from yagmurery +author: John Snow Labs +name: bert_base_uncased_finetuned_learningrate_2_cola_4e_05_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_learningrate_2_cola_4e_05_pipeline` is a English model originally trained by yagmurery. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_learningrate_2_cola_4e_05_pipeline_en_5.5.0_3.0_1727273487652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_learningrate_2_cola_4e_05_pipeline_en_5.5.0_3.0_1727273487652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_learningrate_2_cola_4e_05_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_learningrate_2_cola_4e_05_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_learningrate_2_cola_4e_05_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yagmurery/bert-base-uncased-finetuned-learningRate-2-cola-4e-05 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_en.md new file mode 100644 index 00000000000000..75cb040e6d3b95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_mnli_max_length_256_epoch_5 BertForSequenceClassification from yy642 +author: John Snow Labs +name: bert_base_uncased_finetuned_mnli_max_length_256_epoch_5 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_mnli_max_length_256_epoch_5` is a English model originally trained by yy642. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_en_5.5.0_3.0_1727278395887.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_en_5.5.0_3.0_1727278395887.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_mnli_max_length_256_epoch_5","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_mnli_max_length_256_epoch_5", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_mnli_max_length_256_epoch_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yy642/bert-base-uncased-finetuned-mnli-max-length-256-epoch-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_pipeline_en.md new file mode 100644 index 00000000000000..260f5a0b35ddac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_pipeline pipeline BertForSequenceClassification from yy642 +author: John Snow Labs +name: bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_pipeline` is a English model originally trained by yy642. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_pipeline_en_5.5.0_3.0_1727278417026.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_pipeline_en_5.5.0_3.0_1727278417026.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yy642/bert-base-uncased-finetuned-mnli-max-length-256-epoch-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_mnli_rte_wnli_3_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_mnli_rte_wnli_3_en.md new file mode 100644 index 00000000000000..4991135ca6741a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_mnli_rte_wnli_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_mnli_rte_wnli_3 BertForSequenceClassification from yy642 +author: John Snow Labs +name: bert_base_uncased_finetuned_mnli_rte_wnli_3 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_mnli_rte_wnli_3` is a English model originally trained by yy642. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_mnli_rte_wnli_3_en_5.5.0_3.0_1727273560436.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_mnli_rte_wnli_3_en_5.5.0_3.0_1727273560436.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_mnli_rte_wnli_3","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_mnli_rte_wnli_3", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_mnli_rte_wnli_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yy642/bert-base-uncased-finetuned-mnli-rte-wnli-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_news_1929_1932_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_news_1929_1932_pipeline_en.md new file mode 100644 index 00000000000000..bf76a41e02352d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_news_1929_1932_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_news_1929_1932_pipeline pipeline BertEmbeddings from sally9805 +author: John Snow Labs +name: bert_base_uncased_finetuned_news_1929_1932_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_news_1929_1932_pipeline` is a English model originally trained by sally9805. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_news_1929_1932_pipeline_en_5.5.0_3.0_1727254945892.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_news_1929_1932_pipeline_en_5.5.0_3.0_1727254945892.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_news_1929_1932_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_news_1929_1932_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_news_1929_1932_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sally9805/bert-base-uncased-finetuned-news-1929-1932 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_poli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_poli_pipeline_en.md new file mode 100644 index 00000000000000..7758e327705ddd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_poli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_poli_pipeline pipeline BertForSequenceClassification from lmajer +author: John Snow Labs +name: bert_base_uncased_finetuned_poli_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_poli_pipeline` is a English model originally trained by lmajer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_poli_pipeline_en_5.5.0_3.0_1727284948735.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_poli_pipeline_en_5.5.0_3.0_1727284948735.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_poli_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_poli_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_poli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/lmajer/bert-base-uncased-finetuned-POLI + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_rte_max_length_512_epoch_10_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_rte_max_length_512_epoch_10_en.md new file mode 100644 index 00000000000000..5c4737ab074f66 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_rte_max_length_512_epoch_10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_rte_max_length_512_epoch_10 BertForSequenceClassification from yy642 +author: John Snow Labs +name: bert_base_uncased_finetuned_rte_max_length_512_epoch_10 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_rte_max_length_512_epoch_10` is a English model originally trained by yy642. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_rte_max_length_512_epoch_10_en_5.5.0_3.0_1727272894453.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_rte_max_length_512_epoch_10_en_5.5.0_3.0_1727272894453.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_rte_max_length_512_epoch_10","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_rte_max_length_512_epoch_10", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_rte_max_length_512_epoch_10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yy642/bert-base-uncased-finetuned-rte-max-length-512-epoch-10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_rte_max_length_512_epoch_10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_rte_max_length_512_epoch_10_pipeline_en.md new file mode 100644 index 00000000000000..e2a116b1f50d8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_rte_max_length_512_epoch_10_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_rte_max_length_512_epoch_10_pipeline pipeline BertForSequenceClassification from yy642 +author: John Snow Labs +name: bert_base_uncased_finetuned_rte_max_length_512_epoch_10_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_rte_max_length_512_epoch_10_pipeline` is a English model originally trained by yy642. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_rte_max_length_512_epoch_10_pipeline_en_5.5.0_3.0_1727272916326.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_rte_max_length_512_epoch_10_pipeline_en_5.5.0_3.0_1727272916326.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_rte_max_length_512_epoch_10_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_rte_max_length_512_epoch_10_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_rte_max_length_512_epoch_10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yy642/bert-base-uncased-finetuned-rte-max-length-512-epoch-10 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_rte_max_length_512_epoch_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_rte_max_length_512_epoch_5_pipeline_en.md new file mode 100644 index 00000000000000..317acb9f7ea01d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_rte_max_length_512_epoch_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_rte_max_length_512_epoch_5_pipeline pipeline BertForSequenceClassification from yy642 +author: John Snow Labs +name: bert_base_uncased_finetuned_rte_max_length_512_epoch_5_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_rte_max_length_512_epoch_5_pipeline` is a English model originally trained by yy642. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_rte_max_length_512_epoch_5_pipeline_en_5.5.0_3.0_1727286811073.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_rte_max_length_512_epoch_5_pipeline_en_5.5.0_3.0_1727286811073.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_rte_max_length_512_epoch_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_rte_max_length_512_epoch_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_rte_max_length_512_epoch_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yy642/bert-base-uncased-finetuned-rte-max-length-512-epoch-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_stationary_epoch_update_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_stationary_epoch_update_en.md new file mode 100644 index 00000000000000..ff552f2769a0a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_stationary_epoch_update_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_stationary_epoch_update BertForSequenceClassification from MKS3099 +author: John Snow Labs +name: bert_base_uncased_finetuned_stationary_epoch_update +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_stationary_epoch_update` is a English model originally trained by MKS3099. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_stationary_epoch_update_en_5.5.0_3.0_1727269239360.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_stationary_epoch_update_en_5.5.0_3.0_1727269239360.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_stationary_epoch_update","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_stationary_epoch_update", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
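
Once fitted, the pipeline above is a regular Spark ML `PipelineModel`, so it can be persisted and reloaded for serving through the standard Spark ML API; a brief sketch (the save path is illustrative):

```python
from pyspark.ml import PipelineModel

# pipelineModel and data come from the example above; the save path is an example only.
pipelineModel.write().overwrite().save("/tmp/bert_base_uncased_finetuned_stationary_pipeline")
reloaded = PipelineModel.load("/tmp/bert_base_uncased_finetuned_stationary_pipeline")
reloaded.transform(data).select("class.result").show(truncate=False)
```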
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_stationary_epoch_update| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/MKS3099/bert-base-uncased-finetuned-stationary-epoch-update \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_toxic_comment_detection_ws23_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_toxic_comment_detection_ws23_pipeline_en.md new file mode 100644 index 00000000000000..7890190b921ad4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_toxic_comment_detection_ws23_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_toxic_comment_detection_ws23_pipeline pipeline BertForSequenceClassification from tillschwoerer +author: John Snow Labs +name: bert_base_uncased_finetuned_toxic_comment_detection_ws23_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_toxic_comment_detection_ws23_pipeline` is a English model originally trained by tillschwoerer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_toxic_comment_detection_ws23_pipeline_en_5.5.0_3.0_1727261098857.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_toxic_comment_detection_ws23_pipeline_en_5.5.0_3.0_1727261098857.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_toxic_comment_detection_ws23_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_toxic_comment_detection_ws23_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_toxic_comment_detection_ws23_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/tillschwoerer/bert-base-uncased-finetuned-toxic-comment-detection-ws23 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_glue_cola_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_glue_cola_pipeline_en.md new file mode 100644 index 00000000000000..ccde384f823bad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_glue_cola_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_glue_cola_pipeline pipeline BertForSequenceClassification from pmthangk09 +author: John Snow Labs +name: bert_base_uncased_glue_cola_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_glue_cola_pipeline` is a English model originally trained by pmthangk09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_glue_cola_pipeline_en_5.5.0_3.0_1727266397271.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_glue_cola_pipeline_en_5.5.0_3.0_1727266397271.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_glue_cola_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_glue_cola_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
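
Besides `transform()` on a DataFrame, a `PretrainedPipeline` can also score raw strings directly through `annotate()`, which is convenient for quick checks; a minimal sketch, assuming Spark NLP is installed locally:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("bert_base_uncased_glue_cola_pipeline", lang="en")
print(pipeline.annotate("I love spark-nlp"))  # dict keyed by output column, e.g. document, token, class
```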
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_glue_cola_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/pmthangk09/bert-base-uncased-glue-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_goemotions_original_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_goemotions_original_finetuned_en.md new file mode 100644 index 00000000000000..49507c9e1f085e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_goemotions_original_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_goemotions_original_finetuned BertForSequenceClassification from justin871030 +author: John Snow Labs +name: bert_base_uncased_goemotions_original_finetuned +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_goemotions_original_finetuned` is a English model originally trained by justin871030. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_goemotions_original_finetuned_en_5.5.0_3.0_1727256792152.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_goemotions_original_finetuned_en_5.5.0_3.0_1727256792152.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_goemotions_original_finetuned","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_goemotions_original_finetuned", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_goemotions_original_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/justin871030/bert-base-uncased-goemotions-original-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_imdb_yujiepan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_imdb_yujiepan_pipeline_en.md new file mode 100644 index 00000000000000..ea96d06150fbb1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_imdb_yujiepan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_imdb_yujiepan_pipeline pipeline BertForSequenceClassification from yujiepan +author: John Snow Labs +name: bert_base_uncased_imdb_yujiepan_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_imdb_yujiepan_pipeline` is a English model originally trained by yujiepan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_imdb_yujiepan_pipeline_en_5.5.0_3.0_1727273456187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_imdb_yujiepan_pipeline_en_5.5.0_3.0_1727273456187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_imdb_yujiepan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_imdb_yujiepan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_imdb_yujiepan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yujiepan/bert-base-uncased-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_hndc_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_hndc_en.md new file mode 100644 index 00000000000000..720a5ab9417a8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_hndc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_issues_128_hndc BertEmbeddings from hndc +author: John Snow Labs +name: bert_base_uncased_issues_128_hndc +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_issues_128_hndc` is a English model originally trained by hndc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_hndc_en_5.5.0_3.0_1727241058797.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_hndc_en_5.5.0_3.0_1727241058797.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_hndc","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_hndc","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
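The token-level vectors land in the `embeddings` column created above. A minimal sketch of inspecting them, assuming the column names from the example:

```python
# Each annotation in `embeddings` carries the token text in `result`
# and its vector in the `embeddings` field
pipelineDF.selectExpr("explode(embeddings) as emb") \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(truncate=80)
```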
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_issues_128_hndc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/hndc/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_hndc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_hndc_pipeline_en.md new file mode 100644 index 00000000000000..d2604dbeda17ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_hndc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_issues_128_hndc_pipeline pipeline BertEmbeddings from hndc +author: John Snow Labs +name: bert_base_uncased_issues_128_hndc_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_issues_128_hndc_pipeline` is a English model originally trained by hndc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_hndc_pipeline_en_5.5.0_3.0_1727241079853.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_hndc_pipeline_en_5.5.0_3.0_1727241079853.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_base_uncased_issues_128_hndc_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_base_uncased_issues_128_hndc_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_issues_128_hndc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/hndc/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_makaniski_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_makaniski_en.md new file mode 100644 index 00000000000000..bde06600b17617 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_makaniski_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_issues_128_makaniski BertEmbeddings from makaniski +author: John Snow Labs +name: bert_base_uncased_issues_128_makaniski +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_issues_128_makaniski` is a English model originally trained by makaniski. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_makaniski_en_5.5.0_3.0_1727256250380.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_makaniski_en_5.5.0_3.0_1727256250380.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_makaniski","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_makaniski","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_issues_128_makaniski| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/makaniski/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_pensuke_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_pensuke_en.md new file mode 100644 index 00000000000000..adc7119511c955 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_pensuke_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_issues_128_pensuke BertEmbeddings from pensuke +author: John Snow Labs +name: bert_base_uncased_issues_128_pensuke +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_issues_128_pensuke` is a English model originally trained by pensuke. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_pensuke_en_5.5.0_3.0_1727258538352.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_pensuke_en_5.5.0_3.0_1727258538352.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_pensuke","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_pensuke","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_issues_128_pensuke| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/pensuke/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_robinsh2023_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_robinsh2023_en.md new file mode 100644 index 00000000000000..a9fdc7f3dcdf3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_robinsh2023_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_issues_128_robinsh2023 BertEmbeddings from Robinsh2023 +author: John Snow Labs +name: bert_base_uncased_issues_128_robinsh2023 +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_issues_128_robinsh2023` is a English model originally trained by Robinsh2023. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_robinsh2023_en_5.5.0_3.0_1727236979136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_robinsh2023_en_5.5.0_3.0_1727236979136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_robinsh2023","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_robinsh2023","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_issues_128_robinsh2023| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/Robinsh2023/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_robinsh2023_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_robinsh2023_pipeline_en.md new file mode 100644 index 00000000000000..bed36dc5a81c19 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_robinsh2023_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_issues_128_robinsh2023_pipeline pipeline BertEmbeddings from Robinsh2023 +author: John Snow Labs +name: bert_base_uncased_issues_128_robinsh2023_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_issues_128_robinsh2023_pipeline` is a English model originally trained by Robinsh2023. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_robinsh2023_pipeline_en_5.5.0_3.0_1727237000542.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_robinsh2023_pipeline_en_5.5.0_3.0_1727237000542.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_base_uncased_issues_128_robinsh2023_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_base_uncased_issues_128_robinsh2023_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_issues_128_robinsh2023_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Robinsh2023/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_seddiktrk_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_seddiktrk_en.md new file mode 100644 index 00000000000000..b92dd63079e977 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_seddiktrk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_issues_128_seddiktrk BertEmbeddings from seddiktrk +author: John Snow Labs +name: bert_base_uncased_issues_128_seddiktrk +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_issues_128_seddiktrk` is a English model originally trained by seddiktrk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_seddiktrk_en_5.5.0_3.0_1727231353065.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_seddiktrk_en_5.5.0_3.0_1727231353065.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_seddiktrk","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_seddiktrk","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_issues_128_seddiktrk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/seddiktrk/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_seddiktrk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_seddiktrk_pipeline_en.md new file mode 100644 index 00000000000000..e58b1d21370cd6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_seddiktrk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_issues_128_seddiktrk_pipeline pipeline BertEmbeddings from seddiktrk +author: John Snow Labs +name: bert_base_uncased_issues_128_seddiktrk_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_issues_128_seddiktrk_pipeline` is a English model originally trained by seddiktrk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_seddiktrk_pipeline_en_5.5.0_3.0_1727231374198.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_seddiktrk_pipeline_en_5.5.0_3.0_1727231374198.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_base_uncased_issues_128_seddiktrk_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_base_uncased_issues_128_seddiktrk_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_issues_128_seddiktrk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/seddiktrk/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_job_bias_seq_cls_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_job_bias_seq_cls_en.md new file mode 100644 index 00000000000000..3ccf905358ccaf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_job_bias_seq_cls_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_job_bias_seq_cls BertForSequenceClassification from 2024-mcm-everitt-ryan +author: John Snow Labs +name: bert_base_uncased_job_bias_seq_cls +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_job_bias_seq_cls` is a English model originally trained by 2024-mcm-everitt-ryan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_job_bias_seq_cls_en_5.5.0_3.0_1727269409641.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_job_bias_seq_cls_en_5.5.0_3.0_1727269409641.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_job_bias_seq_cls","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_job_bias_seq_cls", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
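The predicted label for each input row can be read back from the `class` column produced above; a small sketch, assuming the column names from the example:

```python
# `class.result` is an array holding the predicted label for each document
pipelineDF.select("text", "class.result").show(truncate=False)
```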
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_job_bias_seq_cls| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/2024-mcm-everitt-ryan/bert-base-uncased-job-bias-seq-cls \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kaggle_twitter_small_finetuned_clf_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kaggle_twitter_small_finetuned_clf_en.md new file mode 100644 index 00000000000000..5f0a827e4ff165 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kaggle_twitter_small_finetuned_clf_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_kaggle_twitter_small_finetuned_clf BertForSequenceClassification from zloelias +author: John Snow Labs +name: bert_base_uncased_kaggle_twitter_small_finetuned_clf +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_kaggle_twitter_small_finetuned_clf` is a English model originally trained by zloelias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_kaggle_twitter_small_finetuned_clf_en_5.5.0_3.0_1727272761007.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_kaggle_twitter_small_finetuned_clf_en_5.5.0_3.0_1727272761007.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_kaggle_twitter_small_finetuned_clf","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_kaggle_twitter_small_finetuned_clf", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_kaggle_twitter_small_finetuned_clf| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/zloelias/bert-base-uncased-kaggle_twitter_small-finetuned-clf \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kaggle_twitter_small_finetuned_clf_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kaggle_twitter_small_finetuned_clf_pipeline_en.md new file mode 100644 index 00000000000000..3d95797c3b0442 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kaggle_twitter_small_finetuned_clf_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_kaggle_twitter_small_finetuned_clf_pipeline pipeline BertForSequenceClassification from zloelias +author: John Snow Labs +name: bert_base_uncased_kaggle_twitter_small_finetuned_clf_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_kaggle_twitter_small_finetuned_clf_pipeline` is a English model originally trained by zloelias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_kaggle_twitter_small_finetuned_clf_pipeline_en_5.5.0_3.0_1727272783099.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_kaggle_twitter_small_finetuned_clf_pipeline_en_5.5.0_3.0_1727272783099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_base_uncased_kaggle_twitter_small_finetuned_clf_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_base_uncased_kaggle_twitter_small_finetuned_clf_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
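For quick experiments on a raw string rather than a DataFrame, the same pretrained pipeline can also be run through `annotate`. A minimal sketch; the `"class"` key assumes the classifier stage keeps its default output column name:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_uncased_kaggle_twitter_small_finetuned_clf_pipeline", lang = "en")

# annotate() runs the pipeline on a single string and returns a dict of output columns
prediction = pipeline.annotate("I love spark-nlp")
print(prediction["class"])
```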
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_kaggle_twitter_small_finetuned_clf_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/zloelias/bert-base-uncased-kaggle_twitter_small-finetuned-clf + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kinyarwanda_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kinyarwanda_finetuned_en.md new file mode 100644 index 00000000000000..b723f286d24036 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kinyarwanda_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_kinyarwanda_finetuned BertEmbeddings from RogerB +author: John Snow Labs +name: bert_base_uncased_kinyarwanda_finetuned +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_kinyarwanda_finetuned` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_kinyarwanda_finetuned_en_5.5.0_3.0_1727242863679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_kinyarwanda_finetuned_en_5.5.0_3.0_1727242863679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("bert_base_uncased_kinyarwanda_finetuned","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("bert_base_uncased_kinyarwanda_finetuned","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_kinyarwanda_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/RogerB/bert-base-uncased-kinyarwanda-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kinyarwanda_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kinyarwanda_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..848d790829ea6a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kinyarwanda_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_kinyarwanda_finetuned_pipeline pipeline BertEmbeddings from RogerB +author: John Snow Labs +name: bert_base_uncased_kinyarwanda_finetuned_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_kinyarwanda_finetuned_pipeline` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_kinyarwanda_finetuned_pipeline_en_5.5.0_3.0_1727242885172.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_kinyarwanda_finetuned_pipeline_en_5.5.0_3.0_1727242885172.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_base_uncased_kinyarwanda_finetuned_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_base_uncased_kinyarwanda_finetuned_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_kinyarwanda_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/RogerB/bert-base-uncased-kinyarwanda-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_malayalam_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_malayalam_pipeline_en.md new file mode 100644 index 00000000000000..f20b43bd5a6b19 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_malayalam_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_malayalam_pipeline pipeline BertEmbeddings from Tural +author: John Snow Labs +name: bert_base_uncased_malayalam_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_malayalam_pipeline` is a English model originally trained by Tural. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_malayalam_pipeline_en_5.5.0_3.0_1727232998365.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_malayalam_pipeline_en_5.5.0_3.0_1727232998365.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_base_uncased_malayalam_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_base_uncased_malayalam_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_malayalam_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.9 MB| + +## References + +https://huggingface.co/Tural/bert-base-uncased-ml + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_mlp_scirepeval_chemistry_large_textcls_rheology_20230913_3_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_mlp_scirepeval_chemistry_large_textcls_rheology_20230913_3_en.md new file mode 100644 index 00000000000000..e0126974b98a37 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_mlp_scirepeval_chemistry_large_textcls_rheology_20230913_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_mlp_scirepeval_chemistry_large_textcls_rheology_20230913_3 BertForSequenceClassification from jonas-luehrs +author: John Snow Labs +name: bert_base_uncased_mlp_scirepeval_chemistry_large_textcls_rheology_20230913_3 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_mlp_scirepeval_chemistry_large_textcls_rheology_20230913_3` is a English model originally trained by jonas-luehrs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_mlp_scirepeval_chemistry_large_textcls_rheology_20230913_3_en_5.5.0_3.0_1727263801059.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_mlp_scirepeval_chemistry_large_textcls_rheology_20230913_3_en_5.5.0_3.0_1727263801059.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_mlp_scirepeval_chemistry_large_textcls_rheology_20230913_3","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_mlp_scirepeval_chemistry_large_textcls_rheology_20230913_3", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_mlp_scirepeval_chemistry_large_textcls_rheology_20230913_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/jonas-luehrs/bert-base-uncased-MLP-scirepeval-chemistry-LARGE-textCLS-RHEOLOGY-20230913-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_qa_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_qa_classification_pipeline_en.md new file mode 100644 index 00000000000000..fcf0d049084348 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_qa_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_qa_classification_pipeline pipeline BertForSequenceClassification from kgourgou +author: John Snow Labs +name: bert_base_uncased_qa_classification_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_qa_classification_pipeline` is a English model originally trained by kgourgou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qa_classification_pipeline_en_5.5.0_3.0_1727285908570.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qa_classification_pipeline_en_5.5.0_3.0_1727285908570.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_base_uncased_qa_classification_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_base_uncased_qa_classification_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_qa_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/kgourgou/bert-base-uncased-QA-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_qnli_howey_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_qnli_howey_pipeline_en.md new file mode 100644 index 00000000000000..d7f225a1bb3758 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_qnli_howey_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_qnli_howey_pipeline pipeline BertForSequenceClassification from howey +author: John Snow Labs +name: bert_base_uncased_qnli_howey_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_qnli_howey_pipeline` is a English model originally trained by howey. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qnli_howey_pipeline_en_5.5.0_3.0_1727269957256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qnli_howey_pipeline_en_5.5.0_3.0_1727269957256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_base_uncased_qnli_howey_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_base_uncased_qnli_howey_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_qnli_howey_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/howey/bert-base-uncased-qnli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_review1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_review1_pipeline_en.md new file mode 100644 index 00000000000000..2f74c9fbb8a8c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_review1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_review1_pipeline pipeline BertForSequenceClassification from Iresh88 +author: John Snow Labs +name: bert_base_uncased_review1_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_review1_pipeline` is a English model originally trained by Iresh88. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_review1_pipeline_en_5.5.0_3.0_1727267682494.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_review1_pipeline_en_5.5.0_3.0_1727267682494.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_base_uncased_review1_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_base_uncased_review1_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_review1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Iresh88/bert-base-uncased-review1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_rte_from_bert_large_uncased_rte_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_rte_from_bert_large_uncased_rte_en.md new file mode 100644 index 00000000000000..19a245f1687c28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_rte_from_bert_large_uncased_rte_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_rte_from_bert_large_uncased_rte BertForSequenceClassification from yoshitomo-matsubara +author: John Snow Labs +name: bert_base_uncased_rte_from_bert_large_uncased_rte +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_rte_from_bert_large_uncased_rte` is a English model originally trained by yoshitomo-matsubara. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_rte_from_bert_large_uncased_rte_en_5.5.0_3.0_1727269973506.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_rte_from_bert_large_uncased_rte_en_5.5.0_3.0_1727269973506.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_rte_from_bert_large_uncased_rte","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_rte_from_bert_large_uncased_rte", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_rte_from_bert_large_uncased_rte| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yoshitomo-matsubara/bert-base-uncased-rte_from_bert-large-uncased-rte \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_rte_from_bert_large_uncased_rte_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_rte_from_bert_large_uncased_rte_pipeline_en.md new file mode 100644 index 00000000000000..5eb55ca6b48ddf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_rte_from_bert_large_uncased_rte_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_rte_from_bert_large_uncased_rte_pipeline pipeline BertForSequenceClassification from yoshitomo-matsubara +author: John Snow Labs +name: bert_base_uncased_rte_from_bert_large_uncased_rte_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_rte_from_bert_large_uncased_rte_pipeline` is a English model originally trained by yoshitomo-matsubara. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_rte_from_bert_large_uncased_rte_pipeline_en_5.5.0_3.0_1727269994734.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_rte_from_bert_large_uncased_rte_pipeline_en_5.5.0_3.0_1727269994734.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_base_uncased_rte_from_bert_large_uncased_rte_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_base_uncased_rte_from_bert_large_uncased_rte_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_rte_from_bert_large_uncased_rte_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yoshitomo-matsubara/bert-base-uncased-rte_from_bert-large-uncased-rte + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_sst_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_sst_en.md new file mode 100644 index 00000000000000..0fd14d26195a79 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_sst_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_sst BertForSequenceClassification from pmthangk09 +author: John Snow Labs +name: bert_base_uncased_sst +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_sst` is a English model originally trained by pmthangk09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_sst_en_5.5.0_3.0_1727278143092.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_sst_en_5.5.0_3.0_1727278143092.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_sst","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_sst", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
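
Once the pipeline has run, the predicted label sits in the `result` field of the `class` annotation column. A minimal way to surface it, continuing from the `pipelineDF` DataFrame built in the Python example above:

```python
# Show the label predicted for each input row alongside the original text
pipelineDF.select("text", "class.result").show(truncate=False)
```
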
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_sst| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/pmthangk09/bert-base-uncased-sst \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_tajik_ner_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_tajik_ner_en.md new file mode 100644 index 00000000000000..baf373d81dffbc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_tajik_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_tajik_ner BertForTokenClassification from muhtasham +author: John Snow Labs +name: bert_base_uncased_tajik_ner +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_tajik_ner` is a English model originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_tajik_ner_en_5.5.0_3.0_1727260762157.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_tajik_ner_en_5.5.0_3.0_1727260762157.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncased_tajik_ner","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncased_tajik_ner", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
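
To read the output at token level, the `token` and `ner` columns can be collected side by side; each annotation carries its prediction in the `result` field. A small sketch continuing from `pipelineDF` above:

```python
# Pair each token with its predicted tag for the first row
row = pipelineDF.select("token.result", "ner.result").first()
for token, tag in zip(row[0], row[1]):
    print(f"{token}\t{tag}")
```
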
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_tajik_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/muhtasham/bert-base-uncased-tajik-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_tajik_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_tajik_ner_pipeline_en.md new file mode 100644 index 00000000000000..aadd53fcdc5617 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_tajik_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_tajik_ner_pipeline pipeline BertForTokenClassification from muhtasham +author: John Snow Labs +name: bert_base_uncased_tajik_ner_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_tajik_ner_pipeline` is a English model originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_tajik_ner_pipeline_en_5.5.0_3.0_1727260783318.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_tajik_ner_pipeline_en_5.5.0_3.0_1727260783318.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_tajik_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_tajik_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_tajik_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/muhtasham/bert-base-uncased-tajik-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_token_itr0_0_0001_train_all_test_null__second_train_set_null_false_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_token_itr0_0_0001_train_all_test_null__second_train_set_null_false_en.md new file mode 100644 index 00000000000000..7d08dd032c4e75 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_token_itr0_0_0001_train_all_test_null__second_train_set_null_false_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_token_itr0_0_0001_train_all_test_null__second_train_set_null_false BertForTokenClassification from ali2066 +author: John Snow Labs +name: bert_base_uncased_token_itr0_0_0001_train_all_test_null__second_train_set_null_false +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_token_itr0_0_0001_train_all_test_null__second_train_set_null_false` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_token_itr0_0_0001_train_all_test_null__second_train_set_null_false_en_5.5.0_3.0_1727260588150.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_token_itr0_0_0001_train_all_test_null__second_train_set_null_false_en_5.5.0_3.0_1727260588150.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncased_token_itr0_0_0001_train_all_test_null__second_train_set_null_false","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncased_token_itr0_0_0001_train_all_test_null__second_train_set_null_false", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_token_itr0_0_0001_train_all_test_null__second_train_set_null_false| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ali2066/bert-base-uncased_token_itr0_0.0001_TRAIN_all_TEST_null__second_train_set_NULL_False \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_toxicity_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_toxicity_en.md new file mode 100644 index 00000000000000..7376f426de5fa3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_toxicity_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_toxicity BertForSequenceClassification from mohsenfayyaz +author: John Snow Labs +name: bert_base_uncased_toxicity +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_toxicity` is a English model originally trained by mohsenfayyaz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_toxicity_en_5.5.0_3.0_1727269890788.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_toxicity_en_5.5.0_3.0_1727269890788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_toxicity","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_toxicity", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
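
Besides the predicted label in `class.result`, each classification annotation carries a `metadata` map that usually includes per-label confidence scores; the exact keys depend on how the model was exported, so treat this as a sketch. Continuing from `pipelineDF` above:

```python
# Inspect the predicted label together with its metadata map
pipelineDF.select("class.result", "class.metadata").show(truncate=False)
```
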
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_toxicity| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/mohsenfayyaz/bert-base-uncased-toxicity \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_vietnamese_pipeline_vi.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_vietnamese_pipeline_vi.md new file mode 100644 index 00000000000000..5fbb84c256ba81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_vietnamese_pipeline_vi.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Vietnamese bert_base_vietnamese_pipeline pipeline BertForSequenceClassification from ndbao2002 +author: John Snow Labs +name: bert_base_vietnamese_pipeline +date: 2024-09-25 +tags: [vi, open_source, pipeline, onnx] +task: Text Classification +language: vi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_vietnamese_pipeline` is a Vietnamese model originally trained by ndbao2002. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_vietnamese_pipeline_vi_5.5.0_3.0_1727278741439.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_vietnamese_pipeline_vi_5.5.0_3.0_1727278741439.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_vietnamese_pipeline", lang = "vi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_vietnamese_pipeline", lang = "vi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_vietnamese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|vi| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ndbao2002/bert-base-vi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_vk_posts_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_vk_posts_en.md new file mode 100644 index 00000000000000..a12644fe0dbdb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_vk_posts_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_vk_posts BertEmbeddings from serggor +author: John Snow Labs +name: bert_base_vk_posts +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_vk_posts` is a English model originally trained by serggor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_vk_posts_en_5.5.0_3.0_1727256498749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_vk_posts_en_5.5.0_3.0_1727256498749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_vk_posts","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_vk_posts","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
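
Each token annotation produced by `BertEmbeddings` stores its vector in the `embeddings` field; for a BERT base model these are typically 768-dimensional. A quick way to confirm the shape, continuing from `pipelineDF` in the example above:

```python
from pyspark.sql import functions as F

# One row per token, with the token text and the length of its embedding vector
pipelineDF.select(F.explode("embeddings").alias("emb")) \
    .select(F.col("emb.result").alias("token"), F.size("emb.embeddings").alias("dimensions")) \
    .show(truncate=False)
```
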
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_vk_posts| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/serggor/bert-base-vk-posts \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_vk_posts_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_vk_posts_pipeline_en.md new file mode 100644 index 00000000000000..ad3029946052f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_vk_posts_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_vk_posts_pipeline pipeline BertEmbeddings from serggor +author: John Snow Labs +name: bert_base_vk_posts_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_vk_posts_pipeline` is a English model originally trained by serggor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_vk_posts_pipeline_en_5.5.0_3.0_1727256519859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_vk_posts_pipeline_en_5.5.0_3.0_1727256519859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_vk_posts_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_vk_posts_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_vk_posts_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/serggor/bert-base-vk-posts + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_baseline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_baseline_en.md new file mode 100644 index 00000000000000..b91db117ec9834 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_baseline_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_baseline BertForSequenceClassification from florentgbelidji +author: John Snow Labs +name: bert_baseline +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_baseline` is a English model originally trained by florentgbelidji. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_baseline_en_5.5.0_3.0_1727278504573.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_baseline_en_5.5.0_3.0_1727278504573.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_baseline","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_baseline", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
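
Since fitting the pipeline triggers the pretrained model download, it is often worth persisting the fitted `PipelineModel` and reloading it later instead of refitting. A minimal sketch continuing from the example above; the output path is purely illustrative:

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline to disk (any local, HDFS, or S3 path works)
pipelineModel.write().overwrite().save("/tmp/bert_baseline_pipeline")

# Reload it in a later session and transform new data without refitting
restored = PipelineModel.load("/tmp/bert_baseline_pipeline")
restored.transform(data).select("class.result").show(truncate=False)
```
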
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_baseline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/florentgbelidji/BERT_baseline \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_classifier_tuned_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_classifier_tuned_en.md new file mode 100644 index 00000000000000..8d053083ee44ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_classifier_tuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_classifier_tuned BertForSequenceClassification from omgavy +author: John Snow Labs +name: bert_classifier_tuned +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_classifier_tuned` is a English model originally trained by omgavy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_classifier_tuned_en_5.5.0_3.0_1727267541449.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_classifier_tuned_en_5.5.0_3.0_1727267541449.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_classifier_tuned","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_classifier_tuned", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
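
For single sentences or small batches driven from the driver, Spark NLP's `LightPipeline` wrapper avoids the overhead of a distributed transform. A small sketch that reuses the fitted `pipelineModel` from the example above; the input texts are illustrative only:

```python
from sparknlp.base import LightPipeline

# LightPipeline runs the same stages in memory on the driver
light = LightPipeline(pipelineModel)
print(light.annotate("I love spark-nlp"))                    # dict of output columns for one text
print(light.annotate(["Great movie!", "Terrible plot."]))    # list of dicts for a batch
```
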
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_classifier_tuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/omgavy/bert-classifier-tuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_classifier_turkish_sentiment_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_classifier_turkish_sentiment_en.md new file mode 100644 index 00000000000000..906731dfb03d32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_classifier_turkish_sentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_classifier_turkish_sentiment BertForSequenceClassification from sunor +author: John Snow Labs +name: bert_classifier_turkish_sentiment +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_classifier_turkish_sentiment` is a English model originally trained by sunor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_classifier_turkish_sentiment_en_5.5.0_3.0_1727263454242.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_classifier_turkish_sentiment_en_5.5.0_3.0_1727263454242.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_classifier_turkish_sentiment","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_classifier_turkish_sentiment", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_classifier_turkish_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|414.5 MB| + +## References + +https://huggingface.co/sunor/bert-classifier-turkish-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_classifier_turkish_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_classifier_turkish_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..0cc764fc570a07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_classifier_turkish_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_classifier_turkish_sentiment_pipeline pipeline BertForSequenceClassification from sunor +author: John Snow Labs +name: bert_classifier_turkish_sentiment_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_classifier_turkish_sentiment_pipeline` is a English model originally trained by sunor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_classifier_turkish_sentiment_pipeline_en_5.5.0_3.0_1727263478776.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_classifier_turkish_sentiment_pipeline_en_5.5.0_3.0_1727263478776.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_classifier_turkish_sentiment_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_classifier_turkish_sentiment_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_classifier_turkish_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.5 MB| + +## References + +https://huggingface.co/sunor/bert-classifier-turkish-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_cn_finetuning_wangyuwei_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_cn_finetuning_wangyuwei_pipeline_en.md new file mode 100644 index 00000000000000..b726b838b12ec2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_cn_finetuning_wangyuwei_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_cn_finetuning_wangyuwei_pipeline pipeline BertForSequenceClassification from wangyuwei +author: John Snow Labs +name: bert_cn_finetuning_wangyuwei_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_cn_finetuning_wangyuwei_pipeline` is a English model originally trained by wangyuwei. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_cn_finetuning_wangyuwei_pipeline_en_5.5.0_3.0_1727288868856.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_cn_finetuning_wangyuwei_pipeline_en_5.5.0_3.0_1727288868856.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_cn_finetuning_wangyuwei_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_cn_finetuning_wangyuwei_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_cn_finetuning_wangyuwei_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|383.3 MB| + +## References + +https://huggingface.co/wangyuwei/bert_cn_finetuning + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_election2020_twitter_stance_biden_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_election2020_twitter_stance_biden_en.md new file mode 100644 index 00000000000000..786113fa5ac6c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_election2020_twitter_stance_biden_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_election2020_twitter_stance_biden BertForSequenceClassification from kornosk +author: John Snow Labs +name: bert_election2020_twitter_stance_biden +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_election2020_twitter_stance_biden` is a English model originally trained by kornosk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_election2020_twitter_stance_biden_en_5.5.0_3.0_1727239460016.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_election2020_twitter_stance_biden_en_5.5.0_3.0_1727239460016.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_election2020_twitter_stance_biden","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_election2020_twitter_stance_biden", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
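
The same fitted pipeline scales to many documents at once; the classifier simply adds one `class` annotation per row. A short variation on the example above with a few tweet-style inputs (the texts are illustrative only, not from the training data):

```python
tweets = spark.createDataFrame(
    [["Joe Biden has my vote in November."],
     ["I am still undecided about this election."]]
).toDF("text")

# Stance prediction for each input row
pipelineModel.transform(tweets).select("text", "class.result").show(truncate=False)
```
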
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_election2020_twitter_stance_biden| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.8 MB| + +## References + +https://huggingface.co/kornosk/bert-election2020-twitter-stance-biden \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_emotions_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_emotions_en.md new file mode 100644 index 00000000000000..6eed5305f48186 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_emotions_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_emotions BertForSequenceClassification from Yanni8 +author: John Snow Labs +name: bert_emotions +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_emotions` is a English model originally trained by Yanni8. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_emotions_en_5.5.0_3.0_1727261714721.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_emotions_en_5.5.0_3.0_1727261714721.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_emotions","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_emotions", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_emotions| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Yanni8/bert-emotions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_emotions_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_emotions_pipeline_en.md new file mode 100644 index 00000000000000..7bc0b843148c5a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_emotions_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_emotions_pipeline pipeline BertForSequenceClassification from Yanni8 +author: John Snow Labs +name: bert_emotions_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_emotions_pipeline` is a English model originally trained by Yanni8. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_emotions_pipeline_en_5.5.0_3.0_1727261736321.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_emotions_pipeline_en_5.5.0_3.0_1727261736321.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_emotions_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_emotions_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_emotions_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Yanni8/bert-emotions + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_fined_tuned_cola_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_fined_tuned_cola_en.md new file mode 100644 index 00000000000000..b7e19684f4a695 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_fined_tuned_cola_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_fined_tuned_cola BertForSequenceClassification from Utshav +author: John Snow Labs +name: bert_fined_tuned_cola +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_fined_tuned_cola` is a English model originally trained by Utshav. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_fined_tuned_cola_en_5.5.0_3.0_1727288341793.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_fined_tuned_cola_en_5.5.0_3.0_1727288341793.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_fined_tuned_cola","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_fined_tuned_cola", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_fined_tuned_cola| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Utshav/bert-fined-tuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_abbreviation_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_abbreviation_en.md new file mode 100644 index 00000000000000..bb85271802db91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_abbreviation_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_abbreviation BertForTokenClassification from dammy +author: John Snow Labs +name: bert_finetuned_abbreviation +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_abbreviation` is a English model originally trained by dammy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_abbreviation_en_5.5.0_3.0_1727260244412.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_abbreviation_en_5.5.0_3.0_1727260244412.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_abbreviation","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_abbreviation", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
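
When only the tagged tokens matter, the outside (`O`) tags can be filtered out after pairing tokens with their predictions. A small sketch continuing from `pipelineDF` above:

```python
# Keep only tokens whose predicted tag is not the "outside" label
row = pipelineDF.select("token.result", "ner.result").first()
tagged = [(tok, tag) for tok, tag in zip(row[0], row[1]) if tag != "O"]
print(tagged)
```
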
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_abbreviation| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/dammy/bert-finetuned-abbreviation \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_abbreviation_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_abbreviation_pipeline_en.md new file mode 100644 index 00000000000000..61b00a4a07bb43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_abbreviation_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_abbreviation_pipeline pipeline BertForTokenClassification from dammy +author: John Snow Labs +name: bert_finetuned_abbreviation_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_abbreviation_pipeline` is a English model originally trained by dammy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_abbreviation_pipeline_en_5.5.0_3.0_1727260265611.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_abbreviation_pipeline_en_5.5.0_3.0_1727260265611.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_abbreviation_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_abbreviation_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_abbreviation_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/dammy/bert-finetuned-abbreviation + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_age_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_age_pipeline_en.md new file mode 100644 index 00000000000000..801f828ff5e566 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_age_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_age_pipeline pipeline BertForSequenceClassification from Abderrahim2 +author: John Snow Labs +name: bert_finetuned_age_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_age_pipeline` is a English model originally trained by Abderrahim2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_age_pipeline_en_5.5.0_3.0_1727276390139.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_age_pipeline_en_5.5.0_3.0_1727276390139.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_age_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_age_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_age_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.5 MB| + +## References + +https://huggingface.co/Abderrahim2/bert-finetuned-Age + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_hausa_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_hausa_ner_pipeline_en.md new file mode 100644 index 00000000000000..65c522ef37d80b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_hausa_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_hausa_ner_pipeline pipeline BertForTokenClassification from peteryushunli +author: John Snow Labs +name: bert_finetuned_hausa_ner_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_hausa_ner_pipeline` is a English model originally trained by peteryushunli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_hausa_ner_pipeline_en_5.5.0_3.0_1727260026645.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_hausa_ner_pipeline_en_5.5.0_3.0_1727260026645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_hausa_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_hausa_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_hausa_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/peteryushunli/bert-finetuned-hausa_ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_cti_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_cti_en.md new file mode 100644 index 00000000000000..a9a43e3b081c7e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_cti_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_ner_cti BertForTokenClassification from thongnef +author: John Snow Labs +name: bert_finetuned_ner_cti +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_cti` is a English model originally trained by thongnef. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_cti_en_5.5.0_3.0_1727250597375.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_cti_en_5.5.0_3.0_1727250597375.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_cti","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_cti", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
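As a small, illustrative follow-up (not part of the original example), and assuming the Python pipeline above has been run, the tokens and their predicted NER labels can be displayed side by side:

```python
# Illustrative only: inspect tokens and predicted NER labels from the pipeline output above
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```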
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_cti| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/thongnef/bert-finetuned-ner-cti \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_cti_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_cti_pipeline_en.md new file mode 100644 index 00000000000000..4c21064956ec19 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_cti_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_ner_cti_pipeline pipeline BertForTokenClassification from thongnef +author: John Snow Labs +name: bert_finetuned_ner_cti_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_cti_pipeline` is a English model originally trained by thongnef. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_cti_pipeline_en_5.5.0_3.0_1727250618087.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_cti_pipeline_en_5.5.0_3.0_1727250618087.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be annotated
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_finetuned_ner_cti_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be annotated
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_finetuned_ner_cti_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_cti_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/thongnef/bert-finetuned-ner-cti + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_hydrochii_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_hydrochii_en.md new file mode 100644 index 00000000000000..76582a2ba76dc0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_hydrochii_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_ner_hydrochii BertForTokenClassification from hydrochii +author: John Snow Labs +name: bert_finetuned_ner_hydrochii +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_hydrochii` is a English model originally trained by hydrochii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_hydrochii_en_5.5.0_3.0_1727270734813.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_hydrochii_en_5.5.0_3.0_1727270734813.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_hydrochii","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_hydrochii", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_hydrochii| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/hydrochii/bert-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_mjwlyy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_mjwlyy_pipeline_en.md new file mode 100644 index 00000000000000..dc4bf9a11a4931 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_mjwlyy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_ner_mjwlyy_pipeline pipeline BertForTokenClassification from MJWLYY +author: John Snow Labs +name: bert_finetuned_ner_mjwlyy_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_mjwlyy_pipeline` is a English model originally trained by MJWLYY. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_mjwlyy_pipeline_en_5.5.0_3.0_1727249785506.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_mjwlyy_pipeline_en_5.5.0_3.0_1727249785506.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be annotated
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_finetuned_ner_mjwlyy_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be annotated
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_finetuned_ner_mjwlyy_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_mjwlyy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/MJWLYY/bert-finetuned-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_proccyon_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_proccyon_en.md new file mode 100644 index 00000000000000..1b871c653e068a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_proccyon_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_ner_proccyon BertForTokenClassification from Proccyon +author: John Snow Labs +name: bert_finetuned_ner_proccyon +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_proccyon` is a English model originally trained by Proccyon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_proccyon_en_5.5.0_3.0_1727262284868.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_proccyon_en_5.5.0_3.0_1727262284868.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_proccyon","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_proccyon", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_proccyon| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Proccyon/bert-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_proccyon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_proccyon_pipeline_en.md new file mode 100644 index 00000000000000..1c1cea251231cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_proccyon_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_ner_proccyon_pipeline pipeline BertForTokenClassification from Proccyon +author: John Snow Labs +name: bert_finetuned_ner_proccyon_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_proccyon_pipeline` is a English model originally trained by Proccyon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_proccyon_pipeline_en_5.5.0_3.0_1727262306020.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_proccyon_pipeline_en_5.5.0_3.0_1727262306020.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be annotated
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_finetuned_ner_proccyon_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be annotated
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_finetuned_ner_proccyon_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_proccyon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Proccyon/bert-finetuned-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_word_embedding_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_word_embedding_en.md new file mode 100644 index 00000000000000..41d2f9ee78a26f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_word_embedding_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_ner_word_embedding BertForTokenClassification from lsoni +author: John Snow Labs +name: bert_finetuned_ner_word_embedding +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_word_embedding` is a English model originally trained by lsoni. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_word_embedding_en_5.5.0_3.0_1727283090886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_word_embedding_en_5.5.0_3.0_1727283090886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_word_embedding","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_word_embedding", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_word_embedding| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/lsoni/bert-finetuned-ner-word-embedding \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_word_embedding_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_word_embedding_pipeline_en.md new file mode 100644 index 00000000000000..340e4830116fe1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_word_embedding_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_ner_word_embedding_pipeline pipeline BertForTokenClassification from lsoni +author: John Snow Labs +name: bert_finetuned_ner_word_embedding_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_word_embedding_pipeline` is a English model originally trained by lsoni. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_word_embedding_pipeline_en_5.5.0_3.0_1727283112582.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_word_embedding_pipeline_en_5.5.0_3.0_1727283112582.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be annotated
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_finetuned_ner_word_embedding_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be annotated
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_finetuned_ner_word_embedding_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_word_embedding_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/lsoni/bert-finetuned-ner-word-embedding + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_semitic_languages_eval_english_lachin_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_semitic_languages_eval_english_lachin_en.md new file mode 100644 index 00000000000000..49c4a36d1b090d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_semitic_languages_eval_english_lachin_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_semitic_languages_eval_english_lachin BertForSequenceClassification from Lachin +author: John Snow Labs +name: bert_finetuned_semitic_languages_eval_english_lachin +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_semitic_languages_eval_english_lachin` is a English model originally trained by Lachin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_semitic_languages_eval_english_lachin_en_5.5.0_3.0_1727287151317.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_semitic_languages_eval_english_lachin_en_5.5.0_3.0_1727287151317.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_finetuned_semitic_languages_eval_english_lachin","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_finetuned_semitic_languages_eval_english_lachin", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
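As a small, illustrative follow-up (not part of the original example), and assuming the Python pipeline above has been run, the predicted label for each row can be displayed like this:

```python
# Illustrative only: show each input text with its predicted class label
pipelineDF.select("text", "class.result").show(truncate=False)
```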
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_semitic_languages_eval_english_lachin| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Lachin/bert-finetuned-sem_eval-english \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_semitic_languages_eval_english_lachin_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_semitic_languages_eval_english_lachin_pipeline_en.md new file mode 100644 index 00000000000000..b169efa574e4f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_semitic_languages_eval_english_lachin_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_semitic_languages_eval_english_lachin_pipeline pipeline BertForSequenceClassification from Lachin +author: John Snow Labs +name: bert_finetuned_semitic_languages_eval_english_lachin_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_semitic_languages_eval_english_lachin_pipeline` is a English model originally trained by Lachin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_semitic_languages_eval_english_lachin_pipeline_en_5.5.0_3.0_1727287172332.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_semitic_languages_eval_english_lachin_pipeline_en_5.5.0_3.0_1727287172332.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be annotated
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_finetuned_semitic_languages_eval_english_lachin_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be annotated
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_finetuned_semitic_languages_eval_english_lachin_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_semitic_languages_eval_english_lachin_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Lachin/bert-finetuned-sem_eval-english + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_large_ambiguidade_sintatica_v1_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_large_ambiguidade_sintatica_v1_en.md new file mode 100644 index 00000000000000..5944cfc31f1935 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_large_ambiguidade_sintatica_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_large_ambiguidade_sintatica_v1 BertForSequenceClassification from osouza +author: John Snow Labs +name: bert_large_ambiguidade_sintatica_v1 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_ambiguidade_sintatica_v1` is a English model originally trained by osouza. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_ambiguidade_sintatica_v1_en_5.5.0_3.0_1727265779382.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_ambiguidade_sintatica_v1_en_5.5.0_3.0_1727265779382.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_large_ambiguidade_sintatica_v1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_large_ambiguidade_sintatica_v1", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_ambiguidade_sintatica_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/osouza/bert-large-ambiguidade-sintatica-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_large_ambiguidade_sintatica_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_large_ambiguidade_sintatica_v1_pipeline_en.md new file mode 100644 index 00000000000000..293c6c03a9c1b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_large_ambiguidade_sintatica_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_large_ambiguidade_sintatica_v1_pipeline pipeline BertForSequenceClassification from osouza +author: John Snow Labs +name: bert_large_ambiguidade_sintatica_v1_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_ambiguidade_sintatica_v1_pipeline` is a English model originally trained by osouza. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_ambiguidade_sintatica_v1_pipeline_en_5.5.0_3.0_1727265802099.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_ambiguidade_sintatica_v1_pipeline_en_5.5.0_3.0_1727265802099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be annotated
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_large_ambiguidade_sintatica_v1_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be annotated
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_large_ambiguidade_sintatica_v1_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_ambiguidade_sintatica_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/osouza/bert-large-ambiguidade-sintatica-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_large_cased_finetuned_ner_augment_01_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_large_cased_finetuned_ner_augment_01_en.md new file mode 100644 index 00000000000000..30e00733a666a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_large_cased_finetuned_ner_augment_01_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_large_cased_finetuned_ner_augment_01 BertForTokenClassification from lamthanhtin2811 +author: John Snow Labs +name: bert_large_cased_finetuned_ner_augment_01 +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_cased_finetuned_ner_augment_01` is a English model originally trained by lamthanhtin2811. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_cased_finetuned_ner_augment_01_en_5.5.0_3.0_1727282321654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_cased_finetuned_ner_augment_01_en_5.5.0_3.0_1727282321654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("bert_large_cased_finetuned_ner_augment_01","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_large_cased_finetuned_ner_augment_01", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_cased_finetuned_ner_augment_01| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/lamthanhtin2811/bert-large-cased-finetuned-ner-augment-01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_large_cased_finetuned_ner_augment_01_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_large_cased_finetuned_ner_augment_01_pipeline_en.md new file mode 100644 index 00000000000000..d600041f859f74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_large_cased_finetuned_ner_augment_01_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_large_cased_finetuned_ner_augment_01_pipeline pipeline BertForTokenClassification from lamthanhtin2811 +author: John Snow Labs +name: bert_large_cased_finetuned_ner_augment_01_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_cased_finetuned_ner_augment_01_pipeline` is a English model originally trained by lamthanhtin2811. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_cased_finetuned_ner_augment_01_pipeline_en_5.5.0_3.0_1727282385046.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_cased_finetuned_ner_augment_01_pipeline_en_5.5.0_3.0_1727282385046.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be annotated
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_large_cased_finetuned_ner_augment_01_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be annotated
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_large_cased_finetuned_ner_augment_01_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_cased_finetuned_ner_augment_01_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/lamthanhtin2811/bert-large-cased-finetuned-ner-augment-01 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_large_ner_pii_062024_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_large_ner_pii_062024_en.md new file mode 100644 index 00000000000000..65e343651905e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_large_ner_pii_062024_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_large_ner_pii_062024 BertForTokenClassification from vuminhtue +author: John Snow Labs +name: bert_large_ner_pii_062024 +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_ner_pii_062024` is a English model originally trained by vuminhtue. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_ner_pii_062024_en_5.5.0_3.0_1727275036455.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_ner_pii_062024_en_5.5.0_3.0_1727275036455.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("bert_large_ner_pii_062024","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_large_ner_pii_062024", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_ner_pii_062024| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/vuminhtue/Bert_large_NER_PII_062024 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_large_portuguese_archive_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_large_portuguese_archive_pipeline_en.md new file mode 100644 index 00000000000000..61c01491d65fda --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_large_portuguese_archive_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_large_portuguese_archive_pipeline pipeline BertForTokenClassification from lfcc +author: John Snow Labs +name: bert_large_portuguese_archive_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_portuguese_archive_pipeline` is a English model originally trained by lfcc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_portuguese_archive_pipeline_en_5.5.0_3.0_1727270979698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_portuguese_archive_pipeline_en_5.5.0_3.0_1727270979698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be annotated
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_large_portuguese_archive_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be annotated
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_large_portuguese_archive_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_portuguese_archive_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/lfcc/bert-large-pt-archive + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_en.md new file mode 100644 index 00000000000000..da71f4e73bf065 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_en.md @@ -0,0 +1,92 @@ +--- +layout: model +title: BERT Embeddings (Large Uncased) +author: John Snow Labs +name: bert_large_uncased +date: 2024-09-25 +tags: [open_source, embeddings, en, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This model contains a deep bidirectional transformer trained on Wikipedia and the BookCorpus. The details are described in the paper "[BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)". + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_en_5.5.0_3.0_1727242974255.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_en_5.5.0_3.0_1727242974255.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +... +embeddings = BertEmbeddings.pretrained("bert_large_uncased", "en") \ +.setInputCols("sentence", "token") \ +.setOutputCol("embeddings") +nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings]) +pipeline_model = nlp_pipeline.fit(spark.createDataFrame([[""]]).toDF("text")) +result = pipeline_model.transform(spark.createDataFrame([['I love NLP']], ["text"])) +``` +```scala +... +val embeddings = BertEmbeddings.pretrained("bert_large_uncased", "en") +.setInputCols("sentence", "token") +.setOutputCol("embeddings") +val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings)) +val data = Seq("I love NLP").toDF("text") +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu + +text = ["I love NLP"] +embeddings_df = nlu.load('en.embed.bert.large_uncased').predict(text, output_level='token') +embeddings_df +``` +
+ +## Results + +```bash + + en_embed_bert_large_uncased_embeddings token + + [-0.07447264343500137, -0.337308406829834, -0.... I + [-0.5735481977462769, -0.3580206632614136, -0.... love + [-0.3929762840270996, -0.4147087037563324, 0.2... NLP +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|1.3 GB| \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_english_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_english_ner_pipeline_en.md new file mode 100644 index 00000000000000..036d0a62f168f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_english_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_large_uncased_english_ner_pipeline pipeline BertForTokenClassification from n6ai +author: John Snow Labs +name: bert_large_uncased_english_ner_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_english_ner_pipeline` is a English model originally trained by n6ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_english_ner_pipeline_en_5.5.0_3.0_1727281779260.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_english_ner_pipeline_en_5.5.0_3.0_1727281779260.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be annotated
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_large_uncased_english_ner_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be annotated
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_large_uncased_english_ner_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_english_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/n6ai/bert-large-uncased-en-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_finetuned_edos_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_finetuned_edos_pipeline_en.md new file mode 100644 index 00000000000000..8229d6e80fee0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_finetuned_edos_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_large_uncased_finetuned_edos_pipeline pipeline BertForSequenceClassification from reinforz +author: John Snow Labs +name: bert_large_uncased_finetuned_edos_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_finetuned_edos_pipeline` is a English model originally trained by reinforz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_finetuned_edos_pipeline_en_5.5.0_3.0_1727269394576.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_finetuned_edos_pipeline_en_5.5.0_3.0_1727269394576.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be annotated
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_large_uncased_finetuned_edos_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be annotated
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_large_uncased_finetuned_edos_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_finetuned_edos_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/reinforz/bert-large-uncased-finetuned-edos + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_pipeline_en.md new file mode 100644 index 00000000000000..2f83dab680ec15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_large_uncased_pipeline pipeline BertEmbeddings from google-bert +author: John Snow Labs +name: bert_large_uncased_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_pipeline` is a English model originally trained by google-bert. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_pipeline_en_5.5.0_3.0_1727243037738.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_pipeline_en_5.5.0_3.0_1727243037738.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be annotated
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_large_uncased_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be annotated
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_large_uncased_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/google-bert/bert-large-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_wnli_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_wnli_en.md new file mode 100644 index 00000000000000..e05894ebd106d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_wnli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_large_uncased_wnli BertForSequenceClassification from yoshitomo-matsubara +author: John Snow Labs +name: bert_large_uncased_wnli +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_wnli` is a English model originally trained by yoshitomo-matsubara. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_wnli_en_5.5.0_3.0_1727285124102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_wnli_en_5.5.0_3.0_1727285124102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_large_uncased_wnli","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_large_uncased_wnli", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_wnli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-wnli \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_mini_domain_adapted_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_mini_domain_adapted_imdb_en.md new file mode 100644 index 00000000000000..4b4bbbd97df26d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_mini_domain_adapted_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_mini_domain_adapted_imdb BertEmbeddings from rasyosef +author: John Snow Labs +name: bert_mini_domain_adapted_imdb +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_mini_domain_adapted_imdb` is a English model originally trained by rasyosef. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_mini_domain_adapted_imdb_en_5.5.0_3.0_1727240872831.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_mini_domain_adapted_imdb_en_5.5.0_3.0_1727240872831.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = BertEmbeddings.pretrained("bert_mini_domain_adapted_imdb","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val embeddings = BertEmbeddings.pretrained("bert_mini_domain_adapted_imdb","en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
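+
+As an optional check (not in the original card), each token's vector can be inspected from the `embeddings` annotation column produced above, assuming the `pipelineDF` built in that example:
+
+```python
+# `result` is the token text, `embeddings` is its vector from this model.
+from pyspark.sql.functions import explode
+
+pipelineDF.select(explode("embeddings").alias("emb")) \
+    .selectExpr("emb.result as token", "emb.embeddings as vector") \
+    .show(truncate=80)
+```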
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_mini_domain_adapted_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|41.9 MB| + +## References + +https://huggingface.co/rasyosef/bert-mini-domain-adapted-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_mini_domain_adapted_imdb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_mini_domain_adapted_imdb_pipeline_en.md new file mode 100644 index 00000000000000..7df871cf6cfc8d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_mini_domain_adapted_imdb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_mini_domain_adapted_imdb_pipeline pipeline BertEmbeddings from rasyosef +author: John Snow Labs +name: bert_mini_domain_adapted_imdb_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_mini_domain_adapted_imdb_pipeline` is a English model originally trained by rasyosef. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_mini_domain_adapted_imdb_pipeline_en_5.5.0_3.0_1727240875150.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_mini_domain_adapted_imdb_pipeline_en_5.5.0_3.0_1727240875150.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# input DataFrame with the raw text in a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("bert_mini_domain_adapted_imdb_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("bert_mini_domain_adapted_imdb_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
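+
+For quick experiments without building a DataFrame, the same pretrained pipeline can also be applied to a plain string (an illustrative sketch, not part of the original card):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("bert_mini_domain_adapted_imdb_pipeline", lang = "en")
+# annotate() returns a dict of annotator outputs keyed by output column name
+print(pipeline.annotate("I love spark-nlp"))
+```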
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_mini_domain_adapted_imdb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|41.9 MB| + +## References + +https://huggingface.co/rasyosef/bert-mini-domain-adapted-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_mini_sst2_distilled_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_mini_sst2_distilled_en.md new file mode 100644 index 00000000000000..1ba10dfd9a782c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_mini_sst2_distilled_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_mini_sst2_distilled BertForSequenceClassification from philschmid +author: John Snow Labs +name: bert_mini_sst2_distilled +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_mini_sst2_distilled` is a English model originally trained by philschmid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_mini_sst2_distilled_en_5.5.0_3.0_1727269691593.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_mini_sst2_distilled_en_5.5.0_3.0_1727269691593.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_mini_sst2_distilled","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_mini_sst2_distilled", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
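+
+If class scores are needed in addition to the label, they are exposed in the annotation metadata (a sketch assuming the `pipelineDF` built above; the metadata keys correspond to this model's labels):
+
+```python
+from pyspark.sql.functions import explode
+
+# Each prediction's metadata maps the model's labels to their scores.
+pipelineDF.select(explode("class").alias("pred")) \
+    .selectExpr("pred.result as label", "pred.metadata as scores") \
+    .show(truncate=False)
+```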
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_mini_sst2_distilled| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|42.1 MB| + +## References + +https://huggingface.co/philschmid/bert-mini-sst2-distilled \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_mini_sst2_distilled_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_mini_sst2_distilled_pipeline_en.md new file mode 100644 index 00000000000000..e050214574312e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_mini_sst2_distilled_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_mini_sst2_distilled_pipeline pipeline BertForSequenceClassification from philschmid +author: John Snow Labs +name: bert_mini_sst2_distilled_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_mini_sst2_distilled_pipeline` is a English model originally trained by philschmid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_mini_sst2_distilled_pipeline_en_5.5.0_3.0_1727269694087.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_mini_sst2_distilled_pipeline_en_5.5.0_3.0_1727269694087.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# input DataFrame with the raw text in a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("bert_mini_sst2_distilled_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("bert_mini_sst2_distilled_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_mini_sst2_distilled_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|42.1 MB| + +## References + +https://huggingface.co/philschmid/bert-mini-sst2-distilled + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_mrpc_distilled_cka_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_mrpc_distilled_cka_en.md new file mode 100644 index 00000000000000..afa330d4837a32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_mrpc_distilled_cka_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_mrpc_distilled_cka BertForSequenceClassification from Sayan01 +author: John Snow Labs +name: bert_mrpc_distilled_cka +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_mrpc_distilled_cka` is a English model originally trained by Sayan01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_mrpc_distilled_cka_en_5.5.0_3.0_1727268774840.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_mrpc_distilled_cka_en_5.5.0_3.0_1727268774840.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_mrpc_distilled_cka","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_mrpc_distilled_cka", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_mrpc_distilled_cka| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|154.8 MB| + +## References + +https://huggingface.co/Sayan01/bert-mrpc-distilled-cka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_multi_pad_ner_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_multi_pad_ner_en.md new file mode 100644 index 00000000000000..102e031a8dbde1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_multi_pad_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_multi_pad_ner BertForTokenClassification from ArseniyBolotin +author: John Snow Labs +name: bert_multi_pad_ner +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_multi_pad_ner` is a English model originally trained by ArseniyBolotin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_multi_pad_ner_en_5.5.0_3.0_1727263187379.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_multi_pad_ner_en_5.5.0_3.0_1727263187379.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_multi_pad_ner","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_multi_pad_ner", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
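+
+To see the predicted tag for each token side by side (an illustrative addition, assuming the `pipelineDF` from the example above):
+
+```python
+# token.result and ner.result are aligned arrays: one tag per token.
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```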
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_multi_pad_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/ArseniyBolotin/bert-multi-PAD-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_multi_pad_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_multi_pad_ner_pipeline_en.md new file mode 100644 index 00000000000000..15e7ba3dbe984a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_multi_pad_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_multi_pad_ner_pipeline pipeline BertForTokenClassification from ArseniyBolotin +author: John Snow Labs +name: bert_multi_pad_ner_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_multi_pad_ner_pipeline` is a English model originally trained by ArseniyBolotin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_multi_pad_ner_pipeline_en_5.5.0_3.0_1727263220672.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_multi_pad_ner_pipeline_en_5.5.0_3.0_1727263220672.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# input DataFrame with the raw text in a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("bert_multi_pad_ner_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("bert_multi_pad_ner_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_multi_pad_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/ArseniyBolotin/bert-multi-PAD-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_nlp_project_ft_imdb_ds_news_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_nlp_project_ft_imdb_ds_news_en.md new file mode 100644 index 00000000000000..6f03583c58363e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_nlp_project_ft_imdb_ds_news_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_nlp_project_ft_imdb_ds_news BertForSequenceClassification from MatFil99 +author: John Snow Labs +name: bert_nlp_project_ft_imdb_ds_news +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_nlp_project_ft_imdb_ds_news` is a English model originally trained by MatFil99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_nlp_project_ft_imdb_ds_news_en_5.5.0_3.0_1727278738362.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_nlp_project_ft_imdb_ds_news_en_5.5.0_3.0_1727278738362.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_nlp_project_ft_imdb_ds_news","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_nlp_project_ft_imdb_ds_news", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_nlp_project_ft_imdb_ds_news| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/MatFil99/bert-nlp-project-ft-imdb-ds-news \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_nlp_project_ft_imdb_ds_news_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_nlp_project_ft_imdb_ds_news_pipeline_en.md new file mode 100644 index 00000000000000..3edd4a3b0b5f24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_nlp_project_ft_imdb_ds_news_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_nlp_project_ft_imdb_ds_news_pipeline pipeline BertForSequenceClassification from MatFil99 +author: John Snow Labs +name: bert_nlp_project_ft_imdb_ds_news_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_nlp_project_ft_imdb_ds_news_pipeline` is a English model originally trained by MatFil99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_nlp_project_ft_imdb_ds_news_pipeline_en_5.5.0_3.0_1727278760646.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_nlp_project_ft_imdb_ds_news_pipeline_en_5.5.0_3.0_1727278760646.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# input DataFrame with the raw text in a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("bert_nlp_project_ft_imdb_ds_news_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("bert_nlp_project_ft_imdb_ds_news_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_nlp_project_ft_imdb_ds_news_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/MatFil99/bert-nlp-project-ft-imdb-ds-news + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_persian_farsi_base_uncased_finetuned_parsbert_fa.md b/docs/_posts/ahmedlone127/2024-09-25-bert_persian_farsi_base_uncased_finetuned_parsbert_fa.md new file mode 100644 index 00000000000000..15fb132e1e4864 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_persian_farsi_base_uncased_finetuned_parsbert_fa.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Persian bert_persian_farsi_base_uncased_finetuned_parsbert BertEmbeddings from Yasamansaffari73 +author: John Snow Labs +name: bert_persian_farsi_base_uncased_finetuned_parsbert +date: 2024-09-25 +tags: [fa, open_source, onnx, embeddings, bert] +task: Embeddings +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_persian_farsi_base_uncased_finetuned_parsbert` is a Persian model originally trained by Yasamansaffari73. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_persian_farsi_base_uncased_finetuned_parsbert_fa_5.5.0_3.0_1727241132849.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_persian_farsi_base_uncased_finetuned_parsbert_fa_5.5.0_3.0_1727241132849.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = BertEmbeddings.pretrained("bert_persian_farsi_base_uncased_finetuned_parsbert","fa") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val embeddings = BertEmbeddings.pretrained("bert_persian_farsi_base_uncased_finetuned_parsbert","fa")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
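+
+If downstream Spark ML stages need plain vectors instead of annotation structs, an `EmbeddingsFinisher` can be appended to the pipeline output (a minimal sketch, not part of the original card, reusing the `pipelineDF` from the example above):
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+# Converts Spark NLP embedding annotations into Spark ML vectors.
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+finisher.transform(pipelineDF).select("finished_embeddings").show(truncate=80)
+```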
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_persian_farsi_base_uncased_finetuned_parsbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|fa| +|Size:|606.5 MB| + +## References + +https://huggingface.co/Yasamansaffari73/bert-fa-base-uncased-finetuned-ParsBert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline_fa.md b/docs/_posts/ahmedlone127/2024-09-25-bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline_fa.md new file mode 100644 index 00000000000000..c0dfbd8abb714e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline_fa.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Persian bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline pipeline BertEmbeddings from Yasamansaffari73 +author: John Snow Labs +name: bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline +date: 2024-09-25 +tags: [fa, open_source, pipeline, onnx] +task: Embeddings +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline` is a Persian model originally trained by Yasamansaffari73. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline_fa_5.5.0_3.0_1727241163964.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline_fa_5.5.0_3.0_1727241163964.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# input DataFrame with the raw text in a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline", lang = "fa")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline", lang = "fa")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fa| +|Size:|606.5 MB| + +## References + +https://huggingface.co/Yasamansaffari73/bert-fa-base-uncased-finetuned-ParsBert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_phrasebank_sentiment_analysis_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_phrasebank_sentiment_analysis_en.md new file mode 100644 index 00000000000000..7a94276a86dde7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_phrasebank_sentiment_analysis_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_phrasebank_sentiment_analysis BertForSequenceClassification from pkbiswas +author: John Snow Labs +name: bert_phrasebank_sentiment_analysis +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_phrasebank_sentiment_analysis` is a English model originally trained by pkbiswas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_phrasebank_sentiment_analysis_en_5.5.0_3.0_1727264209163.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_phrasebank_sentiment_analysis_en_5.5.0_3.0_1727264209163.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_phrasebank_sentiment_analysis","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_phrasebank_sentiment_analysis", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_phrasebank_sentiment_analysis| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/pkbiswas/Bert-Phrasebank-Sentiment-Analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_phrasebank_sentiment_analysis_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_phrasebank_sentiment_analysis_pipeline_en.md new file mode 100644 index 00000000000000..9ffc7ab67a41b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_phrasebank_sentiment_analysis_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_phrasebank_sentiment_analysis_pipeline pipeline BertForSequenceClassification from pkbiswas +author: John Snow Labs +name: bert_phrasebank_sentiment_analysis_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_phrasebank_sentiment_analysis_pipeline` is a English model originally trained by pkbiswas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_phrasebank_sentiment_analysis_pipeline_en_5.5.0_3.0_1727264230213.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_phrasebank_sentiment_analysis_pipeline_en_5.5.0_3.0_1727264230213.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# input DataFrame with the raw text in a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("bert_phrasebank_sentiment_analysis_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("bert_phrasebank_sentiment_analysis_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_phrasebank_sentiment_analysis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/pkbiswas/Bert-Phrasebank-Sentiment-Analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_pooling_based_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_pooling_based_en.md new file mode 100644 index 00000000000000..06a8fb941346d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_pooling_based_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_pooling_based BertForSequenceClassification from elifcen +author: John Snow Labs +name: bert_pooling_based +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_pooling_based` is a English model originally trained by elifcen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_pooling_based_en_5.5.0_3.0_1727284833679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_pooling_based_en_5.5.0_3.0_1727284833679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_pooling_based","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_pooling_based", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
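+
+For low-latency, single-document inference the fitted pipeline can be wrapped in a `LightPipeline` (an illustrative sketch using the `pipelineModel` from the example above, not part of the original card):
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+# Runs the whole pipeline on a plain string, without a Spark DataFrame.
+print(light.annotate("I love spark-nlp"))
+```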
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_pooling_based| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/elifcen/bert-pooling-based \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_pretrained_wikitext_2_raw_v1_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_pretrained_wikitext_2_raw_v1_en.md new file mode 100644 index 00000000000000..7b42358bdd40e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_pretrained_wikitext_2_raw_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_pretrained_wikitext_2_raw_v1 BertEmbeddings from dimpo +author: John Snow Labs +name: bert_pretrained_wikitext_2_raw_v1 +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_pretrained_wikitext_2_raw_v1` is a English model originally trained by dimpo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_pretrained_wikitext_2_raw_v1_en_5.5.0_3.0_1727256191997.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_pretrained_wikitext_2_raw_v1_en_5.5.0_3.0_1727256191997.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+embeddings = BertEmbeddings.pretrained("bert_pretrained_wikitext_2_raw_v1","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val embeddings = BertEmbeddings.pretrained("bert_pretrained_wikitext_2_raw_v1","en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_pretrained_wikitext_2_raw_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.9 MB| + +## References + +https://huggingface.co/dimpo/bert-pretrained-wikitext-2-raw-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_semaphore_prediction_w2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_semaphore_prediction_w2_pipeline_en.md new file mode 100644 index 00000000000000..787bb017cda68a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_semaphore_prediction_w2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_semaphore_prediction_w2_pipeline pipeline BertForSequenceClassification from bondi +author: John Snow Labs +name: bert_semaphore_prediction_w2_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_semaphore_prediction_w2_pipeline` is a English model originally trained by bondi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_semaphore_prediction_w2_pipeline_en_5.5.0_3.0_1727284708832.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_semaphore_prediction_w2_pipeline_en_5.5.0_3.0_1727284708832.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# input DataFrame with the raw text in a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("bert_semaphore_prediction_w2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("bert_semaphore_prediction_w2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_semaphore_prediction_w2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.7 MB| + +## References + +https://huggingface.co/bondi/bert-semaphore-prediction-w2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_sentiment_classification_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_sentiment_classification_en.md new file mode 100644 index 00000000000000..ffdcd331cbe930 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_sentiment_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_sentiment_classification BertForSequenceClassification from Naren579 +author: John Snow Labs +name: bert_sentiment_classification +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_sentiment_classification` is a English model originally trained by Naren579. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sentiment_classification_en_5.5.0_3.0_1727267226186.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sentiment_classification_en_5.5.0_3.0_1727267226186.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_sentiment_classification","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_sentiment_classification", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sentiment_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Naren579/BERT-Sentiment-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_sentiment_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_sentiment_classification_pipeline_en.md new file mode 100644 index 00000000000000..7c184bdf92c84d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_sentiment_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_sentiment_classification_pipeline pipeline BertForSequenceClassification from Naren579 +author: John Snow Labs +name: bert_sentiment_classification_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_sentiment_classification_pipeline` is a English model originally trained by Naren579. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sentiment_classification_pipeline_en_5.5.0_3.0_1727267248567.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sentiment_classification_pipeline_en_5.5.0_3.0_1727267248567.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# input DataFrame with the raw text in a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("bert_sentiment_classification_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("bert_sentiment_classification_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sentiment_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Naren579/BERT-Sentiment-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_sst5_padding50model_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_sst5_padding50model_en.md new file mode 100644 index 00000000000000..d38861560f89ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_sst5_padding50model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_sst5_padding50model BertForSequenceClassification from Realgon +author: John Snow Labs +name: bert_sst5_padding50model +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_sst5_padding50model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sst5_padding50model_en_5.5.0_3.0_1727287947774.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sst5_padding50model_en_5.5.0_3.0_1727287947774.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_sst5_padding50model","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_sst5_padding50model", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
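+
+The fitted pipeline can also be persisted and reloaded like any Spark ML model (a sketch, not part of the original card; the save path below is only an example):
+
+```python
+from pyspark.ml import PipelineModel
+
+# Persist the fitted pipeline and load it back for later inference.
+pipelineModel.write().overwrite().save("/tmp/bert_sst5_padding50model_pipeline")
+restored = PipelineModel.load("/tmp/bert_sst5_padding50model_pipeline")
+restored.transform(data).select("class.result").show(truncate=False)
+```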
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sst5_padding50model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Realgon/bert_sst5_padding50model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_swe_skills_ner_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_swe_skills_ner_en.md new file mode 100644 index 00000000000000..a19fd483f2dc59 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_swe_skills_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_swe_skills_ner BertForTokenClassification from RJuro +author: John Snow Labs +name: bert_swe_skills_ner +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_swe_skills_ner` is a English model originally trained by RJuro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_swe_skills_ner_en_5.5.0_3.0_1727275164826.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_swe_skills_ner_en_5.5.0_3.0_1727275164826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_swe_skills_ner","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_swe_skills_ner", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_swe_skills_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|465.2 MB| + +## References + +https://huggingface.co/RJuro/bert-swe-skills-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_swe_skills_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_swe_skills_ner_pipeline_en.md new file mode 100644 index 00000000000000..c1ceaf25d16cfa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_swe_skills_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_swe_skills_ner_pipeline pipeline BertForTokenClassification from RJuro +author: John Snow Labs +name: bert_swe_skills_ner_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_swe_skills_ner_pipeline` is a English model originally trained by RJuro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_swe_skills_ner_pipeline_en_5.5.0_3.0_1727275189804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_swe_skills_ner_pipeline_en_5.5.0_3.0_1727275189804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# input DataFrame with the raw text in a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("bert_swe_skills_ner_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("bert_swe_skills_ner_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_swe_skills_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.3 MB| + +## References + +https://huggingface.co/RJuro/bert-swe-skills-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_emotion_kd_bert_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_emotion_kd_bert_en.md new file mode 100644 index 00000000000000..bce6b3e6b358f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_emotion_kd_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_tiny_emotion_kd_bert BertForSequenceClassification from gokuls +author: John Snow Labs +name: bert_tiny_emotion_kd_bert +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tiny_emotion_kd_bert` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tiny_emotion_kd_bert_en_5.5.0_3.0_1727279234958.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tiny_emotion_kd_bert_en_5.5.0_3.0_1727279234958.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_tiny_emotion_kd_bert","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_tiny_emotion_kd_bert", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

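
The predicted label for each input document lands in the `class` column as a single annotation. A minimal sketch of reading it back out, following the column names of the example above:

```python
from pyspark.sql.functions import explode

# One predicted label per input document
pipelineDF.select("text", explode("class").alias("prediction")) \
    .selectExpr("text", "prediction.result as predicted_label") \
    .show(truncate=False)
```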
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tiny_emotion_kd_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|16.7 MB| + +## References + +https://huggingface.co/gokuls/bert-tiny-emotion-KD-BERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_emotion_kd_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_emotion_kd_bert_pipeline_en.md new file mode 100644 index 00000000000000..b078379ff5c44a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_emotion_kd_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_tiny_emotion_kd_bert_pipeline pipeline BertForSequenceClassification from gokuls +author: John Snow Labs +name: bert_tiny_emotion_kd_bert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tiny_emotion_kd_bert_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tiny_emotion_kd_bert_pipeline_en_5.5.0_3.0_1727279236292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tiny_emotion_kd_bert_pipeline_en_5.5.0_3.0_1727279236292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# any DataFrame with a "text" column can be annotated by the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_tiny_emotion_kd_bert_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// any DataFrame with a "text" column can be annotated by the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_tiny_emotion_kd_bert_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tiny_emotion_kd_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|16.8 MB| + +## References + +https://huggingface.co/gokuls/bert-tiny-emotion-KD-BERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_massive_intent_kd_bert_and_distilbert_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_massive_intent_kd_bert_and_distilbert_en.md new file mode 100644 index 00000000000000..8db7ad884c1106 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_massive_intent_kd_bert_and_distilbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_tiny_massive_intent_kd_bert_and_distilbert BertForSequenceClassification from gokuls +author: John Snow Labs +name: bert_tiny_massive_intent_kd_bert_and_distilbert +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tiny_massive_intent_kd_bert_and_distilbert` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tiny_massive_intent_kd_bert_and_distilbert_en_5.5.0_3.0_1727278610224.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tiny_massive_intent_kd_bert_and_distilbert_en_5.5.0_3.0_1727278610224.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_tiny_massive_intent_kd_bert_and_distilbert","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_tiny_massive_intent_kd_bert_and_distilbert", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tiny_massive_intent_kd_bert_and_distilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|16.8 MB| + +## References + +https://huggingface.co/gokuls/bert-tiny-Massive-intent-KD-BERT_and_distilBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_massive_intent_kd_bert_and_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_massive_intent_kd_bert_and_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..26bb05c5019956 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_massive_intent_kd_bert_and_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_tiny_massive_intent_kd_bert_and_distilbert_pipeline pipeline BertForSequenceClassification from gokuls +author: John Snow Labs +name: bert_tiny_massive_intent_kd_bert_and_distilbert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tiny_massive_intent_kd_bert_and_distilbert_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tiny_massive_intent_kd_bert_and_distilbert_pipeline_en_5.5.0_3.0_1727278611412.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tiny_massive_intent_kd_bert_and_distilbert_pipeline_en_5.5.0_3.0_1727278611412.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# any DataFrame with a "text" column can be annotated by the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_tiny_massive_intent_kd_bert_and_distilbert_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// any DataFrame with a "text" column can be annotated by the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_tiny_massive_intent_kd_bert_and_distilbert_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tiny_massive_intent_kd_bert_and_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|16.8 MB| + +## References + +https://huggingface.co/gokuls/bert-tiny-Massive-intent-KD-BERT_and_distilBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_tokenizer_updated_multilingual_words_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_tokenizer_updated_multilingual_words_pipeline_xx.md new file mode 100644 index 00000000000000..f9f2929acc01e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_tokenizer_updated_multilingual_words_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_tokenizer_updated_multilingual_words_pipeline pipeline BertForTokenClassification from junaidali +author: John Snow Labs +name: bert_tokenizer_updated_multilingual_words_pipeline +date: 2024-09-25 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tokenizer_updated_multilingual_words_pipeline` is a Multilingual model originally trained by junaidali. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tokenizer_updated_multilingual_words_pipeline_xx_5.5.0_3.0_1727246876908.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tokenizer_updated_multilingual_words_pipeline_xx_5.5.0_3.0_1727246876908.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# any DataFrame with a "text" column can be annotated by the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_tokenizer_updated_multilingual_words_pipeline", lang = "xx")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// any DataFrame with a "text" column can be annotated by the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_tokenizer_updated_multilingual_words_pipeline", lang = "xx")
val annotations = pipeline.transform(df)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tokenizer_updated_multilingual_words_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|1.0 GB| + +## References + +https://huggingface.co/junaidali/bert_tokenizer_updated_multilingual_words + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_tokenizer_updated_multilingual_words_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_tokenizer_updated_multilingual_words_xx.md new file mode 100644 index 00000000000000..89a0ed20d163e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_tokenizer_updated_multilingual_words_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_tokenizer_updated_multilingual_words BertForTokenClassification from junaidali +author: John Snow Labs +name: bert_tokenizer_updated_multilingual_words +date: 2024-09-25 +tags: [xx, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tokenizer_updated_multilingual_words` is a Multilingual model originally trained by junaidali. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tokenizer_updated_multilingual_words_xx_5.5.0_3.0_1727246823247.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tokenizer_updated_multilingual_words_xx_5.5.0_3.0_1727246823247.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("bert_tokenizer_updated_multilingual_words","xx") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_tokenizer_updated_multilingual_words", "xx")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

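
Because the model is tagged as multilingual, the same fitted pipeline can be fed text in other languages; only the input DataFrame changes. A short sketch reusing `pipelineModel` from the example above (the sample sentences are illustrative only):

```python
# The pipeline is language-agnostic at the API level; only the data changes
multilingual_data = spark.createDataFrame(
    [["I love Spark NLP"], ["Me encanta Spark NLP"], ["J'adore Spark NLP"]]
).toDF("text")

pipelineModel.transform(multilingual_data).select("ner.result").show(truncate=False)
```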
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tokenizer_updated_multilingual_words| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|1.0 GB| + +## References + +https://huggingface.co/junaidali/bert_tokenizer_updated_multilingual_words \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_tonga_tonga_islands_distilbert_ner_zacarage_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_tonga_tonga_islands_distilbert_ner_zacarage_en.md new file mode 100644 index 00000000000000..6077cfe1c0bdbb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_tonga_tonga_islands_distilbert_ner_zacarage_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_tonga_tonga_islands_distilbert_ner_zacarage BertForTokenClassification from Zacarage +author: John Snow Labs +name: bert_tonga_tonga_islands_distilbert_ner_zacarage +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tonga_tonga_islands_distilbert_ner_zacarage` is a English model originally trained by Zacarage. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tonga_tonga_islands_distilbert_ner_zacarage_en_5.5.0_3.0_1727246436869.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tonga_tonga_islands_distilbert_ner_zacarage_en_5.5.0_3.0_1727246436869.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("bert_tonga_tonga_islands_distilbert_ner_zacarage","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_tonga_tonga_islands_distilbert_ner_zacarage", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tonga_tonga_islands_distilbert_ner_zacarage| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|244.3 MB| + +## References + +https://huggingface.co/Zacarage/bert-to-distilbert-NER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_tonga_tonga_islands_distilbert_ner_zacarage_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_tonga_tonga_islands_distilbert_ner_zacarage_pipeline_en.md new file mode 100644 index 00000000000000..74bc331faf0dc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_tonga_tonga_islands_distilbert_ner_zacarage_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_tonga_tonga_islands_distilbert_ner_zacarage_pipeline pipeline BertForTokenClassification from Zacarage +author: John Snow Labs +name: bert_tonga_tonga_islands_distilbert_ner_zacarage_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tonga_tonga_islands_distilbert_ner_zacarage_pipeline` is a English model originally trained by Zacarage. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tonga_tonga_islands_distilbert_ner_zacarage_pipeline_en_5.5.0_3.0_1727246449659.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tonga_tonga_islands_distilbert_ner_zacarage_pipeline_en_5.5.0_3.0_1727246449659.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# any DataFrame with a "text" column can be annotated by the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_tonga_tonga_islands_distilbert_ner_zacarage_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// any DataFrame with a "text" column can be annotated by the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_tonga_tonga_islands_distilbert_ner_zacarage_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tonga_tonga_islands_distilbert_ner_zacarage_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|244.3 MB| + +## References + +https://huggingface.co/Zacarage/bert-to-distilbert-NER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_twitter_english_lost_job_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_twitter_english_lost_job_pipeline_en.md new file mode 100644 index 00000000000000..af6f8c26b98853 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_twitter_english_lost_job_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_twitter_english_lost_job_pipeline pipeline BertForSequenceClassification from worldbank +author: John Snow Labs +name: bert_twitter_english_lost_job_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_twitter_english_lost_job_pipeline` is a English model originally trained by worldbank. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_twitter_english_lost_job_pipeline_en_5.5.0_3.0_1727277530447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_twitter_english_lost_job_pipeline_en_5.5.0_3.0_1727277530447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# any DataFrame with a "text" column can be annotated by the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bert_twitter_english_lost_job_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// any DataFrame with a "text" column can be annotated by the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bert_twitter_english_lost_job_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_twitter_english_lost_job_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.1 MB| + +## References + +https://huggingface.co/worldbank/bert-twitter-en-lost-job + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bertmodel_en.md b/docs/_posts/ahmedlone127/2024-09-25-bertmodel_en.md new file mode 100644 index 00000000000000..403e2ae1fc14db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bertmodel_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bertmodel BertForTokenClassification from sigaldanilov +author: John Snow Labs +name: bertmodel +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertmodel` is a English model originally trained by sigaldanilov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertmodel_en_5.5.0_3.0_1727246283275.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertmodel_en_5.5.0_3.0_1727246283275.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("bertmodel","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bertmodel", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

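
Once fitted, the pipeline can be persisted like any Spark ML model, which avoids re-downloading the annotator on every run. A minimal sketch reusing `pipelineModel` and `data` from the example above (the save path is a placeholder):

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline and reload it later; the path is a placeholder
pipelineModel.write().overwrite().save("/tmp/bertmodel_ner_pipeline")
restored = PipelineModel.load("/tmp/bertmodel_ner_pipeline")
restored.transform(data).select("ner.result").show(truncate=False)
```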
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertmodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/sigaldanilov/bertmodel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bertmodel_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bertmodel_pipeline_en.md new file mode 100644 index 00000000000000..0379e305d9d912 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bertmodel_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bertmodel_pipeline pipeline BertForTokenClassification from sigaldanilov +author: John Snow Labs +name: bertmodel_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertmodel_pipeline` is a English model originally trained by sigaldanilov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertmodel_pipeline_en_5.5.0_3.0_1727246305633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertmodel_pipeline_en_5.5.0_3.0_1727246305633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# any DataFrame with a "text" column can be annotated by the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("bertmodel_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// any DataFrame with a "text" column can be annotated by the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("bertmodel_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertmodel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/sigaldanilov/bertmodel + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-best_model_sst_2_16_21_en.md b/docs/_posts/ahmedlone127/2024-09-25-best_model_sst_2_16_21_en.md new file mode 100644 index 00000000000000..6bfc463485d5cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-best_model_sst_2_16_21_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English best_model_sst_2_16_21 BertForSequenceClassification from simonycl +author: John Snow Labs +name: best_model_sst_2_16_21 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`best_model_sst_2_16_21` is a English model originally trained by simonycl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/best_model_sst_2_16_21_en_5.5.0_3.0_1727266995876.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/best_model_sst_2_16_21_en_5.5.0_3.0_1727266995876.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("best_model_sst_2_16_21","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("best_model_sst_2_16_21", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|best_model_sst_2_16_21| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/simonycl/best_model-sst-2-16-21 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-beto_prescripciones_medicas_es.md b/docs/_posts/ahmedlone127/2024-09-25-beto_prescripciones_medicas_es.md new file mode 100644 index 00000000000000..f8de637f8e1b8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-beto_prescripciones_medicas_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish beto_prescripciones_medicas BertForTokenClassification from ccarvajal +author: John Snow Labs +name: beto_prescripciones_medicas +date: 2024-09-25 +tags: [es, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`beto_prescripciones_medicas` is a Castilian, Spanish model originally trained by ccarvajal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/beto_prescripciones_medicas_es_5.5.0_3.0_1727271089147.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/beto_prescripciones_medicas_es_5.5.0_3.0_1727271089147.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("beto_prescripciones_medicas","es") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("beto_prescripciones_medicas", "es")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

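
Since this is a Spanish model trained for medical prescriptions, a Spanish input is the natural test case. A small sketch reusing the fitted `pipelineModel` from the example above (the sample sentence is illustrative only):

```python
# Illustrative Spanish prescription-style sentence
spanish_data = spark.createDataFrame(
    [["Tomar paracetamol 500 mg cada 8 horas durante 5 dias"]]
).toDF("text")

pipelineModel.transform(spanish_data).select("ner.result").show(truncate=False)
```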
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|beto_prescripciones_medicas| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|409.5 MB| + +## References + +https://huggingface.co/ccarvajal/beto-prescripciones-medicas \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-beto_sentiment_analysis_finetuned_onpremise_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-beto_sentiment_analysis_finetuned_onpremise_pipeline_en.md new file mode 100644 index 00000000000000..f2d6f9c7b9e76a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-beto_sentiment_analysis_finetuned_onpremise_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English beto_sentiment_analysis_finetuned_onpremise_pipeline pipeline BertForSequenceClassification from Cristian-dcg +author: John Snow Labs +name: beto_sentiment_analysis_finetuned_onpremise_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`beto_sentiment_analysis_finetuned_onpremise_pipeline` is a English model originally trained by Cristian-dcg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/beto_sentiment_analysis_finetuned_onpremise_pipeline_en_5.5.0_3.0_1727263946307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/beto_sentiment_analysis_finetuned_onpremise_pipeline_en_5.5.0_3.0_1727263946307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# any DataFrame with a "text" column can be annotated by the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("beto_sentiment_analysis_finetuned_onpremise_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// any DataFrame with a "text" column can be annotated by the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("beto_sentiment_analysis_finetuned_onpremise_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|beto_sentiment_analysis_finetuned_onpremise_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.7 MB| + +## References + +https://huggingface.co/Cristian-dcg/beto-sentiment-analysis-finetuned-onpremise + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-biobert_huner_cell_v1_en.md b/docs/_posts/ahmedlone127/2024-09-25-biobert_huner_cell_v1_en.md new file mode 100644 index 00000000000000..5f2fbb14de4c14 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-biobert_huner_cell_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English biobert_huner_cell_v1 BertForTokenClassification from aitslab +author: John Snow Labs +name: biobert_huner_cell_v1 +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`biobert_huner_cell_v1` is a English model originally trained by aitslab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biobert_huner_cell_v1_en_5.5.0_3.0_1727246222103.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biobert_huner_cell_v1_en_5.5.0_3.0_1727246222103.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("biobert_huner_cell_v1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("biobert_huner_cell_v1", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biobert_huner_cell_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.1 MB| + +## References + +https://huggingface.co/aitslab/biobert_huner_cell_v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-biobert_huner_disease_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-biobert_huner_disease_v1_pipeline_en.md new file mode 100644 index 00000000000000..00ad252e2f275f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-biobert_huner_disease_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English biobert_huner_disease_v1_pipeline pipeline BertForTokenClassification from aitslab +author: John Snow Labs +name: biobert_huner_disease_v1_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`biobert_huner_disease_v1_pipeline` is a English model originally trained by aitslab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biobert_huner_disease_v1_pipeline_en_5.5.0_3.0_1727280441019.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biobert_huner_disease_v1_pipeline_en_5.5.0_3.0_1727280441019.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# any DataFrame with a "text" column can be annotated by the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("biobert_huner_disease_v1_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// any DataFrame with a "text" column can be annotated by the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("biobert_huner_disease_v1_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biobert_huner_disease_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.1 MB| + +## References + +https://huggingface.co/aitslab/biobert_huner_disease_v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-biobit_drugtemist_italian_ner_en.md b/docs/_posts/ahmedlone127/2024-09-25-biobit_drugtemist_italian_ner_en.md new file mode 100644 index 00000000000000..7f3082ae818a42 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-biobit_drugtemist_italian_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English biobit_drugtemist_italian_ner BertForTokenClassification from Rodrigo1771 +author: John Snow Labs +name: biobit_drugtemist_italian_ner +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`biobit_drugtemist_italian_ner` is a English model originally trained by Rodrigo1771. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biobit_drugtemist_italian_ner_en_5.5.0_3.0_1727258955213.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biobit_drugtemist_italian_ner_en_5.5.0_3.0_1727258955213.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("biobit_drugtemist_italian_ner","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("biobit_drugtemist_italian_ner", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biobit_drugtemist_italian_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|409.2 MB| + +## References + +https://huggingface.co/Rodrigo1771/bioBIT-drugtemist-it-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-biomednlp_pubmedbert_proteinstructure_ner_v1_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-biomednlp_pubmedbert_proteinstructure_ner_v1_2_pipeline_en.md new file mode 100644 index 00000000000000..05851fb62635d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-biomednlp_pubmedbert_proteinstructure_ner_v1_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English biomednlp_pubmedbert_proteinstructure_ner_v1_2_pipeline pipeline BertForTokenClassification from PDBEurope +author: John Snow Labs +name: biomednlp_pubmedbert_proteinstructure_ner_v1_2_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`biomednlp_pubmedbert_proteinstructure_ner_v1_2_pipeline` is a English model originally trained by PDBEurope. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biomednlp_pubmedbert_proteinstructure_ner_v1_2_pipeline_en_5.5.0_3.0_1727275559505.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biomednlp_pubmedbert_proteinstructure_ner_v1_2_pipeline_en_5.5.0_3.0_1727275559505.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# any DataFrame with a "text" column can be annotated by the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("biomednlp_pubmedbert_proteinstructure_ner_v1_2_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// any DataFrame with a "text" column can be annotated by the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("biomednlp_pubmedbert_proteinstructure_ner_v1_2_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biomednlp_pubmedbert_proteinstructure_ner_v1_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.3 MB| + +## References + +https://huggingface.co/PDBEurope/BiomedNLP-PubMedBERT-ProteinStructure-NER-v1.2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-boss_toxicity_24000_bert_base_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-25-boss_toxicity_24000_bert_base_uncased_en.md new file mode 100644 index 00000000000000..67a001d658d96a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-boss_toxicity_24000_bert_base_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English boss_toxicity_24000_bert_base_uncased BertForSequenceClassification from Kyle1668 +author: John Snow Labs +name: boss_toxicity_24000_bert_base_uncased +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`boss_toxicity_24000_bert_base_uncased` is a English model originally trained by Kyle1668. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/boss_toxicity_24000_bert_base_uncased_en_5.5.0_3.0_1727264076701.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/boss_toxicity_24000_bert_base_uncased_en_5.5.0_3.0_1727264076701.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("boss_toxicity_24000_bert_base_uncased","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("boss_toxicity_24000_bert_base_uncased", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

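
Spark NLP's sequence-classification annotators expose the label set they were trained with, which is useful when post-processing or aggregating predictions. A small sketch reusing the `sequenceClassifier` from the example above:

```python
# Inspect the label set the classifier was trained with
print(sequenceClassifier.getClasses())
```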
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|boss_toxicity_24000_bert_base_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Kyle1668/boss-toxicity-24000-bert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-burmese_awesome_pubmed_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-burmese_awesome_pubmed_bert_pipeline_en.md new file mode 100644 index 00000000000000..9b08f9f5cf98f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-burmese_awesome_pubmed_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_pubmed_bert_pipeline pipeline BertForTokenClassification from arunavsk1 +author: John Snow Labs +name: burmese_awesome_pubmed_bert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_pubmed_bert_pipeline` is a English model originally trained by arunavsk1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_pubmed_bert_pipeline_en_5.5.0_3.0_1727247467774.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_pubmed_bert_pipeline_en_5.5.0_3.0_1727247467774.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# any DataFrame with a "text" column can be annotated by the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("burmese_awesome_pubmed_bert_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// any DataFrame with a "text" column can be annotated by the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("burmese_awesome_pubmed_bert_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_pubmed_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/arunavsk1/my-awesome-pubmed-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-burmese_awesome_wnut_model_niharikavats2397_en.md b/docs/_posts/ahmedlone127/2024-09-25-burmese_awesome_wnut_model_niharikavats2397_en.md new file mode 100644 index 00000000000000..2a1aa22aaa9985 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-burmese_awesome_wnut_model_niharikavats2397_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_niharikavats2397 BertForTokenClassification from niharikavats2397 +author: John Snow Labs +name: burmese_awesome_wnut_model_niharikavats2397 +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_niharikavats2397` is a English model originally trained by niharikavats2397. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_niharikavats2397_en_5.5.0_3.0_1727246600387.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_niharikavats2397_en_5.5.0_3.0_1727246600387.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("burmese_awesome_wnut_model_niharikavats2397","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("burmese_awesome_wnut_model_niharikavats2397", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
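
Each value in the `ner` output column is an array of Spark NLP annotation structs, so the predicted tags can be read from its `result` field. A minimal sketch using the `pipelineDF` built above:

```python
# Show the input text next to the predicted token-level labels
pipelineDF.select("text", "ner.result").show(truncate=False)
```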
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_niharikavats2397| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.3 MB| + +## References + +https://huggingface.co/niharikavats2397/my_awesome_wnut_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-burmese_awesome_wnut_model_niharikavats2397_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-burmese_awesome_wnut_model_niharikavats2397_pipeline_en.md new file mode 100644 index 00000000000000..3929854427a80c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-burmese_awesome_wnut_model_niharikavats2397_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_niharikavats2397_pipeline pipeline BertForTokenClassification from niharikavats2397 +author: John Snow Labs +name: burmese_awesome_wnut_model_niharikavats2397_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_niharikavats2397_pipeline` is a English model originally trained by niharikavats2397. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_niharikavats2397_pipeline_en_5.5.0_3.0_1727246621542.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_niharikavats2397_pipeline_en_5.5.0_3.0_1727246621542.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("burmese_awesome_wnut_model_niharikavats2397_pipeline", lang = "en")

# example input; any DataFrame with a "text" column works
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("burmese_awesome_wnut_model_niharikavats2397_pipeline", lang = "en")

// example input; any DataFrame with a "text" column works
val df = Seq("I love spark-nlp").toDS.toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_niharikavats2397_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.4 MB| + +## References + +https://huggingface.co/niharikavats2397/my_awesome_wnut_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-camedbert_512_fl32_checkpoint_17386_de.md b/docs/_posts/ahmedlone127/2024-09-25-camedbert_512_fl32_checkpoint_17386_de.md new file mode 100644 index 00000000000000..90cb764113722c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-camedbert_512_fl32_checkpoint_17386_de.md @@ -0,0 +1,94 @@ +--- +layout: model +title: German camedbert_512_fl32_checkpoint_17386 BertForTokenClassification from MSey +author: John Snow Labs +name: camedbert_512_fl32_checkpoint_17386 +date: 2024-09-25 +tags: [de, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`camedbert_512_fl32_checkpoint_17386` is a German model originally trained by MSey. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/camedbert_512_fl32_checkpoint_17386_de_5.5.0_3.0_1727247124891.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/camedbert_512_fl32_checkpoint_17386_de_5.5.0_3.0_1727247124891.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("camedbert_512_fl32_checkpoint_17386","de") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("camedbert_512_fl32_checkpoint_17386", "de")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|camedbert_512_fl32_checkpoint_17386| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|406.9 MB| + +## References + +https://huggingface.co/MSey/CaMedBERT-512_fl32_checkpoint-17386 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-cares_bert_base_en.md b/docs/_posts/ahmedlone127/2024-09-25-cares_bert_base_en.md new file mode 100644 index 00000000000000..9bbf09f69c784a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-cares_bert_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cares_bert_base BertForSequenceClassification from chizhikchi +author: John Snow Labs +name: cares_bert_base +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cares_bert_base` is a English model originally trained by chizhikchi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cares_bert_base_en_5.5.0_3.0_1727285262658.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cares_bert_base_en_5.5.0_3.0_1727285262658.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("cares_bert_base","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("cares_bert_base", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
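
The predicted label for each row lands in the `result` field of the `class` column; when the model exposes them, per-label confidence scores are kept in the annotation `metadata`. A small sketch using the `pipelineDF` from above:

```python
# Predicted class label per input row
pipelineDF.select("text", "class.result").show(truncate=False)

# Full annotation structs, including metadata with any confidence scores
pipelineDF.select("class.metadata").show(truncate=False)
```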
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cares_bert_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|411.7 MB| + +## References + +https://huggingface.co/chizhikchi/cares-bert-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-case_analysis_inlegalbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-case_analysis_inlegalbert_pipeline_en.md new file mode 100644 index 00000000000000..1922fc82ed74f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-case_analysis_inlegalbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English case_analysis_inlegalbert_pipeline pipeline BertForSequenceClassification from cite-text-analysis +author: John Snow Labs +name: case_analysis_inlegalbert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`case_analysis_inlegalbert_pipeline` is a English model originally trained by cite-text-analysis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/case_analysis_inlegalbert_pipeline_en_5.5.0_3.0_1727263755028.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/case_analysis_inlegalbert_pipeline_en_5.5.0_3.0_1727263755028.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("case_analysis_inlegalbert_pipeline", lang = "en")

# example input; any DataFrame with a "text" column works
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("case_analysis_inlegalbert_pipeline", lang = "en")

// example input; any DataFrame with a "text" column works
val df = Seq("I love spark-nlp").toDS.toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|case_analysis_inlegalbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/cite-text-analysis/case-analysis-InLegalBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-chilean_spanish_hate_speech_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-25-chilean_spanish_hate_speech_pipeline_es.md new file mode 100644 index 00000000000000..9252683571bbce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-chilean_spanish_hate_speech_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish chilean_spanish_hate_speech_pipeline pipeline BertForSequenceClassification from jorgeortizfuentes +author: John Snow Labs +name: chilean_spanish_hate_speech_pipeline +date: 2024-09-25 +tags: [es, open_source, pipeline, onnx] +task: Text Classification +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chilean_spanish_hate_speech_pipeline` is a Castilian, Spanish model originally trained by jorgeortizfuentes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chilean_spanish_hate_speech_pipeline_es_5.5.0_3.0_1727245880418.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chilean_spanish_hate_speech_pipeline_es_5.5.0_3.0_1727245880418.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("chilean_spanish_hate_speech_pipeline", lang = "es")

# example input; any DataFrame with a "text" column works
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("chilean_spanish_hate_speech_pipeline", lang = "es")

// example input; any DataFrame with a "text" column works
val df = Seq("I love spark-nlp").toDS.toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chilean_spanish_hate_speech_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|411.6 MB| + +## References + +https://huggingface.co/jorgeortizfuentes/chilean-spanish-hate-speech + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ckiplab_albert_base_chinese_david_ner_en.md b/docs/_posts/ahmedlone127/2024-09-25-ckiplab_albert_base_chinese_david_ner_en.md new file mode 100644 index 00000000000000..bbbb36198ca9e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ckiplab_albert_base_chinese_david_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ckiplab_albert_base_chinese_david_ner BertForTokenClassification from davidliu1110 +author: John Snow Labs +name: ckiplab_albert_base_chinese_david_ner +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ckiplab_albert_base_chinese_david_ner` is a English model originally trained by davidliu1110. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ckiplab_albert_base_chinese_david_ner_en_5.5.0_3.0_1727249669813.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ckiplab_albert_base_chinese_david_ner_en_5.5.0_3.0_1727249669813.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("ckiplab_albert_base_chinese_david_ner","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("ckiplab_albert_base_chinese_david_ner", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ckiplab_albert_base_chinese_david_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|37.6 MB| + +## References + +https://huggingface.co/davidliu1110/ckiplab-albert-base-chinese-david-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ckiplab_albert_base_chinese_david_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-ckiplab_albert_base_chinese_david_ner_pipeline_en.md new file mode 100644 index 00000000000000..05fc000044d35e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ckiplab_albert_base_chinese_david_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ckiplab_albert_base_chinese_david_ner_pipeline pipeline BertForTokenClassification from davidliu1110 +author: John Snow Labs +name: ckiplab_albert_base_chinese_david_ner_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ckiplab_albert_base_chinese_david_ner_pipeline` is a English model originally trained by davidliu1110. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ckiplab_albert_base_chinese_david_ner_pipeline_en_5.5.0_3.0_1727249671995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ckiplab_albert_base_chinese_david_ner_pipeline_en_5.5.0_3.0_1727249671995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("ckiplab_albert_base_chinese_david_ner_pipeline", lang = "en")

# example input; any DataFrame with a "text" column works
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("ckiplab_albert_base_chinese_david_ner_pipeline", lang = "en")

// example input; any DataFrame with a "text" column works
val df = Seq("I love spark-nlp").toDS.toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ckiplab_albert_base_chinese_david_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|37.6 MB| + +## References + +https://huggingface.co/davidliu1110/ckiplab-albert-base-chinese-david-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-clasificador_poem_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-clasificador_poem_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..f2ac974be0c5b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-clasificador_poem_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English clasificador_poem_sentiment_pipeline pipeline BertForSequenceClassification from joheras +author: John Snow Labs +name: clasificador_poem_sentiment_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clasificador_poem_sentiment_pipeline` is a English model originally trained by joheras. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clasificador_poem_sentiment_pipeline_en_5.5.0_3.0_1727276929740.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clasificador_poem_sentiment_pipeline_en_5.5.0_3.0_1727276929740.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("clasificador_poem_sentiment_pipeline", lang = "en")

# example input; any DataFrame with a "text" column works
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("clasificador_poem_sentiment_pipeline", lang = "en")

// example input; any DataFrame with a "text" column works
val df = Seq("I love spark-nlp").toDS.toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clasificador_poem_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/joheras/clasificador-poem-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-classicalchineseletterclassification_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-25-classicalchineseletterclassification_pipeline_zh.md new file mode 100644 index 00000000000000..c79141b9f062bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-classicalchineseletterclassification_pipeline_zh.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Chinese classicalchineseletterclassification_pipeline pipeline BertForSequenceClassification from cbdb +author: John Snow Labs +name: classicalchineseletterclassification_pipeline +date: 2024-09-25 +tags: [zh, open_source, pipeline, onnx] +task: Text Classification +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classicalchineseletterclassification_pipeline` is a Chinese model originally trained by cbdb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classicalchineseletterclassification_pipeline_zh_5.5.0_3.0_1727267123675.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classicalchineseletterclassification_pipeline_zh_5.5.0_3.0_1727267123675.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("classicalchineseletterclassification_pipeline", lang = "zh")

# example input; any DataFrame with a "text" column works
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("classicalchineseletterclassification_pipeline", lang = "zh")

// example input; any DataFrame with a "text" column works
val df = Seq("I love spark-nlp").toDS.toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classicalchineseletterclassification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|383.3 MB| + +## References + +https://huggingface.co/cbdb/ClassicalChineseLetterClassification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-classicalchineseletterclassification_zh.md b/docs/_posts/ahmedlone127/2024-09-25-classicalchineseletterclassification_zh.md new file mode 100644 index 00000000000000..a582014c71ffcd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-classicalchineseletterclassification_zh.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Chinese classicalchineseletterclassification BertForSequenceClassification from cbdb +author: John Snow Labs +name: classicalchineseletterclassification +date: 2024-09-25 +tags: [zh, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classicalchineseletterclassification` is a Chinese model originally trained by cbdb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classicalchineseletterclassification_zh_5.5.0_3.0_1727267102616.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classicalchineseletterclassification_zh_5.5.0_3.0_1727267102616.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("classicalchineseletterclassification","zh") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("classicalchineseletterclassification", "zh")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classicalchineseletterclassification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|zh| +|Size:|383.3 MB| + +## References + +https://huggingface.co/cbdb/ClassicalChineseLetterClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-classifier_theojolliffe_en.md b/docs/_posts/ahmedlone127/2024-09-25-classifier_theojolliffe_en.md new file mode 100644 index 00000000000000..1aebe4faee0112 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-classifier_theojolliffe_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English classifier_theojolliffe BertForSequenceClassification from theojolliffe +author: John Snow Labs +name: classifier_theojolliffe +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classifier_theojolliffe` is a English model originally trained by theojolliffe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classifier_theojolliffe_en_5.5.0_3.0_1727266697558.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classifier_theojolliffe_en_5.5.0_3.0_1727266697558.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("classifier_theojolliffe","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("classifier_theojolliffe", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classifier_theojolliffe| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|415.8 MB| + +## References + +https://huggingface.co/theojolliffe/classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-classifier_theojolliffe_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-classifier_theojolliffe_pipeline_en.md new file mode 100644 index 00000000000000..bcc635aac4bb00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-classifier_theojolliffe_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English classifier_theojolliffe_pipeline pipeline BertForSequenceClassification from theojolliffe +author: John Snow Labs +name: classifier_theojolliffe_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classifier_theojolliffe_pipeline` is a English model originally trained by theojolliffe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classifier_theojolliffe_pipeline_en_5.5.0_3.0_1727266719676.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classifier_theojolliffe_pipeline_en_5.5.0_3.0_1727266719676.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("classifier_theojolliffe_pipeline", lang = "en")

# example input; any DataFrame with a "text" column works
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("classifier_theojolliffe_pipeline", lang = "en")

// example input; any DataFrame with a "text" column works
val df = Seq("I love spark-nlp").toDS.toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classifier_theojolliffe_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|415.8 MB| + +## References + +https://huggingface.co/theojolliffe/classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-clinicalbert_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-25-clinicalbert_finetuned_en.md new file mode 100644 index 00000000000000..c02c9d611d0f00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-clinicalbert_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English clinicalbert_finetuned BertForSequenceClassification from SrinivasaPragada +author: John Snow Labs +name: clinicalbert_finetuned +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clinicalbert_finetuned` is a English model originally trained by SrinivasaPragada. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinicalbert_finetuned_en_5.5.0_3.0_1727254489266.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinicalbert_finetuned_en_5.5.0_3.0_1727254489266.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("clinicalbert_finetuned","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("clinicalbert_finetuned", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinicalbert_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.6 MB| + +## References + +https://huggingface.co/SrinivasaPragada/clinicalbert-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-cnc_v2_st1_csc_en.md b/docs/_posts/ahmedlone127/2024-09-25-cnc_v2_st1_csc_en.md new file mode 100644 index 00000000000000..e6ae349a4272b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-cnc_v2_st1_csc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cnc_v2_st1_csc BertForSequenceClassification from tanfiona +author: John Snow Labs +name: cnc_v2_st1_csc +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cnc_v2_st1_csc` is a English model originally trained by tanfiona. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cnc_v2_st1_csc_en_5.5.0_3.0_1727269584466.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cnc_v2_st1_csc_en_5.5.0_3.0_1727269584466.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("cnc_v2_st1_csc","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("cnc_v2_st1_csc", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cnc_v2_st1_csc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/tanfiona/cnc-v2-st1-csc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-cnc_v2_st1_csc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-cnc_v2_st1_csc_pipeline_en.md new file mode 100644 index 00000000000000..29cc011c2c3ed5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-cnc_v2_st1_csc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cnc_v2_st1_csc_pipeline pipeline BertForSequenceClassification from tanfiona +author: John Snow Labs +name: cnc_v2_st1_csc_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cnc_v2_st1_csc_pipeline` is a English model originally trained by tanfiona. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cnc_v2_st1_csc_pipeline_en_5.5.0_3.0_1727269605981.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cnc_v2_st1_csc_pipeline_en_5.5.0_3.0_1727269605981.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("cnc_v2_st1_csc_pipeline", lang = "en")

# example input; any DataFrame with a "text" column works
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("cnc_v2_st1_csc_pipeline", lang = "en")

// example input; any DataFrame with a "text" column works
val df = Seq("I love spark-nlp").toDS.toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cnc_v2_st1_csc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/tanfiona/cnc-v2-st1-csc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-conjunction_classification_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-25-conjunction_classification_finetuned_en.md new file mode 100644 index 00000000000000..847861e24ace54 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-conjunction_classification_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English conjunction_classification_finetuned BertForSequenceClassification from nhanpv +author: John Snow Labs +name: conjunction_classification_finetuned +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`conjunction_classification_finetuned` is a English model originally trained by nhanpv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/conjunction_classification_finetuned_en_5.5.0_3.0_1727288691878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/conjunction_classification_finetuned_en_5.5.0_3.0_1727288691878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("conjunction_classification_finetuned","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("conjunction_classification_finetuned", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|conjunction_classification_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/nhanpv/conjunction-classification-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-conjunction_classification_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-conjunction_classification_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..ebc661c1740428 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-conjunction_classification_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English conjunction_classification_finetuned_pipeline pipeline BertForSequenceClassification from nhanpv +author: John Snow Labs +name: conjunction_classification_finetuned_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`conjunction_classification_finetuned_pipeline` is a English model originally trained by nhanpv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/conjunction_classification_finetuned_pipeline_en_5.5.0_3.0_1727288718154.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/conjunction_classification_finetuned_pipeline_en_5.5.0_3.0_1727288718154.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("conjunction_classification_finetuned_pipeline", lang = "en")

# example input; any DataFrame with a "text" column works
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("conjunction_classification_finetuned_pipeline", lang = "en")

// example input; any DataFrame with a "text" column works
val df = Seq("I love spark-nlp").toDS.toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|conjunction_classification_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/nhanpv/conjunction-classification-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-consumer_complaint_categorization_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-consumer_complaint_categorization_pipeline_en.md new file mode 100644 index 00000000000000..4db5499dfc12a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-consumer_complaint_categorization_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English consumer_complaint_categorization_pipeline pipeline BertForSequenceClassification from ThirdEyeData +author: John Snow Labs +name: consumer_complaint_categorization_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`consumer_complaint_categorization_pipeline` is a English model originally trained by ThirdEyeData. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/consumer_complaint_categorization_pipeline_en_5.5.0_3.0_1727245845783.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/consumer_complaint_categorization_pipeline_en_5.5.0_3.0_1727245845783.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("consumer_complaint_categorization_pipeline", lang = "en")

# example input; any DataFrame with a "text" column works
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("consumer_complaint_categorization_pipeline", lang = "en")

// example input; any DataFrame with a "text" column works
val df = Seq("I love spark-nlp").toDS.toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|consumer_complaint_categorization_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ThirdEyeData/Consumer-Complaint-Categorization + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_en.md b/docs/_posts/ahmedlone127/2024-09-25-correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_en.md new file mode 100644 index 00000000000000..d3209048a792f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47 BertForTokenClassification from ali2066 +author: John Snow Labs +name: correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47 +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_en_5.5.0_3.0_1727270650498.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_en_5.5.0_3.0_1727270650498.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ali2066/correct_BERT_token_itr0_0.0001_essays_01_03_2022-15_48_47 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_pipeline_en.md new file mode 100644 index 00000000000000..a068e7fde012a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_pipeline pipeline BertForTokenClassification from ali2066 +author: John Snow Labs +name: correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_pipeline` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_pipeline_en_5.5.0_3.0_1727270671729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_pipeline_en_5.5.0_3.0_1727270671729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_pipeline", lang = "en")

# example input; any DataFrame with a "text" column works
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_pipeline", lang = "en")

// example input; any DataFrame with a "text" column works
val df = Seq("I love spark-nlp").toDS.toDF("text")
val annotations = pipeline.transform(df)
```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ali2066/correct_BERT_token_itr0_0.0001_essays_01_03_2022-15_48_47 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-crypto_sentiment_analysis_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-crypto_sentiment_analysis_bert_pipeline_en.md new file mode 100644 index 00000000000000..53efac5f0b324c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-crypto_sentiment_analysis_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English crypto_sentiment_analysis_bert_pipeline pipeline BertForSequenceClassification from Robertuus +author: John Snow Labs +name: crypto_sentiment_analysis_bert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`crypto_sentiment_analysis_bert_pipeline` is a English model originally trained by Robertuus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/crypto_sentiment_analysis_bert_pipeline_en_5.5.0_3.0_1727285070915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/crypto_sentiment_analysis_bert_pipeline_en_5.5.0_3.0_1727285070915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("crypto_sentiment_analysis_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("crypto_sentiment_analysis_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|crypto_sentiment_analysis_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Robertuus/Crypto_Sentiment_Analysis_Bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-crypto_sentiment_en.md b/docs/_posts/ahmedlone127/2024-09-25-crypto_sentiment_en.md new file mode 100644 index 00000000000000..5f3f66a2a1c48c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-crypto_sentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English crypto_sentiment BertForSequenceClassification from ckandemir +author: John Snow Labs +name: crypto_sentiment +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`crypto_sentiment` is a English model originally trained by ckandemir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/crypto_sentiment_en_5.5.0_3.0_1727268448023.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/crypto_sentiment_en_5.5.0_3.0_1727268448023.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("crypto_sentiment","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("crypto_sentiment", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
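+A small follow-up sketch for reading the predictions out of the `pipelineDF` produced above; `class` is the output column configured on `sequenceClassifier`, and listing the label set via `getClasses()` is an assumption about what this annotator exposes.
+
+```python
+# Predicted label(s) per input row
+pipelineDF.select("text", "class.result").show(truncate=False)
+
+# Labels the classifier was exported with (assumption: supported by this annotator version)
+print(sequenceClassifier.getClasses())
+```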
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|crypto_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ckandemir/crypto_sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-crypto_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-crypto_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..7efc59467d69ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-crypto_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English crypto_sentiment_pipeline pipeline BertForSequenceClassification from ckandemir +author: John Snow Labs +name: crypto_sentiment_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`crypto_sentiment_pipeline` is a English model originally trained by ckandemir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/crypto_sentiment_pipeline_en_5.5.0_3.0_1727268470814.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/crypto_sentiment_pipeline_en_5.5.0_3.0_1727268470814.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is a Spark DataFrame with a "text" column
+pipeline = PretrainedPipeline("crypto_sentiment_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is a Spark DataFrame with a "text" column
+val pipeline = new PretrainedPipeline("crypto_sentiment_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
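+For quick checks on single strings, `PretrainedPipeline` also exposes `annotate`, which skips building a DataFrame. A minimal sketch reusing the `pipeline` object loaded above, with a hypothetical sentence; the `class` key assumes the bundled classifier keeps its default output column name.
+
+```python
+# annotate() returns a dict keyed by the pipeline's output columns
+result = pipeline.annotate("Bitcoin rallied sharply after the announcement.")
+print(result["class"])
+```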
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|crypto_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ckandemir/crypto_sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-cvai_bert_asag_en.md b/docs/_posts/ahmedlone127/2024-09-25-cvai_bert_asag_en.md new file mode 100644 index 00000000000000..cfc47fff6376a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-cvai_bert_asag_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cvai_bert_asag BertForSequenceClassification from johnpaulbin +author: John Snow Labs +name: cvai_bert_asag +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cvai_bert_asag` is a English model originally trained by johnpaulbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cvai_bert_asag_en_5.5.0_3.0_1727286207987.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cvai_bert_asag_en_5.5.0_3.0_1727286207987.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("cvai_bert_asag","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("cvai_bert_asag", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
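+Once fitted, the whole pipeline from the example above can be persisted with standard Spark ML utilities and reloaded later without downloading the model again. A minimal sketch; the save path is illustrative.
+
+```python
+from pyspark.ml import PipelineModel
+
+# Persist the fitted pipeline from the example above
+pipelineModel.write().overwrite().save("/tmp/cvai_bert_asag_pipeline_model")
+
+# Reload and reuse it for scoring
+restored = PipelineModel.load("/tmp/cvai_bert_asag_pipeline_model")
+restored.transform(data).select("class.result").show(truncate=False)
+```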
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cvai_bert_asag| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/johnpaulbin/cvai-bert-asag \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-cvai_bert_asag_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-cvai_bert_asag_pipeline_en.md new file mode 100644 index 00000000000000..bc604c5c661f39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-cvai_bert_asag_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cvai_bert_asag_pipeline pipeline BertForSequenceClassification from johnpaulbin +author: John Snow Labs +name: cvai_bert_asag_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cvai_bert_asag_pipeline` is a English model originally trained by johnpaulbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cvai_bert_asag_pipeline_en_5.5.0_3.0_1727286229187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cvai_bert_asag_pipeline_en_5.5.0_3.0_1727286229187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cvai_bert_asag_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cvai_bert_asag_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cvai_bert_asag_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/johnpaulbin/cvai-bert-asag + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-danish_bert_en.md b/docs/_posts/ahmedlone127/2024-09-25-danish_bert_en.md new file mode 100644 index 00000000000000..63176340962c83 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-danish_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English danish_bert BertEmbeddings from iolariu +author: John Snow Labs +name: danish_bert +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`danish_bert` is a English model originally trained by iolariu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/danish_bert_en_5.5.0_3.0_1727232586705.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/danish_bert_en_5.5.0_3.0_1727232586705.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("danish_bert","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("danish_bert","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
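+The `embeddings` column produced above holds one annotation per token. A minimal sketch for pulling out the raw vectors from the `pipelineDF` result, assuming the standard Spark NLP annotation schema where `result` is the token text and `embeddings` is the float vector.
+
+```python
+from pyspark.sql import functions as F
+
+# Explode to one token per row, then keep the token text and its vector
+token_vectors = pipelineDF \
+    .select(F.explode("embeddings").alias("ann")) \
+    .select(F.col("ann.result").alias("token"), F.col("ann.embeddings").alias("vector"))
+
+token_vectors.show(truncate=False)
+```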
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|danish_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|406.0 MB| + +## References + +https://huggingface.co/iolariu/DA_BERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-dbpedia_classes_bert_base_uncased_few_20_en.md b/docs/_posts/ahmedlone127/2024-09-25-dbpedia_classes_bert_base_uncased_few_20_en.md new file mode 100644 index 00000000000000..21acbd6c8e0b04 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-dbpedia_classes_bert_base_uncased_few_20_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dbpedia_classes_bert_base_uncased_few_20 BertForSequenceClassification from TheChickenAgent +author: John Snow Labs +name: dbpedia_classes_bert_base_uncased_few_20 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dbpedia_classes_bert_base_uncased_few_20` is a English model originally trained by TheChickenAgent. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dbpedia_classes_bert_base_uncased_few_20_en_5.5.0_3.0_1727286884203.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dbpedia_classes_bert_base_uncased_few_20_en_5.5.0_3.0_1727286884203.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("dbpedia_classes_bert_base_uncased_few_20","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("dbpedia_classes_bert_base_uncased_few_20", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dbpedia_classes_bert_base_uncased_few_20| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/TheChickenAgent/DBPedia_Classes_BERT-base-uncased-few-20 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-dbpedia_classes_bert_base_uncased_few_20_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-dbpedia_classes_bert_base_uncased_few_20_pipeline_en.md new file mode 100644 index 00000000000000..d167e65599cad0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-dbpedia_classes_bert_base_uncased_few_20_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dbpedia_classes_bert_base_uncased_few_20_pipeline pipeline BertForSequenceClassification from TheChickenAgent +author: John Snow Labs +name: dbpedia_classes_bert_base_uncased_few_20_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dbpedia_classes_bert_base_uncased_few_20_pipeline` is a English model originally trained by TheChickenAgent. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dbpedia_classes_bert_base_uncased_few_20_pipeline_en_5.5.0_3.0_1727286905887.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dbpedia_classes_bert_base_uncased_few_20_pipeline_en_5.5.0_3.0_1727286905887.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dbpedia_classes_bert_base_uncased_few_20_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dbpedia_classes_bert_base_uncased_few_20_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dbpedia_classes_bert_base_uncased_few_20_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/TheChickenAgent/DBPedia_Classes_BERT-base-uncased-few-20 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-decision_bert_bio_en.md b/docs/_posts/ahmedlone127/2024-09-25-decision_bert_bio_en.md new file mode 100644 index 00000000000000..0dc7ac2fa1add4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-decision_bert_bio_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English decision_bert_bio BertForSequenceClassification from k-partha +author: John Snow Labs +name: decision_bert_bio +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`decision_bert_bio` is a English model originally trained by k-partha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/decision_bert_bio_en_5.5.0_3.0_1727273102482.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/decision_bert_bio_en_5.5.0_3.0_1727273102482.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("decision_bert_bio","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("decision_bert_bio", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
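+For low-latency inference on individual texts, the fitted pipeline can be wrapped in a `LightPipeline`, which works on plain Python strings instead of DataFrames. A minimal sketch reusing `pipelineModel` from the example above; the input sentence is hypothetical.
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+
+# Returns a dict keyed by the pipeline's output columns
+result = light.annotate("She weighs every option carefully before committing.")
+print(result["class"])
+```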
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|decision_bert_bio| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/k-partha/decision_bert_bio \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-decision_bert_bio_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-decision_bert_bio_pipeline_en.md new file mode 100644 index 00000000000000..e74c2559d67dba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-decision_bert_bio_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English decision_bert_bio_pipeline pipeline BertForSequenceClassification from k-partha +author: John Snow Labs +name: decision_bert_bio_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`decision_bert_bio_pipeline` is a English model originally trained by k-partha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/decision_bert_bio_pipeline_en_5.5.0_3.0_1727273137493.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/decision_bert_bio_pipeline_en_5.5.0_3.0_1727273137493.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("decision_bert_bio_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("decision_bert_bio_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|decision_bert_bio_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/k-partha/decision_bert_bio + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-destractive_context_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-destractive_context_pipeline_en.md new file mode 100644 index 00000000000000..63a93aadc03014 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-destractive_context_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English destractive_context_pipeline pipeline BertForSequenceClassification from Vlad1m +author: John Snow Labs +name: destractive_context_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`destractive_context_pipeline` is a English model originally trained by Vlad1m. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/destractive_context_pipeline_en_5.5.0_3.0_1727261275422.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/destractive_context_pipeline_en_5.5.0_3.0_1727261275422.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("destractive_context_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("destractive_context_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|destractive_context_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.6 GB| + +## References + +https://huggingface.co/Vlad1m/destractive_context + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-dialect_msa_detection_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-dialect_msa_detection_pipeline_en.md new file mode 100644 index 00000000000000..bbed8ef2e34011 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-dialect_msa_detection_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dialect_msa_detection_pipeline pipeline XlmRoBertaForSequenceClassification from sadanyh +author: John Snow Labs +name: dialect_msa_detection_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dialect_msa_detection_pipeline` is a English model originally trained by sadanyh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dialect_msa_detection_pipeline_en_5.5.0_3.0_1727229668451.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dialect_msa_detection_pipeline_en_5.5.0_3.0_1727229668451.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dialect_msa_detection_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dialect_msa_detection_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dialect_msa_detection_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|782.4 MB| + +## References + +https://huggingface.co/sadanyh/Dialect-MSA-detection + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-dialogue_final_model_en.md b/docs/_posts/ahmedlone127/2024-09-25-dialogue_final_model_en.md new file mode 100644 index 00000000000000..fcec274221a7cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-dialogue_final_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dialogue_final_model BertForSequenceClassification from SharonTudi +author: John Snow Labs +name: dialogue_final_model +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dialogue_final_model` is a English model originally trained by SharonTudi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dialogue_final_model_en_5.5.0_3.0_1727288753786.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dialogue_final_model_en_5.5.0_3.0_1727288753786.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("dialogue_final_model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("dialogue_final_model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dialogue_final_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/SharonTudi/DIALOGUE_final_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-distilbert_base_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-distilbert_base_cased_en.md new file mode 100644 index 00000000000000..88e6c4ebf0be21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-distilbert_base_cased_en.md @@ -0,0 +1,96 @@ +--- +layout: model +title: DistilBERT base model (cased) +author: John Snow Labs +name: distilbert_base_cased +date: 2024-09-25 +tags: [distilbert, en, english, open_source, embeddings, onnx, openvino] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: openvino +annotator: DistilBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This model is a distilled version of the [BERT base model](https://huggingface.co/bert-base-cased). It was introduced in [this paper](https://arxiv.org/abs/1910.01108). The code for the distillation process can be found [here](https://github.com/huggingface/transformers/tree/master/examples/research_projects/distillation). This model is cased: it does make a difference between english and English. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_en_5.5.0_3.0_1727268763405.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_en_5.5.0_3.0_1727268763405.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + +{:.model-param} + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +embeddings = DistilBertEmbeddings.pretrained("distilbert_base_cased", "en") \ +.setInputCols("sentence", "token") \ +.setOutputCol("embeddings") +nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings]) +``` +```scala +val embeddings = DistilBertEmbeddings.pretrained("distilbert_base_cased", "en") +.setInputCols("sentence", "token") +.setOutputCol("embeddings") +val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings)) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.embed.distilbert").predict("""Put your text here.""") +``` +
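+The snippet above assumes `document_assembler`, `sentence_detector`, and `tokenizer` are already defined. A minimal sketch of those upstream stages with the standard Spark NLP annotators, matching the column names the embeddings stage expects.
+
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import SentenceDetector, Tokenizer
+
+document_assembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentence_detector = SentenceDetector() \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("token")
+```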
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|243.6 MB| +|Case sensitive:|false| +|Max sentence length:|512| + +## References + +References + +[https://huggingface.co/distilbert-base-cased](https://huggingface.co/distilbert-base-cased) + +## Benchmarking + +```bash + +Benchmarking + + +When fine-tuned on downstream tasks, this model achieves the following results: + +Glue test results: + +| Task | MNLI | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE | +|:----:|:----:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:| +| | 81.5 | 87.8 | 88.2 | 90.4 | 47.2 | 85.5 | 85.6 | 60.6 | +``` \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-distilbert_base_uncased_accelerate_en.md b/docs/_posts/ahmedlone127/2024-09-25-distilbert_base_uncased_accelerate_en.md new file mode 100644 index 00000000000000..d887395f391ff2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-distilbert_base_uncased_accelerate_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_accelerate BertForTokenClassification from NSandra +author: John Snow Labs +name: distilbert_base_uncased_accelerate +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_accelerate` is a English model originally trained by NSandra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_accelerate_en_5.5.0_3.0_1727283620759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_accelerate_en_5.5.0_3.0_1727283620759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("distilbert_base_uncased_accelerate","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("distilbert_base_uncased_accelerate", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_accelerate| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/NSandra/distilbert-base-uncased-accelerate \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-distilbert_emotion_yenicerisgk_en.md b/docs/_posts/ahmedlone127/2024-09-25-distilbert_emotion_yenicerisgk_en.md new file mode 100644 index 00000000000000..b6bfbacbce2401 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-distilbert_emotion_yenicerisgk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotion_yenicerisgk BertForSequenceClassification from yeniceriSGK +author: John Snow Labs +name: distilbert_emotion_yenicerisgk +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_yenicerisgk` is a English model originally trained by yeniceriSGK. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_yenicerisgk_en_5.5.0_3.0_1727237438555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_yenicerisgk_en_5.5.0_3.0_1727237438555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("distilbert_emotion_yenicerisgk","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("distilbert_emotion_yenicerisgk", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_yenicerisgk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yeniceriSGK/distilbert-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-distilbert_emotion_yenicerisgk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-distilbert_emotion_yenicerisgk_pipeline_en.md new file mode 100644 index 00000000000000..ee22b1dc3ca919 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-distilbert_emotion_yenicerisgk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_emotion_yenicerisgk_pipeline pipeline BertForSequenceClassification from yeniceriSGK +author: John Snow Labs +name: distilbert_emotion_yenicerisgk_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_yenicerisgk_pipeline` is a English model originally trained by yeniceriSGK. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_yenicerisgk_pipeline_en_5.5.0_3.0_1727237459844.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_yenicerisgk_pipeline_en_5.5.0_3.0_1727237459844.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_emotion_yenicerisgk_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_emotion_yenicerisgk_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_yenicerisgk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yeniceriSGK/distilbert-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-distilbert_finetuned_imdb_neural_net_rahul_en.md b/docs/_posts/ahmedlone127/2024-09-25-distilbert_finetuned_imdb_neural_net_rahul_en.md new file mode 100644 index 00000000000000..f043707d37e842 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-distilbert_finetuned_imdb_neural_net_rahul_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_finetuned_imdb_neural_net_rahul BertEmbeddings from neural-net-rahul +author: John Snow Labs +name: distilbert_finetuned_imdb_neural_net_rahul +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_imdb_neural_net_rahul` is a English model originally trained by neural-net-rahul. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_neural_net_rahul_en_5.5.0_3.0_1727231514095.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_neural_net_rahul_en_5.5.0_3.0_1727231514095.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("distilbert_finetuned_imdb_neural_net_rahul","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("distilbert_finetuned_imdb_neural_net_rahul","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_imdb_neural_net_rahul| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/neural-net-rahul/distilbert-finetuned-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-distilbert_finetuned_imdb_neural_net_rahul_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-distilbert_finetuned_imdb_neural_net_rahul_pipeline_en.md new file mode 100644 index 00000000000000..fe06b8f1f3fb92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-distilbert_finetuned_imdb_neural_net_rahul_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_finetuned_imdb_neural_net_rahul_pipeline pipeline BertEmbeddings from neural-net-rahul +author: John Snow Labs +name: distilbert_finetuned_imdb_neural_net_rahul_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_imdb_neural_net_rahul_pipeline` is a English model originally trained by neural-net-rahul. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_neural_net_rahul_pipeline_en_5.5.0_3.0_1727231535128.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_neural_net_rahul_pipeline_en_5.5.0_3.0_1727231535128.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_finetuned_imdb_neural_net_rahul_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_finetuned_imdb_neural_net_rahul_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_imdb_neural_net_rahul_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/neural-net-rahul/distilbert-finetuned-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-distilbert_portuguese_cased_finetuned_quantity_en.md b/docs/_posts/ahmedlone127/2024-09-25-distilbert_portuguese_cased_finetuned_quantity_en.md new file mode 100644 index 00000000000000..7a960caae1d799 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-distilbert_portuguese_cased_finetuned_quantity_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English distilbert_portuguese_cased_finetuned_quantity BertForSequenceClassification from alexia20816 +author: John Snow Labs +name: distilbert_portuguese_cased_finetuned_quantity +date: 2024-09-25 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_portuguese_cased_finetuned_quantity` is a English model originally trained by alexia20816. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_portuguese_cased_finetuned_quantity_en_5.5.0_3.0_1727235868972.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_portuguese_cased_finetuned_quantity_en_5.5.0_3.0_1727235868972.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +tokenizer = Tokenizer()\ + .setInputCols("document")\ + .setOutputCol("token") + +sequenceClassifier = BertForSequenceClassification.pretrained("distilbert_portuguese_cased_finetuned_quantity","en")\ + .setInputCols(["document","token"])\ + .setOutputCol("class") + +pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("distilbert_portuguese_cased_finetuned_quantity","en") + .setInputCols(Array("document","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_portuguese_cased_finetuned_quantity| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|279.9 MB| + +## References + +References + +https://huggingface.co/alexia20816/distilbert-portuguese-cased-finetuned-quantity \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-distilbert_portuguese_cased_finetuned_quantity_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-distilbert_portuguese_cased_finetuned_quantity_pipeline_en.md new file mode 100644 index 00000000000000..1794a824f0de41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-distilbert_portuguese_cased_finetuned_quantity_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_portuguese_cased_finetuned_quantity_pipeline pipeline BertForSequenceClassification from xc2450 +author: John Snow Labs +name: distilbert_portuguese_cased_finetuned_quantity_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_portuguese_cased_finetuned_quantity_pipeline` is a English model originally trained by xc2450. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_portuguese_cased_finetuned_quantity_pipeline_en_5.5.0_3.0_1727235884228.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_portuguese_cased_finetuned_quantity_pipeline_en_5.5.0_3.0_1727235884228.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_portuguese_cased_finetuned_quantity_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_portuguese_cased_finetuned_quantity_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_portuguese_cased_finetuned_quantity_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|279.9 MB| + +## References + +https://huggingface.co/xc2450/distilbert-portuguese-cased-finetuned-quantity + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-distillbert_distilled_ag_news_en.md b/docs/_posts/ahmedlone127/2024-09-25-distillbert_distilled_ag_news_en.md new file mode 100644 index 00000000000000..d25c7a97ce77cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-distillbert_distilled_ag_news_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distillbert_distilled_ag_news BertForSequenceClassification from odunola +author: John Snow Labs +name: distillbert_distilled_ag_news +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distillbert_distilled_ag_news` is a English model originally trained by odunola. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distillbert_distilled_ag_news_en_5.5.0_3.0_1727264201881.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distillbert_distilled_ag_news_en_5.5.0_3.0_1727264201881.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("distillbert_distilled_ag_news","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("distillbert_distilled_ag_news", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
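After the pipeline has been fitted and applied as above, the predictions are stored as annotations in the `class` column. A small sketch of reading them back with plain Spark SQL (the metadata keys holding per-label scores are model-dependent, so treat them as an assumption):

```python
# One predicted label per input row
pipelineDF.select("text", "class.result").show(truncate = False)

# Each annotation also carries metadata (e.g. per-label scores) that can be inspected
pipelineDF.selectExpr("explode(class) as prediction") \
    .selectExpr("prediction.result", "prediction.metadata") \
    .show(truncate = False)
```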
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distillbert_distilled_ag_news| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|65.8 MB| + +## References + +https://huggingface.co/odunola/distillbert-distilled-ag-news \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-distillbert_distilled_ag_news_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-distillbert_distilled_ag_news_pipeline_en.md new file mode 100644 index 00000000000000..0dbdd8a357bf1e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-distillbert_distilled_ag_news_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distillbert_distilled_ag_news_pipeline pipeline BertForSequenceClassification from odunola +author: John Snow Labs +name: distillbert_distilled_ag_news_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distillbert_distilled_ag_news_pipeline` is a English model originally trained by odunola. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distillbert_distilled_ag_news_pipeline_en_5.5.0_3.0_1727264205159.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distillbert_distilled_ag_news_pipeline_en_5.5.0_3.0_1727264205159.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is any Spark DataFrame with the input text in a "text" column
pipeline = PretrainedPipeline("distillbert_distilled_ag_news_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is any DataFrame with the input text in a "text" column
val pipeline = new PretrainedPipeline("distillbert_distilled_ag_news_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distillbert_distilled_ag_news_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|65.8 MB| + +## References + +https://huggingface.co/odunola/distillbert-distilled-ag-news + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-e5_large_mnli_en.md b/docs/_posts/ahmedlone127/2024-09-25-e5_large_mnli_en.md new file mode 100644 index 00000000000000..4ba192ba479c33 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-e5_large_mnli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English e5_large_mnli BertForZeroShotClassification from mjwong +author: John Snow Labs +name: e5_large_mnli +date: 2024-09-25 +tags: [en, open_source, onnx, zero_shot, bert] +task: Zero-Shot Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForZeroShotClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForZeroShotClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`e5_large_mnli` is a English model originally trained by mjwong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/e5_large_mnli_en_5.5.0_3.0_1727222972046.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/e5_large_mnli_en_5.5.0_3.0_1727222972046.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then run zero-shot classification
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

zeroShotClassifier = BertForZeroShotClassification.pretrained("e5_large_mnli","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, zeroShotClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val zeroShotClassifier = BertForZeroShotClassification.pretrained("e5_large_mnli", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, zeroShotClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
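Zero-shot classification scores each input against a set of candidate labels chosen at inference time, which the example above does not show. A minimal sketch, assuming the annotator's standard `setCandidateLabels` parameter and using labels invented purely for illustration:

```python
zeroShotClassifier = BertForZeroShotClassification.pretrained("e5_large_mnli", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class") \
    .setCandidateLabels(["sports", "politics", "technology", "finance"])  # illustrative labels
```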
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|e5_large_mnli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/mjwong/e5-large-mnli \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-e5_large_mnli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-e5_large_mnli_pipeline_en.md new file mode 100644 index 00000000000000..f6596e63e770dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-e5_large_mnli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English e5_large_mnli_pipeline pipeline BertForZeroShotClassification from mjwong +author: John Snow Labs +name: e5_large_mnli_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Zero-Shot Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForZeroShotClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`e5_large_mnli_pipeline` is a English model originally trained by mjwong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/e5_large_mnli_pipeline_en_5.5.0_3.0_1727223033526.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/e5_large_mnli_pipeline_en_5.5.0_3.0_1727223033526.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is any Spark DataFrame with the input text in a "text" column
pipeline = PretrainedPipeline("e5_large_mnli_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is any DataFrame with the input text in a "text" column
val pipeline = new PretrainedPipeline("e5_large_mnli_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|e5_large_mnli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/mjwong/e5-large-mnli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForZeroShotClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-english_astitchtask1a_bertbasecased_falsetrue_0_3_best_en.md b/docs/_posts/ahmedlone127/2024-09-25-english_astitchtask1a_bertbasecased_falsetrue_0_3_best_en.md new file mode 100644 index 00000000000000..15db0a3b61b507 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-english_astitchtask1a_bertbasecased_falsetrue_0_3_best_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English english_astitchtask1a_bertbasecased_falsetrue_0_3_best BertForSequenceClassification from harish +author: John Snow Labs +name: english_astitchtask1a_bertbasecased_falsetrue_0_3_best +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`english_astitchtask1a_bertbasecased_falsetrue_0_3_best` is a English model originally trained by harish. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/english_astitchtask1a_bertbasecased_falsetrue_0_3_best_en_5.5.0_3.0_1727277609277.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/english_astitchtask1a_bertbasecased_falsetrue_0_3_best_en_5.5.0_3.0_1727277609277.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("english_astitchtask1a_bertbasecased_falsetrue_0_3_best","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("english_astitchtask1a_bertbasecased_falsetrue_0_3_best", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|english_astitchtask1a_bertbasecased_falsetrue_0_3_best| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/harish/EN-AStitchTask1A-BERTBaseCased-FalseTrue-0-3-BEST \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-english_base_en.md b/docs/_posts/ahmedlone127/2024-09-25-english_base_en.md new file mode 100644 index 00000000000000..c70489d6a1f7d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-english_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English english_base BertForTokenClassification from mudes +author: John Snow Labs +name: english_base +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`english_base` is a English model originally trained by mudes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/english_base_en_5.5.0_3.0_1727270402283.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/english_base_en_5.5.0_3.0_1727270402283.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then tag tokens with the pretrained NER model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("english_base","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("english_base", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
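The token classifier emits one IOB-style tag per token in the `ner` column. If entity spans are needed instead of per-token tags, a `NerConverter` stage can be appended; a minimal sketch reusing the stages defined above:

```python
from sparknlp.annotator import NerConverter

# Groups consecutive IOB tags from the "ner" column into entity chunks
nerConverter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, nerConverter])
result = pipeline.fit(data).transform(data)
result.select("ner_chunk.result").show(truncate = False)
```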
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|english_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/mudes/en-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-estbert128_rubric_et.md b/docs/_posts/ahmedlone127/2024-09-25-estbert128_rubric_et.md new file mode 100644 index 00000000000000..8db925642367d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-estbert128_rubric_et.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Estonian estbert128_rubric BertForSequenceClassification from tartuNLP +author: John Snow Labs +name: estbert128_rubric +date: 2024-09-25 +tags: [et, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: et +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`estbert128_rubric` is a Estonian model originally trained by tartuNLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/estbert128_rubric_et_5.5.0_3.0_1727272904136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/estbert128_rubric_et_5.5.0_3.0_1727272904136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("estbert128_rubric","et") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("estbert128_rubric", "et") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|estbert128_rubric| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|et| +|Size:|465.7 MB| + +## References + +https://huggingface.co/tartuNLP/EstBERT128_Rubric \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-estbert128_rubric_pipeline_et.md b/docs/_posts/ahmedlone127/2024-09-25-estbert128_rubric_pipeline_et.md new file mode 100644 index 00000000000000..40eb842769e69c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-estbert128_rubric_pipeline_et.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Estonian estbert128_rubric_pipeline pipeline BertForSequenceClassification from tartuNLP +author: John Snow Labs +name: estbert128_rubric_pipeline +date: 2024-09-25 +tags: [et, open_source, pipeline, onnx] +task: Text Classification +language: et +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`estbert128_rubric_pipeline` is a Estonian model originally trained by tartuNLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/estbert128_rubric_pipeline_et_5.5.0_3.0_1727272932764.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/estbert128_rubric_pipeline_et_5.5.0_3.0_1727272932764.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is any Spark DataFrame with the input text in a "text" column
pipeline = PretrainedPipeline("estbert128_rubric_pipeline", lang = "et")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is any DataFrame with the input text in a "text" column
val pipeline = new PretrainedPipeline("estbert128_rubric_pipeline", lang = "et")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|estbert128_rubric_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|et| +|Size:|465.7 MB| + +## References + +https://huggingface.co/tartuNLP/EstBERT128_Rubric + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-fake_news_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-25-fake_news_classifier_en.md new file mode 100644 index 00000000000000..6784665c6dd8ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-fake_news_classifier_en.md @@ -0,0 +1,96 @@ +--- +layout: model +title: English fake_news_classifier RoBertaForSequenceClassification from T0asty +author: John Snow Labs +name: fake_news_classifier +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fake_news_classifier` is a English model originally trained by T0asty. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fake_news_classifier_en_5.5.0_3.0_1727242442851.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fake_news_classifier_en_5.5.0_3.0_1727242442851.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("fake_news_classifier","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("fake_news_classifier", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
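For quick, ad-hoc scoring of single strings without building a DataFrame, the fitted pipeline can also be wrapped in a `LightPipeline`; a short sketch with an invented example headline:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# Returns a dict keyed by the pipeline's output column names, e.g. "document", "token", "class"
light.annotate("Breaking: scientists confirm the moon is made of cheese")
```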
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fake_news_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +References + +https://huggingface.co/T0asty/fake-news-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-fake_news_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-fake_news_classifier_pipeline_en.md new file mode 100644 index 00000000000000..3ea10846f2c12f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-fake_news_classifier_pipeline_en.md @@ -0,0 +1,72 @@ +--- +layout: model +title: English fake_news_classifier_pipeline pipeline RoBertaForSequenceClassification from T0asty +author: John Snow Labs +name: fake_news_classifier_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fake_news_classifier_pipeline` is a English model originally trained by T0asty. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fake_news_classifier_pipeline_en_5.5.0_3.0_1727242465113.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fake_news_classifier_pipeline_en_5.5.0_3.0_1727242465113.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is any Spark DataFrame with the input text in a "text" column
pipeline = PretrainedPipeline("fake_news_classifier_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is any DataFrame with the input text in a "text" column
val pipeline = new PretrainedPipeline("fake_news_classifier_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fake_news_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +References + +https://huggingface.co/T0asty/fake-news-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-fakenews_bert_base_cased_denyol_en.md b/docs/_posts/ahmedlone127/2024-09-25-fakenews_bert_base_cased_denyol_en.md new file mode 100644 index 00000000000000..8d1914a2908cb4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-fakenews_bert_base_cased_denyol_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fakenews_bert_base_cased_denyol BertForSequenceClassification from Denyol +author: John Snow Labs +name: fakenews_bert_base_cased_denyol +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fakenews_bert_base_cased_denyol` is a English model originally trained by Denyol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fakenews_bert_base_cased_denyol_en_5.5.0_3.0_1727276634895.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fakenews_bert_base_cased_denyol_en_5.5.0_3.0_1727276634895.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("fakenews_bert_base_cased_denyol","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("fakenews_bert_base_cased_denyol", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fakenews_bert_base_cased_denyol| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Denyol/FakeNews-bert-base-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-fakenews_bert_base_cased_denyol_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-fakenews_bert_base_cased_denyol_pipeline_en.md new file mode 100644 index 00000000000000..c1b9705f1136e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-fakenews_bert_base_cased_denyol_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fakenews_bert_base_cased_denyol_pipeline pipeline BertForSequenceClassification from Denyol +author: John Snow Labs +name: fakenews_bert_base_cased_denyol_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fakenews_bert_base_cased_denyol_pipeline` is a English model originally trained by Denyol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fakenews_bert_base_cased_denyol_pipeline_en_5.5.0_3.0_1727276657063.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fakenews_bert_base_cased_denyol_pipeline_en_5.5.0_3.0_1727276657063.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is any Spark DataFrame with the input text in a "text" column
pipeline = PretrainedPipeline("fakenews_bert_base_cased_denyol_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is any DataFrame with the input text in a "text" column
val pipeline = new PretrainedPipeline("fakenews_bert_base_cased_denyol_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fakenews_bert_base_cased_denyol_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Denyol/FakeNews-bert-base-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-favs_filtersort_multilabel_classification_bert_base_cased_jacquesle_en.md b/docs/_posts/ahmedlone127/2024-09-25-favs_filtersort_multilabel_classification_bert_base_cased_jacquesle_en.md new file mode 100644 index 00000000000000..2b707721043224 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-favs_filtersort_multilabel_classification_bert_base_cased_jacquesle_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English favs_filtersort_multilabel_classification_bert_base_cased_jacquesle BertForSequenceClassification from jacquesle +author: John Snow Labs +name: favs_filtersort_multilabel_classification_bert_base_cased_jacquesle +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`favs_filtersort_multilabel_classification_bert_base_cased_jacquesle` is a English model originally trained by jacquesle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/favs_filtersort_multilabel_classification_bert_base_cased_jacquesle_en_5.5.0_3.0_1727276870556.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/favs_filtersort_multilabel_classification_bert_base_cased_jacquesle_en_5.5.0_3.0_1727276870556.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("favs_filtersort_multilabel_classification_bert_base_cased_jacquesle","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("favs_filtersort_multilabel_classification_bert_base_cased_jacquesle", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|favs_filtersort_multilabel_classification_bert_base_cased_jacquesle| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/jacquesle/favs-filtersort-multilabel-classification-bert-base-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_en.md b/docs/_posts/ahmedlone127/2024-09-25-favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_en.md new file mode 100644 index 00000000000000..e4b5eec690125f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407 BertForSequenceClassification from nguyenkhoa2407 +author: John Snow Labs +name: favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407` is a English model originally trained by nguyenkhoa2407. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_en_5.5.0_3.0_1727277882852.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_en_5.5.0_3.0_1727277882852.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/nguyenkhoa2407/favs-filtersort-multilabel-classification-bert-base-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_pipeline_en.md new file mode 100644 index 00000000000000..b6a3be0b046c95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_pipeline pipeline BertForSequenceClassification from nguyenkhoa2407 +author: John Snow Labs +name: favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_pipeline` is a English model originally trained by nguyenkhoa2407. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_pipeline_en_5.5.0_3.0_1727277904230.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_pipeline_en_5.5.0_3.0_1727277904230.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# df is any Spark DataFrame with the input text in a "text" column
pipeline = PretrainedPipeline("favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// df is any DataFrame with the input text in a "text" column
val pipeline = new PretrainedPipeline("favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/nguyenkhoa2407/favs-filtersort-multilabel-classification-bert-base-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-finbert_tuned_en.md b/docs/_posts/ahmedlone127/2024-09-25-finbert_tuned_en.md new file mode 100644 index 00000000000000..8b0a6f6e5af445 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-finbert_tuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finbert_tuned BertForSequenceClassification from manvik28 +author: John Snow Labs +name: finbert_tuned +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finbert_tuned` is a English model originally trained by manvik28. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finbert_tuned_en_5.5.0_3.0_1727285196421.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finbert_tuned_en_5.5.0_3.0_1727285196421.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("finbert_tuned","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("finbert_tuned", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finbert_tuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/manvik28/FinBERT_Tuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-fine_tuned_bert_czech_wikann_en.md b/docs/_posts/ahmedlone127/2024-09-25-fine_tuned_bert_czech_wikann_en.md new file mode 100644 index 00000000000000..5caf9ab545e613 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-fine_tuned_bert_czech_wikann_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fine_tuned_bert_czech_wikann BertForTokenClassification from stulcrad +author: John Snow Labs +name: fine_tuned_bert_czech_wikann +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_bert_czech_wikann` is a English model originally trained by stulcrad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_bert_czech_wikann_en_5.5.0_3.0_1727275485253.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_bert_czech_wikann_en_5.5.0_3.0_1727275485253.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then tag tokens with the pretrained NER model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("fine_tuned_bert_czech_wikann","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("fine_tuned_bert_czech_wikann", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_bert_czech_wikann| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/stulcrad/fine_tuned_BERT_cs_wikann \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-finetuned_bert_base_on_shemo_transcripts_en.md b/docs/_posts/ahmedlone127/2024-09-25-finetuned_bert_base_on_shemo_transcripts_en.md new file mode 100644 index 00000000000000..797d3919c82c97 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-finetuned_bert_base_on_shemo_transcripts_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_bert_base_on_shemo_transcripts BertForSequenceClassification from minoosh +author: John Snow Labs +name: finetuned_bert_base_on_shemo_transcripts +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_bert_base_on_shemo_transcripts` is a English model originally trained by minoosh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_bert_base_on_shemo_transcripts_en_5.5.0_3.0_1727263641848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_bert_base_on_shemo_transcripts_en_5.5.0_3.0_1727263641848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("finetuned_bert_base_on_shemo_transcripts","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("finetuned_bert_base_on_shemo_transcripts", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_bert_base_on_shemo_transcripts| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/minoosh/finetuned_bert-base_on_shEMO_transcripts \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-finetuned_bert_base_uncased_olivernyu_en.md b/docs/_posts/ahmedlone127/2024-09-25-finetuned_bert_base_uncased_olivernyu_en.md new file mode 100644 index 00000000000000..b5674e748644e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-finetuned_bert_base_uncased_olivernyu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_bert_base_uncased_olivernyu BertForSequenceClassification from Olivernyu +author: John Snow Labs +name: finetuned_bert_base_uncased_olivernyu +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_bert_base_uncased_olivernyu` is a English model originally trained by Olivernyu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_bert_base_uncased_olivernyu_en_5.5.0_3.0_1727263622137.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_bert_base_uncased_olivernyu_en_5.5.0_3.0_1727263622137.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("finetuned_bert_base_uncased_olivernyu","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("finetuned_bert_base_uncased_olivernyu", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_bert_base_uncased_olivernyu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Olivernyu/finetuned_bert_base_uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-finetuning_classification_model_3000_samples_en.md b/docs/_posts/ahmedlone127/2024-09-25-finetuning_classification_model_3000_samples_en.md new file mode 100644 index 00000000000000..d47bd1c26b6d0f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-finetuning_classification_model_3000_samples_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_classification_model_3000_samples BertForSequenceClassification from GMW123 +author: John Snow Labs +name: finetuning_classification_model_3000_samples +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_classification_model_3000_samples` is a English model originally trained by GMW123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_classification_model_3000_samples_en_5.5.0_3.0_1727254007618.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_classification_model_3000_samples_en_5.5.0_3.0_1727254007618.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Assemble raw text into documents, tokenize, then classify with the pretrained model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("finetuning_classification_model_3000_samples","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("finetuning_classification_model_3000_samples", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_classification_model_3000_samples| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|84.8 MB| + +## References + +https://huggingface.co/GMW123/finetuning-classification-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-finetuning_classification_model_3000_samples_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-finetuning_classification_model_3000_samples_pipeline_en.md new file mode 100644 index 00000000000000..0d33b112fe55d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-finetuning_classification_model_3000_samples_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_classification_model_3000_samples_pipeline pipeline BertForSequenceClassification from GMW123 +author: John Snow Labs +name: finetuning_classification_model_3000_samples_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_classification_model_3000_samples_pipeline` is a English model originally trained by GMW123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_classification_model_3000_samples_pipeline_en_5.5.0_3.0_1727254011991.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_classification_model_3000_samples_pipeline_en_5.5.0_3.0_1727254011991.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame; these pipelines typically read the raw text from a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("finetuning_classification_model_3000_samples_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame; these pipelines typically read the raw text from a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("finetuning_classification_model_3000_samples_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
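The pretrained pipeline appends one column per bundled annotator, so the simplest way to see what it produced is to inspect the output schema and then select the classifier's result column (named `class` on the matching model card, an assumption worth confirming against your own schema). A minimal sketch:

```python
annotations.printSchema()
# assumes the classifier writes its labels to a column named "class"
annotations.select("class.result").show(truncate=False)
```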
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_classification_model_3000_samples_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|84.8 MB| + +## References + +https://huggingface.co/GMW123/finetuning-classification-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-finetuning_sentiment_analysis_bert2epoch_en.md b/docs/_posts/ahmedlone127/2024-09-25-finetuning_sentiment_analysis_bert2epoch_en.md new file mode 100644 index 00000000000000..e65d2a1d97aaa0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-finetuning_sentiment_analysis_bert2epoch_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_analysis_bert2epoch BertForSequenceClassification from aruca +author: John Snow Labs +name: finetuning_sentiment_analysis_bert2epoch +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_analysis_bert2epoch` is a English model originally trained by aruca. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_bert2epoch_en_5.5.0_3.0_1727266348143.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_bert2epoch_en_5.5.0_3.0_1727266348143.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("finetuning_sentiment_analysis_bert2epoch","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("finetuning_sentiment_analysis_bert2epoch", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_analysis_bert2epoch| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/aruca/finetuning-sentiment-analysis-bert2epoch \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-finetuning_sentiment_analysis_bert2epoch_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-finetuning_sentiment_analysis_bert2epoch_pipeline_en.md new file mode 100644 index 00000000000000..31d0d4e7ab1e5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-finetuning_sentiment_analysis_bert2epoch_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_analysis_bert2epoch_pipeline pipeline BertForSequenceClassification from aruca +author: John Snow Labs +name: finetuning_sentiment_analysis_bert2epoch_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_analysis_bert2epoch_pipeline` is a English model originally trained by aruca. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_bert2epoch_pipeline_en_5.5.0_3.0_1727266370972.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_bert2epoch_pipeline_en_5.5.0_3.0_1727266370972.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame; these pipelines typically read the raw text from a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("finetuning_sentiment_analysis_bert2epoch_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame; these pipelines typically read the raw text from a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("finetuning_sentiment_analysis_bert2epoch_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_analysis_bert2epoch_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/aruca/finetuning-sentiment-analysis-bert2epoch + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-gbert_germeval_2021_de.md b/docs/_posts/ahmedlone127/2024-09-25-gbert_germeval_2021_de.md new file mode 100644 index 00000000000000..866409e61b6aa6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-gbert_germeval_2021_de.md @@ -0,0 +1,94 @@ +--- +layout: model +title: German gbert_germeval_2021 BertForSequenceClassification from shahrukhx01 +author: John Snow Labs +name: gbert_germeval_2021 +date: 2024-09-25 +tags: [de, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gbert_germeval_2021` is a German model originally trained by shahrukhx01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gbert_germeval_2021_de_5.5.0_3.0_1727286926871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gbert_germeval_2021_de_5.5.0_3.0_1727286926871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("gbert_germeval_2021","de") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("gbert_germeval_2021", "de")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gbert_germeval_2021| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|de| +|Size:|412.0 MB| + +## References + +https://huggingface.co/shahrukhx01/gbert-germeval-2021 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-gbert_germeval_2021_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-25-gbert_germeval_2021_pipeline_de.md new file mode 100644 index 00000000000000..c7b90831c5807d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-gbert_germeval_2021_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German gbert_germeval_2021_pipeline pipeline BertForSequenceClassification from shahrukhx01 +author: John Snow Labs +name: gbert_germeval_2021_pipeline +date: 2024-09-25 +tags: [de, open_source, pipeline, onnx] +task: Text Classification +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gbert_germeval_2021_pipeline` is a German model originally trained by shahrukhx01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gbert_germeval_2021_pipeline_de_5.5.0_3.0_1727286947949.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gbert_germeval_2021_pipeline_de_5.5.0_3.0_1727286947949.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame; these pipelines typically read the raw text from a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("gbert_germeval_2021_pipeline", lang = "de")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame; these pipelines typically read the raw text from a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("gbert_germeval_2021_pipeline", lang = "de")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gbert_germeval_2021_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|412.0 MB| + +## References + +https://huggingface.co/shahrukhx01/gbert-germeval-2021 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-genome_finder_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-genome_finder_pipeline_en.md new file mode 100644 index 00000000000000..165761ed5c5a43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-genome_finder_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English genome_finder_pipeline pipeline BertForSequenceClassification from rdhinaz +author: John Snow Labs +name: genome_finder_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`genome_finder_pipeline` is a English model originally trained by rdhinaz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/genome_finder_pipeline_en_5.5.0_3.0_1727273166754.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/genome_finder_pipeline_en_5.5.0_3.0_1727273166754.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame; these pipelines typically read the raw text from a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("genome_finder_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame; these pipelines typically read the raw text from a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("genome_finder_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|genome_finder_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/rdhinaz/genome-finder + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-geotrend_10_epochs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-geotrend_10_epochs_pipeline_en.md new file mode 100644 index 00000000000000..73027b44b68277 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-geotrend_10_epochs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English geotrend_10_epochs_pipeline pipeline BertForTokenClassification from Azizun +author: John Snow Labs +name: geotrend_10_epochs_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`geotrend_10_epochs_pipeline` is a English model originally trained by Azizun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/geotrend_10_epochs_pipeline_en_5.5.0_3.0_1727281877953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/geotrend_10_epochs_pipeline_en_5.5.0_3.0_1727281877953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame; these pipelines typically read the raw text from a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("geotrend_10_epochs_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame; these pipelines typically read the raw text from a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("geotrend_10_epochs_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|geotrend_10_epochs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|344.9 MB| + +## References + +https://huggingface.co/Azizun/Geotrend-10-epochs + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-hate_ita_it.md b/docs/_posts/ahmedlone127/2024-09-25-hate_ita_it.md new file mode 100644 index 00000000000000..530bcf3108b171 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-hate_ita_it.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Italian hate_ita XlmRoBertaForSequenceClassification from MilaNLProc +author: John Snow Labs +name: hate_ita +date: 2024-09-25 +tags: [it, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_ita` is a Italian model originally trained by MilaNLProc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_ita_it_5.5.0_3.0_1727229657021.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_ita_it_5.5.0_3.0_1727229657021.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hate_ita","it") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hate_ita", "it")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_ita| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|it| +|Size:|1.0 GB| + +## References + +https://huggingface.co/MilaNLProc/hate-ita \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-hate_ita_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-25-hate_ita_pipeline_it.md new file mode 100644 index 00000000000000..810578872a99ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-hate_ita_pipeline_it.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Italian hate_ita_pipeline pipeline XlmRoBertaForSequenceClassification from MilaNLProc +author: John Snow Labs +name: hate_ita_pipeline +date: 2024-09-25 +tags: [it, open_source, pipeline, onnx] +task: Text Classification +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_ita_pipeline` is a Italian model originally trained by MilaNLProc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_ita_pipeline_it_5.5.0_3.0_1727229711840.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_ita_pipeline_it_5.5.0_3.0_1727229711840.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame; these pipelines typically read the raw text from a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("hate_ita_pipeline", lang = "it")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame; these pipelines typically read the raw text from a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("hate_ita_pipeline", lang = "it")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_ita_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|1.0 GB| + +## References + +https://huggingface.co/MilaNLProc/hate-ita + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-hate_speech_slo_sl.md b/docs/_posts/ahmedlone127/2024-09-25-hate_speech_slo_sl.md new file mode 100644 index 00000000000000..f0c299a0500413 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-hate_speech_slo_sl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Slovenian hate_speech_slo BertForSequenceClassification from IMSyPP +author: John Snow Labs +name: hate_speech_slo +date: 2024-09-25 +tags: [sl, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: sl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_speech_slo` is a Slovenian model originally trained by IMSyPP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_speech_slo_sl_5.5.0_3.0_1727245720551.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_speech_slo_sl_5.5.0_3.0_1727245720551.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("hate_speech_slo","sl") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("hate_speech_slo", "sl")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_speech_slo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|sl| +|Size:|465.7 MB| + +## References + +https://huggingface.co/IMSyPP/hate_speech_slo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-hatexplain_ds_labeled_001_en.md b/docs/_posts/ahmedlone127/2024-09-25-hatexplain_ds_labeled_001_en.md new file mode 100644 index 00000000000000..ea03fcbaf50e57 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-hatexplain_ds_labeled_001_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hatexplain_ds_labeled_001 BertForSequenceClassification from jkhan447 +author: John Snow Labs +name: hatexplain_ds_labeled_001 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hatexplain_ds_labeled_001` is a English model originally trained by jkhan447. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hatexplain_ds_labeled_001_en_5.5.0_3.0_1727267777367.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hatexplain_ds_labeled_001_en_5.5.0_3.0_1727267777367.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("hatexplain_ds_labeled_001","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("hatexplain_ds_labeled_001", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hatexplain_ds_labeled_001| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/jkhan447/HateXplain-DS-labeled-001 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-hatexplain_weighted_majority_labeled_en.md b/docs/_posts/ahmedlone127/2024-09-25-hatexplain_weighted_majority_labeled_en.md new file mode 100644 index 00000000000000..c974f958958af5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-hatexplain_weighted_majority_labeled_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hatexplain_weighted_majority_labeled BertForSequenceClassification from jkhan447 +author: John Snow Labs +name: hatexplain_weighted_majority_labeled +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hatexplain_weighted_majority_labeled` is a English model originally trained by jkhan447. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hatexplain_weighted_majority_labeled_en_5.5.0_3.0_1727268331231.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hatexplain_weighted_majority_labeled_en_5.5.0_3.0_1727268331231.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("hatexplain_weighted_majority_labeled","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("hatexplain_weighted_majority_labeled", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hatexplain_weighted_majority_labeled| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/jkhan447/HateXplain-weighted-majority-labeled \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-hindi_topic_all_doc_hi.md b/docs/_posts/ahmedlone127/2024-09-25-hindi_topic_all_doc_hi.md new file mode 100644 index 00000000000000..1552c41fda6e5a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-hindi_topic_all_doc_hi.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Hindi hindi_topic_all_doc BertForSequenceClassification from l3cube-pune +author: John Snow Labs +name: hindi_topic_all_doc +date: 2024-09-25 +tags: [hi, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hindi_topic_all_doc` is a Hindi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hindi_topic_all_doc_hi_5.5.0_3.0_1727238129237.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hindi_topic_all_doc_hi_5.5.0_3.0_1727238129237.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("hindi_topic_all_doc","hi") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("hindi_topic_all_doc", "hi")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hindi_topic_all_doc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|hi| +|Size:|892.9 MB| + +## References + +https://huggingface.co/l3cube-pune/hindi-topic-all-doc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-hindi_topic_all_doc_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-25-hindi_topic_all_doc_pipeline_hi.md new file mode 100644 index 00000000000000..bc924212b5bea6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-hindi_topic_all_doc_pipeline_hi.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Hindi hindi_topic_all_doc_pipeline pipeline BertForSequenceClassification from l3cube-pune +author: John Snow Labs +name: hindi_topic_all_doc_pipeline +date: 2024-09-25 +tags: [hi, open_source, pipeline, onnx] +task: Text Classification +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hindi_topic_all_doc_pipeline` is a Hindi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hindi_topic_all_doc_pipeline_hi_5.5.0_3.0_1727238174674.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hindi_topic_all_doc_pipeline_hi_5.5.0_3.0_1727238174674.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame; these pipelines typically read the raw text from a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("hindi_topic_all_doc_pipeline", lang = "hi")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame; these pipelines typically read the raw text from a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("hindi_topic_all_doc_pipeline", lang = "hi")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hindi_topic_all_doc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|892.9 MB| + +## References + +https://huggingface.co/l3cube-pune/hindi-topic-all-doc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ic_en.md b/docs/_posts/ahmedlone127/2024-09-25-ic_en.md new file mode 100644 index 00000000000000..d6ac54880b0b54 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ic BertForSequenceClassification from JohnDoe70 +author: John Snow Labs +name: ic +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ic` is a English model originally trained by JohnDoe70. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ic_en_5.5.0_3.0_1727261919875.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ic_en_5.5.0_3.0_1727261919875.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("ic","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("ic", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/JohnDoe70/ic \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-ic_pipeline_en.md new file mode 100644 index 00000000000000..e78905fdff9c9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ic_pipeline pipeline BertForSequenceClassification from JohnDoe70 +author: John Snow Labs +name: ic_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ic_pipeline` is a English model originally trained by JohnDoe70. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ic_pipeline_en_5.5.0_3.0_1727261941152.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ic_pipeline_en_5.5.0_3.0_1727261941152.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame; these pipelines typically read the raw text from a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("ic_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame; these pipelines typically read the raw text from a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("ic_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|410.1 MB| + +## References + +https://huggingface.co/JohnDoe70/ic + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ideology_facebookai_xlm_roberta_large_en.md b/docs/_posts/ahmedlone127/2024-09-25-ideology_facebookai_xlm_roberta_large_en.md new file mode 100644 index 00000000000000..3cd10d2cb4e893 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ideology_facebookai_xlm_roberta_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ideology_facebookai_xlm_roberta_large RoBertaForSequenceClassification from juan-glez29 +author: John Snow Labs +name: ideology_facebookai_xlm_roberta_large +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ideology_facebookai_xlm_roberta_large` is a English model originally trained by juan-glez29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ideology_facebookai_xlm_roberta_large_en_5.5.0_3.0_1727233724111.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ideology_facebookai_xlm_roberta_large_en_5.5.0_3.0_1727233724111.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("ideology_facebookai_xlm_roberta_large","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("ideology_facebookai_xlm_roberta_large", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ideology_facebookai_xlm_roberta_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/juan-glez29/ideology-FacebookAI-xlm-roberta-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-incel_alberto_en.md b/docs/_posts/ahmedlone127/2024-09-25-incel_alberto_en.md new file mode 100644 index 00000000000000..ef8f218f19b26b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-incel_alberto_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English incel_alberto BertEmbeddings from pgajo +author: John Snow Labs +name: incel_alberto +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`incel_alberto` is a English model originally trained by pgajo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/incel_alberto_en_5.5.0_3.0_1727243328830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/incel_alberto_en_5.5.0_3.0_1727243328830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("incel_alberto","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("incel_alberto","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
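Each token annotation in the `embeddings` column carries its vector in the annotation's `embeddings` field. A minimal sketch for unpacking them, assuming the `pipelineDF` from the example above:

```python
# One row per token: the token text and its BERT vector
pipelineDF.selectExpr("explode(embeddings) as emb") \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(truncate=False)
```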
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|incel_alberto| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|688.7 MB| + +## References + +https://huggingface.co/pgajo/incel-alberto \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-incel_alberto_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-incel_alberto_pipeline_en.md new file mode 100644 index 00000000000000..264d3bab2b5c58 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-incel_alberto_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English incel_alberto_pipeline pipeline BertEmbeddings from pgajo +author: John Snow Labs +name: incel_alberto_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`incel_alberto_pipeline` is a English model originally trained by pgajo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/incel_alberto_pipeline_en_5.5.0_3.0_1727243364491.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/incel_alberto_pipeline_en_5.5.0_3.0_1727243364491.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame; these pipelines typically read the raw text from a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("incel_alberto_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame; these pipelines typically read the raw text from a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("incel_alberto_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|incel_alberto_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|688.8 MB| + +## References + +https://huggingface.co/pgajo/incel-alberto + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-jaberv2_en.md b/docs/_posts/ahmedlone127/2024-09-25-jaberv2_en.md new file mode 100644 index 00000000000000..a39fe61380e58f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-jaberv2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English jaberv2 BertEmbeddings from huawei-noah +author: John Snow Labs +name: jaberv2 +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jaberv2` is a English model originally trained by huawei-noah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jaberv2_en_5.5.0_3.0_1727258162659.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jaberv2_en_5.5.0_3.0_1727258162659.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("jaberv2","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("jaberv2","en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jaberv2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|504.8 MB| + +## References + +https://huggingface.co/huawei-noah/JABERv2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-jaberv2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-jaberv2_pipeline_en.md new file mode 100644 index 00000000000000..17b38f85b42a68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-jaberv2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English jaberv2_pipeline pipeline BertEmbeddings from huawei-noah +author: John Snow Labs +name: jaberv2_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jaberv2_pipeline` is a English model originally trained by huawei-noah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jaberv2_pipeline_en_5.5.0_3.0_1727258189766.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jaberv2_pipeline_en_5.5.0_3.0_1727258189766.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# input DataFrame; these pipelines typically read the raw text from a "text" column
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("jaberv2_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// input DataFrame; these pipelines typically read the raw text from a "text" column
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("jaberv2_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jaberv2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|504.9 MB| + +## References + +https://huggingface.co/huawei-noah/JABERv2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-khadija_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-khadija_ner_pipeline_en.md new file mode 100644 index 00000000000000..1b8ad571628d8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-khadija_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English khadija_ner_pipeline pipeline BertForTokenClassification from didazz +author: John Snow Labs +name: khadija_ner_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`khadija_ner_pipeline` is a English model originally trained by didazz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/khadija_ner_pipeline_en_5.5.0_3.0_1727283171556.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/khadija_ner_pipeline_en_5.5.0_3.0_1727283171556.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("khadija_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("khadija_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|khadija_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/didazz/khadija_ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-kid_whisper_medium_english_myst_cslu_en.md b/docs/_posts/ahmedlone127/2024-09-25-kid_whisper_medium_english_myst_cslu_en.md new file mode 100644 index 00000000000000..057c4bd2512430 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-kid_whisper_medium_english_myst_cslu_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English kid_whisper_medium_english_myst_cslu WhisperForCTC from aadel4 +author: John Snow Labs +name: kid_whisper_medium_english_myst_cslu +date: 2024-09-25 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kid_whisper_medium_english_myst_cslu` is a English model originally trained by aadel4. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kid_whisper_medium_english_myst_cslu_en_5.5.0_3.0_1727227825039.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kid_whisper_medium_english_myst_cslu_en_5.5.0_3.0_1727227825039.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with a float-array column named "audio_content"
# (a sketch of building such a DataFrame follows after this card's code tabs)
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("kid_whisper_medium_english_myst_cslu","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// `data` is assumed to be a DataFrame with a float-array column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("kid_whisper_medium_english_myst_cslu", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
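
`data` is not defined in the snippets above; it is assumed to be a DataFrame whose `audio_content` column holds the raw waveform as an array of floats. One illustrative way to build such a DataFrame (the silent clip is only a placeholder):

```python
import sparknlp

spark = sparknlp.start()

# Placeholder waveform: one second of silence at 16 kHz; replace with real audio samples
waveform = [0.0] * 16000
data = spark.createDataFrame([(waveform,)], ["audio_content"])
```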
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kid_whisper_medium_english_myst_cslu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/aadel4/kid-whisper-medium-en-myst_cslu \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-klue_bert_base_senti_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-25-klue_bert_base_senti_pipeline_ko.md new file mode 100644 index 00000000000000..48e4e569db8f3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-klue_bert_base_senti_pipeline_ko.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Korean klue_bert_base_senti_pipeline pipeline BertForSequenceClassification from dudududukim +author: John Snow Labs +name: klue_bert_base_senti_pipeline +date: 2024-09-25 +tags: [ko, open_source, pipeline, onnx] +task: Text Classification +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`klue_bert_base_senti_pipeline` is a Korean model originally trained by dudududukim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/klue_bert_base_senti_pipeline_ko_5.5.0_3.0_1727242428540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/klue_bert_base_senti_pipeline_ko_5.5.0_3.0_1727242428540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("klue_bert_base_senti_pipeline", lang = "ko") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("klue_bert_base_senti_pipeline", lang = "ko") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|klue_bert_base_senti_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|414.8 MB| + +## References + +https://huggingface.co/dudududukim/klue-bert-base-senti + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-kor_naver_ner_name_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-kor_naver_ner_name_pipeline_en.md new file mode 100644 index 00000000000000..7508143860f69a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-kor_naver_ner_name_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English kor_naver_ner_name_pipeline pipeline BertForTokenClassification from joon09 +author: John Snow Labs +name: kor_naver_ner_name_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kor_naver_ner_name_pipeline` is a English model originally trained by joon09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kor_naver_ner_name_pipeline_en_5.5.0_3.0_1727262973966.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kor_naver_ner_name_pipeline_en_5.5.0_3.0_1727262973966.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("kor_naver_ner_name_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("kor_naver_ner_name_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kor_naver_ner_name_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|441.2 MB| + +## References + +https://huggingface.co/joon09/kor-naver-ner-name + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-korean_albert_base_v1_ko.md b/docs/_posts/ahmedlone127/2024-09-25-korean_albert_base_v1_ko.md new file mode 100644 index 00000000000000..d149178124721b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-korean_albert_base_v1_ko.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Korean korean_albert_base_v1 BertEmbeddings from lots-o +author: John Snow Labs +name: korean_albert_base_v1 +date: 2024-09-25 +tags: [ko, open_source, onnx, embeddings, bert] +task: Embeddings +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`korean_albert_base_v1` is a Korean model originally trained by lots-o. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/korean_albert_base_v1_ko_5.5.0_3.0_1727236721501.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/korean_albert_base_v1_ko_5.5.0_3.0_1727236721501.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("korean_albert_base_v1","ko") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("korean_albert_base_v1","ko") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
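
The token embeddings produced above are nested inside annotation structs; if plain vectors are needed downstream, an `EmbeddingsFinisher` can be appended to the pipeline. A small sketch reusing the column names from the snippet above:

```python
from sparknlp.base import EmbeddingsFinisher

# Flatten the "embeddings" annotations into Spark ML vectors
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

# e.g. Pipeline().setStages([documentAssembler, tokenizer, embeddings, finisher])
```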
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|korean_albert_base_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|ko| +|Size:|47.7 MB| + +## References + +https://huggingface.co/lots-o/ko-albert-base-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-korean_albert_base_v1_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-25-korean_albert_base_v1_pipeline_ko.md new file mode 100644 index 00000000000000..02006a79b196ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-korean_albert_base_v1_pipeline_ko.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Korean korean_albert_base_v1_pipeline pipeline BertEmbeddings from lots-o +author: John Snow Labs +name: korean_albert_base_v1_pipeline +date: 2024-09-25 +tags: [ko, open_source, pipeline, onnx] +task: Embeddings +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`korean_albert_base_v1_pipeline` is a Korean model originally trained by lots-o. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/korean_albert_base_v1_pipeline_ko_5.5.0_3.0_1727236724245.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/korean_albert_base_v1_pipeline_ko_5.5.0_3.0_1727236724245.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("korean_albert_base_v1_pipeline", lang = "ko") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("korean_albert_base_v1_pipeline", lang = "ko") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|korean_albert_base_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|47.8 MB| + +## References + +https://huggingface.co/lots-o/ko-albert-base-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-korean_disease_ner_en.md b/docs/_posts/ahmedlone127/2024-09-25-korean_disease_ner_en.md new file mode 100644 index 00000000000000..56df4728f29277 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-korean_disease_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English korean_disease_ner BertForTokenClassification from keonju +author: John Snow Labs +name: korean_disease_ner +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`korean_disease_ner` is a English model originally trained by keonju. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/korean_disease_ner_en_5.5.0_3.0_1727283023258.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/korean_disease_ner_en_5.5.0_3.0_1727283023258.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Input column names must match the assembler ("document") and tokenizer ("token") outputs
tokenClassifier = BertForTokenClassification.pretrained("korean_disease_ner","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("korean_disease_ner", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
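
The classifier above emits token-level tags in the `ner` column; to group them into entity chunks, a `NerConverter` stage can be appended to the same pipeline. A minimal sketch reusing the column names from the snippet above:

```python
from sparknlp.annotator import NerConverter

# Combine B-/I- token tags into whole entity chunks
nerConverter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

# e.g. Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, nerConverter])
```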
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|korean_disease_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|412.4 MB| + +## References + +https://huggingface.co/keonju/korean_disease_ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-kyrgyz_language_ner_ky.md b/docs/_posts/ahmedlone127/2024-09-25-kyrgyz_language_ner_ky.md new file mode 100644 index 00000000000000..03ae008b7d2c9a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-kyrgyz_language_ner_ky.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Kirghiz, Kyrgyz kyrgyz_language_ner BertForTokenClassification from murat +author: John Snow Labs +name: kyrgyz_language_ner +date: 2024-09-25 +tags: [ky, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: ky +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kyrgyz_language_ner` is a Kirghiz, Kyrgyz model originally trained by murat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kyrgyz_language_ner_ky_5.5.0_3.0_1727249916067.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kyrgyz_language_ner_ky_5.5.0_3.0_1727249916067.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Input column names must match the assembler ("document") and tokenizer ("token") outputs
tokenClassifier = BertForTokenClassification.pretrained("kyrgyz_language_ner","ky") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("kyrgyz_language_ner", "ky")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kyrgyz_language_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|ky| +|Size:|665.1 MB| + +## References + +https://huggingface.co/murat/kyrgyz_language_NER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-kyrgyz_language_ner_pipeline_ky.md b/docs/_posts/ahmedlone127/2024-09-25-kyrgyz_language_ner_pipeline_ky.md new file mode 100644 index 00000000000000..a05b622e466037 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-kyrgyz_language_ner_pipeline_ky.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Kirghiz, Kyrgyz kyrgyz_language_ner_pipeline pipeline BertForTokenClassification from murat +author: John Snow Labs +name: kyrgyz_language_ner_pipeline +date: 2024-09-25 +tags: [ky, open_source, pipeline, onnx] +task: Named Entity Recognition +language: ky +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kyrgyz_language_ner_pipeline` is a Kirghiz, Kyrgyz model originally trained by murat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kyrgyz_language_ner_pipeline_ky_5.5.0_3.0_1727249950446.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kyrgyz_language_ner_pipeline_ky_5.5.0_3.0_1727249950446.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("kyrgyz_language_ner_pipeline", lang = "ky") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("kyrgyz_language_ner_pipeline", lang = "ky") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kyrgyz_language_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ky| +|Size:|665.1 MB| + +## References + +https://huggingface.co/murat/kyrgyz_language_NER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-labse_malach_multilabel_en.md b/docs/_posts/ahmedlone127/2024-09-25-labse_malach_multilabel_en.md new file mode 100644 index 00000000000000..7dbb6a675c7dbe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-labse_malach_multilabel_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English labse_malach_multilabel BertForSequenceClassification from ChrisBridges +author: John Snow Labs +name: labse_malach_multilabel +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`labse_malach_multilabel` is a English model originally trained by ChrisBridges. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/labse_malach_multilabel_en_5.5.0_3.0_1727240285805.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/labse_malach_multilabel_en_5.5.0_3.0_1727240285805.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Input column names must match the assembler ("document") and tokenizer ("token") outputs
sequenceClassifier = BertForSequenceClassification.pretrained("labse_malach_multilabel","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("labse_malach_multilabel", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
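
After the pipeline runs, the predicted labels are stored as annotations in the `class` column; an illustrative way to surface them alongside the input text:

```python
# "class.result" pulls the label strings out of the annotation structs
pipelineDF.select("text", "class.result").show(truncate=False)
```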
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|labse_malach_multilabel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.8 GB| + +## References + +https://huggingface.co/ChrisBridges/labse-malach-multilabel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-labse_malach_multilabel_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-labse_malach_multilabel_pipeline_en.md new file mode 100644 index 00000000000000..e02218ccab4d2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-labse_malach_multilabel_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English labse_malach_multilabel_pipeline pipeline BertForSequenceClassification from ChrisBridges +author: John Snow Labs +name: labse_malach_multilabel_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`labse_malach_multilabel_pipeline` is a English model originally trained by ChrisBridges. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/labse_malach_multilabel_pipeline_en_5.5.0_3.0_1727240370019.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/labse_malach_multilabel_pipeline_en_5.5.0_3.0_1727240370019.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("labse_malach_multilabel_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("labse_malach_multilabel_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|labse_malach_multilabel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.8 GB| + +## References + +https://huggingface.co/ChrisBridges/labse-malach-multilabel + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-legal_bert_samoan_gen1_large_summarized_chuvash_4_en.md b/docs/_posts/ahmedlone127/2024-09-25-legal_bert_samoan_gen1_large_summarized_chuvash_4_en.md new file mode 100644 index 00000000000000..0bded634167878 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-legal_bert_samoan_gen1_large_summarized_chuvash_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English legal_bert_samoan_gen1_large_summarized_chuvash_4 BertForSequenceClassification from wiorz +author: John Snow Labs +name: legal_bert_samoan_gen1_large_summarized_chuvash_4 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_bert_samoan_gen1_large_summarized_chuvash_4` is a English model originally trained by wiorz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_bert_samoan_gen1_large_summarized_chuvash_4_en_5.5.0_3.0_1727288632555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_bert_samoan_gen1_large_summarized_chuvash_4_en_5.5.0_3.0_1727288632555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Input column names must match the assembler ("document") and tokenizer ("token") outputs
sequenceClassifier = BertForSequenceClassification.pretrained("legal_bert_samoan_gen1_large_summarized_chuvash_4","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("legal_bert_samoan_gen1_large_summarized_chuvash_4", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_bert_samoan_gen1_large_summarized_chuvash_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/wiorz/legal_bert_sm_gen1_large_summarized_cv_4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-malaysian_whisper_small_ms.md b/docs/_posts/ahmedlone127/2024-09-25-malaysian_whisper_small_ms.md new file mode 100644 index 00000000000000..a737dbefcb4701 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-malaysian_whisper_small_ms.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Malay (macrolanguage) malaysian_whisper_small WhisperForCTC from mesolitica +author: John Snow Labs +name: malaysian_whisper_small +date: 2024-09-25 +tags: [ms, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ms +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`malaysian_whisper_small` is a Malay (macrolanguage) model originally trained by mesolitica. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/malaysian_whisper_small_ms_5.5.0_3.0_1727226895155.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/malaysian_whisper_small_ms_5.5.0_3.0_1727226895155.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with a float-array column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("malaysian_whisper_small","ms") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// `data` is assumed to be a DataFrame with a float-array column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("malaysian_whisper_small", "ms")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|malaysian_whisper_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ms| +|Size:|856.3 MB| + +## References + +https://huggingface.co/mesolitica/malaysian-whisper-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-malaysian_whisper_small_pipeline_ms.md b/docs/_posts/ahmedlone127/2024-09-25-malaysian_whisper_small_pipeline_ms.md new file mode 100644 index 00000000000000..bd25b793d07ec8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-malaysian_whisper_small_pipeline_ms.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Malay (macrolanguage) malaysian_whisper_small_pipeline pipeline WhisperForCTC from mesolitica +author: John Snow Labs +name: malaysian_whisper_small_pipeline +date: 2024-09-25 +tags: [ms, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ms +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`malaysian_whisper_small_pipeline` is a Malay (macrolanguage) model originally trained by mesolitica. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/malaysian_whisper_small_pipeline_ms_5.5.0_3.0_1727227176898.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/malaysian_whisper_small_pipeline_ms_5.5.0_3.0_1727227176898.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("malaysian_whisper_small_pipeline", lang = "ms") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("malaysian_whisper_small_pipeline", lang = "ms") +val annotations = pipeline.transform(df) + +``` +
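
Unlike the text pipelines, this pipeline starts with an AudioAssembler, so `df` is expected to carry raw audio rather than text. An illustrative setup; the `audio_content` column name and the silent placeholder waveform are assumptions and must match how the saved AudioAssembler was configured:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Placeholder waveform; replace with real audio samples (floats) and the correct column name
df = spark.createDataFrame([([0.0] * 16000,)], ["audio_content"])

pipeline = PretrainedPipeline("malaysian_whisper_small_pipeline", lang="ms")
result = pipeline.transform(df)
```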
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|malaysian_whisper_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ms| +|Size:|856.3 MB| + +## References + +https://huggingface.co/mesolitica/malaysian-whisper-small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-marathi_marh_val_f_mr.md b/docs/_posts/ahmedlone127/2024-09-25-marathi_marh_val_f_mr.md new file mode 100644 index 00000000000000..45106bb99ef809 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-marathi_marh_val_f_mr.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Marathi marathi_marh_val_f WhisperForCTC from simran14 +author: John Snow Labs +name: marathi_marh_val_f +date: 2024-09-25 +tags: [mr, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: mr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marathi_marh_val_f` is a Marathi model originally trained by simran14. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marathi_marh_val_f_mr_5.5.0_3.0_1727226055295.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marathi_marh_val_f_mr_5.5.0_3.0_1727226055295.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with a float-array column named "audio_content"
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("marathi_marh_val_f","mr") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// `data` is assumed to be a DataFrame with a float-array column named "audio_content"
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("marathi_marh_val_f", "mr")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marathi_marh_val_f| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|mr| +|Size:|1.7 GB| + +## References + +https://huggingface.co/simran14/mr-val-f \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-marathi_marh_val_f_pipeline_mr.md b/docs/_posts/ahmedlone127/2024-09-25-marathi_marh_val_f_pipeline_mr.md new file mode 100644 index 00000000000000..e8e091c0baa0b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-marathi_marh_val_f_pipeline_mr.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Marathi marathi_marh_val_f_pipeline pipeline WhisperForCTC from simran14 +author: John Snow Labs +name: marathi_marh_val_f_pipeline +date: 2024-09-25 +tags: [mr, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: mr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marathi_marh_val_f_pipeline` is a Marathi model originally trained by simran14. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marathi_marh_val_f_pipeline_mr_5.5.0_3.0_1727226147970.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marathi_marh_val_f_pipeline_mr_5.5.0_3.0_1727226147970.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marathi_marh_val_f_pipeline", lang = "mr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marathi_marh_val_f_pipeline", lang = "mr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marathi_marh_val_f_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mr| +|Size:|1.7 GB| + +## References + +https://huggingface.co/simran14/mr-val-f + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-marbertv2_flat_seed_42_en.md b/docs/_posts/ahmedlone127/2024-09-25-marbertv2_flat_seed_42_en.md new file mode 100644 index 00000000000000..ccb777a558da41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-marbertv2_flat_seed_42_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marbertv2_flat_seed_42 BertForTokenClassification from ahmedoumar +author: John Snow Labs +name: marbertv2_flat_seed_42 +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marbertv2_flat_seed_42` is a English model originally trained by ahmedoumar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marbertv2_flat_seed_42_en_5.5.0_3.0_1727275573150.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marbertv2_flat_seed_42_en_5.5.0_3.0_1727275573150.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Input column names must match the assembler ("document") and tokenizer ("token") outputs
tokenClassifier = BertForTokenClassification.pretrained("marbertv2_flat_seed_42","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("marbertv2_flat_seed_42", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marbertv2_flat_seed_42| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|606.7 MB| + +## References + +https://huggingface.co/ahmedoumar/MARBERTv2_FLAT_SEED_42 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-marbertv2_flat_seed_42_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-marbertv2_flat_seed_42_pipeline_en.md new file mode 100644 index 00000000000000..644d1bc0f640aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-marbertv2_flat_seed_42_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marbertv2_flat_seed_42_pipeline pipeline BertForTokenClassification from ahmedoumar +author: John Snow Labs +name: marbertv2_flat_seed_42_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marbertv2_flat_seed_42_pipeline` is a English model originally trained by ahmedoumar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marbertv2_flat_seed_42_pipeline_en_5.5.0_3.0_1727275605543.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marbertv2_flat_seed_42_pipeline_en_5.5.0_3.0_1727275605543.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marbertv2_flat_seed_42_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marbertv2_flat_seed_42_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marbertv2_flat_seed_42_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|606.7 MB| + +## References + +https://huggingface.co/ahmedoumar/MARBERTv2_FLAT_SEED_42 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-matscibert_cner_en.md b/docs/_posts/ahmedlone127/2024-09-25-matscibert_cner_en.md new file mode 100644 index 00000000000000..2f91f8d26d5f29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-matscibert_cner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English matscibert_cner BertForTokenClassification from nlp-magnets +author: John Snow Labs +name: matscibert_cner +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`matscibert_cner` is a English model originally trained by nlp-magnets. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/matscibert_cner_en_5.5.0_3.0_1727275895658.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/matscibert_cner_en_5.5.0_3.0_1727275895658.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Input column names must match the assembler ("document") and tokenizer ("token") outputs
tokenClassifier = BertForTokenClassification.pretrained("matscibert_cner","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("matscibert_cner", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|matscibert_cner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|409.9 MB| + +## References + +https://huggingface.co/nlp-magnets/matscibert-cner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-matscibert_cner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-matscibert_cner_pipeline_en.md new file mode 100644 index 00000000000000..273c9202218ec0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-matscibert_cner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English matscibert_cner_pipeline pipeline BertForTokenClassification from nlp-magnets +author: John Snow Labs +name: matscibert_cner_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`matscibert_cner_pipeline` is a English model originally trained by nlp-magnets. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/matscibert_cner_pipeline_en_5.5.0_3.0_1727275917628.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/matscibert_cner_pipeline_en_5.5.0_3.0_1727275917628.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("matscibert_cner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("matscibert_cner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|matscibert_cner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/nlp-magnets/matscibert-cner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-mbert_finetuned_sdgs_en.md b/docs/_posts/ahmedlone127/2024-09-25-mbert_finetuned_sdgs_en.md new file mode 100644 index 00000000000000..ddfd03f84fb913 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-mbert_finetuned_sdgs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mbert_finetuned_sdgs BertForSequenceClassification from aadhistii +author: John Snow Labs +name: mbert_finetuned_sdgs +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mbert_finetuned_sdgs` is a English model originally trained by aadhistii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mbert_finetuned_sdgs_en_5.5.0_3.0_1727277659365.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mbert_finetuned_sdgs_en_5.5.0_3.0_1727277659365.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Input column names must match the assembler ("document") and tokenizer ("token") outputs
sequenceClassifier = BertForSequenceClassification.pretrained("mbert_finetuned_sdgs","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("mbert_finetuned_sdgs", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
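
For quick checks on a handful of strings, the fitted model can also be wrapped in a `LightPipeline` instead of transforming a DataFrame; a small sketch reusing `pipelineModel` from the snippet above:

```python
from sparknlp.base import LightPipeline

# Annotate plain Python strings without building a DataFrame
lightPipeline = LightPipeline(pipelineModel)
print(lightPipeline.annotate("I love spark-nlp"))
```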
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mbert_finetuned_sdgs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|667.3 MB| + +## References + +https://huggingface.co/aadhistii/mbert-finetuned-sdgs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-memo_bert_wsd_01_en.md b/docs/_posts/ahmedlone127/2024-09-25-memo_bert_wsd_01_en.md new file mode 100644 index 00000000000000..b5897ec74530f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-memo_bert_wsd_01_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English memo_bert_wsd_01 BertForSequenceClassification from yemen2016 +author: John Snow Labs +name: memo_bert_wsd_01 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`memo_bert_wsd_01` is a English model originally trained by yemen2016. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/memo_bert_wsd_01_en_5.5.0_3.0_1727285931602.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/memo_bert_wsd_01_en_5.5.0_3.0_1727285931602.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Input column names must match the assembler ("document") and tokenizer ("token") outputs
sequenceClassifier = BertForSequenceClassification.pretrained("memo_bert_wsd_01","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("memo_bert_wsd_01", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|memo_bert_wsd_01| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|410.3 MB| + +## References + +https://huggingface.co/yemen2016/MeMo_BERT-WSD-01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-mitre_bert_base_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-mitre_bert_base_cased_pipeline_en.md new file mode 100644 index 00000000000000..ff0521e83a94ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-mitre_bert_base_cased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mitre_bert_base_cased_pipeline pipeline BertForSequenceClassification from bencyc1129 +author: John Snow Labs +name: mitre_bert_base_cased_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mitre_bert_base_cased_pipeline` is a English model originally trained by bencyc1129. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mitre_bert_base_cased_pipeline_en_5.5.0_3.0_1727266950234.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mitre_bert_base_cased_pipeline_en_5.5.0_3.0_1727266950234.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df can be any Spark DataFrame with a "text" column (illustrative example)
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("mitre_bert_base_cased_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df can be any Spark DataFrame with a "text" column (illustrative example)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("mitre_bert_base_cased_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
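+
+For quick checks on a single string, the same pretrained pipeline can be used with `annotate`, which returns a plain Python dictionary of annotator outputs instead of a DataFrame (a minimal sketch; the example sentence is only illustrative):
+
+```python
+light_result = pipeline.annotate("The attacker used spearphishing attachments to gain initial access.")
+print(light_result)
+```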
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mitre_bert_base_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/bencyc1129/mitre-bert-base-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-mobilebert_sanskrit_saskta_pre_training_complete_en.md b/docs/_posts/ahmedlone127/2024-09-25-mobilebert_sanskrit_saskta_pre_training_complete_en.md new file mode 100644 index 00000000000000..270462cd6b8caf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-mobilebert_sanskrit_saskta_pre_training_complete_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mobilebert_sanskrit_saskta_pre_training_complete BertEmbeddings from gokuls +author: John Snow Labs +name: mobilebert_sanskrit_saskta_pre_training_complete +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mobilebert_sanskrit_saskta_pre_training_complete` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mobilebert_sanskrit_saskta_pre_training_complete_en_5.5.0_3.0_1727241248026.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mobilebert_sanskrit_saskta_pre_training_complete_en_5.5.0_3.0_1727241248026.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("mobilebert_sanskrit_saskta_pre_training_complete","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("mobilebert_sanskrit_saskta_pre_training_complete","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
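+
+Each token's vector is returned in the `embeddings` output column; a minimal sketch for inspecting the vectors, assuming the pipeline above has been fitted and run:
+
+```python
+# One row per token: the token text and its embedding vector
+pipelineDF.selectExpr("explode(embeddings) as emb") \
+    .selectExpr("emb.result as token", "emb.embeddings as vector") \
+    .show(truncate=80)
+```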
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mobilebert_sanskrit_saskta_pre_training_complete| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|92.5 MB| + +## References + +https://huggingface.co/gokuls/mobilebert_sa_pre-training-complete \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-mobilebert_sanskrit_saskta_pre_training_complete_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-mobilebert_sanskrit_saskta_pre_training_complete_pipeline_en.md new file mode 100644 index 00000000000000..4a8b8a87ceba3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-mobilebert_sanskrit_saskta_pre_training_complete_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mobilebert_sanskrit_saskta_pre_training_complete_pipeline pipeline BertEmbeddings from gokuls +author: John Snow Labs +name: mobilebert_sanskrit_saskta_pre_training_complete_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mobilebert_sanskrit_saskta_pre_training_complete_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mobilebert_sanskrit_saskta_pre_training_complete_pipeline_en_5.5.0_3.0_1727241252743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mobilebert_sanskrit_saskta_pre_training_complete_pipeline_en_5.5.0_3.0_1727241252743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df can be any Spark DataFrame with a "text" column (illustrative example)
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("mobilebert_sanskrit_saskta_pre_training_complete_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df can be any Spark DataFrame with a "text" column (illustrative example)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("mobilebert_sanskrit_saskta_pre_training_complete_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mobilebert_sanskrit_saskta_pre_training_complete_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|92.5 MB| + +## References + +https://huggingface.co/gokuls/mobilebert_sa_pre-training-complete + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-mobilebert_stsb_en.md b/docs/_posts/ahmedlone127/2024-09-25-mobilebert_stsb_en.md new file mode 100644 index 00000000000000..74403cf3ff6ec4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-mobilebert_stsb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mobilebert_stsb BertForSequenceClassification from Alireza1044 +author: John Snow Labs +name: mobilebert_stsb +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mobilebert_stsb` is a English model originally trained by Alireza1044. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mobilebert_stsb_en_5.5.0_3.0_1727287378824.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mobilebert_stsb_en_5.5.0_3.0_1727287378824.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("mobilebert_stsb","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("mobilebert_stsb", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mobilebert_stsb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|92.5 MB| + +## References + +https://huggingface.co/Alireza1044/mobilebert_stsb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-modela_1_12_2023_en.md b/docs/_posts/ahmedlone127/2024-09-25-modela_1_12_2023_en.md new file mode 100644 index 00000000000000..c8782ff8345c11 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-modela_1_12_2023_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English modela_1_12_2023 BertForTokenClassification from MaryDatascientist +author: John Snow Labs +name: modela_1_12_2023 +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`modela_1_12_2023` is a English model originally trained by MaryDatascientist. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/modela_1_12_2023_en_5.5.0_3.0_1727264606089.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/modela_1_12_2023_en_5.5.0_3.0_1727264606089.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("modela_1_12_2023","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("modela_1_12_2023", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
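+
+The predicted tags are written to the `ner` output column; a minimal sketch that pairs each token with its tag, assuming the pipeline above has been fitted and run:
+
+```python
+# token.result holds the tokens, ner.result the predicted tags
+pipelineDF.selectExpr("token.result as tokens", "ner.result as tags").show(truncate=False)
+```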
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|modela_1_12_2023| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/MaryDatascientist/modelA_1_12_2023 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-modelo_racismo_9_april_24_en.md b/docs/_posts/ahmedlone127/2024-09-25-modelo_racismo_9_april_24_en.md new file mode 100644 index 00000000000000..adc1bc99bf607e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-modelo_racismo_9_april_24_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English modelo_racismo_9_april_24 BertForSequenceClassification from leofn3 +author: John Snow Labs +name: modelo_racismo_9_april_24 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`modelo_racismo_9_april_24` is a English model originally trained by leofn3. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/modelo_racismo_9_april_24_en_5.5.0_3.0_1727276484310.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/modelo_racismo_9_april_24_en_5.5.0_3.0_1727276484310.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("modelo_racismo_9_april_24","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("modelo_racismo_9_april_24", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|modelo_racismo_9_april_24| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|667.3 MB| + +## References + +https://huggingface.co/leofn3/modelo_racismo_9_april_24 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-movie_genre_classifier_davooddkareshki_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-movie_genre_classifier_davooddkareshki_pipeline_en.md new file mode 100644 index 00000000000000..543cdb77d93ed0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-movie_genre_classifier_davooddkareshki_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English movie_genre_classifier_davooddkareshki_pipeline pipeline BertForSequenceClassification from davooddkareshki +author: John Snow Labs +name: movie_genre_classifier_davooddkareshki_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`movie_genre_classifier_davooddkareshki_pipeline` is a English model originally trained by davooddkareshki. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/movie_genre_classifier_davooddkareshki_pipeline_en_5.5.0_3.0_1727277484777.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/movie_genre_classifier_davooddkareshki_pipeline_en_5.5.0_3.0_1727277484777.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df can be any Spark DataFrame with a "text" column (illustrative example)
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("movie_genre_classifier_davooddkareshki_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df can be any Spark DataFrame with a "text" column (illustrative example)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("movie_genre_classifier_davooddkareshki_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|movie_genre_classifier_davooddkareshki_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/davooddkareshki/Movie_Genre_Classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-multitaskdistilledmodel_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-multitaskdistilledmodel_pipeline_en.md new file mode 100644 index 00000000000000..f6ee94c889d611 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-multitaskdistilledmodel_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English multitaskdistilledmodel_pipeline pipeline BertForSequenceClassification from privacy-tech-lab +author: John Snow Labs +name: multitaskdistilledmodel_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multitaskdistilledmodel_pipeline` is a English model originally trained by privacy-tech-lab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multitaskdistilledmodel_pipeline_en_5.5.0_3.0_1727286271829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multitaskdistilledmodel_pipeline_en_5.5.0_3.0_1727286271829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df can be any Spark DataFrame with a "text" column (illustrative example)
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("multitaskdistilledmodel_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df can be any Spark DataFrame with a "text" column (illustrative example)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("multitaskdistilledmodel_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multitaskdistilledmodel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|54.2 MB| + +## References + +https://huggingface.co/privacy-tech-lab/MultitaskDistilledModel + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-n_bert_imdb_padding20model_en.md b/docs/_posts/ahmedlone127/2024-09-25-n_bert_imdb_padding20model_en.md new file mode 100644 index 00000000000000..b856b80e592ac3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-n_bert_imdb_padding20model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_bert_imdb_padding20model BertForSequenceClassification from Realgon +author: John Snow Labs +name: n_bert_imdb_padding20model +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_bert_imdb_padding20model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_bert_imdb_padding20model_en_5.5.0_3.0_1727267427419.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_bert_imdb_padding20model_en_5.5.0_3.0_1727267427419.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("n_bert_imdb_padding20model","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("n_bert_imdb_padding20model", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_bert_imdb_padding20model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/Realgon/N_bert_imdb_padding20model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-n_bert_imdb_padding80model_en.md b/docs/_posts/ahmedlone127/2024-09-25-n_bert_imdb_padding80model_en.md new file mode 100644 index 00000000000000..a19eb7f8a5f1bf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-n_bert_imdb_padding80model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_bert_imdb_padding80model BertForSequenceClassification from Realgon +author: John Snow Labs +name: n_bert_imdb_padding80model +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_bert_imdb_padding80model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_bert_imdb_padding80model_en_5.5.0_3.0_1727278327518.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_bert_imdb_padding80model_en_5.5.0_3.0_1727278327518.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("n_bert_imdb_padding80model","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("n_bert_imdb_padding80model", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_bert_imdb_padding80model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.6 MB| + +## References + +https://huggingface.co/Realgon/N_bert_imdb_padding80model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-n_bert_imdb_padding80model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-n_bert_imdb_padding80model_pipeline_en.md new file mode 100644 index 00000000000000..ee2b4479aa1eaa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-n_bert_imdb_padding80model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_bert_imdb_padding80model_pipeline pipeline BertForSequenceClassification from Realgon +author: John Snow Labs +name: n_bert_imdb_padding80model_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_bert_imdb_padding80model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_bert_imdb_padding80model_pipeline_en_5.5.0_3.0_1727278348554.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_bert_imdb_padding80model_pipeline_en_5.5.0_3.0_1727278348554.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df can be any Spark DataFrame with a "text" column (illustrative example)
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("n_bert_imdb_padding80model_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df can be any Spark DataFrame with a "text" column (illustrative example)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("n_bert_imdb_padding80model_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_bert_imdb_padding80model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.7 MB| + +## References + +https://huggingface.co/Realgon/N_bert_imdb_padding80model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-n_bert_sst5_padding100model_en.md b/docs/_posts/ahmedlone127/2024-09-25-n_bert_sst5_padding100model_en.md new file mode 100644 index 00000000000000..01c3011bf6a77c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-n_bert_sst5_padding100model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_bert_sst5_padding100model BertForSequenceClassification from Realgon +author: John Snow Labs +name: n_bert_sst5_padding100model +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_bert_sst5_padding100model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_bert_sst5_padding100model_en_5.5.0_3.0_1727278694323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_bert_sst5_padding100model_en_5.5.0_3.0_1727278694323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("n_bert_sst5_padding100model","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("n_bert_sst5_padding100model", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_bert_sst5_padding100model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.7 MB| + +## References + +https://huggingface.co/Realgon/N_bert_sst5_padding100model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-n_bert_twitterfin_padding60model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-n_bert_twitterfin_padding60model_pipeline_en.md new file mode 100644 index 00000000000000..88cdf45a1b0aab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-n_bert_twitterfin_padding60model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_bert_twitterfin_padding60model_pipeline pipeline BertForSequenceClassification from Realgon +author: John Snow Labs +name: n_bert_twitterfin_padding60model_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_bert_twitterfin_padding60model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_bert_twitterfin_padding60model_pipeline_en_5.5.0_3.0_1727286190095.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_bert_twitterfin_padding60model_pipeline_en_5.5.0_3.0_1727286190095.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df can be any Spark DataFrame with a "text" column (illustrative example)
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("n_bert_twitterfin_padding60model_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df can be any Spark DataFrame with a "text" column (illustrative example)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("n_bert_twitterfin_padding60model_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_bert_twitterfin_padding60model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.6 MB| + +## References + +https://huggingface.co/Realgon/N_bert_twitterfin_padding60model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-n_bert_twitterfin_padding90model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-n_bert_twitterfin_padding90model_pipeline_en.md new file mode 100644 index 00000000000000..b27a287b77c8d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-n_bert_twitterfin_padding90model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_bert_twitterfin_padding90model_pipeline pipeline BertForSequenceClassification from Realgon +author: John Snow Labs +name: n_bert_twitterfin_padding90model_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_bert_twitterfin_padding90model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_bert_twitterfin_padding90model_pipeline_en_5.5.0_3.0_1727279592031.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_bert_twitterfin_padding90model_pipeline_en_5.5.0_3.0_1727279592031.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df can be any Spark DataFrame with a "text" column (illustrative example)
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("n_bert_twitterfin_padding90model_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df can be any Spark DataFrame with a "text" column (illustrative example)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("n_bert_twitterfin_padding90model_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_bert_twitterfin_padding90model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.7 MB| + +## References + +https://huggingface.co/Realgon/N_bert_twitterfin_padding90model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-name_anonymization_tr.md b/docs/_posts/ahmedlone127/2024-09-25-name_anonymization_tr.md new file mode 100644 index 00000000000000..78d78b901202d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-name_anonymization_tr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Turkish name_anonymization BertForTokenClassification from deprem-ml +author: John Snow Labs +name: name_anonymization +date: 2024-09-25 +tags: [tr, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`name_anonymization` is a Turkish model originally trained by deprem-ml. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/name_anonymization_tr_5.5.0_3.0_1727284019137.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/name_anonymization_tr_5.5.0_3.0_1727284019137.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("name_anonymization","tr") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("name_anonymization", "tr")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
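+
+Because this model is trained on Turkish, Turkish input yields more meaningful tags than the English placeholder sentence above; a minimal sketch (the sentence below is only an illustration):
+
+```python
+data = spark.createDataFrame([["Ali Yılmaz dün Ankara'da bir basın toplantısı düzenledi."]]).toDF("text")
+pipelineDF = pipelineModel.transform(data)
+pipelineDF.selectExpr("token.result as tokens", "ner.result as tags").show(truncate=False)
+```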
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|name_anonymization| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|tr| +|Size:|412.3 MB| + +## References + +https://huggingface.co/deprem-ml/name_anonymization \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ner_bert_ingredients_en.md b/docs/_posts/ahmedlone127/2024-09-25-ner_bert_ingredients_en.md new file mode 100644 index 00000000000000..5de06465fcc7ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ner_bert_ingredients_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_bert_ingredients BertForTokenClassification from Shresthadev403 +author: John Snow Labs +name: ner_bert_ingredients +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_bert_ingredients` is a English model originally trained by Shresthadev403. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_bert_ingredients_en_5.5.0_3.0_1727260673154.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_bert_ingredients_en_5.5.0_3.0_1727260673154.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("ner_bert_ingredients","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("ner_bert_ingredients", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_bert_ingredients| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.3 MB| + +## References + +https://huggingface.co/Shresthadev403/ner-bert-ingredients \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ner_darijabert_arabizi_en.md b/docs/_posts/ahmedlone127/2024-09-25-ner_darijabert_arabizi_en.md new file mode 100644 index 00000000000000..1ff21d48ed9022 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ner_darijabert_arabizi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_darijabert_arabizi BertForTokenClassification from Oelbourki +author: John Snow Labs +name: ner_darijabert_arabizi +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_darijabert_arabizi` is a English model originally trained by Oelbourki. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_darijabert_arabizi_en_5.5.0_3.0_1727282192543.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_darijabert_arabizi_en_5.5.0_3.0_1727282192543.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("ner_darijabert_arabizi","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("ner_darijabert_arabizi", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_darijabert_arabizi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|634.9 MB| + +## References + +https://huggingface.co/Oelbourki/ner-DarijaBERT-arabizi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ner_harem_bert_base_portuguese_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-ner_harem_bert_base_portuguese_cased_en.md new file mode 100644 index 00000000000000..9e4df60e54d0aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ner_harem_bert_base_portuguese_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_harem_bert_base_portuguese_cased BertForTokenClassification from liaad +author: John Snow Labs +name: ner_harem_bert_base_portuguese_cased +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_harem_bert_base_portuguese_cased` is a English model originally trained by liaad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_harem_bert_base_portuguese_cased_en_5.5.0_3.0_1727248241432.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_harem_bert_base_portuguese_cased_en_5.5.0_3.0_1727248241432.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("ner_harem_bert_base_portuguese_cased","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("ner_harem_bert_base_portuguese_cased", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_harem_bert_base_portuguese_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|406.0 MB| + +## References + +https://huggingface.co/liaad/NER_harem_bert-base-portuguese-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ner_harem_bert_base_portuguese_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-ner_harem_bert_base_portuguese_cased_pipeline_en.md new file mode 100644 index 00000000000000..aa596ca52a5c6b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ner_harem_bert_base_portuguese_cased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ner_harem_bert_base_portuguese_cased_pipeline pipeline BertForTokenClassification from liaad +author: John Snow Labs +name: ner_harem_bert_base_portuguese_cased_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_harem_bert_base_portuguese_cased_pipeline` is a English model originally trained by liaad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_harem_bert_base_portuguese_cased_pipeline_en_5.5.0_3.0_1727248267196.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_harem_bert_base_portuguese_cased_pipeline_en_5.5.0_3.0_1727248267196.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df can be any Spark DataFrame with a "text" column (illustrative example)
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("ner_harem_bert_base_portuguese_cased_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df can be any Spark DataFrame with a "text" column (illustrative example)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("ner_harem_bert_base_portuguese_cased_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_harem_bert_base_portuguese_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.0 MB| + +## References + +https://huggingface.co/liaad/NER_harem_bert-base-portuguese-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ner_resume_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-ner_resume_pipeline_en.md new file mode 100644 index 00000000000000..c0153ed3e40665 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ner_resume_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ner_resume_pipeline pipeline BertForTokenClassification from ClaudiuFilip1100 +author: John Snow Labs +name: ner_resume_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_resume_pipeline` is a English model originally trained by ClaudiuFilip1100. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_resume_pipeline_en_5.5.0_3.0_1727270974736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_resume_pipeline_en_5.5.0_3.0_1727270974736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df can be any Spark DataFrame with a "text" column (illustrative example)
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("ner_resume_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// df can be any Spark DataFrame with a "text" column (illustrative example)
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("ner_resume_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_resume_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ClaudiuFilip1100/ner-resume + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-news_category_classifier_distilbert_en.md b/docs/_posts/ahmedlone127/2024-09-25-news_category_classifier_distilbert_en.md new file mode 100644 index 00000000000000..32fe4dafec34cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-news_category_classifier_distilbert_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English news_category_classifier_distilbert BertForSequenceClassification from dima806 +author: John Snow Labs +name: news_category_classifier_distilbert +date: 2024-09-25 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`news_category_classifier_distilbert` is a English model originally trained by dima806. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/news_category_classifier_distilbert_en_5.5.0_3.0_1727268666692.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/news_category_classifier_distilbert_en_5.5.0_3.0_1727268666692.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+document_assembler = DocumentAssembler()\
+    .setInputCol("text")\
+    .setOutputCol("document")
+
+tokenizer = Tokenizer()\
+    .setInputCols("document")\
+    .setOutputCol("token")
+
+sequenceClassifier = BertForSequenceClassification.pretrained("news_category_classifier_distilbert","en")\
+    .setInputCols(["document","token"])\
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("news_category_classifier_distilbert","en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+</div>
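+
+The predicted news category for each row is stored in the `class` column of `result`; a minimal sketch for viewing it once the pipeline above has run:
+
+```python
+# Show the input text alongside its predicted category
+result.select("text", "class.result").show(truncate=False)
+```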
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|news_category_classifier_distilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.5 MB| + +## References + +References + +https://huggingface.co/dima806/news-category-classifier-distilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-news_category_classifier_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-news_category_classifier_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..190923ed5fc0a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-news_category_classifier_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English news_category_classifier_distilbert_pipeline pipeline BertForSequenceClassification from wnic00 +author: John Snow Labs +name: news_category_classifier_distilbert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`news_category_classifier_distilbert_pipeline` is a English model originally trained by wnic00. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/news_category_classifier_distilbert_pipeline_en_5.5.0_3.0_1727268689039.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/news_category_classifier_distilbert_pipeline_en_5.5.0_3.0_1727268689039.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a DataFrame with the input text in a column named "text"
+pipeline = PretrainedPipeline("news_category_classifier_distilbert_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a DataFrame with the input text in a column named "text"
+val pipeline = new PretrainedPipeline("news_category_classifier_distilbert_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|news_category_classifier_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/wnic00/news-category-classifier-distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-nlp_sardinian_based_on_bert_en.md b/docs/_posts/ahmedlone127/2024-09-25-nlp_sardinian_based_on_bert_en.md new file mode 100644 index 00000000000000..31d16c98ba12f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-nlp_sardinian_based_on_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nlp_sardinian_based_on_bert BertForSequenceClassification from 4TB-USTC +author: John Snow Labs +name: nlp_sardinian_based_on_bert +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_sardinian_based_on_bert` is a English model originally trained by 4TB-USTC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_sardinian_based_on_bert_en_5.5.0_3.0_1727288469493.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_sardinian_based_on_bert_en_5.5.0_3.0_1727288469493.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# the classifier consumes the "document" and "token" columns produced above
+sequenceClassifier = BertForSequenceClassification.pretrained("nlp_sardinian_based_on_bert","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("nlp_sardinian_based_on_bert", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_sardinian_based_on_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/4TB-USTC/nlp_sc_based_on_bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-nlp_sardinian_based_on_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-nlp_sardinian_based_on_bert_pipeline_en.md new file mode 100644 index 00000000000000..ccdd277d28fb1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-nlp_sardinian_based_on_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nlp_sardinian_based_on_bert_pipeline pipeline BertForSequenceClassification from 4TB-USTC +author: John Snow Labs +name: nlp_sardinian_based_on_bert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_sardinian_based_on_bert_pipeline` is a English model originally trained by 4TB-USTC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_sardinian_based_on_bert_pipeline_en_5.5.0_3.0_1727288492519.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_sardinian_based_on_bert_pipeline_en_5.5.0_3.0_1727288492519.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a DataFrame with the input text in a column named "text"
+pipeline = PretrainedPipeline("nlp_sardinian_based_on_bert_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a DataFrame with the input text in a column named "text"
+val pipeline = new PretrainedPipeline("nlp_sardinian_based_on_bert_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_sardinian_based_on_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/4TB-USTC/nlp_sc_based_on_bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-nonsense_gibberish_detector_pipeline_ru.md b/docs/_posts/ahmedlone127/2024-09-25-nonsense_gibberish_detector_pipeline_ru.md new file mode 100644 index 00000000000000..d636f7605fcbb2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-nonsense_gibberish_detector_pipeline_ru.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Russian nonsense_gibberish_detector_pipeline pipeline BertForSequenceClassification from Den4ikAI +author: John Snow Labs +name: nonsense_gibberish_detector_pipeline +date: 2024-09-25 +tags: [ru, open_source, pipeline, onnx] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nonsense_gibberish_detector_pipeline` is a Russian model originally trained by Den4ikAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nonsense_gibberish_detector_pipeline_ru_5.5.0_3.0_1727239983491.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nonsense_gibberish_detector_pipeline_ru_5.5.0_3.0_1727239983491.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a DataFrame with the input text in a column named "text"
+pipeline = PretrainedPipeline("nonsense_gibberish_detector_pipeline", lang = "ru")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a DataFrame with the input text in a column named "text"
+val pipeline = new PretrainedPipeline("nonsense_gibberish_detector_pipeline", lang = "ru")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nonsense_gibberish_detector_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ru| +|Size:|426.2 MB| + +## References + +https://huggingface.co/Den4ikAI/nonsense_gibberish_detector + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-nonsense_gibberish_detector_ru.md b/docs/_posts/ahmedlone127/2024-09-25-nonsense_gibberish_detector_ru.md new file mode 100644 index 00000000000000..6c85ad9486dede --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-nonsense_gibberish_detector_ru.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Russian nonsense_gibberish_detector BertForSequenceClassification from Den4ikAI +author: John Snow Labs +name: nonsense_gibberish_detector +date: 2024-09-25 +tags: [ru, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nonsense_gibberish_detector` is a Russian model originally trained by Den4ikAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nonsense_gibberish_detector_ru_5.5.0_3.0_1727239962190.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nonsense_gibberish_detector_ru_5.5.0_3.0_1727239962190.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# the classifier consumes the "document" and "token" columns produced above
+sequenceClassifier = BertForSequenceClassification.pretrained("nonsense_gibberish_detector","ru") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("nonsense_gibberish_detector", "ru")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
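+
+Because this is a Russian model, a Russian input sentence is more representative than the English placeholder used above. A minimal sketch that reuses the fitted `pipelineModel` from the Python example (the sample sentence is only an illustration):
+
+```python
+# Run the fitted pipeline on a Russian sentence and show the predicted class
+data_ru = spark.createDataFrame([["Привет, как у тебя дела?"]]).toDF("text")
+pipelineModel.transform(data_ru).select("text", "class.result").show(truncate=False)
+```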
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nonsense_gibberish_detector| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|ru| +|Size:|426.2 MB| + +## References + +https://huggingface.co/Den4ikAI/nonsense_gibberish_detector \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_base_no.md b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_base_no.md new file mode 100644 index 00000000000000..697d99f4cdec8e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_base_no.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Norwegian norwegian_bokml_whisper_base WhisperForCTC from NbAiLab +author: John Snow Labs +name: norwegian_bokml_whisper_base +date: 2024-09-25 +tags: ["no", open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: "no" +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`norwegian_bokml_whisper_base` is a Norwegian model originally trained by NbAiLab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_base_no_5.5.0_3.0_1727223458869.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_base_no_5.5.0_3.0_1727223458869.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("norwegian_bokml_whisper_base","no") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: a DataFrame with the raw 16 kHz audio signal (array of floats) in a column named "audio_content"
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("norwegian_bokml_whisper_base", "no")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: a DataFrame with the raw 16 kHz audio signal (array of floats) in a column named "audio_content"
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
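+
+The `data` DataFrame above is assumed to already contain the raw audio signal. One possible way to build it, assuming `librosa` is installed and `audio.wav` is a local audio file (both are illustrative assumptions, not requirements of the model):
+
+```python
+import librosa
+
+# Load the file as 16 kHz mono floats and wrap it in the "audio_content" column expected above
+raw_floats, _ = librosa.load("audio.wav", sr=16000)
+data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")
+```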
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norwegian_bokml_whisper_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|no| +|Size:|633.6 MB| + +## References + +https://huggingface.co/NbAiLab/nb-whisper-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_small_no.md b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_small_no.md new file mode 100644 index 00000000000000..d42a70e5b0f9de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_small_no.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Norwegian norwegian_bokml_whisper_small WhisperForCTC from NbAiLabBeta +author: John Snow Labs +name: norwegian_bokml_whisper_small +date: 2024-09-25 +tags: ["no", open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: "no" +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`norwegian_bokml_whisper_small` is a Norwegian model originally trained by NbAiLabBeta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_small_no_5.5.0_3.0_1727223710912.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_small_no_5.5.0_3.0_1727223710912.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("norwegian_bokml_whisper_small","no") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: a DataFrame with the raw 16 kHz audio signal (array of floats) in a column named "audio_content"
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("norwegian_bokml_whisper_small", "no")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: a DataFrame with the raw 16 kHz audio signal (array of floats) in a column named "audio_content"
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norwegian_bokml_whisper_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|no| +|Size:|1.7 GB| + +## References + +https://huggingface.co/NbAiLabBeta/nb-whisper-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_small_pipeline_no.md b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_small_pipeline_no.md new file mode 100644 index 00000000000000..c4be1e2748114d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_small_pipeline_no.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Norwegian norwegian_bokml_whisper_small_pipeline pipeline WhisperForCTC from NbAiLabBeta +author: John Snow Labs +name: norwegian_bokml_whisper_small_pipeline +date: 2024-09-25 +tags: ["no", open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: "no" +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`norwegian_bokml_whisper_small_pipeline` is a Norwegian model originally trained by NbAiLabBeta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_small_pipeline_no_5.5.0_3.0_1727223803220.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_small_pipeline_no_5.5.0_3.0_1727223803220.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a DataFrame with the raw 16 kHz audio signal (array of floats) in a column named "audio_content"
+pipeline = PretrainedPipeline("norwegian_bokml_whisper_small_pipeline", lang = "no")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a DataFrame with the raw 16 kHz audio signal (array of floats) in a column named "audio_content"
+val pipeline = new PretrainedPipeline("norwegian_bokml_whisper_small_pipeline", lang = "no")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norwegian_bokml_whisper_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|no| +|Size:|1.7 GB| + +## References + +https://huggingface.co/NbAiLabBeta/nb-whisper-small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_tiny_nbailab_no.md b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_tiny_nbailab_no.md new file mode 100644 index 00000000000000..02c44cae29ff62 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_tiny_nbailab_no.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Norwegian norwegian_bokml_whisper_tiny_nbailab WhisperForCTC from NbAiLab +author: John Snow Labs +name: norwegian_bokml_whisper_tiny_nbailab +date: 2024-09-25 +tags: ["no", open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: "no" +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`norwegian_bokml_whisper_tiny_nbailab` is a Norwegian model originally trained by NbAiLab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_tiny_nbailab_no_5.5.0_3.0_1727227479762.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_tiny_nbailab_no_5.5.0_3.0_1727227479762.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("norwegian_bokml_whisper_tiny_nbailab","no") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: a DataFrame with the raw 16 kHz audio signal (array of floats) in a column named "audio_content"
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("norwegian_bokml_whisper_tiny_nbailab", "no")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: a DataFrame with the raw 16 kHz audio signal (array of floats) in a column named "audio_content"
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norwegian_bokml_whisper_tiny_nbailab| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|no| +|Size:|384.2 MB| + +## References + +https://huggingface.co/NbAiLab/nb-whisper-tiny \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_tiny_nbailab_pipeline_no.md b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_tiny_nbailab_pipeline_no.md new file mode 100644 index 00000000000000..202103aef7b4e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_tiny_nbailab_pipeline_no.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Norwegian norwegian_bokml_whisper_tiny_nbailab_pipeline pipeline WhisperForCTC from NbAiLab +author: John Snow Labs +name: norwegian_bokml_whisper_tiny_nbailab_pipeline +date: 2024-09-25 +tags: ["no", open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: "no" +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`norwegian_bokml_whisper_tiny_nbailab_pipeline` is a Norwegian model originally trained by NbAiLab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_tiny_nbailab_pipeline_no_5.5.0_3.0_1727227502382.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_tiny_nbailab_pipeline_no_5.5.0_3.0_1727227502382.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a DataFrame with the raw 16 kHz audio signal (array of floats) in a column named "audio_content"
+pipeline = PretrainedPipeline("norwegian_bokml_whisper_tiny_nbailab_pipeline", lang = "no")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a DataFrame with the raw 16 kHz audio signal (array of floats) in a column named "audio_content"
+val pipeline = new PretrainedPipeline("norwegian_bokml_whisper_tiny_nbailab_pipeline", lang = "no")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norwegian_bokml_whisper_tiny_nbailab_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|no| +|Size:|384.2 MB| + +## References + +https://huggingface.co/NbAiLab/nb-whisper-tiny + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_tiny_nbailabbeta_no.md b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_tiny_nbailabbeta_no.md new file mode 100644 index 00000000000000..b52c0cbab903db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_tiny_nbailabbeta_no.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Norwegian norwegian_bokml_whisper_tiny_nbailabbeta WhisperForCTC from NbAiLabBeta +author: John Snow Labs +name: norwegian_bokml_whisper_tiny_nbailabbeta +date: 2024-09-25 +tags: ["no", open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: "no" +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`norwegian_bokml_whisper_tiny_nbailabbeta` is a Norwegian model originally trained by NbAiLabBeta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_tiny_nbailabbeta_no_5.5.0_3.0_1727224996671.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_tiny_nbailabbeta_no_5.5.0_3.0_1727224996671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("norwegian_bokml_whisper_tiny_nbailabbeta","no") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# data: a DataFrame with the raw 16 kHz audio signal (array of floats) in a column named "audio_content"
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("norwegian_bokml_whisper_tiny_nbailabbeta", "no")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// data: a DataFrame with the raw 16 kHz audio signal (array of floats) in a column named "audio_content"
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norwegian_bokml_whisper_tiny_nbailabbeta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|no| +|Size:|384.2 MB| + +## References + +https://huggingface.co/NbAiLabBeta/nb-whisper-tiny \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-opticalbert_cner_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-opticalbert_cner_cased_en.md new file mode 100644 index 00000000000000..951494a8e34b4a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-opticalbert_cner_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opticalbert_cner_cased BertForTokenClassification from opticalmaterials +author: John Snow Labs +name: opticalbert_cner_cased +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opticalbert_cner_cased` is a English model originally trained by opticalmaterials. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opticalbert_cner_cased_en_5.5.0_3.0_1727248350546.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opticalbert_cner_cased_en_5.5.0_3.0_1727248350546.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# the token classifier consumes the "document" and "token" columns produced above
+tokenClassifier = BertForTokenClassification.pretrained("opticalbert_cner_cased","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("opticalbert_cner_cased", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
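+
+To see the token-level predictions, select the tokens and their predicted tags side by side. A minimal sketch that reuses the `pipelineDF` DataFrame from the Python example above:
+
+```python
+# Show the tokenized words and the IOB tags predicted for them
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```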
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opticalbert_cner_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.5 MB| + +## References + +https://huggingface.co/opticalmaterials/opticalbert_cner_cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-opticalbert_cner_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-opticalbert_cner_cased_pipeline_en.md new file mode 100644 index 00000000000000..3ca3b95bea7f1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-opticalbert_cner_cased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opticalbert_cner_cased_pipeline pipeline BertForTokenClassification from opticalmaterials +author: John Snow Labs +name: opticalbert_cner_cased_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opticalbert_cner_cased_pipeline` is a English model originally trained by opticalmaterials. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opticalbert_cner_cased_pipeline_en_5.5.0_3.0_1727248371555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opticalbert_cner_cased_pipeline_en_5.5.0_3.0_1727248371555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a DataFrame with the input text in a column named "text"
+pipeline = PretrainedPipeline("opticalbert_cner_cased_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a DataFrame with the input text in a column named "text"
+val pipeline = new PretrainedPipeline("opticalbert_cner_cased_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opticalbert_cner_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/opticalmaterials/opticalbert_cner_cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-opus_em_augmented_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-opus_em_augmented_pipeline_en.md new file mode 100644 index 00000000000000..9c44c860a7839e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-opus_em_augmented_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_em_augmented_pipeline pipeline BertForSequenceClassification from keremp +author: John Snow Labs +name: opus_em_augmented_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_em_augmented_pipeline` is a English model originally trained by keremp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_em_augmented_pipeline_en_5.5.0_3.0_1727267676216.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_em_augmented_pipeline_en_5.5.0_3.0_1727267676216.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a DataFrame with the input text in a column named "text"
+pipeline = PretrainedPipeline("opus_em_augmented_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a DataFrame with the input text in a column named "text"
+val pipeline = new PretrainedPipeline("opus_em_augmented_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_em_augmented_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/keremp/opus-em-augmented + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-out_glue_mnli_en.md b/docs/_posts/ahmedlone127/2024-09-25-out_glue_mnli_en.md new file mode 100644 index 00000000000000..483ba4fe648106 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-out_glue_mnli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English out_glue_mnli BertForSequenceClassification from Tural +author: John Snow Labs +name: out_glue_mnli +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`out_glue_mnli` is a English model originally trained by Tural. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/out_glue_mnli_en_5.5.0_3.0_1727264080681.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/out_glue_mnli_en_5.5.0_3.0_1727264080681.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# the classifier consumes the "document" and "token" columns produced above
+sequenceClassifier = BertForSequenceClassification.pretrained("out_glue_mnli","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("out_glue_mnli", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|out_glue_mnli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|410.2 MB| + +## References + +https://huggingface.co/Tural/out-glue-mnli \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-out_glue_mnli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-out_glue_mnli_pipeline_en.md new file mode 100644 index 00000000000000..9b2648ebf2623c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-out_glue_mnli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English out_glue_mnli_pipeline pipeline BertForSequenceClassification from Tural +author: John Snow Labs +name: out_glue_mnli_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`out_glue_mnli_pipeline` is a English model originally trained by Tural. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/out_glue_mnli_pipeline_en_5.5.0_3.0_1727264103708.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/out_glue_mnli_pipeline_en_5.5.0_3.0_1727264103708.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a DataFrame with the input text in a column named "text"
+pipeline = PretrainedPipeline("out_glue_mnli_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a DataFrame with the input text in a column named "text"
+val pipeline = new PretrainedPipeline("out_glue_mnli_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|out_glue_mnli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|410.3 MB| + +## References + +https://huggingface.co/Tural/out-glue-mnli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-pabee_bert_base_sst2_en.md b/docs/_posts/ahmedlone127/2024-09-25-pabee_bert_base_sst2_en.md new file mode 100644 index 00000000000000..cca658c28071ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-pabee_bert_base_sst2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English pabee_bert_base_sst2 BertForSequenceClassification from mattymchen +author: John Snow Labs +name: pabee_bert_base_sst2 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pabee_bert_base_sst2` is a English model originally trained by mattymchen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pabee_bert_base_sst2_en_5.5.0_3.0_1727276199313.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pabee_bert_base_sst2_en_5.5.0_3.0_1727276199313.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# the classifier consumes the "document" and "token" columns produced above
+sequenceClassifier = BertForSequenceClassification.pretrained("pabee_bert_base_sst2","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("pabee_bert_base_sst2", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pabee_bert_base_sst2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/mattymchen/pabee-bert-base-sst2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-pardonmyai_tiny_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-pardonmyai_tiny_pipeline_en.md new file mode 100644 index 00000000000000..0abc2c9a8c7d10 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-pardonmyai_tiny_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English pardonmyai_tiny_pipeline pipeline BertForSequenceClassification from tarekziade +author: John Snow Labs +name: pardonmyai_tiny_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pardonmyai_tiny_pipeline` is a English model originally trained by tarekziade. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pardonmyai_tiny_pipeline_en_5.5.0_3.0_1727276171723.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pardonmyai_tiny_pipeline_en_5.5.0_3.0_1727276171723.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a DataFrame with the input text in a column named "text"
+pipeline = PretrainedPipeline("pardonmyai_tiny_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a DataFrame with the input text in a column named "text"
+val pipeline = new PretrainedPipeline("pardonmyai_tiny_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pardonmyai_tiny_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|54.2 MB| + +## References + +https://huggingface.co/tarekziade/pardonmyai-tiny + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phil_oriya_not_v1_2_en.md b/docs/_posts/ahmedlone127/2024-09-25-phil_oriya_not_v1_2_en.md new file mode 100644 index 00000000000000..667746d5b84017 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phil_oriya_not_v1_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English phil_oriya_not_v1_2 BertForSequenceClassification from dbourget +author: John Snow Labs +name: phil_oriya_not_v1_2 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phil_oriya_not_v1_2` is a English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phil_oriya_not_v1_2_en_5.5.0_3.0_1727256892636.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phil_oriya_not_v1_2_en_5.5.0_3.0_1727256892636.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# the classifier consumes the "document" and "token" columns produced above
+sequenceClassifier = BertForSequenceClassification.pretrained("phil_oriya_not_v1_2","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("phil_oriya_not_v1_2", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phil_oriya_not_v1_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/phil-or-not-v1.2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phil_oriya_not_v1_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-phil_oriya_not_v1_2_pipeline_en.md new file mode 100644 index 00000000000000..87eea4fd656329 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phil_oriya_not_v1_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English phil_oriya_not_v1_2_pipeline pipeline BertForSequenceClassification from dbourget +author: John Snow Labs +name: phil_oriya_not_v1_2_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phil_oriya_not_v1_2_pipeline` is a English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phil_oriya_not_v1_2_pipeline_en_5.5.0_3.0_1727256957656.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phil_oriya_not_v1_2_pipeline_en_5.5.0_3.0_1727256957656.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a DataFrame with the input text in a column named "text"
+pipeline = PretrainedPipeline("phil_oriya_not_v1_2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a DataFrame with the input text in a column named "text"
+val pipeline = new PretrainedPipeline("phil_oriya_not_v1_2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phil_oriya_not_v1_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/phil-or-not-v1.2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_akode_en.md b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_akode_en.md new file mode 100644 index 00000000000000..d99ff5b65001d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_akode_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English phrasebank_sentiment_analysis_akode BertForSequenceClassification from akode +author: John Snow Labs +name: phrasebank_sentiment_analysis_akode +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phrasebank_sentiment_analysis_akode` is a English model originally trained by akode. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_akode_en_5.5.0_3.0_1727285371700.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_akode_en_5.5.0_3.0_1727285371700.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# the classifier consumes the "document" and "token" columns produced above
+sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_akode","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_akode", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phrasebank_sentiment_analysis_akode| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/akode/phrasebank-sentiment-analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_amit7859_en.md b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_amit7859_en.md new file mode 100644 index 00000000000000..d5e010a4275d7f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_amit7859_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English phrasebank_sentiment_analysis_amit7859 BertForSequenceClassification from amit7859 +author: John Snow Labs +name: phrasebank_sentiment_analysis_amit7859 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phrasebank_sentiment_analysis_amit7859` is a English model originally trained by amit7859. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_amit7859_en_5.5.0_3.0_1727272646752.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_amit7859_en_5.5.0_3.0_1727272646752.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_amit7859","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_amit7859", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phrasebank_sentiment_analysis_amit7859| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/amit7859/phrasebank-sentiment-analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_amit7859_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_amit7859_pipeline_en.md new file mode 100644 index 00000000000000..0c53a832999ad5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_amit7859_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English phrasebank_sentiment_analysis_amit7859_pipeline pipeline BertForSequenceClassification from amit7859 +author: John Snow Labs +name: phrasebank_sentiment_analysis_amit7859_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phrasebank_sentiment_analysis_amit7859_pipeline` is a English model originally trained by amit7859. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_amit7859_pipeline_en_5.5.0_3.0_1727272669484.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_amit7859_pipeline_en_5.5.0_3.0_1727272669484.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("phrasebank_sentiment_analysis_amit7859_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("phrasebank_sentiment_analysis_amit7859_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
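+
+For quick experiments on single strings, the same pretrained pipeline can also be called directly instead of going through a DataFrame. This is a sketch under the assumption that the pipeline exposes the output columns of the stages listed under Included Models (for example a `class` column for the classifier):
+
+```python
+# annotate() returns a plain dictionary keyed by the pipeline's output column names
+result = pipeline.annotate("Operating profit rose compared to the previous year.")
+print(result.get("class"))
+```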
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phrasebank_sentiment_analysis_amit7859_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/amit7859/phrasebank-sentiment-analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_fakhry_en.md b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_fakhry_en.md new file mode 100644 index 00000000000000..897a7bd3bc6add --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_fakhry_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English phrasebank_sentiment_analysis_fakhry BertForSequenceClassification from Fakhry +author: John Snow Labs +name: phrasebank_sentiment_analysis_fakhry +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phrasebank_sentiment_analysis_fakhry` is a English model originally trained by Fakhry. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_fakhry_en_5.5.0_3.0_1727279410361.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_fakhry_en_5.5.0_3.0_1727279410361.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_fakhry","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_fakhry", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phrasebank_sentiment_analysis_fakhry| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Fakhry/phrasebank-sentiment-analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_ramnathv_en.md b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_ramnathv_en.md new file mode 100644 index 00000000000000..f48d1f786feef5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_ramnathv_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English phrasebank_sentiment_analysis_ramnathv BertForSequenceClassification from ramnathv +author: John Snow Labs +name: phrasebank_sentiment_analysis_ramnathv +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phrasebank_sentiment_analysis_ramnathv` is a English model originally trained by ramnathv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_ramnathv_en_5.5.0_3.0_1727269785757.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_ramnathv_en_5.5.0_3.0_1727269785757.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_ramnathv","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_ramnathv", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phrasebank_sentiment_analysis_ramnathv| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ramnathv/phrasebank-sentiment-analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_ramnathv_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_ramnathv_pipeline_en.md new file mode 100644 index 00000000000000..b648fd2cd26b26 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_ramnathv_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English phrasebank_sentiment_analysis_ramnathv_pipeline pipeline BertForSequenceClassification from ramnathv +author: John Snow Labs +name: phrasebank_sentiment_analysis_ramnathv_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phrasebank_sentiment_analysis_ramnathv_pipeline` is a English model originally trained by ramnathv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_ramnathv_pipeline_en_5.5.0_3.0_1727269808031.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_ramnathv_pipeline_en_5.5.0_3.0_1727269808031.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("phrasebank_sentiment_analysis_ramnathv_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("phrasebank_sentiment_analysis_ramnathv_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phrasebank_sentiment_analysis_ramnathv_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ramnathv/phrasebank-sentiment-analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_richychn_en.md b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_richychn_en.md new file mode 100644 index 00000000000000..7343cebb801ce8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_richychn_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English phrasebank_sentiment_analysis_richychn BertForSequenceClassification from richychn +author: John Snow Labs +name: phrasebank_sentiment_analysis_richychn +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phrasebank_sentiment_analysis_richychn` is a English model originally trained by richychn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_richychn_en_5.5.0_3.0_1727273091918.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_richychn_en_5.5.0_3.0_1727273091918.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_richychn","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_richychn", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phrasebank_sentiment_analysis_richychn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/richychn/phrasebank-sentiment-analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_saiteja_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_saiteja_pipeline_en.md new file mode 100644 index 00000000000000..fed93220283bb9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_saiteja_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English phrasebank_sentiment_analysis_saiteja_pipeline pipeline BertForSequenceClassification from Saiteja +author: John Snow Labs +name: phrasebank_sentiment_analysis_saiteja_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phrasebank_sentiment_analysis_saiteja_pipeline` is a English model originally trained by Saiteja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_saiteja_pipeline_en_5.5.0_3.0_1727268451986.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_saiteja_pipeline_en_5.5.0_3.0_1727268451986.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("phrasebank_sentiment_analysis_saiteja_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("phrasebank_sentiment_analysis_saiteja_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phrasebank_sentiment_analysis_saiteja_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Saiteja/phrasebank-sentiment-analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_stolbiq_en.md b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_stolbiq_en.md new file mode 100644 index 00000000000000..cf69fccff2ae7e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_stolbiq_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English phrasebank_sentiment_analysis_stolbiq BertForSequenceClassification from stolbiq +author: John Snow Labs +name: phrasebank_sentiment_analysis_stolbiq +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phrasebank_sentiment_analysis_stolbiq` is a English model originally trained by stolbiq. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_stolbiq_en_5.5.0_3.0_1727266435933.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_stolbiq_en_5.5.0_3.0_1727266435933.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_stolbiq","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_stolbiq", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phrasebank_sentiment_analysis_stolbiq| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/stolbiq/phrasebank-sentiment-analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_stolbiq_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_stolbiq_pipeline_en.md new file mode 100644 index 00000000000000..d4ce18a19e2154 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_stolbiq_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English phrasebank_sentiment_analysis_stolbiq_pipeline pipeline BertForSequenceClassification from stolbiq +author: John Snow Labs +name: phrasebank_sentiment_analysis_stolbiq_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phrasebank_sentiment_analysis_stolbiq_pipeline` is a English model originally trained by stolbiq. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_stolbiq_pipeline_en_5.5.0_3.0_1727266458662.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_stolbiq_pipeline_en_5.5.0_3.0_1727266458662.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("phrasebank_sentiment_analysis_stolbiq_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("phrasebank_sentiment_analysis_stolbiq_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phrasebank_sentiment_analysis_stolbiq_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/stolbiq/phrasebank-sentiment-analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-polite_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-polite_bert_pipeline_en.md new file mode 100644 index 00000000000000..e9e7c4b8e5f627 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-polite_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English polite_bert_pipeline pipeline BertForSequenceClassification from NOVA-vision-language +author: John Snow Labs +name: polite_bert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`polite_bert_pipeline` is a English model originally trained by NOVA-vision-language. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/polite_bert_pipeline_en_5.5.0_3.0_1727269543128.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/polite_bert_pipeline_en_5.5.0_3.0_1727269543128.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("polite_bert_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("polite_bert_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|polite_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/NOVA-vision-language/polite_bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-postagger_bio_portuguese_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-25-postagger_bio_portuguese_pipeline_pt.md new file mode 100644 index 00000000000000..1acf011046a6c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-postagger_bio_portuguese_pipeline_pt.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Portuguese postagger_bio_portuguese_pipeline pipeline BertForTokenClassification from pucpr-br +author: John Snow Labs +name: postagger_bio_portuguese_pipeline +date: 2024-09-25 +tags: [pt, open_source, pipeline, onnx] +task: Named Entity Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`postagger_bio_portuguese_pipeline` is a Portuguese model originally trained by pucpr-br. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/postagger_bio_portuguese_pipeline_pt_5.5.0_3.0_1727259109089.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/postagger_bio_portuguese_pipeline_pt_5.5.0_3.0_1727259109089.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("postagger_bio_portuguese_pipeline", lang = "pt")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("postagger_bio_portuguese_pipeline", lang = "pt")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|postagger_bio_portuguese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|665.0 MB| + +## References + +https://huggingface.co/pucpr-br/postagger-bio-portuguese + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-postagger_bio_portuguese_pt.md b/docs/_posts/ahmedlone127/2024-09-25-postagger_bio_portuguese_pt.md new file mode 100644 index 00000000000000..307d600f04cd15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-postagger_bio_portuguese_pt.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Portuguese postagger_bio_portuguese BertForTokenClassification from pucpr-br +author: John Snow Labs +name: postagger_bio_portuguese +date: 2024-09-25 +tags: [pt, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`postagger_bio_portuguese` is a Portuguese model originally trained by pucpr-br. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/postagger_bio_portuguese_pt_5.5.0_3.0_1727259074952.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/postagger_bio_portuguese_pt_5.5.0_3.0_1727259074952.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("postagger_bio_portuguese","pt") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("postagger_bio_portuguese", "pt")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
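+
+Because this is a token-level model, the `ner` output column holds one predicted tag per token, aligned with the `token` column. A minimal sketch (assuming the `pipelineDF` DataFrame produced by the Python example) for viewing tokens next to their predicted tags:
+
+```python
+# The i-th entry of ner.result is the tag predicted for the i-th entry of token.result
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```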
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|postagger_bio_portuguese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|pt| +|Size:|664.9 MB| + +## References + +https://huggingface.co/pucpr-br/postagger-bio-portuguese \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-propoint_final_project_en.md b/docs/_posts/ahmedlone127/2024-09-25-propoint_final_project_en.md new file mode 100644 index 00000000000000..277ffb3ecf8639 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-propoint_final_project_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English propoint_final_project BertForSequenceClassification from DataAngelo +author: John Snow Labs +name: propoint_final_project +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`propoint_final_project` is a English model originally trained by DataAngelo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/propoint_final_project_en_5.5.0_3.0_1727272671514.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/propoint_final_project_en_5.5.0_3.0_1727272671514.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("propoint_final_project","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("propoint_final_project", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|propoint_final_project| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/DataAngelo/propoint_Final_project \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-prototipo_7_emi_en.md b/docs/_posts/ahmedlone127/2024-09-25-prototipo_7_emi_en.md new file mode 100644 index 00000000000000..403836deb137d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-prototipo_7_emi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English prototipo_7_emi BertForSequenceClassification from Armandodelca +author: John Snow Labs +name: prototipo_7_emi +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`prototipo_7_emi` is a English model originally trained by Armandodelca. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/prototipo_7_emi_en_5.5.0_3.0_1727270129321.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/prototipo_7_emi_en_5.5.0_3.0_1727270129321.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("prototipo_7_emi","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("prototipo_7_emi", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|prototipo_7_emi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|411.7 MB| + +## References + +https://huggingface.co/Armandodelca/Prototipo_7_EMI \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-prototipo_7_emi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-prototipo_7_emi_pipeline_en.md new file mode 100644 index 00000000000000..82034aa2fb77e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-prototipo_7_emi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English prototipo_7_emi_pipeline pipeline BertForSequenceClassification from Armandodelca +author: John Snow Labs +name: prototipo_7_emi_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`prototipo_7_emi_pipeline` is a English model originally trained by Armandodelca. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/prototipo_7_emi_pipeline_en_5.5.0_3.0_1727270150743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/prototipo_7_emi_pipeline_en_5.5.0_3.0_1727270150743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("prototipo_7_emi_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("prototipo_7_emi_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|prototipo_7_emi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.7 MB| + +## References + +https://huggingface.co/Armandodelca/Prototipo_7_EMI + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-psychbert_finetuned_mentalhealth_en.md b/docs/_posts/ahmedlone127/2024-09-25-psychbert_finetuned_mentalhealth_en.md new file mode 100644 index 00000000000000..b96763d4612595 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-psychbert_finetuned_mentalhealth_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English psychbert_finetuned_mentalhealth BertForSequenceClassification from mnaylor +author: John Snow Labs +name: psychbert_finetuned_mentalhealth +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`psychbert_finetuned_mentalhealth` is a English model originally trained by mnaylor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/psychbert_finetuned_mentalhealth_en_5.5.0_3.0_1727257449190.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/psychbert_finetuned_mentalhealth_en_5.5.0_3.0_1727257449190.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("psychbert_finetuned_mentalhealth","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("psychbert_finetuned_mentalhealth", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|psychbert_finetuned_mentalhealth| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/mnaylor/psychbert-finetuned-mentalhealth \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-re2g_reranker_fever_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-re2g_reranker_fever_pipeline_en.md new file mode 100644 index 00000000000000..220cb9a12218e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-re2g_reranker_fever_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English re2g_reranker_fever_pipeline pipeline BertForSequenceClassification from ibm +author: John Snow Labs +name: re2g_reranker_fever_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`re2g_reranker_fever_pipeline` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/re2g_reranker_fever_pipeline_en_5.5.0_3.0_1727287247023.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/re2g_reranker_fever_pipeline_en_5.5.0_3.0_1727287247023.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("re2g_reranker_fever_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("re2g_reranker_fever_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|re2g_reranker_fever_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ibm/re2g-reranker-fever + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-requirements_ambiguity_v2_nl.md b/docs/_posts/ahmedlone127/2024-09-25-requirements_ambiguity_v2_nl.md new file mode 100644 index 00000000000000..c4e79497f8196a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-requirements_ambiguity_v2_nl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Dutch, Flemish requirements_ambiguity_v2 BertForSequenceClassification from denizspynk +author: John Snow Labs +name: requirements_ambiguity_v2 +date: 2024-09-25 +tags: [nl, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`requirements_ambiguity_v2` is a Dutch, Flemish model originally trained by denizspynk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/requirements_ambiguity_v2_nl_5.5.0_3.0_1727267361597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/requirements_ambiguity_v2_nl_5.5.0_3.0_1727267361597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("requirements_ambiguity_v2","nl") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("requirements_ambiguity_v2", "nl")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|requirements_ambiguity_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|nl| +|Size:|409.0 MB| + +## References + +https://huggingface.co/denizspynk/requirements_ambiguity_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-requirements_ambiguity_v2_pipeline_nl.md b/docs/_posts/ahmedlone127/2024-09-25-requirements_ambiguity_v2_pipeline_nl.md new file mode 100644 index 00000000000000..05d64757cb0842 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-requirements_ambiguity_v2_pipeline_nl.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Dutch, Flemish requirements_ambiguity_v2_pipeline pipeline BertForSequenceClassification from denizspynk +author: John Snow Labs +name: requirements_ambiguity_v2_pipeline +date: 2024-09-25 +tags: [nl, open_source, pipeline, onnx] +task: Text Classification +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`requirements_ambiguity_v2_pipeline` is a Dutch, Flemish model originally trained by denizspynk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/requirements_ambiguity_v2_pipeline_nl_5.5.0_3.0_1727267383613.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/requirements_ambiguity_v2_pipeline_nl_5.5.0_3.0_1727267383613.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("requirements_ambiguity_v2_pipeline", lang = "nl")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+
+val pipeline = new PretrainedPipeline("requirements_ambiguity_v2_pipeline", lang = "nl")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|requirements_ambiguity_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|nl| +|Size:|409.0 MB| + +## References + +https://huggingface.co/denizspynk/requirements_ambiguity_v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-response_score_en.md b/docs/_posts/ahmedlone127/2024-09-25-response_score_en.md new file mode 100644 index 00000000000000..339aff0e4a8b56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-response_score_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English response_score BertForSequenceClassification from conversify +author: John Snow Labs +name: response_score +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`response_score` is a English model originally trained by conversify. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/response_score_en_5.5.0_3.0_1727279110839.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/response_score_en_5.5.0_3.0_1727279110839.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("response_score","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("response_score", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|response_score| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/conversify/response-score \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-robert_sst2_sentiment_full_en.md b/docs/_posts/ahmedlone127/2024-09-25-robert_sst2_sentiment_full_en.md new file mode 100644 index 00000000000000..9bc8a833c1f316 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-robert_sst2_sentiment_full_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robert_sst2_sentiment_full RoBertaForSequenceClassification from asm3515 +author: John Snow Labs +name: robert_sst2_sentiment_full +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robert_sst2_sentiment_full` is a English model originally trained by asm3515. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robert_sst2_sentiment_full_en_5.5.0_3.0_1727234101299.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robert_sst2_sentiment_full_en_5.5.0_3.0_1727234101299.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("robert_sst2_sentiment_full","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("robert_sst2_sentiment_full", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robert_sst2_sentiment_full| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|443.3 MB| + +## References + +https://huggingface.co/asm3515/Robert-sst2-sentiment-full \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-robert_sst2_sentiment_full_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-robert_sst2_sentiment_full_pipeline_en.md new file mode 100644 index 00000000000000..a0c0ca3c40b9e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-robert_sst2_sentiment_full_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robert_sst2_sentiment_full_pipeline pipeline RoBertaForSequenceClassification from asm3515 +author: John Snow Labs +name: robert_sst2_sentiment_full_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robert_sst2_sentiment_full_pipeline` is a English model originally trained by asm3515. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robert_sst2_sentiment_full_pipeline_en_5.5.0_3.0_1727234127250.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robert_sst2_sentiment_full_pipeline_en_5.5.0_3.0_1727234127250.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is a Spark DataFrame with the input text in a "text" column
+pipeline = PretrainedPipeline("robert_sst2_sentiment_full_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is a Spark DataFrame with the input text in a "text" column
+val pipeline = new PretrainedPipeline("robert_sst2_sentiment_full_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
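+As an additional convenience sketch (not part of the original card; the input sentence is only illustrative), a `PretrainedPipeline` can also annotate a single string directly, returning a plain Python dictionary keyed by the pipeline's output columns:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("robert_sst2_sentiment_full_pipeline", lang = "en")
+result = pipeline.annotate("I love spark-nlp")  # e.g. {"document": [...], "token": [...], "class": [...]}
+print(result)
+```
+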
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robert_sst2_sentiment_full_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|443.3 MB| + +## References + +https://huggingface.co/asm3515/Robert-sst2-sentiment-full + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-roberta_cased_poem_evalutation_en.md b/docs/_posts/ahmedlone127/2024-09-25-roberta_cased_poem_evalutation_en.md new file mode 100644 index 00000000000000..7cb193885e28f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-roberta_cased_poem_evalutation_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_cased_poem_evalutation XlmRoBertaForSequenceClassification from numblilbug +author: John Snow Labs +name: roberta_cased_poem_evalutation +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_cased_poem_evalutation` is a English model originally trained by numblilbug. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_cased_poem_evalutation_en_5.5.0_3.0_1727229518168.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_cased_poem_evalutation_en_5.5.0_3.0_1727229518168.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("roberta_cased_poem_evalutation","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("roberta_cased_poem_evalutation", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_cased_poem_evalutation| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|800.6 MB| + +## References + +https://huggingface.co/numblilbug/roberta-cased-poem-evalutation \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-roberta_cased_poem_evalutation_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-roberta_cased_poem_evalutation_pipeline_en.md new file mode 100644 index 00000000000000..bebdbfe2ccfb7a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-roberta_cased_poem_evalutation_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_cased_poem_evalutation_pipeline pipeline XlmRoBertaForSequenceClassification from numblilbug +author: John Snow Labs +name: roberta_cased_poem_evalutation_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_cased_poem_evalutation_pipeline` is a English model originally trained by numblilbug. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_cased_poem_evalutation_pipeline_en_5.5.0_3.0_1727229640522.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_cased_poem_evalutation_pipeline_en_5.5.0_3.0_1727229640522.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is a Spark DataFrame with the input text in a "text" column
+pipeline = PretrainedPipeline("roberta_cased_poem_evalutation_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is a Spark DataFrame with the input text in a "text" column
+val pipeline = new PretrainedPipeline("roberta_cased_poem_evalutation_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_cased_poem_evalutation_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|800.7 MB| + +## References + +https://huggingface.co/numblilbug/roberta-cased-poem-evalutation + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-roberta_cws_assamese_en.md b/docs/_posts/ahmedlone127/2024-09-25-roberta_cws_assamese_en.md new file mode 100644 index 00000000000000..598bb471144fb0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-roberta_cws_assamese_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_cws_assamese BertForTokenClassification from tjspross +author: John Snow Labs +name: roberta_cws_assamese +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_cws_assamese` is a English model originally trained by tjspross. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_cws_assamese_en_5.5.0_3.0_1727247216317.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_cws_assamese_en_5.5.0_3.0_1727247216317.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("roberta_cws_assamese","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("roberta_cws_assamese", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
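+Assuming the `token` and `ner` output columns defined in the sketch above, the per-token predictions can be inspected directly from the annotation structs:
+
+```python
+# Tokens side by side with their predicted tags
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```
+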
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_cws_assamese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/tjspross/roberta_cws_as \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-roberta_cws_assamese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-roberta_cws_assamese_pipeline_en.md new file mode 100644 index 00000000000000..590942d60c19d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-roberta_cws_assamese_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_cws_assamese_pipeline pipeline BertForTokenClassification from tjspross +author: John Snow Labs +name: roberta_cws_assamese_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_cws_assamese_pipeline` is a English model originally trained by tjspross. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_cws_assamese_pipeline_en_5.5.0_3.0_1727247276609.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_cws_assamese_pipeline_en_5.5.0_3.0_1727247276609.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is a Spark DataFrame with the input text in a "text" column
+pipeline = PretrainedPipeline("roberta_cws_assamese_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is a Spark DataFrame with the input text in a "text" column
+val pipeline = new PretrainedPipeline("roberta_cws_assamese_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_cws_assamese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/tjspross/roberta_cws_as + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-roberta_cws_pku_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-roberta_cws_pku_pipeline_en.md new file mode 100644 index 00000000000000..ef3443da8ba3a3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-roberta_cws_pku_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_cws_pku_pipeline pipeline BertForTokenClassification from tjspross +author: John Snow Labs +name: roberta_cws_pku_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_cws_pku_pipeline` is a English model originally trained by tjspross. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_cws_pku_pipeline_en_5.5.0_3.0_1727265373986.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_cws_pku_pipeline_en_5.5.0_3.0_1727265373986.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is a Spark DataFrame with the input text in a "text" column
+pipeline = PretrainedPipeline("roberta_cws_pku_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is a Spark DataFrame with the input text in a "text" column
+val pipeline = new PretrainedPipeline("roberta_cws_pku_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_cws_pku_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/tjspross/roberta_cws_pku + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-robust_bert_yelp_en.md b/docs/_posts/ahmedlone127/2024-09-25-robust_bert_yelp_en.md new file mode 100644 index 00000000000000..068c779df22ceb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-robust_bert_yelp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robust_bert_yelp BertForSequenceClassification from JiaqiLee +author: John Snow Labs +name: robust_bert_yelp +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robust_bert_yelp` is a English model originally trained by JiaqiLee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robust_bert_yelp_en_5.5.0_3.0_1727235386984.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robust_bert_yelp_en_5.5.0_3.0_1727235386984.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("robust_bert_yelp","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("robust_bert_yelp", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robust_bert_yelp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/JiaqiLee/robust-bert-yelp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-rubert_tiny2_russian_financial_sentiment_pipeline_ru.md b/docs/_posts/ahmedlone127/2024-09-25-rubert_tiny2_russian_financial_sentiment_pipeline_ru.md new file mode 100644 index 00000000000000..b14bdf015134fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-rubert_tiny2_russian_financial_sentiment_pipeline_ru.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Russian rubert_tiny2_russian_financial_sentiment_pipeline pipeline BertForSequenceClassification from mxlcw +author: John Snow Labs +name: rubert_tiny2_russian_financial_sentiment_pipeline +date: 2024-09-25 +tags: [ru, open_source, pipeline, onnx] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_tiny2_russian_financial_sentiment_pipeline` is a Russian model originally trained by mxlcw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_tiny2_russian_financial_sentiment_pipeline_ru_5.5.0_3.0_1727268672632.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_tiny2_russian_financial_sentiment_pipeline_ru_5.5.0_3.0_1727268672632.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is a Spark DataFrame with the input text in a "text" column
+pipeline = PretrainedPipeline("rubert_tiny2_russian_financial_sentiment_pipeline", lang = "ru")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is a Spark DataFrame with the input text in a "text" column
+val pipeline = new PretrainedPipeline("rubert_tiny2_russian_financial_sentiment_pipeline", lang = "ru")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_tiny2_russian_financial_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ru| +|Size:|109.5 MB| + +## References + +https://huggingface.co/mxlcw/rubert-tiny2-russian-financial-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-rubert_tiny2_russian_financial_sentiment_ru.md b/docs/_posts/ahmedlone127/2024-09-25-rubert_tiny2_russian_financial_sentiment_ru.md new file mode 100644 index 00000000000000..820494880f026a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-rubert_tiny2_russian_financial_sentiment_ru.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Russian rubert_tiny2_russian_financial_sentiment BertForSequenceClassification from mxlcw +author: John Snow Labs +name: rubert_tiny2_russian_financial_sentiment +date: 2024-09-25 +tags: [ru, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_tiny2_russian_financial_sentiment` is a Russian model originally trained by mxlcw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_tiny2_russian_financial_sentiment_ru_5.5.0_3.0_1727268666371.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_tiny2_russian_financial_sentiment_ru_5.5.0_3.0_1727268666371.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("rubert_tiny2_russian_financial_sentiment","ru") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("rubert_tiny2_russian_financial_sentiment", "ru")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_tiny2_russian_financial_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|ru| +|Size:|109.5 MB| + +## References + +https://huggingface.co/mxlcw/rubert-tiny2-russian-financial-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ruberttiny_multiclassv1_en.md b/docs/_posts/ahmedlone127/2024-09-25-ruberttiny_multiclassv1_en.md new file mode 100644 index 00000000000000..34aeaf44f53a6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ruberttiny_multiclassv1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ruberttiny_multiclassv1 BertForSequenceClassification from Shakhovak +author: John Snow Labs +name: ruberttiny_multiclassv1 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ruberttiny_multiclassv1` is a English model originally trained by Shakhovak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ruberttiny_multiclassv1_en_5.5.0_3.0_1727261047373.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ruberttiny_multiclassv1_en_5.5.0_3.0_1727261047373.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("ruberttiny_multiclassv1","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("ruberttiny_multiclassv1", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ruberttiny_multiclassv1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|109.5 MB| + +## References + +https://huggingface.co/Shakhovak/ruBertTiny_multiclassv1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-russscholar_seeker_en.md b/docs/_posts/ahmedlone127/2024-09-25-russscholar_seeker_en.md new file mode 100644 index 00000000000000..baf1cb48a0b8d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-russscholar_seeker_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English russscholar_seeker BertForSequenceClassification from Gao-Tianci +author: John Snow Labs +name: russscholar_seeker +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`russscholar_seeker` is a English model originally trained by Gao-Tianci. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/russscholar_seeker_en_5.5.0_3.0_1727263668402.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/russscholar_seeker_en_5.5.0_3.0_1727263668402.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("russscholar_seeker","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("russscholar_seeker", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|russscholar_seeker| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Gao-Tianci/RussScholar-Seeker \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sarcasm_detection_bert_base_uncased_cree_en.md b/docs/_posts/ahmedlone127/2024-09-25-sarcasm_detection_bert_base_uncased_cree_en.md new file mode 100644 index 00000000000000..d5e4e5c575031f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sarcasm_detection_bert_base_uncased_cree_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sarcasm_detection_bert_base_uncased_cree BertForSequenceClassification from jkhan447 +author: John Snow Labs +name: sarcasm_detection_bert_base_uncased_cree +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sarcasm_detection_bert_base_uncased_cree` is a English model originally trained by jkhan447. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sarcasm_detection_bert_base_uncased_cree_en_5.5.0_3.0_1727263812158.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sarcasm_detection_bert_base_uncased_cree_en_5.5.0_3.0_1727263812158.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("sarcasm_detection_bert_base_uncased_cree","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("sarcasm_detection_bert_base_uncased_cree", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sarcasm_detection_bert_base_uncased_cree| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/jkhan447/sarcasm-detection-Bert-base-uncased-CR \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_en.md b/docs/_posts/ahmedlone127/2024-09-25-sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_en.md new file mode 100644 index 00000000000000..b486e8e9d57565 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sarcasm_detection_bert_base_uncased_cree_sayula_popoluca BertForSequenceClassification from jkhan447 +author: John Snow Labs +name: sarcasm_detection_bert_base_uncased_cree_sayula_popoluca +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sarcasm_detection_bert_base_uncased_cree_sayula_popoluca` is a English model originally trained by jkhan447. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_en_5.5.0_3.0_1727269746904.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_en_5.5.0_3.0_1727269746904.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("sarcasm_detection_bert_base_uncased_cree_sayula_popoluca","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("sarcasm_detection_bert_base_uncased_cree_sayula_popoluca", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sarcasm_detection_bert_base_uncased_cree_sayula_popoluca| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/jkhan447/sarcasm-detection-Bert-base-uncased-CR-POS \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_pipeline_en.md new file mode 100644 index 00000000000000..dbb32bbe15575d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_pipeline pipeline BertForSequenceClassification from jkhan447 +author: John Snow Labs +name: sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_pipeline` is a English model originally trained by jkhan447. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_pipeline_en_5.5.0_3.0_1727269769545.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_pipeline_en_5.5.0_3.0_1727269769545.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is a Spark DataFrame with the input text in a "text" column
+pipeline = PretrainedPipeline("sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is a Spark DataFrame with the input text in a "text" column
+val pipeline = new PretrainedPipeline("sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/jkhan447/sarcasm-detection-Bert-base-uncased-CR-POS + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-segbert_en.md b/docs/_posts/ahmedlone127/2024-09-25-segbert_en.md new file mode 100644 index 00000000000000..40c83e8ca39c83 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-segbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English segbert BertForTokenClassification from gMask +author: John Snow Labs +name: segbert +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`segbert` is a English model originally trained by gMask. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/segbert_en_5.5.0_3.0_1727272315504.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/segbert_en_5.5.0_3.0_1727272315504.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("segbert","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("segbert", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|segbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|412.6 MB| + +## References + +https://huggingface.co/gMask/SegBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-segbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-segbert_pipeline_en.md new file mode 100644 index 00000000000000..e0eebd6d3a5959 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-segbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English segbert_pipeline pipeline BertForTokenClassification from gMask +author: John Snow Labs +name: segbert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`segbert_pipeline` is a English model originally trained by gMask. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/segbert_pipeline_en_5.5.0_3.0_1727272337252.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/segbert_pipeline_en_5.5.0_3.0_1727272337252.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is a Spark DataFrame with the input text in a "text" column
+pipeline = PretrainedPipeline("segbert_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is a Spark DataFrame with the input text in a "text" column
+val pipeline = new PretrainedPipeline("segbert_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|segbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.6 MB| + +## References + +https://huggingface.co/gMask/SegBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sembr2023_bert_mini_en.md b/docs/_posts/ahmedlone127/2024-09-25-sembr2023_bert_mini_en.md new file mode 100644 index 00000000000000..ee6d88576d0a99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sembr2023_bert_mini_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sembr2023_bert_mini BertForTokenClassification from admko +author: John Snow Labs +name: sembr2023_bert_mini +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sembr2023_bert_mini` is a English model originally trained by admko. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sembr2023_bert_mini_en_5.5.0_3.0_1727271738534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sembr2023_bert_mini_en_5.5.0_3.0_1727271738534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("sembr2023_bert_mini","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("sembr2023_bert_mini", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sembr2023_bert_mini| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|41.9 MB| + +## References + +https://huggingface.co/admko/sembr2023-bert-mini \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sembr2023_bert_mini_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sembr2023_bert_mini_pipeline_en.md new file mode 100644 index 00000000000000..16a1726ec7bd4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sembr2023_bert_mini_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sembr2023_bert_mini_pipeline pipeline BertForTokenClassification from admko +author: John Snow Labs +name: sembr2023_bert_mini_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sembr2023_bert_mini_pipeline` is a English model originally trained by admko. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sembr2023_bert_mini_pipeline_en_5.5.0_3.0_1727271741023.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sembr2023_bert_mini_pipeline_en_5.5.0_3.0_1727271741023.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "df" is a Spark DataFrame with the input text in a "text" column
+pipeline = PretrainedPipeline("sembr2023_bert_mini_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// "df" is a Spark DataFrame with the input text in a "text" column
+val pipeline = new PretrainedPipeline("sembr2023_bert_mini_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sembr2023_bert_mini_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|41.9 MB| + +## References + +https://huggingface.co/admko/sembr2023-bert-mini + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_arabert_ar.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_arabert_ar.md new file mode 100644 index 00000000000000..3146c5f1b5dce4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_arabert_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic sent_bert_base_arabert BertSentenceEmbeddings from aubmindlab +author: John Snow Labs +name: sent_bert_base_arabert +date: 2024-09-25 +tags: [ar, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_arabert` is a Arabic model originally trained by aubmindlab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabert_ar_5.5.0_3.0_1727252183947.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabert_ar_5.5.0_3.0_1727252183947.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_arabert","ar") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_arabert","ar")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
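+Assuming the `embeddings` output column from the sketch above, each detected sentence carries its vector in the annotation's `embeddings` field; a quick way to look at one vector per row is:
+
+```python
+# One sentence-embedding vector per detected sentence
+pipelineDF.selectExpr("explode(embeddings.embeddings) as sentence_embedding").show(truncate=False)
+```
+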
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_arabert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ar| +|Size:|504.6 MB| + +## References + +https://huggingface.co/aubmindlab/bert-base-arabert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_cased_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_cased_finetuned_en.md new file mode 100644 index 00000000000000..3e1bd2ecc876ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_cased_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_cased_finetuned BertSentenceEmbeddings from GusNicho +author: John Snow Labs +name: sent_bert_base_cased_finetuned +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_cased_finetuned` is a English model originally trained by GusNicho. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_finetuned_en_5.5.0_3.0_1727248585487.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_finetuned_en_5.5.0_3.0_1727248585487.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_cased_finetuned","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_cased_finetuned","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_cased_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/GusNicho/bert-base-cased-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_cased_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_cased_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..e66ba1d82a3bc5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_cased_finetuned_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_cased_finetuned_pipeline pipeline BertSentenceEmbeddings from GusNicho +author: John Snow Labs +name: sent_bert_base_cased_finetuned_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_cased_finetuned_pipeline` is a English model originally trained by GusNicho. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_finetuned_pipeline_en_5.5.0_3.0_1727248606505.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_finetuned_pipeline_en_5.5.0_3.0_1727248606505.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_cased_finetuned_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_cased_finetuned_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
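
As a quick smoke test (a sketch, not from the original card), the same pretrained pipeline can also be called on a single string through its light-pipeline interface; this assumes `fullAnnotate` returns the annotations, including the sentence vectors, and that the embeddings stage writes to a column named `embeddings`.

```python
# One document in, a list of annotation dictionaries out.
result = pipeline.fullAnnotate("I love spark-nlp")[0]

for ann in result["embeddings"]:
    # ann.result is the sentence text, ann.embeddings the vector.
    print(ann.result, len(ann.embeddings))
```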
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_cased_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.2 MB| + +## References + +https://huggingface.co/GusNicho/bert-base-cased-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_french_spanish_portuguese_italian_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_french_spanish_portuguese_italian_cased_en.md new file mode 100644 index 00000000000000..7201288827bc68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_french_spanish_portuguese_italian_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_french_spanish_portuguese_italian_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_french_spanish_portuguese_italian_cased +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_french_spanish_portuguese_italian_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_french_spanish_portuguese_italian_cased_en_5.5.0_3.0_1727252787834.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_french_spanish_portuguese_italian_cased_en_5.5.0_3.0_1727252787834.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_french_spanish_portuguese_italian_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_french_spanish_portuguese_italian_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_french_spanish_portuguese_italian_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|444.7 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-fr-es-pt-it-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_french_spanish_portuguese_italian_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_french_spanish_portuguese_italian_cased_pipeline_en.md new file mode 100644 index 00000000000000..b93dc47b15e02f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_french_spanish_portuguese_italian_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_french_spanish_portuguese_italian_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_french_spanish_portuguese_italian_cased_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_french_spanish_portuguese_italian_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_french_spanish_portuguese_italian_cased_pipeline_en_5.5.0_3.0_1727252810824.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_french_spanish_portuguese_italian_cased_pipeline_en_5.5.0_3.0_1727252810824.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_english_french_spanish_portuguese_italian_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_english_french_spanish_portuguese_italian_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_french_spanish_portuguese_italian_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|445.2 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-fr-es-pt-it-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_greek_modern_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_greek_modern_cased_en.md new file mode 100644 index 00000000000000..149451b12b97ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_greek_modern_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_greek_modern_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_greek_modern_cased +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_greek_modern_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_greek_modern_cased_en_5.5.0_3.0_1727249089682.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_greek_modern_cased_en_5.5.0_3.0_1727249089682.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_greek_modern_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_greek_modern_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
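
Sentence embeddings are usually consumed by comparing vectors. The sketch below is an illustration only: it assumes the pipeline from the snippet above has been fitted as `pipelineModel`, that the usual Spark NLP imports are in scope, and that each input text yields exactly one sentence.

```python
import numpy as np

pairs = spark.createDataFrame(
    [["I love spark-nlp"], ["Spark NLP is great for sentence embeddings"]]
).toDF("text")

rows = pipelineModel.transform(pairs) \
    .selectExpr("explode(embeddings) as ann") \
    .selectExpr("ann.embeddings as vector") \
    .collect()

a, b = (np.array(r["vector"]) for r in rows)
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {cosine:.4f}")
```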
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_greek_modern_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|408.4 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-el-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_greek_modern_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_greek_modern_cased_pipeline_en.md new file mode 100644 index 00000000000000..aa582455c436a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_greek_modern_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_greek_modern_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_greek_modern_cased_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_greek_modern_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_greek_modern_cased_pipeline_en_5.5.0_3.0_1727249112368.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_greek_modern_cased_pipeline_en_5.5.0_3.0_1727249112368.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_english_greek_modern_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_english_greek_modern_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_greek_modern_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.0 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-el-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_romanian_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_romanian_cased_en.md new file mode 100644 index 00000000000000..bf34f70ca72761 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_romanian_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_romanian_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_romanian_cased +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_romanian_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_romanian_cased_en_5.5.0_3.0_1727249389571.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_romanian_cased_en_5.5.0_3.0_1727249389571.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_romanian_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_romanian_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_romanian_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|413.3 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-ro-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_romanian_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_romanian_cased_pipeline_en.md new file mode 100644 index 00000000000000..ffa07fd104257d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_romanian_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_romanian_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_romanian_cased_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_romanian_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_romanian_cased_pipeline_en_5.5.0_3.0_1727249410986.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_romanian_cased_pipeline_en_5.5.0_3.0_1727249410986.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_english_romanian_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_english_romanian_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_romanian_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|413.8 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-ro-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_swahili_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_swahili_cased_en.md new file mode 100644 index 00000000000000..2f710361e7a051 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_swahili_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_swahili_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_swahili_cased +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_swahili_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_swahili_cased_en_5.5.0_3.0_1727252203082.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_swahili_cased_en_5.5.0_3.0_1727252203082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_swahili_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_swahili_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
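
If the vectors need to feed a downstream Spark ML estimator, Spark NLP's `EmbeddingsFinisher` can convert the annotation column into plain vector columns. A minimal sketch, assuming the stages defined in the snippet above and the usual Spark NLP and Spark ML imports:

```python
from sparknlp.base import EmbeddingsFinisher

finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["sentence_vector"]) \
    .setOutputAsVector(True)  # emit Spark ML vectors instead of annotations

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings, finisher])
finished = pipeline.fit(data).transform(data)
finished.select("sentence_vector").show(truncate=60)
```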
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_swahili_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|408.7 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-sw-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_urdu_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_urdu_cased_en.md new file mode 100644 index 00000000000000..92fe9b68604874 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_urdu_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_urdu_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_urdu_cased +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_urdu_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_urdu_cased_en_5.5.0_3.0_1727256669262.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_urdu_cased_en_5.5.0_3.0_1727256669262.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_urdu_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_urdu_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_urdu_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|410.7 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-ur-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_en.md new file mode 100644 index 00000000000000..d0933c512eddb1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy BertSentenceEmbeddings from snousias +author: John Snow Labs +name: sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy` is a English model originally trained by snousias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_en_5.5.0_3.0_1727252727733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_en_5.5.0_3.0_1727252727733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
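
For low-latency use outside a batch job, the fitted model can be wrapped in a `LightPipeline`, which annotates plain Python strings without building a DataFrame. A sketch under the same assumptions as the snippet above (fitted `pipelineModel`, standard imports):

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# fullAnnotate keeps the embedding vectors; annotate() would return only strings.
annotations = light.fullAnnotate("I love spark-nlp")[0]
for ann in annotations["embeddings"]:
    print(ann.result, len(ann.embeddings))
```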
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|421.1 MB| + +## References + +https://huggingface.co/snousias/bert-base-greek-uncased-v5-finetuned-polylex-mg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_pipeline_en.md new file mode 100644 index 00000000000000..d961b5455089e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_pipeline pipeline BertSentenceEmbeddings from snousias +author: John Snow Labs +name: sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_pipeline` is a English model originally trained by snousias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_pipeline_en_5.5.0_3.0_1727252749532.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_pipeline_en_5.5.0_3.0_1727252749532.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|421.7 MB| + +## References + +https://huggingface.co/snousias/bert-base-greek-uncased-v5-finetuned-polylex-mg + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_macedonian_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_macedonian_cased_en.md new file mode 100644 index 00000000000000..24aa490575cdcd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_macedonian_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_macedonian_cased BertSentenceEmbeddings from anon-submission-mk +author: John Snow Labs +name: sent_bert_base_macedonian_cased +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_macedonian_cased` is a English model originally trained by anon-submission-mk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_macedonian_cased_en_5.5.0_3.0_1727251966901.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_macedonian_cased_en_5.5.0_3.0_1727251966901.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_macedonian_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_macedonian_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_macedonian_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|412.2 MB| + +## References + +https://huggingface.co/anon-submission-mk/bert-base-macedonian-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_macedonian_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_macedonian_cased_pipeline_en.md new file mode 100644 index 00000000000000..65d6c72fc58dab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_macedonian_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_macedonian_cased_pipeline pipeline BertSentenceEmbeddings from anon-submission-mk +author: John Snow Labs +name: sent_bert_base_macedonian_cased_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_macedonian_cased_pipeline` is a English model originally trained by anon-submission-mk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_macedonian_cased_pipeline_en_5.5.0_3.0_1727251988928.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_macedonian_cased_pipeline_en_5.5.0_3.0_1727251988928.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_macedonian_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_macedonian_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_macedonian_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.8 MB| + +## References + +https://huggingface.co/anon-submission-mk/bert-base-macedonian-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pipeline_pt.md new file mode 100644 index 00000000000000..1c01c6653f6339 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pipeline_pt.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Portuguese sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pipeline pipeline BertSentenceEmbeddings from Luciano +author: John Snow Labs +name: sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pipeline +date: 2024-09-25 +tags: [pt, open_source, pipeline, onnx] +task: Embeddings +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pipeline` is a Portuguese model originally trained by Luciano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pipeline_pt_5.5.0_3.0_1727252674645.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pipeline_pt_5.5.0_3.0_1727252674645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pipeline", lang = "pt")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pipeline", lang = "pt")
val annotations = pipeline.transform(df)
```
</div>
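
Since the underlying model was trained on Portuguese text, Portuguese input is the intended use. A short illustrative check (the sentence below is just an example, not from the original card, and it is assumed the pipeline writes its vectors to an `embeddings` column):

```python
exemplo = spark.createDataFrame(
    [["Adoro usar o Spark NLP para processar texto em português."]]
).toDF("text")

pipeline.transform(exemplo) \
    .selectExpr("explode(embeddings) as ann") \
    .selectExpr("ann.result", "size(ann.embeddings) as dim") \
    .show(truncate=60)
```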
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|406.5 MB| + +## References + +https://huggingface.co/Luciano/bert-base-portuguese-cased-finetuned-tcu-acordaos + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pt.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pt.md new file mode 100644 index 00000000000000..b0ecccea9bde39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pt.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Portuguese sent_bert_base_portuguese_cased_finetuned_tcu_acordaos BertSentenceEmbeddings from Luciano +author: John Snow Labs +name: sent_bert_base_portuguese_cased_finetuned_tcu_acordaos +date: 2024-09-25 +tags: [pt, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_portuguese_cased_finetuned_tcu_acordaos` is a Portuguese model originally trained by Luciano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pt_5.5.0_3.0_1727252653031.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pt_5.5.0_3.0_1727252653031.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_portuguese_cased_finetuned_tcu_acordaos","pt") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_portuguese_cased_finetuned_tcu_acordaos","pt") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_portuguese_cased_finetuned_tcu_acordaos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|pt| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Luciano/bert-base-portuguese-cased-finetuned-tcu-acordaos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_en.md new file mode 100644 index 00000000000000..7c19178fe4b1f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_spanish_wwm_cased_finetuned_literature_pro BertSentenceEmbeddings from a-v-bely +author: John Snow Labs +name: sent_bert_base_spanish_wwm_cased_finetuned_literature_pro +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_spanish_wwm_cased_finetuned_literature_pro` is a English model originally trained by a-v-bely. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_en_5.5.0_3.0_1727251984291.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_en_5.5.0_3.0_1727251984291.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_spanish_wwm_cased_finetuned_literature_pro","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_spanish_wwm_cased_finetuned_literature_pro","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_spanish_wwm_cased_finetuned_literature_pro| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/a-v-bely/bert-base-spanish-wwm-cased-finetuned-literature-pro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_pipeline_en.md new file mode 100644 index 00000000000000..68fce6150841f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_pipeline pipeline BertSentenceEmbeddings from a-v-bely +author: John Snow Labs +name: sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_pipeline` is a English model originally trained by a-v-bely. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_pipeline_en_5.5.0_3.0_1727252006287.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_pipeline_en_5.5.0_3.0_1727252006287.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/a-v-bely/bert-base-spanish-wwm-cased-finetuned-literature-pro + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_1802_r1_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_1802_r1_en.md new file mode 100644 index 00000000000000..45977fbe03d8e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_1802_r1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_1802_r1 BertSentenceEmbeddings from JamesKim +author: John Snow Labs +name: sent_bert_base_uncased_1802_r1 +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_1802_r1` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_r1_en_5.5.0_3.0_1727252925708.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_r1_en_5.5.0_3.0_1727252925708.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_1802_r1", "en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_1802_r1", "en")
  .setInputCols(Array("sentence"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
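
Fitting the pipeline downloads the pretrained weights, so in practice the fitted `PipelineModel` is often saved once and reloaded later instead of being refitted. A minimal sketch (the path is only a placeholder):

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline (including the downloaded embeddings model).
pipelineModel.write().overwrite().save("/tmp/sent_bert_base_uncased_1802_r1_model")

# Later, reload and reuse it without refitting.
reloaded = PipelineModel.load("/tmp/sent_bert_base_uncased_1802_r1_model")
reloaded.transform(data).select("embeddings").show(truncate=60)
```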
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_1802_r1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/JamesKim/bert-base-uncased_1802_r1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_1802_r1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_1802_r1_pipeline_en.md new file mode 100644 index 00000000000000..bb25b0e09ec5e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_1802_r1_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_1802_r1_pipeline pipeline BertSentenceEmbeddings from JamesKim +author: John Snow Labs +name: sent_bert_base_uncased_1802_r1_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_1802_r1_pipeline` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_r1_pipeline_en_5.5.0_3.0_1727252950127.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_r1_pipeline_en_5.5.0_3.0_1727252950127.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_1802_r1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_1802_r1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_1802_r1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/JamesKim/bert-base-uncased_1802_r1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_en.md new file mode 100644 index 00000000000000..caea42250cb8b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_wallisian BertSentenceEmbeddings from btamm12 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_wallisian +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_wallisian` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_en_5.5.0_3.0_1727248679997.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_en_5.5.0_3.0_1727248679997.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wallisian","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wallisian","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_wallisian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/btamm12/bert-base-uncased-finetuned-wls \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_en.md new file mode 100644 index 00000000000000..37b27e5e2888cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower BertSentenceEmbeddings from btamm12 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_en_5.5.0_3.0_1727252904970.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_en_5.5.0_3.0_1727252904970.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/btamm12/bert-base-uncased-finetuned-wls-manual-6ep-lower \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_pipeline_en.md new file mode 100644 index 00000000000000..20fcaece26fde7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_pipeline pipeline BertSentenceEmbeddings from btamm12 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_pipeline_en_5.5.0_3.0_1727252927351.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_pipeline_en_5.5.0_3.0_1727252927351.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/btamm12/bert-base-uncased-finetuned-wls-manual-6ep-lower + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_en.md new file mode 100644 index 00000000000000..ff63beeff82825 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower BertSentenceEmbeddings from btamm12 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_en_5.5.0_3.0_1727248768771.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_en_5.5.0_3.0_1727248768771.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/btamm12/bert-base-uncased-finetuned-wls-manual-7ep-lower \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_pipeline_en.md new file mode 100644 index 00000000000000..56b038e7bc33b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_pipeline pipeline BertSentenceEmbeddings from btamm12 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_pipeline_en_5.5.0_3.0_1727248790865.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_pipeline_en_5.5.0_3.0_1727248790865.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/btamm12/bert-base-uncased-finetuned-wls-manual-7ep-lower + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_pipeline_en.md new file mode 100644 index 00000000000000..349071eccba737 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_wallisian_pipeline pipeline BertSentenceEmbeddings from btamm12 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_wallisian_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_wallisian_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_pipeline_en_5.5.0_3.0_1727248701470.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_pipeline_en_5.5.0_3.0_1727248701470.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_wallisian_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_wallisian_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_wallisian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/btamm12/bert-base-uncased-finetuned-wls + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_haesun_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_haesun_en.md new file mode 100644 index 00000000000000..de2d6069cf6ffc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_haesun_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_haesun BertSentenceEmbeddings from haesun +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_haesun +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_haesun` is a English model originally trained by haesun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_haesun_en_5.5.0_3.0_1727230745218.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_haesun_en_5.5.0_3.0_1727230745218.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_haesun","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_haesun","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_haesun| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/haesun/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_haesun_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_haesun_pipeline_en.md new file mode 100644 index 00000000000000..7338c5cc86c0f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_haesun_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_haesun_pipeline pipeline BertSentenceEmbeddings from haesun +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_haesun_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_haesun_pipeline` is a English model originally trained by haesun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_haesun_pipeline_en_5.5.0_3.0_1727230765829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_haesun_pipeline_en_5.5.0_3.0_1727230765829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_haesun_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_haesun_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
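+
+The `annotations` DataFrame returned by `transform` keeps one column per annotator bundled in the pipeline. A quick schema check, sketched below, shows which columns this particular pipeline exposes before you select from them.
+
+```python
+# Inspect the columns added by the pipeline's stages
+annotations.printSchema()
+print(annotations.columns)
+```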
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_haesun_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/haesun/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_mabrouk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_mabrouk_pipeline_en.md new file mode 100644 index 00000000000000..75f207533e8563 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_mabrouk_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_mabrouk_pipeline pipeline BertSentenceEmbeddings from mabrouk +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_mabrouk_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_mabrouk_pipeline` is a English model originally trained by mabrouk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_mabrouk_pipeline_en_5.5.0_3.0_1727251458188.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_mabrouk_pipeline_en_5.5.0_3.0_1727251458188.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_mabrouk_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_mabrouk_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_mabrouk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/mabrouk/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_robinsh2023_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_robinsh2023_en.md new file mode 100644 index 00000000000000..25bbd73ed86536 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_robinsh2023_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_robinsh2023 BertSentenceEmbeddings from Robinsh2023 +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_robinsh2023 +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_robinsh2023` is a English model originally trained by Robinsh2023. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_robinsh2023_en_5.5.0_3.0_1727234930301.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_robinsh2023_en_5.5.0_3.0_1727234930301.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_robinsh2023","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_robinsh2023","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_robinsh2023| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/Robinsh2023/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_robinsh2023_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_robinsh2023_pipeline_en.md new file mode 100644 index 00000000000000..98eea9f8c85640 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_robinsh2023_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_robinsh2023_pipeline pipeline BertSentenceEmbeddings from Robinsh2023 +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_robinsh2023_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_robinsh2023_pipeline` is a English model originally trained by Robinsh2023. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_robinsh2023_pipeline_en_5.5.0_3.0_1727234951381.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_robinsh2023_pipeline_en_5.5.0_3.0_1727234951381.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_robinsh2023_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_robinsh2023_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_robinsh2023_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/Robinsh2023/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_malayalam_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_malayalam_en.md new file mode 100644 index 00000000000000..6fa96082c9f1ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_malayalam_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_malayalam BertSentenceEmbeddings from Tural +author: John Snow Labs +name: sent_bert_base_uncased_malayalam +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_malayalam` is a English model originally trained by Tural. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_malayalam_en_5.5.0_3.0_1727253078814.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_malayalam_en_5.5.0_3.0_1727253078814.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_malayalam","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_malayalam","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
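+
+If you need low-latency inference on single strings (for example behind a REST endpoint), the fitted pipeline can be wrapped in a `LightPipeline`. The snippet below is a sketch assuming the standard Spark NLP `LightPipeline` wrapper; results come back as plain Python objects on the driver, without a DataFrame round-trip.
+
+```python
+from sparknlp.base import LightPipeline
+
+# Reuse the pipelineModel fitted above for driver-side, single-document inference
+light = LightPipeline(pipelineModel)
+print(light.annotate("I love spark-nlp"))
+```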
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_malayalam| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.9 MB| + +## References + +https://huggingface.co/Tural/bert-base-uncased-ml \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_malayalam_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_malayalam_pipeline_en.md new file mode 100644 index 00000000000000..262f1e0f5a523f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_malayalam_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_malayalam_pipeline pipeline BertSentenceEmbeddings from Tural +author: John Snow Labs +name: sent_bert_base_uncased_malayalam_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_malayalam_pipeline` is a English model originally trained by Tural. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_malayalam_pipeline_en_5.5.0_3.0_1727253099895.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_malayalam_pipeline_en_5.5.0_3.0_1727253099895.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("sent_bert_base_uncased_malayalam_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("sent_bert_base_uncased_malayalam_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_malayalam_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.4 MB| + +## References + +https://huggingface.co/Tural/bert-base-uncased-ml + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_en.md new file mode 100644 index 00000000000000..a90a843650af30 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_mlm_scirepeval_fos_chemistry BertSentenceEmbeddings from jonas-luehrs +author: John Snow Labs +name: sent_bert_base_uncased_mlm_scirepeval_fos_chemistry +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_mlm_scirepeval_fos_chemistry` is a English model originally trained by jonas-luehrs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_en_5.5.0_3.0_1727234682485.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_en_5.5.0_3.0_1727234682485.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_mlm_scirepeval_fos_chemistry","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_mlm_scirepeval_fos_chemistry","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_mlm_scirepeval_fos_chemistry| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/jonas-luehrs/bert-base-uncased-MLM-scirepeval_fos_chemistry \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_pipeline_en.md new file mode 100644 index 00000000000000..5853799494ddbc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_pipeline pipeline BertSentenceEmbeddings from jonas-luehrs +author: John Snow Labs +name: sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_pipeline` is a English model originally trained by jonas-luehrs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_pipeline_en_5.5.0_3.0_1727234703495.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_pipeline_en_5.5.0_3.0_1727234703495.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/jonas-luehrs/bert-base-uncased-MLM-scirepeval_fos_chemistry + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_model_attribution_challenge_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_model_attribution_challenge_en.md new file mode 100644 index 00000000000000..db7d519bf1b884 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_model_attribution_challenge_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_model_attribution_challenge BertSentenceEmbeddings from model-attribution-challenge +author: John Snow Labs +name: sent_bert_base_uncased_model_attribution_challenge +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_model_attribution_challenge` is a English model originally trained by model-attribution-challenge. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_model_attribution_challenge_en_5.5.0_3.0_1727252027969.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_model_attribution_challenge_en_5.5.0_3.0_1727252027969.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_model_attribution_challenge","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_model_attribution_challenge","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
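+
+Downstream Spark ML estimators usually expect vector columns rather than Spark NLP annotations. The sketch below assumes the `EmbeddingsFinisher` transformer from `sparknlp.base` and the column names used above, and converts the sentence embeddings into plain Spark ML vectors.
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+# Convert the annotation column into a column of Spark ML vectors
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+finisher.transform(pipelineDF).select("finished_embeddings").show(truncate = 60)
+```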
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_model_attribution_challenge| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/model-attribution-challenge/bert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_sclarge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_sclarge_pipeline_en.md new file mode 100644 index 00000000000000..2e429c80c582ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_sclarge_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_sclarge_pipeline pipeline BertSentenceEmbeddings from CambridgeMolecularEngineering +author: John Snow Labs +name: sent_bert_base_uncased_sclarge_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_sclarge_pipeline` is a English model originally trained by CambridgeMolecularEngineering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_sclarge_pipeline_en_5.5.0_3.0_1727230881067.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_sclarge_pipeline_en_5.5.0_3.0_1727230881067.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("sent_bert_base_uncased_sclarge_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("sent_bert_base_uncased_sclarge_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_sclarge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/CambridgeMolecularEngineering/bert-base-uncased-sclarge + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_sijia_w_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_sijia_w_en.md new file mode 100644 index 00000000000000..8b3e59b073fb20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_sijia_w_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_sijia_w BertSentenceEmbeddings from sijia-w +author: John Snow Labs +name: sent_bert_base_uncased_sijia_w +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_sijia_w` is a English model originally trained by sijia-w. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_sijia_w_en_5.5.0_3.0_1727234344591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_sijia_w_en_5.5.0_3.0_1727234344591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_sijia_w","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_sijia_w","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_sijia_w| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sijia-w/bert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_sijia_w_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_sijia_w_pipeline_en.md new file mode 100644 index 00000000000000..4c43e76264b8ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_sijia_w_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_sijia_w_pipeline pipeline BertSentenceEmbeddings from sijia-w +author: John Snow Labs +name: sent_bert_base_uncased_sijia_w_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_sijia_w_pipeline` is a English model originally trained by sijia-w. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_sijia_w_pipeline_en_5.5.0_3.0_1727234365461.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_sijia_w_pipeline_en_5.5.0_3.0_1727234365461.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("sent_bert_base_uncased_sijia_w_pipeline", lang = "en")
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("sent_bert_base_uncased_sijia_w_pipeline", lang = "en")
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_sijia_w_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/sijia-w/bert-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_vn_finetuned_portuguese_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_vn_finetuned_portuguese_en.md new file mode 100644 index 00000000000000..c4e012585d4772 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_vn_finetuned_portuguese_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_vn_finetuned_portuguese BertSentenceEmbeddings from dotansang +author: John Snow Labs +name: sent_bert_base_vn_finetuned_portuguese +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_vn_finetuned_portuguese` is a English model originally trained by dotansang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_vn_finetuned_portuguese_en_5.5.0_3.0_1727248589931.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_vn_finetuned_portuguese_en_5.5.0_3.0_1727248589931.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_vn_finetuned_portuguese","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_vn_finetuned_portuguese","en")
+  .setInputCols(Array("sentence"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
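+
+One common downstream use of sentence embeddings is semantic similarity. The sketch below assumes the pipeline defined above and that each input text yields a single sentence; it compares two sentences by cosine similarity after collecting their vectors to the driver.
+
+```python
+import numpy as np
+
+sentences = spark.createDataFrame([["I love spark-nlp"], ["Spark NLP is great"]]).toDF("text")
+rows = pipelineModel.transform(sentences).select("embeddings").collect()
+
+# The first annotation per row holds that text's sentence vector
+vectors = [np.array(row.embeddings[0].embeddings) for row in rows]
+cosine = float(np.dot(vectors[0], vectors[1]) /
+               (np.linalg.norm(vectors[0]) * np.linalg.norm(vectors[1])))
+print(f"cosine similarity: {cosine:.3f}")
+```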
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_vn_finetuned_portuguese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|498.8 MB| + +## References + +https://huggingface.co/dotansang/bert-base-vn-finetuned-pt \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_cased_en.md new file mode 100644 index 00000000000000..ba7aa35d488d88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_cased_en.md @@ -0,0 +1,92 @@ +--- +layout: model +title: BERT Sentence Embeddings (Large Cased) +author: John Snow Labs +name: sent_bert_large_cased +date: 2024-09-25 +tags: [open_source, embeddings, en, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This model contains a deep bidirectional transformer trained on Wikipedia and the BookCorpus. The details are described in the paper "[BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)". + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_en_5.5.0_3.0_1727230647828.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_en_5.5.0_3.0_1727230647828.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
...
embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_cased", "en") \
    .setInputCols("sentence") \
    .setOutputCol("sentence_embeddings")

nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, embeddings])
pipeline_model = nlp_pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))
result = pipeline_model.transform(
    spark.createDataFrame([["I hate cancer"], ["Antibiotics aren't painkiller"]], ["text"])
)
```
```scala
...
val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_cased", "en")
  .setInputCols("sentence")
  .setOutputCol("sentence_embeddings")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, embeddings))
val data = Seq("I hate cancer", "Antibiotics aren't painkiller").toDF("text")
val result = pipeline.fit(data).transform(data)
```

{:.nlu-block}
```python
import nlu

text = ["I hate cancer", "Antibiotics aren't painkiller"]
embeddings_df = nlu.load('en.embed_sentence.bert_large_cased').predict(text, output_level='sentence')
embeddings_df
```
</div>
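
The `...` above elides the first two pipeline stages. They follow the same pattern used throughout these docs; a minimal sketch (the detector below is the multilingual `sentence_detector_dl` model used in the other examples):

```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Pretrained deep-learning sentence detector; any annotator producing a "sentence" column works here.
sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")
```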
+ +## Results + +```bash + + token en_embed_sentence_bert_large_cased_embeddings + + I [[-0.6228358149528503, -0.3453695774078369, 0.... +love [[-0.6228358149528503, -0.3453695774078369, 0.... +NLP [[-0.6228358149528503, -0.3453695774078369, 0.... +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.2 GB| \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_cased_pipeline_en.md new file mode 100644 index 00000000000000..6dcf322294a214 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_large_cased_pipeline pipeline BertSentenceEmbeddings from google-bert +author: John Snow Labs +name: sent_bert_large_cased_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_cased_pipeline` is a English model originally trained by google-bert. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_pipeline_en_5.5.0_3.0_1727230709316.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_pipeline_en_5.5.0_3.0_1727230709316.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_large_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_large_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
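
`df` above is any Spark DataFrame with a `text` column. A minimal end-to-end sketch, assuming Spark NLP is installed and importable (the output column names depend on the included stages, so the schema is printed rather than hard-coded):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

df = spark.createDataFrame([["I love Spark NLP."]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_large_cased_pipeline", lang="en")
result = pipeline.transform(df)
result.printSchema()  # shows the annotation columns added by each included stage
```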
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/google-bert/bert-large-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_nli_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_nli_en.md new file mode 100644 index 00000000000000..1cc86107e298c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_nli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_large_nli BertSentenceEmbeddings from binwang +author: John Snow Labs +name: sent_bert_large_nli +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_nli` is a English model originally trained by binwang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_nli_en_5.5.0_3.0_1727251695723.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_nli_en_5.5.0_3.0_1727251695723.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_nli","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_nli","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
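
The model name suggests NLI fine-tuning, so a common use is sentence similarity. A small follow-up sketch, assuming the `spark` session and the fitted `pipelineModel` from the example above, plus NumPy on the driver:

```python
import numpy as np

pairs = spark.createDataFrame(
    [["The weather is nice today."], ["It is sunny outside."]]
).toDF("text")

rows = (pipelineModel.transform(pairs)
        .selectExpr("explode(embeddings) as ann")
        .selectExpr("ann.embeddings as vector")
        .collect())

a, b = (np.array(r.vector) for r in rows)
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {cosine:.3f}")
```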
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_nli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/binwang/bert-large-nli \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_nli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_nli_pipeline_en.md new file mode 100644 index 00000000000000..df3914a2b6a4e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_nli_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_large_nli_pipeline pipeline BertSentenceEmbeddings from binwang +author: John Snow Labs +name: sent_bert_large_nli_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_nli_pipeline` is a English model originally trained by binwang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_nli_pipeline_en_5.5.0_3.0_1727251760270.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_nli_pipeline_en_5.5.0_3.0_1727251760270.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_large_nli_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_large_nli_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_nli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/binwang/bert-large-nli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_portuguese_cased_legal_tsdae_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_portuguese_cased_legal_tsdae_pipeline_pt.md new file mode 100644 index 00000000000000..66d33ec146f0dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_portuguese_cased_legal_tsdae_pipeline_pt.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Portuguese sent_bert_large_portuguese_cased_legal_tsdae_pipeline pipeline BertSentenceEmbeddings from stjiris +author: John Snow Labs +name: sent_bert_large_portuguese_cased_legal_tsdae_pipeline +date: 2024-09-25 +tags: [pt, open_source, pipeline, onnx] +task: Embeddings +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_portuguese_cased_legal_tsdae_pipeline` is a Portuguese model originally trained by stjiris. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_portuguese_cased_legal_tsdae_pipeline_pt_5.5.0_3.0_1727253371409.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_portuguese_cased_legal_tsdae_pipeline_pt_5.5.0_3.0_1727253371409.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_large_portuguese_cased_legal_tsdae_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_large_portuguese_cased_legal_tsdae_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
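
As with the other pretrained pipelines, `df` is a DataFrame with a `text` column; since this model targets Portuguese legal text, the input should be Portuguese. A minimal sketch (the sample sentence is an illustrative legal phrase, roughly "the court dismissed the appeal"):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

df = spark.createDataFrame([["O tribunal negou provimento ao recurso."]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_large_portuguese_cased_legal_tsdae_pipeline", lang="pt")
annotations = pipeline.transform(df)
annotations.printSchema()  # annotation columns added by the included stages
```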
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_portuguese_cased_legal_tsdae_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|1.2 GB| + +## References + +https://huggingface.co/stjiris/bert-large-portuguese-cased-legal-tsdae + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_tagalog_base_uncased_wwm_pipeline_tl.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_tagalog_base_uncased_wwm_pipeline_tl.md new file mode 100644 index 00000000000000..d454582e01fe82 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_tagalog_base_uncased_wwm_pipeline_tl.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Tagalog sent_bert_tagalog_base_uncased_wwm_pipeline pipeline BertSentenceEmbeddings from jcblaise +author: John Snow Labs +name: sent_bert_tagalog_base_uncased_wwm_pipeline +date: 2024-09-25 +tags: [tl, open_source, pipeline, onnx] +task: Embeddings +language: tl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_tagalog_base_uncased_wwm_pipeline` is a Tagalog model originally trained by jcblaise. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_tagalog_base_uncased_wwm_pipeline_tl_5.5.0_3.0_1727249292087.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_tagalog_base_uncased_wwm_pipeline_tl_5.5.0_3.0_1727249292087.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_tagalog_base_uncased_wwm_pipeline", lang = "tl") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_tagalog_base_uncased_wwm_pipeline", lang = "tl") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_tagalog_base_uncased_wwm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tl| +|Size:|407.4 MB| + +## References + +https://huggingface.co/jcblaise/bert-tagalog-base-uncased-WWM + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_tagalog_base_uncased_wwm_tl.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_tagalog_base_uncased_wwm_tl.md new file mode 100644 index 00000000000000..d51dd28aebdf2d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_tagalog_base_uncased_wwm_tl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Tagalog sent_bert_tagalog_base_uncased_wwm BertSentenceEmbeddings from jcblaise +author: John Snow Labs +name: sent_bert_tagalog_base_uncased_wwm +date: 2024-09-25 +tags: [tl, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: tl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_tagalog_base_uncased_wwm` is a Tagalog model originally trained by jcblaise. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_tagalog_base_uncased_wwm_tl_5.5.0_3.0_1727249270927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_tagalog_base_uncased_wwm_tl_5.5.0_3.0_1727249270927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_tagalog_base_uncased_wwm","tl") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_tagalog_base_uncased_wwm","tl") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
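
The example feeds an English sentence, but this checkpoint is a Tagalog model, so Tagalog input is more representative. A short sketch reusing the fitted `pipelineModel` and `spark` session from above (the sample sentence is assumed to mean roughly "I love my family"):

```python
tl_data = spark.createDataFrame([["Mahal ko ang aking pamilya."]]).toDF("text")

(pipelineModel.transform(tl_data)
    .selectExpr("explode(embeddings) as ann")
    .selectExpr("ann.result as sentence", "ann.embeddings as vector")
    .show(truncate=80))
```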
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_tagalog_base_uncased_wwm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|tl| +|Size:|406.9 MB| + +## References + +https://huggingface.co/jcblaise/bert-tagalog-base-uncased-WWM \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bio_bert_base_spanish_wwm_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bio_bert_base_spanish_wwm_cased_en.md new file mode 100644 index 00000000000000..e6bfe710d33d2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bio_bert_base_spanish_wwm_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bio_bert_base_spanish_wwm_cased BertSentenceEmbeddings from mrojas +author: John Snow Labs +name: sent_bio_bert_base_spanish_wwm_cased +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bio_bert_base_spanish_wwm_cased` is a English model originally trained by mrojas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bio_bert_base_spanish_wwm_cased_en_5.5.0_3.0_1727252456840.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bio_bert_base_spanish_wwm_cased_en_5.5.0_3.0_1727252456840.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bio_bert_base_spanish_wwm_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bio_bert_base_spanish_wwm_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
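
The model name suggests a Spanish biomedical BERT, so Spanish clinical text is the natural input. A short sketch reusing the fitted `pipelineModel` and `spark` session from above to check the vector dimensionality (the sample sentence is assumed to mean roughly "the patient presents fever and a dry cough"):

```python
es_data = spark.createDataFrame([["El paciente presenta fiebre y tos seca."]]).toDF("text")

pipelineModel.transform(es_data) \
    .selectExpr("embeddings[0].result as sentence", "size(embeddings[0].embeddings) as dim") \
    .show(truncate=False)
```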
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bio_bert_base_spanish_wwm_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|409.0 MB| + +## References + +https://huggingface.co/mrojas/bio-bert-base-spanish-wwm-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_biobert_italian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_biobert_italian_pipeline_en.md new file mode 100644 index 00000000000000..2fb0f7dc3d0159 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_biobert_italian_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_biobert_italian_pipeline pipeline BertSentenceEmbeddings from marcopost-it +author: John Snow Labs +name: sent_biobert_italian_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_biobert_italian_pipeline` is a English model originally trained by marcopost-it. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_biobert_italian_pipeline_en_5.5.0_3.0_1727249186643.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_biobert_italian_pipeline_en_5.5.0_3.0_1727249186643.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_biobert_italian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_biobert_italian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
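
The model name suggests Italian biomedical text. Besides `transform`, a pretrained pipeline can also be run on plain strings; a minimal sketch using `fullAnnotate` (the sample sentence is assumed to mean roughly "the patient presents a high fever"):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("sent_biobert_italian_pipeline", lang="en")
result = pipeline.fullAnnotate("Il paziente presenta febbre alta.")[0]
print(result.keys())  # annotation columns produced by the included stages
```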
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_biobert_italian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|410.8 MB| + +## References + +https://huggingface.co/marcopost-it/biobert-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_cl_arabertv0_1_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_cl_arabertv0_1_base_pipeline_en.md new file mode 100644 index 00000000000000..3eaec1d0fe89c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_cl_arabertv0_1_base_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_cl_arabertv0_1_base_pipeline pipeline BertSentenceEmbeddings from qahq +author: John Snow Labs +name: sent_cl_arabertv0_1_base_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_cl_arabertv0_1_base_pipeline` is a English model originally trained by qahq. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_cl_arabertv0_1_base_pipeline_en_5.5.0_3.0_1727251599249.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_cl_arabertv0_1_base_pipeline_en_5.5.0_3.0_1727251599249.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_cl_arabertv0_1_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_cl_arabertv0_1_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_cl_arabertv0_1_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|505.6 MB| + +## References + +https://huggingface.co/qahq/CL-AraBERTv0.1-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_en.md new file mode 100644 index 00000000000000..891717c5f332b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb BertSentenceEmbeddings from cxfajar197 +author: John Snow Labs +name: sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb` is a English model originally trained by cxfajar197. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_en_5.5.0_3.0_1727234600526.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_en_5.5.0_3.0_1727234600526.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
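
Because the pipeline includes a sentence detector, a multi-sentence input yields one vector per detected sentence. A short sketch reusing the fitted `pipelineModel` and `spark` session from the example above:

```python
review = "The movie started slowly. The second half was genuinely gripping."
review_df = spark.createDataFrame([[review]]).toDF("text")

pipelineModel.transform(review_df) \
    .selectExpr("explode(embeddings) as ann") \
    .selectExpr("ann.result as sentence", "size(ann.embeddings) as dim") \
    .show(truncate=False)
```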
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/cxfajar197/distilbert-base-uncased-finetuned-imdb-accelerate-finetuned-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_pipeline_en.md new file mode 100644 index 00000000000000..a42686d8627174 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_pipeline pipeline BertSentenceEmbeddings from cxfajar197 +author: John Snow Labs +name: sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_pipeline` is a English model originally trained by cxfajar197. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_pipeline_en_5.5.0_3.0_1727234634556.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_pipeline_en_5.5.0_3.0_1727234634556.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|665.6 MB| + +## References + +https://huggingface.co/cxfajar197/distilbert-base-uncased-finetuned-imdb-accelerate-finetuned-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_dummy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_dummy_pipeline_en.md new file mode 100644 index 00000000000000..dfa357963c3cfb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_dummy_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_dummy_pipeline pipeline BertSentenceEmbeddings from knight7561 +author: John Snow Labs +name: sent_dummy_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_dummy_pipeline` is a English model originally trained by knight7561. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_dummy_pipeline_en_5.5.0_3.0_1727252621212.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_dummy_pipeline_en_5.5.0_3.0_1727252621212.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_dummy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_dummy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_dummy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/knight7561/dummy + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_fae_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_fae_en.md new file mode 100644 index 00000000000000..93fe1269803402 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_fae_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_fae BertSentenceEmbeddings from sereneWithU +author: John Snow Labs +name: sent_fae +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_fae` is a English model originally trained by sereneWithU. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_fae_en_5.5.0_3.0_1727252735434.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_fae_en_5.5.0_3.0_1727252735434.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_fae","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_fae","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_fae| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/sereneWithU/FAE \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_fae_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_fae_pipeline_en.md new file mode 100644 index 00000000000000..035a0338e697a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_fae_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_fae_pipeline pipeline BertSentenceEmbeddings from sereneWithU +author: John Snow Labs +name: sent_fae_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_fae_pipeline` is a English model originally trained by sereneWithU. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_fae_pipeline_en_5.5.0_3.0_1727252808462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_fae_pipeline_en_5.5.0_3.0_1727252808462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_fae_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_fae_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_fae_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/sereneWithU/FAE + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_fine_tune_bert_mlm_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_fine_tune_bert_mlm_en.md new file mode 100644 index 00000000000000..840b4446718487 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_fine_tune_bert_mlm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_fine_tune_bert_mlm BertSentenceEmbeddings from mjavadmt +author: John Snow Labs +name: sent_fine_tune_bert_mlm +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_fine_tune_bert_mlm` is a English model originally trained by mjavadmt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_fine_tune_bert_mlm_en_5.5.0_3.0_1727230529238.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_fine_tune_bert_mlm_en_5.5.0_3.0_1727230529238.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_fine_tune_bert_mlm","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_fine_tune_bert_mlm","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
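
Once fitted, the pipeline can be persisted like any Spark ML model, which avoids re-downloading the embeddings on every run. A minimal sketch, assuming the `pipelineModel` and `data` objects from the example above and a writable path of your choice:

```python
from pyspark.ml import PipelineModel

# Hypothetical local path; use an HDFS/S3/DBFS path in a cluster setting.
model_path = "/tmp/sent_fine_tune_bert_mlm_pipeline_model"

pipelineModel.write().overwrite().save(model_path)
reloaded = PipelineModel.load(model_path)
reloaded.transform(data).selectExpr("explode(embeddings) as ann").show(truncate=80)
```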
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_fine_tune_bert_mlm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|606.4 MB| + +## References + +https://huggingface.co/mjavadmt/fine-tune-BERT-MLM \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_fine_tune_bert_mlm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_fine_tune_bert_mlm_pipeline_en.md new file mode 100644 index 00000000000000..24027cf165f4da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_fine_tune_bert_mlm_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_fine_tune_bert_mlm_pipeline pipeline BertSentenceEmbeddings from mjavadmt +author: John Snow Labs +name: sent_fine_tune_bert_mlm_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_fine_tune_bert_mlm_pipeline` is a English model originally trained by mjavadmt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_fine_tune_bert_mlm_pipeline_en_5.5.0_3.0_1727230560544.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_fine_tune_bert_mlm_pipeline_en_5.5.0_3.0_1727230560544.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_fine_tune_bert_mlm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_fine_tune_bert_mlm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_fine_tune_bert_mlm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|607.0 MB| + +## References + +https://huggingface.co/mjavadmt/fine-tune-BERT-MLM + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_guidebias_bert_base_uncased_gender_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_guidebias_bert_base_uncased_gender_en.md new file mode 100644 index 00000000000000..84844671ddba3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_guidebias_bert_base_uncased_gender_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_guidebias_bert_base_uncased_gender BertSentenceEmbeddings from squiduu +author: John Snow Labs +name: sent_guidebias_bert_base_uncased_gender +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_guidebias_bert_base_uncased_gender` is a English model originally trained by squiduu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_guidebias_bert_base_uncased_gender_en_5.5.0_3.0_1727230418497.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_guidebias_bert_base_uncased_gender_en_5.5.0_3.0_1727230418497.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_guidebias_bert_base_uncased_gender","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_guidebias_bert_base_uncased_gender","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_guidebias_bert_base_uncased_gender| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/squiduu/guidebias-bert-base-uncased-gender \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_gujibert_jian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_gujibert_jian_pipeline_en.md new file mode 100644 index 00000000000000..a9492e20a36300 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_gujibert_jian_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_gujibert_jian_pipeline pipeline BertSentenceEmbeddings from hsc748NLP +author: John Snow Labs +name: sent_gujibert_jian_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_gujibert_jian_pipeline` is a English model originally trained by hsc748NLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_gujibert_jian_pipeline_en_5.5.0_3.0_1727252975988.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_gujibert_jian_pipeline_en_5.5.0_3.0_1727252975988.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_gujibert_jian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_gujibert_jian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_gujibert_jian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|420.8 MB| + +## References + +https://huggingface.co/hsc748NLP/GujiBERT_jian + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_marathi_dev_bert_hi.md b/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_marathi_dev_bert_hi.md new file mode 100644 index 00000000000000..4a418b158b4128 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_marathi_dev_bert_hi.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Hindi sent_hindi_marathi_dev_bert BertSentenceEmbeddings from l3cube-pune +author: John Snow Labs +name: sent_hindi_marathi_dev_bert +date: 2024-09-25 +tags: [hi, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hindi_marathi_dev_bert` is a Hindi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hindi_marathi_dev_bert_hi_5.5.0_3.0_1727252023290.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hindi_marathi_dev_bert_hi_5.5.0_3.0_1727252023290.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_hindi_marathi_dev_bert","hi") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_hindi_marathi_dev_bert","hi") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
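
This is a Hindi-Marathi (Devanagari) model, so Devanagari input is the representative case. A short sketch reusing the fitted `pipelineModel` and `spark` session from above (the sample sentence is assumed to mean roughly "I like this book very much"):

```python
hi_data = spark.createDataFrame([["मुझे यह किताब बहुत पसंद है।"]]).toDF("text")

(pipelineModel.transform(hi_data)
    .selectExpr("explode(embeddings) as ann")
    .selectExpr("ann.result as sentence", "ann.embeddings as vector")
    .show(truncate=80))
```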
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hindi_marathi_dev_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|hi| +|Size:|890.7 MB| + +## References + +https://huggingface.co/l3cube-pune/hindi-marathi-dev-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_marathi_dev_bert_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_marathi_dev_bert_pipeline_hi.md new file mode 100644 index 00000000000000..efc034c3aeb5ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_marathi_dev_bert_pipeline_hi.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Hindi sent_hindi_marathi_dev_bert_pipeline pipeline BertSentenceEmbeddings from l3cube-pune +author: John Snow Labs +name: sent_hindi_marathi_dev_bert_pipeline +date: 2024-09-25 +tags: [hi, open_source, pipeline, onnx] +task: Embeddings +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hindi_marathi_dev_bert_pipeline` is a Hindi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hindi_marathi_dev_bert_pipeline_hi_5.5.0_3.0_1727252074552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hindi_marathi_dev_bert_pipeline_hi_5.5.0_3.0_1727252074552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_hindi_marathi_dev_bert_pipeline", lang = "hi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_hindi_marathi_dev_bert_pipeline", lang = "hi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hindi_marathi_dev_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|891.2 MB| + +## References + +https://huggingface.co/l3cube-pune/hindi-marathi-dev-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_wordpiece_bert_test_2m_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_wordpiece_bert_test_2m_en.md new file mode 100644 index 00000000000000..6ba71c52633d2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_wordpiece_bert_test_2m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_hindi_wordpiece_bert_test_2m BertSentenceEmbeddings from rg1683 +author: John Snow Labs +name: sent_hindi_wordpiece_bert_test_2m +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hindi_wordpiece_bert_test_2m` is a English model originally trained by rg1683. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hindi_wordpiece_bert_test_2m_en_5.5.0_3.0_1727249119712.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hindi_wordpiece_bert_test_2m_en_5.5.0_3.0_1727249119712.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_hindi_wordpiece_bert_test_2m","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_hindi_wordpiece_bert_test_2m","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
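
Continuing from the Python example above, `pipelineDF` follows the standard Spark NLP annotation schema, so the raw sentence vectors can be pulled out of the `embeddings` output column; a minimal sketch:

```python
# Each row of "embeddings" is an array of sentence annotations; the nested
# `embeddings` field of every annotation holds the actual vector of floats.
vectors = pipelineDF \
    .selectExpr("explode(embeddings) as sentence") \
    .selectExpr("sentence.result as sentence_text",
                "sentence.embeddings as vector")

vectors.show(truncate=80)
```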
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hindi_wordpiece_bert_test_2m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|377.7 MB| + +## References + +https://huggingface.co/rg1683/hindi_wordpiece_bert_test_2m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_wordpiece_bert_test_2m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_wordpiece_bert_test_2m_pipeline_en.md new file mode 100644 index 00000000000000..333a4b450b0cda --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_wordpiece_bert_test_2m_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_hindi_wordpiece_bert_test_2m_pipeline pipeline BertSentenceEmbeddings from rg1683 +author: John Snow Labs +name: sent_hindi_wordpiece_bert_test_2m_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hindi_wordpiece_bert_test_2m_pipeline` is a English model originally trained by rg1683. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hindi_wordpiece_bert_test_2m_pipeline_en_5.5.0_3.0_1727249140694.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hindi_wordpiece_bert_test_2m_pipeline_en_5.5.0_3.0_1727249140694.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_hindi_wordpiece_bert_test_2m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_hindi_wordpiece_bert_test_2m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
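
For quick experiments that skip DataFrame setup, `PretrainedPipeline` also offers `fullAnnotate`; a rough sketch (the sample string is an arbitrary assumption) that also reveals the pipeline's output column names instead of guessing them:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
pipeline = PretrainedPipeline("sent_hindi_wordpiece_bert_test_2m_pipeline", lang="en")

# fullAnnotate returns one dictionary per input string, keyed by output column
annotated = pipeline.fullAnnotate("I love spark-nlp")[0]
print(list(annotated.keys()))
```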
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hindi_wordpiece_bert_test_2m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|378.3 MB| + +## References + +https://huggingface.co/rg1683/hindi_wordpiece_bert_test_2m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_javanese_bert_small_jv.md b/docs/_posts/ahmedlone127/2024-09-25-sent_javanese_bert_small_jv.md new file mode 100644 index 00000000000000..05cb9c67205aa3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_javanese_bert_small_jv.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Javanese sent_javanese_bert_small BertSentenceEmbeddings from w11wo +author: John Snow Labs +name: sent_javanese_bert_small +date: 2024-09-25 +tags: [jv, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: jv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_javanese_bert_small` is a Javanese model originally trained by w11wo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_javanese_bert_small_jv_5.5.0_3.0_1727251545296.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_javanese_bert_small_jv_5.5.0_3.0_1727251545296.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_javanese_bert_small","jv") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_javanese_bert_small","jv") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
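
The template example above feeds English text, but this checkpoint is a Javanese model, so real usage would pass Javanese input. A sketch reusing `spark` and `pipelineModel` from the Python example above (the sample sentence is only an illustration):

```python
# Replace the English placeholder with Javanese input and check the vector size
jv_data = spark.createDataFrame([["Aku seneng sinau basa Jawa."]]).toDF("text")

jv_result = pipelineModel.transform(jv_data)
jv_result.selectExpr("explode(embeddings) as s") \
    .selectExpr("s.result as sentence", "size(s.embeddings) as dimensions") \
    .show(truncate=False)
```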
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_javanese_bert_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|jv| +|Size:|407.3 MB| + +## References + +https://huggingface.co/w11wo/javanese-bert-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_javanese_bert_small_pipeline_jv.md b/docs/_posts/ahmedlone127/2024-09-25-sent_javanese_bert_small_pipeline_jv.md new file mode 100644 index 00000000000000..4042c91c5e530a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_javanese_bert_small_pipeline_jv.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Javanese sent_javanese_bert_small_pipeline pipeline BertSentenceEmbeddings from w11wo +author: John Snow Labs +name: sent_javanese_bert_small_pipeline +date: 2024-09-25 +tags: [jv, open_source, pipeline, onnx] +task: Embeddings +language: jv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_javanese_bert_small_pipeline` is a Javanese model originally trained by w11wo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_javanese_bert_small_pipeline_jv_5.5.0_3.0_1727251567026.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_javanese_bert_small_pipeline_jv_5.5.0_3.0_1727251567026.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_javanese_bert_small_pipeline", lang = "jv") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_javanese_bert_small_pipeline", lang = "jv") +val annotations = pipeline.transform(df) + +``` +
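
Besides `transform`, the pipeline object exposes `annotate` for single strings; note that `annotate` only returns the string results of each stage (here the detected sentences), so `transform` or `fullAnnotate` is needed when the actual vectors are required. A small sketch with an illustrative Javanese sentence:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
pipeline = PretrainedPipeline("sent_javanese_bert_small_pipeline", lang="jv")

# Returns a dict of output column -> list of string results (no embedding values)
print(pipeline.annotate("Aku seneng sinau basa Jawa."))
```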
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_javanese_bert_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|jv| +|Size:|407.8 MB| + +## References + +https://huggingface.co/w11wo/javanese-bert-small + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_knowbias_bert_base_uncased_gender_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_knowbias_bert_base_uncased_gender_en.md new file mode 100644 index 00000000000000..240f0bb0049c7f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_knowbias_bert_base_uncased_gender_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_knowbias_bert_base_uncased_gender BertSentenceEmbeddings from squiduu +author: John Snow Labs +name: sent_knowbias_bert_base_uncased_gender +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_knowbias_bert_base_uncased_gender` is a English model originally trained by squiduu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_knowbias_bert_base_uncased_gender_en_5.5.0_3.0_1727252460548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_knowbias_bert_base_uncased_gender_en_5.5.0_3.0_1727252460548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_knowbias_bert_base_uncased_gender","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_knowbias_bert_base_uncased_gender","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
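
A common follow-up with sentence embeddings is comparing two inputs. The sketch below reuses `spark` and `pipelineModel` from the Python example above and computes a cosine similarity on the driver; the sentence pair is an arbitrary assumption and each input is assumed to contain a single sentence.

```python
import numpy as np

pair = spark.createDataFrame(
    [["She is a brilliant engineer."],
     ["He is a brilliant engineer."]]
).toDF("text")

# One vector per detected sentence (one sentence per input assumed here)
rows = pipelineModel.transform(pair) \
    .selectExpr("explode(embeddings.embeddings) as vector") \
    .collect()

v1, v2 = np.array(rows[0].vector), np.array(rows[1].vector)
cosine = float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
print(f"cosine similarity: {cosine:.4f}")
```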
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_knowbias_bert_base_uncased_gender| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/squiduu/knowbias-bert-base-uncased-gender \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_knowbias_bert_base_uncased_gender_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_knowbias_bert_base_uncased_gender_pipeline_en.md new file mode 100644 index 00000000000000..edc3ba98ab60e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_knowbias_bert_base_uncased_gender_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_knowbias_bert_base_uncased_gender_pipeline pipeline BertSentenceEmbeddings from squiduu +author: John Snow Labs +name: sent_knowbias_bert_base_uncased_gender_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_knowbias_bert_base_uncased_gender_pipeline` is a English model originally trained by squiduu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_knowbias_bert_base_uncased_gender_pipeline_en_5.5.0_3.0_1727252482647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_knowbias_bert_base_uncased_gender_pipeline_en_5.5.0_3.0_1727252482647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_knowbias_bert_base_uncased_gender_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_knowbias_bert_base_uncased_gender_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
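
To see exactly which annotators this pretrained pipeline bundles (the Included Models list below gives the summary), the underlying Spark `PipelineModel` is exposed on the pipeline object in recent Spark NLP releases; a small sketch, treating the `model` attribute name as an assumption:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
pipeline = PretrainedPipeline("sent_knowbias_bert_base_uncased_gender_pipeline", lang="en")

# `model` is assumed to hold the fitted pyspark.ml PipelineModel
for stage in pipeline.model.stages:
    print(type(stage).__name__)
```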
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_knowbias_bert_base_uncased_gender_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/squiduu/knowbias-bert-base-uncased-gender + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_marathi_bert_smaller_mr.md b/docs/_posts/ahmedlone127/2024-09-25-sent_marathi_bert_smaller_mr.md new file mode 100644 index 00000000000000..b2b5934f5b7b99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_marathi_bert_smaller_mr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Marathi sent_marathi_bert_smaller BertSentenceEmbeddings from l3cube-pune +author: John Snow Labs +name: sent_marathi_bert_smaller +date: 2024-09-25 +tags: [mr, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: mr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_marathi_bert_smaller` is a Marathi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_marathi_bert_smaller_mr_5.5.0_3.0_1727252791211.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_marathi_bert_smaller_mr_5.5.0_3.0_1727252791211.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_marathi_bert_smaller","mr") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_marathi_bert_smaller","mr") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
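
When downstream Spark ML stages need plain vectors rather than annotation structs, an `EmbeddingsFinisher` can be added after the embeddings annotator; a sketch applied to `pipelineDF` from the Python example above:

```python
from sparknlp.base import EmbeddingsFinisher

# Convert the annotation column into Spark ML vectors
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

finished = finisher.transform(pipelineDF)
finished.select("finished_embeddings").show(truncate=False)
```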
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_marathi_bert_smaller| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|mr| +|Size:|204.9 MB| + +## References + +https://huggingface.co/l3cube-pune/marathi-bert-smaller \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_marathi_bert_smaller_pipeline_mr.md b/docs/_posts/ahmedlone127/2024-09-25-sent_marathi_bert_smaller_pipeline_mr.md new file mode 100644 index 00000000000000..65572f04a07685 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_marathi_bert_smaller_pipeline_mr.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Marathi sent_marathi_bert_smaller_pipeline pipeline BertSentenceEmbeddings from l3cube-pune +author: John Snow Labs +name: sent_marathi_bert_smaller_pipeline +date: 2024-09-25 +tags: [mr, open_source, pipeline, onnx] +task: Embeddings +language: mr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_marathi_bert_smaller_pipeline` is a Marathi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_marathi_bert_smaller_pipeline_mr_5.5.0_3.0_1727252805395.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_marathi_bert_smaller_pipeline_mr_5.5.0_3.0_1727252805395.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_marathi_bert_smaller_pipeline", lang = "mr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_marathi_bert_smaller_pipeline", lang = "mr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_marathi_bert_smaller_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mr| +|Size:|205.4 MB| + +## References + +https://huggingface.co/l3cube-pune/marathi-bert-smaller + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_medruberttiny2_ru.md b/docs/_posts/ahmedlone127/2024-09-25-sent_medruberttiny2_ru.md new file mode 100644 index 00000000000000..1d84ba3e5b8fe5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_medruberttiny2_ru.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Russian sent_medruberttiny2 BertSentenceEmbeddings from DmitryPogrebnoy +author: John Snow Labs +name: sent_medruberttiny2 +date: 2024-09-25 +tags: [ru, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_medruberttiny2` is a Russian model originally trained by DmitryPogrebnoy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_medruberttiny2_ru_5.5.0_3.0_1727248896378.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_medruberttiny2_ru_5.5.0_3.0_1727248896378.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_medruberttiny2","ru") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_medruberttiny2","ru") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
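
For ad-hoc checks on single strings, for example a Russian clinical sentence for this model, a `LightPipeline` wrapped around the fitted pipeline avoids building a DataFrame. A sketch reusing `pipelineModel` from the Python example above (the sample sentence is an assumption):

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# fullAnnotate keeps the embedding values; annotate() would return only strings
annotations = light.fullAnnotate("Пациент жалуется на головную боль.")[0]
print(len(annotations["embeddings"]))                 # number of detected sentences
print(len(annotations["embeddings"][0].embeddings))   # vector dimensionality
```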
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_medruberttiny2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ru| +|Size:|109.1 MB| + +## References + +https://huggingface.co/DmitryPogrebnoy/MedRuBertTiny2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_miem_scibert_linguistic_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_miem_scibert_linguistic_en.md new file mode 100644 index 00000000000000..f38947d93d9587 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_miem_scibert_linguistic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_miem_scibert_linguistic BertSentenceEmbeddings from miemBertProject +author: John Snow Labs +name: sent_miem_scibert_linguistic +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_miem_scibert_linguistic` is a English model originally trained by miemBertProject. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_miem_scibert_linguistic_en_5.5.0_3.0_1727249254687.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_miem_scibert_linguistic_en_5.5.0_3.0_1727249254687.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_miem_scibert_linguistic","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_miem_scibert_linguistic","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_miem_scibert_linguistic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|657.4 MB| + +## References + +https://huggingface.co/miemBertProject/miem-scibert-linguistic \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_minilmv2_l6_h384_distilled_from_bert_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_minilmv2_l6_h384_distilled_from_bert_base_pipeline_en.md new file mode 100644 index 00000000000000..609d5af33a5f92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_minilmv2_l6_h384_distilled_from_bert_base_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_minilmv2_l6_h384_distilled_from_bert_base_pipeline pipeline BertSentenceEmbeddings from nreimers +author: John Snow Labs +name: sent_minilmv2_l6_h384_distilled_from_bert_base_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_minilmv2_l6_h384_distilled_from_bert_base_pipeline` is a English model originally trained by nreimers. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_minilmv2_l6_h384_distilled_from_bert_base_pipeline_en_5.5.0_3.0_1727230836299.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_minilmv2_l6_h384_distilled_from_bert_base_pipeline_en_5.5.0_3.0_1727230836299.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_minilmv2_l6_h384_distilled_from_bert_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_minilmv2_l6_h384_distilled_from_bert_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
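
As with the other pretrained pipelines, `df` above is assumed to be any DataFrame with a `text` column. A sketch that embeds a plain-text file line by line (the file path is an assumption):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Each line of the file becomes one row in the "text" column
df = spark.read.text("sentences.txt").withColumnRenamed("value", "text")

pipeline = PretrainedPipeline("sent_minilmv2_l6_h384_distilled_from_bert_base_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.printSchema()
```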
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_minilmv2_l6_h384_distilled_from_bert_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|54.7 MB| + +## References + +https://huggingface.co/nreimers/MiniLMv2-L6-H384-distilled-from-BERT-Base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_mitre_bert_small_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_mitre_bert_small_en.md new file mode 100644 index 00000000000000..366f761751cd5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_mitre_bert_small_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_mitre_bert_small BertSentenceEmbeddings from bencyc1129 +author: John Snow Labs +name: sent_mitre_bert_small +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_mitre_bert_small` is a English model originally trained by bencyc1129. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_mitre_bert_small_en_5.5.0_3.0_1727234422468.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_mitre_bert_small_en_5.5.0_3.0_1727234422468.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_mitre_bert_small","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_mitre_bert_small","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
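
To avoid re-downloading the model on every run, the fitted pipeline can be persisted with the standard Spark ML writer. A sketch reusing `pipelineModel` and `data` from the Python example above; the local path is an arbitrary assumption:

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline, including the downloaded embeddings model
pipelineModel.write().overwrite().save("/tmp/sent_mitre_bert_small_pipeline")

# Later: reload and reuse without fetching from the model hub again
reloaded = PipelineModel.load("/tmp/sent_mitre_bert_small_pipeline")
reloaded.transform(data).select("embeddings.result").show(truncate=False)
```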
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_mitre_bert_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|108.5 MB| + +## References + +https://huggingface.co/bencyc1129/mitre-bert-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_mitre_bert_small_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_mitre_bert_small_pipeline_en.md new file mode 100644 index 00000000000000..859da057a86d43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_mitre_bert_small_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_mitre_bert_small_pipeline pipeline BertSentenceEmbeddings from bencyc1129 +author: John Snow Labs +name: sent_mitre_bert_small_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_mitre_bert_small_pipeline` is a English model originally trained by bencyc1129. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_mitre_bert_small_pipeline_en_5.5.0_3.0_1727234427869.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_mitre_bert_small_pipeline_en_5.5.0_3.0_1727234427869.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_mitre_bert_small_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_mitre_bert_small_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_mitre_bert_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|109.1 MB| + +## References + +https://huggingface.co/bencyc1129/mitre-bert-small + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_nepal_bhasa_bert_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_nepal_bhasa_bert_en.md new file mode 100644 index 00000000000000..0f1623707e7dec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_nepal_bhasa_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_nepal_bhasa_bert BertSentenceEmbeddings from searchfind +author: John Snow Labs +name: sent_nepal_bhasa_bert +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_nepal_bhasa_bert` is a English model originally trained by searchfind. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_nepal_bhasa_bert_en_5.5.0_3.0_1727253496987.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_nepal_bhasa_bert_en_5.5.0_3.0_1727253496987.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_nepal_bhasa_bert","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_nepal_bhasa_bert","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
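
For small datasets it can be convenient to pull the vectors back to the driver as a pandas DataFrame; a rough sketch continuing from `pipelineDF` in the Python example above (collecting is only sensible for modest row counts):

```python
# Flatten annotations into (sentence, vector) pairs and collect them locally
pdf = pipelineDF \
    .selectExpr("explode(embeddings) as s") \
    .selectExpr("s.result as sentence", "s.embeddings as vector") \
    .toPandas()

print(pdf.shape)
print(len(pdf.loc[0, "vector"]))  # embedding dimensionality
```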
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_nepal_bhasa_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/searchfind/New_BERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_nepal_bhasa_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_nepal_bhasa_bert_pipeline_en.md new file mode 100644 index 00000000000000..a3c5e7b1ccfd83 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_nepal_bhasa_bert_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_nepal_bhasa_bert_pipeline pipeline BertSentenceEmbeddings from searchfind +author: John Snow Labs +name: sent_nepal_bhasa_bert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_nepal_bhasa_bert_pipeline` is a English model originally trained by searchfind. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_nepal_bhasa_bert_pipeline_en_5.5.0_3.0_1727253519790.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_nepal_bhasa_bert_pipeline_en_5.5.0_3.0_1727253519790.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_nepal_bhasa_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_nepal_bhasa_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_nepal_bhasa_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.2 MB| + +## References + +https://huggingface.co/searchfind/New_BERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_norbert_no.md b/docs/_posts/ahmedlone127/2024-09-25-sent_norbert_no.md new file mode 100644 index 00000000000000..4549a29e9105b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_norbert_no.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Norwegian sent_norbert BertSentenceEmbeddings from ltg +author: John Snow Labs +name: sent_norbert +date: 2024-09-25 +tags: ["no", open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: "no" +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_norbert` is a Norwegian model originally trained by ltg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_norbert_no_5.5.0_3.0_1727253356669.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_norbert_no_5.5.0_3.0_1727253356669.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_norbert","no") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_norbert","no") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
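
Norwegian input would normally replace the English sample text; the sketch below also averages the per-sentence vectors into a single document vector on the driver. It reuses `spark` and `pipelineModel` from the Python example above, and both the sample text and the mean-pooling choice are assumptions:

```python
import numpy as np

no_data = spark.createDataFrame(
    [["Jeg liker å lese bøker. Språkteknologi er spennende."]]
).toDF("text")

rows = pipelineModel.transform(no_data) \
    .selectExpr("explode(embeddings.embeddings) as vector") \
    .collect()

# Mean-pool the sentence vectors into one document vector
doc_vector = np.mean([np.array(r.vector) for r in rows], axis=0)
print(doc_vector.shape)
```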
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_norbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|no| +|Size:|415.2 MB| + +## References + +https://huggingface.co/ltg/norbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_norbert_pipeline_no.md b/docs/_posts/ahmedlone127/2024-09-25-sent_norbert_pipeline_no.md new file mode 100644 index 00000000000000..b602290fecdd90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_norbert_pipeline_no.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Norwegian sent_norbert_pipeline pipeline BertSentenceEmbeddings from ltg +author: John Snow Labs +name: sent_norbert_pipeline +date: 2024-09-25 +tags: ["no", open_source, pipeline, onnx] +task: Embeddings +language: "no" +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_norbert_pipeline` is a Norwegian model originally trained by ltg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_norbert_pipeline_no_5.5.0_3.0_1727253378130.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_norbert_pipeline_no_5.5.0_3.0_1727253378130.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_norbert_pipeline", lang = "no") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_norbert_pipeline", lang = "no") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_norbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|no| +|Size:|415.7 MB| + +## References + +https://huggingface.co/ltg/norbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_prompt_finetune_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_prompt_finetune_en.md new file mode 100644 index 00000000000000..8e58c290ee8033 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_prompt_finetune_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_prompt_finetune BertSentenceEmbeddings from AndyJ +author: John Snow Labs +name: sent_prompt_finetune +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_prompt_finetune` is a English model originally trained by AndyJ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_prompt_finetune_en_5.5.0_3.0_1727234728465.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_prompt_finetune_en_5.5.0_3.0_1727234728465.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_prompt_finetune","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_prompt_finetune","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
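
Computed vectors can also be written out for reuse, for example as Parquet; a short sketch continuing from `pipelineDF` in the Python example above (the output path is an assumption):

```python
# Keep only what downstream jobs need: the sentence text and its vector
(pipelineDF
    .selectExpr("explode(embeddings) as s")
    .selectExpr("s.result as sentence", "s.embeddings as vector")
    .write.mode("overwrite")
    .parquet("/tmp/prompt_finetune_sentence_vectors"))
```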
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_prompt_finetune| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|408.0 MB| + +## References + +https://huggingface.co/AndyJ/prompt_finetune \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_prompt_finetune_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_prompt_finetune_pipeline_en.md new file mode 100644 index 00000000000000..a7f20c53e38842 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_prompt_finetune_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_prompt_finetune_pipeline pipeline BertSentenceEmbeddings from AndyJ +author: John Snow Labs +name: sent_prompt_finetune_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_prompt_finetune_pipeline` is a English model originally trained by AndyJ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_prompt_finetune_pipeline_en_5.5.0_3.0_1727234749748.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_prompt_finetune_pipeline_en_5.5.0_3.0_1727234749748.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_prompt_finetune_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_prompt_finetune_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_prompt_finetune_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.6 MB| + +## References + +https://huggingface.co/AndyJ/prompt_finetune + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_protaugment_lm_clinic150_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_protaugment_lm_clinic150_en.md new file mode 100644 index 00000000000000..64f095398e66c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_protaugment_lm_clinic150_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_protaugment_lm_clinic150 BertSentenceEmbeddings from tdopierre +author: John Snow Labs +name: sent_protaugment_lm_clinic150 +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_protaugment_lm_clinic150` is a English model originally trained by tdopierre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_protaugment_lm_clinic150_en_5.5.0_3.0_1727235126826.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_protaugment_lm_clinic150_en_5.5.0_3.0_1727235126826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_protaugment_lm_clinic150","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_protaugment_lm_clinic150","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
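
Given the Clinic150 reference in the model name, a typical use case is embedding many short user utterances in one pass. A sketch reusing `spark` and `pipelineModel` from the Python example above, with made-up utterances:

```python
utterances = spark.createDataFrame(
    [["what is my bank balance"],
     ["book a table for two tonight"],
     ["how do i reset my password"]]
).toDF("text")

embedded = pipelineModel.transform(utterances)

# One sentence annotation (and therefore one vector) is expected per utterance
embedded.selectExpr("text", "size(embeddings) as sentences").show(truncate=False)
```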
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_protaugment_lm_clinic150| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|403.4 MB| + +## References + +https://huggingface.co/tdopierre/ProtAugment-LM-Clinic150 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_protaugment_lm_clinic150_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_protaugment_lm_clinic150_pipeline_en.md new file mode 100644 index 00000000000000..945a985e99a0f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_protaugment_lm_clinic150_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_protaugment_lm_clinic150_pipeline pipeline BertSentenceEmbeddings from tdopierre +author: John Snow Labs +name: sent_protaugment_lm_clinic150_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_protaugment_lm_clinic150_pipeline` is a English model originally trained by tdopierre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_protaugment_lm_clinic150_pipeline_en_5.5.0_3.0_1727235147029.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_protaugment_lm_clinic150_pipeline_en_5.5.0_3.0_1727235147029.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_protaugment_lm_clinic150_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_protaugment_lm_clinic150_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_protaugment_lm_clinic150_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.0 MB| + +## References + +https://huggingface.co/tdopierre/ProtAugment-LM-Clinic150 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_retromae_msmarco_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_retromae_msmarco_en.md new file mode 100644 index 00000000000000..e04448e320e66b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_retromae_msmarco_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_retromae_msmarco BertSentenceEmbeddings from Shitao +author: John Snow Labs +name: sent_retromae_msmarco +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_retromae_msmarco` is a English model originally trained by Shitao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_retromae_msmarco_en_5.5.0_3.0_1727230352092.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_retromae_msmarco_en_5.5.0_3.0_1727230352092.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_retromae_msmarco","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_retromae_msmarco","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
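
The sentence vectors land in the `embeddings` column as Spark NLP annotations. A small follow-up like the sketch below (column names taken from the pipeline above) pulls the raw float vectors out for downstream use.

```python
from pyspark.sql import functions as F

# Each row of "embeddings" holds an array of sentence annotations; the float
# vector itself sits in the nested "embeddings" field of every annotation.
vectors = pipelineDF.select(
    F.explode("embeddings").alias("annotation")
).select(
    F.col("annotation.result").alias("sentence"),
    F.col("annotation.embeddings").alias("vector")
)
vectors.show(truncate=80)
```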
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_retromae_msmarco| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/Shitao/RetroMAE_MSMARCO \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_retromae_msmarco_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_retromae_msmarco_pipeline_en.md new file mode 100644 index 00000000000000..229997ba616906 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_retromae_msmarco_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_retromae_msmarco_pipeline pipeline BertSentenceEmbeddings from Shitao +author: John Snow Labs +name: sent_retromae_msmarco_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_retromae_msmarco_pipeline` is a English model originally trained by Shitao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_retromae_msmarco_pipeline_en_5.5.0_3.0_1727230372703.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_retromae_msmarco_pipeline_en_5.5.0_3.0_1727230372703.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be annotated by the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_retromae_msmarco_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be annotated by the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_retromae_msmarco_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_retromae_msmarco_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/Shitao/RetroMAE_MSMARCO + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_scholarbert_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_scholarbert_en.md new file mode 100644 index 00000000000000..08afc7e3ba1e64 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_scholarbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_scholarbert BertSentenceEmbeddings from globuslabs +author: John Snow Labs +name: sent_scholarbert +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_scholarbert` is a English model originally trained by globuslabs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_scholarbert_en_5.5.0_3.0_1727253411352.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_scholarbert_en_5.5.0_3.0_1727253411352.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_scholarbert","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_scholarbert","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_scholarbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/globuslabs/ScholarBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_viz_wiz_bert_base_uncased_f32_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_viz_wiz_bert_base_uncased_f32_en.md new file mode 100644 index 00000000000000..4942df83b4b9d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_viz_wiz_bert_base_uncased_f32_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_viz_wiz_bert_base_uncased_f32 BertSentenceEmbeddings from eisenjulian +author: John Snow Labs +name: sent_viz_wiz_bert_base_uncased_f32 +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_viz_wiz_bert_base_uncased_f32` is a English model originally trained by eisenjulian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_viz_wiz_bert_base_uncased_f32_en_5.5.0_3.0_1727234629727.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_viz_wiz_bert_base_uncased_f32_en_5.5.0_3.0_1727234629727.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

embeddings = BertSentenceEmbeddings.pretrained("sent_viz_wiz_bert_base_uncased_f32","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val embeddings = BertSentenceEmbeddings.pretrained("sent_viz_wiz_bert_base_uncased_f32","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_viz_wiz_bert_base_uncased_f32| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/eisenjulian/viz-wiz-bert-base-uncased_f32 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_viz_wiz_bert_base_uncased_f32_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_viz_wiz_bert_base_uncased_f32_pipeline_en.md new file mode 100644 index 00000000000000..465446701401cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_viz_wiz_bert_base_uncased_f32_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_viz_wiz_bert_base_uncased_f32_pipeline pipeline BertSentenceEmbeddings from eisenjulian +author: John Snow Labs +name: sent_viz_wiz_bert_base_uncased_f32_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_viz_wiz_bert_base_uncased_f32_pipeline` is a English model originally trained by eisenjulian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_viz_wiz_bert_base_uncased_f32_pipeline_en_5.5.0_3.0_1727234650812.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_viz_wiz_bert_base_uncased_f32_pipeline_en_5.5.0_3.0_1727234650812.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be annotated by the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_viz_wiz_bert_base_uncased_f32_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be annotated by the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_viz_wiz_bert_base_uncased_f32_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_viz_wiz_bert_base_uncased_f32_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/eisenjulian/viz-wiz-bert-base-uncased_f32 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sentiment_ita_en.md b/docs/_posts/ahmedlone127/2024-09-25-sentiment_ita_en.md new file mode 100644 index 00000000000000..e01fbd133b19d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sentiment_ita_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_ita BertForSequenceClassification from luigisaetta +author: John Snow Labs +name: sentiment_ita +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_ita` is a English model originally trained by luigisaetta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_ita_en_5.5.0_3.0_1727222587188.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_ita_en_5.5.0_3.0_1727222587188.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("sentiment_ita","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("sentiment_ita", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
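
To inspect the prediction, the `class` column produced above holds one label annotation per document. A minimal check could look like the sketch below; the label names themselves come from the fine-tuned model.

```python
# "class.result" is an array with the predicted label for each input document
pipelineDF.select("text", "class.result").show(truncate=False)
```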
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_ita| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|414.8 MB| + +## References + +https://huggingface.co/luigisaetta/sentiment_ita \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sentiment_ita_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sentiment_ita_pipeline_en.md new file mode 100644 index 00000000000000..2cad2cb329804d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sentiment_ita_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_ita_pipeline pipeline BertForSequenceClassification from luigisaetta +author: John Snow Labs +name: sentiment_ita_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_ita_pipeline` is a English model originally trained by luigisaetta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_ita_pipeline_en_5.5.0_3.0_1727222608612.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_ita_pipeline_en_5.5.0_3.0_1727222608612.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be annotated by the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sentiment_ita_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be annotated by the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sentiment_ita_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_ita_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.9 MB| + +## References + +https://huggingface.co/luigisaetta/sentiment_ita + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-shulchan_aruch_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-shulchan_aruch_classifier_pipeline_en.md new file mode 100644 index 00000000000000..623334a386b08c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-shulchan_aruch_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English shulchan_aruch_classifier_pipeline pipeline BertForSequenceClassification from sivan22 +author: John Snow Labs +name: shulchan_aruch_classifier_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`shulchan_aruch_classifier_pipeline` is a English model originally trained by sivan22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/shulchan_aruch_classifier_pipeline_en_5.5.0_3.0_1727273346179.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/shulchan_aruch_classifier_pipeline_en_5.5.0_3.0_1727273346179.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be annotated by the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("shulchan_aruch_classifier_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be annotated by the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("shulchan_aruch_classifier_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|shulchan_aruch_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|692.2 MB| + +## References + +https://huggingface.co/sivan22/shulchan-aruch-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-snli_test_100k_en.md b/docs/_posts/ahmedlone127/2024-09-25-snli_test_100k_en.md new file mode 100644 index 00000000000000..5aa9d0263afbc6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-snli_test_100k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English snli_test_100k BertForSequenceClassification from grace-pro +author: John Snow Labs +name: snli_test_100k +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`snli_test_100k` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/snli_test_100k_en_5.5.0_3.0_1727278442647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/snli_test_100k_en_5.5.0_3.0_1727278442647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("snli_test_100k","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("snli_test_100k", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
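
If it is unclear which labels this particular fine-tune emits, the loaded annotator can be asked directly. This is a small sketch assuming the model downloads as above; `getClasses()` is the usual accessor on Spark NLP classification annotators.

```python
from sparknlp.annotator import BertForSequenceClassification

model = BertForSequenceClassification.pretrained("snli_test_100k", "en")

# getClasses() returns the label set the classification head was trained with
print(model.getClasses())
```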
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|snli_test_100k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/grace-pro/snli_test_100k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-snli_test_100k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-snli_test_100k_pipeline_en.md new file mode 100644 index 00000000000000..fc386b85fbc191 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-snli_test_100k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English snli_test_100k_pipeline pipeline BertForSequenceClassification from grace-pro +author: John Snow Labs +name: snli_test_100k_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`snli_test_100k_pipeline` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/snli_test_100k_pipeline_en_5.5.0_3.0_1727278464665.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/snli_test_100k_pipeline_en_5.5.0_3.0_1727278464665.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be annotated by the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("snli_test_100k_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be annotated by the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("snli_test_100k_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|snli_test_100k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/grace-pro/snli_test_100k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-star_predictor_en.md b/docs/_posts/ahmedlone127/2024-09-25-star_predictor_en.md new file mode 100644 index 00000000000000..b67a2019095d35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-star_predictor_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English star_predictor BertForSequenceClassification from Yanni8 +author: John Snow Labs +name: star_predictor +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`star_predictor` is a English model originally trained by Yanni8. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/star_predictor_en_5.5.0_3.0_1727268038749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/star_predictor_en_5.5.0_3.0_1727268038749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("star_predictor","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("star_predictor", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|star_predictor| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Yanni8/star-predictor \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-stereoset_bert_base_uncased_classifieronly_en.md b/docs/_posts/ahmedlone127/2024-09-25-stereoset_bert_base_uncased_classifieronly_en.md new file mode 100644 index 00000000000000..2ffc06ce131289 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-stereoset_bert_base_uncased_classifieronly_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stereoset_bert_base_uncased_classifieronly BertForSequenceClassification from henryscheible +author: John Snow Labs +name: stereoset_bert_base_uncased_classifieronly +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stereoset_bert_base_uncased_classifieronly` is a English model originally trained by henryscheible. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stereoset_bert_base_uncased_classifieronly_en_5.5.0_3.0_1727286038183.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stereoset_bert_base_uncased_classifieronly_en_5.5.0_3.0_1727286038183.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("stereoset_bert_base_uncased_classifieronly","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("stereoset_bert_base_uncased_classifieronly", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stereoset_bert_base_uncased_classifieronly| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/henryscheible/stereoset_bert-base-uncased_classifieronly \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-stsb_vn_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-stsb_vn_pipeline_en.md new file mode 100644 index 00000000000000..09fb502aac397d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-stsb_vn_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stsb_vn_pipeline pipeline BertForSequenceClassification from ntrnghia +author: John Snow Labs +name: stsb_vn_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stsb_vn_pipeline` is a English model originally trained by ntrnghia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stsb_vn_pipeline_en_5.5.0_3.0_1727266098699.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stsb_vn_pipeline_en_5.5.0_3.0_1727266098699.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be annotated by the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("stsb_vn_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be annotated by the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("stsb_vn_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stsb_vn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|667.3 MB| + +## References + +https://huggingface.co/ntrnghia/stsb_vn + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sustainable_finance_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sustainable_finance_bert_pipeline_en.md new file mode 100644 index 00000000000000..16d0b1cbdbc427 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sustainable_finance_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sustainable_finance_bert_pipeline pipeline BertForSequenceClassification from Pelumioluwa +author: John Snow Labs +name: sustainable_finance_bert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sustainable_finance_bert_pipeline` is a English model originally trained by Pelumioluwa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sustainable_finance_bert_pipeline_en_5.5.0_3.0_1727262007229.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sustainable_finance_bert_pipeline_en_5.5.0_3.0_1727262007229.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be annotated by the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sustainable_finance_bert_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be annotated by the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sustainable_finance_bert_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sustainable_finance_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Pelumioluwa/Sustainable-Finance-BERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_pipeline_en.md new file mode 100644 index 00000000000000..2f6781c1a9d3a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_pipeline pipeline BertForSequenceClassification from ajtamayoh +author: John Snow Labs +name: symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_pipeline` is a English model originally trained by ajtamayoh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_pipeline_en_5.5.0_3.0_1727261105110.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_pipeline_en_5.5.0_3.0_1727261105110.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be annotated by the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be annotated by the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/ajtamayoh/Symptoms_to_Diagnosis_SonatafyAI_BERT_v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-t_frex_bert_large_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-25-t_frex_bert_large_uncased_en.md new file mode 100644 index 00000000000000..3ac4d20ef2381f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-t_frex_bert_large_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English t_frex_bert_large_uncased BertForTokenClassification from quim-motger +author: John Snow Labs +name: t_frex_bert_large_uncased +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t_frex_bert_large_uncased` is a English model originally trained by quim-motger. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t_frex_bert_large_uncased_en_5.5.0_3.0_1727271769440.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t_frex_bert_large_uncased_en_5.5.0_3.0_1727271769440.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("t_frex_bert_large_uncased","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("t_frex_bert_large_uncased", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
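
The token and tag arrays in the output are position-aligned, so the predicted entities can be read side by side. A minimal sketch using the columns defined above:

```python
# token.result and ner.result are aligned arrays: one predicted tag per token
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```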
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t_frex_bert_large_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/quim-motger/t-frex-bert-large-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-teknofest_nlp_finetuned_tddi_en.md b/docs/_posts/ahmedlone127/2024-09-25-teknofest_nlp_finetuned_tddi_en.md new file mode 100644 index 00000000000000..c32dbafc208f94 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-teknofest_nlp_finetuned_tddi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English teknofest_nlp_finetuned_tddi BertForSequenceClassification from OnurSahh +author: John Snow Labs +name: teknofest_nlp_finetuned_tddi +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`teknofest_nlp_finetuned_tddi` is a English model originally trained by OnurSahh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/teknofest_nlp_finetuned_tddi_en_5.5.0_3.0_1727263454258.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/teknofest_nlp_finetuned_tddi_en_5.5.0_3.0_1727263454258.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("teknofest_nlp_finetuned_tddi","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("teknofest_nlp_finetuned_tddi", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|teknofest_nlp_finetuned_tddi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|414.5 MB| + +## References + +https://huggingface.co/OnurSahh/teknofest_nlp_finetuned_tddi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-teknofest_nlp_finetuned_tddi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-teknofest_nlp_finetuned_tddi_pipeline_en.md new file mode 100644 index 00000000000000..d551da14abe8dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-teknofest_nlp_finetuned_tddi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English teknofest_nlp_finetuned_tddi_pipeline pipeline BertForSequenceClassification from OnurSahh +author: John Snow Labs +name: teknofest_nlp_finetuned_tddi_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`teknofest_nlp_finetuned_tddi_pipeline` is a English model originally trained by OnurSahh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/teknofest_nlp_finetuned_tddi_pipeline_en_5.5.0_3.0_1727263478896.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/teknofest_nlp_finetuned_tddi_pipeline_en_5.5.0_3.0_1727263478896.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be annotated by the pretrained pipeline
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("teknofest_nlp_finetuned_tddi_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be annotated by the pretrained pipeline
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("teknofest_nlp_finetuned_tddi_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|teknofest_nlp_finetuned_tddi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.6 MB| + +## References + +https://huggingface.co/OnurSahh/teknofest_nlp_finetuned_tddi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-tempclin_biobertpt_clin_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-25-tempclin_biobertpt_clin_pipeline_pt.md new file mode 100644 index 00000000000000..57abc1c5f046d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-tempclin_biobertpt_clin_pipeline_pt.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Portuguese tempclin_biobertpt_clin_pipeline pipeline BertForTokenClassification from pucpr-br +author: John Snow Labs +name: tempclin_biobertpt_clin_pipeline +date: 2024-09-25 +tags: [pt, open_source, pipeline, onnx] +task: Named Entity Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tempclin_biobertpt_clin_pipeline` is a Portuguese model originally trained by pucpr-br. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tempclin_biobertpt_clin_pipeline_pt_5.5.0_3.0_1727271160027.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tempclin_biobertpt_clin_pipeline_pt_5.5.0_3.0_1727271160027.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# Any DataFrame with a "text" column can be annotated by the pretrained pipeline
df = spark.createDataFrame([["Paciente apresentou febre há dois dias."]]).toDF("text")

pipeline = PretrainedPipeline("tempclin_biobertpt_clin_pipeline", lang = "pt")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// Any DataFrame with a "text" column can be annotated by the pretrained pipeline
val df = Seq("Paciente apresentou febre há dois dias.").toDF("text")

val pipeline = new PretrainedPipeline("tempclin_biobertpt_clin_pipeline", lang = "pt")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tempclin_biobertpt_clin_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|665.1 MB| + +## References + +https://huggingface.co/pucpr-br/tempclin-biobertpt-clin + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-test_hub_push_en.md b/docs/_posts/ahmedlone127/2024-09-25-test_hub_push_en.md new file mode 100644 index 00000000000000..c5bdc6fe939a1f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-test_hub_push_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_hub_push BertForSequenceClassification from Tonita +author: John Snow Labs +name: test_hub_push +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_hub_push` is a English model originally trained by Tonita. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_hub_push_en_5.5.0_3.0_1727287918238.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_hub_push_en_5.5.0_3.0_1727287918238.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("test_hub_push","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("test_hub_push", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
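
Once fitted, the whole pipeline can be persisted like any Spark ML model so the downloaded weights are not fetched again on every run. A sketch using standard Spark ML persistence; the target path is an arbitrary example.

```python
# Save the fitted pipeline (including the downloaded model) for reuse
pipelineModel.write().overwrite().save("/tmp/test_hub_push_pipeline")

# Later, reload it without downloading the model again
from pyspark.ml import PipelineModel
restored = PipelineModel.load("/tmp/test_hub_push_pipeline")
```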
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_hub_push| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Tonita/test-hub-push \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-test_ner_rundi_en.md b/docs/_posts/ahmedlone127/2024-09-25-test_ner_rundi_en.md new file mode 100644 index 00000000000000..8194cd2dfb7c0d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-test_ner_rundi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_ner_rundi BertForTokenClassification from lltala +author: John Snow Labs +name: test_ner_rundi +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_ner_rundi` is a English model originally trained by lltala. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_ner_rundi_en_5.5.0_3.0_1727283624953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_ner_rundi_en_5.5.0_3.0_1727283624953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("test_ner_rundi","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("test_ner_rundi", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_ner_rundi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/lltala/test-ner-run \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-test_trainer_gaito_20_en.md b/docs/_posts/ahmedlone127/2024-09-25-test_trainer_gaito_20_en.md new file mode 100644 index 00000000000000..53b6148a3b323e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-test_trainer_gaito_20_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_trainer_gaito_20 BertForSequenceClassification from gaito-20 +author: John Snow Labs +name: test_trainer_gaito_20 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_trainer_gaito_20` is a English model originally trained by gaito-20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_trainer_gaito_20_en_5.5.0_3.0_1727269856481.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_trainer_gaito_20_en_5.5.0_3.0_1727269856481.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("test_trainer_gaito_20","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("test_trainer_gaito_20", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_trainer_gaito_20| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/gaito-20/test-trainer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-test_trainer_gaito_20_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-test_trainer_gaito_20_pipeline_en.md new file mode 100644 index 00000000000000..5527c02fcc2f8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-test_trainer_gaito_20_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_trainer_gaito_20_pipeline pipeline BertForSequenceClassification from gaito-20 +author: John Snow Labs +name: test_trainer_gaito_20_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_trainer_gaito_20_pipeline` is a English model originally trained by gaito-20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_trainer_gaito_20_pipeline_en_5.5.0_3.0_1727269878752.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_trainer_gaito_20_pipeline_en_5.5.0_3.0_1727269878752.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# A DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("test_trainer_gaito_20_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// A DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("test_trainer_gaito_20_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
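For quick experiments on single strings, the pretrained pipeline also exposes a lightweight `annotate()` helper, so no DataFrame is needed. A minimal sketch, assuming the same pipeline name as in the snippet above:

```python
from sparknlp.pretrained import PretrainedPipeline

# Annotate a single string and print the resulting annotation dictionary
pipeline = PretrainedPipeline("test_trainer_gaito_20_pipeline", lang="en")
print(pipeline.annotate("I love spark-nlp"))
```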
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_trainer_gaito_20_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/gaito-20/test-trainer + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-tos_bert_en.md b/docs/_posts/ahmedlone127/2024-09-25-tos_bert_en.md new file mode 100644 index 00000000000000..8c44091d460729 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-tos_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tos_bert BertForSequenceClassification from prasannadhungana8848 +author: John Snow Labs +name: tos_bert +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tos_bert` is a English model originally trained by prasannadhungana8848. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tos_bert_en_5.5.0_3.0_1727257634525.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tos_bert_en_5.5.0_3.0_1727257634525.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("tos_bert","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("tos_bert", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tos_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/prasannadhungana8848/TOS_BERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-toxicity_type_detection_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-25-toxicity_type_detection_pipeline_pt.md new file mode 100644 index 00000000000000..4b5a1981a1b04d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-toxicity_type_detection_pipeline_pt.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Portuguese toxicity_type_detection_pipeline pipeline BertForSequenceClassification from dougtrajano +author: John Snow Labs +name: toxicity_type_detection_pipeline +date: 2024-09-25 +tags: [pt, open_source, pipeline, onnx] +task: Text Classification +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`toxicity_type_detection_pipeline` is a Portuguese model originally trained by dougtrajano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/toxicity_type_detection_pipeline_pt_5.5.0_3.0_1727265790660.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/toxicity_type_detection_pipeline_pt_5.5.0_3.0_1727265790660.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# A DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("toxicity_type_detection_pipeline", lang = "pt")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// A DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("toxicity_type_detection_pipeline", lang = "pt")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|toxicity_type_detection_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|408.2 MB| + +## References + +https://huggingface.co/dougtrajano/toxicity-type-detection + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-toxicity_type_detection_pt.md b/docs/_posts/ahmedlone127/2024-09-25-toxicity_type_detection_pt.md new file mode 100644 index 00000000000000..f9a3cc1d38edf7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-toxicity_type_detection_pt.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Portuguese toxicity_type_detection BertForSequenceClassification from dougtrajano +author: John Snow Labs +name: toxicity_type_detection +date: 2024-09-25 +tags: [pt, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`toxicity_type_detection` is a Portuguese model originally trained by dougtrajano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/toxicity_type_detection_pt_5.5.0_3.0_1727265768511.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/toxicity_type_detection_pt_5.5.0_3.0_1727265768511.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("toxicity_type_detection","pt") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("toxicity_type_detection", "pt")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
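The sample sentence in the snippet is English, while the model itself targets Portuguese. A small sketch along the same lines, reusing the `pipeline` object from the Python snippet above; the Portuguese sentence is only an illustration, not part of the original card:

```python
# Portuguese input is what this model expects; the sentence below is illustrative only
data = spark.createDataFrame([["Eu adoro processamento de linguagem natural."]]).toDF("text")
result = pipeline.fit(data).transform(data)
result.select("text", "class.result").show(truncate=False)
```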
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|toxicity_type_detection| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|pt| +|Size:|408.2 MB| + +## References + +https://huggingface.co/dougtrajano/toxicity-type-detection \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-trac2020_iben_a_bert_base_multilingual_uncased_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-25-trac2020_iben_a_bert_base_multilingual_uncased_pipeline_xx.md new file mode 100644 index 00000000000000..217788124d4f28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-trac2020_iben_a_bert_base_multilingual_uncased_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual trac2020_iben_a_bert_base_multilingual_uncased_pipeline pipeline BertForSequenceClassification from socialmediaie +author: John Snow Labs +name: trac2020_iben_a_bert_base_multilingual_uncased_pipeline +date: 2024-09-25 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trac2020_iben_a_bert_base_multilingual_uncased_pipeline` is a Multilingual model originally trained by socialmediaie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trac2020_iben_a_bert_base_multilingual_uncased_pipeline_xx_5.5.0_3.0_1727257138597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trac2020_iben_a_bert_base_multilingual_uncased_pipeline_xx_5.5.0_3.0_1727257138597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("trac2020_iben_a_bert_base_multilingual_uncased_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("trac2020_iben_a_bert_base_multilingual_uncased_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trac2020_iben_a_bert_base_multilingual_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|627.8 MB| + +## References + +https://huggingface.co/socialmediaie/TRAC2020_IBEN_A_bert-base-multilingual-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-trac2020_iben_a_bert_base_multilingual_uncased_xx.md b/docs/_posts/ahmedlone127/2024-09-25-trac2020_iben_a_bert_base_multilingual_uncased_xx.md new file mode 100644 index 00000000000000..fc62152f2c8a68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-trac2020_iben_a_bert_base_multilingual_uncased_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual trac2020_iben_a_bert_base_multilingual_uncased BertForSequenceClassification from socialmediaie +author: John Snow Labs +name: trac2020_iben_a_bert_base_multilingual_uncased +date: 2024-09-25 +tags: [xx, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trac2020_iben_a_bert_base_multilingual_uncased` is a Multilingual model originally trained by socialmediaie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trac2020_iben_a_bert_base_multilingual_uncased_xx_5.5.0_3.0_1727257105132.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trac2020_iben_a_bert_base_multilingual_uncased_xx_5.5.0_3.0_1727257105132.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("trac2020_iben_a_bert_base_multilingual_uncased","xx") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("trac2020_iben_a_bert_base_multilingual_uncased", "xx")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trac2020_iben_a_bert_base_multilingual_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|627.7 MB| + +## References + +https://huggingface.co/socialmediaie/TRAC2020_IBEN_A_bert-base-multilingual-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ttpxhunter_en.md b/docs/_posts/ahmedlone127/2024-09-25-ttpxhunter_en.md new file mode 100644 index 00000000000000..f3cff68eb1cf50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ttpxhunter_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ttpxhunter RoBertaForSequenceClassification from nanda-rani +author: John Snow Labs +name: ttpxhunter +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ttpxhunter` is a English model originally trained by nanda-rani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ttpxhunter_en_5.5.0_3.0_1727234001086.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ttpxhunter_en_5.5.0_3.0_1727234001086.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("ttpxhunter","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("ttpxhunter", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ttpxhunter| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.9 MB| + +## References + +https://huggingface.co/nanda-rani/TTPXHunter \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ttpxhunter_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-ttpxhunter_pipeline_en.md new file mode 100644 index 00000000000000..c659eb13dca69c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ttpxhunter_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ttpxhunter_pipeline pipeline RoBertaForSequenceClassification from nanda-rani +author: John Snow Labs +name: ttpxhunter_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ttpxhunter_pipeline` is a English model originally trained by nanda-rani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ttpxhunter_pipeline_en_5.5.0_3.0_1727234024886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ttpxhunter_pipeline_en_5.5.0_3.0_1727234024886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# A DataFrame with a "text" column to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("ttpxhunter_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

// A DataFrame with a "text" column to annotate
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("ttpxhunter_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ttpxhunter_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.9 MB| + +## References + +https://huggingface.co/nanda-rani/TTPXHunter + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-tupi_bert_large_portuguese_cased_multiclass_multilabel_en.md b/docs/_posts/ahmedlone127/2024-09-25-tupi_bert_large_portuguese_cased_multiclass_multilabel_en.md new file mode 100644 index 00000000000000..3de803ba18fa28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-tupi_bert_large_portuguese_cased_multiclass_multilabel_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tupi_bert_large_portuguese_cased_multiclass_multilabel BertForSequenceClassification from FpOliveira +author: John Snow Labs +name: tupi_bert_large_portuguese_cased_multiclass_multilabel +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tupi_bert_large_portuguese_cased_multiclass_multilabel` is a English model originally trained by FpOliveira. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tupi_bert_large_portuguese_cased_multiclass_multilabel_en_5.5.0_3.0_1727242185903.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tupi_bert_large_portuguese_cased_multiclass_multilabel_en_5.5.0_3.0_1727242185903.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("tupi_bert_large_portuguese_cased_multiclass_multilabel","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("tupi_bert_large_portuguese_cased_multiclass_multilabel", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tupi_bert_large_portuguese_cased_multiclass_multilabel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/FpOliveira/tupi-bert-large-portuguese-cased-multiclass-multilabel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-tupi_bert_large_portuguese_cased_multiclass_multilabel_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-tupi_bert_large_portuguese_cased_multiclass_multilabel_pipeline_en.md new file mode 100644 index 00000000000000..40083cd3ddfb95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-tupi_bert_large_portuguese_cased_multiclass_multilabel_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tupi_bert_large_portuguese_cased_multiclass_multilabel_pipeline pipeline BertForSequenceClassification from FpOliveira +author: John Snow Labs +name: tupi_bert_large_portuguese_cased_multiclass_multilabel_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tupi_bert_large_portuguese_cased_multiclass_multilabel_pipeline` is a English model originally trained by FpOliveira. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tupi_bert_large_portuguese_cased_multiclass_multilabel_pipeline_en_5.5.0_3.0_1727242251028.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tupi_bert_large_portuguese_cased_multiclass_multilabel_pipeline_en_5.5.0_3.0_1727242251028.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tupi_bert_large_portuguese_cased_multiclass_multilabel_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tupi_bert_large_portuguese_cased_multiclass_multilabel_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tupi_bert_large_portuguese_cased_multiclass_multilabel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/FpOliveira/tupi-bert-large-portuguese-cased-multiclass-multilabel + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-tupy_bert_base_binary_classifier_pt.md b/docs/_posts/ahmedlone127/2024-09-25-tupy_bert_base_binary_classifier_pt.md new file mode 100644 index 00000000000000..df3e61f6d3df3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-tupy_bert_base_binary_classifier_pt.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Portuguese tupy_bert_base_binary_classifier BertForSequenceClassification from Silly-Machine +author: John Snow Labs +name: tupy_bert_base_binary_classifier +date: 2024-09-25 +tags: [pt, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tupy_bert_base_binary_classifier` is a Portuguese model originally trained by Silly-Machine. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tupy_bert_base_binary_classifier_pt_5.5.0_3.0_1727268815687.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tupy_bert_base_binary_classifier_pt_5.5.0_3.0_1727268815687.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("tupy_bert_base_binary_classifier","pt") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("tupy_bert_base_binary_classifier", "pt") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tupy_bert_base_binary_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|pt| +|Size:|408.2 MB| + +## References + +https://huggingface.co/Silly-Machine/TuPy-Bert-Base-Binary-Classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-turkish_earthquake_tweets_ner_en.md b/docs/_posts/ahmedlone127/2024-09-25-turkish_earthquake_tweets_ner_en.md new file mode 100644 index 00000000000000..5c947a91f14a57 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-turkish_earthquake_tweets_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English turkish_earthquake_tweets_ner BertForTokenClassification from yhaslan +author: John Snow Labs +name: turkish_earthquake_tweets_ner +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`turkish_earthquake_tweets_ner` is a English model originally trained by yhaslan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/turkish_earthquake_tweets_ner_en_5.5.0_3.0_1727249736193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/turkish_earthquake_tweets_ner_en_5.5.0_3.0_1727249736193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("turkish_earthquake_tweets_ner","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("turkish_earthquake_tweets_ner", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
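The card's sample sentence is English, but the model targets Turkish earthquake-related tweets, so Turkish input is the realistic case. A minimal sketch, reusing the `pipeline` object from the Python snippet above; the example tweet is only an illustration, not part of the original card:

```python
# Turkish input is what this NER model was trained on; the tweet below is illustrative only
data = spark.createDataFrame([["Deprem sonrası yardım ekipleri bölgeye ulaştı."]]).toDF("text")
pipelineDF = pipeline.fit(data).transform(data)
pipelineDF.selectExpr("explode(ner.result) as ner_label").show(truncate=False)
```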
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|turkish_earthquake_tweets_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|412.3 MB| + +## References + +https://huggingface.co/yhaslan/turkish-earthquake-tweets-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-turkish_earthquake_tweets_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-turkish_earthquake_tweets_ner_pipeline_en.md new file mode 100644 index 00000000000000..c5b885c15e02aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-turkish_earthquake_tweets_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English turkish_earthquake_tweets_ner_pipeline pipeline BertForTokenClassification from yhaslan +author: John Snow Labs +name: turkish_earthquake_tweets_ner_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`turkish_earthquake_tweets_ner_pipeline` is a English model originally trained by yhaslan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/turkish_earthquake_tweets_ner_pipeline_en_5.5.0_3.0_1727249758142.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/turkish_earthquake_tweets_ner_pipeline_en_5.5.0_3.0_1727249758142.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("turkish_earthquake_tweets_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("turkish_earthquake_tweets_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|turkish_earthquake_tweets_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.3 MB| + +## References + +https://huggingface.co/yhaslan/turkish-earthquake-tweets-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-turkish_tiny_bert_uncased_offenseval2020_turkish_tr.md b/docs/_posts/ahmedlone127/2024-09-25-turkish_tiny_bert_uncased_offenseval2020_turkish_tr.md new file mode 100644 index 00000000000000..42ca2361569e67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-turkish_tiny_bert_uncased_offenseval2020_turkish_tr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Turkish turkish_tiny_bert_uncased_offenseval2020_turkish BertForSequenceClassification from atasoglu +author: John Snow Labs +name: turkish_tiny_bert_uncased_offenseval2020_turkish +date: 2024-09-25 +tags: [tr, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`turkish_tiny_bert_uncased_offenseval2020_turkish` is a Turkish model originally trained by atasoglu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/turkish_tiny_bert_uncased_offenseval2020_turkish_tr_5.5.0_3.0_1727287835043.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/turkish_tiny_bert_uncased_offenseval2020_turkish_tr_5.5.0_3.0_1727287835043.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("turkish_tiny_bert_uncased_offenseval2020_turkish","tr") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("turkish_tiny_bert_uncased_offenseval2020_turkish", "tr") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|turkish_tiny_bert_uncased_offenseval2020_turkish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|tr| +|Size:|17.5 MB| + +## References + +https://huggingface.co/atasoglu/turkish-tiny-bert-uncased-offenseval2020_tr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-turkishnewsanalysis_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-turkishnewsanalysis_pipeline_en.md new file mode 100644 index 00000000000000..efd5f7ce5030aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-turkishnewsanalysis_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English turkishnewsanalysis_pipeline pipeline BertForSequenceClassification from MesutAktas +author: John Snow Labs +name: turkishnewsanalysis_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`turkishnewsanalysis_pipeline` is a English model originally trained by MesutAktas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/turkishnewsanalysis_pipeline_en_5.5.0_3.0_1727254244524.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/turkishnewsanalysis_pipeline_en_5.5.0_3.0_1727254244524.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("turkishnewsanalysis_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("turkishnewsanalysis_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|turkishnewsanalysis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.6 MB| + +## References + +https://huggingface.co/MesutAktas/TurkishNewsAnalysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-twitter_empathy_miles_en.md b/docs/_posts/ahmedlone127/2024-09-25-twitter_empathy_miles_en.md new file mode 100644 index 00000000000000..46d13a1cecb5b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-twitter_empathy_miles_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twitter_empathy_miles BertForSequenceClassification from rxsong +author: John Snow Labs +name: twitter_empathy_miles +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_empathy_miles` is a English model originally trained by rxsong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_empathy_miles_en_5.5.0_3.0_1727237576259.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_empathy_miles_en_5.5.0_3.0_1727237576259.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("twitter_empathy_miles","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("twitter_empathy_miles", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_empathy_miles| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/rxsong/twitter_empathy_Miles \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-twitter_empathy_miles_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-twitter_empathy_miles_pipeline_en.md new file mode 100644 index 00000000000000..c6e379ca9d39b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-twitter_empathy_miles_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitter_empathy_miles_pipeline pipeline BertForSequenceClassification from rxsong +author: John Snow Labs +name: twitter_empathy_miles_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_empathy_miles_pipeline` is a English model originally trained by rxsong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_empathy_miles_pipeline_en_5.5.0_3.0_1727237597262.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_empathy_miles_pipeline_en_5.5.0_3.0_1727237597262.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("twitter_empathy_miles_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("twitter_empathy_miles_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_empathy_miles_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/rxsong/twitter_empathy_Miles + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-twitter_sentiment_analysis_en.md b/docs/_posts/ahmedlone127/2024-09-25-twitter_sentiment_analysis_en.md new file mode 100644 index 00000000000000..644a3b5ed86236 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-twitter_sentiment_analysis_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English twitter_sentiment_analysis DistilBertForSequenceClassification from vickylin21 +author: John Snow Labs +name: twitter_sentiment_analysis +date: 2024-09-25 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_sentiment_analysis` is a English model originally trained by vickylin21. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_sentiment_analysis_en_5.5.0_3.0_1727277654806.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_sentiment_analysis_en_5.5.0_3.0_1727277654806.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer()\
    .setInputCols("document")\
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("twitter_sentiment_analysis","en")\
    .setInputCols(["document","token"])\
    .setOutputCol("class")

pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier])

data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")

result = pipeline.fit(data).transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("twitter_sentiment_analysis","en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))

val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_sentiment_analysis| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +References + +https://huggingface.co/vickylin21/Twitter_sentiment_analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-twitter_xlm_roberta_base_sentiment_en.md b/docs/_posts/ahmedlone127/2024-09-25-twitter_xlm_roberta_base_sentiment_en.md new file mode 100644 index 00000000000000..aee1f003300787 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-twitter_xlm_roberta_base_sentiment_en.md @@ -0,0 +1,88 @@ +--- +layout: model +title: twitter_xlm_roberta_base_sentiment(Cardiff nlp) (Veer) +author: John Snow Labs +name: twitter_xlm_roberta_base_sentiment +date: 2024-09-25 +tags: [en, open_source, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This is a multilingual XLM-roBERTa-base model trained on ~198M tweets and finetuned for sentiment analysis. The sentiment fine-tuning was done on 8 languages (Ar, En, Fr, De, Hi, It, Sp, Pt) but it can be used for more languages (see paper for details). + +Paper: XLM-T: A Multilingual Language Model Toolkit for Twitter. +Git Repo: XLM-T official repository. +This model has been integrated into the TweetNLP library. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_xlm_roberta_base_sentiment_en_5.5.0_3.0_1727229581766.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_xlm_roberta_base_sentiment_en_5.5.0_3.0_1727229581766.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained('twitter_xlm_roberta_base_sentiment')\
    .setInputCols(["document", 'token'])\
    .setOutputCol("class")

pipeline = Pipeline(stages=[
    document_assembler,
    tokenizer,
    sequenceClassifier
])

# a couple of simple examples
example = spark.createDataFrame([['사랑해!'], ["T'estimo! ❤️"], ["I love you!"], ['Mahal kita!']]).toDF("text")

result = pipeline.fit(example).transform(example)

# result is a DataFrame
result.select("text", "class.result").show()
```

</div>
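Beyond the bare labels, each prediction annotation also carries a metadata map, which typically holds per-class scores. A small sketch for surfacing both, assuming the `result` DataFrame from the snippet above:

```python
# Show the predicted sentiment label together with the raw annotation metadata
result.select("text", "class.result", "class.metadata").show(truncate=False)
```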
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_xlm_roberta_base_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-twitter_xlm_roberta_base_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-twitter_xlm_roberta_base_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..79a96647e2c87d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-twitter_xlm_roberta_base_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitter_xlm_roberta_base_sentiment_pipeline pipeline XlmRoBertaForSequenceClassification from cardiffnlp +author: John Snow Labs +name: twitter_xlm_roberta_base_sentiment_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_xlm_roberta_base_sentiment_pipeline` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_xlm_roberta_base_sentiment_pipeline_en_5.5.0_3.0_1727229633590.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_xlm_roberta_base_sentiment_pipeline_en_5.5.0_3.0_1727229633590.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("twitter_xlm_roberta_base_sentiment_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("twitter_xlm_roberta_base_sentiment_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
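The snippet above assumes that a Spark session and an input DataFrame `df` already exist. A minimal sketch of that setup, with placeholder sentences; the input column name `text` and output column name `class` follow the standalone model card and are assumptions here:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# the DocumentAssembler inside the pretrained pipeline is assumed to read from "text"
df = spark.createDataFrame(
    [["¡Qué gran día!"], ["This is awful."]]
).toDF("text")

pipeline = PretrainedPipeline("twitter_xlm_roberta_base_sentiment_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.select("class.result").show(truncate=False)
```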
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_xlm_roberta_base_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-valueeval24_bert_baseline_english_en.md b/docs/_posts/ahmedlone127/2024-09-25-valueeval24_bert_baseline_english_en.md new file mode 100644 index 00000000000000..6be7a376a61ad3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-valueeval24_bert_baseline_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English valueeval24_bert_baseline_english BertForSequenceClassification from JohannesKiesel +author: John Snow Labs +name: valueeval24_bert_baseline_english +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`valueeval24_bert_baseline_english` is a English model originally trained by JohannesKiesel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/valueeval24_bert_baseline_english_en_5.5.0_3.0_1727284648180.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/valueeval24_bert_baseline_english_en_5.5.0_3.0_1727284648180.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("valueeval24_bert_baseline_english", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("valueeval24_bert_baseline_english", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
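Once `pipelineDF` has been computed as above, the predicted label can be read straight off the `class` column; a short sketch reusing the same DataFrame:

```python
# "class.result" is an array with the predicted label for each document
pipelineDF.select("text", "class.result").show(truncate=False)
```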
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|valueeval24_bert_baseline_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/JohannesKiesel/valueeval24-bert-baseline-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-wb_charcs_extraction_en.md b/docs/_posts/ahmedlone127/2024-09-25-wb_charcs_extraction_en.md new file mode 100644 index 00000000000000..46a5d83704b7fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-wb_charcs_extraction_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English wb_charcs_extraction BertForTokenClassification from vkimbris +author: John Snow Labs +name: wb_charcs_extraction +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wb_charcs_extraction` is a English model originally trained by vkimbris. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wb_charcs_extraction_en_5.5.0_3.0_1727271968225.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wb_charcs_extraction_en_5.5.0_3.0_1727271968225.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("wb_charcs_extraction", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("wb_charcs_extraction", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
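For token-level output it is often convenient to group the tags in the `ner` column into entity chunks. A sketch using Spark NLP's `NerConverter`, assuming the model emits IOB/IOB2-style labels; this extra stage is not part of the card above:

```python
from sparknlp.annotator import NerConverter

# group token-level tags into chunks with their entity type
nerConverter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

chunked = nerConverter.transform(pipelineDF)
chunked.selectExpr("explode(ner_chunk) AS chunk") \
    .selectExpr("chunk.result AS entity_text", "chunk.metadata['entity'] AS entity_label") \
    .show(truncate=False)
```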
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wb_charcs_extraction| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|661.9 MB| + +## References + +https://huggingface.co/vkimbris/wb-charcs-extraction \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_dutch_nl.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_dutch_nl.md new file mode 100644 index 00000000000000..14cbde5ffa6927 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_dutch_nl.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dutch, Flemish whisper_dutch WhisperForCTC from hannatoenbreker +author: John Snow Labs +name: whisper_dutch +date: 2024-09-25 +tags: [nl, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_dutch` is a Dutch, Flemish model originally trained by hannatoenbreker. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_dutch_nl_5.5.0_3.0_1727226938874.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_dutch_nl_5.5.0_3.0_1727226938874.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# `data` is assumed to be a DataFrame with an "audio_content" column of float samples
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_dutch", "nl") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// `data` is assumed to be a DataFrame with an "audio_content" column of float samples
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_dutch", "nl")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
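The snippets above reference a `data` DataFrame without defining it. `AudioAssembler` expects a column of floating-point audio samples, so one way to build it is sketched below; `librosa` is an assumed extra dependency, `dutch_sample.wav` is a placeholder path, and the waveform is resampled to the 16 kHz rate Whisper models expect:

```python
import librosa

# load a mono waveform at 16 kHz and wrap it in a single-row DataFrame
waveform, _ = librosa.load("dutch_sample.wav", sr=16000)
data = spark.createDataFrame([[waveform.tolist()]]).toDF("audio_content")
```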
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_dutch| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|nl| +|Size:|1.7 GB| + +## References + +https://huggingface.co/hannatoenbreker/whisper-dutch \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_dutch_pipeline_nl.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_dutch_pipeline_nl.md new file mode 100644 index 00000000000000..1b1537bdb3b6ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_dutch_pipeline_nl.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Dutch, Flemish whisper_dutch_pipeline pipeline WhisperForCTC from hannatoenbreker +author: John Snow Labs +name: whisper_dutch_pipeline +date: 2024-09-25 +tags: [nl, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_dutch_pipeline` is a Dutch, Flemish model originally trained by hannatoenbreker. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_dutch_pipeline_nl_5.5.0_3.0_1727227027051.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_dutch_pipeline_nl_5.5.0_3.0_1727227027051.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_dutch_pipeline", lang = "nl") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_dutch_pipeline", lang = "nl") +val annotations = pipeline.transform(df) + +``` +
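As with the standalone `whisper_dutch` card, `df` here needs a column of raw audio samples rather than text (see the audio-loading sketch on that card). A short sketch for inspecting what the pretrained pipeline returns; the exact output column name is not documented on this card, so the schema is printed before selecting the transcription:

```python
result = pipeline.transform(df)
result.printSchema()          # locate the transcription column produced by WhisperForCTC
result.show(truncate=False)
```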
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_dutch_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|nl| +|Size:|1.7 GB| + +## References + +https://huggingface.co/hannatoenbreker/whisper-dutch + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_small_inbrowser_proctor_en.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_inbrowser_proctor_en.md new file mode 100644 index 00000000000000..571d31583b59ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_inbrowser_proctor_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_inbrowser_proctor WhisperForCTC from lord-reso +author: John Snow Labs +name: whisper_small_inbrowser_proctor +date: 2024-09-25 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_inbrowser_proctor` is a English model originally trained by lord-reso. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_inbrowser_proctor_en_5.5.0_3.0_1727226765800.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_inbrowser_proctor_en_5.5.0_3.0_1727226765800.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_inbrowser_proctor","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_inbrowser_proctor", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_inbrowser_proctor| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/lord-reso/whisper-small-inbrowser-proctor \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_small_inbrowser_proctor_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_inbrowser_proctor_pipeline_en.md new file mode 100644 index 00000000000000..1beab7a3038901 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_inbrowser_proctor_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_inbrowser_proctor_pipeline pipeline WhisperForCTC from lord-reso +author: John Snow Labs +name: whisper_small_inbrowser_proctor_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_inbrowser_proctor_pipeline` is a English model originally trained by lord-reso. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_inbrowser_proctor_pipeline_en_5.5.0_3.0_1727226850641.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_inbrowser_proctor_pipeline_en_5.5.0_3.0_1727226850641.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_inbrowser_proctor_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_inbrowser_proctor_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_inbrowser_proctor_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/lord-reso/whisper-small-inbrowser-proctor + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_small_khmer_km.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_khmer_km.md new file mode 100644 index 00000000000000..d44045a4bb35e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_khmer_km.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Central Khmer, Khmer whisper_small_khmer WhisperForCTC from seanghay +author: John Snow Labs +name: whisper_small_khmer +date: 2024-09-25 +tags: [km, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: km +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_khmer` is a Central Khmer, Khmer model originally trained by seanghay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_khmer_km_5.5.0_3.0_1727224018090.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_khmer_km_5.5.0_3.0_1727224018090.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_khmer","km") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_khmer", "km") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_khmer| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|km| +|Size:|1.7 GB| + +## References + +https://huggingface.co/seanghay/whisper-small-khmer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_small_khmer_pipeline_km.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_khmer_pipeline_km.md new file mode 100644 index 00000000000000..cf10a7862e11f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_khmer_pipeline_km.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Central Khmer, Khmer whisper_small_khmer_pipeline pipeline WhisperForCTC from seanghay +author: John Snow Labs +name: whisper_small_khmer_pipeline +date: 2024-09-25 +tags: [km, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: km +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_khmer_pipeline` is a Central Khmer, Khmer model originally trained by seanghay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_khmer_pipeline_km_5.5.0_3.0_1727224111359.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_khmer_pipeline_km_5.5.0_3.0_1727224111359.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_khmer_pipeline", lang = "km") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_khmer_pipeline", lang = "km") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_khmer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|km| +|Size:|1.7 GB| + +## References + +https://huggingface.co/seanghay/whisper-small-khmer + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_jlondonobo_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_jlondonobo_pipeline_pt.md new file mode 100644 index 00000000000000..af97022d5e0f3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_jlondonobo_pipeline_pt.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Portuguese whisper_small_portuguese_jlondonobo_pipeline pipeline WhisperForCTC from jlondonobo +author: John Snow Labs +name: whisper_small_portuguese_jlondonobo_pipeline +date: 2024-09-25 +tags: [pt, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_portuguese_jlondonobo_pipeline` is a Portuguese model originally trained by jlondonobo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_jlondonobo_pipeline_pt_5.5.0_3.0_1727224457904.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_jlondonobo_pipeline_pt_5.5.0_3.0_1727224457904.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_portuguese_jlondonobo_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_portuguese_jlondonobo_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_portuguese_jlondonobo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|1.7 GB| + +## References + +https://huggingface.co/jlondonobo/whisper-small-pt + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_jlondonobo_pt.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_jlondonobo_pt.md new file mode 100644 index 00000000000000..e6f723ed1a4fe7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_jlondonobo_pt.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Portuguese whisper_small_portuguese_jlondonobo WhisperForCTC from jlondonobo +author: John Snow Labs +name: whisper_small_portuguese_jlondonobo +date: 2024-09-25 +tags: [pt, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_portuguese_jlondonobo` is a Portuguese model originally trained by jlondonobo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_jlondonobo_pt_5.5.0_3.0_1727224365258.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_jlondonobo_pt_5.5.0_3.0_1727224365258.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_portuguese_jlondonobo","pt") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_portuguese_jlondonobo", "pt") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_portuguese_jlondonobo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|pt| +|Size:|1.7 GB| + +## References + +https://huggingface.co/jlondonobo/whisper-small-pt \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_pedropauletti_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_pedropauletti_pipeline_pt.md new file mode 100644 index 00000000000000..b12a03597bd2af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_pedropauletti_pipeline_pt.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Portuguese whisper_small_portuguese_pedropauletti_pipeline pipeline WhisperForCTC from pedropauletti +author: John Snow Labs +name: whisper_small_portuguese_pedropauletti_pipeline +date: 2024-09-25 +tags: [pt, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_portuguese_pedropauletti_pipeline` is a Portuguese model originally trained by pedropauletti. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_pedropauletti_pipeline_pt_5.5.0_3.0_1727228230723.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_pedropauletti_pipeline_pt_5.5.0_3.0_1727228230723.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_portuguese_pedropauletti_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_portuguese_pedropauletti_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_portuguese_pedropauletti_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|1.7 GB| + +## References + +https://huggingface.co/pedropauletti/whisper-small-pt + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_pedropauletti_pt.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_pedropauletti_pt.md new file mode 100644 index 00000000000000..ded2ecb95c7490 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_pedropauletti_pt.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Portuguese whisper_small_portuguese_pedropauletti WhisperForCTC from pedropauletti +author: John Snow Labs +name: whisper_small_portuguese_pedropauletti +date: 2024-09-25 +tags: [pt, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_portuguese_pedropauletti` is a Portuguese model originally trained by pedropauletti. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_pedropauletti_pt_5.5.0_3.0_1727228145048.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_pedropauletti_pt_5.5.0_3.0_1727228145048.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_portuguese_pedropauletti","pt") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_portuguese_pedropauletti", "pt") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_portuguese_pedropauletti| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|pt| +|Size:|1.7 GB| + +## References + +https://huggingface.co/pedropauletti/whisper-small-pt \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_small_turkish_istech_en.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_turkish_istech_en.md new file mode 100644 index 00000000000000..d8ad5d3076865b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_turkish_istech_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_turkish_istech WhisperForCTC from muratsimsek003 +author: John Snow Labs +name: whisper_small_turkish_istech +date: 2024-09-25 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_turkish_istech` is a English model originally trained by muratsimsek003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_istech_en_5.5.0_3.0_1727225876743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_istech_en_5.5.0_3.0_1727225876743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_turkish_istech","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_turkish_istech", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_turkish_istech| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/muratsimsek003/whisper-small-tr-istech \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_small_turkish_istech_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_turkish_istech_pipeline_en.md new file mode 100644 index 00000000000000..13af8e1cb7d73f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_turkish_istech_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_turkish_istech_pipeline pipeline WhisperForCTC from muratsimsek003 +author: John Snow Labs +name: whisper_small_turkish_istech_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_turkish_istech_pipeline` is a English model originally trained by muratsimsek003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_istech_pipeline_en_5.5.0_3.0_1727225973702.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_istech_pipeline_en_5.5.0_3.0_1727225973702.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_turkish_istech_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_turkish_istech_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_turkish_istech_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/muratsimsek003/whisper-small-tr-istech + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_test_pipeline_en.md new file mode 100644 index 00000000000000..3173f244becd3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_test_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_test_pipeline pipeline WhisperForCTC from SamagraDataGov +author: John Snow Labs +name: whisper_test_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_test_pipeline` is a English model originally trained by SamagraDataGov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_test_pipeline_en_5.5.0_3.0_1727224748193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_test_pipeline_en_5.5.0_3.0_1727224748193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_test_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_test_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.0 MB| + +## References + +https://huggingface.co/SamagraDataGov/whisper-test + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-word2affect_dutch_pipeline_nl.md b/docs/_posts/ahmedlone127/2024-09-25-word2affect_dutch_pipeline_nl.md new file mode 100644 index 00000000000000..a153e6ed00f871 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-word2affect_dutch_pipeline_nl.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Dutch, Flemish word2affect_dutch_pipeline pipeline BertForSequenceClassification from hplisiecki +author: John Snow Labs +name: word2affect_dutch_pipeline +date: 2024-09-25 +tags: [nl, open_source, pipeline, onnx] +task: Text Classification +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`word2affect_dutch_pipeline` is a Dutch, Flemish model originally trained by hplisiecki. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/word2affect_dutch_pipeline_nl_5.5.0_3.0_1727265918885.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/word2affect_dutch_pipeline_nl_5.5.0_3.0_1727265918885.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("word2affect_dutch_pipeline", lang = "nl") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("word2affect_dutch_pipeline", lang = "nl") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|word2affect_dutch_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|nl| +|Size:|409.3 MB| + +## References + +https://huggingface.co/hplisiecki/word2affect_dutch + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_final_vietnam_aug_insert_bert_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_final_vietnam_aug_insert_bert_2_pipeline_en.md new file mode 100644 index 00000000000000..189b648b0bce41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_final_vietnam_aug_insert_bert_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_final_vietnam_aug_insert_bert_2_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_final_vietnam_aug_insert_bert_2_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_final_vietnam_aug_insert_bert_2_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_vietnam_aug_insert_bert_2_pipeline_en_5.5.0_3.0_1727228848305.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_vietnam_aug_insert_bert_2_pipeline_en_5.5.0_3.0_1727228848305.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_final_vietnam_aug_insert_bert_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_final_vietnam_aug_insert_bert_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_final_vietnam_aug_insert_bert_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|794.2 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Final_VietNam-aug_insert_BERT-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_final_vietnam_aug_replace_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_final_vietnam_aug_replace_bert_pipeline_en.md new file mode 100644 index 00000000000000..f25c16826faa25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_final_vietnam_aug_replace_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_final_vietnam_aug_replace_bert_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_final_vietnam_aug_replace_bert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_final_vietnam_aug_replace_bert_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_vietnam_aug_replace_bert_pipeline_en_5.5.0_3.0_1727229058216.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_vietnam_aug_replace_bert_pipeline_en_5.5.0_3.0_1727229058216.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_final_vietnam_aug_replace_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_final_vietnam_aug_replace_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_final_vietnam_aug_replace_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|794.4 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Final_VietNam-aug_replace_BERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_finetuned_marc_lewtun_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_finetuned_marc_lewtun_en.md new file mode 100644 index 00000000000000..67df13092b333c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_finetuned_marc_lewtun_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_marc_lewtun XlmRoBertaForSequenceClassification from lewtun +author: John Snow Labs +name: xlm_roberta_base_finetuned_marc_lewtun +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_marc_lewtun` is a English model originally trained by lewtun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_lewtun_en_5.5.0_3.0_1727228952110.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_lewtun_en_5.5.0_3.0_1727228952110.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_marc_lewtun","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_marc_lewtun", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_marc_lewtun| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|846.8 MB| + +## References + +https://huggingface.co/lewtun/xlm-roberta-base-finetuned-marc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_finetuned_marc_lewtun_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_finetuned_marc_lewtun_pipeline_en.md new file mode 100644 index 00000000000000..21bfed335616b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_finetuned_marc_lewtun_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_marc_lewtun_pipeline pipeline XlmRoBertaForSequenceClassification from lewtun +author: John Snow Labs +name: xlm_roberta_base_finetuned_marc_lewtun_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_marc_lewtun_pipeline` is a English model originally trained by lewtun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_lewtun_pipeline_en_5.5.0_3.0_1727229035417.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_lewtun_pipeline_en_5.5.0_3.0_1727229035417.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_marc_lewtun_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_marc_lewtun_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_marc_lewtun_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|846.8 MB| + +## References + +https://huggingface.co/lewtun/xlm-roberta-base-finetuned-marc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_finetuned_misogyny_sexism_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_finetuned_misogyny_sexism_en.md new file mode 100644 index 00000000000000..bc42bc1e4cba89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_finetuned_misogyny_sexism_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_misogyny_sexism XlmRoBertaForSequenceClassification from annahaz +author: John Snow Labs +name: xlm_roberta_base_finetuned_misogyny_sexism +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_misogyny_sexism` is a English model originally trained by annahaz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_misogyny_sexism_en_5.5.0_3.0_1727229178558.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_misogyny_sexism_en_5.5.0_3.0_1727229178558.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_misogyny_sexism","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_misogyny_sexism", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_misogyny_sexism| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|877.4 MB| + +## References + +https://huggingface.co/annahaz/xlm-roberta-base-finetuned-misogyny-sexism \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_germeval21_toxic_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_germeval21_toxic_en.md new file mode 100644 index 00000000000000..f1adcb1733557b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_germeval21_toxic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_germeval21_toxic XlmRoBertaForSequenceClassification from airKlizz +author: John Snow Labs +name: xlm_roberta_base_germeval21_toxic +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_germeval21_toxic` is a English model originally trained by airKlizz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_germeval21_toxic_en_5.5.0_3.0_1727228708588.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_germeval21_toxic_en_5.5.0_3.0_1727228708588.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_germeval21_toxic","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_germeval21_toxic", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_germeval21_toxic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|784.9 MB| + +## References + +https://huggingface.co/airKlizz/xlm-roberta-base-germeval21-toxic \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_germeval21_toxic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_germeval21_toxic_pipeline_en.md new file mode 100644 index 00000000000000..fc1a1fffff613f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_germeval21_toxic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_germeval21_toxic_pipeline pipeline XlmRoBertaForSequenceClassification from airKlizz +author: John Snow Labs +name: xlm_roberta_base_germeval21_toxic_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_germeval21_toxic_pipeline` is a English model originally trained by airKlizz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_germeval21_toxic_pipeline_en_5.5.0_3.0_1727228858238.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_germeval21_toxic_pipeline_en_5.5.0_3.0_1727228858238.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_germeval21_toxic_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_germeval21_toxic_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
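+
+For quick ad-hoc checks, the same pretrained pipeline can also be applied to a single string with `annotate`, which avoids building a DataFrame first. This is a minimal sketch: the sample sentence is only illustrative, and the `class` key assumes the bundled classifier keeps the output column name used in the standalone model examples.
+
+```python
+# Annotate one string in memory and read back the predicted labels
+light_result = pipeline.annotate("I love spark-nlp")
+print(light_result["class"])
+```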
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_germeval21_toxic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|784.9 MB| + +## References + +https://huggingface.co/airKlizz/xlm-roberta-base-germeval21-toxic + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_irumozhi_pipeline_ta.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_irumozhi_pipeline_ta.md new file mode 100644 index 00000000000000..84bdb91f16c7fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_irumozhi_pipeline_ta.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Tamil xlm_roberta_base_irumozhi_pipeline pipeline XlmRoBertaForSequenceClassification from aryaman +author: John Snow Labs +name: xlm_roberta_base_irumozhi_pipeline +date: 2024-09-25 +tags: [ta, open_source, pipeline, onnx] +task: Text Classification +language: ta +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_irumozhi_pipeline` is a Tamil model originally trained by aryaman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_irumozhi_pipeline_ta_5.5.0_3.0_1727229375117.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_irumozhi_pipeline_ta_5.5.0_3.0_1727229375117.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_irumozhi_pipeline", lang = "ta")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_irumozhi_pipeline", lang = "ta")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_irumozhi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ta| +|Size:|773.9 MB| + +## References + +https://huggingface.co/aryaman/xlm-roberta-base-irumozhi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_en.md new file mode 100644 index 00000000000000..d25974525f95f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_en_5.5.0_3.0_1727228876329.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_en_5.5.0_3.0_1727228876329.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|832.3 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr0.001_seed42_amh-esp-eng_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_pipeline_en.md new file mode 100644 index 00000000000000..44e126a969b97b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_pipeline pipeline XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_pipeline` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_pipeline_en_5.5.0_3.0_1727228956743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_pipeline_en_5.5.0_3.0_1727228956743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|832.3 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr0.001_seed42_amh-esp-eng_train + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_sentiment_classification_test_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_sentiment_classification_test_v2_pipeline_en.md new file mode 100644 index 00000000000000..23d29821e92d4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_sentiment_classification_test_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_sentiment_classification_test_v2_pipeline pipeline BertForSequenceClassification from pnr-svc +author: John Snow Labs +name: xlm_roberta_base_sentiment_classification_test_v2_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_sentiment_classification_test_v2_pipeline` is a English model originally trained by pnr-svc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_sentiment_classification_test_v2_pipeline_en_5.5.0_3.0_1727266531365.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_sentiment_classification_test_v2_pipeline_en_5.5.0_3.0_1727266531365.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_sentiment_classification_test_v2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_sentiment_classification_test_v2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_sentiment_classification_test_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.5 MB| + +## References + +https://huggingface.co/pnr-svc/xlm-roberta-base-sentiment-classification_test_V2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_sst2_10_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_sst2_10_en.md new file mode 100644 index 00000000000000..d6cb2b93e47f3d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_sst2_10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_sst2_10 XlmRoBertaForSequenceClassification from tmnam20 +author: John Snow Labs +name: xlm_roberta_base_sst2_10 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_sst2_10` is a English model originally trained by tmnam20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_sst2_10_en_5.5.0_3.0_1727229679712.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_sst2_10_en_5.5.0_3.0_1727229679712.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_sst2_10","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_sst2_10", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_sst2_10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|779.4 MB| + +## References + +https://huggingface.co/tmnam20/xlm-roberta-base-sst2-10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_en.md new file mode 100644 index 00000000000000..2851ffe0f4c172 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_en_5.5.0_3.0_1727228558441.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_en_5.5.0_3.0_1727228558441.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|390.0 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-en-30000-tweet-sentiment-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_pipeline_en.md new file mode 100644 index 00000000000000..3b1a9c0f206952 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_pipeline_en_5.5.0_3.0_1727228586507.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_pipeline_en_5.5.0_3.0_1727228586507.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.0 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-en-30000-tweet-sentiment-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_trimmed_italian_30000_tweet_sentiment_italian_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_trimmed_italian_30000_tweet_sentiment_italian_en.md new file mode 100644 index 00000000000000..c5130f4a59b646 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_trimmed_italian_30000_tweet_sentiment_italian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_italian_30000_tweet_sentiment_italian XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_italian_30000_tweet_sentiment_italian +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_italian_30000_tweet_sentiment_italian` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_italian_30000_tweet_sentiment_italian_en_5.5.0_3.0_1727228710743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_italian_30000_tweet_sentiment_italian_en_5.5.0_3.0_1727228710743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_italian_30000_tweet_sentiment_italian","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_italian_30000_tweet_sentiment_italian", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_italian_30000_tweet_sentiment_italian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|388.9 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-it-30000-tweet-sentiment-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlmroberta_classifier_base_mrpc_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlmroberta_classifier_base_mrpc_en.md new file mode 100644 index 00000000000000..be0311740fd4fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlmroberta_classifier_base_mrpc_en.md @@ -0,0 +1,105 @@ +--- +layout: model +title: English XlmRobertaForSequenceClassification Base Cased model (from Intel) +author: John Snow Labs +name: xlmroberta_classifier_base_mrpc +date: 2024-09-25 +tags: [en, open_source, xlm_roberta, sequence_classification, classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRobertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-mrpc` is a English model originally trained by `Intel`. + +## Predicted Entities + +`equivalent`, `not_equivalent` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_classifier_base_mrpc_en_5.5.0_3.0_1727229909505.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_classifier_base_mrpc_en_5.5.0_3.0_1727229909505.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+seq_classifier = XlmRoBertaForSequenceClassification.pretrained("xlmroberta_classifier_base_mrpc","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, seq_classifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val seq_classifier = XlmRoBertaForSequenceClassification.pretrained("xlmroberta_classifier_base_mrpc","en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, seq_classifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.classify.xlmr_roberta.glue.base").predict("""PUT YOUR STRING HERE""")
+```
+
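+
+Because this model predicts `equivalent` / `not_equivalent`, a LightPipeline can be convenient for low-latency scoring of individual strings once the pipeline has been fitted. This is a minimal sketch that reuses `pipeline` and `data` from the example above; the placeholder string is only illustrative.
+
+```python
+from sparknlp.base import LightPipeline
+
+# Wrap the fitted pipeline for fast in-memory inference on plain strings
+light = LightPipeline(pipeline.fit(data))
+print(light.annotate("PUT YOUR STRING HERE")["class"])
+```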
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_classifier_base_mrpc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|787.7 MB| + +## References + +References + +- https://huggingface.co/Intel/xlm-roberta-base-mrpc +- https://paperswithcode.com/sota?task=Text+Classification&dataset=GLUE+MRPC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlmroberta_classifier_base_mrpc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlmroberta_classifier_base_mrpc_pipeline_en.md new file mode 100644 index 00000000000000..c21f16728e31de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlmroberta_classifier_base_mrpc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmroberta_classifier_base_mrpc_pipeline pipeline XlmRoBertaForSequenceClassification from Intel +author: John Snow Labs +name: xlmroberta_classifier_base_mrpc_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_classifier_base_mrpc_pipeline` is a English model originally trained by Intel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_classifier_base_mrpc_pipeline_en_5.5.0_3.0_1727230042885.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_classifier_base_mrpc_pipeline_en_5.5.0_3.0_1727230042885.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipeline = PretrainedPipeline("xlmroberta_classifier_base_mrpc_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipeline = new PretrainedPipeline("xlmroberta_classifier_base_mrpc_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_classifier_base_mrpc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|787.8 MB| + +## References + +https://huggingface.co/Intel/xlm-roberta-base-mrpc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-yahoo1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-yahoo1_pipeline_en.md new file mode 100644 index 00000000000000..9bfc9ab142eec7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-yahoo1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English yahoo1_pipeline pipeline BertForSequenceClassification from Lumos +author: John Snow Labs +name: yahoo1_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`yahoo1_pipeline` is a English model originally trained by Lumos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/yahoo1_pipeline_en_5.5.0_3.0_1727272725736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/yahoo1_pipeline_en_5.5.0_3.0_1727272725736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("yahoo1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("yahoo1_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|yahoo1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Lumos/yahoo1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-yahoo2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-yahoo2_pipeline_en.md new file mode 100644 index 00000000000000..1b3a98717af3e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-yahoo2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English yahoo2_pipeline pipeline BertForSequenceClassification from Lumos +author: John Snow Labs +name: yahoo2_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`yahoo2_pipeline` is a English model originally trained by Lumos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/yahoo2_pipeline_en_5.5.0_3.0_1727287225807.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/yahoo2_pipeline_en_5.5.0_3.0_1727287225807.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("yahoo2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("yahoo2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|yahoo2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Lumos/yahoo2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file